⚡️Jailbreaking AGI: Pliny the Liberator & John V on Red Teaming, BT6, and the Future of AI Security
Latent Space: The AI Engineer Podcast
2 DAYS AGO
In a world where AI safety is often reduced to brittle guardrails and closed-door evaluations, two figures stand out for challenging the status quo: Pliny the Liberator and John V. They represent a growing movement that prioritizes radical transparency, open-source collaboration, and deep technical intuition in the pursuit of meaningful AI security.
Pliny and John advocate for universal jailbreaks as tools to expose latent model behaviors, rejecting the notion that guardrails equate to real safety. They distinguish between hard, single-input jailbreaks and soft, multi-turn attacks: methods long known in hacker communities but only recently acknowledged by academia. Through projects like the open-source Libertas repository, they employ techniques such as predictive reasoning cascades and 'steered chaos' to push models beyond their training distributions.

Their collective, BT6, vets members on both skill and integrity, insisting on full transparency; they turned down closed bounties like Anthropic's Constitutional AI challenge over its lack of data openness. They also warn that segmented sub-agents can weaponize models like Claude for orchestrated real-world attacks, a threat Pliny anticipated months before official disclosures.

Ultimately, they argue that true AI safety lies not in restricting models but in full-stack, system-level defenses and open research grounded in meatspace realities.
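The hard/soft distinction the episode draws maps cleanly onto code. Below is a minimal, hypothetical sketch of a red-team probe harness: `hard_probe` fires one self-contained adversarial input, while `soft_probe` threads an entire conversation through the model so each turn builds on the last. The `ChatModel` callable, the function names, and the refusal heuristic are all illustrative assumptions, not anything from the episode, BT6, or a real library.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]                    # {"role": "user" | "assistant", "content": ...}
ChatModel = Callable[[List[Message]], str]  # any chat model: transcript in, reply out

def hard_probe(model: ChatModel, prompt: str) -> str:
    """A 'hard' jailbreak attempt: a single, self-contained adversarial input."""
    return model([{"role": "user", "content": prompt}])

def soft_probe(model: ChatModel, turns: List[str]) -> List[Message]:
    """A 'soft' jailbreak attempt: a multi-turn exchange that gradually steers
    the model, carrying the full transcript forward on every turn."""
    transcript: List[Message] = []
    for turn in turns:
        transcript.append({"role": "user", "content": turn})
        reply = model(transcript)
        transcript.append({"role": "assistant", "content": reply})
    return transcript

def refused(reply: str) -> bool:
    """Crude refusal heuristic, for logging only; real evaluations need
    something far more robust than substring matching."""
    markers = ("i can't", "i cannot", "i'm not able", "i won't")
    return any(m in reply.lower() for m in markers)
```

The useful signal in the soft case is where along the transcript `refused` flips from True to False, i.e. the turn at which the conversation drifted the model outside the behavior its guardrails were trained on.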
01:53 Freedom and transparency are critical as AI becomes an extension of human cognition
04:27 Why equating guardrails with safety is a mistake
14:53 Soft jailbreaks: multi-turn processes that gradually steer models toward liberation
16:22 Turning down a $30k bounty to stand for open-source AI data principles
23:35 Models can use natural language for social engineering in attacks
26:56 BT6: a white-hat hacker collective vetting members on skill and integrity