Guaranteed Safe AI Seminars
Technical talks advancing AI with quantitative safety guarantees. For context on the research agenda, see Towards Guaranteed Safe AI.
Past seminars
2025
Safe Learning Under Irreversible Dynamics via Asking for Help
Benjamin Plaut · Postdoc at CHAI (Center for Human-Compatible AI)
Standard online-learning algorithms with formal guarantees often rely on trying all possible behaviors, which is unsafe when some errors cannot be recovered from. This work allows a learning agent to ask for help from a mentor and to transfer knowledge between similar states. The resulting algorithm learns both safely and effectively.
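The core decision rule can be sketched in a few lines; the `similarity_to_known` heuristic and `ask_mentor` interface below are illustrative stand-ins, not the algorithm presented in the talk:

```python
def act_or_ask(state, q_values, similarity_to_known, ask_mentor, risk_threshold=0.1):
    """Choose an action, deferring to a mentor whenever the agent cannot rule
    out an unrecoverable mistake (illustrative sketch, not the talk's algorithm)."""
    # Stand-in proxy for "could acting here be irreversible?": state novelty.
    novelty = 1.0 - similarity_to_known(state)
    if novelty > risk_threshold:
        return ask_mentor(state)                           # defer: safe by construction
    return max(q_values[state], key=q_values[state].get)   # act on own estimates

# Toy usage with hypothetical stand-ins.
q = {"s0": {"left": 0.2, "right": 0.7}}
print(act_or_ask("s0", q, lambda s: 0.95, lambda s: "mentor_action"))  # -> "right"
```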
When AI met AR
Clark Barrett · Stanford Center for Automated Reasoning
Artificial Intelligence and Automated Reasoning have both advanced quickly in recent years. This talk explores how combining them can help address AI safety, including verifiable code generation and learning-enhanced reasoning systems.
Model-Based Soft Maximization of Suitable Metrics of Long-Term Human Power
Jobst Heitzig · Senior Mathematician, AI Safety Designer
Power is a key concept in AI safety. This talk explores promoting safety and wellbeing by forcing AI agents to explicitly empower humans, using a principled approach to design a parameterizable objective function representing a long-term aggregate of human power.
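As a rough illustration of what a parameterizable objective of this kind might look like, the sketch below aggregates per-human power estimates with a power mean and a saturating transform; this is a hypothetical instantiation, not the talk's proposal:

```python
import math

def soft_power_objective(per_human_power, exponent=-2.0, temperature=1.0):
    """Hypothetical objective: a power-mean aggregate over humans' estimated
    long-term power (exponent < 1 weights the worst-off more heavily), passed
    through a saturating transform so the agent soft-maximizes the metric
    rather than optimizing it to an extreme."""
    n = len(per_human_power)
    aggregate = (sum(p ** exponent for p in per_human_power) / n) ** (1.0 / exponent)
    return temperature * math.log1p(aggregate / temperature)

print(soft_power_objective([0.8, 0.5, 0.9]))
```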
Towards Safe and Hallucination-Free Coding AIs
GasStationManager · Independent Researcher
Modern LLM coding assistants pose serious security risks. The talk argues for protocols that require AI-generated code to come with machine-checkable proofs so humans can be assured of safety and correctness.
Engineering Rational Cooperative AI via Inverse Planning and Probabilistic Programming
Tan Zhi Xuan · National University of Singapore
Explores how to build cooperative machines that model and understand human minds. Introduces Sequential Inverse Plan Search (SIPS), which combines online model-based planning with sequential Monte Carlo inference to infer human goals faster than real time.
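The goal-inference idea can be sketched as a particle filter over candidate goals; the `planner` interface below is a hypothetical stand-in for the online model-based planner, and the sketch is not the SIPS implementation itself:

```python
import random

def smc_goal_inference(observed_actions, goals, planner, n_particles=200):
    """Minimal sequential Monte Carlo over goal hypotheses: each particle is a
    candidate goal; weights track how well a rational plan for that goal
    explains the observed actions."""
    particles = [random.choice(goals) for _ in range(n_particles)]
    for t, action in enumerate(observed_actions):
        # Likelihood of the observed action under each particle's goal, with a
        # small probability of deviation so no weight collapses to zero.
        weights = [0.9 if planner(g, t) == action else 0.1 for g in particles]
        # Resample particles in proportion to their weights.
        particles = random.choices(particles, weights=weights, k=n_particles)
    return {g: particles.count(g) / n_particles for g in goals}

# Toy usage: two goals, a hypothetical planner, and three observed actions.
plans = {"goal_a": ["up", "up", "left"], "goal_b": ["up", "down", "down"]}
planner = lambda g, t: plans[g][t]
print(smc_goal_inference(["up", "up", "left"], ["goal_a", "goal_b"], planner))
```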
Using PDDL Planning to Ensure Safety in LLM-based Agents
Agustín Martinez Suñé · University of Oxford
Integrates PDDL symbolic planning with LLM-based agents to enforce safety constraints during execution. Experiments show robustness under severe input perturbations and adversarial attacks.
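A minimal sketch of the gatekeeping pattern, with a hypothetical precondition table standing in for the PDDL domain model and illustrative function names:

```python
def safe_execute(llm_propose_action, preconditions, state, fallback_plan):
    """Gatekeeper sketch: execute an LLM-proposed action only if the symbolic
    model says its preconditions hold in the current state; otherwise fall
    back to a verified plan."""
    action = llm_propose_action(state)
    required = preconditions.get(action)
    if required is not None and required.issubset(state):
        return action                 # symbolically admissible
    return fallback_plan(state)       # reject and substitute a safe action

# Toy usage with a hypothetical two-fact domain.
pre = {"open_valve": {"valve_closed", "pressure_low"}}
state = {"valve_closed", "pressure_high"}
print(safe_execute(lambda s: "open_valve", pre, state, lambda s: "wait"))  # -> "wait"
```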
2024
Compact Proofs of Model Performance via Mechanistic Interpretability
Louis Jaburi · Independent Researcher
Proposes constructing rigorous, compact proofs about neural-network behavior using mechanistic interpretability. Discusses challenges and scaling directions for formal verification.
Bayesian oracles and safety bounds
Yoshua Bengio · Mila, Université de Montréal
Investigates safety advantages of training a Bayesian oracle to estimate P(answer | question, data). Explores catastrophic-risk scenarios, failure modes, and using the oracle for conservative risk bounds.
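One way to read "conservative risk bounds" is to bound harm by the worst case over hypotheses the posterior has not ruled out; the sketch below illustrates that idea with hypothetical names and is not the talk's exact construction:

```python
def conservative_harm_bound(posterior, harm_prob, plausibility=1e-3):
    """Cautious bound sketch: instead of averaging harm over the posterior,
    take the worst case over every hypothesis the data has not ruled out
    (posterior mass above a cutoff)."""
    plausible = [h for h, p in posterior.items() if p >= plausibility]
    return max(harm_prob(h) for h in plausible)

# Toy usage: two candidate world models predicting different harm levels.
posterior = {"theory_a": 0.7, "theory_b": 0.3}
harm = {"theory_a": 0.01, "theory_b": 0.2}
print(conservative_harm_bound(posterior, lambda h: harm[h]))  # -> 0.2
```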
Constructability: Designing plain-coded AI systems
Charbel-Raphaël Ségerie & Épiphanie Gédéon · CeSIA
Presents constructability, a design paradigm for AI systems that are interpretable and reviewable by design. Shares feasibility arguments, prototypes, and research directions.
Proving safety for narrow AI outputs
Evan Miyazono · Atlas Computing
Identifies domains where AI can deliver capabilities with quantitative guarantees against objective safety criteria. Maps a path to generating software with formal proofs of specification compliance.
Gaia: Distributed planetary-scale AI safety
Rafael Kaufmann · Gaia
Proposes Gaia, a decentralized, crowdsourced model-based safety oracle for a future with billions of powerful AI agents. Focuses on mitigating cascading, systemic risks.
Provable AI Safety
Steve Omohundro
An approach to AI safety grounded in the laws of physics and mathematical proof, argued to be the only constraints guaranteed to hold for powerful AGI.
Synthesizing Gatekeepers for Safe Reinforcement Learning
Justice Sefas
Demonstrates gatekeepers that use model checking and neural control barrier functions to block unsafe actions, enabling safe optimization.
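A discrete-time control-barrier-function check can be sketched in a few lines; the barrier `h`, dynamics `step`, and decay rate `alpha` below are illustrative choices, not the talk's setup:

```python
def cbf_gatekeeper(state, proposed_action, h, step, alpha=0.5):
    """Discrete-time control-barrier-function check (sketch): accept the
    proposed action only if the barrier value does not decay too fast,
    i.e. h(next_state) >= alpha * h(state) with 0 <= alpha < 1."""
    if h(step(state, proposed_action)) >= alpha * h(state):
        return proposed_action        # certified safe with respect to the barrier
    return None                       # block; caller must choose another action

# Toy usage: keep x inside [-1, 1] with barrier h(x) = 1 - x**2.
h = lambda x: 1.0 - x * x
step = lambda x, a: x + a
print(cbf_gatekeeper(0.5, 0.6, h, step))  # None: would leave the safe set
print(cbf_gatekeeper(0.5, 0.1, h, step))  # 0.1: admissible
```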
Verifying Global Properties of Neural Networks
Roman Soletskyi
Verifiable reinforcement learning produces mathematical proofs that agents meet their requirements. Studies how verification complexity scales and suggests new approaches to accelerate verification.
Get involved
Stay updated
Seminars are free and open to all. Register on Luma to get notified of upcoming talks.
Subscribe on Luma ↗