Guaranteed Safe AI Seminars
Technical talks advancing AI with quantitative safety guarantees. For context on the research agenda, see Towards Guaranteed Safe AI.
Past seminars
2025
Safe Learning Under Irreversible Dynamics via Asking for Help
Benjamin Plaut · Postdoc at CHAI (Center for Human-Compatible AI)
Standard online-learning algorithms with formal guarantees often rely on trying all possible behaviors, which is unsafe when some errors cannot be recovered from. This work allows a learning agent to ask for help from a mentor and to transfer knowledge between similar states. The resulting algorithm learns both safely and effectively.
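The core decision rule can be sketched in a few lines; the `similarity_to_known` heuristic and `ask_mentor` interface below are illustrative stand-ins, not the algorithm presented in the talk:

```python
def act_or_ask(state, q_values, similarity_to_known, ask_mentor, risk_threshold=0.1):
    """Choose an action, deferring to a mentor whenever the agent cannot rule
    out an unrecoverable mistake (illustrative sketch, not the talk's algorithm)."""
    # Stand-in proxy for "could acting here be irreversible?": state novelty.
    novelty = 1.0 - similarity_to_known(state)
    if novelty > risk_threshold:
        return ask_mentor(state)                           # defer: safe by construction
    return max(q_values[state], key=q_values[state].get)   # act on own estimates

# Toy usage with hypothetical stand-ins.
q = {"s0": {"left": 0.2, "right": 0.7}}
print(act_or_ask("s0", q, lambda s: 0.95, lambda s: "mentor_action"))  # -> "right"
```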
When AI met AR
Clark Barrett · Stanford Center for Automated Reasoning
Artificial Intelligence and Automated Reasoning have both advanced quickly in recent years. This talk explores how combining them can help address AI safety, including verifiable code generation and learning-enhanced reasoning systems.
Model-Based Soft Maximization of Suitable Metrics of Long-Term Human Power
Jobst Heitzig · Senior Mathematician, AI Safety Designer
Power is a key concept in AI safety. This talk explores promoting safety and wellbeing by forcing AI agents to explicitly empower humans, using a principled approach to design a parameterizable objective function representing a long-term aggregate of human power.
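As a rough illustration of what a parameterizable objective of this kind might look like, the sketch below aggregates per-human power estimates with a power mean and a saturating transform; this is a hypothetical instantiation, not the talk's proposal:

```python
import math

def soft_power_objective(per_human_power, exponent=-2.0, temperature=1.0):
    """Hypothetical objective: a power-mean aggregate over humans' estimated
    long-term power (exponent < 1 weights the worst-off more heavily), passed
    through a saturating transform so the agent soft-maximizes the metric
    rather than optimizing it to an extreme."""
    n = len(per_human_power)
    aggregate = (sum(p ** exponent for p in per_human_power) / n) ** (1.0 / exponent)
    return temperature * math.log1p(aggregate / temperature)

print(soft_power_objective([0.8, 0.5, 0.9]))
```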
Towards Safe and Hallucination-Free Coding AIs
GasStationManager · Independent Researcher
Modern LLM coding assistants pose serious security risks. The talk argues for protocols that require AI-generated code to come with machine-checkable proofs so humans can be assured of safety and correctness.
Engineering Rational Cooperative AI via Inverse Planning and Probabilistic Programming
Tan Zhi Xuan · National University of Singapore
Explores how to build cooperative machines that model and understand human minds. Introduces Sequential Inverse Plan Search (SIPS), which combines online model-based planning with sequential Monte Carlo inference to infer human goals faster than real time.
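The goal-inference idea can be sketched as a particle filter over candidate goals; the `planner` interface below is a hypothetical stand-in for the online model-based planner, and the sketch is not the SIPS implementation itself:

```python
import random

def smc_goal_inference(observed_actions, goals, planner, n_particles=200):
    """Minimal sequential Monte Carlo over goal hypotheses: each particle is a
    candidate goal; weights track how well a rational plan for that goal
    explains the observed actions."""
    particles = [random.choice(goals) for _ in range(n_particles)]
    for t, action in enumerate(observed_actions):
        # Likelihood of the observed action under each particle's goal, with a
        # small probability of deviation so no weight collapses to zero.
        weights = [0.9 if planner(g, t) == action else 0.1 for g in particles]
        # Resample particles in proportion to their weights.
        particles = random.choices(particles, weights=weights, k=n_particles)
    return {g: particles.count(g) / n_particles for g in goals}

# Toy usage: two goals, a hypothetical planner, and three observed actions.
plans = {"goal_a": ["up", "up", "left"], "goal_b": ["up", "down", "down"]}
planner = lambda g, t: plans[g][t]
print(smc_goal_inference(["up", "up", "left"], ["goal_a", "goal_b"], planner))
```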
Using PDDL Planning to Ensure Safety in LLM-based Agents
Agustín Martinez Suñé · University of Oxford
Integrates PDDL symbolic planning with LLM-based agents to enforce safety constraints during execution. Experiments show robustness under severe input perturbations and adversarial attacks.
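A minimal sketch of the gatekeeping pattern, with a hypothetical precondition table standing in for the PDDL domain model and illustrative function names:

```python
def safe_execute(llm_propose_action, preconditions, state, fallback_plan):
    """Gatekeeper sketch: execute an LLM-proposed action only if the symbolic
    model says its preconditions hold in the current state; otherwise fall
    back to a verified plan."""
    action = llm_propose_action(state)
    required = preconditions.get(action)
    if required is not None and required.issubset(state):
        return action                 # symbolically admissible
    return fallback_plan(state)       # reject and substitute a safe action

# Toy usage with a hypothetical two-fact domain.
pre = {"open_valve": {"valve_closed", "pressure_low"}}
state = {"valve_closed", "pressure_high"}
print(safe_execute(lambda s: "open_valve", pre, state, lambda s: "wait"))  # -> "wait"
```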
2024
Compact Proofs of Model Performance via Mechanistic Interpretability
Louis Jaburi · Independent Researcher
Proposes constructing rigorous, compact proofs about neural-network behavior using mechanistic interpretability. Discusses challenges and scaling directions for formal verification.
Bayesian oracles and safety bounds
Yoshua Bengio · Mila, Université de Montréal
Investigates safety advantages of training a Bayesian oracle to estimate P(answer | question, data). Explores catastrophic-risk scenarios, failure modes, and using the oracle for conservative risk bounds.
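One way to read "conservative risk bounds" is to bound harm by the worst case over hypotheses the posterior has not ruled out; the sketch below illustrates that idea with hypothetical names and is not the talk's exact construction:

```python
def conservative_harm_bound(posterior, harm_prob, plausibility=1e-3):
    """Cautious bound sketch: instead of averaging harm over the posterior,
    take the worst case over every hypothesis the data has not ruled out
    (posterior mass above a cutoff)."""
    plausible = [h for h, p in posterior.items() if p >= plausibility]
    return max(harm_prob(h) for h in plausible)

# Toy usage: two candidate world models predicting different harm levels.
posterior = {"theory_a": 0.7, "theory_b": 0.3}
harm = {"theory_a": 0.01, "theory_b": 0.2}
print(conservative_harm_bound(posterior, lambda h: harm[h]))  # -> 0.2
```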
Constructability: Designing plain-coded AI systems
Charbel-Raphaël Ségerie & Épiphanie Gédéon · CeSIA
Presents constructability, a design paradigm for AI systems that are interpretable and reviewable by design. Shares feasibility arguments, prototypes, and research directions.
Proving safety for narrow AI outputs
Evan Miyazono · Atlas Computing
Identifies domains where AI can deliver capabilities with quantitative guarantees against objective safety criteria. Maps a path to generating software with formal proofs of specification compliance.
Gaia: Distributed planetary-scale AI safety
Rafael Kaufmann · Gaia
Proposes Gaia, a decentralized, crowdsourced model-based safety oracle for a future with billions of powerful AI agents. Focuses on mitigating cascading, systemic risks.
Provable AI Safety
Steve Omohundro
An approach to AI safety grounded in the laws of physics and mathematical proof, argued to be the only constraints guaranteed to hold for powerful AGI.
Synthesizing Gatekeepers for Safe Reinforcement Learning
Justice Sefas
Demonstrates gatekeepers that use model checking and neural control barrier functions to block unsafe actions, enabling safe optimization.
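A discrete-time control-barrier-function check can be sketched in a few lines; the barrier `h`, dynamics `step`, and decay rate `alpha` below are illustrative choices, not the talk's setup:

```python
def cbf_gatekeeper(state, proposed_action, h, step, alpha=0.5):
    """Discrete-time control-barrier-function check (sketch): accept the
    proposed action only if the barrier value does not decay too fast,
    i.e. h(next_state) >= alpha * h(state) with 0 <= alpha < 1."""
    if h(step(state, proposed_action)) >= alpha * h(state):
        return proposed_action        # certified safe with respect to the barrier
    return None                       # block; caller must choose another action

# Toy usage: keep x inside [-1, 1] with barrier h(x) = 1 - x**2.
h = lambda x: 1.0 - x * x
step = lambda x, a: x + a
print(cbf_gatekeeper(0.5, 0.6, h, step))  # None: would leave the safe set
print(cbf_gatekeeper(0.5, 0.1, h, step))  # 0.1: admissible
```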
Verifying Global Properties of Neural Networks
Roman Soletskyi
Verifiable reinforcement learning produces mathematical proofs that agents meet their requirements. Studies how verification complexity scales and suggests new approaches to accelerate verification.
Get involved
Stay updated
Seminars are free and open to all. Register on Luma to get notified of upcoming talks.
Subscribe on Luma ↗