Guaranteed Safe AI Seminars
Technical presentations advancing AI with quantitative safety guarantees. For context on the research agenda, see Towards Guaranteed Safe AI.
Past seminars
2025
Safe Learning Under Irreversible Dynamics via Asking for Help
Benjamin Plaut · Postdoc at CHAI (Center for Human-Compatible AI)
Standard online-learning algorithms with formal guarantees often rely on trying every possible behavior, which is unsafe when some mistakes are irreversible. This work allows a learning agent to ask a mentor for help and to transfer knowledge between similar states. The resulting algorithm learns both safely and effectively.
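A minimal sketch of the ask-for-help pattern, not Plaut's actual algorithm: the agent acts on its own only when enough similar, previously labeled states were safe, and otherwise defers to the mentor. The similarity measure, threshold, and mentor interface are illustrative assumptions.

```python
import numpy as np

def act_or_ask(state, labeled_states, mentor, policy,
               similarity_threshold=0.9, min_neighbors=5):
    """Act autonomously only when similar past states are known to be safe.

    labeled_states: list of (state_vector, was_safe) pairs gathered so far.
    mentor: callable returning a known-safe action (assumed available on request).
    """
    # Cosine similarity to previously labeled states (illustrative choice).
    sims = [
        (np.dot(state, s) / (np.linalg.norm(state) * np.linalg.norm(s) + 1e-9), safe)
        for s, safe in labeled_states
    ]
    neighbors = [safe for sim, safe in sims if sim >= similarity_threshold]

    # Too little relevant experience, or a similar state was unsafe: ask for help.
    if len(neighbors) < min_neighbors or not all(neighbors):
        return mentor(state), "asked_mentor"
    return policy(state), "acted_autonomously"
```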
When AI met AR
Clark Barrett · Stanford Center for Automated Reasoning
Artificial Intelligence and Automated Reasoning have both advanced quickly in recent years. This talk explores how combining them can help address AI safety, including verifiable code generation and learning-enhanced reasoning systems.
Model-Based Soft Maximization of Suitable Metrics of Long-Term Human Power
Jobst Heitzig · Senior Mathematician, AI Safety Designer
Power is a key concept in AI safety. This talk explores promoting safety and wellbeing by forcing AI agents to explicitly empower humans, using a principled approach to designing a parameterizable objective function that represents a long-term aggregate of human power.
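As a toy illustration of what such an objective could look like, the sketch below soft-aggregates a per-human power proxy across people and time; the proxy, the soft-minimum aggregation, and the parameters beta and discount are assumptions made for illustration, not Heitzig's actual metric.

```python
import numpy as np

def soft_min(values, beta=5.0):
    """Smooth, risk-averse aggregation: approaches min(values) as beta grows."""
    values = np.asarray(values, dtype=float)
    return -np.log(np.mean(np.exp(-beta * values))) / beta

def aggregate_human_power(power_per_step_per_human, beta=5.0, discount=0.99):
    """power_per_step_per_human: array of shape (T, H) holding a power proxy
    (e.g. bits of reachable outcomes) for each of H humans at each of T steps."""
    # Soft-minimize across humans so no individual's power is traded away cheaply,
    # then take a discounted average over time.
    per_step = np.array([soft_min(step, beta) for step in power_per_step_per_human])
    weights = discount ** np.arange(len(per_step))
    return float(np.sum(weights * per_step) / np.sum(weights))
```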
Towards Safe and Hallucination-Free Coding AIs
GasStationManager · Independent Researcher
Modern LLM coding assistants pose serious security risks. The talk argues for protocols that require AI-generated code to come with machine-checkable proofs so humans can be assured of safety and correctness.
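A schematic of such a protocol, with the checker and function names hypothetical: generated code is executed only if an independent, trusted proof checker validates the accompanying certificate against the stated specification, so trust rests in the checker rather than in the model.

```python
def accept_generated_code(spec, code, proof, check_proof):
    """Proof-carrying gate for AI-generated code (illustrative sketch).

    spec:        machine-readable specification the code must satisfy.
    code:        source text produced by the model (untrusted).
    proof:       machine-checkable certificate produced with the code (untrusted).
    check_proof: small, trusted checker, e.g. a call into a proof assistant kernel.
    """
    # The code is accepted only if the proof certifies it against the spec.
    if check_proof(spec, code, proof):
        return code
    raise ValueError("Proof does not certify the code against the specification.")
```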
Engineering Rational Cooperative AI via Inverse Planning and Probabilistic Programming
Tan Zhi Xuan · National University of Singapore
Explores how to build cooperative machines that model and understand human minds. Introduces Sequential Inverse Plan Search (SIPS), which combines online model-based planning with sequential Monte Carlo inference to infer human goals faster than real time.
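In the same spirit, though far simpler than SIPS itself, goal inference can be sketched as Bayesian filtering over a discrete set of candidate goals, reweighting each goal online by how well a planner's predicted action matches the observed human action. The planner interface and noise model below are illustrative assumptions.

```python
import numpy as np

def infer_goal_online(states, observed_actions, goals, planner, noise=0.1):
    """Online goal inference by filtering over a discrete goal set (illustrative).

    planner(state, goal) -> action an agent pursuing `goal` would take.
    Returns the posterior over `goals` after each observed (state, action) pair.
    """
    weights = np.ones(len(goals)) / len(goals)
    posteriors = []
    for state, action in zip(states, observed_actions):
        # Likelihood: the observed action matches the planned one, up to noise.
        likelihood = np.array([
            1.0 - noise if planner(state, g) == action else noise for g in goals
        ])
        weights = weights * likelihood
        weights = weights / weights.sum()
        posteriors.append(weights.copy())
    return posteriors
```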
Using PDDL Planning to Ensure Safety in LLM-based Agents
Agustín Martinez Suñé · University of Oxford
Integrates PDDL symbolic planning with LLM-based agents to enforce safety constraints during execution. Experiments show robustness under severe input perturbations and adversarial attacks.
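A bare-bones version of the gating pattern, with the planner and helper functions hypothetical: each action the LLM proposes is checked against the symbolic model's preconditions before execution, and rejected actions fall back to a plan produced by the symbolic planner.

```python
def run_guarded_agent(llm_propose, preconditions_hold, apply_action, replan,
                      goal_reached, state, goal, max_steps=100):
    """Execute LLM-proposed actions only when the symbolic model approves them.

    preconditions_hold(state, action): checks the action against a PDDL-style model.
    replan(state, goal): returns a known-valid action from a symbolic planner.
    goal_reached(state, goal): termination test.
    """
    for _ in range(max_steps):
        action = llm_propose(state, goal)
        if not preconditions_hold(state, action):
            # The symbolic layer vetoes the action; use a planner-verified one instead.
            action = replan(state, goal)
        state = apply_action(state, action)
        if goal_reached(state, goal):
            break
    return state
```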
2024
Compact Proofs of Model Performance via Mechanistic Interpretability
Louis Jaburi · Independent researcher
Proposes constructing rigorous, compact proofs about neural-network behavior using mechanistic interpretability. Discusses challenges and scaling directions for formal verification.
Bayesian oracles and safety bounds
Yoshua Bengio · Mila, Université de Montréal
Investigates safety advantages of training a Bayesian oracle to estimate P(answer | question, data). Explores catastrophic-risk scenarios, failure modes, and using the oracle for conservative risk bounds.
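One concrete reading of "conservative risk bounds" (a toy sketch, not Bengio's construction): rather than averaging over the posterior, bound the risk of an action by the worst case among all hypotheses that retain non-negligible posterior mass.

```python
import numpy as np

def conservative_harm_bound(posterior, harm_prob, mass_threshold=0.95):
    """Pessimistic harm bound over plausible hypotheses (illustrative).

    posterior: posterior weights over candidate hypotheses/theories (sums to 1).
    harm_prob: P(harm | action, hypothesis) under each hypothesis.
    Keeps the smallest set of hypotheses covering `mass_threshold` posterior mass
    and returns the worst-case harm probability within that set.
    """
    posterior = np.asarray(posterior, dtype=float)
    harm_prob = np.asarray(harm_prob, dtype=float)
    order = np.argsort(posterior)[::-1]                   # most plausible first
    cumulative = np.cumsum(posterior[order])
    kept = order[: int(np.searchsorted(cumulative, mass_threshold)) + 1]
    return float(harm_prob[kept].max())                   # pessimistic over plausible theories
```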
Constructability: Designing plain-coded AI systems
Charbel-Raphaël Ségerie & Épiphanie Gédéon · CeSIA
Presents constructability, a design paradigm for AI systems that are interpretable and reviewable by design. Shares feasibility arguments, prototypes, and research directions.
Proving safety for narrow AI outputs
Evan Miyazono · Atlas Computing
Identifies domains where AI can deliver capabilities with quantitative guarantees against objective safety criteria. Maps a path to generating software with formal proofs of specification compliance.
Gaia: Distributed planetary-scale AI safety
Rafael Kaufmann · Gaia
Proposes Gaia, a decentralized, crowdsourced model-based safety oracle for a future with billions of powerful AI agents. Focuses on mitigating cascading, systemic risks.
Provable AI Safety
Steve Omohundro
An approach to AI safety grounded in the laws of physics and mathematical proof as the only guaranteed constraints for powerful AGI.
Synthesizing Gatekeepers for Safe Reinforcement Learning
Justice Sefas
Demonstrates gatekeepers that block unsafe actions using model checking and neural control barrier functions to enable safe optimization.
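The core of such a gatekeeper can be shown in a few lines using a (possibly learned) control barrier function h with h(x) >= 0 on the safe set; the discrete-time decrease condition and the fallback controller below are standard CBF ingredients used for illustration, not details taken from the talk.

```python
def gatekeeper(h, step, safe_action, state, proposed_action, alpha=0.1):
    """Discrete-time control-barrier-function filter (illustrative sketch).

    h: barrier function, h(x) >= 0 on the safe set (possibly a neural network).
    step(state, action): model that predicts the next state.
    safe_action(state): verified fallback controller.
    """
    next_state = step(state, proposed_action)
    # Standard discrete-time CBF condition: h(x') >= (1 - alpha) * h(x), 0 < alpha <= 1.
    if h(next_state) >= (1.0 - alpha) * h(state):
        return proposed_action
    # Otherwise block the action and fall back to the safe controller.
    return safe_action(state)
```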
Verifying Global Properties of Neural Networks
Roman Soletskyi
Verifiable RL produces mathematical proofs that agents meet requirements. Studies verification-complexity scaling and suggests new approaches to accelerate verification.
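For intuition about what verifying a global property of a network involves, here is a standard interval-bound-propagation sketch (a generic verification technique, not the speaker's method): it propagates an input box through a small ReLU network and returns sound bounds on every output, so any property that holds for all values inside those bounds is certified over the whole box.

```python
import numpy as np

def interval_bounds(weights, biases, lower, upper):
    """Propagate the input box [lower, upper] through a ReLU network.

    weights, biases: per-layer parameters of x -> ReLU(W @ x + b) layers,
    with no ReLU after the final layer. Returns elementwise output bounds.
    """
    lo, hi = np.asarray(lower, float), np.asarray(upper, float)
    for i, (W, b) in enumerate(zip(weights, biases)):
        W_pos, W_neg = np.clip(W, 0, None), np.clip(W, None, 0)
        new_lo = W_pos @ lo + W_neg @ hi + b
        new_hi = W_pos @ hi + W_neg @ lo + b
        if i < len(weights) - 1:                      # ReLU on hidden layers only
            new_lo, new_hi = np.maximum(new_lo, 0.0), np.maximum(new_hi, 0.0)
        lo, hi = new_lo, new_hi
    return lo, hi
```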
Get involved
Stay informed
Seminars are free and open to everyone. Register on Luma to be notified of upcoming presentations.
Subscribe on Luma ↗