Guaranteed Safe AI Seminars

Since April 2024

Technical presentations to advance AI with quantitative safety guarantees. For context on the research agenda, see Towards Guaranteed Safe AI.

Past seminars

2025

Dec

Safe Learning Under Irreversible Dynamics via Asking for Help

Benjamin Plaut · Postdoc at CHAI (Center for Human-Compatible AI)

Standard online-learning algorithms with formal guarantees often rely on trying all possible behaviors, which is unsafe when some mistakes are irreversible. This work allows a learning agent to ask a mentor for help and to transfer knowledge between similar states. The resulting algorithm learns both safely and effectively.
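
As a rough illustration of the ask-for-help pattern (a minimal sketch, not the speaker's algorithm; the mentor interface, similarity measure, and thresholds below are assumptions), an agent can defer to a mentor whenever it is not confident an action is safe, and reuse earlier advice in states that look similar:

```python
import numpy as np

class AskForHelpAgent:
    """Hypothetical agent that queries a mentor in unfamiliar, risky situations."""

    def __init__(self, mentor, similarity_threshold=0.9, confidence_threshold=0.95):
        self.mentor = mentor                          # callable: state -> safe action
        self.similarity_threshold = similarity_threshold
        self.confidence_threshold = confidence_threshold
        self.advice = []                              # [(state vector, mentor action)]

    @staticmethod
    def _similarity(a, b):
        # Cosine similarity between state feature vectors.
        a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    def act(self, state, policy_action, policy_confidence):
        # If the learned policy is confident its action is safe, take it.
        if policy_confidence >= self.confidence_threshold:
            return policy_action
        # Otherwise, transfer knowledge from the most similar previously advised state...
        if self.advice:
            sims = [self._similarity(state, s) for s, _ in self.advice]
            best = int(np.argmax(sims))
            if sims[best] >= self.similarity_threshold:
                return self.advice[best][1]
        # ...or ask the mentor and remember the answer for future similar states.
        action = self.mentor(state)
        self.advice.append((np.asarray(state, dtype=float), action))
        return action
```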

Nov

When AI met AR

Clark Barrett · Stanford Center for Automated Reasoning

Artificial Intelligence and Automated Reasoning have both advanced quickly in recent years. This talk explores how combining them can help address AI safety, including verifiable code generation and learning-enhanced reasoning systems.

Oct

Model-Based Soft Maximization of Suitable Metrics of Long-Term Human Power

Jobst Heitzig · Senior Mathematician, AI Safety Designer

Power is a key concept in AI safety. This talk explores promoting safety and wellbeing by forcing AI agents to explicitly empower humans, using a principled approach to design a parameterizable objective function that represents a long-term aggregate of human power.
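
In the spirit of soft maximization (a generic sketch only; the concrete power metric, aggregation, and parameterization are the subject of the talk, and the symbols here are illustrative), the agent samples actions from a Boltzmann policy over a model-based estimate $\widehat{P}(s,a)$ of long-term aggregate human power instead of hard-maximizing it:

$$\pi_\beta(a \mid s) \;\propto\; \exp\!\big(\beta\,\widehat{P}(s,a)\big), \qquad 0 < \beta < \infty,$$

where a finite inverse temperature $\beta$ keeps the agent from pursuing the metric to pathological extremes.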

Sep

Towards Safe and Hallucination-Free Coding AIs

GasStationManager · Independent Researcher

Modern LLM coding assistants pose serious security risks. The talk argues for protocols that require AI-generated code to come with machine-checkable proofs so humans can be assured of safety and correctness.
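
As a toy illustration of the kind of artifact such a protocol could require (a hand-written Lean 4 sketch, not the output of any coding assistant; the function and property are invented), the code ships together with a machine-checkable proof of a stated property:

```lean
-- Hypothetical example: a function plus a proof, checkable by the Lean kernel,
-- that its result never falls below the original balance.
def addBonus (balance bonus : Nat) : Nat :=
  balance + bonus

theorem addBonus_ge (balance bonus : Nat) :
    balance ≤ addBonus balance bonus := by
  unfold addBonus
  exact Nat.le_add_right balance bonus
```

A reviewer (or a downstream tool) then only needs to check the stated theorem and run the proof checker, rather than audit the generated code line by line.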

Jul

Engineering Rational Cooperative AI via Inverse Planning and Probabilistic Programming

Tan Zhi Xuan · National University of Singapore

How to build cooperative machines that model and understand human minds. Introduces Sequential Inverse Plan Search (SIPS), combining online model-based planners and sequential Monte Carlo inference to infer human goals faster than real time.
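
A minimal sketch of the underlying idea, Bayesian inverse planning with particle-based inference (illustrative only, not the SIPS implementation; the planner interface, noise model, and particle count are assumptions):

```python
import random

def infer_goal_posterior(goals, observed_actions, planner, noise=0.05, n_particles=200):
    """Estimate a posterior over goals from observed actions.

    planner(goal, t) -> the action a goal-directed agent would take at step t.
    """
    # Start with particles drawn uniformly over the candidate goals.
    particles = [random.choice(goals) for _ in range(n_particles)]
    for t, action in enumerate(observed_actions):
        # Weight each particle by the likelihood of the observed action
        # under a noisily rational agent pursuing that particle's goal.
        weights = [(1.0 - noise) if planner(goal, t) == action else noise
                   for goal in particles]
        total = sum(weights)
        # Resample particles in proportion to their weights.
        particles = random.choices(particles,
                                   weights=[w / total for w in weights],
                                   k=n_particles)
    # Posterior estimate: fraction of particles assigned to each goal.
    return {g: particles.count(g) / n_particles for g in goals}

# Toy usage: two candidate goals and a deterministic per-goal plan.
plans = {"goal_A": ["up", "up", "right"], "goal_B": ["down", "down", "left"]}
posterior = infer_goal_posterior(["goal_A", "goal_B"], ["up", "up"],
                                 planner=lambda goal, t: plans[goal][t])
print(posterior)  # mass concentrates on goal_A
```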

Jan

Using PDDL Planning to Ensure Safety in LLM-based Agents

Agustín Martinez Suñé · University of Oxford

Integrates PDDL symbolic planning with LLM-based agents to enforce safety constraints during execution. Experiments show robustness under severe input perturbations and adversarial attacks.
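
A minimal sketch of the general pattern (not the paper's system; the action schemas and predicates are invented): each action proposed by the LLM is validated against a symbolic state, in the style of PDDL preconditions and effects, before it is executed:

```python
# Hypothetical action schemas in the spirit of a PDDL domain.
ACTION_SCHEMAS = {
    "pick_up":    {"pre": {"hand_empty", "object_reachable"},
                   "add": {"holding_object"}, "del": {"hand_empty"}},
    "open_valve": {"pre": {"pressure_nominal"},
                   "add": {"valve_open"}, "del": set()},
}

def execute_if_safe(proposed_action, state):
    """Apply the action only if its symbolic preconditions hold; otherwise block it."""
    schema = ACTION_SCHEMAS.get(proposed_action)
    if schema is None or not schema["pre"] <= state:
        raise ValueError(f"Blocked unsafe or unknown action: {proposed_action}")
    return (state - schema["del"]) | schema["add"]

state = {"hand_empty", "object_reachable"}
state = execute_if_safe("pick_up", state)   # allowed: preconditions satisfied
# execute_if_safe("open_valve", state)      # would raise: pressure is not nominal
```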

2024

Dec

Compact Proofs of Model Performance via Mechanistic Interpretability

Louis Jaburi · Independent researcher

Proposes constructing rigorous, compact proofs about neural-network behavior using mechanistic interpretability. Discusses challenges and scaling directions for formal verification.

Nov

Bayesian oracles and safety bounds

Yoshua Bengio · Mila, Université de Montréal

Investigates safety advantages of training a Bayesian oracle to estimate P(answer | question, data). Explores catastrophic-risk scenarios, failure modes, and using the oracle for conservative risk bounds.
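
One generic way to turn such an oracle into a conservative bound (a sketch of the flavor of argument, stated for a harm probability; not the talk's exact result, and $\Theta_{\text{plausible}}$ is an illustrative posterior cutoff) is to bound the risk of an action $a$ by the worst case over theories $\theta$ that remain plausible given the data $D$:

$$\Pr(\text{harm} \mid a, D) \;=\; \sum_{\theta} \Pr(\theta \mid D)\,\Pr(\text{harm} \mid a, \theta) \;\le\; \max_{\theta \in \Theta_{\text{plausible}}} \Pr(\text{harm} \mid a, \theta) \;+\; \Pr\!\big(\theta \notin \Theta_{\text{plausible}} \mid D\big),$$

so an action can be rejected whenever even one plausible theory assigns it a high probability of harm.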

Aug

Constructability: Designing plain-coded AI systems

Charbel-Raphaël Ségerie & Épiphanie Gédéon · CeSIA

Presents constructability, a design paradigm for AI systems that are interpretable and reviewable by design. Shares feasibility arguments, prototypes, and research directions.

Jul

Proving safety for narrow AI outputs

Evan Miyazono · Atlas Computing

Identifies domains where AI can deliver capabilities with quantitative guarantees against objective safety criteria. Maps a path to generating software with formal proofs of specification compliance.

Jun

Gaia: Distributed planetary-scale AI safety

Rafael Kaufmann · Gaia

Proposes Gaia, a decentralized, crowdsourced model-based safety oracle for a future with billions of powerful AI agents. Focuses on mitigating cascading, systemic risks.

May

Provable AI Safety

Steve Omohundro

An approach to AI safety grounded in the laws of physics and mathematical proof as the only guaranteed constraints for powerful AGI.

Apr

Synthesizing Gatekeepers for Safe Reinforcement Learning

Justice Sefas

Demonstrates gatekeepers that block unsafe actions using model checking and neural control barrier functions to enable safe optimization.
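
A minimal sketch of the gatekeeping pattern (illustrative only, not the synthesis procedure from the talk; the dynamics model, barrier function h, and decay rate alpha are toy assumptions): a proposed action is allowed only if it keeps a discrete-time control-barrier-function condition satisfied, and is otherwise replaced by a fallback:

```python
def gatekeeper(state, proposed_action, dynamics, h, fallback_action, alpha=0.1):
    """dynamics(s, a) -> next state; h(s) >= 0 defines the safe set."""
    next_state = dynamics(state, proposed_action)
    # Discrete-time barrier condition: h must not shrink faster than a factor (1 - alpha).
    if h(next_state) >= (1.0 - alpha) * h(state):
        return proposed_action      # safe: let the learned policy act
    return fallback_action          # unsafe: block and fall back

# Toy 1-D example: stay within |x| <= 1, encoded as h(x) = 1 - x**2.
dynamics = lambda s, a: s + a
h = lambda s: 1.0 - s ** 2
print(gatekeeper(0.9, 0.5, dynamics, h, fallback_action=-0.1))  # prints -0.1 (blocked)
```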

Apr

Verifying Global Properties of Neural Networks

Roman Soletskyi

Verifiable RL produces mathematical proofs that agents meet requirements. Studies verification-complexity scaling and suggests new approaches to accelerate verification.

Get involved

Present: Propose a talk or suggest a speaker. We host researchers working on formal verification, safe learning, interpretability, and related approaches.
Volunteer: Help with seminar operations, research summaries, speaker outreach, or video editing.

Stay informed

The seminars are free and open to everyone. Register on Luma to be notified of upcoming talks.

Subscribe on Luma ↗