Progress on Causal Influence Diagrams

AI Safety Fundamentals: Alignment - A podcast by BlueDot Impact

By Tom Everitt, Ryan Carey, Lewis Hammond, James Fox, Eric Langlois, and Shane Legg

About 2 years ago, we released the first few papers on understanding agent incentives using causal influence diagrams. This blog post will summarize progress made since then.

What are causal influence diagrams?

A key problem in AI alignment is understanding agent incentives. Concerns have been raised that agents may be incentivized to avoid correction, manipulate users, or inappropriately influence their learni...
