Debate Update: Obfuscated Arguments Problem

AI Safety Fundamentals: Alignment - Een podcast door BlueDot Impact

Probeer Podimo de eerste 60! dagen gratis

Luister 30 dagen gratis naar exclusieve podcasts en duizenden luisterboeken

Categorieën:

This is an update on the work on AI Safety via Debate that we previously wrote about here. What we did: We tested the debate protocol introduced in AI Safety via Debate with human judges and debaters. We found various problems and improved the mechanism to fix these issues (details of these are in the appendix). However, we discovered that a dishonest debater can often create arguments that have a fatal error, but where it is very hard to locate the error. We don’t have a fix for th...

Visit the podcast's native language site