AIAP: Synthesizing a human's preferences into a utility function with Stuart Armstrong

Future of Life Institute Podcast - A podcast by Future of Life Institute


In his Research Agenda v0.9: Synthesizing a human's preferences into a utility function, Stuart Armstrong develops an approach for generating friendly artificial intelligence. His alignment proposal can broadly be understood as a kind of inverse reinforcement learning where most of the task of inferring human preferences is left to the AI itself. It's up to us to build the correct assumptions, definitions, preference learning methodology, and synthesis process into the AI system so that it can meaningfully learn human preferences and synthesize them into an adequate utility function. To get all of this right, his agenda looks at how to understand and identify human partial preferences, how to ultimately synthesize these learned preferences into an "adequate" utility function, the practicalities of developing and estimating the human utility function, and how this agenda can assist in other methods of AI alignment.

Topics discussed in this episode include:
- The core aspects and ideas of Stuart's research agenda
- Human values being changeable, manipulable, contradictory, and underdefined
- This research agenda in the context of the broader AI alignment landscape
- What the proposed synthesis process looks like
- How to identify human partial preferences
- Why a utility function anyway?
- Idealization and reflective equilibrium
- Open questions and potential problem areas

Here you can find the podcast page: https://futureoflife.org/2019/09/17/synthesizing-a-humans-preferences-into-a-utility-function-with-stuart-armstrong/

Important timestamps:
0:00 Introductions
3:24 A story of evolution (inspiring just-so story)
6:30 How does your “inspiring just-so story” help to inform this research agenda?
8:53 The two core parts to the research agenda
10:00 How this research agenda is contextualized in the AI alignment landscape
12:45 The fundamental ideas behind the research project
15:10 What are partial preferences?
17:50 Why reflexive self-consistency isn’t enough
20:05 How are humans contradictory and how does this affect the difficulty of the agenda?
25:30 Why human values being underdefined presents the greatest challenge
33:55 Expanding on the synthesis process
35:20 How to extract the partial preferences of the person
36:50 Why a utility function?
41:45 Are there alternative goal ordering or action producing methods for agents other than utility functions?
44:40 Extending and normalizing partial preferences and covering the rest of section 2
50:00 Moving into section 3, synthesizing the utility function in practice
52:00 Why this research agenda is helpful for other alignment methodologies
55:50 Limits of the agenda and other problems
58:40 Synthesizing a species-wide utility function
1:01:20 Concerns over the alignment methodology containing leaky abstractions
1:06:10 Reflective equilibrium and the agenda not being a philosophical ideal
1:08:10 Can we check the result of the synthesis process?
1:09:55 How did the Mahatma Armstrong idealization process fail?
1:14:40 Any clarifications for the AI alignment community?

You can take a short (4-minute) survey to share your feedback about the podcast here: www.surveymonkey.com/r/YWHDFV7
