EA - What misalignment looks like as capabilities scale by Richard Ngo

The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What misalignment looks like as capabilities scale, published by Richard Ngo on August 11, 2022 on The Effective Altruism Forum.

This report is intended as a concise introduction to the alignment problem for people familiar with machine learning. It translates previous arguments about misalignment into the context of deep learning by walking through an illustrative AGI training process (a framing drawn from an earlier report by Ajeya Cotra), and outlines possible research directions for addressing different facets of the problem.

Within the coming decades, artificial general intelligence (AGI) may surpass human capabilities at a wide range of important tasks. Without substantial action to prevent it, AGIs will likely use their intelligence to pursue goals which are very undesirable (in other words, misaligned) from a human perspective. This report aims to cover the key arguments for this claim in a way that's as succinct, concrete and technically grounded as possible. My core claims are that:

1. It's worth thinking about risks from AGI in advance.
2. Realistic training processes lead to the development of misaligned goals, in particular because neural networks trained via reinforcement learning will:
   - Learn to plan towards achieving a range of goals
   - Gain more reward by deceptively pursuing misaligned goals
   - Generalize in ways which undermine obedience
3. More people should pursue research directions which address these problems.

It's worth thinking about risks from AGI in advance

By AGI I mean an artificial agent which applies domain-general cognitive skills (such as reasoning, memory, and planning) to perform at or above human level on a wide range of cognitive tasks (such as running a company, writing a software program, or formulating a new scientific theory). This isn't a precise definition, but it's common in science for important concepts to start off vague and become clearer over time (e.g. "energy" in 17th-century physics; "fitness" in early-19th-century biology; "computation" in early-20th-century mathematics). Analogously, "general intelligence" is a sufficiently important driver of humanity's success to be worth taking seriously even if we don't yet have good ways to formalize or measure it.

On the metrics which we can track, though, machine learning has made significant advances, especially over the last decade. Some which are particularly relevant to AGI include few-shot learning (and advances in sample efficiency more generally), cross-task generalization, and multi-step reasoning. While hindsight bias makes it easy to see these achievements as part of a natural progression, I suspect that even a decade ago the vast majority of machine learning researchers would have been confident that these capabilities were much further away. I think it would be similarly overconfident to conclude that AGI is too far away to bother thinking about. A recent survey of ML researchers from NeurIPS and ICML gave a median estimate of 2059 for the year in which AI will outperform humans at all tasks (although their responses were sensitive to question phrasing) (Grace et al., 2022). This fits with the finding that, under reasonable projections of compute growth, we will be able to train neural networks as large as the human brain in a matter of decades (Cotra, 2020).
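To make the shape of that compute extrapolation concrete, here is a minimal back-of-envelope sketch in Python. Every constant in it (the brain-scale training compute target, the size of today's largest training runs, and the effective doubling time) is an illustrative assumption chosen for this example, not a figure taken from Cotra (2020), which instead works with full probability distributions over these quantities.

```python
# Back-of-envelope sketch of the compute-based timeline argument above.
# All constants below are illustrative assumptions, NOT figures from
# Cotra (2020); the report uses probability distributions, not points.

import math

# Assumed target: total training compute roughly comparable to a
# human brain's lifetime computation (hypothetical point estimate).
BRAIN_SCALE_TRAINING_FLOP = 1e28   # assumption

# Assumed starting point: compute used by a large training run today.
CURRENT_LARGEST_RUN_FLOP = 1e24    # assumption

# Assumed growth: effective training-compute doubling time in years,
# combining hardware price-performance gains and rising spending.
DOUBLING_TIME_YEARS = 1.0          # assumption

def years_until_brain_scale(current_flop: float,
                            target_flop: float,
                            doubling_time: float) -> float:
    """Years until training compute grows from current_flop to
    target_flop, assuming a constant exponential doubling time."""
    doublings_needed = math.log2(target_flop / current_flop)
    return doublings_needed * doubling_time

if __name__ == "__main__":
    years = years_until_brain_scale(CURRENT_LARGEST_RUN_FLOP,
                                    BRAIN_SCALE_TRAINING_FLOP,
                                    DOUBLING_TIME_YEARS)
    print(f"~{years:.0f} years until brain-scale training runs "
          f"under these assumptions")
```

With these placeholder numbers the answer comes out to roughly thirteen years; more conservative assumptions (a smaller starting run, a slower doubling time, or a larger target) stretch it to several decades, which is consistent with the "matter of decades" framing above.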
But the capabilities of neural networks are currently advancing much faster than our ability to understand how they work or interpret their cognition; if this trend continues, we’ll build AGIs which match human performance on many important tasks without being able to robustly verify that they’ll behave as intended. And given the strong biological constraints on the size, speed, and architecture of human brains, it seems very unlikely that humans are anywhere near an upper bound on general intelligence. The differences betwe...
