AI Safety Fundamentals: Alignment

A podcast by BlueDot Impact

83 Episodes

Constitutional AI: Harmlessness from AI Feedback
Published: 19-7-2024
Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Published: 19-7-2024
Illustrating Reinforcement Learning from Human Feedback (RLHF)
Published: 19-7-2024
Chinchilla’s Wild Implications
Published: 17-6-2024
Deep Double Descent
Published: 17-6-2024
Intro to Brain-Like-AGI Safety
Published: 17-6-2024
Eliciting Latent Knowledge
Published: 17-6-2024
Toy Models of Superposition
Published: 17-6-2024
Least-To-Most Prompting Enables Complex Reasoning in Large Language Models
Published: 17-6-2024
Discovering Latent Knowledge in Language Models Without Supervision
Published: 17-6-2024
ABS: Scanning Neural Networks for Back-Doors by Artificial Brain Stimulation
Published: 17-6-2024
Two-Turn Debate Doesn’t Help Humans Answer Hard Reading Comprehension Questions
Published: 17-6-2024
Imitative Generalisation (AKA ‘Learning the Prior’)
Published: 17-6-2024
An Investigation of Model-Free Planning
Published: 17-6-2024
Low-Stakes Alignment
Published: 17-6-2024
Gradient Hacking: Definitions and Examples
Published: 17-6-2024
Empirical Findings Generalize Surprisingly Far
Published: 17-6-2024
Compute Trends Across Three Eras of Machine Learning
Published: 13-6-2024
Worst-Case Thinking in AI Alignment
Published: 29-5-2024
Public by Default: How We Manage Information Visibility at Get on Board
Published: 12-5-2024

Listen to resources from the AI Safety Fundamentals: Alignment course! https://aisafetyfundamentals.com/alignment