EA - Join the interpretability research hackathon by Esben Kran
The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Join the interpretability research hackathon, published by Esben Kran on October 28, 2022 on The Effective Altruism Forum.

TLDR: Participate online or in-person in London, Aarhus, and Tallinn on the weekend of 11th to 13th November in a fun and intense AI safety research hackathon focused on interpretability research. We invite mid-career professionals to join, but it is open to everyone (including non-coders), and we will create starter code templates to help you kickstart your team's projects. Join here.

Below is an FAQ-style summary of what you can expect (navigate it with the table of contents on the left).

What is it?

The Interpretability Hackathon is a weekend-long event where you participate in teams of 1-6 to create interesting and fun research. You submit a PDF report that summarizes and discusses your findings in the context of AI safety. These reports will be judged by our panel, and you can win up to $1,000!

It runs from 11th Nov to 13th Nov (in two weeks), and you're welcome to join for just part of it (see further down). We also get an interesting talk by an expert in the field and hear more about the topic.

Everyone can participate, and we especially encourage you to join if you're considering AI safety from another career. We prepare templates for you to start your projects from, and you'll be surprised what you can accomplish in just a weekend – especially with your new-found friends!

Read more below about how to join, what you can expect, the schedule, and what previous participants have said about being part of the hackathon.

Where can I join?

You can join the event both in-person and online, but everyone needs to make an account and join the jam on the itch.io page.

The in-person locations include the LEAH offices in London, right by UCL, Imperial, King's College, and the London School of Economics (link); Aarhus University in Aarhus, Denmark (link); and Tallinn, Estonia (link). The virtual event space is on GatherTown (link).

Everyone should join the Discord to ask questions, see updates and announcements, find team members, and more. Join here.

What are some examples of interpretability projects I could make?

You can check out a number of smaller, interesting interpretability project ideas on AI Safety Ideas, such as reconstructing the input from neural activations, evaluating the alignment tax of interpretable models, or making models' uncertainty interpretable. (A short code sketch of the first of these ideas follows below.)

Other examples of practical projects include finding new ways to visualize features in language models, as Anthropic has been working on; distilling mechanistic interpretability research; creating a demo of a much more interpretable language model; or mapping out how a possible interpretable AGI might look through our current lens.

You can also do projects in explainability: how well humans understand why the outputs of language models look the way they do, how humans read attention visualizations, or maybe even the interpretability of humans themselves, taking inspiration from the brain and neuroscience.

Also check out the results from the last hackathon to see what you might accomplish in just one weekend. The judges were really quite impressed with the full reports given the time constraint! You can also read the complete projects here.
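To make the first idea above concrete, here is a minimal, self-contained sketch of what "reconstructing the input from neural activations" could look like. It is an illustration only, not starter code from the organizers; the toy MLP, layer sizes, and optimizer settings are all arbitrary assumptions.

```python
# Illustrative sketch only: recover an input by matching a network's
# hidden activations. The toy model and hyperparameters are placeholders.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a model you might actually study during the hackathon.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()

# Record the hidden (post-ReLU) activations of a "secret" input.
secret_input = torch.randn(1, 20)
with torch.no_grad():
    target_acts = model[1](model[0](secret_input))

# Optimize a fresh input so its activations match the recorded ones.
guess = torch.randn(1, 20, requires_grad=True)
optimizer = torch.optim.Adam([guess], lr=0.05)
for _ in range(2000):
    optimizer.zero_grad()
    acts = model[1](model[0](guess))
    loss = nn.functional.mse_loss(acts, target_acts)
    loss.backward()
    optimizer.step()

print("activation match loss:", loss.item())
print("input reconstruction error:", (guess - secret_input).norm().item())
```

Note that the ReLU discards information (all negative pre-activations map to zero), so the activations may not pin down the input uniquely; quantifying how much of the input is recoverable at each layer is exactly the kind of question a weekend project could explore.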
Inspiration

- Redwood Research's interpretability tools: http://interp-tools.redwoodresearch.org/
- The activation atlas
- The Tensorflow playground
- The Neural Network Playground (train simple neural networks in the browser)
- Visualize different neural network architectures: http://alexlenail.me/NN-SVG/index.html

Digestible research

- Distill publication on visualizing neural network weights
- Andrej Karpathy's "Understanding what convnets learn"
- Looking inside a neural net

You can also see more on the resources page.

Why should I join?

There's loads of reasons to join! Here are ju...
