EA - Exploring Metaculus’ community predictions by Vasco Grilo

The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund


Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Exploring Metaculus’ community predictions, published by Vasco Grilo on March 24, 2023 on The Effective Altruism Forum.

Summary

- I really like Metaculus!
- I have collected and analysed in this Sheet metrics about Metaculus’ questions outside of question groups, and about their Metaculus’ community predictions (see tab “TOC”). The Colab to extract the data and calculate the metrics is here.
- The mean metrics vary a lot across categories, and the same is seemingly true for correlations among metrics. So one should not assume the performance across all questions is representative of that within each of Metaculus’ categories. To illustrate:
  - Across categories, the 5th and 95th percentiles of the mean normalised outcome are 0 and 0.784, and of the mean Brier score are 0.0369 and 0.450. For context, the Brier score is 0.25 (= 0.5^2) for the maximally uncertain probability of 0.5.
  - According to Metaculus’ track record page, the mean Brier score for Metaculus’ community predictions evaluated at all times is 0.126 for all questions, but 0.237 for those about artificial intelligence. So Metaculus’ community predictions about probabilities look good in general, but they perform close to random predictions for artificial intelligence.
- There can be significant differences between Metaculus’ community predictions and Metaculus’ predictions. For instance, the mean Brier score of the latter for artificial intelligence is 0.168, which is much more accurate than the 0.237 of the former.
- According to my results, Metaculus’ community predictions are:
  - In general (i.e. considering all questions), less accurate for questions:
    - Whose predictions are more extreme under Bayesian updating (correlation coefficient R = 0.346, and p-value p = 0).
    - With a greater amount of updating (R = 0.262, and p = 0).
    - With a greater difference between amount of updating and uncertainty reduction (R = 0.256, and p = 0).
  - For the category of artificial intelligence, less accurate for questions with:
    - Greater difference between amount of updating and uncertainty reduction (R = 0.361, and p = 0.0387).
    - More predictions (R = 0.316, and p = 0.0729).
    - A greater amount of updating (R = 0.282, and p = 0.111).
  - Compatible with Bayesian updating in general, in the sense that I failed to reject it during the 2nd half of the period during which each question was or has been open (mean p-value of 0.425).
- If you want to know how much to trust a given prediction from Metaculus, I think it is sensible to check Metaculus’ track record for similar past questions (more here).

Acknowledgements

Thanks to Charles Dillon, Misha Yagudin from Arb Research, and Peter Mühlbacher and Ryan Beck from Metaculus.

Dark crystal ball in a bright foggy galaxy. Generated by OpenAI's DALL-E.

Introduction

I really like Metaculus!

Methods

I believe it would be important to better understand how much to trust Metaculus’ predictions. To that end, I have determined in this Sheet (see tab “TOC”) metrics about all Metaculus’ questions outside of question groups with an ID from 1 to 15000 on 13 March 2023, and about their Metaculus’ community predictions.
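For reference, the Brier score quoted in the summary is the squared difference between a forecast probability and the binary outcome, which is why a forecast of 0.5 always scores 0.25. The following is a minimal Python sketch of that calculation. The function names and the example forecasts are hypothetical and are not taken from the author's Colab.

```python
from typing import Iterable, Tuple


def brier_score(probability: float, outcome: int) -> float:
    """Squared error between a forecast probability and a binary outcome (0 or 1)."""
    return (probability - outcome) ** 2


def mean_brier_score(forecasts: Iterable[Tuple[float, int]]) -> float:
    """Mean Brier score over (probability, outcome) pairs."""
    scores = [brier_score(p, o) for p, o in forecasts]
    return sum(scores) / len(scores)


# A maximally uncertain forecast scores 0.25 whatever the outcome.
print(brier_score(0.5, 1), brier_score(0.5, 0))

# Made-up forecasts, for illustration only.
print(mean_brier_score([(0.9, 1), (0.2, 0)]))
```

Lower is better: a perfect forecast scores 0, and a confident forecast that turns out wrong approaches 1.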
The metrics for each question are:

- Tags, which identify the Metaculus category.
- Publish time (year).
- Close time (year).
- Resolve time (year).
- Time from publish to close (year).
- Time from close to resolve (year).
- Time from publish to resolve (year).
- Number of forecasters.
- Number of predictions.
- Number of analysed dates, which is the number of instances at which the predictions were assessed.
- Total belief movement, which is a measure of the amount of updating, and is the sum of the belief movements, which are the squared differences between 2 consecutive beliefs.
  - The values of the beliefs range from 0 to 1, and can refer to a:
    - Probability.
    - Ratio between an expectation and difference between the maximum and minimum allowed by Metaculus.

T...
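The total belief movement defined in the list above (the sum of squared differences between consecutive beliefs) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's Colab code: the function name and the example belief series are hypothetical, and beliefs are assumed to be already expressed on the 0 to 1 scale.

```python
from typing import Sequence


def total_belief_movement(beliefs: Sequence[float]) -> float:
    """Sum of squared differences between each pair of consecutive beliefs."""
    return sum((later - earlier) ** 2 for earlier, later in zip(beliefs, beliefs[1:]))


# Hypothetical community prediction assessed at four dates.
beliefs = [0.5, 0.7, 0.6, 1.0]
print(total_belief_movement(beliefs))  # 0.2**2 + 0.1**2 + 0.4**2, about 0.21
```

A series that never changes has a total belief movement of 0, and large swings between consecutive dates dominate the sum because the differences are squared.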
