The current interest in forecasting grew out of Iraq-War-era exasperation with the pundit class. Pundits were constantly saying stuff, like “Saddam definitely has WMDs, trust me, I’m an expert”, then getting proven wrong, then continuing to get treated as authorities and thought leaders. Occasionally they would apologize, but they’d be back to telling us what we Had To Believe the next week.

You don’t want a rule that if a pundit ever gets anything wrong, we stop trusting them forever. Warren Buffett gets some things wrong, Zeynep Tufekci gets some things wrong, even Nostradamus would have gotten some things wrong if he’d said anything clearly enough to pin down what he meant. The best we can hope for is people with a good win-loss record. But how do you measure win-loss record? Lots of people worked on this (especially Philip Tetlock) and we ended up with the kind of probabilistic predictions a lot of people use now.

But not pundits. We never did get the world where pundits, bloggers, and other commentators post clear predictions that readers can check up on later.

…until recently! As far as I know, the first official journalists to do something like this were Dylan Matthews, Kelsey Piper and Sigal Samuel at Vox. They’re trying again this year, but now they’re joined by a pretty big name in traditional punditry - Matt Yglesias, formerly of Vox, now here at Substack. In theory you can read the relevant post here, but it’s paywalled. We’ll start with the predictions themselves, then talk about what this means for journalism. Here are the questions to be predicted:

1. Jon Ossoff and Raphael Warnock win the Georgia Senate races
2. The same party wins both Senate races in Georgia
3. Joe Biden ends the year with his approval rating higher than his disapproval rating
4. Joe Biden ends the year with his approval rating above 50%
5. US GDP growth in 2021 is the fastest of any year of the 21st century
6. The year-end unemployment rate is below 5 percent
7. The year-end unemployment rate is above 4 percent
8. Lakers win the NBA championship
9. Joe Biden ends the year as president
10. Nancy Pelosi sets a definitive retirement schedule
11. A vacancy arises on the Supreme Court
12. The EU ends the year with more confirmed Covid-19 deaths than the US
13. Substack will still be around
14. People will still be writing takes asking if Substack is really sustainable
15. Apple releases new iMacs powered by Apple silicon
16. Apple does not release a new Mac Pro powered by Apple silicon
17. Monthly year-on-year core CPI growth does not go above 2 percent
18. Monthly year-on-year core CPI growth does not go above 3 percent
19. Lloyd Austin not confirmed as Defense Secretary
20. No federal tax increases are enacted
21. Biden administration unilaterally relieves some but not all student debt
22. United States rejoins JCPOA and Iran resumes compliance
23. Israel and Saudi Arabia establish official diplomatic relations
24. US and China reach agreement to lift Trump-era tariffs
25. Slow Boring will exceed 10,000 paid members

Metaculus asked Yglesias for permission to put some of the predictions up on their platform, to see if their crowdsourced forecasts could beat his; he graciously agreed. Here are the predictions. Yglesias’ numbers are in parentheses. Metaculus’ numbers are in brackets (not all questions are on Metaculus).

1. Jon Ossoff and Raphael Warnock win the Georgia Senate races (60%)
2. The same party wins both Senate races in Georgia (95%)
3. Joe Biden ends the year with his approval rating higher than his disapproval rating (70%) [83%]
4. Joe Biden ends the year with his approval rating above 50% (60%) [60%]
5. US GDP growth in 2021 is the fastest of any year of the 21st century (80%) [84%]
6. The year-end unemployment rate is below 5 percent (80%)
7. The year-end unemployment rate is above 4 percent (80%)
8. Lakers win the NBA championship (25%) [25%]
9. Joe Biden ends the year as president (95%) [96%]
10. Nancy Pelosi sets a definitive retirement schedule (60%)
11. A vacancy arises on the Supreme Court (70%) [50%]
12. The EU ends the year with more confirmed Covid-19 deaths than the US (60%) [80%]
13. Substack will still be around (95%)
14. People will still be writing takes asking if Substack is really sustainable (80%)
15. Apple releases new iMacs powered by Apple silicon (90%) [84%]
16. Apple does not release a new Mac Pro powered by Apple silicon (70%) [53%]
17. Monthly year-on-year core CPI growth does not go above 2 percent (70%)
18. Monthly year-on-year core CPI growth does not go above 3 percent (90%)
19. Lloyd Austin not confirmed as Defense Secretary (60%)
20. No federal tax increases are enacted (95%)
21. Biden administration unilaterally relieves some but not all student debt (80%)
22. United States rejoins JCPOA and Iran resumes compliance (80%)
23. Israel and Saudi Arabia establish official diplomatic relations (70%) [38%]
24. US and China reach agreement to lift Trump-era tariffs (70%)
25. Slow Boring will exceed 10,000 paid members (70%) [75%]

Yglesias and Metaculus agree on most things (not Israel/Saudi Arabia, though!). Some of the disagreements might come from Yglesias making his predictions in late December and Metaculus opening theirs in February, which is kind of unfair to Matt.

But okay. Zoom out a second. End of the year comes, and Yglesias gets an adjusted log odds score of 1.2. Or 0.5. Or -100. What does it mean? That he’s a good pundit? A bad pundit? Nothing whatsoever?
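(For the curious: there are several standard ways to grade probabilistic predictions. Below is a minimal sketch of one of them, the logarithmic scoring rule, written in Python with made-up predictions. The “adjusted log odds score” above is a hypothetical number, and whatever scoring rule actually gets used may differ in the details.)

```python
import math

def log_score(prob, happened):
    """Logarithmic score for one prediction: the log of the probability
    assigned to what actually happened. 0 is perfect; more negative is worse."""
    return math.log(prob if happened else 1 - prob)

# Made-up predictions, just to show the mechanics:
predictions = [
    (0.95, True),   # a 95% call that came true
    (0.60, False),  # a 60% call that didn't
    (0.80, True),   # an 80% call that did
]

average = sum(log_score(p, h) for p, h in predictions) / len(predictions)
print(round(average, 3))  # about -0.40; closer to 0 looks better
```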

Although Yglesias (and a few other people) doing something like this is a great step, it doesn’t, on its own, bring us to a future where we can easily notice bad pundits and stop listening to them.

It’s easy to get a higher score than someone else by answering easier questions. The classic example is:

1. The sun will come up tomorrow: 100%
2. …and the next day: 100%
3. …and the day after that: 100%

Do this enough and you can get whatever score you want. Yglesias isn’t doing this - he’s predicting genuinely difficult things. But if the set of questions he predicts is slightly easier or harder than the set of questions someone else predicts, his score can be artificially higher or lower.
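To put a toy number on that distortion (made-up figures, and assuming the simple average log score from the sketch above; the same trick works under other scoring rules), padding a slate of hard questions with near-certainties makes the average look dramatically better without any added forecasting skill:

```python
import math

def avg_log_score(preds):
    """Average log of the probability assigned to what actually happened."""
    return sum(math.log(p if h else 1 - p) for p, h in preds) / len(preds)

# Ten genuinely hard questions, called at 70% and all correct:
hard = [(0.70, True)] * 10
print(round(avg_log_score(hard), 3))    # about -0.357

# The same ten, padded with ninety "the sun will come up" questions at 99%
# (99% rather than 100% so the logarithm is defined):
padded = hard + [(0.99, True)] * 90
print(round(avg_log_score(padded), 3))  # about -0.045, which looks far more impressive
```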

What you really want is to have everyone answering the same questions. But that’s not really what punditry is about. I don’t even know what an “Apple Silicon” is, I don’t claim to understand it, and my work as a blogger doesn’t involve making any predictions about it. I will fail all questions that involve pontificating on “Apple Silicon”, and that’s fine. But that means you can’t ask me and Matt Yglesias to answer the same set of questions to decide which of us is a “better pundit”. In fact, you don’t want to do this - part of being a good pundit is knowing what your areas of expertise are, and limiting yourself to them.

And also, as great as this list is, it seems kind of…artificial. It’s only tangentially related to what happens when Matt does his job as a pundit and influences the rest of us. When I look at his Substack, the first post to catch my interest is Wealth Isn’t What Matters (ironically, behind a paywall) - saying that on-paper wealth statistics aren’t a great way to judge who is or isn’t needy or taxable. This is a great point (you’ll have to trust me, because of the paywall), but it’s not obvious to me that Matt’s ability to predict whether there will be a vacancy on the Supreme Court this year correlates very well with whether he can teach us about the flaws in on-paper wealth statistics.

Maybe a better example of this is last Thursday’s The Coming Mild Inflation (also paywalled), where he says that there will probably be some mild inflation around the second half of 2021, but that this won’t spiral into terrible ’70s-style inflation and overall will be good for the economy. This is another great post, and I learned a lot from it, and after reading it I agree with its perspective. And in theory it’s predictable - in fact, Matt’s predictions about CPI growth come from the same model as this post. So if he’s right about his CPI predictions, we can trust this post a little more.

This was how I did things for a few years, but lately I’ve been trying to cut out the middleman and attach predictions directly to the things I’m writing about (don’t worry, I’ll also make a longer predictions thread later). Like, instead of predicting the chance that Biden will stay president in 2022, and then telling you things about antidepressants, I’m just posting about antidepressants and then at the bottom posting the predictions that the post implies (in this case, things like what drugs will or won’t get approved by the FDA). Then I put all of them into a thread where you can see them at once, and, sometimes, bet on them. If I were Yglesias, I would have ended the inflation post with some predictions about what inflation would be, whether, conditional on there being mild inflation, it would spiral out of control, and maybe predictions about how the economy would do conditional on different levels of inflation. Maybe not actually, because this would have been really hard, and there are so many other factors affecting the economy that inflation probably doesn’t explain much of the variance. But this is the direction I would want people to be thinking in.

Where this really starts getting good is when there’s a prediction market you can compare to. Imagine Dumb ’00s Pundit #X predicted “We’ll definitely find WMDs in Iraq”, and the prediction markets said 50% chance. After a while, you can start noticing that the pundit underperforms the prediction markets on the specific topics they’re interested in. Now you do have some evidence that maybe they’re a fraud.

Metaculus has markets for some of Yglesias’ predictions, but it’s not a great comparison. For one thing, Metaculites got an extra two months to think about them and watch what happened. For another, the Metaculites got to see Yglesias’s predictions, but Yglesias didn’t get to see theirs.

I think this is the same as saying “this pundit could make money on the prediction markets”, if we interpret the pundit’s claim as a bid. So one way of scoring pundits would be “how would an investment portfolio based on buying each claim made by this pundit do?” (Apparently Byrne Hobart would do very well in this system.)
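Here’s a minimal sketch of what that portfolio scoring could look like, with hypothetical numbers and a deliberately simple trading rule (buy one contract whenever the pundit disagrees with the market price); a real version would have to worry about position sizing, fees, and how to turn vaguer claims into contracts.

```python
# Hypothetical portfolio scoring: each entry is (pundit_prob, market_price, happened).
# Rule assumed here: if the pundit is more confident than the market, buy one "yes"
# contract at the market price; if less confident, buy one "no" contract. Contracts
# pay out $1 if correct.

def portfolio_profit(claims):
    profit = 0.0
    for pundit_prob, market_price, happened in claims:
        if pundit_prob > market_price:      # pundit more bullish than the market
            profit += (1.0 if happened else 0.0) - market_price
        elif pundit_prob < market_price:    # pundit more bearish than the market
            profit += (0.0 if happened else 1.0) - (1.0 - market_price)
        # exact agreement: no trade
    return profit

# E.g. a 70% call against a market at 38% that doesn't come true loses 38 cents:
print(portfolio_profit([(0.70, 0.38, False)]))  # -0.38
```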

This doesn’t quite capture everything we want from punditry. It’s possible to imagine a version of Matt Yglesias who’s usually wrong about testable claims (or at least no righter than the markets) but still does great work doing things like explaining why on-paper wealth statistics aren’t accurate. Also, we should cherish people who are often extremely wrong but occasionally right when nobody else is; these people would lose a lot of money, but still introduce important ideas to the conversation.

In fact, I think a lot of commentators would admit they’re not really in the prediction business and probably couldn’t do it any better than anyone else. I think that getting commentators to admit this would be a fine outcome for this multi-decade line of research - as long as they stuck to it. If I claim that nothing I say has predictable real-world consequences, I shouldn’t be in the business of assessing the risks/benefits of Vitamin D, or which theories of antidepressant action make sense, or telling Republicans they could win if they focused on class more. If I learned I was wrong (or at least no righter than average) about all those things, I guess I could continue to write - this post I’m writing now has no predictable real-world consequences - but it would have to be a much humbler and pared-down blog.

Right now this barely matters because there aren’t prediction markets for most of these things. When I say I don’t believe the latest antidepressant study, I’m not disagreeing with the probability estimate of a prediction market, I’m making an assertion in a field where there isn’t a market. I feel weird having to do this. It’s like - get with the program, social technology!

In my ideal world, it would be silly for random psychiatrists to be speculating on psychiatry papers. There would already be good prediction markets on which ones will or won’t pan out. There would be a few teams, individuals, and companies known for being great at trading in them, with the expertise to know which people are real experts worth consulting. Probably some of those real experts are currently psychiatry professors at Stanford, and others are obsessive autodidacts we don’t know anything about yet. People like me would check to see whether we were one of those people, mostly find we weren’t, and then be happy to take a less exciting position explaining findings to the public without trying to second-guess them.

I’m not sure there will ever be a world where prediction markets are quite that omnipresent, but I’m grateful to Matt Yglesias and Metaculus for helping take a first step.