Mantic Monday: Judging April COVID Predictions

Since this is getting broader than just Metaculus, I’m changing the name to Mantic Monday, after an obscure word for “oracular” (and changing the preview image to a mantis, since I don’t know how else to visually represent “mantic”). I’m also posting it early Tuesday morning, because I’m late.

In April 2020, I made my yearly predictions, and many of them were about the (then new) coronavirus pandemic.

Two other people on Less Wrong, Zvi and Bucky, decided to test themselves against me by trying to predict the same questions. Zvi saw my answers beforehand; Bucky didn’t. Here’s how we did (except where otherwise stated, all predictions are for 12/31/20):

Black statements are those judged true, red statements false. The numbers on the left are our predictions, so for example I said there was a 60% chance that Bay Area lockdowns would extend beyond June 15.

You can see a list of the full questions and why I graded them the way I did in the appendix at the bottom.

I scored these using a logarithmic scoring rule, adjusted so that guessing 50-50 always gives zero points. It’s not very intuitive: getting everything maximally right gives a score of about 14, guessing 50-50 on everything gives 0, and getting everything maximally wrong gives negative infinity.
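To make the scoring rule concrete, here is a minimal Python sketch. It assumes the rule is the natural-log score shifted by ln 2, i.e. ln(2p), where p is the probability a forecaster put on the outcome that actually happened. That exact form is an assumption, but it reproduces the figures above: twenty perfect answers give 20 × ln 2 ≈ 13.9, all-50% gives 0, and any fully confident miss gives negative infinity. The function names and the ten-true/ten-false split of outcomes are hypothetical, for illustration only.

```python
import math

def adjusted_log_score(p_true, outcome):
    """Natural-log score shifted so that a 50% forecast earns exactly 0.

    p_true: forecast probability that the statement is true.
    outcome: whether the statement actually came true.
    """
    p = p_true if outcome else 1.0 - p_true  # probability placed on what happened
    if p == 0.0:
        return -math.inf  # certain and wrong: unbounded penalty
    return math.log(2.0 * p)  # same as ln(p) - ln(0.5)

def total_score(forecasts, outcomes):
    return sum(adjusted_log_score(p, o) for p, o in zip(forecasts, outcomes))

# Hypothetical resolutions for twenty questions, for illustration only.
outcomes = [True] * 10 + [False] * 10
perfect = [1.0 if o else 0.0 for o in outcomes]
worst = [0.0 if o else 1.0 for o in outcomes]

print(round(total_score(perfect, outcomes), 2))      # 13.86, i.e. 20 * ln 2
print(total_score([0.5] * len(outcomes), outcomes))  # 0.0
print(total_score(worst, outcomes))                  # -inf
```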

Bucky and I got negative scores; Zvi’s was originally negative, but after a last-minute rules change it became just barely positive again. Is this bad? Well, the two of us did worse than just guessing 50% for everything. Part of this was that the scoring system very heavily penalized wrong answers compared to how it rewarded correct ones. But another part was that we were just really wrong about a lot of things. All three of us said there was an 80%+ chance that the election would get called within a day; it wasn’t. All three of us thought there was a 90% chance lockdowns would end by November (though the other two participants say this was a bit ambiguous because some lockdowns eased up before November, then tightened again). These highly confident wrong answers cancelled out a lot of other good work.
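The asymmetry can be quantified under an assumed ln(2p) form of the scoring rule (a shifted natural-log score; this is an assumption, not a confirmed description of the rule used for the table). The hypothetical helper below computes, for a given confidence level, what fraction of forecasts made at that level must come true for their expected score to break even with guessing 50-50.

```python
import math

def break_even_hit_rate(p):
    """For forecasts made at confidence p (0.5 < p < 1), the fraction that must
    come true for their expected ln(2p)-style score to be zero, i.e. no better
    than guessing 50-50 on all of them."""
    gain = math.log(2.0 * p)           # reward when the forecast comes true
    loss = -math.log(2.0 * (1.0 - p))  # penalty when it doesn't
    # Solve q * gain - (1 - q) * loss = 0 for the hit rate q.
    return loss / (gain + loss)

for p in (0.6, 0.8, 0.9, 0.95):
    gain = math.log(2.0 * p)
    loss = -math.log(2.0 * (1.0 - p))
    print(f"{p:.0%} forecast: +{gain:.2f} if right, -{loss:.2f} if wrong, "
          f"break-even hit rate {break_even_hit_rate(p):.1%}")
```

Under this assumption a 90% forecast earns about +0.59 when right and costs about -1.61 when wrong, so 90% forecasts need to come true roughly 73% of the time just to score zero.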

Wringing an absolute judgment from a scoring rule like this is hard, but getting relative standings is more straightforward. I did better than Bucky, and Zvi did better than me. These are only twenty questions, so it could be coincidence. I don’t know Bucky, but I know Zvi pretty well, and in his case it’s not. He’s a former professional trader and sports gambler, and he does coronavirus modeling in his spare time with his family of expert biologists. Obviously he would win this one. I will never bet against Zvi on anything. Last year he bet against me on what restaurant I would have dinner at, without knowing anything about my situation or food preferences, and won anyway.

There’s something weird about treating a deadly plague like a game show or a horse race. But I like knowing whose opinions to defer to and how much to defer to them, versus when I should stick to my guns. I ran last week’s post about coronavirus by Zvi, he disagreed with some of it, and on consideration I made the edits he wanted. But also, I like being able to see my biases and where I need to improve. For example, I see I did the best of the three of us on purely biological questions (like whether the virus would be seasonal), which makes sense since I have medical training. But I did worst at predicting events in my own life (like whether I would get COVID myself). In fact, Zvi pointed out that some of my predictions about my own life were actually inconsistent with each other; embarrassing! I think this is a fluke; on past predictions I haven’t done worse on private questions than on scientific ones. But it gives me something to watch out for, to see if maybe I get too emotional about personal questions and my accuracy breaks down.

Assorted Links

1: An alert reader linked me to another play-money prediction market, Hypermind, with about two dozen questions, including some focused on France and Africa. I’m not too excited by anything up there now, but it looks like they’re working on an AI forecasting contest.

2: Vitalik Buterin talks about his adventures winning $50,000 betting against Trump on the Ethereum prediction market Augur. It took a pretty complicated chain of crypto contracts to make it work, and I look forward to the time when people will be able to use this technology fluidly without having to literally be Vitalik Buterin. He concludes (like many people) that prediction markets were kind of dumb this past election, but that there are reasons to think they can get smarter fast:

I expect prediction markets to become an increasingly important Ethereum application in the years to come. The 2020 election was only the beginning; I expect more interest in prediction markets going forward, not just for elections but for conditional predictions, decision-making and other applications as well. The amazing promises of what prediction markets could bring if they work mathematically optimally will, of course, continue to collide with the limits of human reality, and hopefully, over time, we will get a much clearer view of exactly where this new social technology can provide the most value.

3: This week on Metaculus: will a third-party candidate win 5%+ of the popular vote in 2024? Users say 15% chance, which I started out thinking was way too high. But they reminded me that Perot cleared that bar in both ’92 and ’96, and if something has happened two of the last eight times it could have, maybe it’s actually kind of common? Add that to the constant threats by Trumpist or anti-Trumpist conservatives to split from the Republican party, and maybe they’re not crazy? I’m still betting against.
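The base-rate reasoning is simple arithmetic. The sketch below compares the raw frequency (2 of 8) with Laplace’s rule of succession, a standard smoothing rule swapped in here purely for illustration; it is not something the Metaculus users cited. Both treat every election since 1992 as an interchangeable draw, which ignores everything specific about 2024.

```python
from fractions import Fraction

def frequency_estimate(hits, trials):
    """Raw base rate: hits / trials."""
    return Fraction(hits, trials)

def rule_of_succession(hits, trials):
    """Laplace's rule of succession: (hits + 1) / (trials + 2), i.e. the
    posterior mean under a uniform prior on the underlying rate."""
    return Fraction(hits + 1, trials + 2)

# Perot cleared 5% in 1992 and 1996, out of the eight elections from 1992 to 2020.
hits, elections = 2, 8
print(float(frequency_estimate(hits, elections)))  # 0.25
print(float(rule_of_succession(hits, elections)))  # 0.3
```

Both naive figures land above the community’s 15%, which is the sense in which the base rate makes that forecast look less crazy.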

4: Also: will Bitcoin outperform the US stock market over the next five years? Users say 51%. I started out thinking - of course it’s 50-50! By the efficient market hypothesis, if any asset was obviously going to do better than another, people would change the price until it wasn’t. But on second thought that’s wrong - stocks have a higher than 50% chance of beating treasuries over the same period because of a risk premium. Maybe there’s no intuitive way to think about this, you have to have opinions on the underlying fundamentals, and it’s only 51% by coincidence?
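One way to see why efficient prices need not mean 50-50: if an asset follows geometric Brownian motion (an assumed model here), the probability that it beats a risk-free rate r over T years is Φ((μ - σ²/2 - r)√T / σ), where μ is its expected return and σ its volatility. The sketch below evaluates that formula with made-up parameters, not estimates for stocks or Bitcoin, and it compares against a risk-free rate (the stocks-vs-treasuries case) rather than modelling two correlated risky assets.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_beats_risk_free(mu, sigma, r, years):
    """P(an asset's cumulative return beats a risk-free rate r over `years`),
    assuming geometric Brownian motion with expected return mu and volatility sigma.
    Its log growth per year has mean mu - sigma**2 / 2 (the 'volatility drag')."""
    edge = (mu - 0.5 * sigma ** 2 - r) * years  # expected log outperformance
    spread = sigma * math.sqrt(years)           # std dev of log outperformance
    return normal_cdf(edge / spread)

# Made-up parameters, NOT estimates for real stocks, treasuries, or Bitcoin.
r, years = 0.02, 5
print(f"moderate volatility: {prob_beats_risk_free(0.07, 0.16, r, years):.2f}")
print(f"high volatility, same mean: {prob_beats_risk_free(0.07, 0.80, r, years):.2f}")
```

With these made-up numbers, the moderately volatile asset beats the risk-free rate about 70% of the time (a risk premium at work), while the very volatile one with the same expected return does so only about 23% of the time, because volatility drags down the median outcome. Which effect dominates depends on the parameters, i.e. on views about the underlying fundamentals.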

Thanks to everyone who sent me interesting articles on Metaculus’ scoring function; I’ll talk about that next time.

APPENDIX: Coronavirus Questions

1. Will the Bay Area stay locked down (e.g. restaurants closed) beyond June 15, 2020?

I judged this as “yes” - there was a pretty significant lockdown and you could not eat meals in a restaurant. This was what I meant when writing the question, but I can imagine other people objecting because the lockdown relaxed a little and some restaurants were open for take-out or delivery. But it’s my question, I get to decide how it resolves, and I judge this as true.

2. Will the Bay Area stay locked down until Election Day (11/3/2020)?

Same problem as before, plus “until” could suggest it had to be continuous, but I’m still judging this true.

3. Will there be fewer than 100,000 US deaths from coronavirus?

Johns Hopkins says there were about 353,000 deaths as of 12/31/20, so false.

4. …fewer than 300,000 US coronavirus deaths

Still false.

5. …fewer than 3 million US coronavirus deaths

True.

6. Will the US have the highest official coronavirus death toll of any country?

True. The US was at 353,000; second place was Brazil with 195,000.

7. …the highest death toll as per expert guesses of real numbers?

I haven’t seen these numbers, but I haven’t seen anyone make a plausible case some other country is off by a factor of two. This was mostly in there to cover China (or someone) lying about their death count in an obvious way, which didn’t happen. True.

8. Will New York City still be widely considered the worst-hit US city?

Man, what is “widely considered”? NYC had the highest total death toll, but not the highest total number of cases or highest death toll per capita (although all the cities that had higher death tolls per capita were small and maybe shouldn’t count in the same league). I wanted to judge this false, but Zvi actually polled people and they said it was, so I decided to say true.

9. Will China reach 100,000 official cases?

China’s official case count on 12/31/20 was 95,963, so false.

10. By 12/31/20, will a coronavirus vaccine have been approved for general use and given to at least 10,000 people somewhere in the First World?

According to Our World in Data, hundreds of thousands of people in the US and UK had gotten vaccines by 12/20/20. True.

11. Will the scientific consensus end up being that hydroxychloroquine was significantly effective?

UpToDate, the closest thing to a canonical medical recommendation site, currently says: “We suggest not using hydroxychloroquine or chloroquine in hospitalized patients given the lack of clear benefit and potential for toxicity. In June 2020, the US FDA revoked its emergency use authorization for these agents in patients with severe COVID-19, noting that the known and potential benefits no longer outweighed the known and potential risks”. False.

12. Will I personally get coronavirus (as per my best guess if I had it; positive test not needed)?

As far as I know I haven’t gotten it. False.

13. Will someone I’m close to (housemate or close family member) get coronavirus?

Nobody I know personally got COVID, although a few of my patients did. False.

14. Will the general consensus be that we (the people of April 2020, when I was writing these predictions) were overreacting?

I don’t feel like there’s a general consensus in this direction. False.

15. …that we were under-reacting: 50%

April 2020 was when there were strict nationwide lockdowns and everyone was panicking. There were many things we should have been doing better, but I don’t think we were under-reacting then per se. False.

16. Will there be a general consensus that summer made coronavirus significantly less dangerous?

Most papers I read now agree that coronavirus is a seasonal disease and that it’s not a coincidence that cases went down in summer and up in winter. True.

17. Will there be a catastrophic second wave (50K+ US deaths, or more major lockdowns, after at least a month without these things) in autumn?

Monthly US deaths were only 30K in August 2020, but 70K in December 2020. We never got completely out of lockdown, but the lockdowns got much tighter in December. True.

18. Will I personally get back to working not-at-home?

My office briefly opened in October or so for very essential business, but I chose not to go in. Then it shut again in November. I never went back. False.

19. Will at least half of states send every voter a mail-in ballot in 2020 presidential election?

According to this article, only ten states did this. False.

20. Will PredictIt be uncertain (less than 95% sure) who won the presidential election for more than 24 hours after Election Day?

This was meant to be a proxy for whether there would be a lot of uncertainty about who won the election because of trouble counting mail-in ballots. There was - it took the major networks a few days to call it for Biden, whereas they usually can do it that night. But unrelatedly, even after the networks called it for Biden the prediction markets failed to converge; some combination of high fees, transaction limits, and very stubborn Trump supporters. I think you could still buy shares of Biden for 94 cents almost up until the inauguration. Definitely true.
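The fee point can be made concrete with a little arithmetic. The sketch below assumes, for illustration, a 10% fee on profits and a 5% fee on withdrawals; treat these as hypothetical parameters and check the exchange’s actual terms. It also assumes the contract certainly pays $1, and it ignores position limits and the weeks of lockup until resolution.

```python
def net_return_on_sure_thing(price, profit_fee=0.10, withdrawal_fee=0.05):
    """Fractional return from buying a $1-payout contract at `price` dollars,
    assuming it resolves yes, a fee charged on the profit only, and a fee
    charged on the whole balance when it is withdrawn.
    The fee rates are hypothetical parameters, not a quoted fee schedule."""
    profit_after_fee = (1.0 - price) * (1.0 - profit_fee)
    balance = price + profit_after_fee
    cash_out = balance * (1.0 - withdrawal_fee)
    return cash_out / price - 1.0

def break_even_price(profit_fee=0.10, withdrawal_fee=0.05):
    """Price above which the sure-thing trade loses money under the same fees.
    Solves (p + (1 - p) * a) * (1 - w) = p for p, with a = 1 - profit_fee."""
    a = 1.0 - profit_fee
    b = 1.0 / (1.0 - withdrawal_fee)
    return a / (a + b - 1.0)

print(f"net return buying at 94 cents: {net_return_on_sure_thing(0.94):.2%}")
print(f"break-even price: {break_even_price():.3f}")
```

Under those assumed fees, buying Biden at 94 cents and cashing out returns under half a percent, and above roughly 94.5 cents the trade loses money even on a sure thing, which is one way fees alone can keep a price stuck below 95.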