2020 Predictions: Calibration Results

At the beginning of every year, I make predictions. At the end of every year, I score them (this year I’m very late). Here are 2014, 2015, 2016, 2017, 2018, and 2019.

And here are the predictions I made for 2020. Some predictions are redacted because they involve my private life or the lives of people close to me. Usually I use strikethrough for things that didn’t happen, but since Substack doesn’t let me strikethrough text or change its color or do anything interesting, I’ve had to turn the ones that didn’t happen into links. Italicized predictions are getting thrown out because they were confusing or conditional on something that didn’t happen, so I can’t decide if they’re true or not. All of these judgments were as of December 31, 2020, not as of now.

(Remember, link means something that didn’t happen, not something I was wrong about. We have a debate every year over whether 50% predictions are meaningful in this paradigm; feel free to continue it.)

CORONAVIRUS:
1. Bay Area lockdown (eg restaurants closed) will be extended beyond June 15: 60%
2. …until Election Day: 10%
3. Fewer than 100,000 US coronavirus deaths: 10%
4. Fewer than 300,000 US coronavirus deaths: 50%
5. Fewer than 3 million US coronavirus deaths: 90%
6. US has highest official death toll of any country: 80%
7. US has highest death toll as per expert guesses of real numbers: 70%
8. NYC widely considered worst-hit US city: 90%
9. China’s (official) case number goes from its current 82,000 to 100,000 by the end of the year: 70%
10. A coronavirus vaccine has been approved for general use and given to at least 10,000 people somewhere in the First World: 50%
11. Best scientific consensus ends up being that hydroxychloroquine was significantly effective: 20%
12. I personally will get coronavirus (as per my best guess if I had it; positive test not needed): 30%
13. Someone I am close to (housemate or close family member) will get coronavirus: 60%
14. General consensus is that we (April 2020 US) were overreacting: 50%
15. General consensus is that we (April 2020 US) were underreacting: 20%
16. General consensus is that summer made coronavirus significantly less dangerous: 70%
17. …and there is a catastrophic (50K+ US deaths, or more major lockdowns, after at least a month without these things) second wave in autumn: 30%
18. I personally am back to working not-at-home: 90%
19. At least half of states send every voter a mail-in ballot in 2020 presidential election: 20%
20. PredictIt is uncertain (less than 95% sure) who won the presidential election for more than 24 hours after Election Day: 20%

POLITICS:
21. Democrats nominate Biden, and he remains nominee on Election Day: 90%
22. Balance of evidence available on Election Day supports (as per my opinion) Tara Reade accusation: 90%
23. Conditional on me asking about Reade on SSC survey, average survey-taker’s credence in her accusation is greater than 50%: 70%
24. …greater than 75%: 10%
25. …greater than credence in Kavanaugh accusation asked in the same format: 40%
26. Trump is re-elected President: 50%
27. Democrats keep the House: 70%
28. Republicans keep the Senate: 50%
29. Trump approval rating higher than 43% on June 1: 30%
30. Biden polling higher than Trump on June 1: 70%
31. At least one new Supreme Court Justice: 20%
32. I vote Democrat for President: 80%
33. Boris still UK PM: 90%
34. No new state leaves EU: 90%
35. UK, EU extend “transition” trade deal: 80%
36. Kim Jong-Un alive and in power: 60%

ECON AND TECH:
37. Dow is above 25,000: 70%
38. …above 30,000: 20%
39. Bitcoin is above $5,000: 70%
40. …above $10,000: 20%
41. I have bought a Surface Book 3 laptop: 60%
42. Crew Dragon reaches orbit: 80%
43. Starship reaches orbit: 40%

SSC, ETC:
44. I do another Nootropics Survey this year: 70%
45. I do another SSC Survey this year: 90%
46. I start a Reader SSC Survey this year: 60%
47. I start a SSC Book Review Contest this year: 70%
48. I run another Adversarial Collaboration Contest this year: 10%
49. I publish [redacted]: 20%
50. I publish [redacted]: 50%
51. I publish [redacted]: 60%
52. I publish Studies On Slack: 80%
53. …conditional on being published, it gets at least 40,000 pageviews: 10%
54. I publish [redacted]: 60%
55. …conditional on being published, it gets at least 40,000 pageviews: 50%
56. More hits this year than last: 70%
57. Most hits ever this year: 20%
58. I finish Unsong revision this year: 40%
59. New co-blogger with more than 3 posts: 10%

FRIENDS:
60. No new long-term (1 month +) residents at group house by the end of the year: 70%
61. Koios has said his first clear comprehensible word: 50%
62. [redacted] still lives here at the end of the year: 40%
63. [redacted] still lives here at the end of the year: 60%
64. [redacted] still lives here at the end of the year: 80%
65. [redacted] still lives here at the end of the year: 80%
66. I still live here at the end of the year: 95%
67. [redacted]: 10%
68. Quarantine has lifted enough to restart regular D&D game: 95%
69. [redacted] and [redacted] are still dating: 80%
70. [redacted] and [redacted] are still dating: 80%
71. [redacted]: 50%

PROFESSIONAL:
72. I’ve gotten at least one new patient to do a full wake therapy protocol: 60%
73. I have specific, set-in-motion plans to quit work / start my own business: 5%
74. I work the same schedule and locations I did before the coronavirus: 80%
75. I get a bonus for 2020: 20%

PERSONAL:
76. [redacted]: 70%
77. [redacted]: 70%
78. [redacted]: 95%
79. I travel to Alaska this year: 60%
80. [redacted]: 40%
81. [redacted]: 20%
82. I go on at least three dates with someone I haven’t met yet: 20%
83. [redacted]: 10%
84. [redacted]: 30%
85. [redacted]: 10%
86. I try one biohacking project per month x at least 5 of the last 6 months of 2020: 30%
87. I find at least one new supplement I take or expect to take regularly x 3 months: 20%
88. Not eating meat at home: 40%
89. Weight below 200: 50%
90. Weight below 190: 10%
91. [redacted]: 90%
92. [redacted]: 30%
93. [redacted]: 5%
94. I travel outside the country at least once: 10%
95. I get back into meditating seriously (at least ten minutes a day, five days a week) for at least a month: 10%
96. At least ten tweets in 2020: 80%
97. I eat at/from Sliver more than any other restaurant in Q4 2020: 50%
98. [redacted]: 30%
99. I do pushups and situps at least 3 days/week in average week of Q4 2020: 60%
100. I write the post scoring these predictions before 2/1/21: 70%

To make binning easier, I’ve converted 5% predictions into 95% predictions of the opposite, 10% predictions into 90% predictions of the opposite, and so on. So:

Of 50% predictions, I got 2 right and 7 wrong, for a total of 22%
Of 60% predictions, I got 8 right and 8 wrong, for a total of 50%
Of 70% predictions, I got 15 right and 6 wrong, for a total of 71%
Of 80% predictions, I got 14 right and 8 wrong, for a total of 64%
Of 90% predictions, I got 16 right and 3 wrong, for a total of 84%
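
If you want to reproduce this kind of binning yourself, here is a minimal sketch in Python. It assumes each prediction is stored as a (probability in percent, did it happen) pair; the function names `flip` and `score_bins` are my own inventions for illustration, and the sample data is made up rather than taken from the list above.

```python
from collections import defaultdict


def flip(prob, happened):
    """Restate a sub-50% prediction as a prediction of the opposite outcome.

    A 10% prediction that did not happen becomes a 90% prediction that came true.
    """
    if prob < 50:
        return 100 - prob, not happened
    return prob, happened


def score_bins(predictions):
    """Return {confidence: (right, wrong)} after flipping sub-50% predictions."""
    tally = defaultdict(lambda: [0, 0])
    for prob, happened in predictions:
        conf, correct = flip(prob, happened)
        tally[conf][0 if correct else 1] += 1
    return {conf: tuple(counts) for conf, counts in sorted(tally.items())}


# Made-up sample for illustration, NOT the real 2020 outcomes.
sample = [(10, False), (90, True), (20, True), (80, True), (5, False), (50, False)]

bins = score_bins(sample)
for conf, (right, wrong) in bins.items():
    pct = 100 * right / (right + wrong)
    print(f"Of {conf}% predictions: {right} right, {wrong} wrong, {pct:.0f}%")
```

Note that a 50% prediction is left as stated, which is part of why the debate over whether 50% predictions mean anything never ends: which side counts as "right" depends on how you happened to phrase it.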

Here’s the usual graph:

For the first time, I was consistently overconfident (below the green line of perfect calibration) in every bin (except 70%). It wasn’t a complete disaster, except at 50% (again, feel free to have your usual debate over whether 50% predictions are meaningful) and 95% (too low a sample size), but it was still bad.

Most of my mistakes were correlated with two big errors: expecting the COVID lockdown to end much sooner than it did, and missing the NYT situation and its aftermath. This isn’t an excuse - part of being calibrated is accounting for the possibility that black swan events will happen and throw everything off.

…except - wait. Is it an excuse? This year had more black swan events than usual; it was the weirdest year since I started doing these predictions in 2014. If you know that there will be frequent normal years but also some rare weird years, but you don’t know which are which, in order to get the right calibration overall you should end up looking slightly underconfident during normal years, and significantly overconfident during weird ones, to average out to the right level. In 2019 (which was pretty normal, as years go) I was 4% underconfident; this year I was about 10% overconfident. Maybe if I go back long enough it will all average out? Remind me to check this sometime.
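
As a rough check on the "about 10%" figure and the averaging-out argument, here is a sketch using only the tallies listed above. The 95% bin is left out because its counts are not listed. The break-even calculation rests on assumptions of mine: every year has about the same number of predictions, normal years always look 4 points underconfident, and weird years always look 10 points overconfident. The function names are hypothetical.

```python
# (confidence %, right, wrong) copied from the tallies above.
# The 95% bin is omitted because its counts are not listed.
tallies = [(50, 2, 7), (60, 8, 8), (70, 15, 6), (80, 14, 8), (90, 16, 3)]


def overconfidence(tallies):
    """Average stated confidence minus actual hit rate, in percentage points."""
    total = sum(right + wrong for _, right, wrong in tallies)
    expected = sum(conf / 100 * (right + wrong) for conf, right, wrong in tallies)
    actual = sum(right for _, right, _ in tallies)
    return 100 * (expected - actual) / total


def weird_year_share(under_normal, over_weird):
    """Share of weird years at which the two gaps cancel out.

    Solves (1 - w) * under_normal = w * over_weird for w.
    Assumes equal prediction counts per year and stable gaps (my assumption).
    """
    return under_normal / (under_normal + over_weird)


gap_2020 = overconfidence(tallies)
print(f"2020 gap: {gap_2020:.1f} points overconfident")  # about 9.8

share = weird_year_share(4, 10)
print(f"Break-even share of weird years: {share:.0%}")  # about 29%
```

Under those assumptions, the numbers would only average out to perfect calibration if roughly two years in seven were as weird as 2020, which is a lot more often than "rare". That is a toy calculation, not a verdict; the real check is still to go back through all the past years.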

Other possible lessons from this year: I apparently overestimate the likelihood that my friends’ relationships will stay together (but am pretty well-calibrated about mine). I’m bad at predicting my future purchasing decisions. I did a decent job predicting how much self-improvement I’d do, and it went okay.

Next up: I’m going to try to grade my predictions for the Trump administration, then make some more predictions in this format for next year. But along with that, I want to keep a running prediction log of things I’m actually blogging about, or things I’m thinking about at the time - you can see what I have so far here.