  • Article

Insights into the accuracy of social scientists’ forecasts of societal change

Abstract

How well can social scientists predict societal change, and what processes underlie their predictions? To answer these questions, we ran two forecasting tournaments testing the accuracy of predictions of societal change in domains commonly studied in the social sciences: ideological preferences, political polarization, life satisfaction, sentiment on social media, and gender–career and racial bias. After we provided them with historical trend data on the relevant domain, social scientists submitted pre-registered monthly forecasts for a year (Tournament 1; N = 86 teams and 359 forecasts), with an opportunity to update forecasts on the basis of new data six months later (Tournament 2; N = 120 teams and 546 forecasts). Benchmarking forecasting accuracy revealed that social scientists’ forecasts were on average no more accurate than those of simple statistical models (historical means, random walks or linear regressions) or the aggregate forecasts of a sample from the general public (N = 802). However, scientists were more accurate if they had scientific expertise in a prediction domain, were interdisciplinary, used simpler models and based predictions on prior data.
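
The three simple benchmarks named above (historical mean, random walk, linear regression) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the toy series, the deterministic "last value carried forward" reading of a random walk, and the use of mean absolute error as the accuracy measure. It is not the authors' code, which is available from the repository listed under Code availability.

```python
from statistics import mean


def historical_mean_forecast(history, horizon):
    """Repeat the mean of all past observations for every future step."""
    level = mean(history)
    return [level] * horizon


def random_walk_forecast(history, horizon):
    """Naive random-walk point forecast: carry the last observed value forward."""
    return [history[-1]] * horizon


def linear_trend_forecast(history, horizon):
    """Fit y = intercept + slope * t by ordinary least squares and extrapolate."""
    n = len(history)
    t_mean = (n - 1) / 2
    y_mean = mean(history)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [intercept + slope * (n + h) for h in range(horizon)]


def mean_absolute_error(forecast, actual):
    """Average absolute gap between forecast and observed values."""
    return mean(abs(f - a) for f, a in zip(forecast, actual))


# Hypothetical monthly series (not data from the tournaments).
history = [10.0, 12.0, 14.0, 16.0]
actual = [18.0, 20.0]

errors = {
    "historical mean": mean_absolute_error(historical_mean_forecast(history, 2), actual),
    "random walk": mean_absolute_error(random_walk_forecast(history, 2), actual),
    "linear trend": mean_absolute_error(linear_trend_forecast(history, 2), actual),
}
for name, err in errors.items():
    print(f"{name}: {err:.2f}")
```

On a perfectly linear toy series the trend model wins by construction; on noisy real series any of the three may be the hardest benchmark to beat, which is why a comparison against all of them is informative.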

Fig. 1: Social scientists’ average forecasting errors, compared against different benchmarks.
Fig. 2: Forecasts and ground truth—are forecasts anchoring on the last few historical data points?
Fig. 3: Ratios of forecasting errors among benchmarks compared to scientific forecasts.
Fig. 4: Cross-tournament consistency in the ranking of domains in terms of forecasting inaccuracy.
Fig. 5: Forecasting errors by prediction approach.
Fig. 6: Contributions of specific forecasting strategies and team characteristics to forecasting accuracy.

Data availability

All data used in the main text and supplementary analysis are accessible on GitHub (https://github.com/grossmania/Forecasting-Tournament). All prior data presented to the forecasters are available at https://predictions.uwaterloo.ca/. Historical and ground truth markers were obtained from Project FiveThirtyEight (https://projects.fivethirtyeight.com/polls/generic-ballot), Gallup (https://news.gallup.com/poll/203198/presidential-approval-ratings-donald-trump.aspx), Project Implicit (see the Open Science Framework website at https://osf.io/t4bnj) and the US Census Bureau (https://www.census.gov/data/tables/time-series/demo/popest/2010s-national-detail.html).

Code availability

Our project page at https://github.com/grossmania/Forecasting-Tournament displays all code from this paper. See the Reporting Summary for the R packages and their versions.

Acknowledgements

This programme of research was supported by the Basic Research Program at the National Research University Higher School of Economics (M. Fabrykant), John Templeton Foundation grant no. 62260 (I.G. and P.E.T.), Kega 079UK-4/2021 (P.K.), Ministerio de Ciencia e Innovación España grants no. PID2019-111512RB-I00-HMDM and no. HDL-HS-280218 (A.A.), the National Center for Complementary & Integrative Health of the National Institutes of Health under award no. K23AT010879 (S.B.G.), National Science Foundation RAPID grant no. 2026854 (M.E.W.V.), PID2019-111512RB-I00 (M.S.), NPO Systemic Risk Institute grant no. LX22NPO5101 (I.R.), the Slovak Research and Development Agency under contract no. APVV-20-0319 (M.A.), Social Sciences and Humanities Research Council of Canada Insight grant no. 435-2014-0685 (I.G.), Social Sciences and Humanities Research Council of Canada Connection grant no. 611-2020-0190 (I.G.), and Swiss National Science Foundation grant no. PP00P1_170463 (O. Strijbis). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. We thank J. Axt for providing monthly estimates of Project Implicit data and the members of the Forecasting Collaborative who chose to remain anonymous for their contribution to the tournaments.

Author information

Contributions

Conceptualization: I.G., A.R., C.A.H., M.E.W.V., L.T. and P.E.T. Data curation: I.G., K.S., G.T.S. and O.J.T. Forecasting: S.A., M.K.D., X.E.G., M. J. Hirshberg, M.K.-Y., D.R.M., L.R., A.V., L.W., M.A., A.A., P.A., K.B., G.B., F.B., E.B., C.B., M.B., C.K.B., D.T.B., E.M.C., R.C., B.-T.C., W.J.C., C.W.C., L.G.C., M. Davis, M.V.D., N.A.D., J.D.D., M. Dziekan, C.T.E., E.S., M. Fabrykant, M. Firat, G.T.F., J.A.F., J.M.G., S.B.G., A.G., J.G., L.G.-V., S.D.G., S.H., A.H., M. J. Hornsey, P.D.L.H., A.I., B.J., P.K., Y.J.K., R.K., D.G.L., H.-W.L., N.M.L., V.Y.Q.L., A.W.L., A.L.L., C.R.M., M. Maier, N.M.M., D.S.M., A.A.M., M. Misiak, K.O.R.M., J.M.N., J.N., K.N., J.O., T.O., M.P.-C., S.P., J.P., Q.R., I.R., R.M.R., Y.R., E.R., L.S., A.S., M.S., A.T.S., O. Simonsson, M.-C.S., C.-C.T., T.T., B.A.T., D.T., D.C.K.T., J.M.T., L.U., D.V., L.V.W., H.A.V., Q.W., K.W., M.E.W., C.E.W., T.Y., K.Y., S.Y., V.R.A., J.R.A.-H., P.A.B., A.B., L.C., M.C., S.D.-H., Z.E.F., C.R.K., S.T.K., A.L.O., L.M., M.S.M., M.F.R.C.M., E.K.M., P.M., J.B.N., W.N., R.B.R., P.S., A.H.S., O. Strijbis, D.S., E.T., A.v.L., J.G.V., M.N.A.W. and T.W. Formal analysis: I.G. and C.A.H. Funding acquisition: I.G. Investigation: I.G., A.R. and C.A.H. Methodology: I.G., A.R., C.A.H., K.S., M.E.W.V., S.A., D.R.M., L.R., L.T., A.V., R.N.C., L.U. and D.V. Project administration: I.G., A.R., M.E.W.V., M.K.-Y. and O.J.T. Resources: I.G., A.R., J.N. and G.T.S. Supervision: I.G. Validation: K.S., X.E.G. and L.W. Visualization: I.G. and M.K.D. Writing—original draft: I.G. Writing—review and editing: I.G., A.R., C.A.H., K.S., M.E.W.V., S.A., M.K.D., X.E.G., M. J. Hirshberg, M.K.-Y., D.R.M., L.R., L.T., A.V., L.W., M.A., A.A., P.A., K.B., G.B., F.B., E.B., C.B., M.B., C.K.B., D.T.B., E.M.C., R.C., B.-T.C., W.J.C., R.N.C., C.W.C., L.G.C., M. Davis, M.V.D., N.A.D., J.D.D., M. Dziekan, C.T.E., E.S., M. Fabrykant, M. Firat, G.T.F., J.A.F., J.M.G., S.B.G., A.G., J.G., L.G.-V., S.D.G., S.H., A.H., M. J. Hornsey, P.D.L.H., A.I., B.J., P.K., Y.J.K., R.K., D.G.L., H.-W.L., N.M.L., V.Y.Q.L., A.W.L., A.L.L., C.R.M., M. Maier, N.M.M., D.S.M., A.A.M., M. Misiak, K.O.R.M., J.M.N., K.N., J.O., T.O., M.P.-C., S.P., J.P., Q.R., I.R., R.M.R., Y.R., E.R., L.S., A.S., M.S., A.T.S., O. Simonsson, M.-C.S., C.-C.T., T.T., B.A.T., P.E.T., D.T., D.C.K.T., J.M.T., L.V.W., H.A.V., Q.W., K.W., M.E.W., C.E.W., T.Y., K.Y. and S.Y.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Human Behaviour thanks Richard Klein and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Methods, Figs. 1–15, Tables 1–9 and Appendices 1–5.

Reporting Summary

Peer Review File

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

The Forecasting Collaborative. Insights into the accuracy of social scientists’ forecasts of societal change. Nat Hum Behav 7, 484–501 (2023). https://doi.org/10.1038/s41562-022-01517-1

  • DOI: https://doi.org/10.1038/s41562-022-01517-1
