My Favourite Scientific Papers of 2020

Just as I summarise my favourite books of 2020, I also want to highlight some excellent scientific papers I read this year. Some were published before 2020; some are peer-reviewed, while others come from arXiv. The five papers I have selected vary in how technical they are, but all are closely related to my area of research, statistics. Interestingly, I discovered these five papers through five different channels!

Once my top papers were chosen, I realised that all of the first authors’ names suggest that they are men. I see this not only as a call to action for the whole scientific community to encourage diversity, but also as a personal duty of mine to actively search for papers published by women and members of other marginalised groups in 2021.

So without further ado, here are my top papers of 2020, each accompanied by its topic, how I found it, my suggested target audience, and a brief explanation of why it made it onto this list!

Holes in Bayesian Statistics (Gelman and Yao, 2020)

Topic: Criticism of Bayesian statistics
Discovered through: the arXiv newsletter
Target audience: anyone who has an opinion (positive or negative) on Bayesian statistics or wants to have one

My opinion: I am a Bayesian through and through, and therefore I find it essential to confront the criticism raised against Bayesian statistics. Interestingly, in this paper the criticism comes from two statisticians who are themselves Bayesians at heart. They excel at showcasing examples where the theory, which I thought to be unassailable, fails us in real-life experiments (e.g. in quantum physics). Moreover, they raise important points in the debate between frequentist and Bayesian statistics.

How not to factor a miracle (Wise, 2015)

Topic: Seeing the world through the eyes of maths
Discovered through: a coffee chat with a friend
Target audience: scientists and scientifically minded people

My opinion: In this paper the author succeeds in abstracting the scientific process into two broad categories: reduction (looking at all characteristics of one individual) and co-reduction (looking at one characteristic of all individuals). After reading this paper I could not help but notice this distinction in a variety of situations and the conflicts that arise when we try to use one technique where the other one would be appropriate. In addition, the paper also talks about the “miracle” of scientific discovery and what aspects of it are truly outstanding.

Making Bayesian Predictive Models Interpretable: A Decision Theoretic Approach (Afrabandpey et al., 2019)

Topic: making algorithms interpretable
Discovered through: a lecture by Samuel Kaski that was posted on Twitter
Target audience: scientists with some experience in modelling

My opinion: What I liked best about this paper is its underlying idea: using a decision tree as the interpretable model and an additive model as the reference model. Crucially, this duo of models performs better than restricting the prior to simple models so that a single model has to serve both needs, while still offering some notion of interpretability. I am very curious to see where the field of interpretable algorithms is going in the next couple of years!

Exchangeability, Correlation, and Bayes’ Effect (O’Neill, 2009)

Topic: Exchangeability and Bayesian statistics
Discovered through: an ancient Stack Exchange post
Target audience: scientists with some notion of statistics, statisticians of any level (especially students)

My opinion: I discovered this paper a few years ago, but I constantly revisit it, hence its place on this list. It gives a great explanation of exchangeability, a concept that is otherwise often hard to understand, while simultaneously providing a manifesto for Bayesian statistics. I really wish I had read this paper much earlier in my education, as it provides such a well-rounded account of some fundamental statistical concepts.
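For readers who have not met exchangeability before, here is a small illustration of my own (it is not taken from the paper). A sequence of random variables is exchangeable if its joint probability does not change when the order is shuffled. The sketch below uses a Pólya urn, a textbook example that I am assuming here for convenience: each drawn ball is returned together with one extra ball of the same colour, so the draws are clearly dependent, yet every ordering of the same colours is equally likely.

```python
from fractions import Fraction
from itertools import permutations


def polya_sequence_probability(colours, red=1, blue=1):
    """Exact probability of drawing `colours` (a sequence of "R"/"B") from a Polya urn.

    The urn starts with `red` red and `blue` blue balls. After each draw the
    ball is returned together with one extra ball of the same colour.
    """
    prob = Fraction(1)
    for colour in colours:
        total = red + blue
        if colour == "R":
            prob *= Fraction(red, total)
            red += 1
        else:
            prob *= Fraction(blue, total)
            blue += 1
    return prob


# Exchangeability: every ordering of two reds and one blue has the same probability.
orderings = set(permutations(("R", "R", "B")))
probs = {order: polya_sequence_probability(order) for order in orderings}
print(probs)

# Dependence: a red first draw makes a red second draw more likely.
p_first_red = polya_sequence_probability(("R",))
p_both_red = polya_sequence_probability(("R", "R"))
p_second_red = p_both_red + polya_sequence_probability(("B", "R"))
print(p_both_red / p_first_red, p_second_red)
```

The first print shows one shared probability for all three orderings, while the second shows that the conditional probability of a red second draw differs from its unconditional probability. Exchangeable therefore does not mean independent, which is exactly the kind of subtlety that makes the concept hard at first.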

Statistical paradises and paradoxes in big data (I): Law of large populations, big data paradox, and the 2016 US presidential election (Meng, 2018)

Topic: Big data paradox
Discovered through: a footnote in a seminar
Target audience: statisticians of any level

My opinion: The main idea of the paper is rather straightforward: even an extremely modest correlation between the value of interest and whether it is reported can cause huge problems in our analyses. This paper contains a wonderful discussion of big data and the dangers that come with it. It is definitely on the lengthy side, but less mathematically inclined readers can skip some sections without losing the overall message of the paper.
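To make the main idea a bit more tangible, here is a rough simulation sketch of my own. It is not code from the paper, and all numbers in it (the population size, the 52% versus 48% response rates, the comparison sample of 400) are made-up assumptions. It compares a huge self-selected sample, in which reporting is slightly correlated with the value of interest, with a small simple random sample.

```python
import random


def big_data_paradox_demo(population_size=200_000, seed=1):
    """Compare a large self-selected sample with a small simple random sample."""
    rng = random.Random(seed)

    # Binary value of interest, e.g. support for a candidate (true rate about 50%).
    population = [1 if rng.random() < 0.5 else 0 for _ in range(population_size)]
    true_mean = sum(population) / population_size

    # Hypothetical response rates: supporters report slightly more often.
    response_rate = {1: 0.52, 0: 0.48}
    big_sample = [y for y in population if rng.random() < response_rate[y]]
    big_mean = sum(big_sample) / len(big_sample)

    # Small simple random sample for comparison.
    small_sample = rng.sample(population, 400)
    small_mean = sum(small_sample) / len(small_sample)

    # Back-of-the-envelope: size of a simple random sample whose mean squared
    # error would match the squared error of the big sample.
    variance = true_mean * (1 - true_mean)
    equivalent_n = variance / (big_mean - true_mean) ** 2

    return {
        "true_mean": true_mean,
        "big_n": len(big_sample),
        "big_mean": big_mean,
        "small_mean": small_mean,
        "equivalent_n": equivalent_n,
    }


result = big_data_paradox_demo()
print(result)
```

Under these assumed response rates the big sample contains about half of the population, yet its mean is off by roughly two percentage points, and collecting more data of the same kind does not shrink that error. The `equivalent_n` entry is only a crude summary of my own, but it suggests that the huge sample is worth about as much as a simple random sample of a few hundred to a thousand people.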

References:

Afrabandpey, H., Peltola, T., Piironen, J., Vehtari, A., & Kaski, S. (2020). A decision-theoretic approach for model interpretability in Bayesian framework. Machine Learning, 109(9), 1855–1876. https://doi.org/10.1007/s10994-020-05901-8

Gelman, A., & Yao, Y. (2020). Holes in Bayesian Statistics. ArXiv:2002.06467 [Math, Stat]. http://arxiv.org/abs/2002.06467

Meng, X.-L. (2018). Statistical paradises and paradoxes in big data (I): Law of large populations, big data paradox, and the 2016 US presidential election. The Annals of Applied Statistics, 12(2), 685–726. https://doi.org/10.1214/18-AOAS1161SF

O’Neill, B. (2009). Exchangeability, Correlation, and Bayes’ Effect. International Statistical Review, 77(2), 241–250. https://doi.org/10.1111/j.1751-5823.2008.00059.x

Wise, D. K. (2015). How not to factor a miracle. ArXiv:1512.05217 [Physics]. http://arxiv.org/abs/1512.05217