The publicly available datasets on confirmed COVID-19 cases and deaths provide a key opportunity to better understand the drivers of the pandemic. Research using these datasets has been growing rapidly (see an indicative list of references in supplementary material 1). However, little attention has been paid to whether this type of epidemiological data is reliable enough to support statistical inference.
Our initial aim was to produce a detailed statistical analysis of the relationship between weather conditions and the spread of COVID-19. This question has attracted significant attention from the media (e.g. Ravilious 2020; Clive Cookson 2020) and the research community (e.g. Araujo and Naimi 2020; Carleton et al. 2020; see a wider list in supplementary material 1) due to the possibility that summer weather might slow the spread of the virus. After going through all the steps of such an analysis, we reached the unexpected conclusion that the limitations of the available COVID-19 data are so severe that we would not be able to make any reliable statistical inference. This applies, for example, to the data provided by Johns Hopkins University (Dong et al. 2020) and the data collated by Xu et al. (2020).
Cohen, F., Schwarz, M., Li, S., Lu, Y. & Jani, A. (2020). 'The Challenge of Using Epidemiological Case Count Data: The Example of Confirmed COVID-19 Cases and the Weather'. Environmental and Resource Economics: Perspectives on the Economics of the Environment in the Shadow of Coronavirus, 76, pp.447-517.