Covid Waves

A data visualisation project by Jan Kühn, September 2022

What is this about?

After more than two years of living with the pandemic caused by the novel coronavirus SARS-CoV-2, one thing we know for sure is that the virus spreads in waves. But how do these waves move across a continent (in this case Europe) that has many countries and even more policies for dealing with the threat of this new pathogen? I used data to try to answer that question. The result is an animation of detected COVID-19 cases from February 2020 to June 2022. One possible result can be seen here or on YouTube:

anim-10M-moving14d_pop-830px-fps28.mp4

About the data

For COVID-19 cases, I used data from the (great!) COVID19-European-Regional-Tracker. It's a project by Asjad Naqvi that collected data from many sources and mapped them to Eurostat's geographic NUTS regions ("Nomenclature of territorial units for statistics"), in the 2016 version (why not 2021, you ask? Read more here).

About the script

The script creates an animation of the data using Plotly Choropleth Mapbox and GeoJSON files provided by Eurostat (see includes/plot.py). Characteristics of the animation such as time frame, metric, speed (frames per second), and map details can be defined in the file settings.py.

The import, cleaning, and transformation of the data is done in includes/prepare.py. This includes removing some extreme outliers and values below zero (both due to data corrections). It then adds missing dates for each NUTS region and interpolates missing values between known data points. In a last step before the export, different metrics are calculated both for daily data and for weekly aggregated data.
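The cleaning steps described above could look roughly like the following. This is a minimal sketch, not the code in includes/prepare.py: the helper name clean_region, the max_factor parameter, and the median-based outlier rule are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def clean_region(series: pd.Series, max_factor: float = 20.0) -> pd.Series:
    """Clean the daily case counts of one region (hypothetical helper)."""
    s = series.astype(float).copy()
    # Negative values stem from data corrections; treat them as missing.
    s[s < 0] = np.nan
    # Assumed outlier rule: drop values far above the median of positive values.
    median = s[s > 0].median()
    s[s > max_factor * median] = np.nan
    # Add every missing calendar day between the first and last known date.
    full_index = pd.date_range(s.index.min(), s.index.max(), freq="D")
    s = s.reindex(full_index)
    # Interpolate only between known points; gaps at either end stay NaN.
    return s.interpolate(method="linear", limit_area="inside")


raw = pd.Series(
    [10, 12, -5, 16, 5000, 20],
    index=pd.to_datetime(
        ["2021-01-01", "2021-01-02", "2021-01-03",
         "2021-01-04", "2021-01-06", "2021-01-08"]
    ),
)
cleaned = clean_region(raw)
print(cleaned)
```

In this example the negative value and the spike of 5000 are discarded, the two missing days are added, and all gaps are filled by linear interpolation between their neighbours.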

In settings.py, the cleaning process can be set to run again (setting: update_data: True). In that case, the original data in data/european-regional-tracker.csv is imported and cleaned as described above. If refresh_source is also set to True, the data is first fetched from the COVID19-European-Regional-Tracker repository.

Metrics

There are multiple metrics available to be used for the visualisation (to be set in settings.py). Default is the 14-day moving average of daily detected cases per million by NUTS region (moving14d_pop).

Daily data

cases_pop: Daily detected cases per million by NUTS region
moving7d_pop: 7-day moving average of daily detected cases per million by NUTS region
moving14d_pop: 14-day moving average of daily detected cases per million by NUTS region
moving28d_pop: 28-day moving average of daily detected cases per million by NUTS region
cumulated_pop: Cumulated detected cases per million by NUTS region
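As an illustration, the daily metrics listed above can be derived from raw daily counts with pandas. This is a sketch under assumptions: the function name daily_metrics is hypothetical, and the use of trailing windows (each average covers the current day and the days before it) is my guess, not something confirmed by the script.

```python
import pandas as pd


def daily_metrics(cases: pd.Series, population: int) -> pd.DataFrame:
    """Compute per-million daily metrics for one region (hypothetical helper)."""
    # Detected cases per million inhabitants.
    cases_pop = cases / population * 1_000_000
    return pd.DataFrame(
        {
            "cases_pop": cases_pop,
            # Trailing moving averages; the first window-1 rows are NaN.
            "moving7d_pop": cases_pop.rolling(7).mean(),
            "moving14d_pop": cases_pop.rolling(14).mean(),
            "moving28d_pop": cases_pop.rolling(28).mean(),
            # Running total of cases per million.
            "cumulated_pop": cases_pop.cumsum(),
        }
    )


dates = pd.date_range("2021-01-01", periods=28, freq="D")
cases = pd.Series(range(1, 29), index=dates, dtype=float)
metrics = daily_metrics(cases, population=2_000_000)
print(metrics.tail(3))
```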

Aggregated weekly data

cases_pop_weekly: Weekly detected cases per million by NUTS region
moving4w_pop: 4-week moving average of detected weekly cases per million by NUTS region
moving8w_pop: 8-week moving average of detected weekly cases per million by NUTS region
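The weekly metrics can be sketched in the same way. The function name weekly_metrics is hypothetical, and grouping into calendar weeks ending on Sunday (the pandas default for "W") is an assumption; the script may define weeks differently.

```python
import pandas as pd


def weekly_metrics(daily_cases_pop: pd.Series) -> pd.DataFrame:
    """Aggregate daily per-million cases to weeks (hypothetical helper)."""
    # Sum daily values into calendar weeks ending on Sunday.
    weekly = daily_cases_pop.resample("W").sum()
    return pd.DataFrame(
        {
            "cases_pop_weekly": weekly,
            # Trailing moving averages over 4 and 8 weeks.
            "moving4w_pop": weekly.rolling(4).mean(),
            "moving8w_pop": weekly.rolling(8).mean(),
        }
    )


# Eight full weeks, starting on a Monday: 10 cases per day, then 20 per day.
dates = pd.date_range("2021-01-04", periods=56, freq="D")
daily = pd.Series([10.0] * 28 + [20.0] * 28, index=dates)
weekly = weekly_metrics(daily)
print(weekly)
```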

Use of colors

Defining colors and break points for this dataset is rather challenging, because the magnitude of detected cases varies a lot both over time and geographically. For that reason, after analyzing the data, I chose to use red as the 'medium' color and dark purple to black as the maximum. The break points are quantiles at 20%, 40%, 60%, 80%, 90%, 95%, and 99%.

That way, during the first waves of the pandemic in 2020, maximum values reach the red range, while later on (especially with much more contagious variants like Omicron) the maxima are dark purple to black. This allows for differentiation both in the first and in later waves.
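The quantile break points described above can be computed with NumPy. This is only a sketch: the names QUANTILES, break_points, and color_bin are hypothetical, and how the script actually maps values to colors is not shown here.

```python
import numpy as np

# Quantile levels named in the text above.
QUANTILES = [0.20, 0.40, 0.60, 0.80, 0.90, 0.95, 0.99]


def break_points(values) -> np.ndarray:
    """Return the value at each quantile level, ignoring missing data."""
    arr = np.asarray(values, dtype=float)
    arr = arr[~np.isnan(arr)]
    return np.quantile(arr, QUANTILES)


def color_bin(value: float, breaks: np.ndarray) -> int:
    """Index of the color class for a value (0 = lowest, len(breaks) = highest)."""
    return int(np.searchsorted(breaks, value, side="right"))


values = np.arange(0, 101)  # toy data: 0, 1, ..., 100
breaks = break_points(values)
print(breaks)
print(color_bin(50, breaks), color_bin(100, breaks))
```

Seven break points yield eight color classes. Because the upper break points sit close together (90%, 95%, 99%), the few very high values from later waves get their own dark classes instead of flattening everything below them.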

I checked the color scheme with the Coblis Color blindness simulator to see if it works well for those with one or another type of color blindness. In all but one case, the results looked good to me. I still have to come up with a solution for monochromacy/achromatopsia, though. Hints are very welcome.

File formats

The script lets you choose between png and webp for the exported images and between gif and webp for the animation. The mp4 files are created with ffmpeg, separately from the script:

ffmpeg -framerate 28 -pattern_type glob -i "*.png" -c:v libx264 -crf 6 -pix_fmt yuv420p output.mp4

-framerate sets the number of frames per second, i.e. the speed
-pattern_type glob tells ffmpeg to interpret the input as a glob pattern when looking for the files to add
-i "*.png" means that all PNG files in the current folder will be included
-c:v sets the video codec to be used, in this case libx264
-crf sets the constant rate factor, i.e. the compression (0 = lossless/largest file, 51 = worst quality/smallest file)
-pix_fmt yuv420p sets the pixel format (for compatibility reasons)
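If you prefer to start the conversion from Python, the same command can be assembled as an argument list and passed to subprocess. This is a sketch, not part of the script; the function names build_ffmpeg_command and render_mp4 are hypothetical, and ffmpeg has to be installed and on the PATH for the rendering step.

```python
import subprocess


def build_ffmpeg_command(pattern="*.png", output="output.mp4", fps=28, crf=6):
    """Build the ffmpeg argument list shown above (hypothetical helper)."""
    return [
        "ffmpeg",
        "-framerate", str(fps),
        "-pattern_type", "glob",
        "-i", pattern,
        "-c:v", "libx264",
        "-crf", str(crf),
        "-pix_fmt", "yuv420p",
        output,
    ]


def render_mp4(frames_dir=".", **kwargs):
    """Run ffmpeg inside the folder that contains the exported frames."""
    # No shell is involved, so the glob pattern reaches ffmpeg unexpanded.
    subprocess.run(build_ffmpeg_command(**kwargs), cwd=frames_dir, check=True)


command = build_ffmpeg_command()
print(" ".join(command))
```

Calling render_mp4("path/to/frames") would then create output.mp4 in that folder; the path is a placeholder.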

Caveats

COVID-19 data has many caveats, and it is actually difficult to compare countries, especially regarding detected cases. Different testing policies, which lead to different numbers of tests and different criteria for whom to test, distort the numbers not only between countries but also over time, as policies change. Also, higher numbers of detected cases don't necessarily translate to the same increase in hospital admissions or deaths from COVID-19.

But even taking those caveats into account, the data shows some interesting trends. At some moments during the pandemic, COVID-19 waves seem to spread through the continent (almost) regardless of borders. This happened, for example, after June 2021, when detected cases started to rise steeply on the Iberian Peninsula and then "moved" to France in summer and to the rest of Europe in December/January, only to "send back" an even bigger wave in January/February 2022.

Missing data

Although the COVID19-European-Regional-Tracker is quite comprehensive, there is still missing data. For example, data for Portugal is only provided from November 2020 to March 2022 and on a weekly basis. The data for Latvia ends in July 2021, for Poland in January 2022, for Spain in April 2022, and for Hungary in May 2022. Other countries, like Switzerland, partly lack data.

Missing data points in between existing ones are interpolated so that there is a smooth transition in the animation. Missing data at the beginning and the end appear as gray areas on the map.

Cumulated data is less exact due to missing data. For regions where data only becomes available at a later date, cumulated counts are lower than they really are because the earlier dates are missing. The same is true for missing data at the end, except that, for aesthetic reasons, the missing values are filled with the last available value.
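The treatment of gaps in the cumulated metric can be illustrated with a short sketch. The function name cumulated is hypothetical and the script may implement this differently; the point is only that leading gaps stay empty while trailing gaps repeat the last known total.

```python
import numpy as np
import pandas as pd


def cumulated(cases_pop: pd.Series) -> pd.Series:
    """Running total that carries the last known value forward (hypothetical helper)."""
    # cumsum skips NaN when adding up, but leaves NaN at those positions.
    total = cases_pop.cumsum()
    # Forward-fill repeats the last total; leading NaN values stay NaN.
    return total.ffill()


dates = pd.date_range("2021-01-01", periods=5, freq="D")
cases_pop = pd.Series([np.nan, 1.0, 2.0, np.nan, np.nan], index=dates)
result = cumulated(cases_pop)
print(result)
```

The first day stays empty (gray on the map), while the last two days keep the total of 3.0 even though no new data arrived.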

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Thanks ...

... to Asjad Naqvi and the COVID19-European-Regional-Tracker for preparing the data.

... to Jan T. Sott (idleberg) for the markdown of the Creative Commons license text.
