The quarantine has failed. Give me a worst case scenario, and make it grim! - Richard Nixon's head, Futurama
Disclaimer: Unlike my previous post, there's no analysis at the end of this one. This is just documentation of my thought process.
I've been busy with other stuff, and coupled with a severe writer's block, I've been putting off this post for a long time. However, inspiration struck in the form of recent COVID-related lockdown extensions here in BC. I've decided to do an animated choropleth of the spread of COVID-19 in Canada.
I don't want to use R. In my previous post, a render involving just 4 unique frames caused R to seize up and refuse to work. I had to settle for a 20fps render, and even that took about 10 minutes despite having 16 gigs of RAM to work with. Python's better at this sort of stuff, so I'm going to stick to Python this time.
I don't have to go through the trouble of learning a new language because I'm already comfortable with Python, so I jump ahead to finding a data source. I stumble upon an ArcGIS database with COVID data from various Canadian sources that proves to be everything I'm looking for. They update the data every day, and they also supply shapefiles for the Canadian Health Regions. This is golden.
Next, I look up useful libraries I could use and come across geopandas (an extension of pandas) for geographical data, and plotly for interactive plotting. Installing them should be as easy as pip install geopandas plotly.
It seems that one of the dependencies for geopandas is fiona, which is an API for GDAL, which in turn is a library for translating raster and vector geospatial data formats. This requires that I have GDAL installed, so I try doing that. OSGeo4W is a project that brings various FOSS geospatial software to Windows (which is sadly my current development environment). The installer includes a collection of all the software under the OSGeo4W umbrella. However, since the entire software suite is several GBs in size, I take the option to install only GDAL and its Python bindings. With the installation successfully done, I can safely proceed.
Got the same error again. I take a deeper look into the error code, and a quick Google search gives me new information. gdal-config, a script that fiona requires, is not available on the OSGeo4W build of GDAL and is only present in the Linux builds. Why do I punish myself by continuing to stick to Windows?
Apparently, Anaconda has a painless geopandas installation. I don't like Anaconda because it adds another Python environment for me to manage, but it looks like I don't have a choice. conda install geopandas does the job. Installation takes a few minutes, but was painless as promised.
Let's get right to the code.
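For reference, here's a minimal sketch of what a single static frame of the choropleth could look like with geopandas. It isn't the exact code from this project: the helper names, the (region_id, daily_cases) row layout, and the HR_UID identifier column are all assumptions for illustration, so swap in whatever the actual shapefile and case data use.

```python
from collections import defaultdict


def total_cases_by_region(rows):
    """Sum daily case counts per health region.

    `rows` is an iterable of (region_id, daily_cases) pairs. This layout
    is a hypothetical stand-in for whatever the real dataset provides.
    """
    totals = defaultdict(int)
    for region_id, daily_cases in rows:
        totals[region_id] += daily_cases
    return dict(totals)


def plot_choropleth(shapefile_path, totals, id_column="HR_UID"):
    """Draw one choropleth frame from a health-region shapefile.

    `id_column` is an assumed name for the region identifier column.
    """
    # Imported here so the helper above still works without geopandas.
    import geopandas as gpd

    regions = gpd.read_file(shapefile_path)
    # Attach each region's case total; regions with no data get 0.
    regions["cases"] = regions[id_column].map(totals).fillna(0)
    # Returns a matplotlib Axes shaded by the "cases" column.
    return regions.plot(column="cases", cmap="OrRd", legend=True)
```

Calling plot_choropleth with the path to the health-region shapefile and the output of total_cases_by_region should shade each region by its case count, provided the identifiers in both sources have the same type.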
It is at this point I realize why there aren't a lot of Canadian choropleths showing data that has anything to do with the populace. More than 50% of the population live in less than 0.2% of the total land area. We can see the Health Regions that encompass Metro Vancouver, Calgary, Edmonton, GTA, Montréal, Ottawa, and Québec City highlighted on the map, and that is where everyone lives. This map is going to look exactly like this at any point in time. Definitely not worth the trouble of animating it. After all the trouble I went through to install geopandas too.
Now, I'm not going to let this goldmine of a data source go to waste. I'm going to at the very least make a nice graph showing the rise in COVID cases in BC through the year.
That's curious. COVID numbers shouldn't have a regular pattern. While looking at the data, I noticed that there were a few days where the daily reported cases were all zero.
Turns out, BCCDC doesn't report numbers on weekends and gives out accumulated numbers on Mondays or, sometimes, Tuesdays. This leads to a misleading graph, as BC has had nowhere near 2000 cases a day yet. We can correct for this pretty easily, and our healthcare workers need a break too, so it's all good.
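One way to do that correction, sketched here in plain Python, is to spread each accumulated report evenly over the run of zero days before it. This is an assumption about how to smooth the series, not necessarily the correction used for the graph, and the function name spread_backlog is made up for the example. Note that it treats every zero as a non-reporting day, which would be wrong if a day genuinely had zero cases.

```python
def spread_backlog(daily):
    """Spread each post-gap report evenly over the zero days before it."""
    smoothed = [float(count) for count in daily]
    gap_start = None  # index of the first day in the current run of zeros
    for i, count in enumerate(daily):
        if count == 0:
            if gap_start is None:
                gap_start = i
        elif gap_start is not None:
            # The report covers the zero days plus the reporting day itself.
            span = i - gap_start + 1
            share = count / span
            for j in range(gap_start, i + 1):
                smoothed[j] = share
            gap_start = None
    return smoothed


# Invented numbers: a weekend of zeros followed by a catch-up Monday.
reported = [100, 120, 0, 0, 360, 90]
print(spread_backlog(reported))
```

The total number of cases stays the same; only the spike on the catch-up day gets flattened across the days it actually covers.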
There looks to be some discrepancy in the totals, e.g. Total Recovered went from 17206 to 16834. The code to find the cumulative sum can't have a bug since it is literally a one-liner. So either my computer is bad at math or the BCCDC's numbers don't add up. Now, there might very well be a perfectly sensible reason for the numbers being this way, but right now I have to choose between data that's faithful to the source, and data that doesn't have squiggles in it. I make the sensible decision and go with the one that looks better (Pro tip: this is generally considered a bad move).
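To illustrate the trade-off, here's a small sketch using itertools.accumulate from the standard library. The first helper is the kind of one-liner cumulative sum described above; the second forces the totals to never decrease by taking a running maximum, which is just one possible way to get rid of the squiggles and an assumption on my part. Apart from the 17206 to 16834 dip, the sample numbers are invented.

```python
from itertools import accumulate


def cumulative(daily):
    """Running total of daily counts."""
    return list(accumulate(daily))


def monotonic(totals):
    """Replace every dip with the highest total seen so far."""
    return list(accumulate(totals, max))


# Invented daily values, chosen so the totals include the 17206 -> 16834 dip.
daily_recovered = [17000, 206, -372, 466]
totals = cumulative(daily_recovered)
print(totals)             # [17000, 17206, 16834, 17300]
print(monotonic(totals))  # [17000, 17206, 17206, 17300]
```

The running maximum makes the line look cleaner, but it hides whatever caused the drop in the source data, which is exactly why this is considered a bad move.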
The anomalous dip in active cases in mid-September is due to a delay in reporting data, as is explained in this news article. Other than that, the graph follows a pretty standard exponential growth. I really really hope that BC's healthcare system holds out through this pandemic and we continue to see the majority of affected people making a full recovery.