Now I'm sick! I shudder to think what this cold will do to me. Yesterday, I was nearly killed by a tight hat.
- Prof. Farnsworth, Futurama

While putting the finishing touches on my previous article, an idea struck me that would make better use of the ArcGIS data goldmine I had come across. I could use it to see which age groups were primarily being affected at various stages of the pandemic, and how they were exposed.

The Idea

I theorize that youngsters who stayed at home during the early phase would be less affected initially, but would be the most affected in the second wave because, by that time, they got tired of staying home and not being able to socialize. The inverse is likely to be true for the older population who initially didn't take the virus too seriously, but soon learnt how dangerous it could be.

Similarly, exposure would primarily be due to travel in the initial phase, but would shift to close contact during the second wave. Outbreaks are likely to be a more serious issue for senior citizens.

The Test

A quick glance tells me that the data isn't as detailed as I'd like it to be. Most cases have unreported age and exposure type. I look to see if some provinces have more details, but it turns out BC and Alberta don't report exposure type, and Québec doesn't report either age or exposure type. Ontario is my best bet, but even their reporting doesn't have as much detail as I would like. The other provinces, thankfully, don't have enough cases to be making graphs about.

Looks like the theory regarding type of exposure can't be validated with this data. Luckily, BC seems to be pretty comprehensive with the data that they do report. Let's see if something can be made of the age and gender data from BC.

     date_reported age_group  gender  n  
0       2020-01-26     40-49    Male  1
1       2020-02-02     50-59  Female  1
2       2020-02-05     20-29  Female  1
3       2020-02-05     30-39    Male  1
4       2020-02-11     30-39  Female  1
...            ...       ...     ... ..
3356    2020-12-04     70-79  Female  1
3357    2020-12-04     70-79    Male  1
3358    2020-12-04       80+  Female  2
3359    2020-12-04       80+    Male  1
3360    2020-12-04       <20    Male  2

[3361 rows x 4 columns]
A look at the data
COVID doesn't discriminate based on gender

Doesn't look like gender is a factor in how the disease spreads. What we can see, however, is a clear pattern in the age of affected people through time. I used the size of the markers to represent age, but that doesn't seem to be the best way to represent it. I'm regrouping the data into just 4 age groups, because the current split of 8 is too much precision and leads to clutter. For simplicity's sake, let's call these groups children (<20), millennials (20-39), Gen X (40-59), and boomers (60+).
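The regrouping can be sketched roughly as below. This is a minimal illustration and not necessarily the code behind the charts: the column names follow the table shown earlier, the `regroup_ages` helper and `AGE_TO_GROUP` mapping are names made up for this example, and the "60-69" label is assumed to exist since it isn't visible in the excerpt.

```python
import pandas as pd

# Assumed mapping from the eight original bins to four broader groups.
# "60-69" does not appear in the excerpt above; it is assumed to be present.
AGE_TO_GROUP = {
    "<20": "Children",
    "20-29": "Millennials",
    "30-39": "Millennials",
    "40-49": "Gen X",
    "50-59": "Gen X",
    "60-69": "Boomers",
    "70-79": "Boomers",
    "80+": "Boomers",
}


def regroup_ages(cases: pd.DataFrame) -> pd.DataFrame:
    """Collapse the age bins into four groups and sum case counts per day."""
    out = cases.copy()
    out["group"] = out["age_group"].map(AGE_TO_GROUP)
    # Labels missing from the mapping become NaN and are dropped by groupby.
    return out.groupby(["date_reported", "group"], as_index=False)["n"].sum()


# A few rows copied from the table shown earlier.
sample = pd.DataFrame(
    {
        "date_reported": ["2020-02-05", "2020-02-05", "2020-12-04",
                          "2020-12-04", "2020-12-04", "2020-12-04",
                          "2020-12-04"],
        "age_group": ["20-29", "30-39", "70-79", "70-79", "80+", "80+", "<20"],
        "gender": ["Female", "Male", "Female", "Male", "Female", "Male", "Male"],
        "n": [1, 1, 1, 1, 2, 1, 2],
    }
)

regrouped = regroup_ages(sample)
print(regrouped)
```

Summing over gender at the same time is a deliberate choice here, since gender didn't appear to matter in the earlier plot.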

Age is not just a number

The dip in late September is misleading. It is due to a reporting delay caused by the transition to a new data collection system (source).

There appears to be a stark difference in the number of cases in different age groups. This is not fully conclusive, however. We don't know how this correlates with the actual age distribution in the population. It could just be that the majority of the population are millennials, and that's why there are more cases in that age group. So I got that data from Statistics Canada. It's outdated (from the 2016 census), but that shouldn't be a problem, as the age distribution can't have changed much in 4 years.

Group       | Age   | Distribution
Children    | <20   | 20.45%
Millennials | 20-39 | 25.77%
Gen X       | 40-59 | 28.54%
Boomers     | 60+   | 25.24%
Roughly equal in size

So, if age weren't a factor in the spread of COVID, each group's share of cases should be pretty close to its share of the population, given that these age groups are similar in size.
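One way to make that comparison concrete is to divide each group's share of cases by its share of the population, so that a value near 1.0 means "in line with population size". The sketch below is only an illustration of that idea: the population shares come from the table above, while `relative_incidence` is a hypothetical helper and the case counts are placeholders, not real BC figures.

```python
import pandas as pd

# Population shares taken from the table above (2016 census).
POPULATION_SHARE = pd.Series(
    {"Children": 0.2045, "Millennials": 0.2577, "Gen X": 0.2854, "Boomers": 0.2524}
)


def relative_incidence(case_counts: pd.Series) -> pd.Series:
    """Ratio of each group's share of cases to its share of the population.

    A value of 1.0 means the group has exactly as many cases as its size
    would predict; above 1.0 means over-represented, below 1.0 under-represented.
    """
    case_share = case_counts / case_counts.sum()
    return case_share / POPULATION_SHARE.reindex(case_counts.index)


# Placeholder counts for illustration only (NOT actual case numbers).
example_counts = pd.Series(
    {"Children": 100, "Millennials": 400, "Gen X": 300, "Boomers": 200}
)

ratios = relative_incidence(example_counts)
print(ratios.round(2))
```

Running the same function on the case counts for each week or each wave would give one ratio per group per period, which is the kind of quantity the next chart is getting at.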

This makes it obvious that children and the elderly are less affected than expected in the second wave, while millennials are falling sick in far greater numbers.

The Result

I had to make something animated!

So, my theory was at least semi-accurate. The first wave, it seems, affected everyone but children without discrimination. But as predicted, the elderly were relatively less affected in the second wave. The numbers had proved that they were at the highest risk of mortality when affected by the disease, and I think that this sobered them to the reality of the pandemic.

On the other hand, millennials had become complacent by the end of the first wave. After all, we'd come out safe. BC did great. The pandemic was over. Isn't that the way things have always been? The anti-vax movement came about because people forgot the horrors of smallpox; climate change deniers exist because the effects of melting glaciers aren't immediately noticeable; and Warner Bros. made Batman v Superman because of the success of Nolan's The Dark Knight trilogy. History repeats itself, and sadly, complacency is the natural order of things. So once the first wave died down, we went back to life as it was before. Holiday season exacerbated the issue when everyone went out and partied like there was no tomorrow. But then, tomorrow came, and with it arrived a new wave of the virus.

No. of people who just can't take it anymore

Gen X, in true Gen X fashion, are neither here nor there. The rate of infection in this age group is in line with what is expected. I wonder if the graph would have looked similar if we'd split the age groups into '<20', '20-49', and '50+' instead. Unfortunately, I can't test that because of how the original data is structured.

School-going children were unaffected during the first wave, but started getting infected when schools reopened in September. That said, I'd assume community spread from outside schools played a bigger role in the spread of the disease in their age group. Thankfully, children have so far been the least affected. Let's hope that continues to be the case.

All in all, speaking nothing of how age biologically affects the transmission of this strain of coronavirus, it is pretty clear that different social tendencies in different age groups played a significant role in the comeback of COVID-19 in British Columbia.

The data and code used in this post can be found on my GitHub.