Rooted firmly in the world of data and analytics, my mind often wanders to the bits and bytes of things. One such thought caught hold of my mind recently.
How much disk space is being used up by mentions of COVID-19 and its variant references just in social media?
On March 12, 2020, Vox.com ran an article titled “How coronavirus took over social media,” which you can read here. Some very smart folks at Awario.com then created a real-time mention tracker for coronavirus, which you can ogle here. (No affiliate links; I don’t get paid if you click those links.)
Turns out, between February 11 and August 12, coronavirus and its variant phrases were mentioned about 213.9 million times on social media. What’s even crazier is that Awario.com estimates the total reach of those mentions at 9.547 trillion.
If you’re still reading, here’s something pretty nerdy to consider. On disk, at one byte per character, the characters C O V I D 1 9 without any spaces between them occupy just 7 bytes. Coronavirus occupies 11 bytes. If you put “novel” and a space before it, that’s 17 bytes. Additionally, SARS-CoV-2 is 10 bytes.
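For anyone who wants to check the counting, here is a minimal Python sketch. It assumes the text is stored as UTF-8, where each of these plain ASCII characters takes exactly one byte; the variable names are just for illustration.

```python
# Byte size of each phrase, assuming UTF-8 storage (one byte per ASCII character).
phrases = ["COVID19", "Coronavirus", "novel Coronavirus", "SARS-CoV-2"]

byte_counts = {phrase: len(phrase.encode("utf-8")) for phrase in phrases}

for phrase, size in byte_counts.items():
    print(f"{phrase!r}: {size} bytes")
```

A different encoding would change the numbers. UTF-16, for example, would roughly double them.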
For fun, let’s assume an average of 10 bytes per mention. At that rate, the 213.9 million mentions alone occupy around 2.14 gigabytes (GB) on disk. Now, let’s multiply those 10 bytes by the total reach of 9.547 trillion. That pushes us into the interesting territory of about 95.5 terabytes (TB) on disk.
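The back-of-the-envelope math can be sketched in a few lines of Python. The 10-byte average is the assumption from above, and the sketch uses decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes); binary units would give somewhat smaller numbers.

```python
# Figures from Awario's tracker as quoted in this post.
MENTIONS = 213_900_000        # mentions between February 11 and August 12
REACH = 9_547_000_000_000     # estimated total reach of those mentions
AVG_BYTES_PER_MENTION = 10    # assumption: average size of one mention

mention_bytes = MENTIONS * AVG_BYTES_PER_MENTION
reach_bytes = REACH * AVG_BYTES_PER_MENTION

# Convert to decimal gigabytes and terabytes.
mention_gb = mention_bytes / 1e9
reach_tb = reach_bytes / 1e12

print(f"Mentions: {mention_gb:.2f} GB")
print(f"Reach:    {reach_tb:.2f} TB")
```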
For reference, the Library of Congress estimates that its printed collection would amount to about 15 TB of data in digital form.
So, since February 11, just 183 days ago, the social media reach of COVID-19 has generated just over 6 times the amount of data in the Library of Congress!
One additional reference: approximately 2.5 million terabytes of data are generated daily (IBM’s estimate). So, while COVID-19 has generated about 6 Libraries of Congress in the last 183 days, that pales compared with the roughly 166,667 Libraries of Congress produced every day by global data creation. That’s a lot of cat videos!
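Here is a small sketch of the comparison, using the Library of Congress as the unit of measure. The reach figure is the 9.547 trillion reach multiplied by the assumed 10 bytes per mention; the other two constants are the estimates quoted in this post.

```python
# All sizes in decimal terabytes (TB).
LIBRARY_OF_CONGRESS_TB = 15       # printed collection, per the estimate above
COVID_REACH_TB = 95.47            # 9.547 trillion reach x 10 bytes (assumed)
DAILY_GLOBAL_TB = 2_500_000       # IBM's estimate of data generated per day

covid_libraries = COVID_REACH_TB / LIBRARY_OF_CONGRESS_TB
daily_libraries = DAILY_GLOBAL_TB / LIBRARY_OF_CONGRESS_TB

print(f"COVID-19 reach over 183 days: {covid_libraries:.1f} Libraries of Congress")
print(f"Global data per day: {daily_libraries:,.0f} Libraries of Congress")
```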
I think I’ll start abbreviating it C19.
Congrats if you made it this far. Thanks for reading!