Using Data Science in a Pandemic

Examples from COVID-19

As the coronavirus pandemic unfolds and most of us are confined to our homes, it is almost impossible to avoid the latest news and daily briefings. In this endless stream of updates, there seems to be one constant: the use of data. Nearly every article or news report shares or references data visualizations, data-driven conclusions, modeled predictions, or simply the raw numbers themselves.

Datasets are being compiled, updated daily, and pushed to the masses. The use of data throughout this pandemic has undeniably given a face, something tangible, to an otherwise invisible enemy. As a new student of data science, I wanted to dig into some examples and use cases that show how powerful data can be in real-life settings.

Predicting the course of COVID-19 with complex models

While I don’t have the exact statistic, I would guess that at least half of the data visualizations in circulation illustrate model-driven predictions. Models guided by data are attempting to predict the course of the virus (and, eventually, when we can go back to ‘normal’).

These models take into account whatever data is available, along with a fair number of assumptions inferred from past data and, of course, some uncertainty. While there are no perfect models, federal agencies like the CDC and the NIH, as well as researchers worldwide, have increasingly relied on prediction models to forecast future scenarios and help guide policymakers.

“Models are imperfect, but they’re better than flying blind — if you use them right.”

Modeling is an incredibly powerful tool and, in the case of COVID-19, can help make the difference between worldwide death tolls in the thousands and in the millions. These models are helping us predict infection rates, helping hospitals plan for surge capacity, and showing how effective social distancing measures can be.

In one example, the graph below (which I’m sure most people have seen more times than they wish) illustrates how infection and hospitalization rates can be influenced by social distancing measures.

This ‘flatten the curve’ graph is made possible by predictive modeling. The red curve represents the predicted number of infections over time without protective measures, and the blue curve represents the prediction with social distancing measures in place.
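To make the idea concrete, here is a minimal sketch of how such a pair of curves could be produced. It uses a basic SIR (susceptible, infected, recovered) compartmental model, which is my own choice for illustration and not necessarily the model behind the graph above. The function names and every parameter value (population size, transmission rate `beta`, recovery rate `gamma`) are assumptions picked only to show the shape of the effect, not estimates for COVID-19.

```python
def simulate_sir(beta, gamma=0.1, population=1_000_000,
                 initial_infected=10, days=600, steps_per_day=10):
    """Integrate a basic SIR model with small Euler steps.

    Returns a list with the number of currently infected people per day.
    All defaults are illustrative assumptions, not fitted values.
    """
    s = float(population - initial_infected)  # susceptible
    i = float(initial_infected)               # currently infected
    dt = 1.0 / steps_per_day
    infected_by_day = [i]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
        infected_by_day.append(i)
    return infected_by_day


def find_peak(series):
    """Return (day, value) of the highest point in a daily series."""
    day = max(range(len(series)), key=series.__getitem__)
    return day, series[day]


# Hypothetical scenarios: distancing is modeled as halving the contact rate.
no_measures = simulate_sir(beta=0.30)
with_distancing = simulate_sir(beta=0.15)

peak_day_no, peak_no = find_peak(no_measures)
peak_day_dist, peak_dist = find_peak(with_distancing)

print(f"Without measures: peak of {peak_no:,.0f} infected on day {peak_day_no}")
print(f"With distancing:  peak of {peak_dist:,.0f} infected on day {peak_day_dist}")
```

Lowering the transmission rate produces a peak that is both lower and later, which is exactly the flattening the graph describes. Real forecasting models are far more detailed, but the mechanism is the same.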

Searching for a treatment with Artificial Intelligence

Along with modeling, Artificial Intelligence (AI) has been deployed in numerous ways throughout the COVID-19 outbreak. One such use is identifying potential treatments. More specifically, AI is being used to assess whether drugs already in circulation could be repurposed to treat the coronavirus.

AI can digest large volumes of scientific literature and medical research to find connections between the genetic and biological properties of diseases and the composition and action of drugs. It can also screen candidates through simulated tests far faster than any human could.

For example, DeepMind, the AI company owned by Google’s parent Alphabet, has published predicted structures for several proteins associated with the coronavirus, which could prove useful in developing new drugs.

Data for data’s sake — providing insight into the pandemic

Finally, while data sources make the prediction models and machine learning capabilities discussed above possible, it is equally important to acknowledge the value of simply having data during this time. In a pandemic especially, it’s important to have the facts and to stay informed.

As the saying typically attributed to Sir Francis Bacon goes, “Knowledge is Power”. One undeniable truth in today’s world is how important the availability and accessibility of data is. Many datasets are being compiled and shared publicly for research and analysis, and for anyone who wants to get involved or simply stay informed.

For example, Johns Hopkins CSSE has assembled the following dashboard, Coronavirus COVID-19 Global Cases, to track the virus worldwide. This dashboard aggregates key metrics from a confirmed cases dataset.

This dashboard is built upon a dataset that is publicly available on GitHub and linked from the dashboard. While data is being used in the COVID-19 pandemic to do groundbreaking work, one impact that should not be glossed over is simply that data is available as a source of information and insight into the pandemic. The Johns Hopkins dashboard has been receiving over one billion interactions a day.
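As a rough idea of what "aggregating key metrics" from a confirmed cases dataset can look like, here is a small sketch. It assumes a wide time-series layout (one row per region, one column per date), which is how I understand the public dataset to be organized. The rows, region names, and counts below are made-up placeholders rather than real figures, and `summarize` is a hypothetical helper of my own.

```python
import csv
import io
from collections import defaultdict

# Placeholder data in an assumed wide layout: four descriptive columns
# followed by one column of cumulative confirmed cases per date.
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,3/1/20,3/2/20,3/3/20
North,Country A,0.0,0.0,10,15,25
South,Country A,0.0,0.0,5,8,12
,Country B,0.0,0.0,100,140,190
"""


def summarize(csv_text):
    """Return (total cases, new cases) per country for the latest date."""
    reader = csv.DictReader(io.StringIO(csv_text))
    date_columns = reader.fieldnames[4:]  # everything after Lat/Long
    latest, previous = date_columns[-1], date_columns[-2]
    totals = defaultdict(int)
    new_cases = defaultdict(int)
    for row in reader:
        country = row["Country/Region"]
        totals[country] += int(row[latest])
        new_cases[country] += int(row[latest]) - int(row[previous])
    return dict(totals), dict(new_cases)


totals, new_cases = summarize(SAMPLE_CSV)
global_total = sum(totals.values())

for country in sorted(totals):
    print(f"{country}: {totals[country]} confirmed, {new_cases[country]} new")
print(f"Global: {global_total} confirmed")
```

Provinces are rolled up into their country, and the difference between the last two date columns gives the newly reported cases, the kind of headline numbers a dashboard would show.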

Data can be used to present the facts and the current state of affairs as is. For example, graphs such as the following offer a view of the actual numbers as they unfold.

While there are many more examples beyond those above, the use of data science in a global pandemic such as COVID-19 is truly critical to helping societies deal with the outbreak effectively and efficiently.

