Executive Summary

COVID-19 has affected the daily lives of citizens as well as the operations of companies, investors, and governments all over the world. For these institutions, it is therefore important to understand the topics related to COVID-19 and how these topics evolve over time: to gauge how public trust is changing, to identify the interventions a government needs to undertake, to spot new opportunities, or to decide when to restart business operations.

The impact of the pandemic is recorded in many countries by local news outlets and articles, which are a common source of information for citizens. With this in mind, we developed a methodology to help institutions understand the topics covered in global news and how they evolved during 2020.

For our project, we used the Global Knowledge Dataset from GDELT, which contains 182 GB of data on news and articles, and extracted from it the news related to COVID-19. We then applied a topic modeling algorithm, Latent Dirichlet Allocation (LDA), to extract the most relevant topics, and traced the evolution of those topics over time using Jaccard similarity.
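
As a rough illustration of the topic-tracing step, the sketch below links topics from two consecutive time windows by the Jaccard similarity of their top words. It is a minimal sketch under stated assumptions, not the project's actual code: the function names, the 0.3 threshold, and the sample word lists are all hypothetical, and in practice the word lists would come from the LDA model fitted on each window.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections of words."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_topics(prev_topics, next_topics, threshold=0.3):
    """Link each earlier topic to its most similar later topic.

    Topics are given as lists of top words. A link (i, j, score) is kept
    only when the best score reaches the threshold (an assumed value);
    an earlier topic with no link can be read as a momentary topic.
    """
    links = []
    for i, prev_words in enumerate(prev_topics):
        scores = [jaccard(prev_words, words) for words in next_topics]
        if not scores:
            continue
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            links.append((i, j, scores[j]))
    return links


# Hypothetical top words for two consecutive windows (illustrative only).
march = [
    ["cases", "deaths", "hospital", "virus"],
    ["lockdown", "school", "closure", "travel"],
]
april = [
    ["unemployment", "jobs", "claims", "economy"],
    ["deaths", "cases", "hospital", "icu"],
]

links = link_topics(march, april)
print(links)
```

With these sample lists, the first March topic shares three of five distinct words with the second April topic and is treated as persisting, while the lockdown topic finds no match above the threshold and would be classed as momentary.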

Our project identified different categories of topics, ranging from those that persisted over time to those that were only momentary. We analyzed each topic to identify its trend and the most common tags and words used in the news. The topics found were related to fatalities, healthcare, readiness, and unemployment, among others. We also mapped the sentiment, or tone, from GDELT to each topic to better understand how each topic is perceived by the public.