The Global Database of Events, Language, and Tone (GDELT) scours international news sources to catalogue georeferenced events occurring around the world. GDELT has been collecting this information since 1979 and makes it freely available for open research.
To date, GDELT offers a free database containing over a quarter-billion georeferenced events at the city level. Information about global events is updated daily from news sources such as AfricaNews, Agence France Presse, Associated Press Online, Associated Press Worldstream, BBC Monitoring, Christian Science Monitor, Facts on File, Foreign Broadcast Information Service, United Press International, and the Washington Post.
Additionally, GDELT pulls events from the NY Times, the Associated Press, and Google News. GDELT uses TABARI (Textual Analysis By Augmented Replacement Instructions) to extract location and event information from these articles, which populates the 58 data fields used to compile the GDELT database.
What is GDELT?
As defined on the GDELT site:
The Global Database of Events, Language, and Tone (GDELT) is an initiative to construct a catalog of human societal-scale behavior and beliefs across all countries of the world over the last two centuries down to the city level globally, to make all of this data freely available for open research, and to provide daily updates to create the first “realtime social sciences earth observatory.”
In a nutshell, the database collects information about geopolitical events such as riots, protests, and police actions, along with political, religious, and geographic details, stored in 58 fields.
Access GDELT Data
GDELT data is made freely available and can be accessed and used without restrictions. There are several download options, including a smaller file containing events from 1979 to 2013 and the full database, which contains over 200 million records for events from January 1, 1979 to March 31, 2013.
The final download option contains the daily updates from April 1, 2013 to the present.
Visualizing GDELT Data
One researcher's visualization of events from the database is currently making the rounds on the Internet. While the visualization demonstrates the depth of the data mined by GDELT, it also highlights the challenge of comparing events over a time period when other factors are influencing trends.
John Beieler, a PhD student in Political Science at Penn State, has mapped all protest data from the full GDELT record set (1979 – 2013). His time-lapse map shows protests around the world becoming more frequent, with the increase intensifying in the 21st century. The map was created using CartoDB.
While the visualization may give the impression that protests worldwide have increased dramatically in the last few years, that frequency also correlates with a dramatic increase in the number of news articles. In fact, the paper “GDELT: Global Data on Events, Location and Tone, 1979-2012,” presented at the International Studies Association Annual Conference in April 2013, states:
Unsurprisingly, given the very substantial changes over the past two decades in both the international news environment and the availability of news on the web, this is anything but constant, and shows a dramatic increase since the beginning of the twenty-first century.
As Foreign Policy notes, “While the scale of GDELT’s database is impressive, it’s influenced by its source: international news reporting. Kalev Leetaru, the Yahoo! fellow at Georgetown University working on the GDELT project, told FP by email that the apparent uptick in protests around the world starting in the mid-1990s may be misleading. ‘In some other work we are doing right now, preliminary results suggest that as a percentage of all events captured in GDELT, protests have not become more common overall,’ he explained. ‘So, the majority of that increase in protest events over time stems from the increase in available digital media,’ especially news.”
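To make Leetaru's point concrete, here is a minimal Python sketch of the comparison he describes: measuring protests as a share of all captured events rather than as a raw count. The (year, is_protest) record format and the toy numbers are hypothetical stand-ins for illustration only, not GDELT's actual 58-field layout or real GDELT figures.

```python
from collections import Counter


def protest_share_by_year(events):
    """Return {year: (protest_count, share_of_all_events)}.

    `events` is an iterable of (year, is_protest) pairs. This simplified
    record format is a hypothetical stand-in for GDELT's real event rows.
    """
    totals = Counter()    # all captured events per year
    protests = Counter()  # protest events per year
    for year, is_protest in events:
        totals[year] += 1
        if is_protest:
            protests[year] += 1
    return {
        year: (protests[year], protests[year] / totals[year])
        for year in sorted(totals)
    }


# Hypothetical toy data: ten times as many events are captured in 2010,
# but protests make up the same fraction of events in both years.
toy_events = (
    [(1990, i < 1) for i in range(10)]      # 10 events, 1 protest
    + [(2010, i < 10) for i in range(100)]  # 100 events, 10 protests
)

for year, (count, share) in protest_share_by_year(toy_events).items():
    print(year, count, f"{share:.0%}")
# prints:
# 1990 1 10%
# 2010 10 10%
```

In this toy example the raw protest count rises tenfold while the protest share stays flat at 10%. A map of raw counts would show a surge, even though the only thing that changed is how many events were captured overall.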
Difficulties of Mapping the News
In “Why It’s Hard to Map Media Attention,” The Wall Street Journal’s Carl Bialik (aka “The Numbers Guy”) discusses the problem. The subject of his piece is a recent post on the French blog L’Observatoire des Médias that mapped the intensity of news coverage across the world.
Clicking one of the listed news outlets changes the map to a cartogram in which each country is sized in proportion to the number of news stories that outlet published about it.
The Wall Street Journal article delves into the difficulty of mapping news locations, stating, “These maps make use of developments in mapping that make it easier to turn numbers into attractive, informative graphics. But in this case, the underlying numbers are problematic. For one thing, it turns out to be difficult to distinguish news stories about the African nation of Chad from articles about Chad Kroeger or Chad Johnson.”
Leetaru, Kalev and Schrodt, Philip. (2013). GDELT: Global Data on Events, Language, and Tone, 1979-2012. International Studies Association Annual Conference, April 2013. San Diego, CA.