Global Terrorism: A Machine Learning Approach to Predict the Perpetrator Group Using Incident Information – A Capstone Project

This project was presented by Farzan Sajahan, Sendil A B and Akshaya Srivatsan, PGP DSBA students, at the AICTE-sponsored Online International Conference on Data Science, Machine Learning and its Applications (ICDML-2020). A follow-up paper was published in the conference journal.

Global terrorism has been on the rise, both in the number of reported incidents and in total damage, resulting in social harm and political instability. Between 1970 and 2018, half of the world’s terror incidents were unattributed, i.e., they could not be attributed to a single terrorist group with reasonable certainty. Beyond this direct attribution problem, there is also an indirect one: ‘opportunistic’ groups claim attacks they did not execute, falsely exaggerating their destructive power as a terrorist outfit. In this study, our learners aimed to build a machine learning model that predicts the terrorist group responsible for a given attack by analysing information about the attack, including its location and mode, as well as the socioeconomic conditions of the location where it occurred.

The study used terror incident data from 1990 to 2018 across 180 countries. Incident-level information (such as date, location, weapons used and number of people killed) was sourced from the Global Terrorism Database (GTD) maintained by the National Consortium for the Study of Terrorism and Responses to Terrorism at the University of Maryland, USA. Socioeconomic indicators (such as GDP, inflation, unemployment, Gini index, literacy rate, FDI, refugees, human rights score and freedom index) were sourced from the World Bank and the United Nations Development Programme (UNDP). The study sample consisted of 150,000 terror incidents.
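The article does not describe how the incident records and the country-level indicators were combined. One plausible approach, assumed here rather than taken from the project, is to join each incident to the indicators for its country and year. The minimal sketch below does this with the Python standard library; the `Incident` fields, the `join_indicators` helper and the choice to drop incidents without a matching country-year entry are all hypothetical, and the field names are not the actual GTD column names.

```python
from dataclasses import asdict, dataclass


# Hypothetical, simplified incident record; these are NOT the actual GTD column names.
@dataclass
class Incident:
    country: str
    year: int
    attack_type: str
    weapon_type: str
    group: str  # perpetrator group (the prediction target)


def join_indicators(incidents, indicators):
    """Attach country-year socioeconomic indicators to each incident.

    `indicators` maps (country, year) -> {indicator_name: value}.
    Incidents with no matching country-year entry are skipped (an assumed policy).
    """
    joined = []
    for inc in incidents:
        row = indicators.get((inc.country, inc.year))
        if row is None:
            continue  # no indicator data for this country-year
        joined.append({**asdict(inc), **row})
    return joined


if __name__ == "__main__":
    incidents = [
        Incident("Country A", 2005, "Bombing/Explosion", "Explosives", "Group X"),
        Incident("Country B", 2006, "Armed Assault", "Firearms", "Group Y"),
    ]
    # Toy indicator values, invented for illustration only.
    indicators = {("Country A", 2005): {"gdp_growth": 3.1, "literacy_rate": 0.72}}
    for record in join_indicators(incidents, indicators):
        print(record)
```

Dropping unmatched incidents keeps the sketch simple; imputing missing indicators (for example, carrying forward the last available year) would retain more incidents at the cost of some noise.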

The study suggested a strong relationship between the perpetrator group and the mode and type of attack, the location, and several socioeconomic indicators. The distribution of incidents across groups was highly skewed: 95% of terrorist incidents were carried out by 450 groups, while the remaining incidents were spread across roughly 1,750 groups. At a high level, the analysis suggested that a terror group’s “fingerprint” can be identified from the geographical location, choice of targets, attack strategy, extent of damage, number of terrorists deployed, and whether or not the attack was claimed by a group. Several ML models, including Naïve Bayes, Support Vector Machines and Random Forest, were built; the best model predicted the responsible terror group with an accuracy of 85%. The analysis also identified the features responsible for misclassification.
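The article names Naïve Bayes, Support Vector Machines and Random Forest but does not say how they were implemented or exactly which features they used. To illustrate only the first of these, here is a minimal from-scratch categorical Naïve Bayes classifier in plain Python with add-one (Laplace) smoothing, trained on categorical incident attributes such as country and attack type. It is not scikit-learn's class and not the team's code. The `lump_rare_groups` helper, which folds groups with very few incidents into a single "Other" class, is a hypothetical way to handle the long tail of small groups described above; the article does not say whether the team did this. All data in the example is invented.

```python
import math
from collections import Counter, defaultdict


def lump_rare_groups(labels, min_count, other="Other"):
    """Replace labels that occur fewer than `min_count` times with one catch-all class."""
    counts = Counter(labels)
    return [y if counts[y] >= min_count else other for y in labels]


class SimpleCategoricalNB:
    """Naive Bayes for categorical features with add-one (Laplace) smoothing."""

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.n_samples = len(y)
        n_features = len(X[0])
        # feature_counts[c][j][v]: how often feature j takes value v within class c
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        # distinct values seen for each feature, used in the smoothing denominator
        self.feature_values = [set() for _ in range(n_features)]
        for row, c in zip(X, y):
            for j, v in enumerate(row):
                self.feature_counts[c][j][v] += 1
                self.feature_values[j].add(v)
        return self

    def _log_posterior(self, row, c):
        # log P(c) + sum_j log P(x_j | c), up to a constant shared by all classes
        score = math.log(self.class_counts[c] / self.n_samples)
        for j, v in enumerate(row):
            count = self.feature_counts[c][j][v]  # Counter returns 0 for unseen values
            score += math.log((count + 1) / (self.class_counts[c] + len(self.feature_values[j])))
        return score

    def predict(self, X):
        # Pick the class with the highest (unnormalised) log posterior for each row
        return [max(self.class_counts, key=lambda c: self._log_posterior(row, c)) for row in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy (country, attack_type) records; labels are invented group names.
    X = [("A", "Bombing"), ("A", "Bombing"), ("A", "Armed Assault"),
         ("B", "Kidnapping"), ("B", "Armed Assault"), ("B", "Kidnapping"),
         ("B", "Hijacking")]
    y = ["G1", "G1", "G1", "G2", "G2", "G2", "G3"]
    y = lump_rare_groups(y, min_count=2)  # G3 has a single incident, so it becomes "Other"
    model = SimpleCategoricalNB().fit(X, y)
    print(model.predict([("A", "Bombing"), ("B", "Kidnapping")]))
    print(accuracy(y, model.predict(X)))
```

Scores are computed in log space to avoid underflow when multiplying many small probabilities. For a real comparison against SVM and Random Forest, a library implementation such as scikit-learn's and a held-out test split would be the more practical choice.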

This project was featured in a news column for its outstanding concept. Here is the link to the media coverage.

Source: