Predicting Crimes In Chicago From Weather

Alan Fu (yuhengfu2017@u.northwestern.edu)

View the Project on GitHub fuyuheng/EECS-349-Project

Problem

In this project, my task is to predict the crime rate in Chicago on a given day (specifically, whether the number of crimes will be higher or lower than the year's average) given that day's weather parameters. On a more fundamental level, I wish to investigate the relationship between weather and crime rate, if there is any. The motivation for this project comes from my experience with Chicago’s weather since I began attending Northwestern. I have noticed a significant effect of weather on my mood and behavior, and so have my close friends at Northwestern. I wish to explore this effect in a more quantitative and rigorous manner, and I decided to use crime as the specific measurement of human behavior (specifically, how violent people become toward each other). Important applications may arise if the model can predict crimes in Chicago from weather with reasonable accuracy. The effect of weather on crime could be a valuable addition to existing crime-prediction tools such as PREDPOL, helping them predict crimes more accurately and enabling law enforcement agencies to stop crimes better, faster, or even before they happen. The predictions made by the program could also be used in other psychological or sociological studies of weather and human behavior to gain further insights.

Solution

Two main data sets are used in this project: weather data recorded at Midway Airport and crime data from the City of Chicago Data Portal. The weather data contains daily average values of temperature, dew point temperature, humidity, wind direction, wind speed, precipitation, and atmospheric pressure. The crime data contains the total number of crimes in Chicago on a given day, along with the crime counts for each of the following types: sex offense, theft, assault and battery, burglary, robbery, substance, homicide, gambling, prostitution, arson, and others. The primary model in this project is a decision tree, which is used to predict whether there will be more or fewer crimes than the yearly average on a given day, given that day's weather parameters. Weka, an open-source data mining package, is used for training and testing the tree. Specifically, the J48 tree model showed the highest accuracy among all tree models in Weka for this data set and is therefore used.
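To make the prediction target concrete, here is a minimal Python sketch of how daily crime counts could be turned into the binary "higher or lower than the yearly average" label. It is an illustration only, not the cleanup code used in the project: the function name `label_days`, the `(year, month, day)` key format, and the choice to count a day exactly equal to the average as "lower" are all assumptions. The labels 2 (higher) and 1 (lower) follow the convention used for the tree in the Results section.

```python
from collections import defaultdict
from statistics import mean


def label_days(daily_counts):
    """Label each day 2 (above its year's average count) or 1 (otherwise).

    daily_counts maps a (year, month, day) tuple to that day's crime count.
    Hypothetical helper: the key format and the tie rule are assumptions.
    """
    # Group the daily counts by year so each year gets its own average.
    counts_by_year = defaultdict(list)
    for (year, _month, _day), count in daily_counts.items():
        counts_by_year[year].append(count)

    yearly_average = {year: mean(counts) for year, counts in counts_by_year.items()}

    # A day is labeled 2 only if it is strictly above its year's average.
    return {
        date: 2 if count > yearly_average[date[0]] else 1
        for date, count in daily_counts.items()
    }
```

For example, with counts of 700 and 900 on two days of the same year, the average is 800, so the first day is labeled 1 and the second is labeled 2.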

Results

The tree is generated with the Weka J48 model and tested with 10-fold cross-validation. The accuracies of using the same J48 model to predict the count of each type of crime are plotted below. As we can see, total counts can be predicted from weather with an accuracy of around 62%, while the daily assault and battery counts are predicted most accurately from weather parameters (an accuracy of nearly 78%). Considering how many factors might affect crime rates in a city at a given time, we can consider this a reasonably strong connection between weather and crime rates. The decision tree for total counts is visualized below. Note that 2 means that the crime count will be higher than the yearly average, while 1 means lower.
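For readers unfamiliar with 10-fold cross-validation, the sketch below shows the general idea in plain Python: the data is split into ten folds, each fold is held out once while a model is trained on the other nine, and accuracy is the fraction of held-out days predicted correctly. This is not Weka's implementation (Weka also stratifies the folds by class, which this sketch does not), and the classifier here is a simple majority-class baseline rather than J48. All function names are hypothetical.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k roughly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_majority(features, labels):
    """Baseline 'model' that ignores the weather and predicts the most common label."""
    most_common_label = Counter(labels).most_common(1)[0][0]
    return lambda row: most_common_label


def cross_val_accuracy(features, labels, train_fn, k=10, seed=0):
    """Fraction of examples predicted correctly while held out of training."""
    correct = 0
    for held_out in k_fold_indices(len(labels), k, seed):
        held_out_set = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held_out_set]
        model = train_fn(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
        )
        correct += sum(model(features[i]) == labels[i] for i in held_out)
    return correct / len(labels)
```

A baseline like `train_majority` is a useful point of comparison: a reported accuracy is informative mainly to the extent that it exceeds what always guessing the more common label would achieve on the same data.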

Analysis

As seen in the simple tree shown above, temperature shows the most dominant relationship with total crime counts on a given day. Contrary to what many would expect, crime counts do not simply increase as temperature increases, but are likely to be higher than the yearly average when the temperature is either too high or too low. This insight matches the research results reported in a 2013 New York Times article, which concludes that "episodes of extreme climate make people more violent toward one another". When the temperature is within a "comfort range", precipitation becomes the deciding factor: the higher the precipitation, the more likely crime counts are to be lower than the yearly average.

About

This project was conducted by Alan Fu (yuhengfu2017@u.northwestern.edu), a sophomore double majoring in computer science and physics at Northwestern University, for the class EECS 349 (Machine Learning). To learn more about this project, such as the data cleanup process, the reason for categorizing crime counts into binary variables, the results of exploring models other than J48 for this data set, a more detailed analysis of the trees produced by J48, and suggestions for future work, please read the final report PDF, which is included in the zip package offered for download.