Crime poses a particularly interesting data challenge: it has both geospatial and temporal dimensions, and it may be influenced by many different types of features, including weather, city infrastructure, population demographics, public events, and government policy. Here, I show that combining machine learning, time-series modeling, and geostatistics predicts future crime more effectively than any of these techniques alone. Drawing on a variety of public data sets, including police reports, the US Census, Foursquare, newspapers, and weather records, I discuss how to merge, visualize, model, and deploy this type of multidimensional data using PostGIS, spatial mapping, time-series analysis, dimensionality reduction, machine learning, and a public REST API. With this model, I ask where crime will occur next, what predicts it, and what we can do to prevent it in the future.
Jorie Koster-Hale is a broadly trained data scientist at Dataiku with expertise in neuroscience, healthcare data, and machine learning. Before joining Dataiku, she completed her Ph.D. in Cognitive Neuroscience at the Massachusetts Institute of Technology and worked as a Postdoctoral Fellow at Harvard. Jorie currently lives in Paris, where she builds predictive models and eats pain au chocolat.