StaySafeNYC App

Motivation

While there are many applications available that provide statistics on crime in New York City, it is not easy to find any that provide those statistics at the city block or several city blocks level. The recent publishing of detailed incident-level data by the NYPD on violent crime in all the five boroughs of New York City, for the 2015 calendar year, have provided a unique opportunity to analyze crime rates for different parts of the city. The very high spatial and temporal granularity of the data means that rather than aggregating crime rates to the neighborhood or borough level, as is often the case, one can look at trends to within 2-3 city blocks and maybe in the process be able to identify potential crime hotspots. In addition, by mapping each crime incident to a census tract which is the smallest census unit for which detailed demographic data is available, one is able to extract demographic information for each crime location. The possibility of using this data to extract meaningful insights was therefore my primary motivation. I also saw this as an opportunity to build an interactive tool that the user could use to get a better understanding of the crime rates in his or her neighborhood. Ultimately, I hope to incorporate a prediction engine that will provide the user with a prediction on the possibility of falling victim to a violent crime at a given location, on a specific date and time.

How it Works:

A user enters a request on the webform, in the form of a street address. Using the google API, the street address is geocoded into latitude and longitude coordinates which are then converted into a census block number using the Federal Communication Commission (FCC) API. From this census block number, an 11-digit number corresponding to the census tract GEOID is extracted and used in the query to a MySQL database to obtain crime and demographic information for that census tract. The results from the database are used to compute the average crime rates for the census tract as well as for the borough of the street address entered, and the whole of New York city. The crime rates are computed per 1000 people, and for each class of violent crime, i.e., burglary, robbery, murder, grand lacerny, rape, felony assault, and grand lacerny of motor vehicle. The results are then graphed in an interactive chart using the D3.js JavaScript visualization library. In addition, a map is generated showing the location of each crime incident, the date and time of occurence, and the nature of the incident, for all crimes that occured within the census tract.

The chart below provides a summary of the data flow in the application.

Application Flow Chart

Below is a summary of the development to production workflow and the resources utilized.

Deployment Flow Chart

Technologies Applied:

Front end:

JavaScript; jQuery, a JavaScript library; D3.js, a JavaScript visualization library; ColorBrewer (to generate color schemes); Google Maps; Bootstrap for templating; CSS; HTML

Back end:

MySQL database; Flask, a Python based web framework built on Werkzeug and Jinja2; Gunicorn, a WSGI application server that takes requests from Nginx for dynamic content and passes these requests to Flask; Python libraries – NumPy, Pandas, SciPy; Supervisor for process automation

Others:

Google API for geocoding; FCC API for location GEOID; Amazon Web Services (AWS) for production deployment; Git for version control; Jupyter Notebook for testing features and APIs

Predictive Engine

I am currently working on a predictive engine component to the app. To extract features for a logistic regression model, I am combining data from two sources: (1) The incident level crime data used in this app to provide location and time features, and the mapping of each crime incident to a census tract to provide demographic features. (2) Harvesting geotagged tweets using the twitter public API for a bounding box of coordinates defined for the New York City latitude-longtitude coordinates will allow me to add features based on sentiment analysis on the tweets. The idea is to score each geotagged tweet for negativity based on a dictionary of words associated with crime or crime incident. Geotagged tweets also encode information about the dynamics of a location, i.e., the number of people moving in and out of a place at a given time. I intend to leverage these data too to provide additional features to the model.

Visit the App StaySafeNYC

Scripts and notebooks related to this analysis can be found in this GitHub repository.

-->