Journal #6: Analysis & Rendering of data

In keeping with the main research questions developed for this project, this week’s programming exploration focused on answering four major questions:

1. Which borough has had the most accidents?
2. Which borough has had the fewest accidents?
3. What are the most likely contributing factors for accidents in the greater NYC area?
4. Are there commonalities among larger accidents?

I planned to evaluate NYPD data for the period from July 2012 to March 2018. First, I counted the number of motor vehicle collisions per borough, which showed Brooklyn in the lead with a whopping 270,128 collisions! (But this covers almost six years of data.)

image
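A minimal sketch of how a per-borough count like this could be produced in base R. The rows below are made-up toy data, not the NYPD figures, and I'm assuming the data frame has one row per collision with a BOROUGH column, as in the plotting code later in this post.

```r
# Toy stand-in for the collision data (hypothetical counts, NOT the real totals).
# Assumption: one row per collision, with the borough name in a BOROUGH column.
collisions <- data.frame(
  BOROUGH = rep(
    c("BROOKLYN", "QUEENS", "MANHATTAN", "BRONX", "STATEN ISLAND"),
    times = c(5, 4, 3, 2, 1)
  ),
  stringsAsFactors = FALSE
)

# Count rows per borough, then sort from most to fewest collisions.
borough_counts <- sort(table(collisions$BOROUGH), decreasing = TRUE)

# The first name has the most collisions, the last name has the fewest.
most_collisions   <- names(borough_counts)[1]
fewest_collisions <- names(borough_counts)[length(borough_counts)]

print(borough_counts)
```

Sorting the table once answers the first two research questions at the same time, since the top and bottom entries are the boroughs with the most and fewest collisions.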

 

While this table represents a staggering number of vehicle collisions, two things stood out: a) the full dataset may be too large to work with, so sampling the data may be necessary; and b) the data would be more meaningful displayed on a map. With that in mind, I also evaluated motor vehicle collisions for 2017 alone to get a more relevant picture of recent collisions.

image
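Narrowing the data to a single year could look something like the sketch below. The rows are made-up toy data, and I'm assuming the collision date sits in a DATE column formatted as month/day/year; the real column name and format may differ.

```r
# Toy stand-in for the full dataset (hypothetical rows).
# Assumption: a DATE column stored as text in MM/DD/YYYY form.
collisions <- data.frame(
  DATE    = c("07/01/2012", "03/15/2017", "12/31/2017", "01/02/2018"),
  BOROUGH = c("BROOKLYN", "QUEENS", "BRONX", "MANHATTAN"),
  stringsAsFactors = FALSE
)

# Parse the text dates so the year can be extracted reliably.
collisions$DATE <- as.Date(collisions$DATE, format = "%m/%d/%Y")

# Keep only the rows whose year is 2017.
Collisions2017 <- collisions[format(collisions$DATE, "%Y") == "2017", ]

print(Collisions2017)
```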

This new evaluation showed that Brooklyn still had the most motor vehicle collisions, with 44,636 in 2017 alone, while Staten Island had the fewest, with only 6,213. Using the ggmap library, I pulled a map of the five boroughs into R to plot the vehicle collisions against.


image

Clearly, there are so many points on the map that the data all runs together. The points are colored by borough, but at this scale it’s difficult to make out the actual latitude and longitude of individual collisions. This by itself demonstrates the staggering number of motor vehicle collisions in one year alone!

image

While this showed interesting data, it was still too dense to read at that scale, so I decided to zoom in on each borough. After some research, I found a line of code that pulls the map of an area I specify and zooms in on it if I choose. In combination with code I previously had, I used:

 

library(ggmap)     # also attaches ggplot2
library(magrittr)  # for %>%

mapDataSI <- get_map("Staten Island", zoom = 12) %>% ggmap() +
  geom_point(data = Collisions2017, aes(x = LONGITUDE, y = LATITUDE, color = BOROUGH),
             size = 0.1, alpha = I(0.7)) +
  ggtitle("Vehicle Collisions 2017")

 

This produced the following results, showing collisions in Staten Island more clearly:


image

I will repeat this for each of the five boroughs to show the clusters where the majority of motor vehicle collisions occur. For example, in Manhattan alone:


image

image
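Repeating the map for every borough could be wrapped in a small helper, sketched below. The function name plot_borough is hypothetical, and I'm assuming BOROUGH values are stored in upper case and that get_map() can look up each borough by name (recent ggmap versions need a Google API key registered for that lookup). The call that actually draws the maps is left commented out because it needs the real Collisions2017 data and network access.

```r
# Hypothetical helper: draw one borough's collisions on its own zoomed-in map.
# Assumptions: BOROUGH values are upper case; LONGITUDE/LATITUDE columns exist.
plot_borough <- function(collisions, borough, zoom = 12) {
  # Keep only this borough's rows (which() also drops rows with a missing borough).
  borough_rows <- collisions[which(collisions$BOROUGH == toupper(borough)), ]

  # Fetch the base map for the borough, then overlay the collision points.
  base_map <- ggmap::get_map(borough, zoom = zoom)
  ggmap::ggmap(base_map) +
    ggplot2::geom_point(
      data = borough_rows,
      ggplot2::aes(x = LONGITUDE, y = LATITUDE, color = BOROUGH),
      size = 0.1, alpha = 0.7
    ) +
    ggplot2::ggtitle(paste("Vehicle Collisions 2017:", borough))
}

borough_names <- c("Staten Island", "Manhattan", "Brooklyn", "Queens", "Bronx")

# One map per borough; needs the real data and network access, so not run here:
# borough_maps <- lapply(borough_names, plot_borough, collisions = Collisions2017)
```

Filtering to the borough's own rows before plotting keeps points from neighboring boroughs off each map; a single zoom level may not suit every borough, so it is left as a parameter.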

Because the collisions are plotted on a map, it is easier to identify the areas where they occur and to infer possible causes of increased motor vehicle collisions. In this sample, the majority of motor vehicle collisions in 2017 took place around Midtown and the Lower East Side. This part of NYC is often congested with both pedestrians and cars, especially commercial vehicles. In the evaluation, I was also able to isolate and identify the reasons for collisions in the past year:

 image

 

This data showed that the most common factors in vehicle collisions throughout 2017 were distracted driving, following other vehicles too closely, and failing to yield to other drivers; in essence, failing to obey the rules of the road. During this evaluation, it was interesting to think about my own driving habits and to make sure that I’m doing my part to be a safe driver and not add to the number of collisions. It also shed light on driving habits of others that I’ve witnessed, which could have led to accidents, and sometimes do. The data also includes unspecified reasons, which could mean that no driver in particular was at fault or that the NYPD officer did not classify the cause.
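A ranking of contributing factors like the one described above could be sketched as follows. The rows are made-up toy data, and the column name CONTRIBUTING.FACTOR.VEHICLE.1 is an assumption about how the factor column appears once the file is read into R.

```r
# Toy stand-in for the 2017 data (hypothetical counts, NOT the real results).
# Assumption: the recorded cause is in CONTRIBUTING.FACTOR.VEHICLE.1.
Collisions2017 <- data.frame(
  CONTRIBUTING.FACTOR.VEHICLE.1 = rep(
    c("Unspecified", "Driver Inattention/Distraction",
      "Following Too Closely", "Failure to Yield Right-of-Way"),
    times = c(4, 3, 2, 1)
  ),
  stringsAsFactors = FALSE
)

# Count collisions per recorded factor and sort from most to least common.
factor_counts <- sort(
  table(Collisions2017$CONTRIBUTING.FACTOR.VEHICLE.1),
  decreasing = TRUE
)

# Keep just the most common factors for reporting.
top_factors <- head(factor_counts, 3)
print(top_factors)
```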

The data also showed that larger vehicle collisions (classified by the number of people injured) had a defective accelerator as a major commonality. A fair number of these also involved emergency vehicles, namely ambulances.

image
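One way to isolate the larger collisions is to filter on the injury count first and then tabulate the factors, as in this sketch. The rows are made-up toy data, the column names are assumptions, and the cutoff of 5 injured people is a hypothetical threshold chosen only for illustration.

```r
# Toy stand-in for the 2017 data (hypothetical rows, NOT the real results).
# Assumptions: injuries are in NUMBER.OF.PERSONS.INJURED and the recorded
# cause is in CONTRIBUTING.FACTOR.VEHICLE.1.
Collisions2017 <- data.frame(
  NUMBER.OF.PERSONS.INJURED = c(0, 1, 6, 8, 0, 7),
  CONTRIBUTING.FACTOR.VEHICLE.1 = c(
    "Unspecified", "Following Too Closely", "Accelerator Defective",
    "Accelerator Defective", "Driver Inattention/Distraction", "Unspecified"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical cutoff for what counts as a "larger" collision.
injury_threshold <- 5

# Keep only collisions at or above the cutoff.
large_collisions <- Collisions2017[
  Collisions2017$NUMBER.OF.PERSONS.INJURED >= injury_threshold,
]

# Rank the recorded factors among the larger collisions only.
large_factor_counts <- sort(
  table(large_collisions$CONTRIBUTING.FACTOR.VEHICLE.1),
  decreasing = TRUE
)
print(large_factor_counts)
```

Changing injury_threshold changes which collisions count as large, so any commonality found this way should be read alongside the cutoff that produced it.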

It’s so interesting to evaluate this data and see how much of a story it can tell about driving and collisions in the greater NYC area. This also makes me much more aware as a driver myself.

    • Dana Paniagua

      Hi Siobhan,

      I enjoyed reading through your data and understanding how you were able to create and code map visuals. The way you explained your process was great. I also like how you connected the data to your own driving experiences.

    Quantitative Literature Analysis Spring 2018

    Here is the online home for our Quant Lit Analysis Class for Spring 2018.
