Siobhan's Programming Journal #2

This week’s R challenge was a bit less challenging than last week’s; the most difficult part was finding some historical data to clean up and evaluate. Many of the datasets I came across had CSVs that were outdated or could not be found. After some searching and exploration, I found some census data, namely data that categorized different demographics of people by zip code. I thought it might be interesting to see the percentage of males versus females in each zip code, and since I’m a Brooklyn native, I decided to focus my data analysis on Brooklyn zip codes.


I started by importing the CSV into R. Converting it into a data frame to view has been giving me a bit of a hard time, so I have to navigate to the CSV through the File option in the R environment and open it that way to view and save it in R. After opening it (and some Googling), I saw that Brooklyn zip codes begin with “112”, so I had to isolate the rows I wanted to include in my graph. I also knew I was focusing on the “Percent Male”, “Percent Female”, and “Jurisdiction Name” columns.
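A minimal sketch of the import-and-filter step in base R, not the exact code used here. It assumes the column headers are spelled as quoted above and that the zip code lives in “Jurisdiction Name”; the inline CSV text and its values are made up so the example runs on its own. With a real file, the `text = csv_text` argument would be replaced by a file path.

```r
# Toy CSV standing in for the real file; the values are made up for illustration.
csv_text <- 'Jurisdiction Name,Percent Female,Percent Male
10001,0.50,0.50
11201,0.55,0.45
11215,0.48,0.52
10468,0.60,0.40'

# check.names = FALSE keeps the spaces in the column names as written.
demo <- read.csv(text = csv_text, check.names = FALSE)

# Compare zip codes as text so "starts with 112" is a simple prefix test.
is_brooklyn <- startsWith(as.character(demo[["Jurisdiction Name"]]), "112")

# Keep only the Brooklyn rows and the three columns of interest.
brooklyn <- demo[is_brooklyn,
                 c("Jurisdiction Name", "Percent Female", "Percent Male")]
print(brooklyn)
```

Reading the data with `read.csv()` directly returns a data frame, so this may also sidestep opening the file through the File menu.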

Getting my graph together was a bit of a rough start. I first got a line graph, which I eventually morphed into a scatter plot. Points were plotted, but the data representation did not feel meaningful to me because of the way it was arranged, especially after reading through the different data representations Nathan Yau created and referenced in the text. After some reading of Visualize This and a few Google searches, I stretched the x-axis to better fit the data and spread it out to tell a more meaningful story.
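One possible way to get a scatter plot with evenly spread points, sketched with ggplot2. This is an assumption about the approach, not the code behind the plot described above: the data values are made up, the column names (`zip`, `percent_female`, `percent_male`) are simplified stand-ins, and treating zip codes as discrete categories is just one way to spread out the x-axis.

```r
library(ggplot2)

# Made-up values for illustration only; column names are simplified stand-ins.
brooklyn <- data.frame(
  zip            = c("11201", "11215", "11220", "11234"),
  percent_female = c(0.55, 0.48, 0.51, 0.53),
  percent_male   = c(0.45, 0.52, 0.49, 0.47)
)

# Stack the two percentage columns so each row is one (zip, sex, share) point.
long <- data.frame(
  zip   = rep(brooklyn$zip, times = 2),
  sex   = rep(c("Female", "Male"), each = nrow(brooklyn)),
  share = c(brooklyn$percent_female, brooklyn$percent_male)
)

# Character zip codes are treated as discrete categories, so points are
# evenly spaced along the x-axis instead of bunching up numerically.
p <- ggplot(long, aes(x = zip, y = share, colour = sex)) +
  geom_point(size = 3) +
  labs(x = "Zip code", y = "Share of population", colour = NULL) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(p)
```

Swapping `geom_point()` for `geom_line(aes(group = sex))` would give the line-graph version of the same data.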

Lastly, I wanted the y-axis labels to be actual percentages rather than decimals. By adding scale_y_continuous(labels = percent) to the plot code, I was able to finalize the scatter plot into something I wanted it to be.
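A short sketch of how that line fits into a plot, assuming the `percent` formatter comes from the scales package (which has to be loaded alongside ggplot2). The data frame and its values are made up for illustration.

```r
library(ggplot2)
library(scales)  # provides the percent() label formatter

# Made-up proportions for illustration only.
demo <- data.frame(
  zip          = c("11201", "11215", "11220"),
  percent_male = c(0.45, 0.52, 0.49)
)

# labels = percent turns proportion breaks on the y-axis into percent labels.
p <- ggplot(demo, aes(x = zip, y = percent_male)) +
  geom_point() +
  scale_y_continuous(labels = percent)
print(p)
```

This only changes how the axis is labeled; the underlying values stay as proportions between 0 and 1.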

I’m starting to learn more R now and getting more accustomed to manipulating plots and graphs. It’s still not 100% what I would’ve wanted, though, and I know that more could be done to perfect this graph and what it represents. I was not as frustrated during this programming exercise as I was during the first one. I wish I could add some cool features, like showing the zip code and percent when hovering over a point, or visualizing this data on an actual map of the five boroughs, which I will surely explore further. So far, I have found this exercise to be the most enjoyable.

    • Gerald Ardito


      I appreciate the work you did, the story you were trying to tell, and the story you told about your own process.

      Thanks for all of it.

    Quantitative Literature Analysis Spring 2018

    Here is the online home for our Quant Lit Analysis Class for Spring 2018.