Siobhan's Programming Journal #4

For this week’s programming prompt, I evaluated mean ELA scores for New York City public school students. The data covers students in the five boroughs: Brooklyn, Queens, the Bronx, Staten Island, and Manhattan. My initial goal was to create a scatter plot of the students’ average scores, then fit a regression line to show any trend in the data.

First, I needed to acquire some data, so I did a few searches for databases with open, available literary data. I then thought it might be interesting to look at school ELA scores, since literacy is a large part of students’ educational outcomes, and literary knowledge plays an important role in that. I found ELA exam scores from 2013-2017 that I was able to download. I converted the data into a CSV file, then imported it into R manually, the same way as before, by navigating to the file and adding it as a dataframe.
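As a scripted alternative to the manual import, here is a minimal sketch using base R's read.csv. The file written here is a temporary stand-in with invented rows, not the actual download, and the Borough, Grade, and Year column names are assumptions based on the columns used later in this post. Setting check.names = FALSE keeps the space-separated column name `Mean Scale Score` intact instead of converting it to Mean.Scale.Score.

```r
# Hypothetical stand-in for the downloaded file; the rows are invented for illustration.
csvPath <- tempfile(fileext = ".csv")
writeLines(c(
  "Borough,Grade,Year,Mean Scale Score",
  "Bronx,3,2013,286",
  "Queens,3,2013,300"
), csvPath)

# check.names = FALSE preserves column names that contain spaces
BoroughELAScores <- read.csv(csvPath, check.names = FALSE, stringsAsFactors = FALSE)

# Inspect the column names and types before plotting
str(BoroughELAScores)
```

With the real file, csvPath would simply be the path to the saved CSV.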

I chose to use plotly to create my graph. My vision for displaying this data was a scatter plot of some sort. I tried grouping by borough and mean score, which gave me a more concise table, but I realized that it plotted only a single average score for each borough. It created a line graph.
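The grouping step can be sketched in base R with aggregate. This is an illustration on invented toy values, not the original code (which may well have used a different function or package). It shows why the summarized table yields only one point per borough: every row for a borough collapses into a single mean.

```r
# Toy data standing in for BoroughELAScores; the values are invented for illustration.
toyScores <- data.frame(
  Borough = c("Bronx", "Bronx", "Queens", "Queens"),
  Year = c(2013, 2014, 2013, 2014),
  `Mean Scale Score` = c(290, 300, 310, 320),
  check.names = FALSE
)

# Collapse all rows for each borough into one average score
boroughMeans <- aggregate(
  x = list(MeanScore = toyScores$`Mean Scale Score`),
  by = list(Borough = toyScores$Borough),
  FUN = mean
)

print(boroughMeans)
```

Because the result has one row per borough, plotting it can only ever show one value per borough, which is why plotting from the full dataframe gives a fuller scatter.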


I tried adjusting the parameters I was graphing by. I decided to plot from the original dataframe rather than the simplified one. The goal was to see the mean scores for each year broken down by borough as well. So far, the scores came out isolated by year, so it was a type of scatter plot, but not exactly what I had envisioned (more scattered, with the regression line fitted through it). Perhaps it’s the way I’m pulling data from the table into plotly.
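One way to get a regression line through the points is to fit a linear model with lm and draw its fitted values as a second trace. The sketch below assumes the plotly package is installed and uses invented toy values in place of the real dataframe. A simple linear fit of score on year is an assumption here, not necessarily the right model for this data.

```r
library(plotly)

# Toy data standing in for BoroughELAScores; the values are invented for illustration.
toyScores <- data.frame(
  Year = rep(2013:2017, times = 2),
  Borough = rep(c("Bronx", "Queens"), each = 5),
  `Mean Scale Score` = c(286, 288, 289, 293, 295, 300, 301, 303, 306, 308),
  check.names = FALSE
)

# Ordinary least-squares fit of mean score on year
fit <- lm(`Mean Scale Score` ~ Year, data = toyScores)

# Scatter of the raw points, plus the fitted values drawn as a line
trendPlot <- plot_ly(toyScores, x = ~Year, y = ~`Mean Scale Score`,
                     type = 'scatter', mode = 'markers', name = 'Mean score') %>%
  add_lines(x = ~Year, y = fitted(fit), name = 'Linear fit')

trendPlot
```

Since years are whole numbers, the points will still stack in vertical columns; the fitted line is what shows the overall trend across them.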

I noticed that when I tried adding axis titles, I kept getting an error. I used this line of code:  plot<-plot_ly(BoroughELAScores, x=BoroughELAScores$Year, y=BoroughELAScores$`Mean Scale Score`, type='scatter', mode='markers') %>% layout(xaxis = "Year", yaxis = "Average Scores"), and it produced this error:  Error: $ operator is invalid for atomic vectors.

It’s probably the last bit of code: %>% layout(xaxis = "Year", yaxis = "Average Scores"). Removing that removed the error as well. I even played around with the x-axis to try to tell a different story of average scores per grade level.


It appears I had left something out, so after further exploration and reflecting on the previous project, I was able to add in similar lines of code for the layout:


library(plotly)  # provides plot_ly(), layout(), and the %>% pipe
scorePlot <- plot_ly(BoroughELAScores, x = ~Grade, y = ~`Mean Scale Score`, type = 'scatter', mode = 'markers', color = ~Borough) %>%
  layout(title = "Average ELA Scores by Student Grade", xaxis = list(title = "Grade"), yaxis = list(title = "Average Test Score"))


and get my labels working again, with color to denote the school borough per grade! (It’s the simplest things that I forget sometimes!) I felt like scores by grade tell a more meaningful story about student performance in ELA. From this data, I was able to see that students in grade 6 were performing the lowest on average in Greater NYC Area schools, and that schools in the Bronx have among the lowest average scores in the data, if I am reading it correctly.
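For reference, the difference between the failing and working layout calls comes down to what xaxis and yaxis receive. In plotly for R, each axis argument is a list of axis attributes, so the title goes inside list(title = ...). Passing a bare string, as in the earlier attempt, appears to be what triggered the "$ operator is invalid for atomic vectors" error. A minimal sketch on invented toy values, assuming the plotly package is installed:

```r
library(plotly)

# Toy data; the values are invented for illustration.
toyScores <- data.frame(
  Grade = c(3, 4, 5, 6),
  Score = c(300, 298, 301, 295)
)

# Each axis takes a list of attributes; the title is one entry in that list.
# (A bare string such as xaxis = "Grade" is the form that failed earlier.)
labeledPlot <- plot_ly(toyScores, x = ~Grade, y = ~Score,
                       type = 'scatter', mode = 'markers') %>%
  layout(xaxis = list(title = "Grade"),
         yaxis = list(title = "Average Test Score"))

labeledPlot
```

The same list can carry other axis settings, such as range, alongside the title.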


I’m finding that with the different visualizations in R, the bigger picture can truly be seen, which is what I remember hearing and reading about at the beginning of this semester. Now I understand what the speakers meant, and what Yau meant, by the power of data and varying visualizations. I’m starting to see theory morph into practice.

    • Siobhan Wilmot-Dunbar

      Thanks Dana! 

      Are you referring to this site? I used this site quite a bit as my reference, especially the scatter plot section, and tried to match up my information with what they had in the code and reflected in the visuals. So for example, I tried identifying what part of the example was the table reference and what part(s) referred to specific columns, then tried to replicate that for my data. I'm not sure if that makes a lot of sense, but that was the way I was thinking about it in my head. When I tried looking up references on other sites like Stack Overflow, it confused me a bit.

      I hope this makes sense. Feel free to let me know if it doesn't; I could try explaining it another way!

    Quantitative Literature Analysis Spring 2018

    Here is the online home for our Quant Lit Analysis Class for Spring 2018.