I recently came across a detailed analysis by Todd W. Schneider of 1.1 billion New York City cab rides that took place from 2009 to 2015. In addition to this massive dataset, provided by the New York City Taxi & Limousine Commission, Schneider incorporated a public dataset of 19 million Uber rides from April–September 2014 and January–June 2015.
The analysis appears to be intended for New Yorkers, as much of the analysis and many of the visualizations assume basic geographic familiarity with NYC and an understanding of New York lingo and culture. While the presentation would still appeal to readers who are interested in transportation but unfamiliar with NYC, Schneider focuses less on general transportation analysis (e.g., average fare or trip time) and more on New York-specific analysis (e.g., which neighborhoods are up late, and taxi trips taken from Goldman Sachs).
Schneider appears to have multiple goals in this presentation. One is to comprehensively explore a wide variety of questions about the dataset. He includes numerous graphs and figures, each addressing a different aspect of the data, but the number of figures is almost overwhelming. Though data junkies will enjoy how comprehensive the presentation is, I think most readers will be put off by the sheer number of graphs. In my opinion, the volume of figures also buries some of the most interesting aspects of the data, reducing the efficacy of his analysis. For instance, about halfway down his (long) post, Schneider has a simple bar graph showing that rainstorms don’t appear to affect daily ridership. This was the most surprising conclusion for me personally, since the common belief is that taxis are impossible to get during rainstorms, so it’s disappointing that it’s hidden halfway down the post.
Intentionally or not, he also appears to advocate for public transit over taxis in parts of his analysis. In the section dedicated to airports, he concludes that “depending on the time of day and how close you are to a subway stop, your expected travel time might be better on public transit than in a cab, and you could save a bunch of money.” Because New Yorkers can then customize the visualizations to show expected travel time to the airports from their own neighborhood, I think this part of the presentation is very effective. Showing viewers average taxi trip times from their own neighborhood is much more relatable than showing averages across the whole city, which makes this section one of the most powerful in the whole presentation.