sunl – 2017 Data Storytelling Studio @ MIT

This is a summary of the types of data I created that were captured in digital form on 5/1.

At 9 AM I wake up to my phone’s alarm. I briefly check my e-mail before heading downstairs to grab some breakfast. Already I am creating data: by using Google Chrome on my phone, I am letting several entities track my behavior. Google is logging my behavior because I am using Chrome, and MIT is logging it because I am connected to its routers and checking its e-mail servers. We could go further and say that MIT’s ISP, DNS servers, etc. are also logging data, but at that point they don’t know that the data belongs to me, Lawrence Sun, browsing the internet.

At breakfast I swipe my ID at the register. This is logged by MIT’s TechCash and dining services. While eating, I catch up on various things on my phone. Reddit, Gmail, Quora, and the New York Times are all logging data about my visit.

I leave my dorm for class. Because I am now moving with my phone, Google is tracking my location with my phone’s GPS.

After class, I get a burrito at Anna’s and I pay for it with my credit card. Both Anna’s accounting services and my bank (Bank of America) log this transaction.

I then go to my afternoon class. I open my laptop and start browsing the course website; the site is hosted by CSAIL and the course notes are hosted on NB. Both CSAIL and NB are logging my behavior.

After my classes are over, I return to my dorm and stay off the grid for a few hours. Dinner time comes around and again I swipe my ID and my meal is logged. After dinner, I do some work for my classes. I need to read a paper for one of my courses, so I visit arXiv to retrieve it. After I finish reading the paper, I submit answers to some questions on an MIT PDOS website. Both arXiv and PDOS are logging my activity. After this I visit MIT Stellar to browse the upcoming homework for another one of my classes. Finally, I am left writing this blog post, leaving another data footprint, at WordPress in this case.

I was reading about MIT’s recent sale of IPv4 addresses to Amazon, and in some of the discussion a rather old but classic data visualization popped up:

This is, of course, the xkcd “map of the internet”. The data that is being shown is which entities own certain IP address ranges: essentially blocks of the internet. For example, in the data visualization we see that MIT owns IP prefixes that start with 18.
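To make the idea of a "block" concrete, here is a minimal sketch using Python's standard `ipaddress` module. It assumes the block shown on the map for MIT corresponds to the network 18.0.0.0/8 (every address whose first number is 18); the sample addresses and the helper name `in_block` are made up for illustration.

```python
import ipaddress

# IPv4 addresses are 32 bits long, so there are 2**32 of them in total.
TOTAL_IPV4 = 2 ** 32

# Assumption: the "18" square on the map is the /8 network 18.0.0.0/8.
MIT_BLOCK = ipaddress.ip_network("18.0.0.0/8")


def in_block(addr: str, network=MIT_BLOCK) -> bool:
    """Return True if the dotted-quad address falls inside the network."""
    return ipaddress.ip_address(addr) in network


if __name__ == "__main__":
    print(MIT_BLOCK.num_addresses)                 # size of one /8 block
    print(TOTAL_IPV4 // MIT_BLOCK.num_addresses)   # number of /8 blocks
    print(in_block("18.9.22.69"))                  # hypothetical address
```

Each square on the map stands for one such /8 block, which is why the whole address space fits into 256 squares.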

The audience for this map is the usual xkcd audience, which is, very broadly speaking, nerds on the internet. I think the goal of the presentation is to show that relatively few players control the whole internet; you’d think that with over 4 billion possible IP addresses there would be a lot of freedom, but in reality there are only around 100-200 players who own everything and license out IPs to others.

I think the visualization is effective given the target audience. For a general reader, it is probably too cluttered because so much data is being shown. However, as xkcd viewers are generally “nerdier”, they will be willing to spend more time investigating it, so that issue doesn’t immediately discredit the visualization. The fractal mapping explained at the bottom is an efficient way to pack what would otherwise be a line of 256 data points into a 16 x 16 square while still keeping contiguous regions together (so, for example, the blocks Europe owns are all grouped together), which greatly enhances readability given the constraints the author was working with. Probably the only glaring flaw is that it is outdated; it was made 11 years ago and the IP address layout has changed quite a bit, so it shouldn’t really be used as a discussion point today. However, in its time I think it did a great job given the target audience and the data it wanted to present.
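For readers curious how such a mapping can work, below is a rough sketch of one well-known fractal layout, the Hilbert curve, which walks through every cell of a 16 x 16 grid so that consecutive numbers always land in neighboring cells. This is the textbook index-to-coordinate conversion, not necessarily the exact orientation used in the comic, and the function name `hilbert_d2xy` is my own.

```python
def hilbert_d2xy(n: int, d: int) -> tuple[int, int]:
    """Map position d along a Hilbert curve to (x, y) on an n x n grid.

    n must be a power of two; d ranges from 0 to n*n - 1.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


if __name__ == "__main__":
    # Place the 256 first-octet values (0..255) on a 16 x 16 grid.
    grid = [[0] * 16 for _ in range(16)]
    for block in range(256):
        col, row = hilbert_d2xy(16, block)
        grid[row][col] = block
    for row in grid:
        print(" ".join(f"{value:3d}" for value in row))
```

Because neighboring numbers stay next to each other on the grid, a run of consecutive blocks held by one owner shows up as a single connected region rather than as scattered strips.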

Author: sunl

Lawrence Sun’s Data Log

A Map of the Internet