Monday, January 5, 2015

Getting my hands on the NYC data

Started looking at the various data files that NYC DOT provides.

There are two CSV files, one called "location" and the other "signs". The "location" file has the street information: which street the sign is on, and which streets you came from and will go to. The "signs" file has the actual text on each sign.

There's also a "shapefile", which is a GIS file format. After some fiddling I figured out how to open it in ArcGIS. The software is surprisingly unintuitive, and once the file was open it was hard to navigate because everything was so slow. There might be too much data for my laptop to handle. The file seems to contain the parking sign text as well as the geolocation.

I'd like to get this data into a CSV file. I wonder if I can do that?

Luckily, I can probably get started without that. I said my first task was to categorize signs: given a sign's text and the current time, decide whether I can park. For example:

Sign: "no parking tuesday 10am-12pm"
Current time: Tuesday 10:30am
Result -> no

I feel like there needs to be a preprocessing step. From the sign text I need to extract the applicable day(s), the applicable time(s), and the rule itself. This could be an interesting (hopefully trivial) NLP problem.
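
Here's a minimal sketch of what that preprocessing step might look like in Python. It assumes every sign is shaped exactly like the example above (a rule, one day name, and one "10am-12pm" style time range), which the real data almost certainly won't be. The names parse_sign and can_park are made up for illustration, and treating only "no parking" as a restriction is an assumption too.

```python
import re
from datetime import time

DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]

# Assumed sign shape: "<rule> <day> <hour><am|pm>-<hour><am|pm>"
SIGN_RE = re.compile(
    r"^(?P<rule>.+?)\s+(?P<day>" + "|".join(DAYS) + r")\s+"
    r"(?P<start>\d{1,2})(?P<start_ampm>am|pm)-"
    r"(?P<end>\d{1,2})(?P<end_ampm>am|pm)$"
)


def to_time(hour, ampm):
    """Convert a 12-hour clock hour plus 'am'/'pm' into a datetime.time."""
    hour = int(hour) % 12  # 12am -> 0; 12pm -> 0, then +12 below
    if ampm == "pm":
        hour += 12
    return time(hour)


def parse_sign(text):
    """Split sign text into rule, day, start and end; None if it doesn't fit."""
    m = SIGN_RE.match(text.strip().lower())
    if m is None:
        return None
    return {
        "rule": m["rule"],
        "day": m["day"],
        "start": to_time(m["start"], m["start_ampm"]),
        "end": to_time(m["end"], m["end_ampm"]),
    }


def can_park(sign_text, day, now):
    """Return False if the sign forbids parking on `day` at time `now`."""
    parsed = parse_sign(sign_text)
    if parsed is None:
        raise ValueError(f"unrecognized sign format: {sign_text!r}")
    restricted = (
        parsed["rule"] == "no parking"  # assumption: the only rule handled
        and parsed["day"] == day.lower()
        and parsed["start"] <= now < parsed["end"]  # end time is exclusive
    )
    return not restricted


# The example from this post: Tuesday at 10:30am
print(can_park("no parking tuesday 10am-12pm", "Tuesday", time(10, 30)))
```

Keeping the parsing separate from the yes/no check means the regex can later be swapped for something smarter without touching the time comparison.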
