Friday, April 3, 2015

Furthering the pipeline

I've figured out the bare minimum of the parser's XML output: enough to pull out the descriptions that contain two time slots (presumably a starting and an ending time).

I wanted to take this information and add it back into the JSON, perhaps as two new fields like "starting time" and "ending time". Obviously, this only applies to a portion of the whole dataset, but it could be enough for something interesting. I also wanted to add a "no parking" field, perhaps a binary/boolean value. I'm still debating whether to also look for a word like "except".
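Roughly, adding those fields back could look like the sketch below. It assumes each JSON entry is a dict with a "description" key and that the two time slots have already been pulled out of the parser output as a pair; the key name, the helper name `annotate`, the extra "has except" flag, and the sample sign text are all made up for illustration, not taken from the real data.

```python
import json

def annotate(entry, times=None):
    """Return a copy of entry with the new fields added.

    times is an optional (start, end) pair already extracted from the
    parser output; entries without two time slots get None for both.
    """
    out = dict(entry)
    start, end = times if times is not None else (None, None)
    out["starting time"] = start
    out["ending time"] = end
    # "description" is an assumed key name for the sign text.
    text = entry.get("description", "").lower()
    out["no parking"] = "no parking" in text
    # Hypothetical extra flag: marks entries whose rule has an exception.
    out["has except"] = "except" in text
    return out

# Made-up sample entry, only to show the shape of the result.
entry = {"description": "NO PARKING 8AM-6PM EXCEPT SUNDAY"}
print(json.dumps(annotate(entry, ("8AM", "6PM")), indent=2))
```

Keeping "except" as its own flag, rather than folding it into "no parking", leaves the decision about how to treat exceptions for later.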

Previously I had trouble parsing the entire file. After some fiddling, I had more success when I added a period at the end of each sentence. I thought this might be enough to keep the parser from crapping out. Turns out, I was half right. Nothing crashed, but nothing finished either. I don't know exactly how long I let it run, and it's possible my computer went to sleep at some point, but after hours of real time it still had not completed. I had to stop it and restart my computer because my internet stopped working. This computer really is awful sometimes.
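The period trick is simple enough to sketch. This assumes the descriptions are available as plain strings, one per entry; `add_period` is a hypothetical helper name, and the sample text is invented.

```python
def add_period(line):
    """Append a period unless the line is empty or already ends a sentence."""
    stripped = line.rstrip()
    if stripped and stripped[-1] not in ".!?":
        return stripped + "."
    return stripped

# Invented sample descriptions.
descriptions = ["NO PARKING 8AM-6PM", "NO STANDING ANYTIME."]
cleaned = [add_period(d) for d in descriptions]
print(cleaned)
```

Lines that already end in sentence punctuation are left alone, so running it twice does not pile up periods.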

Perhaps I'll try again with 500 entries.
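A small helper for that kind of trial run might look like this. It assumes the entries are already loaded into a list or any other iterable; `first_n` and `batches` are hypothetical names, and 500 is just the number mentioned above.

```python
from itertools import islice

def first_n(entries, n=500):
    """Return at most the first n entries as a list."""
    return list(islice(entries, n))

def batches(entries, size=500):
    """Yield successive lists of up to size entries each."""
    it = iter(entries)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# Stand-in data: integers in place of real entries.
sample = first_n(range(2000))
print(len(sample))
```

If the 500-entry run finishes in a reasonable time, `batches` would allow feeding the rest of the file to the parser in pieces instead of all at once.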

