Continuing along the thread of labels, it's natural to think of feature vectors. Each label would represent a dimension in the vector.
Features
- starting time
- ending time
- applicable Sunday
- applicable Monday
- applicable Tuesday
- applicable Wednesday
- applicable Thursday
- applicable Friday
- applicable Saturday
- applicable on holiday
- "no parking" or "no standing" or "no stopping"
11 features so far. Is that too many?
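A rough sketch of what one sign's rule might look like as an 11-dimensional vector. Everything here is an assumption for illustration: the `SignRule` name, storing times as minutes after midnight, and the integer codes for the three restriction types are not settled decisions.

```python
from dataclasses import dataclass

# Hypothetical names and encodings, chosen only for this sketch.
DAYS = ["sun", "mon", "tue", "wed", "thu", "fri", "sat"]
RESTRICTIONS = {"no parking": 0, "no standing": 1, "no stopping": 2}


@dataclass
class SignRule:
    start_minute: int   # starting time, in minutes after midnight
    end_minute: int     # ending time, in minutes after midnight
    days: frozenset     # subset of DAYS the rule applies to
    on_holiday: bool    # whether the rule applies on holidays
    restriction: str    # one of the keys in RESTRICTIONS

    def to_vector(self):
        # One 0/1 flag per day of the week, in Sunday-to-Saturday order.
        day_flags = [1 if day in self.days else 0 for day in DAYS]
        return [
            self.start_minute,
            self.end_minute,
            *day_flags,
            int(self.on_holiday),
            RESTRICTIONS[self.restriction],
        ]


# Made-up example: "no parking, 8am to 6pm, Monday through Friday".
rule = SignRule(8 * 60, 18 * 60,
                frozenset({"mon", "tue", "wed", "thu", "fri"}),
                False, "no parking")
vector = rule.to_vector()
```

The restriction type is a single integer here to keep the count at 11. Splitting it into three 0/1 flags would bring the total to 13.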
I've also thought about how to best represent the output, or even what it should be. Went through a few iterations before settling on this for now:
- Sunday rules
  - 12am - 12:30am - parking allowed
  - 12:31am - 1:00am - parking allowed
  - (etc)
  - 11:31pm - 11:59pm - parking allowed
- Monday rules
  - (etc)
- (all other days of the week)
This is a very high-dimensional vector: 48 half-hour slots per day times 7 days is 336 entries. Shouldn't be an issue though. If each slot is a single allowed/not-allowed bit, the whole week fits in 42 bytes. And we aren't running any classification on these features (I think).
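To sanity-check the size, here is a minimal sketch of that output as a week of half-hour slots. The function names are made up, and it assumes each slot holds a single allowed/not-allowed flag and that slots are half-open (12:00 to 12:30, 12:30 to 1:00, and so on), which is a slight simplification of the ranges listed above.

```python
SLOTS_PER_DAY = 48  # 24 hours in half-hour slots
DAYS = ["sun", "mon", "tue", "wed", "thu", "fri", "sat"]


def empty_week():
    # Start with parking allowed in every slot of every day.
    return {day: [True] * SLOTS_PER_DAY for day in DAYS}


def apply_restriction(week, day, start_minute, end_minute):
    # Mark every slot overlapping [start_minute, end_minute) as not allowed.
    first = start_minute // 30
    last = (end_minute + 29) // 30  # round up to the next slot boundary
    for slot in range(first, min(last, SLOTS_PER_DAY)):
        week[day][slot] = False


def pack(week):
    # Pack the 7 * 48 = 336 flags into bytes, one bit per slot.
    bits = 0
    for day in DAYS:
        for allowed in week[day]:
            bits = (bits << 1) | int(allowed)
    return bits.to_bytes(len(DAYS) * SLOTS_PER_DAY // 8, "big")


# Made-up example: no parking on Monday from 8am to 6pm.
week = empty_week()
apply_restriction(week, "mon", 8 * 60, 18 * 60)
packed = pack(week)
```

Tracking "no standing" and "no stopping" separately from "no parking" would need more than one bit per slot, so the packed size would grow accordingly.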
---
Looked into parsing the times of day from the text. Some light Googling suggested I go with Stanford CoreNLP, a Java toolkit from Stanford's NLP group. I tried to install and use a Python wrapper for it, but the test failed. Might try another wrapper, might try to fix the failing test, might go straight for the real thing.
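In the meantime, a stopgap that does not use CoreNLP at all: a plain regular expression from Python's standard library. This is only a sketch. It assumes times appear in simple forms like "8AM", "12:30 pm", or "6 p.m.", and the sample sign text is made up. Anything fuzzier ("noon", "midnight", 24-hour times) is not handled.

```python
import re

# Hour 1-12, optional ":MM", optional whitespace, then am/pm with optional dots.
TIME_RE = re.compile(
    r"\b(1[0-2]|0?[1-9])(?::([0-5]\d))?\s*([ap])\.?m\.?",
    re.IGNORECASE,
)


def to_minutes(hour, minute, meridiem):
    # Convert a 12-hour clock reading to minutes after midnight.
    hour = int(hour) % 12          # 12am becomes 0, 12pm becomes 12 below
    if meridiem.lower() == "p":
        hour += 12
    return hour * 60 + int(minute or 0)


def parse_times(text):
    # Return every time of day found in the text, in order of appearance.
    return [to_minutes(*match.groups()) for match in TIME_RE.finditer(text)]


times = parse_times("NO PARKING 8AM - 6PM MON THRU FRI")
```

Pairing the parsed times up into start and end, and matching them to the right days, is a separate problem that this does not attempt.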