Monday, January 12, 2015

Feature vectors

Continuing along the thread of labels, it's natural to think of feature vectors. Each label would represent a dimension in the vector. 

Features
  • starting time
  • ending time
  • applicable Sunday
  • applicable Monday
  • applicable Tuesday
  • applicable Wednesday
  • applicable Thursday
  • applicable Friday
  • applicable Saturday
  • applicable on holiday
  • "no parking" or "no standing" or "no stopping"
11 features so far. Is that too many?
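
A rough sketch of how those 11 features could be packed into one vector in Python. The names here (`SignRule`, `to_vector`, the minutes-after-midnight encoding, and encoding the restriction type as an index) are assumptions for illustration, not anything settled:

```python
from dataclasses import dataclass
from typing import List

# Assumed encoding for the last feature: position in this list.
RESTRICTIONS = ["no parking", "no standing", "no stopping"]


@dataclass
class SignRule:
    start_minute: int   # starting time, minutes after midnight (8:00am -> 480)
    end_minute: int     # ending time, minutes after midnight
    days: List[bool]    # seven flags, Sunday first
    holiday: bool       # applicable on holiday
    restriction: str    # one of RESTRICTIONS

    def to_vector(self) -> List[int]:
        """Flatten the rule into an 11-element feature vector."""
        return (
            [self.start_minute, self.end_minute]
            + [int(d) for d in self.days]
            + [int(self.holiday), RESTRICTIONS.index(self.restriction)]
        )


# Hypothetical sign: no parking, 8am to 6pm, Monday through Friday.
rule = SignRule(480, 1080, [False, True, True, True, True, True, False],
                False, "no parking")
vec = rule.to_vector()
print(len(vec), vec)
```

Two times, seven day flags, one holiday flag, and one restriction type give the 11 dimensions listed above.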

I've also thought about how to best represent the output, or even what it should be. Went through a few iterations before settling on this for now:
  • Sunday rules
    • 12am - 12:30am - parking allowed
    • 12:31am - 1:00am - parking allowed
    • (etc)
    • 11:31pm - 11:59pm - parking allowed
  • Monday rules
    • (etc)
  • (all other days of the week)
This is a very high-dimensional vector: 48 half-hour slots × 7 days = 336 entries. That shouldn't be an issue, though. If each slot is a single allowed/not-allowed bit, the whole week fits in 336 bits, or 42 bytes. And we aren't running any classification on these features (I think).
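
To get a feel for the size, here is a minimal sketch of the weekly output as a packed bit array, assuming one bit per half-hour slot (1 = parking allowed) and Sunday as day 0. The helper names are made up for this example, and the slots are treated as clean half-hour blocks starting on the hour and half hour, which is a simplification of the ranges listed above:

```python
SLOTS_PER_DAY = 48   # half-hour slots
DAYS_PER_WEEK = 7    # Sunday = 0 ... Saturday = 6


def slot_index(day: int, minute: int) -> int:
    """Map (day, minutes after midnight) to a position in the weekly vector."""
    return day * SLOTS_PER_DAY + minute // 30


def blank_week() -> bytearray:
    """All bits set: parking allowed everywhere until a rule says otherwise."""
    return bytearray([0xFF] * (SLOTS_PER_DAY * DAYS_PER_WEEK // 8))


def set_allowed(week: bytearray, day: int, minute: int, allowed: bool) -> None:
    byte, bit = divmod(slot_index(day, minute), 8)
    if allowed:
        week[byte] |= 1 << bit
    else:
        week[byte] &= ~(1 << bit) & 0xFF


def is_allowed(week: bytearray, day: int, minute: int) -> bool:
    byte, bit = divmod(slot_index(day, minute), 8)
    return bool((week[byte] >> bit) & 1)


week = blank_week()
# Hypothetical rule: no parking Monday 8:00am to 9:00am (two slots).
for minute in range(480, 540, 30):
    set_allowed(week, 1, minute, False)

print(len(week), is_allowed(week, 1, 480), is_allowed(week, 1, 540))
```

A plain list of 336 booleans would work just as well; the packing only matters if storage size does.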

---
Looked into parsing the times of day from the text. Some light Googling suggested I go with Stanford CoreNLP, a Java library. I tried to install and use a Python wrapper for it, but the test failed. Might try another wrapper, might try to fix the problem with the test, might go straight for the real thing.
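
For simple cases, a plain regular expression might be enough to pull clock times out of sign text. This is not CoreNLP, just a stand-in sketch using Python's `re` module; the pattern, the `find_times` helper, and the sample sign strings are all assumptions, and it only handles 12-hour times with an am/pm suffix:

```python
import re
from typing import List

# Matches "8AM", "8 am", "7:30 pm"; does not handle "noon", "a.m.", or 24-hour times.
TIME_RE = re.compile(r"\b(\d{1,2})(?::(\d{2}))?\s*([ap])m\b", re.IGNORECASE)


def find_times(text: str) -> List[int]:
    """Return every clock time found in text as minutes after midnight."""
    minutes = []
    for hour, minute, meridiem in TIME_RE.findall(text):
        h, m = int(hour), int(minute or 0)
        if not (1 <= h <= 12 and m < 60):
            continue  # skip things like "13am" or "8:75pm"
        h = h % 12 + (12 if meridiem.lower() == "p" else 0)
        minutes.append(h * 60 + m)
    return minutes


# Made-up sign text for illustration.
print(find_times("NO PARKING 8AM-6PM MON THRU FRI"))
print(find_times("No standing 7:30 am - 9:30 am"))
```

Day names, "except Sunday", and the like would still need separate handling, which is where a real NLP toolkit might earn its keep.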
