Tuesday, March 17, 2015

Bah

Seems like the NLP parser treats a "sentence" as anything that ends in a period. That makes sense in a natural language context. Unfortunately my data isn't formatted that way, but it should be easy to add a period to the end of every entry.
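Here is a rough sketch of how the periods could be added before parsing. It assumes one entry per line in the input, which may not match the real data. The function name add_periods is made up for illustration, and leaving lines that already end in ".", "!" or "?" alone is my own choice.

```python
def add_periods(text: str) -> str:
    """Append a period to each non-empty line that lacks end punctuation.

    Assumption: every line of the input is one entry that should be
    treated as its own sentence.
    """
    fixed = []
    for line in text.splitlines():
        stripped = line.rstrip()
        # Leave blank lines and already-punctuated lines alone.
        if stripped and stripped[-1] not in ".!?":
            stripped += "."
        fixed.append(stripped)
    # Note: a trailing newline in the input is not preserved.
    return "\n".join(fixed)


if __name__ == "__main__":
    sample = "first entry\nsecond entry.\n\nthird entry"
    print(add_periods(sample))
```

The result could then be written back to input.txt (or to a copy of it) before running the parser.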

For easier access later, this is the command I'm using to parse the file:

java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -file input.txt

Notice that the -annotators option is gone. I'm not sure why I need to remove it. Either I'm using it wrong, or there's something wrong with the tool.
