Monday, April 27, 2015

Trying to get more data

I tried running the parser again on the entire dataset. This time I removed the "dcoref" annotator, because the coreference stuff sounded like it is about tying things together across sentences, which for my case is obviously unnecessary. I let it run, and came back to find that it had run out of memory (garbage collection overhead limit exceeded, etc.). It says it ran for 2 hours 26 minutes; I'll assume that's true.

I could try to run the parser code myself, doing each sentence one at a time, and printing out a progress report.
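A rough sketch of what that one-sentence-at-a-time loop could look like, in Python. This is only an illustration: `parse_with_progress`, `parse_one`, and `report_every` are made-up names, and `parse_one` stands in for whatever call actually parses a single sentence. It is written as a generator so results can be handled (written to disk, say) as they come, rather than piling up in memory.

```python
import sys
import time


def parse_with_progress(sentences, parse_one, report_every=1000, out=sys.stderr):
    """Parse sentences one at a time, yielding (sentence, result) pairs.

    `parse_one` is a placeholder for the real single-sentence parser call.
    A progress line is printed after every `report_every` sentences.
    """
    start = time.monotonic()
    for count, sentence in enumerate(sentences, start=1):
        # Hand each result back immediately instead of collecting them all.
        yield sentence, parse_one(sentence)
        if count % report_every == 0:
            elapsed = time.monotonic() - start
            print(f"{count} sentences parsed in {elapsed:.0f}s", file=out)
```

Something like `for sentence, tree in parse_with_progress(lines, my_parser): ...` would then show how far the run got before anything went wrong.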

In other news, I tackled the simple task of adding "am" or "pm" when they weren't present.
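For illustration, here is a minimal sketch of that kind of fix-up in Python. The rule for choosing between "am" and "pm" isn't described above, so this version just takes the default as a parameter; `ensure_meridiem` and `default` are hypothetical names, not the actual code.

```python
import re

# Matches am / pm / a.m. / p.m. in any case, but not as part of a longer word.
_HAS_MERIDIEM = re.compile(r"(?<![a-z])[ap]\.?m\.?(?![a-z])", re.IGNORECASE)


def ensure_meridiem(time_text, default="pm"):
    """Return time_text unchanged if it already has am/pm, else append `default`."""
    if _HAS_MERIDIEM.search(time_text):
        return time_text
    return f"{time_text.rstrip()} {default}"
```

So `ensure_meridiem("3:30")` gives `"3:30 pm"`, while `"3:30pm"` and `"7 AM"` come back untouched.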

