I tried running the parser again on the entire dataset. This time I removed the "dcoref" annotator, because it sounded like the coreference stuff is about tying things together between sentences, which for my case is obviously unnecessary. I let it run, and came back to find that it had run out of memory (garbage collection overhead limit exceeded, etc.). It says it ran for 2 hours 26 minutes, and I'll assume that's true.
I could try to run the parser code myself, doing each sentence one at a time, and printing out a progress report.
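A rough sketch of what that one-sentence-at-a-time loop might look like. It is written in Python only for brevity, and `parse_fn` is a hypothetical stand-in for whatever actually calls the parser; the reporting interval is also just an assumption. The idea is that a single bad sentence gets logged instead of killing the whole run, and results are handed back one at a time so nothing has to pile up in memory.

```python
import time


def parse_one_at_a_time(sentences, parse_fn, report_every=100, log=print):
    """Yield (index, sentence, result) for each sentence, logging progress.

    parse_fn is a hypothetical placeholder for the real parser call.
    """
    start = time.monotonic()
    for i, sentence in enumerate(sentences, start=1):
        try:
            result = parse_fn(sentence)
        except Exception as exc:
            # Record the failure and keep going rather than losing the run.
            result = None
            log(f"sentence {i} failed: {exc!r}")
        yield i, sentence, result
        if i % report_every == 0:
            elapsed = time.monotonic() - start
            log(f"{i} sentences parsed in {elapsed:.1f}s")
```

Because this is a generator, each result can be written to disk as it arrives, and a progress line shows up every `report_every` sentences, so a run that stalls or dies at least shows how far it got.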
In other news, I tackled the simple task of adding "am" or "pm" when they weren't present.
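For illustration only, here is one way a fix like that could be sketched. The guessing rule is purely an assumption (hours 8 through 11 are treated as morning, and 12 and 1 through 7 as afternoon or evening), and `add_meridiem` and `earliest_am_hour` are made-up names for this example.

```python
import re

# Matches a bare time such as "7", "7:30", or "7:30 pm".
TIME_RE = re.compile(r"^\s*(\d{1,2})(?::\d{2})?\s*(am|pm)?\s*$", re.IGNORECASE)


def add_meridiem(time_text, earliest_am_hour=8):
    """Append a guessed "am"/"pm" when time_text has none.

    Assumed heuristic: earliest_am_hour..11 -> "am", everything else -> "pm".
    Text that is not a 12-hour clock time is returned unchanged.
    """
    match = TIME_RE.match(time_text)
    if match is None:
        return time_text  # not something this sketch understands
    if match.group(2):
        return time_text  # already has am/pm
    hour = int(match.group(1))
    if not 1 <= hour <= 12:
        return time_text  # e.g. 24-hour times are left alone
    guess = "am" if earliest_am_hour <= hour <= 11 else "pm"
    return f"{time_text.strip()} {guess}"
```

With that assumed cutoff, "9:30" would become "9:30 am" and "3" would become "3 pm", while anything that already carries "am" or "pm" passes through untouched.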