Coreference Resolution using OpenNLP
I recently ran into the same problem and wrote up some blog notes for using OpenNLP 1.5.x tools. It's a bit dense to copy in its entirety, so here's a link with more details.
At a high level, you need to load the appropriate OpenNLP coreference model libraries and also the WordNet 3.0 dictionary. Given those dependencies, initializing the linker object is pretty straightforward:
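import opennlp.tools.coref.DefaultLinker;
import opennlp.tools.coref.Linker;
import opennlp.tools.coref.LinkerMode;
// LinkerMode should be TEST.
// Note: I first tried LinkerMode.EVAL before realizing that was the cause of my problem.
Linker _linker = new DefaultLinker("lib/opennlp/coref", LinkerMode.TEST);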
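For the WordNet dependency, my understanding is that the 1.5.x coref code looks up the dictionary directory through the `WNSEARCHDIR` system property. Treat that property name as an assumption to check against your OpenNLP version. The `WordNetSetup` class and the directory path are hypothetical. A minimal sketch that sets the property and reports whether the directory exists:

```java
import java.io.File;

class WordNetSetup {
    // Assumption: OpenNLP 1.5.x coref reads the WordNet "dict" directory
    // from this system property. Verify against your version.
    static final String WORDNET_PROPERTY = "WNSEARCHDIR";

    /**
     * Points the JVM-wide property at the given directory.
     * Returns true if the directory exists, so the caller can fail early
     * instead of hitting an error later when the linker is constructed.
     */
    static boolean configureWordNet(String dictDir) {
        System.setProperty(WORDNET_PROPERTY, dictDir);
        return new File(dictDir).isDirectory();
    }

    public static void main(String[] args) {
        // Hypothetical location of the unpacked WordNet 3.0 "dict" folder.
        String dictDir = args.length > 0 ? args[0] : "lib/wordnet/dict";
        if (!configureWordNet(dictDir)) {
            System.err.println("WordNet directory not found: " + dictDir);
        }
    }
}
```

This would need to run before the `DefaultLinker` is constructed. Passing `-DWNSEARCHDIR=...` on the `java` command line should have the same effect.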
Using the Linker, however, is a bit less obvious. You need to:
- Break the content down into sentences and the corresponding tokens
- Create a Parse object for each sentence
- Wrap each sentence Parse in a DefaultParse, passing the sentence index (idx) to indicate the sentence ordering:
final DefaultParse parseWrapper = new DefaultParse(parse, idx);
- Iterate over each sentence parse and use the Linker to get the Mention objects from each parse:
final Mention[] extents = _linker.getMentionFinder().getMentions(parseWrapper);
- Finally, use the Linker to identify the distinct entities across all of the Mention objects:
DiscourseEntity[] entities = _linker.getEntities(arrayOfAllMentions);
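The snippets above don't show how the per-sentence `extents` arrays end up in `arrayOfAllMentions`. Here is a minimal sketch of that collection loop. It is written generically so it runs without OpenNLP on the classpath: the `MentionCollector` class and its `collectMentions` helper are hypothetical names, and the strings in `main` are stand-ins for real `Parse` and `Mention` objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;

class MentionCollector {
    /**
     * Calls the mention finder once per sentence parse, in document order,
     * and flattens the per-sentence arrays into a single list.
     * The finder receives the parse and its sentence index.
     */
    static <P, M> List<M> collectMentions(List<P> sentenceParses,
                                          BiFunction<P, Integer, M[]> findMentions) {
        List<M> all = new ArrayList<>();
        for (int idx = 0; idx < sentenceParses.size(); idx++) {
            M[] extents = findMentions.apply(sentenceParses.get(idx), idx);
            all.addAll(Arrays.asList(extents));
        }
        return all;
    }

    public static void main(String[] args) {
        // Stand-in data: each "parse" is a sentence string, and the fake
        // finder returns the first word tagged with the sentence index.
        List<String> parses = Arrays.asList("Ann left", "She waved");
        List<String> mentions = collectMentions(parses,
                (s, idx) -> new String[] { idx + ":" + s.split(" ")[0] });
        System.out.println(mentions); // prints [0:Ann, 1:She]

        // With OpenNLP, the same helper would be wired up roughly as:
        //   List<Mention> all = collectMentions(sentenceParses,
        //       (parse, idx) -> _linker.getMentionFinder()
        //                              .getMentions(new DefaultParse(parse, idx)));
        //   DiscourseEntity[] entities =
        //       _linker.getEntities(all.toArray(new Mention[0]));
    }
}
```

The point is that `getEntities` is called once, on the mentions from all sentences together, rather than once per sentence.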
OpenNLP currently has little documentation on coreference resolution, apart from a very short mention in the readme of how to run it.
If you're not invested in using OpenNLP, then consider the Stanford CoreNLP package, which includes a Java example showing how to run it and how to perform coreference resolution. It also includes a page summarizing its performance and the papers published on the coreference package.