Coreference Resolution using OpenNLP

I recently ran into the same problem and wrote up some blog notes on using the OpenNLP 1.5.x tools. They are a bit dense to copy in their entirety, so here's a link with more details.


At a high level, you need to load the appropriate OpenNLP coreference model libraries and also the WordNet 3.0 dictionary. Given those dependencies, initializing the linker object is pretty straightforward:

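    import opennlp.tools.coref.DefaultLinker;
    import opennlp.tools.coref.Linker;
    import opennlp.tools.coref.LinkerMode;
    // LinkerMode should be TEST; I tried LinkerMode.EVAL before realizing that this was the problem.
    // The DefaultLinker constructor throws IOException if the models in this directory can't be loaded.
    Linker _linker = new DefaultLinker("lib/opennlp/coref", LinkerMode.TEST);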
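The linker also has to find the WordNet 3.0 dictionary. As far as I know, the 1.5.x coref code reads the dictionary location from the WNSEARCHDIR system property, so it helps to check both directories and set that property before constructing the linker. The sketch below assumes that property name; the helper prepareCorefResources and the paths in the usage comment are hypothetical placeholders, and the OpenNLP calls are left as comments so the block compiles without OpenNLP on the classpath.

```java
import java.io.File;

class CorefSetup {
    // Hypothetical helper: checks that the coref model directory and the
    // WordNet 3.0 "dict" directory exist, then exposes the WordNet location
    // through the WNSEARCHDIR system property (assumed to be where the
    // OpenNLP 1.5.x coref code looks for the dictionary).
    static void prepareCorefResources(String corefModelDir, String wordnetDictDir) {
        requireDirectory(corefModelDir, "coref model directory");
        requireDirectory(wordnetDictDir, "WordNet dictionary directory");
        // Only set once both directories have been validated.
        System.setProperty("WNSEARCHDIR", wordnetDictDir);
    }

    // Fails fast with a readable message instead of failing later inside the linker.
    private static void requireDirectory(String path, String description) {
        if (!new File(path).isDirectory()) {
            throw new IllegalArgumentException(description + " not found: " + path);
        }
    }

    // Hypothetical usage (placeholder paths; needs OpenNLP 1.5.x to compile):
    //
    //   CorefSetup.prepareCorefResources("lib/opennlp/coref", "lib/wordnet/dict");
    //   Linker _linker = new DefaultLinker("lib/opennlp/coref", LinkerMode.TEST);
}
```

Checking the paths up front is just a design choice: a missing model or dictionary directory then shows up as a clear message at startup rather than somewhere inside the coreference code.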

Using the Linker, however, is a bit less obvious. You need to:

  1. Break the content down into sentences and the corresponding tokens
  2. Create a Parse object for each sentence
  3. Wrap each sentence Parse in a DefaultParse, passing the sentence's index (idx) so that the Linker knows the sentence ordering:

    final DefaultParse parseWrapper = new DefaultParse(parse, idx);
  4. Iterate over each sentence parse and use the Linker to get the Mention objects from each parse:

    final Mention[] extents =
       _linker.getMentionFinder().getMentions(parseWrapper);
  5. Finally, gather the Mention objects from every sentence into one array and use the Linker to identify the distinct entities across all of them:

    DiscourseEntity[] entities = _linker.getEntities(arrayOfAllMentions);
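Steps 3 to 5 hinge on keeping the sentence order consistent: each DefaultParse gets its sentence index, and arrayOfAllMentions is a single array holding the mentions from every sentence. Below is a minimal, generic sketch of that collection step; collectInOrder is a hypothetical helper, not part of OpenNLP, and the commented usage assumes a hypothetical parses array holding one Parse per sentence in document order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.IntFunction;

class MentionCollector {
    // Concatenates per-sentence arrays into one array, keeping sentence order
    // (all of sentence 0, then all of sentence 1, ...). With OpenNLP, T would
    // be Mention and newArray would be Mention[]::new.
    static <T> T[] collectInOrder(List<T[]> perSentence, IntFunction<T[]> newArray) {
        List<T> all = new ArrayList<>();
        for (T[] sentenceMentions : perSentence) {
            Collections.addAll(all, sentenceMentions);
        }
        return all.toArray(newArray.apply(all.size()));
    }

    // Hypothetical usage with OpenNLP 1.5.x on the classpath (not compiled here):
    //
    //   List<Mention[]> perSentence = new ArrayList<>();
    //   for (int idx = 0; idx < parses.length; idx++) {
    //       DefaultParse parseWrapper = new DefaultParse(parses[idx], idx);
    //       perSentence.add(_linker.getMentionFinder().getMentions(parseWrapper));
    //   }
    //   Mention[] arrayOfAllMentions = collectInOrder(perSentence, Mention[]::new);
    //   DiscourseEntity[] entities = _linker.getEntities(arrayOfAllMentions);
}
```

Passing all mentions to a single getEntities call, rather than one call per sentence, is what lets the Linker resolve a pronoun in one sentence to a name in an earlier one.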

There is little documentation on coreference resolution in OpenNLP at the moment, apart from a very short note in the readme on how to run it.

If you're not invested in using OpenNLP, consider the Stanford CoreNLP package instead. It comes with a Java example of how to run it, including how to perform coreference resolution. It also includes a page summarizing its performance and listing the papers published on the coreference package.
