Coreference Resolution using OpenNLP

I recently ran into the same problem and wrote up some blog notes on using the OpenNLP 1.5.x tools. The write-up is too dense to copy here in its entirety, so here's a link with more details.

At a high level, you need to load the appropriate OpenNLP coreference model libraries and also the WordNet 3.0 dictionary. Given those dependencies, initializing the linker object is pretty straightforward:

// LinkerMode should be TEST.
// Note: I first tried LinkerMode.EVAL, and it took a while to realize that the mode was the problem.
Linker _linker = new DefaultLinker("lib/opennlp/coref", LinkerMode.TEST);
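For the WordNet dependency, here is a minimal sketch of one way to point the coreference code at the dictionary. It assumes, from my recollection of the 1.5.x sources and readme, that the WordNet `dict` directory is looked up through the `WNSEARCHDIR` system property; verify that against your version. The class name, method name, and the `lib/wordnet/dict` path are hypothetical.

```java
// Sketch only: sets the system property that (by assumption) the
// OpenNLP 1.5.x coref code reads to locate the WordNet 3.0 "dict" directory.
final class WordNetConfig {

    // Assumed property name; check it against your OpenNLP version.
    static final String WORDNET_PROPERTY = "WNSEARCHDIR";

    /** Sets the property and returns the value now in effect. */
    static String configure(String dictDir) {
        System.setProperty(WORDNET_PROPERTY, dictDir);
        return System.getProperty(WORDNET_PROPERTY);
    }
}
```

Call `WordNetConfig.configure("lib/wordnet/dict")` before constructing the linker, or pass the equivalent `-DWNSEARCHDIR=lib/wordnet/dict` on the JVM command line.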

Using the Linker, however, is a bit less obvious. You need to:

Break the content down into sentences and the corresponding tokens
Create a Parse object for each sentence
Wrap each sentence Parse so as to indicate the sentence ordering:
```
final DefaultParse parseWrapper = new DefaultParse(parse, idx);
```
Iterate over each sentence parse and use the Linker to get the Mention objects from each parse:
```
final Mention[] extents =
   _linker.getMentionFinder().getMentions(parseWrapper);
```
Finally, use the Linker to identify the distinct entities across all of the Mention objects:
```
DiscourseEntity[] entities = _linker.getEntities(arrayOfAllMentions);
```
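The order of the steps matters: mentions are collected sentence by sentence, but the entity linking happens once, over the mentions from all sentences. The sketch below shows only that control flow. To keep it runnable without the OpenNLP jars and models, the three library calls are passed in as functions; `CorefFlowSketch`, `resolve`, and `demo` are hypothetical names, and the toy "mention finder" (capitalized words) and "linker" (group by identical text) are stand-ins, not what OpenNLP does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

final class CorefFlowSketch {

    /**
     * wrap         stands in for: new DefaultParse(parse, idx)
     * findMentions stands in for: _linker.getMentionFinder().getMentions(parseWrapper)
     * linkEntities stands in for: _linker.getEntities(arrayOfAllMentions)
     */
    static <P, W, M, E> List<E> resolve(
            List<P> sentenceParses,
            BiFunction<P, Integer, W> wrap,
            Function<W, List<M>> findMentions,
            Function<List<M>, List<E>> linkEntities) {
        List<M> allMentions = new ArrayList<>();
        for (int idx = 0; idx < sentenceParses.size(); idx++) {
            // The index records the sentence's position in the document.
            W wrapper = wrap.apply(sentenceParses.get(idx), idx);
            allMentions.addAll(findMentions.apply(wrapper));
        }
        if (allMentions.isEmpty()) {
            return new ArrayList<>();
        }
        // One linking call over the mentions from every sentence.
        return linkEntities.apply(allMentions);
    }

    /** Toy run with strings in place of parses, mentions, and entities. */
    static List<String> demo() {
        List<String> sentences = Arrays.asList("Alice met Bob", "then Alice left");
        return CorefFlowSketch.<String, String, String, String>resolve(
            sentences,
            (sentence, idx) -> idx + ":" + sentence,
            wrapped -> {
                String[] parts = wrapped.split(":", 2);
                List<String> mentions = new ArrayList<>();
                for (String word : parts[1].split(" ")) {
                    if (Character.isUpperCase(word.charAt(0))) {
                        mentions.add(word + "@" + parts[0]);
                    }
                }
                return mentions;
            },
            mentions -> {
                // Group mentions that share the same surface text.
                Map<String, List<String>> groups = new LinkedHashMap<>();
                for (String m : mentions) {
                    String text = m.substring(0, m.indexOf('@'));
                    groups.computeIfAbsent(text, k -> new ArrayList<>()).add(m);
                }
                List<String> entities = new ArrayList<>();
                for (Map.Entry<String, List<String>> e : groups.entrySet()) {
                    entities.add(e.getKey() + "=" + e.getValue());
                }
                return entities;
            });
    }
}
```

Against the real library, the three functions would be replaced by the `DefaultParse` constructor and the two `_linker` calls shown in the steps above, with the collected mentions converted to a `Mention[]` before the final call.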

There is little coreference resolution documentation for OpenNLP at the moment except for a very short mention of how to run it in the readme.

If you're not invested in using OpenNLP, consider the Stanford CoreNLP package, which includes a Java example showing how to run it, including how to perform coreference resolution. It also includes a page summarizing its performance and the papers published on the coreference package.

