How to replace a word with its most representative mention using the Stanford CoreNLP coreference module
The challenge is to make sure that the token is not itself part of its representative mention. For example, the token "Judy" has "Judy 's" as its representative mention, so if you replace "Judy" inside the phrase "Judy 's", you end up with a doubled "'s" ("Judy 's 's").
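You can check whether the token is part of its representative mention by comparing indices. Replace the token only if its index is smaller than the startIndex of the representative mention or greater than or equal to its endIndex (endIndex is exclusive: it points one past the mention's last token), or if the token lies in a different sentence from the representative mention, because token indices restart at 1 in every sentence. Otherwise, keep the token.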
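A minimal sketch of that test, assuming the CoreNLP conventions described above: `CoreLabel.index()`, `CorefMention.startIndex` and `CorefMention.sentNum` are 1-based, and `CorefMention.endIndex` is exclusive (which is why the loop below runs `i < reprMent.endIndex`). The class and method names are hypothetical, and plain `int` parameters stand in for the CoreNLP objects so the logic can be checked on its own.

```java
/** Hypothetical helper: is a token inside a coreference mention? */
final class MentionSpan {

    private MentionSpan() {}

    /**
     * All numbers follow the CoreNLP conventions assumed above:
     * sentence numbers and token indices are 1-based, endIndex is exclusive.
     */
    static boolean isInside(int tokenSentNum, int tokenIndex,
                            int mentionSentNum, int startIndex, int endIndex) {
        // Token indices restart in every sentence, so a token in another
        // sentence can never be inside the mention.
        if (tokenSentNum != mentionSentNum) {
            return false;
        }
        // Half-open interval [startIndex, endIndex).
        return tokenIndex >= startIndex && tokenIndex < endIndex;
    }

    public static void main(String[] args) {
        // Sentence 1: "John drove to Judy 's house ." -> "Judy 's" spans tokens 4-5,
        // so startIndex = 4 and endIndex = 6.
        System.out.println(isInside(1, 4, 1, 4, 6)); // true:  "Judy"
        System.out.println(isInside(1, 6, 1, 4, 6)); // false: "house"
        // Sentence 2: "He made her dinner ." -> "her" is token 3 of sentence 2.
        System.out.println(isInside(2, 3, 1, 4, 6)); // false: other sentence
    }
}
```

Without the sentence comparison, "her" (token 3 of sentence 2) would be compared against tokens 4-5 of sentence 1, which only works by coincidence.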
The relevant part of your code will now look like this:
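```java
// reprMent.startIndex is inclusive and reprMent.endIndex is exclusive (both 1-based, like
// token.index()). Token indices restart in every sentence, so the sentence must match too.
// currentSentNum (a name introduced here) is the 1-based number of the sentence containing token.
boolean insideReprMent = currentSentNum == reprMent.sentNum
        && token.index() >= reprMent.startIndex
        && token.index() < reprMent.endIndex;
if (!insideReprMent) {
    // Substitute every token of the representative mention.
    for (int i = reprMent.startIndex; i < reprMent.endIndex; i++) {
        CoreLabel matchedLabel = corefSentenceTokens.get(i - 1);
        resolved.add(matchedLabel.word());
        newwords += matchedLabel.word() + " ";
    }
} else {
    // The token is part of its own representative mention: keep it.
    resolved.add(token.word());
}
```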
In addition, to speed things up, you can replace your first if-condition with:

```java
if (chain == null || chain.getMentionsInTextualOrder().size() == 1)
```

After all, if the coreference chain contains only one mention, that mention is the token's own, so there is no point looking for a representative mention.
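The shortcut can be illustrated without CoreNLP. In this sketch a hypothetical `Map<Integer, List<String>>` (cluster id to mentions in textual order) stands in for the real `Map<Integer, CorefChain>`, and the example chains are made up for illustration, not produced by CoreNLP.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the coreference lookup: cluster id -> mentions in textual order. */
final class ChainFilter {

    private ChainFilter() {}

    /** The negation of `chain == null || chain.getMentionsInTextualOrder().size() == 1`. */
    static boolean needsResolution(Map<Integer, List<String>> chains, Integer clusterId) {
        // Guard against a null id: Map.of maps reject get(null).
        List<String> mentions = clusterId == null ? null : chains.get(clusterId);
        // No chain, or a chain whose only mention is the token's own: nothing to substitute.
        return mentions != null && mentions.size() > 1;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> chains = Map.of(
                1, List.of("Tom", "a smart boy", "He"),
                2, List.of("a lot of thing"));
        System.out.println(needsResolution(chains, 1));    // true
        System.out.println(needsResolution(chains, 2));    // false: singleton chain
        System.out.println(needsResolution(chains, null)); // false: token has no cluster id
    }
}
```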
The complete method with both changes applied:

```java
// Place doTest inside a class; the imports go at the top of that file.
// Package names assume CoreNLP 3.7 or later; older releases keep CorefChain and
// CorefCoreAnnotations in edu.stanford.nlp.dcoref instead.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import edu.stanford.nlp.coref.CorefCoreAnnotations;
import edu.stanford.nlp.coref.data.CorefChain;
import edu.stanford.nlp.coref.data.CorefChain.CorefMention;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

private void doTest(String text) {
    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    Annotation doc = new Annotation(text);
    pipeline.annotate(doc);

    Map<Integer, CorefChain> corefs = doc.get(CorefCoreAnnotations.CorefChainAnnotation.class);
    List<CoreMap> sentences = doc.get(CoreAnnotations.SentencesAnnotation.class);
    List<String> resolved = new ArrayList<>();

    for (int sentIdx = 0; sentIdx < sentences.size(); sentIdx++) {
        List<CoreLabel> tokens = sentences.get(sentIdx).get(CoreAnnotations.TokensAnnotation.class);
        for (CoreLabel token : tokens) {
            Integer corefClustId = token.get(CorefCoreAnnotations.CorefClusterIdAnnotation.class);
            System.out.println(token.word() + " --> corefClusterID = " + corefClustId);
            CorefChain chain = corefs.get(corefClustId);
            System.out.println("matched chain = " + chain);

            // No chain, or a singleton chain: nothing to substitute.
            if (chain == null || chain.getMentionsInTextualOrder().size() == 1) {
                resolved.add(token.word());
                System.out.println("Adding the same word " + token.word());
            } else {
                CorefMention reprMent = chain.getRepresentativeMention();
                // sentNum is 1-based; the sentences list is 0-based.
                List<CoreLabel> corefSentenceTokens =
                        sentences.get(reprMent.sentNum - 1).get(CoreAnnotations.TokensAnnotation.class);
                System.out.println("reprMent: " + reprMent);
                System.out.println("Token index " + token.index()
                        + ", start index " + reprMent.startIndex
                        + ", end index " + reprMent.endIndex);

                // startIndex is inclusive and endIndex exclusive (both 1-based, like token.index()).
                // Indices restart in every sentence, so compare sentence numbers as well.
                boolean insideReprMent = sentIdx + 1 == reprMent.sentNum
                        && token.index() >= reprMent.startIndex
                        && token.index() < reprMent.endIndex;

                String newwords = "";
                if (!insideReprMent) {
                    for (int i = reprMent.startIndex; i < reprMent.endIndex; i++) {
                        CoreLabel matchedLabel = corefSentenceTokens.get(i - 1);
                        newwords += matchedLabel.word() + " ";
                        // Drop a possessive "'s" so "her" becomes "Judy", not "Judy 's".
                        if (!matchedLabel.word().equals("'s")) {
                            resolved.add(matchedLabel.word());
                        }
                    }
                } else {
                    resolved.add(token.word());
                }
                System.out.println("converting " + token.word() + " to " + newwords);
            }
            System.out.println("-----------------------------------------------------------------");
        }
    }
    System.out.println(String.join(" ", resolved));
}
```
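To check the replacement logic without loading the CoreNLP models, here is a minimal CoreNLP-free model of the loop above (Java 16+ for records). The `Mention` and `Chain` records are hypothetical stand-ins for `CorefMention` and `CorefChain`, and the cluster ids in `main` are hand-written to mirror the first example, not actual CoreNLP output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical, CoreNLP-free model of the doTest replacement loop. */
final class ToyResolver {

    private ToyResolver() {}

    /** Stand-in for CorefMention: 1-based sentNum and startIndex, exclusive endIndex. */
    record Mention(int sentNum, int startIndex, int endIndex) {}

    /** Stand-in for CorefChain: its representative mention and its number of mentions. */
    record Chain(Mention representative, int size) {}

    static String resolve(List<List<String>> sentences,
                          List<List<Integer>> clusterIds,
                          Map<Integer, Chain> chains) {
        List<String> resolved = new ArrayList<>();
        for (int s = 0; s < sentences.size(); s++) {
            List<String> words = sentences.get(s);
            for (int i = 0; i < words.size(); i++) {
                int sentNum = s + 1, index = i + 1; // CoreNLP-style 1-based numbering
                Integer id = clusterIds.get(s).get(i);
                Chain chain = id == null ? null : chains.get(id);
                if (chain == null || chain.size() == 1) { // no chain or singleton chain
                    resolved.add(words.get(i));
                    continue;
                }
                Mention rep = chain.representative();
                boolean inside = sentNum == rep.sentNum()
                        && index >= rep.startIndex() && index < rep.endIndex();
                if (inside) {
                    resolved.add(words.get(i)); // token belongs to its own representative mention
                } else {
                    List<String> repWords = sentences.get(rep.sentNum() - 1);
                    for (int k = rep.startIndex(); k < rep.endIndex(); k++) {
                        String w = repWords.get(k - 1);
                        if (!w.equals("'s")) { // drop the possessive marker
                            resolved.add(w);
                        }
                    }
                }
            }
        }
        return String.join(" ", resolved);
    }

    public static void main(String[] args) {
        List<List<String>> sentences = List.of(
                List.of("John", "drove", "to", "Judy", "'s", "house", "."),
                List.of("He", "made", "her", "dinner", "."));
        // Cluster id per token (null = no coreference); illustrative values only.
        List<List<Integer>> ids = List.of(
                Arrays.asList(1, null, null, 2, null, null, null),
                Arrays.asList(1, null, 2, null, null));
        Map<Integer, Chain> chains = Map.of(
                1, new Chain(new Mention(1, 1, 2), 2),  // John ... He
                2, new Chain(new Mention(1, 4, 6), 2)); // Judy 's ... her
        System.out.println(resolve(sentences, ids, chains));
        // John drove to Judy 's house . John made Judy dinner .
    }
}
```

"Judy" is kept because it lies inside its own representative mention, while "He" and "her" are in the second sentence and are therefore replaced.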
This produced the following output:
John drove to Judy’s house. He made her dinner. -----> John drove to Judy 's house . John made Judy dinner .
Tom is a smart boy. He know a lot of thing. -----> Tom is a smart Tom . Tom know a lot of thing .