Parse text to get the proper nouns (names and organizations) - python nltk

There is a better way to extract the names of people and organizations:

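from nltk import pos_tag, ne_chunk
from nltk.tokenize import SpaceTokenizer
from nltk.tree import Tree

# `sentence` is the input text (a str).
tokenizer = SpaceTokenizer()
toks = tokenizer.tokenize(sentence)
pos = pos_tag(toks)
chunked_nes = ne_chunk(pos)

# Named entities are the subtrees of the chunked result; join each one's tokens into a string.
nes = [' '.join(word for word, tag in ne.leaves()) for ne in chunked_nes if isinstance(ne, Tree)]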

However, all named entity recognizers make errors. If you really don't want to miss any proper name, you could keep a dictionary of proper names and check whether each candidate name is in it.
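
A minimal sketch of the dictionary lookup, assuming a small hand-made set of lowercase names. The set `KNOWN_NAMES` and the function `find_known_names` are hypothetical names for illustration; a real list would be much larger and loaded from a file.

```python
import re

# Hypothetical lookup table of proper names, stored in lowercase.
KNOWN_NAMES = {
    "barack obama",
    "united nations",
    "google",
}


def find_known_names(text, known_names=KNOWN_NAMES):
    """Return the known names found in text, in order of appearance.

    At each position the longest possible match is tried first, so
    multi-word names win over any shorter name they contain.
    """
    # Split into words, keeping inner hyphens and apostrophes.
    words = re.findall(r"\w+(?:[-']\w+)*", text)
    max_words = max((len(name.split()) for name in known_names), default=0)

    found = []
    i = 0
    while i < len(words):
        for size in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + size])
            if candidate.lower() in known_names:
                found.append(candidate)
                i += size  # skip past the matched words
                break
        else:
            i += 1  # no known name starts at this word
    return found


print(find_known_names(
    "Barack Obama spoke at the United Nations, then visited Google."
))
```

The lookup is case-insensitive and only finds names that are already in the set, so it is probably most useful as a complement to the NER output above rather than a replacement for it.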


You might also want to have a look at python-nameparser. It also tries to guess the capitalization of names. Sorry for the incomplete answer, but I don't have much experience using python-nameparser.

Best of luck!

Tags:

Python

Nltk