Named entity recognition in Spacy
In Spacy version 3 the Transformers from Hugging Face are fine-tuned to the operations that Spacy provided in previous versions, but with better results.
Transformers are currently (2020) the state-of-art in Natural Language Processing, i.e generally we had (one-hot-encode -> word2vec -> glove | fast text) then (recurrent neural network, recursive neural network, gated recurrent unit, long short-term memory, bi-directional long short-term memory, etc) and now Transformers + Attention (BERT, RoBERTa, XLNet, XLM, CTRL, AlBERT, T5, Bart, GPT, GPT-2, GPT-3) - This is just to give context for 'why' you should consider Transformers, I know that there are lots of stuff that I didn't mention like Fuzz, Knowledge Graph and so on
Install the dependencies:
sudo apt install libncurses5
pip install spacy-transformers --pre -f https://download.pytorch.org/whl/torch_stable.html
pip install spacy-nightly # I'm using 3.0.0rc2
Download a model:
python -m spacy download en_core_web_trf # English Transformer pipeline, Roberta base
Here's a list of available models.
And then use it as you would normally do:
import spacy
text = 'Type something here which can be related to something, e.g Stack Over Flow organization'
nlp = spacy.load('en_core_web_trf')
document = nlp(text)
print(document.ents)
References:
Learn about Transformers and Attention.
Read a summary about the different Trasnformers architectures.
Learn about the Transformers fine-tune done by Spacy.
As per spacy documentation for Name Entity Recognition here is the way to extract name entity
import spacy
nlp = spacy.load('en') # install 'en' model (python3 -m spacy download en)
doc = nlp("Alphabet is a new startup in China")
print('Name Entity: {0}'.format(doc.ents))
ResultName Entity: (China,)
To make "Alphabet" a 'Noun' append it with "The".
doc = nlp("The Alphabet is a new startup in China")
print('Name Entity: {0}'.format(doc.ents))
Name Entity: (Alphabet, China)