NLTK available languages for stopwords

When you import the stopwords using:

from nltk.corpus import stopwords
english_stopwords = stopwords.words('english')

you are retrieving the stopwords for a given fileid (the language name). To see all available stopword languages, retrieve the list of fileids:

from nltk.corpus import stopwords
print(stopwords.fileids())

In the case of NLTK v3.4.5, this returns 23 languages:

['arabic', 
 'azerbaijani', 
 'danish', 
 'dutch', 
 'english', 
 'finnish', 
 'french', 
 'german', 
 'greek',
 'hungarian', 
 'indonesian', 
 'italian', 
 'kazakh', 
 'nepali', 
 'norwegian', 
 'portuguese', 
 'romanian', 
 'russian', 
 'slovene', 
 'spanish', 
 'swedish', 
 'tajik', 
 'turkish']
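
Since stopwords.words() expects one of these fileids, it can help to validate a user-supplied name before loading. This is a minimal sketch, assuming the fileids are lowercase as in the list above; pick_language and load_stopwords are hypothetical helper names, not part of NLTK:

```python
def pick_language(requested, available):
    """Return the normalized language name if it is available, else raise ValueError."""
    name = requested.strip().lower()
    if name not in available:
        choices = ", ".join(sorted(available))
        raise ValueError(f"No stopword list for {requested!r}; choose from: {choices}")
    return name


def load_stopwords(requested):
    """Load the stopword list for a language after validating its name."""
    # Imported lazily so pick_language can be used without NLTK installed.
    from nltk.corpus import stopwords
    return stopwords.words(pick_language(requested, stopwords.fileids()))
```

With NLTK installed and the stopwords corpus downloaded, load_stopwords(' English ') should behave like stopwords.words('english'), while an unsupported name fails with a message listing the valid choices.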

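Alternatively, list the corpus directory directly (the path below is from a Linux install running as root; yours may differ):

import os
os.listdir('/root/nltk_data/corpora/stopwords/')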

['hungarian',
 'swedish',
 'kazakh',
 'norwegian',
 'finnish',
 'arabic',
 'indonesian',
 'portuguese',
 'turkish',
 'azerbaijani',
 'slovene',
 'spanish',
 'danish',
 'nepali',
 'romanian',
 'greek',
 'dutch',
 'README',
 'tajik',
 'german',
 'english',
 'russian',
 'french',
 'italian']
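
The directory listing includes a README and depends on a hard-coded path. A hedged sketch of a more portable variant follows; it assumes the corpus has been downloaded and unzipped to a regular directory, and language_files and find_stopwords_dir are hypothetical helper names:

```python
import os


def language_files(stopwords_dir):
    """Return the sorted language file names in a stopwords directory, skipping README."""
    return sorted(
        name
        for name in os.listdir(stopwords_dir)
        if name != "README" and os.path.isfile(os.path.join(stopwords_dir, name))
    )


def find_stopwords_dir():
    """Ask NLTK where the stopwords corpus lives instead of hard-coding the path."""
    # Imported lazily; requires NLTK and an unzipped stopwords corpus.
    import nltk
    return nltk.data.find("corpora/stopwords").path
```

Calling language_files(find_stopwords_dir()) should then give the same names as stopwords.fileids(), without the README entry.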

First check whether you have downloaded the NLTK data packages.
If not, you can download them with:

import nltk
nltk.download()
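
Calling nltk.download() with no arguments opens the interactive downloader. If only the stopword lists are needed, nltk.download('stopwords') fetches just that package. The sketch below downloads it only when it is missing; ensure_stopwords is a hypothetical helper, and its find and download parameters exist only so the lookup and download functions can be substituted:

```python
def ensure_stopwords(find=None, download=None):
    """Download the stopwords corpus only if it is missing.

    Returns True if a download was triggered, False otherwise.
    """
    if find is None or download is None:
        import nltk  # imported lazily; requires NLTK to be installed
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        # nltk.data.find raises LookupError when the resource is absent.
        find("corpora/stopwords")
    except LookupError:
        download("stopwords")
        return True
    return False
```

Calling ensure_stopwords() once at startup avoids re-downloading on every run.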

After this, you can find the stopword language files at the path below (on Windows):

C:/Users/username/AppData/Roaming/nltk_data/corpora/stopwords

There are 21 languages supported in the version I installed (I installed NLTK a few days ago, so this number should be up to date). You can pass a file name as the parameter to

nltk.corpus.stopwords.words('language')
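
Once a list is loaded, a typical use is filtering tokens. This is a minimal sketch, assuming simple case-insensitive matching is acceptable; remove_stopwords is a hypothetical helper, not an NLTK function:

```python
def remove_stopwords(tokens, stop_list):
    """Return the tokens whose lowercase form is not in the stopword list."""
    stop_set = set(stop_list)  # a set makes each membership check fast
    return [token for token in tokens if token.lower() not in stop_set]
```

With NLTK installed and the corpus downloaded, this could be called as remove_stopwords(text.split(), nltk.corpus.stopwords.words('english')).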
