How do I create my own NLTK text from a text file?

For a structured import of multiple files:

from nltk.corpus import PlaintextCorpusReader

# RegEx or list of file names
files = ".*\.txt"

corpus0 = PlaintextCorpusReader("/path/", files)
corpus  = nltk.Text(corpus0.words())

see: NLTK 3 book / section 1.9

Found the answer myself. That's embarrassing. Or awesome.

From Ch. 3:

f=open('my-file.txt','rU')
raw=f.read()
tokens = nltk.word_tokenize(raw)
text = nltk.Text(tokens)

Does the trick.

How do I create my own NLTK text from a text file?

Tags:

Python

Nltk

Related

Recent Posts