Iterating through a string word by word
When you do -
for word in string:
You are not iterating through the words in the string, you are iterating through the characters in the string. To iterate through the words, you would first need to split the string into words , using str.split()
, and then iterate through that . Example -
my_string = "this is a string"
for word in my_string.split():
print (word)
Please note, str.split()
, without passing any arguments splits by all whitespaces (space, multiple spaces, tab, newlines, etc).
for word in string.split():
print word
This is one way to do it:
string = "this is a string"
ssplit = string.split()
for word in ssplit:
print (word)
Output:
this
is
a
string
Using nltk.
from nltk.tokenize import sent_tokenize, word_tokenize
sentences = sent_tokenize("This is a string.")
words_in_each_sentence = word_tokenize(sentences)
You may use TweetTokenizer for parsing casual text with emoticons and such.