scikit-learn's CountVectorizer can build count features from N-grams of characters or from N-grams of words; the analyzer and ngram_range parameters select which kind and which sizes.
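
As a rough illustration of character versus word N-grams, here is a minimal sketch. The two-document corpus `docs` and all variable names are assumptions made up for this example, not part of the original snippet.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus, used only for illustration.
docs = ["spam ham", "ham ham"]

# Character 2-grams: analyzer="char" slides over raw characters, spaces included.
char_vec = CountVectorizer(analyzer="char", ngram_range=(2, 2))
char_counts = char_vec.fit_transform(docs)
char_vocab = sorted(char_vec.vocabulary_)

# Word 1-grams and 2-grams: analyzer="word" is the default analyzer.
word_vec = CountVectorizer(analyzer="word", ngram_range=(1, 2))
word_counts = word_vec.fit_transform(docs)
word_vocab = sorted(word_vec.vocabulary_)

print(char_vocab)
print(word_vocab)              # ['ham', 'ham ham', 'spam', 'spam ham']
print(word_counts.toarray())   # one row per document, one column per N-gram
```

Columns of the count matrices follow the sorted vocabulary, so `word_counts` has one column for each of the four word N-grams listed.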

Example: CountVectorizer with a list of lists

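from sklearn.feature_extraction.text import CountVectorizer

# Each document is already a list of tokens (here, a single string per document).
corpus = [["this is spam, 'SPAM'"], ["this is ham, 'HAM'"], ["this is nothing, 'NOTHING'"]]

# The identity tokenizer makes CountVectorizer use each inner list as-is.
# lowercase=False is needed because a list has no .lower() method.
bag_of_words = CountVectorizer(tokenizer=lambda doc: doc, lowercase=False).fit_transform(corpus)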
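
The identity-tokenizer pattern is easier to see when every document is split into several tokens. The following is a sketch under assumptions: `tokenized_docs` and the `identity` helper are hypothetical names introduced here, and the tokens are a hand-split version of the spam/ham sentences.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical pre-tokenized corpus: one list of tokens per document.
tokenized_docs = [
    ["this", "is", "spam", "SPAM"],
    ["this", "is", "ham", "HAM"],
]

def identity(tokens):
    # Return the token list unchanged so CountVectorizer skips its own splitting.
    return tokens

vectorizer = CountVectorizer(tokenizer=identity, lowercase=False)
counts = vectorizer.fit_transform(tokenized_docs)

# Recover the column order from the fitted vocabulary (token -> column index).
vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

print(vocab)             # ['HAM', 'SPAM', 'ham', 'is', 'spam', 'this']
print(counts.toarray())  # [[0 1 0 1 1 1], [1 0 1 1 0 1]]
```

Because lowercasing is disabled, "SPAM" and "spam" stay separate features. scikit-learn may warn that token_pattern is unused when a custom tokenizer is supplied; that warning is harmless here.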
