Apply CountVectorizer to column with list of words in rows in Python

I could not find another way to avoid the error, so I joined each list in the column into a single space-separated string:

train[col] = train[col].apply(lambda x: " ".join(x))
test[col] = test[col].apply(lambda x: " ".join(x))

Only after that did CountVectorizer return a result:

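cv = CountVectorizer()  # from sklearn.feature_extraction.text
X_train = cv.fit_transform(train[col])
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
X_train = pd.DataFrame(X_train.toarray(), columns=cv.get_feature_names_out())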

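To apply CountVectorizer directly to lists of words, replace the default analyzer with a callable that returns each list unchanged, so that no tokenization or preprocessing is applied.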

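from sklearn.feature_extraction.text import CountVectorizer

x = [['ab', 'cd'], ['ab', 'de']]
# Each row is already a list of tokens, so the analyzer returns it as-is.
vectorizer = CountVectorizer(analyzer=lambda doc: doc)
vectorizer.fit_transform(x).toarray()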

Out:
array([[1, 1, 0],
       [1, 0, 1]], dtype=int64)

Tags: Python, Word, Sparse Matrix, Bag, Countvectorizer
