Train Model fails because 'list' object has no attribute 'lower'
Apply .astype(str) to the text column:
X = df.text.astype(str)
I had a similar problem, but instead of extracting values using .loc[] or .iloc[], I simply used
X = df.text
y = df.target
which converts the dataframe column to a Series that has a list in each row, with the tokenized items as objects inside each list. The series looked similar to what Alex had:
print(X)
So, only .astype(str) worked for me.
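A minimal sketch of what happens here, assuming a small made-up DataFrame (the text and target values below are hypothetical, not the original data). It reproduces the error on a Series of token lists and then applies .astype(str), which turns each list into its string representation so the vectorizer's default tokenizer can split it again:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical data: each row of "text" is already a list of tokens.
df = pd.DataFrame({
    "text": [["hello", "world"], ["again", "hello", "world"]],
    "target": [0, 1],
})

X = df.text
y = df.target

vectorizer = TfidfVectorizer()
error_message = None
try:
    # The default preprocessor calls .lower() on each document,
    # which fails because each document is a list, not a string.
    vectorizer.fit_transform(X)
except AttributeError as err:
    error_message = str(err)

# astype(str) converts each list to text such as "['hello', 'world']".
X_str = X.astype(str)
matrix = vectorizer.fit_transform(X_str)
vocabulary = sorted(vectorizer.vocabulary_)
```

Note that the brackets, quotes and commas of the string representation are dropped only because the default token pattern ignores punctuation; with a custom tokenizer they could end up in the vocabulary.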
The TFIDF Vectorizer expects an array of strings. So if you pass it an array of arrays of tokens, it crashes.
Add .apply(lambda x: ' '.join(x)) to the tokenized text in X_train (and to any other split that holds token lists) and it should work.
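As a rough illustration of the join approach, assuming X_train is a pandas Series of token lists (the sample rows are made up for this sketch):

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical training data: each row is a list of tokens.
X_train = pd.Series([
    ["hello", "world", "."],
    ["hello", "world"],
    ["again", "hello", "world"],
])

# Join each token list back into a single space-separated string.
X_train_joined = X_train.apply(lambda tokens: " ".join(tokens))

vectorizer = TfidfVectorizer()
X_train_tfidf = vectorizer.fit_transform(X_train_joined)
feature_names = sorted(vectorizer.vocabulary_)
```

Because the vectorizer re-tokenizes the joined strings with its default settings, tokens such as "." or single characters are dropped from the vocabulary.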
Answer from http://www.davidsbatista.net/blog/2018/02/28/TfidfVectorizer/
from sklearn.feature_extraction.text import CountVectorizer

def dummy(doc):
    # Pass-through: the documents are already tokenized
    return doc

tfidf = CountVectorizer(
    tokenizer=dummy,
    preprocessor=dummy,
)
docs = [
    ['hello', 'world', '.'],
    ['hello', 'world'],
    ['again', 'hello', 'world']
]
tfidf.fit(docs)
# On scikit-learn < 1.0, call tfidf.get_feature_names() instead
list(tfidf.get_feature_names_out())
# ['.', 'again', 'hello', 'world']
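The snippet above uses CountVectorizer even though the variable is named tfidf. A sketch of the same pass-through idea with TfidfVectorizer might look like this; the identity helper name is my own, and token_pattern=None is added only to avoid the warning about an unused token pattern:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

def identity(doc):
    # Pass-through: the documents are already lists of tokens.
    return doc

tfidf = TfidfVectorizer(
    tokenizer=identity,
    preprocessor=identity,
    token_pattern=None,  # unused when a tokenizer is supplied
)

docs = [
    ['hello', 'world', '.'],
    ['hello', 'world'],
    ['again', 'hello', 'world'],
]

weights = tfidf.fit_transform(docs)
features = sorted(tfidf.vocabulary_)
```

Unlike the .astype(str) and join approaches, this keeps the existing tokens exactly as they are, including "." as its own feature.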