RandomForestClassifier.fit(): ValueError: could not convert string to float
You have to encode your data before calling fit(). As already mentioned, fit() does not accept strings, but you can work around this.
There are several classes that can be used:
LabelEncoder: turns each distinct string into an incremental integer value
OneHotEncoder: uses the one-of-K scheme to turn each distinct string into its own binary (0/1) column
Personally, I posted almost the same question on Stack Overflow some time ago. I wanted a scalable solution but didn't get any answers. I went with OneHotEncoder, which binarizes all the strings. It is quite effective, but if you have many distinct strings the matrix grows very quickly and needs a lot of memory.
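To make the difference between the two encoders concrete, here is a minimal sketch. The toy array X and its values are made up for illustration; only LabelEncoder and OneHotEncoder come from the answer above.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical toy data: two string features, four rows.
X = np.array([
    ["red", "small"],
    ["green", "large"],
    ["blue", "small"],
    ["green", "medium"],
])

# LabelEncoder works on one column at a time and maps each distinct
# string to an integer (the classes are sorted alphabetically).
le = LabelEncoder()
colour_codes = le.fit_transform(X[:, 0])
print(le.classes_)   # the distinct strings, in code order
print(colour_codes)  # one integer per row

# OneHotEncoder takes the whole 2D array and creates one binary column
# per distinct value of each feature; the result is sparse by default.
ohe = OneHotEncoder()
X_onehot = ohe.fit_transform(X)
print(X_onehot.shape)  # (4, 6): 3 colours + 3 sizes
```

The column count of the one-hot matrix is the total number of distinct strings across all features, which is why it grows quickly with high-cardinality data. The default sparse output helps with the memory cost mentioned above.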
LabelEncoder worked for me (basically, you have to encode your data feature by feature; myData is a 2D array of strings):
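import numpy as np
from sklearn import preprocessing
# filecsv is the path to your CSV file; read every column as strings.
myData = np.genfromtxt(filecsv, delimiter=",", dtype=str, skip_header=1)
le = preprocessing.LabelEncoder()
# Store the codes in an integer array: written back into myData they would be cast to strings again.
encoded = np.empty(myData.shape, dtype=int)
for i in range(myData.shape[1]):  # one pass per feature (column)
    encoded[:, i] = le.fit_transform(myData[:, i])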
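For completeness, here is a rough end-to-end sketch of the same feature-wise approach feeding RandomForestClassifier. The data, the encode helper, and the n_estimators and random_state settings are all assumptions made for this example. It keeps one fitted encoder per column so that new rows can be encoded the same way at prediction time.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Hypothetical data: two string features and a string target.
X_raw = np.array([
    ["sunny", "hot"],
    ["rainy", "mild"],
    ["sunny", "mild"],
    ["rainy", "cool"],
    ["overcast", "hot"],
    ["overcast", "cool"],
])
y = np.array(["no", "yes", "no", "yes", "yes", "yes"])

# One fitted encoder per feature column.
encoders = [LabelEncoder().fit(X_raw[:, i]) for i in range(X_raw.shape[1])]


def encode(rows):
    """Apply each column's encoder and return an integer array (hypothetical helper)."""
    return np.column_stack(
        [enc.transform(rows[:, i]) for i, enc in enumerate(encoders)]
    )


X = encode(X_raw)

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, y)  # X is numeric now, so the string-to-float ValueError does not occur

# New rows must go through the same encoders before predict().
new_row = np.array([["sunny", "hot"]])
prediction = clf.predict(encode(new_row))
print(prediction)
```

Keep in mind that the integer codes imply an ordering between categories that may not exist in the data. Tree-based models often tolerate this, but one-hot encoding avoids it at the cost of a wider matrix.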