RandomForestClassfier.fit(): ValueError: could not convert string to float

You have to do some encoding before using fit(). As it was told fit() does not accept strings, but you solve this.

There are several classes that can be used :

  • LabelEncoder : turn your string into incremental value
  • OneHotEncoder : use One-of-K algorithm to transform your String into integer

Personally, I have post almost the same question on Stack Overflow some time ago. I wanted to have a scalable solution, but didn't get any answer. I selected OneHotEncoder that binarize all the strings. It is quite effective, but if you have a lot of different strings the matrix will grow very quickly and memory will be required.


LabelEncoding worked for me (basically you've to encode your data feature-wise) (mydata is a 2d array of string datatype):

myData=np.genfromtxt(filecsv, delimiter=",", dtype ="|a20" ,skip_header=1);

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
for i in range(*NUMBER OF FEATURES*):
    myData[:,i] = le.fit_transform(myData[:,i])