Python's Xgoost: ValueError('feature_names may not contain [, ] or <')

I know it's late but writing this answer here for other folks who might face this. Here is what I found after facing this issue: This error typically happens if your column names have the symbols [ or ] or <. Here is an example:

import pandas as pd
import numpy as np
from xgboost.sklearn import XGBRegressor

# test input data with string, int, and symbol-included columns 
df = pd.DataFrame({'0': np.random.randint(0, 2, size=100),
                   '[test1]': np.random.uniform(0, 1, size=100),
                   'test2': np.random.uniform(0, 1, size=100),
                  3: np.random.uniform(0, 1, size=100)})

target = df.iloc[:, 0]
predictors = df.iloc[:, 1:]

# basic xgb model
xgb0 = XGBRegressor(objective= 'reg:linear')
xgb0.fit(predictors, target)

The code above will throw an error:

ValueError: feature_names may not contain [, ] or <

But if you remove those square brackets from '[test1]' then it works fine. Below is a generic way of removing [, ] or < from your column names:

import re
import pandas as pd
import numpy as np
from xgboost.sklearn import XGBRegressor
regex = re.compile(r"\[|\]|<", re.IGNORECASE)

# test input data with string, int, and symbol-included columns 
df = pd.DataFrame({'0': np.random.randint(0, 2, size=100),
                   '[test1]': np.random.uniform(0, 1, size=100),
                   'test2': np.random.uniform(0, 1, size=100),
                  3: np.random.uniform(0, 1, size=100)})

df.columns = [regex.sub("_", col) if any(x in str(col) for x in set(('[', ']', '<'))) else col for col in df.columns.values]

target = df.iloc[:, 0]
predictors = df.iloc[:, 1:]

# basic xgb model
xgb0 = XGBRegressor(objective= 'reg:linear')
xgb0.fit(predictors, target)

For more read this code line form xgboost core.py: xgboost/core.py. That's the check failing which the error is thrown.

Python's Xgoost: ValueError('feature_names may not contain [, ] or <')

Tags:

Python

Pandas

Numpy

Scikit Learn

Xgboost

Related

Recent Posts