LightGBM 'Using categorical_feature in Dataset.' Warning?
I presume that you get this warning in a call to lgb.train. This function also has an argument categorical_feature, and its default value is 'auto', which means taking categorical columns from the pandas.DataFrame (see the documentation). The warning indicates that, although lgb.train has requested that categorical features be identified automatically, LightGBM will use the features specified in the dataset instead.
To avoid the warning, you can give the same argument categorical_feature to both lgb.Dataset and lgb.train. Alternatively, you can construct the dataset with categorical_feature=None and only specify the categorical features in lgb.train.
As user andrey-popov described, you can use the categorical_feature parameter of lgb.train to get rid of this warning.
Below is a simple example of how you could do it:
import lightgbm as lgb

# Define categorical features
cat_feats = ['item_id', 'dept_id', 'store_id',
             'cat_id', 'state_id', 'event_name_1',
             'event_type_1', 'event_name_2', 'event_type_2']
...
# Define the datasets with the categorical_feature parameter
# (X, Y, train_idx, valid_idx and lgb_params are defined elsewhere)
train_data = lgb.Dataset(X.loc[train_idx],
                         Y.loc[train_idx],
                         categorical_feature=cat_feats,
                         free_raw_data=False)
valid_data = lgb.Dataset(X.loc[valid_idx],
                         Y.loc[valid_idx],
                         categorical_feature=cat_feats,
                         free_raw_data=False)
# And train using the same categorical_feature parameter
# Note: verbose_eval was removed in LightGBM 4.0; on newer versions,
# pass callbacks=[lgb.log_evaluation(20)] instead.
lgb.train(lgb_params,
          train_data,
          valid_sets=[valid_data],
          verbose_eval=20,
          categorical_feature=cat_feats,
          num_boost_round=1200)