Remove prefix (or suffix) substring from column headers in pandas
python < 3.9, pandas < 1.4
Use str.strip/str.rstrip:
# df.columns = df.columns.str.strip('_x')
# Or,
df.columns = df.columns.str.rstrip('_x') # strip suffix at the right end only.
df.columns
# Index(['W', 'First', 'Last', 'Slice'], dtype='object')
To avoid the issue highlighted in the comments:
Beware of strip() if any column name starts or ends with either _ or x beyond the suffix.
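You could use str.replace with a regular expression anchored to the end of the name:
# regex=True is required on pandas >= 2.0, where str.replace matches literally by default
df.columns = df.columns.str.replace(r'_x$', '', regex=True)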
df.columns
# Index(['W', 'First', 'Last', 'Slice'], dtype='object')
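To make the difference concrete, here is a small sketch comparing the two approaches. The column names are made up for illustration; the point is that rstrip treats '_x' as a set of characters to strip, while the anchored pattern removes only a literal trailing '_x'.

```python
import pandas as pd

# Hypothetical column names; "max_x" and "index_x" still end in 'x'
# once the suffix is gone, which is what trips up rstrip.
cols = pd.Index(["W_x", "max_x", "index_x", "Last"])

# rstrip('_x') removes any trailing run of '_' and 'x' characters,
# not the literal suffix '_x'.
stripped = list(cols.str.rstrip("_x"))

# The anchored regex removes exactly one trailing '_x'.
# regex=True is passed explicitly so this behaves the same across pandas versions.
replaced = list(cols.str.replace(r"_x$", "", regex=True))

print(stripped)  # ['W', 'ma', 'inde', 'Last']
print(replaced)  # ['W', 'max', 'index', 'Last']
```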
Update: python >= 3.9, pandas >= 1.4
From pandas 1.4, you can use str.removeprefix/str.removesuffix.
Examples:
s = pd.Series(["str_foo", "str_bar", "no_prefix"])
s
0 str_foo
1 str_bar
2 no_prefix
dtype: object
s.str.removeprefix("str_")
0 foo
1 bar
2 no_prefix
dtype: object
s = pd.Series(["foo_str", "bar_str", "no_suffix"])
s
0 foo_str
1 bar_str
2 no_suffix
dtype: object
s.str.removesuffix("_str")
0 foo
1 bar
2 no_suffix
dtype: object
Note that pandas 1.4 has since been released, so these methods are available without a development build.
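The same accessor should work on df.columns, since an Index of strings exposes .str as well. A minimal sketch, assuming pandas >= 1.4 and a made-up frame whose columns carry an '_x' suffix:

```python
import pandas as pd

# Hypothetical empty frame; only the column labels matter here.
df = pd.DataFrame(columns=["W_x", "First_x", "Last_x", "Slice_x", "max_x"])

# removesuffix drops the literal suffix once and leaves other names untouched.
df.columns = df.columns.str.removesuffix("_x")

print(list(df.columns))  # ['W', 'First', 'Last', 'Slice', 'max']
```

Unlike rstrip, "max_x" becomes "max" here rather than "ma".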
df.columns = [col[:-2] if col[-2:] == '_x' else col for col in df.columns]
or
df.columns = [col.replace('_x', '') for col in df.columns]  # removes every '_x', not only a trailing one
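The two one-liners are not equivalent when '_x' also appears in the middle of a name. A small sketch of the difference in plain Python, using a hypothetical helper strip_suffix and made-up column names:

```python
def strip_suffix(name, suffix):
    """Return name without a trailing suffix; leave it unchanged otherwise.

    Hypothetical helper for illustration; on Python >= 3.9 the built-in
    str.removesuffix does the same thing.
    """
    if suffix and name.endswith(suffix):
        return name[:-len(suffix)]
    return name


# Made-up names; "a_x_b" contains '_x' in the middle but does not end with it.
cols = ["W_x", "a_x_b", "Last"]

with_replace = [col.replace("_x", "") for col in cols]
with_helper = [strip_suffix(col, "_x") for col in cols]

print(with_replace)  # ['W', 'a_b', 'Last']
print(with_helper)   # ['W', 'a_x_b', 'Last']
```

If the names can contain the suffix text elsewhere, the endswith-based version is the safer choice.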