pandas dataframe return first word in string for column
Split each string on the space character and take the first element:
df['make'] = df['id'].str.split(' ').str[0]
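One thing worth checking with this approach is how the separator behaves with stray whitespace. A small sketch, using made-up sample ids (the DataFrame below is an assumption, not the asker's data): splitting on a literal `' '` returns an empty first element when the string starts with a space, while `split()` with no argument splits on any run of whitespace.

```python
import pandas as pd

# Hypothetical sample data; the third id has a leading space on purpose.
df = pd.DataFrame({"id": ["abarth 1.4", "land rover 1.3", " mazda 4.55"]})

# Split on a literal single space: a leading space yields an empty first element.
df["first_literal"] = df["id"].str.split(" ").str[0]

# Split on any whitespace run: leading whitespace is ignored.
df["first_ws"] = df["id"].str.split().str[0]

print(df)
```

If the ids are known to be clean, both forms give the same result.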
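Consider a regex solution with loc that extracts everything before the first space. The quantifier has to be non-greedy (.*?), because a greedy (.*) captures everything up to the last space:
df.loc[df['make'] == '', 'make'] = df['id'].str.extract('(.*?) ', expand=False)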
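To see what "before the first space" means for different patterns, here is a quick comparison on made-up ids (the Series and the `^(\S+)` variant are my own illustration, not part of the answer above). Note that any pattern requiring a trailing space returns NaN for a value with no space at all.

```python
import pandas as pd

# Hypothetical ids: two words, three words, and a single word without a space.
ids = pd.Series(["abarth 1.4", "land rover 1.3", "mazda"])

# Greedy: captures up to the LAST space.
greedy = ids.str.extract(r"(.*) ", expand=False)

# Non-greedy: captures up to the FIRST space.
lazy = ids.str.extract(r"(.*?) ", expand=False)

# Leading run of non-whitespace: also matches when there is no space.
first_token = ids.str.extract(r"^(\S+)", expand=False)

print(pd.DataFrame({"greedy": greedy, "lazy": lazy, "first_token": first_token}))
```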
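Alternatively, use numpy's where, which allows if/then/else conditional logic:
import numpy as np
# non-greedy (.*?) stops at the first space; a greedy (.*) would run to the last one
df['make'] = np.where(df['make'] == '',
                      df['id'].str.extract('(.*?) ', expand=False),
                      df['make'])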
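For anyone who wants to try the if/then/else form end to end, a minimal sketch follows. The sample rows are assumed, and `Series.mask` is shown only as a pandas-native alternative that I believe is equivalent here; it is not something the answer above proposes.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'make' is empty where it still needs to be filled in.
df = pd.DataFrame({
    "id": ["abarth 1.4", "land rover 1.3", "mazda 4.55"],
    "make": ["", "rover", ""],
})

first_word = df["id"].str.split().str[0]

# np.where(condition, value_if_true, value_if_false) works element-wise.
via_numpy = np.where(df["make"] == "", first_word, df["make"])

# Series.mask replaces values where the condition is True.
via_pandas = df["make"].mask(df["make"] == "", first_word)

df["make"] = via_numpy
print(df)
```

np.where returns a plain NumPy array, while mask keeps the result as a Series with the original index.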
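With the part below
df.loc[df.make == '', 'make']
or
df.loc[df['make'] == '', 'make']
I get the error KeyError: 'make'. That happens when the DataFrame has no 'make' column yet: the mask df['make'] == '' has to read the column before loc can assign to it.
So instead I created the column first (in case someone sees the same error):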
df['make'] = df['id']
df['make'] = df.id.str.split().str.get(0)
Worked for me.
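The error is easy to reproduce on a frame that only has an id column. A small sketch with made-up data; creating an empty 'make' column up front is one possible way to make the loc assignment work, not the only one.

```python
import pandas as pd

# Hypothetical frame WITHOUT a 'make' column.
df = pd.DataFrame({"id": ["abarth 1.4", "land rover 1.3"]})

missing = None
try:
    # Building the mask reads df['make'], which does not exist yet.
    df.loc[df["make"] == "", "make"] = "x"
except KeyError as err:
    missing = err.args[0]

print("missing column:", missing)

# Create the column first, then the masked assignment is valid.
if "make" not in df.columns:
    df["make"] = ""
df.loc[df["make"] == "", "make"] = df["id"].str.split().str.get(0)
print(df)
```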
Use str.split and str.get, and assign using loc only where df.make == '':
df.loc[df.make == '', 'make'] = df.id.str.split().str.get(0)
print(df)
               id    make
0      abarth 1.4  abarth
1        abarth 1  abarth
2  land rover 1.3   rover
3    land rover 2   rover
4    land rover 5   rover
5      mazda 4.55   mazda
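The input frame is not shown in this answer, so here is a self-contained sketch that should produce the make values listed above. The starting 'make' values are my assumption: empty for the abarth and mazda rows, already 'rover' for the land rover rows.

```python
import pandas as pd

# Assumed input: 'make' is '' where it is missing and 'rover' where already set.
df = pd.DataFrame({
    "id": ["abarth 1.4", "abarth 1", "land rover 1.3",
           "land rover 2", "land rover 5", "mazda 4.55"],
    "make": ["", "", "rover", "rover", "rover", ""],
})

# Only rows with an empty make are overwritten; existing values are kept.
mask = df["make"] == ""
df.loc[mask, "make"] = df["id"].str.split().str.get(0)

print(df)
```

Because the assignment is masked, the land rover rows keep 'rover' instead of being replaced by 'land'.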