Change one value based on another value in pandas
One option is to use Python's slicing and indexing features to logically evaluate the places where your condition holds and overwrite the data there.
Assuming you can load your data directly into pandas with pandas.read_csv, then the following code might be helpful for you.
import pandas
df = pandas.read_csv("test.csv")
df.loc[df.ID == 103, 'FirstName'] = "Matt"
df.loc[df.ID == 103, 'LastName'] = "Jones"
As mentioned in the comments, you can also do the assignment to both columns in one shot:
df.loc[df.ID == 103, ['FirstName', 'LastName']] = 'Matt', 'Jones'
Note that you'll need pandas version 0.11 or newer to make use of loc for overwrite assignment operations.
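For readers without a test.csv at hand, here is a minimal self-contained sketch of the same loc pattern. The in-memory frame and its values are made up for illustration; only the ID, FirstName and LastName column names come from the code above.

```python
import pandas as pd

# Hypothetical in-memory data standing in for test.csv
df = pd.DataFrame({
    "ID": [101, 103, 105],
    "FirstName": ["Ann", "Bob", "Cy"],
    "LastName": ["Lee", "Ray", "Fox"],
})

# Boolean mask: True for the rows whose ID is 103
mask = df["ID"] == 103

# Overwrite both name columns of the matching row in a single assignment
df.loc[mask, ["FirstName", "LastName"]] = ["Matt", "Jones"]

print(df)
```

Building the mask once and reusing it keeps the condition in one place if several columns need updating.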
Another way to do it is to use what is called chained assignment. Its behavior is less predictable (pandas may end up writing to a temporary copy, and typically emits a SettingWithCopyWarning), so it is not considered the best solution and is explicitly discouraged in the docs, but it is useful to know about:
import pandas
df = pandas.read_csv("test.csv")
df['FirstName'][df.ID == 103] = "Matt"
df['LastName'][df.ID == 103] = "Jones"
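If you prefer a column-at-a-time style but want to avoid chained indexing, one possible alternative is Series.mask, which returns a new Series that is then assigned back to the column. This is only a sketch on made-up data, not something the answer above proposes:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"ID": [101, 103], "FirstName": ["Ann", "Bob"]})

# Series.mask replaces values where the condition is True; the result is
# assigned back to the column, so no chained indexing is involved.
df["FirstName"] = df["FirstName"].mask(df["ID"] == 103, "Matt")

print(df)
```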
This question might still be visited often enough that it's worth offering an addendum to Mr Kassies' answer. The dict built-in class can be subclassed so that a default is returned for 'missing' keys. This mechanism works well for pandas. But see below.
In this way it's possible to avoid key errors.
>>> import pandas as pd
>>> data = { 'ID': [ 101, 201, 301, 401 ] }
>>> df = pd.DataFrame(data)
>>> class SurnameMap(dict):
...     def __missing__(self, key):
...         return ''
...
>>> surnamemap = SurnameMap()
>>> surnamemap[101] = 'Mohanty'
>>> surnamemap[301] = 'Drake'
>>> df['Surname'] = df['ID'].apply(lambda x: surnamemap[x])
>>> df
    ID  Surname
0  101  Mohanty
1  201
2  301    Drake
3  401
The same thing can be done more simply in the following way. The use of the 'default' argument for the get method of a dict object makes it unnecessary to subclass a dict.
>>> import pandas as pd
>>> data = { 'ID': [ 101, 201, 301, 401 ] }
>>> df = pd.DataFrame(data)
>>> surnamemap = {}
>>> surnamemap[101] = 'Mohanty'
>>> surnamemap[301] = 'Drake'
>>> df['Surname'] = df['ID'].apply(lambda x: surnamemap.get(x, ''))
>>> df
    ID  Surname
0  101  Mohanty
1  201
2  301    Drake
3  401
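A vectorised variant of the same idea, offered as a sketch rather than as part of the answer above: Series.map accepts a plain dict directly and yields NaN for IDs that have no entry, so the default can be supplied afterwards with fillna.

```python
import pandas as pd

df = pd.DataFrame({'ID': [101, 201, 301, 401]})
surnamemap = {101: 'Mohanty', 301: 'Drake'}

# Series.map looks each ID up in the dict; IDs without an entry become NaN,
# which fillna then replaces with the empty-string default.
df['Surname'] = df['ID'].map(surnamemap).fillna('')

print(df)
```

This avoids the per-element lambda, at the cost of a separate step for the default.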
The original question addresses a specific narrow use case. For those who need more generic answers here are some examples:
Creating a new column using data from other columns
Given the dataframe below:
import pandas as pd
import numpy as np
df = pd.DataFrame([['dog', 'hound', 5],
                   ['cat', 'ragdoll', 1]],
                  columns=['animal', 'type', 'age'])
In [1]: df
Out[1]:
  animal     type  age
----------------------
0    dog    hound    5
1    cat  ragdoll    1
Below we add a new description column by concatenating other columns with the + operator, which pandas overloads for Series. Fancy string formatting, f-strings etc. won't work on whole columns here, since the operands of + are Series and not 'primitive' values:
df['description'] = 'A ' + df.age.astype(str) + ' years old ' \
+ df.type + ' ' + df.animal
In [2]: df
Out[2]:
  animal     type  age                description
-------------------------------------------------
0    dog    hound    5    A 5 years old hound dog
1    cat  ragdoll    1  A 1 years old ragdoll cat
We get 1 years for the cat (instead of 1 year), which we will be fixing below using conditionals.
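If you would rather use an f-string, one option is to build the text one row at a time with apply and axis=1, where each cell is an ordinary scalar. The sketch below reproduces the description column that way; it is usually slower than the vectorised + on large frames.

```python
import pandas as pd

df = pd.DataFrame([['dog', 'hound', 5],
                   ['cat', 'ragdoll', 1]],
                  columns=['animal', 'type', 'age'])

# axis=1 hands each row to the function as a Series, so r['age'] etc.
# are ordinary scalar values and an f-string can be used.
df['description'] = df.apply(
    lambda r: f"A {r['age']} years old {r['type']} {r['animal']}",
    axis=1,
)

print(df['description'].tolist())
```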
Modifying an existing column with conditionals
Here we are replacing the original animal column with values from other columns, and using np.where to set a conditional substring based on the value of age:
# append 's' to 'year' if age is greater than 1
df.animal = df.animal + ", " + df.type + ", " + \
df.age.astype(str) + " year" + np.where(df.age > 1, 's', '')
In [3]: df
Out[3]:
                 animal     type  age
-------------------------------------
0   dog, hound, 5 years    hound    5
1  cat, ragdoll, 1 year  ragdoll    1
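To look at np.where in isolation, here is a small sketch that stores the conditional suffix in its own column before concatenating. The suffix and age_text column names are hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'animal': ['dog', 'cat'], 'age': [5, 1]})

# np.where(condition, value_if_true, value_if_false) is evaluated
# element-wise and returns one value per row.
df['suffix'] = np.where(df['age'] > 1, 's', '')

# Combine the pieces with the overloaded + on Series
df['age_text'] = df['age'].astype(str) + ' year' + df['suffix']

print(df)
```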
Modifying multiple columns with conditionals
A more flexible approach is to call .apply() on an entire dataframe rather than on a single column:
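def transform_row(r):
    # keep the original animal, since r.animal is overwritten on the next line
    original_animal = r.animal
    r.animal = 'wild ' + r.type
    r.type = original_animal + ' creature'
    r.age = "{} year{}".format(r.age, 's' if r.age > 1 else '')
    return r

# apply returns a new dataframe; df itself is left as it was
df.apply(transform_row, axis=1)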
In[4]:
Out[4]:
         animal          type      age
----------------------------------------
0    wild hound  dog creature  5 years
1  wild ragdoll  cat creature   1 year
In the code above, the transform_row(r) function takes a Series object representing a given row (indicated by axis=1; the default value of axis=0 would provide a Series object for each column). This simplifies processing, since you can access the actual 'primitive' values in the row using the column names and have visibility of other cells in the given row/column.
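The difference between the two axis values can be seen in a short sketch: the function simply returns the name of the Series it receives, which is a column label for axis=0 and a row index label for axis=1.

```python
import pandas as pd

df = pd.DataFrame([['dog', 'hound', 5],
                   ['cat', 'ragdoll', 1]],
                  columns=['animal', 'type', 'age'])

# axis=0 (the default): the function is called once per column,
# and s.name is the column label.
per_column = df.apply(lambda s: s.name)

# axis=1: the function is called once per row,
# and r.name is that row's index label.
per_row = df.apply(lambda r: r.name, axis=1)

print(per_column.tolist())
print(per_row.tolist())
```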
You can use map, which can map values from a dictionary or even a custom function.
Suppose this is your df:
    ID First_Name Last_Name
0  103          a         b
1  104          c         d
Create the dicts:
fnames = {103: "Matt", 104: "Mr"}
lnames = {103: "Jones", 104: "X"}
And map:
df['First_Name'] = df['ID'].map(fnames)
df['Last_Name'] = df['ID'].map(lnames)
The result will be:
    ID First_Name Last_Name
0  103       Matt     Jones
1  104         Mr         X
Or use a custom function:
names = {103: ("Matt", "Jones"), 104: ("Mr", "X")}
df['First_Name'] = df['ID'].map(lambda x: names[x][0])
df['Last_Name'] = df['ID'].map(lambda x: names[x][1])
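One thing to watch with map and a dict is that IDs missing from the dict come back as NaN, which would wipe out the existing names. A possible way to change only the listed IDs, sketched here on made-up data, is to fill those gaps from the original column:

```python
import pandas as pd

# Hypothetical data: only ID 103 should be renamed
df = pd.DataFrame({
    'ID': [103, 104, 105],
    'First_Name': ['a', 'c', 'e'],
})
fnames = {103: 'Matt'}

# map() gives NaN for IDs absent from the dict; fillna() with the
# original column restores the existing value for those rows.
df['First_Name'] = df['ID'].map(fnames).fillna(df['First_Name'])

print(df)
```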