How to create a new data frame based on conditions from another data frame
In current versions of pandas, the .ix indexer is deprecated (it was removed entirely in pandas 1.0); use .loc instead.
temp_df = df_complete.loc[]
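As a rough sketch of what replaces .ix: .loc selects by label and .iloc selects by integer position. The frame `demo` and its column `x` below are made up for illustration only and are not part of the question's data.

```python
import pandas as pd

# Hypothetical frame with non-default integer labels, so that labels
# and positions are visibly different.
demo = pd.DataFrame({'x': [10, 20, 30]}, index=[3, 4, 5])

by_label = demo.loc[3, 'x']        # row whose index label is 3
by_position = demo.iloc[0, 0]      # first row, first column

# .loc also accepts a boolean mask for rows and a list of column names
filtered = demo.loc[demo['x'] > 10, ['x']]

print(by_label, by_position)
print(filtered)
```

Both scalar lookups hit the same cell here, because label 3 happens to sit at position 0.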
I think you need boolean indexing with loc, passing a list of column names so that only columns col a and col c are selected:
temp_df = df_complete.loc[(df_complete['type'] == 'NDD') &
                          (df_complete['writer'] == 'Mary') &
                          (df_complete['status'] != '7'), ['col a', 'col c']]
# rename columns
temp_df = temp_df.rename(columns={'col a': 'col A', 'col c': 'col C'})
# add new column
temp_df['col B'] = 'good'
# reorder columns
temp_df = temp_df[['col A', 'col B', 'col C']]
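If you prefer a single expression, the same steps can probably be chained. This is only a sketch of an alternative style, not a change to the approach above; the name `mask` is introduced here for readability, and the frame is the sample data from this answer. The `**{...}` form is needed in assign because `col B` contains a space and cannot be written as a keyword argument.

```python
import pandas as pd

df_complete = pd.DataFrame({'type': ['NDD', 'NDD', 'NT'],
                            'writer': ['Mary', 'Mary', 'John'],
                            'status': ['4', '5', '6'],
                            'col a': [1, 3, 5],
                            'col b': [5, 3, 6],
                            'col c': [7, 4, 3]}, index=[3, 4, 5])

# Build the row filter once; each comparison needs its own parentheses
# because & binds more tightly than == and !=.
mask = ((df_complete['type'] == 'NDD') &
        (df_complete['writer'] == 'Mary') &
        (df_complete['status'] != '7'))

temp_df = (df_complete.loc[mask, ['col a', 'col c']]
           .rename(columns={'col a': 'col A', 'col c': 'col C'})
           .assign(**{'col B': 'good'})       # add the constant column
           [['col A', 'col B', 'col C']])     # reorder columns

print(temp_df)
```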
Sample:
import pandas as pd

df_complete = pd.DataFrame({'type': ['NDD', 'NDD', 'NT'],
                            'writer': ['Mary', 'Mary', 'John'],
                            'status': ['4', '5', '6'],
                            'col a': [1, 3, 5],
                            'col b': [5, 3, 6],
                            'col c': [7, 4, 3]}, index=[3, 4, 5])
print(df_complete)
   col a  col b  col c status type writer
3      1      5      7      4  NDD   Mary
4      3      3      4      5  NDD   Mary
5      5      6      3      6   NT   John
temp_df = df_complete.loc[(df_complete['type'] == 'NDD') &
                          (df_complete['writer'] == 'Mary') &
                          (df_complete['status'] != '7'), ['col a', 'col c']]
print(temp_df)
   col a  col c
3      1      7
4      3      4
# rename columns
temp_df = temp_df.rename(columns={'col a': 'col A', 'col c': 'col C'})
# add new column
temp_df['col B'] = 'good'
# reorder columns
temp_df = temp_df[['col A', 'col B', 'col C']]
print(temp_df)
   col A col B  col C
3      1  good      7
4      3  good      4
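One related point, as a hedged sketch: in the code above, rename returns a new DataFrame, so adding col B afterwards is safe. If you skip the rename and add a column directly to the filtered selection, pandas may emit a SettingWithCopyWarning depending on the version; taking an explicit .copy() avoids the ambiguity. The names `mask` and `subset` are only for illustration, and the frame is a trimmed version of the sample data.

```python
import pandas as pd

# Trimmed version of the sample frame from this answer
df_complete = pd.DataFrame({'type': ['NDD', 'NDD', 'NT'],
                            'writer': ['Mary', 'Mary', 'John'],
                            'col a': [1, 3, 5],
                            'col c': [7, 4, 3]}, index=[3, 4, 5])

mask = (df_complete['type'] == 'NDD') & (df_complete['writer'] == 'Mary')

# .copy() makes the selection an independent DataFrame, so the
# assignment below cannot be mistaken for a write into df_complete.
subset = df_complete.loc[mask, ['col a', 'col c']].copy()
subset['col B'] = 'good'

print(subset)
print(df_complete.columns.tolist())   # original frame has no 'col B'
```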