Pandas: TypeError: '>' not supported between instances of 'int' and 'str' when selecting on date column

You can compare a datetime column (holding values such as Timestamp('2000-01-01 00:00:00')) to a string, because pandas converts the string to a Timestamp for you. But once some of those values have been set to 0, the comparison fails: an int cannot be compared to a str.
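
To see the difference in isolation, here is a minimal sketch. It builds the mixed column by hand (the names dates and mixed are made up for illustration) rather than through the row assignment, since how pandas handles that assignment can differ between versions.

```python
import pandas as pd

# A proper datetime64 column: pandas parses the string before comparing.
dates = pd.Series(pd.date_range("2000-01-01", periods=3))
ok = dates > "2000-01-02"

# An object column holding a plain int next to a Timestamp, which is
# roughly what the date column looks like after whole rows are zeroed.
mixed = pd.Series([0, pd.Timestamp("2000-01-02")], dtype=object)

try:
    mixed > "2000-01-01"
    raised = False
    message = ""
except TypeError as exc:
    # Each element is compared to the str as-is, so the int fails.
    raised = True
    message = str(exc)

print(ok.tolist())
print(raised, message)
```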

One way around this is to change the order of your operations: build both masks first, then assign.

filters = df[0] > 0.7
mask = (df['date'] > '2000-6-1') & (df['date'] <= '2000-6-10')

df[filters] = 0
print(df.loc[mask & filters])

Also, you mentioned that you want to set column 0 to 0 where it exceeds 0.7. df[df[0]>0.7] = 0 does not do exactly that: it sets entire rows to 0, including the date. Instead:

df.loc[df[0] > 0.7, 0] = 0

Then you should not have any problem with the original mask.
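
A small self-contained sketch of that approach follows. The frame is a made-up stand-in for the one in the question (integer column labels plus a 'date' column), so the values are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame.
df = pd.DataFrame(
    np.array([[0.9, 0.1, 0.2], [0.3, 0.8, 0.4], [0.75, 0.5, 0.6]]),
    columns=[0, 1, 2],
)
df["date"] = pd.date_range("2000-06-01", periods=3)

# Zero only column 0 where it exceeds 0.7; other columns are untouched.
df.loc[df[0] > 0.7, 0] = 0

# The date column is still datetime-typed, so comparing to strings works.
mask = (df["date"] > "2000-6-1") & (df["date"] <= "2000-6-10")
selected = df.loc[mask]
print(selected)
```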

If you check the output, the problem is that the datetimes are also set to 0. No columns are specified for the assignment, so pandas sets all of them:

df[df[0] > 0.7] = 0

print (df.head(10))
          0         1         2                 date
0  0.420593  0.519151  0.149883  2000-01-01 00:00:00
1  0.014364  0.503533  0.601206  2000-01-02 00:00:00
2  0.099144  0.090100  0.799383  2000-01-03 00:00:00
3  0.411158  0.144419  0.964909  2000-01-04 00:00:00
4  0.151470  0.424896  0.376281  2000-01-05 00:00:00
5  0.000000  0.000000  0.000000                    0
6  0.292871  0.868168  0.353377  2000-01-07 00:00:00
7  0.536018  0.737273  0.356857  2000-01-08 00:00:00
8  0.364068  0.314311  0.475165  2000-01-09 00:00:00
9  0.000000  0.000000  0.000000                    0

The solution is to set only the numeric columns, selected with DataFrame.select_dtypes:

import numpy as np

df.loc[df[0] > 0.7, df.select_dtypes(np.number).columns] = 0
# or specify the columns to zero as a list
# df.loc[df[0] > 0.7, [0, 1, 2]] = 0

print (df.head(10))
          0         1         2       date
0  0.416697  0.459268  0.146755 2000-01-01
1  0.645391  0.742737  0.023878 2000-01-02
2  0.000000  0.000000  0.000000 2000-01-03
3  0.456387  0.996946  0.450155 2000-01-04
4  0.000000  0.000000  0.000000 2000-01-05
5  0.000000  0.000000  0.000000 2000-01-06
6  0.265673  0.951874  0.175133 2000-01-07
7  0.434855  0.762386  0.653668 2000-01-08
8  0.000000  0.000000  0.000000 2000-01-09
9  0.000000  0.000000  0.000000 2000-01-10
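
For a version that runs on its own, here is a hedged sketch with a tiny made-up frame. The variable name numeric_cols is only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: three float columns and one datetime column.
df = pd.DataFrame(
    np.array([[0.9, 0.1, 0.2], [0.3, 0.8, 0.4], [0.75, 0.5, 0.6]]),
    columns=[0, 1, 2],
)
df["date"] = pd.date_range("2000-06-01", periods=3)

# select_dtypes returns a sub-frame; its columns are the numeric labels.
numeric_cols = df.select_dtypes(include=np.number).columns

# Zero the numeric columns in the matching rows, leaving 'date' alone.
df.loc[df[0] > 0.7, numeric_cols] = 0
print(df)
```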

Another solution is to move the dates into a DatetimeIndex, provided all the other columns are numeric:

df = df.set_index('date')
df.loc[df[0] > 0.7] = 0

print (df.head(10))
                   0         1         2
date                                    
2000-01-01  0.316875  0.584754  0.925727
2000-01-02  0.000000  0.000000  0.000000
2000-01-03  0.326266  0.746555  0.825070
2000-01-04  0.492115  0.508553  0.971966
2000-01-05  0.160850  0.403678  0.107497
2000-01-06  0.000000  0.000000  0.000000
2000-01-07  0.047433  0.103412  0.789594
2000-01-08  0.527788  0.415356  0.926681
2000-01-09  0.468794  0.458531  0.435696
2000-01-10  0.261224  0.599815  0.435548
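
With the dates in the index, the original date filter can be written against df.index instead of a column. A minimal sketch, again on made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the dates are moved into the index.
df = pd.DataFrame(
    np.array([[0.9, 0.1, 0.2], [0.3, 0.8, 0.4], [0.75, 0.5, 0.6]]),
    columns=[0, 1, 2],
)
df["date"] = pd.date_range("2000-06-01", periods=3)
df = df.set_index("date")

# Every remaining column is numeric, so zeroing whole rows is safe.
df.loc[df[0] > 0.7] = 0

# A DatetimeIndex compares against date strings like a datetime column.
mask = (df.index > "2000-6-1") & (df.index <= "2000-6-10")
selected = df.loc[mask]
print(selected)
```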
