Why does pd.concat change the resulting datatype from int to float?

Because of this -

timestamp      7188 non-null int64
sunrise        7176 non-null float64
...

timestamp has 7188 non-null values, while sunrise and the columns after it have 7176. That leaves 12 rows where those columns have no value, and pd.concat fills them with NaN.

NaN is a float, and an int64 column cannot hold it, so every other value in that column is automatically upcast to float64. Float numbers that big are usually displayed in scientific notation.
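To make the upcast concrete, here is a minimal sketch. The two small frames are hypothetical stand-ins for the ones in the question (the real data is not shown), with one frame a row shorter than the other:

```python
import pandas as pd

# Hypothetical stand-ins for the question's frames; data_sun is one row short.
timestamp = pd.DataFrame({"timestamp": [1521681600000, 1521681900000, 1521682200000]})
data_sun = pd.DataFrame({"sunrise": [1521696105000, 1521696105000]})

# The default join='outer' keeps every index label, so the label missing
# from data_sun is filled with NaN.
combined = pd.concat((timestamp, data_sun), axis=1)

print(data_sun.dtypes)   # sunrise is an integer column before the concat
print(combined.dtypes)   # sunrise is float64 after it; timestamp is unchanged
print(combined)
```

Only the column that received a NaN changes dtype; timestamp has a value for every row and stays int64.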

That's the why, but that doesn't really solve your problem. Your options at this point are:

1. drop those rows with NaNs using dropna
2. fill those NaNs with some default integral value using fillna

(After either step, the affected columns no longer contain NaNs, so you can cast them back to int with astype.)
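A rough sketch of both routes, using a small made-up frame that already contains the NaN. The fill value 0 is an arbitrary placeholder, not something the question specifies; pick whatever sentinel makes sense for your data:

```python
import numpy as np
import pandas as pd

# Made-up frame mimicking the result of the outer concat.
combined = pd.DataFrame({
    "timestamp": [1521681600000, 1521681900000, 1521682200000],
    "sunrise": [1521696105000.0, 1521696105000.0, np.nan],
})

# Route 1: drop every row that contains a NaN, then cast back to int64.
dropped = combined.dropna().astype({"sunrise": "int64"})

# Route 2: fill the NaN with a sentinel (0 is an assumption), then cast back.
filled = combined.fillna({"sunrise": 0}).astype({"sunrise": "int64"})

print(dropped.dtypes)
print(filled)
```

Calling astype("int64") while NaNs are still present raises an error, which is why the drop or fill has to come first.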

Alternatively, if you perform pd.concat with join='inner', NaNs are not introduced and the dtypes are preserved.

pd.concat((timestamp, dataSun, dataData), axis=1, join='inner')

       timestamp        sunrise         sunset  temperature     pressure  \    
0  1521681600000  1521696105000  1521740761000     2.490000  1018.000000   
1  1521681900000  1521696105000  1521740761000     2.408333  1017.833333   
2  1521682200000  1521696105000  1521740761000     2.326667  1017.666667   
3  1521682500000  1521696105000  1521740761000     2.245000  1017.500000   
4  1521682800000  1521696105000  1521740761000     2.163333  1017.333333   

   humidity  
0      99.0  
1      99.0  
2      99.0  
3      99.0  
4      99.0

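With join='inner', the join is performed on the indexes of each dataframe, so only the index labels present in every dataframe are kept. The rows that would otherwise have held NaNs are dropped from the result.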
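For a self-contained version of the same idea, assuming two hypothetical frames where one is a row shorter:

```python
import pandas as pd

# Hypothetical stand-ins; data_sun has no row for index label 2.
timestamp = pd.DataFrame({"timestamp": [1521681600000, 1521681900000, 1521682200000]})
data_sun = pd.DataFrame({"sunrise": [1521696105000, 1521696105000]})

# Only index labels shared by both frames (0 and 1) survive the inner join.
inner = pd.concat((timestamp, data_sun), axis=1, join="inner")

print(inner.dtypes)  # both columns stay int64
print(len(inner))    # one row fewer than timestamp
```

The trade-off is that the unmatched timestamp row is lost, so this only fits if you do not need those rows.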

As of pandas 1.0.0 I believe you have another option, which is to first use convert_dtypes. This converts the dataframe columns to dtypes that support pd.NA (for integers, the nullable Int64 dtype), so missing values no longer force an upcast to float.
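A minimal sketch of that approach, assuming pandas 1.0 or later and the same kind of hypothetical frames as in the question:

```python
import pandas as pd

# Hypothetical stand-ins; data_sun is one row short.
timestamp = pd.DataFrame({"timestamp": [1521681600000, 1521681900000, 1521682200000]})
data_sun = pd.DataFrame({"sunrise": [1521696105000, 1521696105000]})

# Convert each frame to nullable dtypes before concatenating.
combined = pd.concat(
    (timestamp.convert_dtypes(), data_sun.convert_dtypes()),
    axis=1,
)

print(combined.dtypes)  # nullable Int64 rather than float64
print(combined)         # the missing entry is shown as <NA>
```

Note the capital I in Int64: this is the pandas nullable extension dtype, not NumPy's int64, so some downstream code that expects plain NumPy integers may need an explicit conversion.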
