Why does pd.concat change the resulting datatype from int to float?
Because of this -
timestamp 7188 non-null int64
sunrise 7176 non-null float64
...
timestamp has 7188 non-null values, while sunrise and the columns after it have 7176. That leaves 12 values in each of those columns that are not non-null, meaning they're NaNs.
Since NaNs are of dtype=float, every other value in that column is automatically upcast to float, and float numbers that big are usually displayed in scientific notation.
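To see the upcast in isolation, here is a minimal sketch. The two Series below are made-up stand-ins for your data (the values and the shorter index are assumptions); the only point is that their indexes don't fully overlap.

```python
import pandas as pd

# Hypothetical data: sunrise covers only part of the index that timestamp has.
timestamp = pd.Series([1521681600000, 1521681900000, 1521682200000], name="timestamp")
sunrise = pd.Series([1521696105000, 1521696105000], name="sunrise")  # index 0..1 only

# Default join='outer' aligns on the union of the indexes,
# so index 2 has no sunrise value and gets a NaN.
combined = pd.concat((timestamp, sunrise), axis=1)

print(sunrise.dtype)                      # integer dtype before concatenation
print(combined.dtypes)                    # sunrise is now a float column
print(combined["sunrise"].isna().sum())   # number of NaNs introduced by alignment
```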
That's the why, but that doesn't really solve your problem. Your options at this point are
- drop those rows with NaNs using dropna
- fill those NaNs with some default integral value using fillna
(Now you may downcast these columns to int.)
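Both options could look roughly like this. The frame is invented for illustration, and the fill value 0 is an arbitrary placeholder; pick whatever sentinel makes sense for your data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one NaN in an otherwise integer-valued column.
df = pd.DataFrame({
    "timestamp": [1521681600000, 1521681900000, 1521682200000],
    "sunrise": [1521696105000, 1521696105000, np.nan],
})

# Option 1: drop the rows that contain NaN, then cast the column back to int.
dropped = df.dropna().astype({"sunrise": "int64"})

# Option 2: fill NaN with a default (0 is an assumption here), then cast.
filled = df.fillna({"sunrise": 0}).astype({"sunrise": "int64"})

print(dropped.dtypes)
print(filled.dtypes)
```

Note that the first option loses rows and the second one puts made-up values into the column, so which one is acceptable depends on what you do with the data afterwards.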
Alternatively, if you perform pd.concat with join='inner', NaNs are not introduced and the dtypes are preserved.

    pd.concat((timestamp, dataSun, dataData), axis=1, join='inner')

           timestamp        sunrise         sunset  temperature     pressure  \
    0  1521681600000  1521696105000  1521740761000     2.490000  1018.000000
    1  1521681900000  1521696105000  1521740761000     2.408333  1017.833333
    2  1521682200000  1521696105000  1521740761000     2.326667  1017.666667
    3  1521682500000  1521696105000  1521740761000     2.245000  1017.500000
    4  1521682800000  1521696105000  1521740761000     2.163333  1017.333333

       humidity
    0      99.0
    1      99.0
    2      99.0
    3      99.0
    4      99.0
With this third option, an inner join is performed on the indexes of the dataframes, so only the index labels present in all of them are kept.
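A small sketch of the inner join, again on invented frames (the names mirror the ones above, but the contents are assumptions), shows both effects: the dtype stays integer, and the row missing from one frame is dropped.

```python
import pandas as pd

# Hypothetical frames: dataSun is one row shorter than timestamp.
timestamp = pd.DataFrame({"timestamp": [1521681600000, 1521681900000, 1521682200000]})
dataSun = pd.DataFrame({"sunrise": [1521696105000, 1521696105000]})  # index 0..1 only

# Inner join keeps only the index labels found in every frame (0 and 1 here),
# so no NaN is needed and no upcast to float happens.
inner = pd.concat((timestamp, dataSun), axis=1, join="inner")

print(inner.dtypes)
print(len(inner))  # fewer rows than timestamp
```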
As of pandas 1.0.0 I believe you have another option, which is to first use convert_dtypes. This converts the dataframe columns to dtypes that support pd.NA, avoiding the issues with NaNs discussed in this answer.
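As a rough sketch of that approach (requires pandas 1.0 or later; the frames are made up), converting before concatenating would look something like this. The missing entry then shows up as pd.NA in a nullable integer column instead of forcing a float column.

```python
import pandas as pd

# Hypothetical frames with partially overlapping indexes.
timestamp = pd.DataFrame({"timestamp": [1521681600000, 1521681900000, 1521682200000]})
dataSun = pd.DataFrame({"sunrise": [1521696105000, 1521696105000]})  # index 0..1 only

# convert_dtypes() switches int64 columns to the nullable Int64 extension dtype.
timestamp_na = timestamp.convert_dtypes()
dataSun_na = dataSun.convert_dtypes()

# Outer join still introduces a missing value, but it is stored as pd.NA.
combined_na = pd.concat((timestamp_na, dataSun_na), axis=1)

print(combined_na.dtypes)
print(combined_na)
```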