python dataframe converting multiple datetime formats
Just use to_datetime, it's man/woman enough to handle both those formats:
In [4]:
df['col'] = pd.to_datetime(df['col'])
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 2 columns):
col 4 non-null datetime64[ns]
val 4 non-null object
dtypes: datetime64[ns](1), object(1)
memory usage: 96.0+ bytes
The df now looks like this:
In [5]:
df
Out[5]:
col val
0 2013-12-01 00:00:00 value1
1 2014-01-22 00:00:01 value2
2 2013-12-10 00:00:00 value3
3 2013-12-31 00:00:00 value4
It works for me. I had two formats in my column 'fecha_hechos'. The formats were:
- 2015/03/02
- 10/02/2010
What I did was:
carpetas_cdmx['Timestamp'] = pd.to_datetime(carpetas_cdmx.fecha_hechos, format='%Y/%m/%d %H:%M:%S', errors='coerce')
mask = carpetas_cdmx.Timestamp.isnull()
carpetas_cdmx.loc[mask, 'Timestamp'] = pd.to_datetime(carpetas_cdmx[mask]['fecha_hechos'], format='%d/%m/%Y %H:%M',errors='coerce')
where carpetas_cdmx is my DataFrame and fecha_hechos is the column with my formats.
I had two different date formats in the same column Temps, similar to the OP, which look like the following:
01.03.2017 00:00:00.000
01/03/2017 00:13
The timings for the two code snippets are as follows:
v['Timestamp1'] = pd.to_datetime(v.Temps)
Took 25.5408718585968 seconds
v['Timestamp'] = pd.to_datetime(v.Temps, format='%d/%m/%Y %H:%M', errors='coerce')
mask = v.Timestamp.isnull()
v.loc[mask, 'Timestamp'] = pd.to_datetime(v[mask]['Temps'], format='%d.%m.%Y %H:%M:%S.%f',
errors='coerce')
Took 0.2923243045806885 seconds
In other words, if you have a small number of known formats for your datetimes, don't use to_datetime without a format!
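The coerce-then-fill pattern above generalizes to any short list of known formats. Here is a minimal sketch of that idea, not a definitive implementation: the helper name parse_known_formats and the sample Series are made up for illustration, and the two format strings are the ones from the snippet above.

```python
import pandas as pd

def parse_known_formats(values, formats):
    """Parse a Series of strings by trying each known format in turn.

    Hypothetical helper: rows that match no format end up as NaT.
    """
    # First pass: rows that do not match become NaT instead of raising.
    parsed = pd.to_datetime(values, format=formats[0], errors="coerce")
    for fmt in formats[1:]:
        missing = parsed.isna()
        if not missing.any():
            break  # everything parsed, no need to try further formats
        # Re-parse only the rows that are still NaT with the next format.
        parsed.loc[missing] = pd.to_datetime(
            values[missing], format=fmt, errors="coerce"
        )
    return parsed

# Made-up sample data in the two formats described above, plus one bad row.
temps = pd.Series(["01.03.2017 00:00:00.000", "01/03/2017 00:13", "not a date"])
stamps = parse_known_formats(
    temps, ["%d/%m/%Y %H:%M", "%d.%m.%Y %H:%M:%S.%f"]
)
print(stamps)
```

Putting the most common format first keeps the later passes small, since each pass only touches the rows that are still NaT.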
You can create a new column:
test_df['col1'] = pd.to_datetime(test_df['col'])
and then drop col and rename col1 to col.
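For completeness, a small sketch of the new-column route end to end. The sample frame is made up, the conversion here uses pd.to_datetime with an explicit format (an assumption about what the data looks like), and drop/rename are one way to do the cleanup step.

```python
import pandas as pd

# Made-up sample data; 'col' holds date strings in a single known format.
test_df = pd.DataFrame(
    {"col": ["2013-12-01", "2013-12-10"], "val": ["value1", "value3"]}
)

# Convert into a new column, leaving the original strings untouched for now.
test_df["col1"] = pd.to_datetime(test_df["col"], format="%Y-%m-%d")

# Drop the old string column and give the converted one its name.
test_df = test_df.drop(columns="col").rename(columns={"col1": "col"})
print(test_df.dtypes)
```

Note that the converted column ends up last in the column order; assigning back to df['col'] directly, as in the first answer, avoids that.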