How can I copy DataFrames with datetimes from Stack Overflow into Python?
I usually copy the entire string and then parse it. It's not perfect and you usually have to edit both the string and the dataframe to make it usable. Here is one example. This solution was already provided in this answer. I've only added the bit about parsing date/time.
import pandas as pd
from io import StringIO
from dateutil.parser import parse
# I added two more column names `date` and `time`.
# An advantage of having the string in your python code is that
# you can edit it in your text editor/jupyter notebook quickly and directly.
s = """date time A
2020-01-01 09:20:00 0
2020-01-01 09:21:00 1
2020-01-01 09:22:00 2
2020-01-01 09:23:00 3
2020-01-01 09:24:00 4"""
# Parse using whitespace separator. This will still not be perfect as we can
# see below.
df = pd.read_csv(StringIO(s), sep="\s+", index_col=False)
df
# date time A
# 0 2020-01-01 09:20:00 0
# 1 2020-01-01 09:21:00 1
# 2 2020-01-01 09:22:00 2
# 3 2020-01-01 09:23:00 3
# 4 2020-01-01 09:24:00 4
# Combine date and time column together and drop the individual columns.
df['datetime'] = df['date'] + " " + df['time']
df = df.drop(['date', 'time'], axis=1)
# Use a somewhat universal parser in dateutil.parser.parse to parse the
# dates into proper dateime object.
df['datetime'] = df['datetime'].apply(parse)
df
# A datetime
# 0 0 2020-01-01 09:20:00
# 1 1 2020-01-01 09:21:00
# 2 2 2020-01-01 09:22:00
# 3 3 2020-01-01 09:23:00
# 4 4 2020-01-01 09:24:00
df.index
# RangeIndex(start=0, stop=5, step=1)
df.dtypes
# A int64
# datetime datetime64[ns]
# dtype: object
df.columns
# Index(['A', 'datetime'], dtype='object')
One method to provide formatted and parseable dataframe on StackOverflow is by outputting a csv-formatted string.
# Continued from above
print(df.to_csv(index=False))
# A,datetime
# 0,2020-01-01 09:20:00
# 1,2020-01-01 09:21:00
# 2,2020-01-01 09:22:00
# 3,2020-01-01 09:23:00
# 4,2020-01-01 09:24:00
# We can indeed parse nicely from the csv-formatted string
s_redux = df.to_csv(index=False)
pd.read_csv(StringIO(s_redux))
# A datetime
# 0 0 2020-01-01 09:20:00
# 1 1 2020-01-01 09:21:00
# 2 2 2020-01-01 09:22:00
# 3 3 2020-01-01 09:23:00
# 4 4 2020-01-01 09:24:00
Here is one attempt at parsing the second example dataframe. As before, we do need to make some "edits" to the dataframe to make it usable.
import pandas as pd
from io import StringIO
from dateutil.parser import parse
s=""" dates values cat
0 2020-01-01 09:20:00 0.758513 a
1 2020-01-01 09:21:00 0.337325 b
2 2020-01-01 09:22:00 0.618372 b
3 2020-01-01 09:23:00 0.878714 b
4 2020-01-01 09:24:00 0.311069 b"""
df = pd.read_csv(StringIO(s), sep="\s+").reset_index()
df
# level_0 level_1 dates values cat
# 0 0 2020-01-01 09:20:00 0.758513 a
# 1 1 2020-01-01 09:21:00 0.337325 b
# 2 2 2020-01-01 09:22:00 0.618372 b
# 3 3 2020-01-01 09:23:00 0.878714 b
# 4 4 2020-01-01 09:24:00 0.311069 b
df['dates'] = df['level_1'] + " " + df['dates']
df = df.drop(['level_0', 'level_1'], axis=1)
df['dates'] = df['dates'].apply(parse)
df
# dates values cat
# 0 2020-01-01 09:20:00 0.758513 a
# 1 2020-01-01 09:21:00 0.337325 b
# 2 2020-01-01 09:22:00 0.618372 b
# 3 2020-01-01 09:23:00 0.878714 b
# 4 2020-01-01 09:24:00 0.311069 b
df.dtypes
# dates datetime64[ns]
# values float64
# cat object
# dtype: object