How can I copy DataFrames with datetimes from Stack Overflow into Python?

I usually copy the entire string and then parse it. It's not perfect and you usually have to edit both the string and the dataframe to make it usable. Here is one example. This solution was already provided in this answer. I've only added the bit about parsing date/time.

Click to copy

import pandas as pd
from io import StringIO
from dateutil.parser import parse

# I added two more column names `date` and `time`.
# An advantage of having the string in your python code is that
# you can edit it in your text editor/jupyter notebook quickly and directly.
s = """date time A
2020-01-01 09:20:00  0
2020-01-01 09:21:00  1
2020-01-01 09:22:00  2
2020-01-01 09:23:00  3
2020-01-01 09:24:00  4"""

# Parse using whitespace separator. This will still not be perfect as we can
# see below.
df = pd.read_csv(StringIO(s), sep="\s+", index_col=False)
df
#          date      time  A
# 0  2020-01-01  09:20:00  0
# 1  2020-01-01  09:21:00  1
# 2  2020-01-01  09:22:00  2
# 3  2020-01-01  09:23:00  3
# 4  2020-01-01  09:24:00  4

# Combine date and time column together and drop the individual columns.
df['datetime'] = df['date'] + " " + df['time']
df = df.drop(['date', 'time'], axis=1)

# Use a somewhat universal parser in dateutil.parser.parse to parse the
# dates into proper dateime object.
df['datetime'] = df['datetime'].apply(parse)
df
#    A            datetime
# 0  0 2020-01-01 09:20:00
# 1  1 2020-01-01 09:21:00
# 2  2 2020-01-01 09:22:00
# 3  3 2020-01-01 09:23:00
# 4  4 2020-01-01 09:24:00

df.index
# RangeIndex(start=0, stop=5, step=1)

df.dtypes
# A                    int64
# datetime    datetime64[ns]
# dtype: object

df.columns
# Index(['A', 'datetime'], dtype='object')

One method to provide formatted and parseable dataframe on StackOverflow is by outputting a csv-formatted string.

Click to copy

# Continued from above
print(df.to_csv(index=False))
# A,datetime
# 0,2020-01-01 09:20:00
# 1,2020-01-01 09:21:00
# 2,2020-01-01 09:22:00
# 3,2020-01-01 09:23:00
# 4,2020-01-01 09:24:00

# We can indeed parse nicely from the csv-formatted string 
s_redux = df.to_csv(index=False)
pd.read_csv(StringIO(s_redux))
#    A             datetime
# 0  0  2020-01-01 09:20:00
# 1  1  2020-01-01 09:21:00
# 2  2  2020-01-01 09:22:00
# 3  3  2020-01-01 09:23:00
# 4  4  2020-01-01 09:24:00

Here is one attempt at parsing the second example dataframe. As before, we do need to make some "edits" to the dataframe to make it usable.

Click to copy

import pandas as pd
from io import StringIO
from dateutil.parser import parse

s="""                dates    values cat
0 2020-01-01 09:20:00  0.758513   a
1 2020-01-01 09:21:00  0.337325   b
2 2020-01-01 09:22:00  0.618372   b
3 2020-01-01 09:23:00  0.878714   b
4 2020-01-01 09:24:00  0.311069   b"""

df = pd.read_csv(StringIO(s), sep="\s+").reset_index()
df
#    level_0     level_1     dates    values cat
# 0        0  2020-01-01  09:20:00  0.758513   a
# 1        1  2020-01-01  09:21:00  0.337325   b
# 2        2  2020-01-01  09:22:00  0.618372   b
# 3        3  2020-01-01  09:23:00  0.878714   b
# 4        4  2020-01-01  09:24:00  0.311069   b

df['dates'] = df['level_1'] + " " + df['dates']
df = df.drop(['level_0', 'level_1'], axis=1)
df['dates'] = df['dates'].apply(parse)

df
#                 dates    values cat
# 0 2020-01-01 09:20:00  0.758513   a
# 1 2020-01-01 09:21:00  0.337325   b
# 2 2020-01-01 09:22:00  0.618372   b
# 3 2020-01-01 09:23:00  0.878714   b
# 4 2020-01-01 09:24:00  0.311069   b

df.dtypes
# dates     datetime64[ns]
# values           float64
# cat               object
# dtype: object

How can I copy DataFrames with datetimes from Stack Overflow into Python?

Tags:

Python

Pandas

Clipboard

Copy Paste

Time Series

Related

Recent Posts