Pandas Dataframe - Droping Certain Hours of the Day from 20 Years of Historical Data
Problem here is how you are importing data. There is no indicator whether 04:00 is am or pm? but based on your comments we need to assume it is PM. However input is showing it as AM.
To solve this we need to include two conditions with OR clause.
- 9:30-11:59
- 0:00-4:00
Input:
df = pd.DataFrame({'date': {880551: '2015-07-06 04:00:00', 880552: '2015-07-06 04:02:00',880553: '2015-07-06 04:03:00', 880554: '2015-07-06 04:04:00', 880555: '2015-07-06 04:05:00'},
'open': {880551: 125.00, 880552: 125.36,880553: 125.34, 880554: 125.08, 880555: 125.12},
'high': {880551: 125.00, 880552: 125.36,880553: 125.34, 880554: 125.11, 880555: 125.12},
'low': {880551: 125.00, 880552: 125.32,880553: 125.21, 880554: 125.05, 880555: 125.12},
'close': {880551: 125.00, 880552: 125.32,880553: 125.21, 880554: 125.05, 880555: 125.12},
'volume': {880551: 141, 880552: 200,880553: 750, 880554: 17451, 880555: 1000},
},
)
df.head()
date open high low close volume
880551 2015-07-06 04:00:00 125.00 125.00 125.00 125.00 141
880552 2015-07-06 04:02:00 125.36 125.36 125.32 125.32 200
880553 2015-07-06 04:03:00 125.34 125.34 125.21 125.21 750
880554 2015-07-06 04:04:00 125.08 125.11 125.05 125.05 17451
880555 2015-07-06 04:05:00 125.12 125.12 125.12 125.12 1000
from datetime import time
start_first = time(9, 30)
end_first = time(11, 59)
start_second = time(0, 00)
end_second = time(4,00)
df['date'] = pd.to_datetime(df['date'])
df= df[(df['date'].dt.time.between(start_first, end_first)) | (df['date'].dt.time.between(start_second, end_second))]
df
date open high low close volume
880551 2015-07-06 04:00:00 125.0 125.0 125.0 125.0 141
Above is not good practice, and I strongly discourage to use this kind of ambiguous data. long time solution is to correctly populate data with am/pm.
We can achieve it in two way in case of correct data format:
1) using datetime
from datetime import time
start = time(9, 30)
end = time(16)
df['date'] = pd.to_datetime(df['date'])
df= df[df['date'].dt.time.between(start, end)]
2) using between time, which only works with datetime index
df['date'] = pd.to_datetime(df['date'])
df = (df.set_index('date')
.between_time('09:30', '16:00')
.reset_index())
If you still face error, edit your question with line by line approach and exact error.
I think the answer is already in the comments (@Parfait's .between_time) but that it got lost in debugging issues. It appears your df['date']
column is not of type Datetime
yet.
This should be enough to fix that and get the required result:
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
df = df.between_time('9:30', '16:00')