Pandas and Matplotlib - fill_between() vs datetime64

As WillZ pointed out, Pandas 0.21 broke unutbu's workaround. Converting datetimes to dates, however, discards the time-of-day information, which can badly distort an analysis. This solution currently works and keeps the full datetime:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

N = 300
dates = pd.date_range('2000-1-1', periods=N, freq='ms')
x = np.linspace(0, 2*np.pi, N)
data = pd.DataFrame({'A': np.sin(x), 'B': np.cos(x),
           'Date': dates})
d = data['Date'].dt.to_pydatetime()
plt.plot_date(d, data['A'], '-')
plt.plot_date(d, data['B'], '-')


plt.fill_between(d, data['A'], data['B'],
            where=data['A'] >= data['B'],
            facecolor='green', alpha=0.2, interpolate=True)
plt.xticks(rotation=25)
plt.show()

[Figure: fill_between with datetime64 constraint]

EDIT: As per jedi's comment, I set out to determine the fastest approach of the three options below:

method1 = original answer
method2 = jedi's comment + original answer
method3 = jedi's comment

method2 was only slightly faster on average, but its timings were much more consistent, so I edited the answer above to use it.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import time


N = 300
dates = pd.date_range('2000-1-1', periods=N, freq='ms')
x = np.linspace(0, 2*np.pi, N)
data = pd.DataFrame({'A': np.sin(x), 'B': np.cos(x),
           'Date': dates})
time_data = pd.DataFrame(columns=['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'])
method1 = []
method2 = []
method3 = []
# method1: convert each Timestamp individually in a list comprehension
for run in range(10):
    start = time.perf_counter()  # time.clock() was removed in Python 3.8
    for _ in range(500):
        d = [pd.Timestamp(x).to_pydatetime() for x in data['Date']]
        #d = data['Date'].dt.to_pydatetime()
        plt.plot_date(d, data['A'], '-')
        plt.plot_date(d, data['B'], '-')

        plt.fill_between(d, data['A'], data['B'],
            where=data['A'] >= data['B'],
            facecolor='green', alpha=0.2, interpolate=True)
        plt.xticks(rotation=25)
        plt.gcf().clear()
    method1.append(time.perf_counter() - start)

# method2: convert the whole column once and reuse the result
for run in range(10):
    start = time.perf_counter()  # time.clock() was removed in Python 3.8
    for _ in range(500):
        #d = [pd.Timestamp(x).to_pydatetime() for x in data['Date']]
        d = data['Date'].dt.to_pydatetime()
        plt.plot_date(d, data['A'], '-')
        plt.plot_date(d, data['B'], '-')

        plt.fill_between(d, data['A'], data['B'],
            where=data['A'] >= data['B'],
            facecolor='green', alpha=0.2, interpolate=True)
        plt.xticks(rotation=25)
        plt.gcf().clear()
    method2.append(time.perf_counter() - start)

# method3: convert the column inline at every call site
for run in range(10):
    start = time.perf_counter()  # time.clock() was removed in Python 3.8
    for _ in range(500):
        #d = [pd.Timestamp(x).to_pydatetime() for x in data['Date']]
        #d = data['Date'].dt.to_pydatetime()
        plt.plot_date(data['Date'].dt.to_pydatetime(), data['A'], '-')
        plt.plot_date(data['Date'].dt.to_pydatetime(), data['B'], '-')

        plt.fill_between(data['Date'].dt.to_pydatetime(), data['A'], data['B'],
            where=data['A'] >= data['B'],
            facecolor='green', alpha=0.2, interpolate=True)
        plt.xticks(rotation=25)
        plt.gcf().clear()
    method3.append(time.perf_counter() - start)

time_data.loc['method1'] = method1
time_data.loc['method2'] = method2
time_data.loc['method3'] = method3
print(time_data)
# Mean time per method, with the standard deviation as the error bar
plt.errorbar(time_data.index, time_data.mean(axis=1), yerr=time_data.std(axis=1))
plt.show()

[Figure: time test of 3 methods on converting time data for plotting a DataFrame]
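
The benchmark above times conversion and plotting together, so the plotting cost may mask the difference between the conversions themselves. Here is a minimal sketch that times only the conversion step, assuming the same `data` frame as above. The function names `convert_loop` and `convert_vectorized` are hypothetical labels, not part of the original answer, and recent pandas versions may emit a deprecation warning for `.dt.to_pydatetime()`.

```python
import timeit

import numpy as np
import pandas as pd

N = 300
dates = pd.date_range('2000-1-1', periods=N, freq='ms')
x = np.linspace(0, 2 * np.pi, N)
data = pd.DataFrame({'A': np.sin(x), 'B': np.cos(x), 'Date': dates})


def convert_loop():
    # method1: convert one Timestamp at a time in Python
    return [pd.Timestamp(ts).to_pydatetime() for ts in data['Date']]


def convert_vectorized():
    # method2/method3: let pandas convert the whole column at once
    return data['Date'].dt.to_pydatetime()


# Five repeats of 200 calls each; compare the minimum of each list
loop_times = timeit.repeat(convert_loop, number=200, repeat=5)
vectorized_times = timeit.repeat(convert_vectorized, number=200, repeat=5)
print('loop:      ', min(loop_times))
print('vectorized:', min(vectorized_times))
```

Taking the minimum over the repeats is the usual way to read `timeit.repeat` results, since higher values tend to reflect interference from other processes rather than the code being measured.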

I had a similar problem. I have a DataFrame that looks something like this:

date        upper     lower 
2018-10-10  0.999614  0.146746
2018-10-26  0.999783  0.333178
2019-01-02  0.961252  0.176736
2019-01-08  0.977487  0.371374
2019-01-09  0.923230  0.286423
2019-01-10  0.880961  0.294823
2019-01-11  0.846933  0.303679
2019-01-14  0.846933  0.303679
2019-01-15  0.800336  0.269864
2019-01-16  0.706114  0.238787

with dtypes:

date     datetime64[ns]
upper           float64
lower           float64

The following raises the error from the original question:

plt.fill_between(dplot.date, dplot.lower, dplot.upper, alpha=.2)

Interestingly,

plt.fill_between(dplot.date.values, dplot.lower, dplot.upper, alpha=.2)

works perfectly fine.
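
The only difference between the two calls is the type of the first argument: `dplot.date` is a pandas Series, while `dplot.date.values` is a NumPy array of datetime64 values. Here is a minimal sketch of the working variant, using a hypothetical three-row `dplot` copied from the first rows of the table above and the non-interactive Agg backend (an assumption, so that it runs without a display):

```python
import matplotlib
matplotlib.use('Agg')  # assumption: render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for dplot, copied from the first rows shown above
dplot = pd.DataFrame({
    'date': pd.to_datetime(['2018-10-10', '2018-10-26', '2019-01-02']),
    'upper': [0.999614, 0.999783, 0.961252],
    'lower': [0.146746, 0.333178, 0.176736],
})

x = dplot.date.values  # NumPy array instead of a pandas Series
print(type(dplot.date), type(x), x.dtype)

fig, ax = plt.subplots()
band = ax.fill_between(x, dplot.lower, dplot.upper, alpha=.2)
fig.autofmt_xdate()    # slant the date labels
fig.canvas.draw()      # force a full render, including the date axis
```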

Pandas registers a converter in matplotlib.units.registry which converts a number of datetime types (such as pandas DatetimeIndex, and numpy arrays of dtype datetime64) to matplotlib datenums, but it does not handle Pandas Series with dtype datetime64.

In [67]: import pandas.tseries.converter as converter

In [68]: c = converter.DatetimeConverter()

In [69]: type(c.convert(df['Date'].values, None, None))
Out[69]: numpy.ndarray              # converted (good)

In [70]: type(c.convert(df['Date'], None, None))
Out[70]: pandas.core.series.Series  # left unchanged

fill_between checks for and uses a converter to handle the data if it exists.
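
To see which converter Matplotlib would pick for a given input, the registry can be queried directly. This is a minimal sketch, assuming a Matplotlib version recent enough to ship its own datetime64 converter (2.2 or later, as far as I know); on the versions discussed in this answer the result may differ.

```python
import datetime

import matplotlib.dates  # importing this module registers the date converters
import matplotlib.units as munits
import numpy as np

arr = np.array(['2000-01-01', '2000-01-02'], dtype='datetime64[ns]')

# get_converter() returns the registered converter for the input, or None
conv_datetime64 = munits.registry.get_converter(arr)
conv_datetime = munits.registry.get_converter([datetime.datetime(2000, 1, 1)])

print(type(conv_datetime64).__name__)
print(type(conv_datetime).__name__)
```

If the lookup returns None for some input type, Matplotlib has no unit conversion for it and passes the values through unchanged, which is how an unconverted Series of datetimes ends up causing an error further down.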

So as a workaround, you could convert the dates to a NumPy array of datetime64 values:

d = data['Date'].values
plt.fill_between(d, data['A'], data['B'],
                where=data['A'] >= data['B'],
                facecolor='green', alpha=0.2, interpolate=True)

For example,

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

N = 300
dates = pd.date_range('2000-1-1', periods=N, freq='D')
x = np.linspace(0, 2*np.pi, N)
data = pd.DataFrame({'A': np.sin(x), 'B': np.cos(x),
               'Date': dates})
plt.plot_date(data['Date'], data['A'], '-')
plt.plot_date(data['Date'], data['B'], '-')

d = data['Date'].values
plt.fill_between(d, data['A'], data['B'],
                where=data['A'] >= data['B'],
                facecolor='green', alpha=0.2, interpolate=True)
plt.xticks(rotation=25)
plt.show()


I encountered this issue after upgrading to Pandas 0.21. My code ran fine previously with fill_between() but broke after the upgrade.

It turns out that the fix mentioned in @unutbu's answer, which is what I had before anyway, only works if the DatetimeIndex contains date objects rather than datetime objects that have time info.

Looking at the example above, what I did to fix it was to add the following line before calling fill_between():

d = [z.date() for z in data['Date']]
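
Note what this conversion does to the values: `.date()` keeps only the calendar day. Here is a small sketch with made-up timestamps (not from the original post) showing that two samples taken on the same day collapse to the same x position after the conversion:

```python
import datetime

import pandas as pd

# Hypothetical samples: two on the same day, six hours apart
stamps = pd.Series(pd.to_datetime(['2000-01-01 06:00',
                                   '2000-01-01 12:00',
                                   '2000-01-02 06:00']))

as_dates = [z.date() for z in stamps]
print(as_dates)

# The first two points now share one x value; the time of day is gone
print(as_dates[0] == as_dates[1])
```

For daily data this is harmless, but for intraday data the filled region would be drawn at the wrong positions.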
