Seaborn: Avoid plotting missing values (line plot)
Based on Denziloe's answer, there are three options:
1) Use pandas or matplotlib.
2) If you need seaborn: this is not what it's for, but for regularly spaced dates like the ones above, pointplot can be used out of the box.
fig, ax = plt.subplots(figsize=(10, 5))
plot = sns.pointplot(
    ax=ax,
    data=df, x="Date", y="Data"
)
ax.set_xticklabels([])
plt.show()
The graph built on the data from the question will look as below:
Pros:
- easy to implement
- an outlier in the data that is surrounded by None will be easy to notice on the graph
Cons:
- it takes a long time to generate such a graph (compared to lineplot)
- when there are many points, such graphs become hard to read
3) If you need seaborn and you need lineplot: the hue argument can be used to put the separate sections in separate buckets. We number the sections using the occurrences of nans.
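fig, ax = plt.subplots(figsize=(10, 5))
# One hue bucket per run of values between NaNs.
sections = df["Data"].isna().cumsum()
plot = sns.lineplot(
    ax=ax,
    data=df, x="Date", y="Data",
    hue=sections,
    # One colour per bucket, so every section is drawn in the same blue.
    palette=["blue"] * sections.nunique(),
    legend=False, markers=True,
)
ax.set_xticklabels([])
plt.show()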
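To see what the hue expression actually produces, here is a small sketch using only pandas. The values mirror the gaps in the question's data; the names `data` and `buckets` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Example values mirroring the gaps in the question's data (NaN = missing).
data = pd.Series([np.nan, 100, 105, np.nan, 95, 90, 80, np.nan, np.nan])

# Each NaN bumps the running count, so every run of valid values that
# follows a gap shares one bucket number.
buckets = data.isna().cumsum()
print(buckets.tolist())  # [1, 1, 1, 2, 2, 2, 2, 3, 4]

# Number of distinct buckets. It equals the NaN count only when the
# series starts with a NaN; otherwise there is an extra bucket 0.
print(buckets.nunique())  # 4
```

Since seaborn expects one colour per hue level when the palette is a list, sizing the list by the number of distinct buckets is a safer choice than sizing it by the number of NaNs.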
Pros:
- lineplot
- easy to read
- generated faster than point plot
Cons:
- an outlier in the data that is surrounded by None will not be drawn on the chart
The graph will look as below:
Try setting NaN values to np.inf: Seaborn doesn't draw those points, and doesn't connect the points before with the points after.
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
# Make example data
s = """2018-01-01
2018-01-02,100
2018-01-03,105
2018-01-04
2018-01-05,95
2018-01-06,90
2018-01-07,80
2018-01-08
2018-01-09"""
df = pd.DataFrame([row.split(",") for row in s.split("\n")], columns=["Date", "Data"])
df = df.replace("", np.nan)
df["Date"] = pd.to_datetime(df["Date"])
df["Data"] = df["Data"].astype(float)
Three options:
1) Use pandas or matplotlib.
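As a rough sketch of the first option: matplotlib leaves a gap wherever the y data is NaN, so plotting the column directly already gives the broken line. The frame below is rebuilt by hand to resemble the example data, and `marker="o"` is an addition of mine so that isolated points would remain visible.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Rebuild a frame shaped like the example data; NaN marks a missing value.
df = pd.DataFrame({
    "Date": pd.date_range("2018-01-01", periods=9, freq="D"),
    "Data": [np.nan, 100, 105, np.nan, 95, 90, 80, np.nan, np.nan],
})

fig, ax = plt.subplots(figsize=(10, 5))
# matplotlib does not bridge NaNs: the line stops before each gap and
# resumes at the next valid value.
ax.plot(df["Date"], df["Data"], marker="o")
# The pandas wrapper should give the same picture:
# df.plot(x="Date", y="Data", ax=ax)
plt.show()
```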
2) If you need seaborn: it's not what it's for, but for regular dates like yours you can use pointplot out of the box.
fig, ax = plt.subplots(figsize=(10, 5))
plot = sns.pointplot(
    ax=ax,
    data=df, x="Date", y="Data"
)
ax.set_xticklabels([])
plt.show()
3) If you need seaborn and you need lineplot: I've looked at the source code and it looks like lineplot drops nans from the DataFrame before plotting. So unfortunately it's not possible to do it properly. You could use some advanced hackery though and use the hue argument to put the separate sections in separate buckets. We number the sections using the occurrences of nans.
fig, ax = plt.subplots(figsize=(10, 5))
plot = sns.lineplot(
    ax=ax,
    data=df, x="Date", y="Data",
    hue=df["Data"].isna().cumsum(),
    # One colour per bucket (the bucket count can exceed the NaN count by one).
    palette=["black"] * df["Data"].isna().cumsum().nunique(),
    legend=False, markers=True
)
ax.set_xticklabels([])
plt.show()
Unfortunately, the markers argument appears to be broken currently, so you'll need to fix it if you want to see dates that have nans on either side.