set new index for pandas DataFrame (interpolating?)
This works well:
import numpy as np
import pandas as pd

def interp(df, new_index):
    """Return a new DataFrame with all column values interpolated
    to the new_index values."""
    df_out = pd.DataFrame(index=new_index)
    df_out.index.name = df.index.name
    # DataFrame.iteritems() was removed in pandas 2.0; items() replaces it
    for colname, col in df.items():
        df_out[colname] = np.interp(new_index, df.index, col)
    return df_out
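A minimal usage sketch of interp, using the sample data from the next answer and a hypothetical new_index that also reaches outside the original range. Because np.interp works on the actual index values, unevenly spaced data is handled correctly; note that outside the data it holds the first/last value constant rather than extrapolating.

```python
import numpy as np
import pandas as pd


def interp(df, new_index):
    """Return a new DataFrame with all column values interpolated to new_index."""
    df_out = pd.DataFrame(index=new_index)
    df_out.index.name = df.index.name
    for colname, col in df.items():
        # Piecewise-linear interpolation on the real index values;
        # np.interp clamps to the end values outside the original range.
        df_out[colname] = np.interp(new_index, df.index, col)
    return df_out


index = np.asarray((2, 2.5, 3, 6, 7, 12, 15, 18, 20, 27))
df = pd.DataFrame({"y": np.sin(index / 10)}, index=index)

# Hypothetical target labels: some inside, some outside [2, 27]
new_index = np.array([0.0, 2.0, 2.25, 10.0, 27.0, 30.0])
result = interp(df, new_index)
print(result)
```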
I wonder if you're up against one of pandas' limitations: it seems you have limited choices for aligning your df to an arbitrary set of numbers (your newindex).
For example, your stated newindex only overlaps with the first and last numbers in index, so linear interpolation (rightly) draws a straight line between the start (2) and end (27) of your index.
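To see why the result is a single straight line, one can count how many rows still hold data right after the reindex step. A quick check on the same sample data:

```python
import numpy as np
import pandas as pd

index = np.asarray((2, 2.5, 3, 6, 7, 12, 15, 18, 20, 27))
df = pd.DataFrame(np.sin(index / 10), index=index)

newindex = np.linspace(index.min(), index.max(), 100)
reindexed = df.reindex(newindex)

# reindex keeps a value only where a new label exactly equals an old one;
# here that happens only at the endpoints 2.0 and 27.0, everything else is NaN.
n_known = int(reindexed[0].notna().sum())
print(f"{n_known} of {len(newindex)} rows have data")

# Linear interpolation can then only draw one straight line between those two rows.
interpolated = reindexed.interpolate(method="linear")
```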
import numpy as np
import pandas as pd
%matplotlib inline
index = np.asarray((2, 2.5, 3, 6, 7, 12, 15, 18, 20, 27))
x = np.sin(index / 10)
df = pd.DataFrame(x, index=index)
newindex = np.linspace(min(index), max(index), 100)
df_reindexed = df.reindex(index=newindex)
df_reindexed = df_reindexed.interpolate(method='linear')
df.plot()
df_reindexed.plot()
If you change newindex to provide more points that overlap your original data set, interpolation behaves more as expected:
newindex = np.linspace(min(index), max(index), 26)
df_reindexed = df.reindex(index=newindex)
df_reindexed = df_reindexed.interpolate(method='linear')
df.plot()
df_reindexed.plot()
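With 26 points the grid step is exactly 1.0, so the new index lands on 9 of the 10 original index values (every integer one) and most of the real data survives the reindex; only 2.5 falls between grid points and is discarded. A small check of that, on the same sample data:

```python
import numpy as np
import pandas as pd

index = np.asarray((2, 2.5, 3, 6, 7, 12, 15, 18, 20, 27))
df = pd.DataFrame(np.sin(index / 10), index=index)

# Step of 1.0 between 2 and 27, so the grid is 2.0, 3.0, ..., 27.0
newindex = np.linspace(index.min(), index.max(), 26)

# Which original sample positions are hit exactly by the new grid?
overlap = np.isin(index, newindex)
print("kept:", index[overlap])
print("dropped:", index[~overlap])  # 2.5 is not a grid point

df_reindexed = df.reindex(newindex).interpolate(method="linear")
```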
There are other options that don't require manually aligning the indices, such as the fill methods of reindex, but the resulting curve (while technically correct) is a step function and probably not what you want:
newindex = np.linspace(min(index), max(index), 1000)
df_reindexed = df.reindex(index=newindex, method='ffill')
df.plot()
df_reindexed.plot()
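A small sketch of what the fill methods do, on a few hand-picked (hypothetical) labels rather than the 1000-point grid: with method='ffill' each new label copies the value of the last original label at or before it, which is why the plot is a staircase. reindex also accepts method='nearest', which copies the closest original value instead; that is still a step function, just with the steps shifted.

```python
import numpy as np
import pandas as pd

index = np.asarray((2, 2.5, 3, 6, 7, 12, 15, 18, 20, 27))
s = pd.Series(np.sin(index / 10), index=index)

probe = [2.0, 2.7, 5.9, 6.0, 26.0]  # hypothetical labels to look up

# ffill: copy the last original value at or before each label
ffilled = s.reindex(probe, method="ffill")
# nearest: copy the value of the closest original label
nearest = s.reindex(probe, method="nearest")

print(pd.DataFrame({"ffill": ffilled, "nearest": nearest}))
```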
I looked at the pandas docs but I couldn't identify an easy solution.
https://pandas.pydata.org/pandas-docs/stable/basics.html#basics-reindexing
I have adopted the following solution:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def reindex_and_interpolate(df, new_index):
    # The union keeps the original points so interpolation can use them;
    # method='index' weights by the actual index values.
    # (Index | Index no longer means set union in pandas 2.x, so use .union())
    return (df.reindex(df.index.union(new_index))
              .interpolate(method='index', limit_direction='both')
              .loc[new_index])

index = np.asarray((2, 2.5, 3, 6, 7, 12, 15, 18, 20, 27))
x = np.sin(index / 10)
df = pd.DataFrame(x, index=index)
# pd.Float64Index was removed in pandas 2.0; a plain float Index works the same
newindex = pd.Index(np.linspace(min(index)-5, max(index)+5, 50))
df_reindexed = reindex_and_interpolate(df, newindex)

plt.figure()
plt.scatter(df.index, df.values, color='red', alpha=0.5)
plt.scatter(df_reindexed.index, df_reindexed.values, color='green', alpha=0.5)
plt.show()
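Two details make reindex_and_interpolate work: the union keeps every original sample in the frame while it is interpolated, and method='index' weights by the actual index values, whereas method='linear' treats rows as equally spaced, which is wrong once the union produces an uneven index. The sketch below compares both against np.interp as a reference; the method parameter is a hypothetical addition for the comparison. Outside [2, 27], limit_direction='both' should hold the end values constant, the same clamping np.interp does.

```python
import numpy as np
import pandas as pd


def reindex_and_interpolate(df, new_index, method="index"):
    # Hypothetical variant with a `method` switch, for comparison only.
    combined = df.index.union(new_index)  # originals + requested labels
    return (
        df.reindex(combined)
        .interpolate(method=method, limit_direction="both")
        .loc[new_index]  # keep only the requested labels
    )


index = np.asarray((2, 2.5, 3, 6, 7, 12, 15, 18, 20, 27))
y = np.sin(index / 10)
df = pd.DataFrame({"y": y}, index=index)
newindex = pd.Index(np.linspace(index.min() - 5, index.max() + 5, 50))

by_index = reindex_and_interpolate(df, newindex, method="index")
by_position = reindex_and_interpolate(df, newindex, method="linear")

# Reference: interpolation on the real x values, clamped outside [2, 27]
reference = np.interp(newindex, index, y)

print("max |index - np.interp|: ", np.abs(by_index["y"] - reference).max())
print("max |linear - np.interp|:", np.abs(by_position["y"] - reference).max())
```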