Fitting a curve to a power-law distribution with curve_fit does not work

Your func_powerlaw is not strictly a power law, as it has an additive constant.
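To see why the constant matters, take logarithms of both forms. The symbols c, m and c_0 are chosen here for illustration. Only the first form is a straight line in log x, with slope m. The second is linear in log x only when c_0 = 0.

```latex
y = c\,x^{m} \quad\Longrightarrow\quad \log y = \log c + m\,\log x
y = c_0 + c\,x^{m} \quad\Longrightarrow\quad \log y = \log\left(c_0 + c\,x^{m}\right)
```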

Generally speaking, if you want a quick visual appraisal of a power-law relation, you would use one of these matplotlib.pyplot calls:

plot(log(x),log(y))

or

loglog(x,y)

Both should give a straight line, although there are subtle differences between them (in particular, regarding curve fitting).

All of this assumes there is no additive constant, because such a constant breaks the power-law relation.
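The point above can be checked numerically. This is a minimal sketch on synthetic data: the amplitude 3.0, exponent -0.5 and offset 1.0 are arbitrary, and loglog_line_residual is a hypothetical helper name. The helper measures the largest deviation of log10(y) from its best-fit straight line in log10(x). A strict power law should give essentially zero deviation and recover the exponent as the slope. The same curve with a constant added should not lie on a straight line.

```python
import numpy as np


def loglog_line_residual(x, y):
    """Return (max deviation from the best straight line in log-log space, slope)."""
    lx, ly = np.log10(x), np.log10(y)
    slope, intercept = np.polyfit(lx, ly, 1)  # coefficients: [slope, intercept]
    deviation = np.max(np.abs(ly - (slope * lx + intercept)))
    return deviation, slope


x = np.logspace(0, 4, 50)   # 50 points from 1 to 10**4
y_pure = 3.0 * x**-0.5      # strict power law
y_offset = y_pure + 1.0     # same power law plus an additive constant

print(loglog_line_residual(x, y_pure))
print(loglog_line_residual(x, y_offset))
```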

If you want to fit a power law that weights the data according to the log-log scale (typically desirable), you can use the code below.

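import numpy as np
from scipy.optimize import curve_fit


def powlaw(x, a, b):
    """Power law y = a * x**b."""
    return a * np.power(x, b)


def linlaw(x, a, b):
    """Straight line y = a + b * x (a power law in log10-log10 coordinates)."""
    return a + x * b


def curve_fit_log(xdata, ydata):
    """Fit data to a power law with weights according to a log scale.

    Returns (popt_log, pcov_log, ydatafit_log), where popt_log = [log10(a), b]
    for the power law y = a * x**b, pcov_log is their covariance (in log space),
    and ydatafit_log is the fitted curve in linear y units at xdata.
    """
    # Transform both axes to log10 so that residuals are measured on a log scale
    xdata_log = np.log10(xdata)
    ydata_log = np.log10(ydata)
    # Fit a straight line in log-log space
    popt_log, pcov_log = curve_fit(linlaw, xdata_log, ydata_log)
    # Map the fitted line back to linear y units; x needs no back-transform,
    # since the original xdata is already available
    ydatafit_log = np.power(10, linlaw(xdata_log, *popt_log))
    return (popt_log, pcov_log, ydatafit_log)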
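The intercept returned by curve_fit_log is log10 of the amplitude. In terms of powlaw(x, a, b), a = 10**popt_log[0] and b = popt_log[1]. The model is a straight line in log-log space, so the fit is an ordinary linear least-squares problem. The result can therefore be cross-checked with np.polyfit, which is a different function from the curve_fit call above. Both should agree up to numerical tolerance. In this sketch, fit_powerlaw_polyfit is a hypothetical name, and the synthetic data (a = 2, b = -1.5) are for illustration only.

```python
import numpy as np


def fit_powerlaw_polyfit(xdata, ydata):
    """Fit y = a * x**b by linear least squares on (log10 x, log10 y); return (a, b)."""
    logx = np.log10(np.asarray(xdata, dtype=float))
    logy = np.log10(np.asarray(ydata, dtype=float))
    # np.polyfit returns coefficients highest degree first: [slope, intercept]
    b, log_a = np.polyfit(logx, logy, 1)
    # The intercept is log10(a), so undo the log to get the amplitude
    return 10**log_a, b


x = np.array([1.0, 10.0, 100.0, 1000.0])
y = 2.0 * x**-1.5
a, b = fit_powerlaw_polyfit(x, y)
print(a, b)
```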

As the traceback states, curve_fit reached the maximum number of function evaluations without converging to a solution. You can raise this limit with the maxfev keyword argument. For this example, maxfev=2000 is large enough for the algorithm to terminate successfully.

However, the resulting fit is not satisfactory. The cause is the default initial guess for the parameters: when p0 is not given, curve_fit starts from all ones, which is a poor starting point for this data. The large number of iterations required is a symptom of this. Providing a different starting point, found by simple trial and error, gives a good fit without increasing maxfev.
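The maxfev behaviour can be checked directly. curve_fit raises a RuntimeError when it exhausts its evaluation budget, so the call can be wrapped in try/except. This is a minimal sketch with a few assumptions. The form of func_powerlaw below is assumed: the question's model is not shown here, and this version is a power law plus an additive constant c0 whose parameter order matches the p0 used further down. fit_converged is a hypothetical helper. The data are the x and y values from this answer.

```python
import numpy as np
from scipy.optimize import curve_fit


def func_powerlaw(x, m, c, c0):
    # Assumption: the question's model, a power law plus an additive constant c0
    return c0 + x**m * c


x = np.asarray([1000, 3250, 5500, 10000, 32500, 55000, 77500, 100000, 200000], dtype=float)
y = np.asarray([1100, 500, 288, 200, 113, 67, 52, 44, 5], dtype=float)


def fit_converged(maxfev, p0=None):
    """Hypothetical helper: True if curve_fit terminates within maxfev evaluations."""
    try:
        curve_fit(func_powerlaw, x, y, p0=p0, maxfev=maxfev)
    except RuntimeError:
        # curve_fit raises RuntimeError when the evaluation budget runs out
        return False
    return True


print(fit_converged(5))     # far too small a budget for the default p0
print(fit_converged(2000))  # the budget suggested for this example
```

The outcome with 2000 evaluations may depend on the SciPy version, so treat it as something to check rather than a guarantee.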

The two fits, and a visual comparison with the data, are shown below.

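# Assumed form of func_powerlaw from the question (not shown here): a power
# law x**m * c plus an additive constant c0, matching the p0 order below
def func_powerlaw(x, m, c, c0):
    return c0 + x**m * c

x = np.asarray([ 1000, 3250, 5500, 10000, 32500, 55000, 77500, 100000, 200000 ], dtype=float)
y = np.asarray([ 1100, 500, 288, 200, 113, 67, 52, 44, 5 ], dtype=float)

sol1 = curve_fit(func_powerlaw, x, y, maxfev=2000)
sol2 = curve_fit(func_powerlaw, x, y, p0=np.asarray([-1, 10**5, 0]))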

[Figure: the two fits compared with the data]
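Instead of trial and error, one option is to derive p0 from the straight-line log-log fit discussed earlier. The idea is to ignore the additive constant for the initial guess and start c0 at 0. This is a swapped-in heuristic, not the method used above. It assumes all x and y values are positive, since they must be for the logarithm. loglog_initial_guess is a hypothetical helper, and func_powerlaw is again an assumed form of the question's model.

```python
import numpy as np
from scipy.optimize import curve_fit


def func_powerlaw(x, m, c, c0):
    # Assumption: the question's model, a power law plus an additive constant c0
    return c0 + x**m * c


def loglog_initial_guess(x, y):
    """Hypothetical helper: p0 = [m, c, c0] from a log-log line fit, with c0 = 0."""
    m, log_c = np.polyfit(np.log10(x), np.log10(y), 1)  # [slope, intercept]
    return np.array([m, 10**log_c, 0.0])


x = np.asarray([1000, 3250, 5500, 10000, 32500, 55000, 77500, 100000, 200000], dtype=float)
y = np.asarray([1100, 500, 288, 200, 113, 67, 52, 44, 5], dtype=float)

p0 = loglog_initial_guess(x, y)
popt, pcov = curve_fit(func_powerlaw, x, y, p0=p0)
print(p0)
print(popt)
```

The log-log slope gives a starting exponent with the right sign and rough magnitude, which is the role the hand-picked -1 plays in the p0 above.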
