Fitting a Weibull distribution using Scipy

It is easy to verify which result is the true MLE, just need a simple function to calculate log likelihood:

>>> def wb2LL(p, x): #log-likelihood
    return sum(log(stats.weibull_min.pdf(x, p[1], 0., p[0])))
>>> adata=loadtxt('/home/user/stack_data.csv')
>>> wb2LL(array([6.8820748596850905, 1.8553346917584836]), adata)
-8290.1227946678173
>>> wb2LL(array([5.93030013, 1.57463497]), adata)
-8410.3327470347667

The result from fit method of exponweib and R fitdistr (@Warren) is better and has higher log likelihood. It is more likely to be the true MLE. It is not surprising that the result from GAMLSS is different. It is a complete different statistic model: Generalized Additive Model.

Still not convinced? We can draw a 2D confidence limit plot around MLE, see Meeker and Escobar's book for detail). Multi-dimensional Confidence Region

Again this verifies that array([6.8820748596850905, 1.8553346917584836]) is the right answer as loglikelihood is lower that any other point in the parameter space. Note:

Click to copy

>>> log(array([6.8820748596850905, 1.8553346917584836]))
array([ 1.92892018,  0.61806511])

BTW1, MLE fit may not appears to fit the distribution histogram tightly. An easy way to think about MLE is that MLE is the parameter estimate most probable given the observed data. It doesn't need to visually fit the histogram well, that will be something minimizing mean square error.

BTW2, your data appears to be leptokurtic and left-skewed, which means Weibull distribution may not fit your data well. Try, e.g. Gompertz-Logistic, which improves log-likelihood by another about 100. enter image description here Cheers!

I know it's an old post, but I just faced a similar problem and this thread helped me solve it. Thought my solution might be helpful for others like me:

Click to copy

# Fit Weibull function, some explanation below
params = stats.exponweib.fit(data, floc=0, f0=1)
shape = params[1]
scale = params[3]
print 'shape:',shape
print 'scale:',scale

#### Plotting
# Histogram first
values,bins,hist = plt.hist(data,bins=51,range=(0,25),normed=True)
center = (bins[:-1] + bins[1:]) / 2.

# Using all params and the stats function
plt.plot(center,stats.exponweib.pdf(center,*params),lw=4,label='scipy')

# Using my own Weibull function as a check
def weibull(u,shape,scale):
    '''Weibull distribution for wind speed u with shape parameter k and scale parameter A'''
    return (shape / scale) * (u / scale)**(shape-1) * np.exp(-(u/scale)**shape)

plt.plot(center,weibull(center,shape,scale),label='Wind analysis',lw=2)
plt.legend()

Some extra info that helped me understand:

Scipy Weibull function can take four input parameters: (a,c),loc and scale. You want to fix the loc and the first shape parameter (a), this is done with floc=0,f0=1. Fitting will then give you params c and scale, where c corresponds to the shape parameter of the two-parameter Weibull distribution (often used in wind data analysis) and scale corresponds to its scale factor.

From docs:

Click to copy

exponweib.pdf(x, a, c) =
    a * c * (1-exp(-x**c))**(a-1) * exp(-x**c)*x**(c-1)

If a is 1, then

Click to copy

exponweib.pdf(x, a, c) =
    c * (1-exp(-x**c))**(0) * exp(-x**c)*x**(c-1)
  = c * (1) * exp(-x**c)*x**(c-1)
  = c * x **(c-1) * exp(-x**c)

From this, the relation to the 'wind analysis' Weibull function should be more clear

I was curious about your question and, despite this is not an answer, it compares the Matlab result with your result and with the result using leastsq, which showed the best correlation with the given data:

enter image description here

The code is as follows:

Click to copy

import scipy.stats as s
import numpy as np
import matplotlib.pyplot as plt
import numpy.random as mtrand
from scipy.integrate import quad
from scipy.optimize import leastsq

## my distribution (Inverse Normal with shape parameter mu=1.0)
def weib(x,n,a):
    return (a / n) * (x / n)**(a-1) * np.exp(-(x/n)**a)

def residuals(p,x,y):
    integral = quad( weib, 0, 16, args=(p[0],p[1]) )[0]
    penalization = abs(1.-integral)*100000
    return y - weib(x, p[0],p[1]) + penalization

#
data = np.loadtxt("stack_data.csv")


x = np.linspace(data.min(), data.max(), 100)
n, bins, patches = plt.hist(data,bins=x, normed=True)
binsm = (bins[1:]+bins[:-1])/2

popt, pcov = leastsq(func=residuals, x0=(1.,1.), args=(binsm,n))

loc, scale = 1.58463497, 5.93030013
plt.plot(binsm,n)
plt.plot(x, weib(x, loc, scale),
         label='weib matlab, loc=%1.3f, scale=%1.3f' % (loc, scale), lw=4.)
loc, scale = s.exponweib.fit_loc_scale(data, 1, 1)
plt.plot(x, weib(x, loc, scale),
         label='weib stack, loc=%1.3f, scale=%1.3f' % (loc, scale), lw=4.)
plt.plot(x, weib(x,*popt),
         label='weib leastsq, loc=%1.3f, scale=%1.3f' % tuple(popt), lw=4.)

plt.legend(loc='upper right')
plt.show()

My guess is that you want to estimate the shape parameter and the scale of the Weibull distribution while keeping the location fixed. Fixing loc assumes that the values of your data and of the distribution are positive with lower bound at zero.

floc=0 keeps the location fixed at zero, f0=1 keeps the first shape parameter of the exponential weibull fixed at one.

Click to copy

>>> stats.exponweib.fit(data, floc=0, f0=1)
[1, 1.8553346917584836, 0, 6.8820748596850905]
>>> stats.weibull_min.fit(data, floc=0)
[1.8553346917584836, 0, 6.8820748596850549]

The fit compared to the histogram looks ok, but not very good. The parameter estimates are a bit higher than the ones you mention are from R and matlab.

Update

The closest I can get to the plot that is now available is with unrestricted fit, but using starting values. The plot is still less peaked. Note values in fit that don't have an f in front are used as starting values.

Click to copy

>>> from scipy import stats
>>> import matplotlib.pyplot as plt
>>> plt.plot(data, stats.exponweib.pdf(data, *stats.exponweib.fit(data, 1, 1, scale=02, loc=0)))
>>> _ = plt.hist(data, bins=np.linspace(0, 16, 33), normed=True, alpha=0.5);
>>> plt.show()

exponweib fit

Fitting a Weibull distribution using Scipy

Tags:

Python

Numpy

Distribution

Scipy

Weibull

Related

Recent Posts