Generate random numbers from lognormal distribution in python
You have the mode and the standard deviation of the log-normal distribution. To use the rvs()
method of scipy's lognorm
, you have to parameterize the distribution in terms of the shape parameter s
, which is the standard deviation sigma
of the underlying normal distribution, and the scale
, which is exp(mu)
, where mu
is the mean of the underlying distribution.
You pointed out that making this reparameterization requires solving a quartic polynomial. For that, we can use the numpy.poly1d
class. Instances of that class have a roots
attribute.
A little algebra shows that exp(sigma**2)
is the unique positive real root of the polynomial
x**4 - x**3 - (stddev/mode)**2 = 0
where stddev
and mode
are the given standard deviation and mode of the log-normal distribution, and for that solution, the scale
(i.e. exp(mu)
) is
scale = mode*x
Here's a function that converts the mode and standard deviation to the shape and scale:
def lognorm_params(mode, stddev):
"""
Given the mode and std. dev. of the log-normal distribution, this function
returns the shape and scale parameters for scipy's parameterization of the
distribution.
"""
p = np.poly1d([1, -1, 0, 0, -(stddev/mode)**2])
r = p.roots
sol = r[(r.imag == 0) & (r.real > 0)].real
shape = np.sqrt(np.log(sol))
scale = mode * sol
return shape, scale
For example,
In [155]: mode = 123
In [156]: stddev = 99
In [157]: sigma, scale = lognorm_params(mode, stddev)
Generate a sample using the computed parameters:
In [158]: from scipy.stats import lognorm
In [159]: sample = lognorm.rvs(sigma, 0, scale, size=1000000)
Here's the standard deviation of the sample:
In [160]: np.std(sample)
Out[160]: 99.12048952171304
And here's some matplotlib code to plot a histogram of the sample, with a vertical line drawn at the mode of the distribution from which the sample was drawn:
In [176]: tmp = plt.hist(sample, normed=True, bins=1000, alpha=0.6, color='c', ec='c')
In [177]: plt.xlim(0, 600)
Out[177]: (0, 600)
In [178]: plt.axvline(mode)
Out[178]: <matplotlib.lines.Line2D at 0x12c5a12e8>
The histogram:
If you want to generate the sample using numpy.random.lognormal()
instead of scipy.stats.lognorm.rvs()
, you can do this:
In [200]: sigma, scale = lognorm_params(mode, stddev)
In [201]: mu = np.log(scale)
In [202]: sample = np.random.lognormal(mu, sigma, size=1000000)
In [203]: np.std(sample)
Out[203]: 99.078297384090902
I haven't looked into how robust poly1d
's roots
algorithm is, so be sure to test for a wide range of possible input values. Alternatively, you can use a solver from scipy to solve the above polynomial for x
. You can bound the solution using:
max(sqrt(stddev/mode), 1) <= x <= sqrt(stddev/mode) + 1
The log-normal distribution is (confusingly) the result of applying the exponential function to a normal distribution. Wikipedia gives the relationship between the parameters as
where μ and σ are the mean and standard deviation of what you call the "underlying normal distribution", and m and v are the mean and variance of the log-normal distribution.
Now, what you say you have is the mode and standard deviation of the log-normal distribution. The variance v is just the square of the standard deviation. Getting from the mode to m is trickier: again quoting that Wikipedia article, if the mean is then the mode is . From this, and the above, we can deduce that
where n is the mode of the log-normal distribution and v, m are as above. This reduces to a quartic,
or
where u = m2. I suspect this is the same quartic you mentioned in your question. It can be solved, but like most quartics, the radical form of the solutions are a giant hairball. The most practical approach for your purposes might be to plug numeric values for n and v into the above and then use a numeric solver to find the positive root(s).
Sorry I can't be more help. This is really a math question, not a programming question; you might get more helpful answers on https://math.stackexchange.com/.