python divide by zero encountered in log - logistic regression

The proper solution here is to add a small epsilon to the argument of the log function. What worked for me was:

import numpy as np
from scipy.special import expit

epsilon = 1e-5

def cost(X, y, theta):
    m = X.shape[0]  # number of training examples (np.average already divides by m)
    yp = expit(X @ theta)  # predicted probabilities
    # the epsilon keeps both log arguments strictly positive
    cost = - np.average(y * np.log(yp + epsilon) + (1 - y) * np.log(1 - yp + epsilon))
    return cost
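
If you would rather not shift the probabilities, an alternative guard (not from the answer above, just a sketch) is to clip them away from 0 and 1 before taking the log:

import numpy as np
from scipy.special import expit

def cost_clipped(X, y, theta, eps=1e-15):
    # cost_clipped is a hypothetical name; clipping keeps log away from 0
    yp = np.clip(expit(X @ theta), eps, 1 - eps)
    return -np.average(y * np.log(yp) + (1 - y) * np.log(1 - yp))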

Cause:

This is happening because, in some cases, whenever y[i] is equal to 1, the value of the sigmoid function, expit(np.dot(X, theta)), for that example also becomes exactly 1 in floating point.

Cost function:

J = (np.dot(-(y.T),np.log(expit(np.dot(X,theta))))-np.dot((np.ones((m,1))-y).T,np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1))))) / m + (regTerm / (2 * m)) * np.linalg.norm(theta[1:])

Now, consider the following part in the above code snippet:

np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1)))

Here, you are computing 1 - expit(np.dot(X, theta)) when the sigmoid output is exactly 1. That effectively becomes log(1 - 1) = log(0), which is undefined; NumPy reports it as a divide-by-zero and returns -inf.
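
You can reproduce the exact warning with a saturated sigmoid. A minimal sketch (the score 40 is just an arbitrarily large value):

import numpy as np
from scipy.special import expit

p = expit(40.0)       # 1 / (1 + e**-40) rounds to exactly 1.0 in float64
print(p)              # 1.0
print(np.log(1 - p))  # RuntimeWarning: divide by zero encountered in log, prints -inf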


I'm guessing your data has negative values in it. You can't log a negative.

import numpy as np
np.log(2)
> 0.69314718055994529
np.log(-2)
> nan

There are a lot of different ways to transform your data that should help, if this is the case.
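
For example, a simple shift so everything is strictly positive before taking the log (just a sketch; X here stands for whatever raw values you want to log-transform):

import numpy as np

X = np.array([-3.0, -1.0, 0.0, 2.0, 5.0])

# shift so the smallest value maps to 1, then log-transform
X_shifted = X - X.min() + 1
X_log = np.log(X_shifted)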


You can clean up the formula by appropriately using broadcasting, the element-wise operator * combined with np.average (which takes care of the dot products and the division by m), and the operator @ for matrix multiplication, and by breaking it up as suggested in the comments.

Here is your cost function:

import numpy as np
from scipy.special import expit

def cost(X, y, theta, regTerm):
    m = X.shape[0]  # number of training examples (y.shape[0] works just as well)
    p = expit(X @ theta)  # predicted probabilities
    log_loss = -np.average(y*np.log(p) + (1-y)*np.log(1-p))
    J = log_loss + regTerm * np.linalg.norm(theta[1:]) / (2*m)
    return J
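
As a quick sanity check with made-up data (the shapes and regTerm value are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
X = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 3))])  # bias column + 3 features
y = rng.integers(0, 2, size=100)
theta = np.zeros(4)

print(cost(X, y, theta, regTerm=1.0))  # about 0.6931 (= log 2) for an all-zero theta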

You can clean up your gradient function along the same lines.
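
A sketch of what that might look like, assuming the same conventions (bias column in X, theta[0] left unregularized) and the squared-norm penalty discussed below:

def gradient(X, y, theta, regTerm):
    m = X.shape[0]
    p = expit(X @ theta)
    grad = X.T @ (p - y) / m             # gradient of the unregularized log loss
    grad[1:] += regTerm * theta[1:] / m  # L2 penalty term, skipping the bias
    return grad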

By the way, are you sure you want np.linalg.norm(theta[1:])? If you're trying to do L2 regularization, the term should be np.linalg.norm(theta[1:]) ** 2.
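
In other words, the last line of the cost function above would become something like:

J = log_loss + regTerm * (np.linalg.norm(theta[1:]) ** 2) / (2*m)
# equivalently: regTerm * np.sum(theta[1:] ** 2) / (2*m)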