Closed
Description
Describe the workflow you want to enable
As a user of Scikit-learn, I want to be able to calculate the McFadden's pseudo R-squared for a binary logistic regression model for that we need log-likelihood and null log-likelihood.
Describe your proposed solution
I use the following functions and I propose to add them in the library as well.
For Log Likelihood -
import numpy as np
from sklearn.linear_model import LogisticRegression
def log_likelihood(model, X, y):
"""Calculate the log-likelihood of a binary logistic regression model.
Parameters
----------
model : sklearn.linear_model.LogisticRegression
A trained binary logistic regression model.
X : array-like, shape (n_samples, n_features)
Feature matrix.
y : array-like, shape (n_samples,)
Binary class labels.
Returns
-------
log_likelihood : float
The log-likelihood of the model.
"""
# Get predicted probabilities
pred_probs = model.predict_proba(X)[:, 1]
# Calculate log-likelihood
log_likelihood = np.sum(y * np.log(pred_probs) + (1 - y) * np.log(1 - pred_probs))
return log_likelihood
For Null Log Likelihood -
import numpy as np
from sklearn.metrics import log_loss
def null_log_likelihood(y):
"""Calculate the null log-likelihood of a binary logistic regression model.
Parameters
----------
y : array-like, shape (n_samples,)
Binary class labels.
Returns
-------
null_log_likelihood : float
The null log-likelihood of the model.
"""
# Calculate the proportion of positive class labels
p0 = y.mean()
# Create an array of predicted probabilities equal to the proportion of positive class labels
probs = p0 * np.ones_like(y)
# Calculate the null log-likelihood using the log_loss function from scikit-learn
null_log_likelihood = -log_loss(y, probs, normalize=False)
return null_log_likelihood
Describe alternatives you've considered, if relevant
One alternative to adding these functions to Scikit-learn would be for users to use statsmodels.
However, adding these functions to Scikit-learn would make it easier for users to calculate log-likelihood and null log-likelihood within the Scikit-learn ecosystem and would provide a standardized implementation.
Additional context
No response