Add functions for calculating log-likelihood and null log-likelihood

Describe the workflow you want to enable

As a user of Scikit-learn, I want to be able to calculate the McFadden's pseudo R-squared for a binary logistic regression model for that we need log-likelihood and null log-likelihood.

Describe your proposed solution

I use the following functions and I propose to add them in the library as well.

For Log Likelihood -

import numpy as np
from sklearn.linear_model import LogisticRegression

def log_likelihood(model, X, y):
    """Calculate the log-likelihood of a binary logistic regression model.

    Parameters
    ----------
    model : sklearn.linear_model.LogisticRegression
        A trained binary logistic regression model.
    X : array-like, shape (n_samples, n_features)
        Feature matrix.
    y : array-like, shape (n_samples,)
        Binary class labels.

    Returns
    -------
    log_likelihood : float
        The log-likelihood of the model.
    """

    # Get predicted probabilities
    pred_probs = model.predict_proba(X)[:, 1]

    # Calculate log-likelihood
    log_likelihood = np.sum(y * np.log(pred_probs) + (1 - y) * np.log(1 - pred_probs))

    return log_likelihood

For Null Log Likelihood -

import numpy as np
from sklearn.metrics import log_loss

def null_log_likelihood(y):
    """Calculate the null log-likelihood of a binary logistic regression model.

    Parameters
    ----------
    y : array-like, shape (n_samples,)
        Binary class labels.

    Returns
    -------
    null_log_likelihood : float
        The null log-likelihood of the model.
    """

    # Calculate the proportion of positive class labels
    p0 = y.mean()

    # Create an array of predicted probabilities equal to the proportion of positive class labels
    probs = p0 * np.ones_like(y)

    # Calculate the null log-likelihood using the log_loss function from scikit-learn
    null_log_likelihood = -log_loss(y, probs, normalize=False)

    return null_log_likelihood

Describe alternatives you've considered, if relevant

One alternative to adding these functions to Scikit-learn would be for users to use statsmodels.
However, adding these functions to Scikit-learn would make it easier for users to calculate log-likelihood and null log-likelihood within the Scikit-learn ecosystem and would provide a standardized implementation.

Additional context

No response

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Describe the workflow you want to enable

Describe your proposed solution

For Log Likelihood -

For Null Log Likelihood -

Describe alternatives you've considered, if relevant

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Description

Describe the workflow you want to enable

Describe your proposed solution

For Log Likelihood -

For Null Log Likelihood -

Describe alternatives you've considered, if relevant

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions