8000 Add functions for calculating log-likelihood and null log-likelihood · Issue #25982 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content
Add functions for calculating log-likelihood and null log-likelihood #25982
Closed
@SinghAnkur28

Description

@SinghAnkur28

Describe the workflow you want to enable

As a user of Scikit-learn, I want to be able to calculate the McFadden's pseudo R-squared for a binary logistic regression model for that we need log-likelihood and null log-likelihood.

Describe your proposed solution

I use the following functions and I propose to add them in the library as well.

For Log Likelihood -

import numpy as np
from sklearn.linear_model import LogisticRegression

def log_likelihood(model, X, y):
    """Calculate the log-likelihood of a binary logistic regression model.

    Parameters
    ----------
    model : sklearn.linear_model.LogisticRegression
        A trained binary logistic regression model.
    X : array-like, shape (n_samples, n_features)
        Feature matrix.
    y : array-like, shape (n_samples,)
        Binary class labels.

    Returns
    -------
    log_likelihood : float
        The log-likelihood of the model.
    """

    # Get predicted probabilities
    pred_probs = model.predict_proba(X)[:, 1]

    # Calculate log-likelihood
    log_likelihood = np.sum(y * np.log(pred_probs) + (1 - y) * np.log(1 - pred_probs))

    return log_likelihood

For Null Log Likelihood -

import numpy as np
from sklearn.metrics import log_loss

def null_log_likelihood(y):
    """Calculate the null log-likelihood of a binary logistic regression model.

    Parameters
    ----------
    y : array-like, shape (n_samples,)
        Binary class labels.

    Returns
    -------
    null_log_likelihood : float
        The null log-likelihood of the model.
    """

    # Calculate the proportion of positive class labels
    p0 = y.mean()

    # Create an array of predicted probabilities equal to the proportion of positive class labels
    probs = p0 * np.ones_like(y)

    # Calculate the null log-likelihood using the log_loss function from scikit-learn
    null_log_likelihood = -log_loss(y, probs, normalize=False)

    return null_log_likelihood

Describe alternatives you've considered, if relevant

One alternative to adding these functions to Scikit-learn would be for users to use statsmodels.
However, adding these functions to Scikit-learn would make it easier for users to calculate log-likelihood and null log-likelihood within the Scikit-learn ecosystem and would provide a standardized implementation.

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      0