The Logit Model Measurement Problem
doi:10.1017/psa.2024.25
ARTICLE
(Received 24 October 2023; revised 11 April 2024; accepted 31 May 2024; first published online 05 December 2024)
Abstract
Traditional wisdom dictates that statistical model outputs are estimates, not measurements.
Despite this, statistical models are employed as measurement instruments in the social
sciences. In this article, I scrutinize the use of a specific model—the logit model—for
psychological measurement. Given the adoption of a criterion for measurement that I call
comparability, I show that the logit model fails to yield measurements due to properties that
follow from its fixed residual variance.
1. Introduction
Do statistical models yield measurements? On the face of it, clearly not. Consider an
example: suppose I need to measure the square footage of my apartment. To do this, I
decide to construct a statistical model predicting square footage given the number of
bathrooms. I gather an immense amount of data on the number of bathrooms and the
square footage of apartments. I have one bathroom, and my model estimates that my
apartment is 621 square feet. It seems wrong to claim to have measured the area of
my apartment. Say that, as it turns out, my apartment is in fact 621 square feet. Even
then, it seems wrong to say that I have measured the square footage of my apartment.
Statistical model outputs are usually described as estimates or predictions, not as
measurements.
Surprisingly, some researchers do treat certain statistical models as measurement instruments.
Social scientists like psychologists and economists study attributes that are not
observable: intelligence, well-being, market value, inflation, and so on. Lacking the
option of direct physical measurement, researchers create models to represent and
measure these hidden magnitudes. Using statistical models as measurement
instruments is a pragmatic response to the circumstances in these fields.
Despite this practical necessity, treating statistical models as measurement
instruments is counterintuitive. And so, in this article, I take up the issue of whether
statistical models are measurement instruments. Because statistical models are
varied in their properties, rather than considering statistical models in general,
I consider whether one particular statistical model that is important for psychological measurement, the logit model, yields measurements.
© The Author(s), 2024. Published by Cambridge University Press on behalf of the Philosophy of Science Association. This
is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://
creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided
the original article is properly cited.
2. Background
Statistical models are used to measure unobservable traits in psychology. Because
these traits are unobservable, the models that are used to measure them are called
latent variable models. A latent variable is one that is unobserved, meaning that no data
correspond to the variable. One such latent variable model that is important for
psychological measurement is logistic regression, also known as the logit model. In
this section, I explain what logistic regression is and how it can be understood as a
representation of an unobserved variable that is supposed to exist.
1
Francis Galton used linear regression to model the relationship between the heights of fathers and
sons. He found that very tall fathers would have sons with heights closer to the average, and ditto for
very short fathers. This is why linear regression is called “regression”: Galton saw that there is a
regression to the mean. This also explains the “sophomore slump” in sports.
Given some data, the coefficients are estimated to minimize the difference between the
model predictions and the observed values for the outcome variable. The value of the
coefficient characterizes the relationship between the outcome variable and the
predictor variable. For example, if a regression model predicts a man’s height given
the height of his father, the coefficient represents the change in a man’s height per
unit increase in the height of his father.
However, linear regression cannot model binary outcome variables like sex or
mortality because it is a continuous linear function. Logistic regression was developed
to address this limitation.
A historical dispute about measuring association among binary variables can provide
insight into this development.
In the twentieth century, statisticians George Udny Yule and Karl Pearson
proposed different ways to measure the degree of association between binary
variables. Yule, a former student of Pearson’s, proposed to measure association with
relative odds, while Pearson developed a measure that depends on positing an
underlying (latent) continuous variable to explain the observed binary data. The
differences in the measures of association revealed deep philosophical disagreements
between Yule and Pearson that I discuss in this section.
As Powers and Xie (2008) have observed, a parallel dichotomy emerged from the
Yule/Pearson debate for the logit model. According to one interpretation of the logit
model, the outcome is just a log-odds. This interpretation, called the transformational
approach, resembles Yule’s approach to measuring association. According to another,
the outcome variable represents an unobserved variable that explains the behavior of
the manifest binary variable. This interpretation, called the latent variable approach,
resembles Pearson’s approach to measuring association. The second interpretation,
the one that appeals to an unobserved variable, provides an initially plausible (but
ultimately inadequate) justification for the use of the logit model for measurement in
the social sciences.
Q = (OR - 1) / (OR + 1). (4)
Suppose we have a data set of 1,000 observations with two binary features: whether or
not one has a PhD in philosophy (philosophy PhD) and whether or not one believes in
the external world (external world belief).2 The data are organized in a table with four
cells, such that each combination of values is accounted for (see table 1).
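The two association measures can be made concrete in code. The sketch below is my own illustration: the cell counts are hypothetical placeholders (the article's table 1 values are not reproduced here), and Q is computed by the standard formula given in equation (4).

```python
# Hypothetical cell counts for a 2x2 table of the kind described
# (placeholder numbers, not the article's table 1):
#
#                      believes    does not believe
#   philosophy PhD        40              60
#   no philosophy PhD    510             390
a, b, c, d = 40, 60, 510, 390

odds_phd = a / b                         # odds of belief among PhDs
odds_no_phd = c / d                      # odds of belief among non-PhDs
odds_ratio = odds_phd / odds_no_phd      # Yule's relative odds
Q = (odds_ratio - 1) / (odds_ratio + 1)  # Yule's Q, equation (4)
```

Q ranges from -1 (perfect negative association) to 1 (perfect positive association), with 0 indicating independence; with these placeholder counts it comes out negative.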
2
These data are made up.
3
A distributional assumption mathematically characterizes the data-generating process that gives
rise to the sample data. Making a distributional assumption will constrain the extent to which one’s
model is determined by the observed data, because one’s data will contribute to parameter estimates
within a particular mathematical form (distribution). For example, if I assume that my data are normal,
my model will look like a bell curve no matter what. The data will determine the width and height of the
bell curve.
explains the observed frequencies, probabilities, and odds. For example, the latent
variable could be recidivism, the tendency of a criminal to reoffend.
It is generally accepted that this approach to the logit model is a descendant of
Pearson’s measure of association based on an underlying continuous variable (Agresti
2002; Breen, Karlson, and Holm 2018; Cox and Snell 1989; Powers and Xie 2008). A
continuous underlying variable Z is posited. At a particular threshold of Z, the
observed binary outcome changes from 1 to 0. If, for example, the observed variable is
state of matter and the underlying variable is temperature, at a threshold of 32°F, the
observed variable changes because of the value of the underlying variable.
Mathematically, the latent variable approach to the logit model is expressed as
follows. Suppose that Z is a latent variable,
Z_i = β_0 + β_1 x_{i1} + ⋯ + β_j x_{ij} + ε_i, (9)

and

if Z_i ≥ 0, Y_i = 1, (10)
if Z_i < 0, Y_i = 0. (11)
The error term ε_i is assumed to follow a standard logistic distribution with a fixed
variance. Because the model is not considered a probability model in this case, it is
necessary to express the indeterminacy of the model in an error term. That the
variance of the error is fixed will play an important role in my subsequent argument
that the outcome of a logit model cannot be interpreted as a measurement of a latent
variable. From this distributional assumption, it follows that
logit P(Y_i = 1) = β_0 + β_1 x_{i1} + ⋯ + β_j x_{ij}. (12)
Note that unlike the transformational approach, the latent variable approach includes
an error term. The outcome variable in the latent variable approach is not considered
to be fundamentally probabilistic; rather, it is taken to represent the value of a
random variable that is unobserved. Thus the uncertainty is not captured in the
outcome variable of the model. To capture the uncertainty inherent in a regression
model, then, an error term is tacked on. The assumption that the error term follows a
standard logistic distribution defines the relationship between the logit and the
unobserved random variable.
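The equivalence between the threshold formulation and the logit form can be checked by simulation. The following sketch is my own illustration with arbitrary coefficient values: it draws latent values with standard logistic errors, applies the threshold rule, and compares the empirical frequency of Y = 1 against the logistic probability.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
b0, b1 = -0.5, 1.2        # arbitrary illustrative coefficients
x = 0.8                   # a fixed value of the predictor

eps = rng.logistic(size=n)       # standard logistic error term
z = b0 + b1 * x + eps            # latent variable Z
y = (z >= 0).astype(float)       # threshold rule: Y = 1 iff Z >= 0

p_empirical = y.mean()                          # observed frequency of Y = 1
p_logit = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))  # logistic probability of Y = 1
# p_empirical and p_logit agree up to simulation noise
```

The agreement holds because the logistic distribution is symmetric: P(Z ≥ 0) equals the logistic function evaluated at the linear predictor.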
The central question of this article is, do statistical models yield measurements? In
the social sciences, statistical models are sometimes treated as measurement
instruments, meaning that the output of these models are measurements. The latent
variable approach to logistic regression is central to the use of statistical models as
measurement instruments in the social sciences. According to the latent variable
approach, underlying attributes can be represented by statistical models of observed
binary data. In fields that aim to study unobservables like psychological traits or
economic value, this interpretation of the logit model suggests a way to access the
attributes being studied (Everitt 1984; Reise, Ainsworth, and Haviland 2005).
Discrete choice theory in economics, for example, makes use of the latent variable
interpretation of the logit model to measure attributes of individuals that compel
them to make particular choices. Daniel McFadden (1986, 275), the founder of discrete
challenge to their respective position. Finally, I show that the logit model violates
comparability.
divided, whereas an interval scale cannot be. A ratio scale is a scale with a true zero,
such as length. The zero point on the scale indicates an actual absence of length. I can
compare length multiplicatively—it is meaningful to claim that the distance from my
house to the university is twice the distance from my house to the grocery store. This
holds true regardless of whether I measure the distance in meters or yards. Interval
scales have an arbitrary zero, such as temperature. Temperatures cannot be
compared multiplicatively—it is not meaningful to claim that it is twice as hot today
as yesterday, because this relation is not preserved across scales with different zero
points (e.g., Fahrenheit vs. Celsius). Ordinal scales indicate order but violate
additivity. Ratio and interval scales distribute numerals evenly, such that there is the
same distance between 1 and 2 and 3 and 4. Ordinal scales make no such guarantee as
to the distance between each numeral, and so additivity is violated.
The representation theorem preserves empirical relations present in the target
system: suppose the representation theorem maps the empirical relations of shorter
than (≺) and longer than (≻) to the numerical relations < and >; then, the
representation theorem must be such that if a ≺ b, then R is a representation theorem
only if R : a ≺ b ↦ A < B. The empirical relations preserved over and above mere
ordering (like additivity) determine to which type of scale the empirical relations are
mapped.
Comparability is implied when empirical relations are mapped to any scale that
preserves order: ordinal, interval, or ratio.7 Note that comparability is not a formal
property of a scale—instead, it is a property of measurement outcomes that is
guaranteed by the successful mapping of empirical relations to an ordinal, interval, or
ratio scale.
Recall the nonmeasurement procedures described earlier—the “nearest physical
object” method of measuring height, for example. What goes wrong in this case is that
the “representation theorem” employed fails to preserve any empirical relations in
the mapping to numerical relations. Because two individuals who use this method will
have different units of measure (depending on the physical object nearest to them),
it is not guaranteed that, given a ≺ b, R : a ≺ b ↦ A < B.
What it is to measure is to map empirical relations onto an ordinal, interval, or
ratio scale. A measurement procedure performs that mapping while preserving the
empirical relations present in the quantity of interest. If what it is to measure is to
map empirical relations onto an ordinal, interval, or ratio scale while preserving the
empirical relations that are present in the quantity of interest, then comparability is
entailed by measurement.8 A lack of comparability in measurement outcomes, then,
indicates that measurement has not taken place.
7
Nominal scales do not preserve order. Although Stevens (1946) includes nominal scales in his theory
of measurement, nominal scales are not usually considered a type of measurement (Luce and Suppes
2002, 14). My comparability criterion rules out nominal scales for measurement, but this is
uncontroversial.
8
Recall that comparability is the preservation of empirical relations between quantities in the
numerical outcome of measurement.
tell us about the measured object in the world, but always with respect to our
background theories.
Like objectivity, comparability can be model relativized in the following way:
model-relative comparability is the preservation of empirical relations between
quantities in the numerical outcome of measurement given the relevant background
theories. So, even when measurements cannot be taken independently of theoretical
assumptions, measurements can be deemed comparable given the theoretical
assumptions that ground the possibility of measurement.
given years of education explains some of the population variation in salary by years
of education; that is, if the population were to be stratified by years of education,
there would be less variation in each stratum than in the population as a whole.
Because regression models are indeterminate, not all of the variation in the outcome
variable is explained by the predictor variables.
The variation in the outcome variable that is left unexplained by a model is called
the unexplained variance. Adding more predictor variables to the model will increase
the amount of explained variance and decrease the unexplained variance. The
unexplained variance is captured in the model by the error term ɛ. The error term
covers the difference between the outcome predicted by the model and the true
outcome for each observation. The error term has its own distribution that
characterizes how the model errors are distributed. Ideally, it should have a
distribution with a mean of 0 (indicating that the model predictions are not
systematically high or low) and a low variance (indicating that the model makes
accurate predictions).
model, regardless of whether this new predictor variable is correlated with the
predictor variables already in the model (Mood 2010). This is highly unintuitive.
To appreciate how surprising this property is, consider a linear regression model
that predicts student performance on an exam by hours of studying. A coefficient is
estimated and interpreted to mean that, on average, for every additional hour of
studying, there is a 5-point increase in exam score. Now, imagine that we add a
predictor to our model: hours of sleep the night before the exam. If this new predictor
is uncorrelated with hours of studying, we don’t expect the relationship between
hours of studying and exam score to change, because the new predictor doesn’t
“steal” any of the explained variance from our old predictor. If, on the other hand, the
new predictor is correlated with hours of studying, then we expect the strength of the
relationship between hours of studying and exam score to decrease. Equivalently, the
coefficient estimate will decrease. Some of the variance that was explained by the old
predictor in the old model is taken by the new predictor in the new model. For linear
regression, any change in the coefficient estimates upon the addition of a new
variable to the model can be interpreted as reflecting correlation between the
predictor variables.
The logit model has a fixed error variance. Although adding new uncorrelated
predictors to the model will increase the amount of explained variance, this increase
cannot be compensated for by a smaller error variance, which must remain π²/3.
Instead, all model terms are rescaled, meaning that the coefficients for predictor
variables already in the model are altered.
To understand the effect, consider the following abstracted example. Imagine that
we are interested in modeling a variable Y using a predictor variable X1 and that the
total amount of variance in our data of Y is 10. First, suppose our model is a linear
regression. The regression splits up the variance in Y such that 5 is explained by the
predictor variable X1 and 5 remains unexplained, meaning the variance of the errors
is 5.
Now, suppose another predictor, X2 , is added to the model. If X2 is uncorrelated
with X1 , then the variance of Y could be split in the following way: X1 still explains 5,
X2 explains 3, and 2 remains unexplained, meaning that the variance of the errors in
this model is 2. If X2 is correlated with X1 , then the variance of Y could be split in the
following way: X1 now explains 4, X2 explains 4, and 2 remains unexplained, meaning
that the variance of the errors in this model is 2. Observe that when the predictor
variables were correlated in the abstracted case, the second predictor variable “stole”
explained variance from the first predictor variable.
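The linear half of this contrast is easy to verify numerically. The sketch below is my own illustration with arbitrary effect sizes: adding an uncorrelated predictor to an ordinary least-squares fit leaves the coefficient on the first predictor essentially unchanged.

```python
import numpy as np

def ols(X, y):
    """Ordinary least-squares fit with an intercept column."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(0)
n = 100_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                        # drawn independently of x1
y = 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)   # true effects: 2.0 and 1.5

b_reduced = ols(x1[:, None], y)                # model with x1 only
b_full = ols(np.column_stack([x1, x2]), y)     # model with x1 and x2
# the coefficient on x1 is ~2.0 in both fits: the uncorrelated
# predictor x2 does not "steal" explained variance from x1
```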
Now, let’s imagine that our model is a logistic regression with a fixed error
variance. Whatever the actual amount of unexplained variance is, it is called π²/3 (or
approximately 3.29) in our model. Now, imagine that an uncorrelated predictor
variable is added to the model. Although this decreases the amount of unexplained
variance, it is still called π²/3 in our model. That means that the unit has changed,
because a different amount of variance is labeled with the same numerical value. A
new unit is fixed for this model whenever the amount of unexplained variance
changes. The coefficients for the model are then estimated using this different unit,
meaning that although the actual amount of variance explained by a predictor is
unchanged, the coefficient estimate will change. This is why, even when new
predictors are added to the model that are uncorrelated with predictors already in the
model, the coefficient estimates will change.
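This rescaling can be demonstrated by simulation. The sketch below is my own illustration (arbitrary effect sizes; the logit fit is a plain Newton-Raphson maximum-likelihood routine): data are generated from the latent-variable process with two independent predictors, and the coefficient on the first predictor shrinks when the second is omitted, even though the two are uncorrelated.

```python
import numpy as np

def fit_logit(X, y, iters=30):
    """Maximum-likelihood logit fit via Newton-Raphson."""
    X = np.column_stack([np.ones(len(y)), X])   # add intercept column
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))        # fitted probabilities
        grad = X.T @ (y - p)                    # score vector
        H = X.T @ (X * (p * (1 - p))[:, None])  # information matrix
        b += np.linalg.solve(H, grad)
    return b

rng = np.random.default_rng(0)
n = 200_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)            # independent of x1
eps = rng.logistic(size=n)         # standard logistic error, variance pi^2/3
z = 1.0 * x1 + 1.0 * x2 + eps      # latent variable with true effects 1.0, 1.0
y = (z >= 0).astype(float)

b_full = fit_logit(np.column_stack([x1, x2]), y)
b_reduced = fit_logit(x1[:, None], y)
# b_full[1] recovers ~1.0, but b_reduced[1] shrinks to roughly
# 1.0 * sqrt(3.29 / (3.29 + 1.0)) ~ 0.88: omitting the uncorrelated
# x2 changes the implicit unit, and with it the coefficient on x1.
```

In a linear regression on the same design, dropping x2 would leave the coefficient on x1 essentially unchanged; here the fixed error variance forces a rescaling instead.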
Coefficients cannot be treated as straightforward effects. They reflect not only the
effect of the predictor variable on the outcome variable but also some unknown scale
quantity. Whenever the amount of unexplained variance in the model changes, the
scale quantity changes too. Coefficients across models with different sets of predictor
variables cannot be compared, because the amount of unexplained variance is
different across models with different sets of predictors (Karlson, Holm, and Breen
2012). So, if two logit models are constructed to target the same latent trait (such as
well-being) with different predictor variables, the models will produce results given
in different units.
Similarly, outcomes of models with the same predictor variables on different
samples cannot be compared, because the variance of the outcome variable is
guaranteed to be different from sample to sample (Breen, Karlson, and Holm 2018). So,
if one logit model is used to target a latent trait on two different samples (perhaps two
individuals), the model will produce results that are given in different units. The
relation between the units is unknown, because the true amount of unexplained
variance is unknown. This is analogous to the “nearest physical object” method of
measuring height—the procedure fails to preserve the empirical relations in the
mapping to numerical relations. Logit models are single use: they provide information
about the relative importance of predictor variables only within a single model
applied to a single sample.
Previously, I suggested that comparability is a necessary condition of measurement.
If the empirical relations between quantities are not preserved in the numerical
outcome of a procedure (relative to relevant background theories), then it is not a
measurement. I have shown that outcomes of the logit model are given in different
units with an unknown relationship, and as such, the empirical relationship between
different outcomes is not preserved by the procedure. The logit model produces
incomparable outcomes and thus is not a measurement instrument.10
Statisticians have periodically highlighted the properties of logistic regression that
I have described, although the consequences of these properties for the use of logistic
regression as a measurement instrument in the social sciences have not been
considered (Gail, Wieand, and Piantadosi 1984). Mood wrote in 2010 that “these
insights have not penetrated our research practice and the inter-relations between
the problems have not been fully appreciated. Moreover, these problems are ignored
or even misreported in commonly used methodology books” (68). Nearly ten years
later, in 2018, Breen, Karlson, and Holm made the same observation: “empirical
10
It could be replied that although the coefficients are scaled by an unknown factor, the values of the
coefficients within a single model can still be used to assess the relative importance of coefficients in the
model in predicting the outcome.
It is correct that the logit model coefficient estimates could impose an ordering of importance over the
predictor variables included in the model. This would constitute an ordinal scale, but not an ordinal scale
over the attribute of interest. In using logistic regression as a measure of a latent variable, the
practitioner is interested in creating a scale of the latent variable, not over the different predictor
variables included in the model.
The following reply could also be made: a priori consideration of the logit model form is unnecessary
given methods to establish validity and reliability of psychometric scales. I leave this for future work.
5. Conclusion
Do statistical models yield measurements? I’ve argued in this article that the logit
model, a particular statistical model, does not yield measurements of latent traits
despite how it is used in the social sciences.11 Comparability, I argued, is a constraint
on what counts as a measurement instrument that is not satisfied by the logit model.
This result depends on the particular properties of the logit model and thus doesn’t
generalize to all statistical models.
In this article, I hope to have demonstrated the following metaphilosophical
principle: statistical models should be evaluated individually to determine the
meaning of their outputs. Social scientists have developed applied methods to cope
with the circumstances in their fields, and in doing so, they have neglected the task of
establishing that the instruments in use preserve the empirical relations of interest in
the mapping to numerical relations.
Acknowledgements. Thanks to Sinan Dogramaci, Sahotra Sarkar, Christian Hennig, Jim Hankinson,
and the organizers and participants of the 2023 Measuring the Human Workshop for comments and
discussion. Thanks especially to the reviewers for their excellent suggestions that improved the article.
References
Agresti, A. 2002. Categorical Data Analysis (2nd ed.). Hoboken, NJ: Wiley-Interscience. https://doi.org/10.
1002/0471249688.
Anderson, C. J. 2009. “Categorical Data Analysis with a Psychometric Twist.” In The SAGE Handbook of
Quantitative Methods in Psychology, edited by R. Millsap and A. Maydeu-Olivares, 311–36. Thousand Oaks,
CA: SAGE. https://doi.org/10.4135/9780857020994.n14.
Borgstede, M., and F. Eggert. 2023. “Squaring the Circle: From Latent Variables to Theory-Based
Measurement.” Theory and Psychology 33 (1):118–37. https://doi.org/10.1177/09593543221127985.
Borsboom, D., G. J. Mellenbergh, and J. Van Heerden. 2003. “The Theoretical Status of Latent Variables.”
Psychological Review 110 (2):203–19. https://doi.org/10.1037/0033-295X.110.2.203.
Breen, R., K. B. Karlson, and A. Holm. 2018. “Interpreting and Understanding Logits, Probits, and Other
Nonlinear Probability Models.” Annual Review of Sociology 44 (1):39–54. https://doi.org/10.1146/
annurev-soc-073117-041429.
Chang, H. 2004. Inventing Temperature: Measurement and Scientific Progress. Oxford: Oxford University Press.
https://doi.org/10.1093/0195171276.001.0001.
Cox, D. R., and E. J. Snell. 1989. Analysis of Binary Data (2nd ed.). New York: Chapman and Hall.
Everitt, B. S. 1984. An Introduction to Latent Variable Models. Dordrecht, Netherlands: Springer. https://doi.
org/10.1007/978-94-009-5564-6.
11
An anonymous reviewer pointed out to me that it may be possible to conceive of the logit model as
yielding measurements of a probability, even if it is not yielding measurements of a latent variable. I will
flag here that it may be possible to argue that the logit model measures epistemic probability, where
epistemic probability is fundamentally conditional on the available evidence and thus comparability
across models with different evidence is not required. Lacking the space to develop this idea here, I will
leave such a possibility for future work.
Gail, M. H., S. Wieand, and S. Piantadosi. 1984. “Biased Estimates of Treatment Effect in Randomized
Experiments with Nonlinear Regressions and Omitted Covariates.” Biometrika 71 (3):431–44. https://
doi.org/10.2307/2336553.
Heilmann, C. 2015. “A New Interpretation of the Representational Theory of Measurement.” Philosophy of
Science 82 (5):787–97. https://doi.org/10.1086/683280.
Hood, S. B. 2013. “Psychological Measurement and Methodological Realism.” Erkenntnis 78 (4):739–61.
https://doi.org/10.1007/s10670-013-9502-z.
Karlson, K. B., A. Holm, and R. Breen. 2012. “Comparing Regression Coefficients Between Same-Sample
Nested Models Using Logit and Probit: A New Method.” Sociological Methodology 42 (1):286–313. https://
doi.org/10.1177/0081175012444861.
Krantz, David, Patrick Suppes, and Robert D. Luce. 1989. Foundations of Measurement: Geometrical, Threshold,
and Probabilistic Representations. New York: Academic Press.
Luce, R. D., D. H. Krantz, Patrick Suppes, and A. Tversky. 1990. Foundations of Measurement: Representation,
Axiomatization, and Invariance. New York: Academic Press. https://doi.org/10.1016/B978-0-12-425403-9.
50010-2.
Luce, Robert D., and Patrick Suppes. 2002. “Representational Measurement Theory.” In Stevens’ Handbook
of Experimental Psychology: Methodology in Experimental Psychology, vol. 4, 3rd ed., edited by D. Luce and R.
Bush, 1–41. Hoboken, NJ: John Wiley. https://doi.org/10.1002/0471214426.pas0401.
McFadden, D. 1986. “The Choice Theory Approach to Market Research.” Marketing Science 5 (4):275–97.
https://doi.org/10.1287/mksc.5.4.275.
Michell, J. 1993. “The Origins of the Representational Theory of Measurement: Helmholtz, Hölder, and
Russell.” Studies in History and Philosophy of Science, Part A 24 (2):185–206. https://doi.org/10.1016/0039-
3681(93)90045-L.
Michell, J. 1997. “Quantitative Science and the Definition of Measurement in Psychology.” British Journal of
Psychology 88 (3):355–83. https://doi.org/10.1111/j.2044-8295.1997.tb02641.x.
Michell, J. 1999. Measurement in Psychology: A Critical History of a Methodological Concept. Cambridge:
Cambridge University Press. https://doi.org/10.1017/CBO9780511490040.
Michell, J. 2005. “The Logic of Measurement: A Realist Overview.” Measurement 38 (4):285–94. https://doi.
org/10.1016/j.measurement.2005.09.004.
Michell, J. 2021. “Representational Measurement Theory: Is Its Number Up?” Theory and Psychology
31 (1):3–23. https://doi.org/10.1177/0959354320930817.
Mood, C. 2010. “Logistic Regression: Why We Cannot Do What We Think We Can Do, and What We Can Do
about It.” European Sociological Review 26 (1):67–82. https://doi.org/10.1093/esr/jcp006.
Pearson, K. 1904. On the Theory of Contingency and Its Relation to Association and Normal Correlation. Biometric
Series. London: Dulau.
Pearson, K., and D. Heron. 1913. “On Theories of Association.” Biometrika 9 (1–2):159–315.
Powers, D., and Y. Xie. 2008. Statistical Methods for Categorical Data Analysis. New York: Emerald. https://doi.
org/10.1111/j.1751-5823.2010.00118_3.x.
Rasch, G. 1960. Studies in Mathematical Psychology: I. Probabilistic Models for Some Intelligence and Attainment
Tests. London: Nielsen and Lydiche.
Reise, S., A. Ainsworth, and M. Haviland. 2005. “Item Response Theory: Fundamentals, Applications, and
Promise in Psychological Research.” Current Directions in Psychological Science 14 (2):95–101. https://doi.
org/10.1111/j.0963-7214.2005.00342.x.
Stevens, S. S. 1946. “On the Theory of Scales of Measurement.” Science 103 (2684):677–80. https://doi.org/
10.1126/science.103.2684.677.
Suppes, P., and J. Zinnes. 1963. “Basic Measurement Theory.” In Handbook of Mathematical Psychology, vol.
1, edited by D. Luce and R. Bush, 1–76. Hoboken, NJ: John Wiley. https://doi.org/10.2307/2270274.
Tal, E. 2012. “The Epistemology of Measurement: A Model-Based Account.” PhD diss., University of
Toronto.
Tal, E. 2019. “Individuating Qualities.” Philosophical Studies 176 (4):853–78. https://doi.org/10.1111/phc3.
12089.
Tal, E. 2021. “Two Myths of Representational Measurement.” Perspectives on Science 29 (6):701–41. https://
doi.org/10.1162/posc_a_00391.
Thalos, M. 2023. “The Logic of Measurement: A Defense of Foundationalist Empiricism.” Episteme. https://
doi.org/10.1017/epi.2023.32.
van Fraassen, B. 2012. “Modeling and Measurement: The Criterion of Empirical Grounding.” Philosophy of
Science 79 (5):773–84. https://doi.org/10.1086/667847.
Yule, G. U. 1900. “On the Association of Attributes in Statistics, with Examples from the Material of the
Childhood Society.” Philosophical Transactions of the Royal Society of London, Series A 194:257–319. https://
doi.org/10.1098/rsta.1900.0019.
Yule, G. U. 1912. “On the Methods of Measuring Association between Two Attributes.” Journal of the Royal
Statistical Society 75 (6):579–652. https://doi.org/10.1111/j.2397-2335.1912.tb00463.x.
Cite this article: Fillmore-Patrick, Stella. 2025. “The Logit Model Measurement Problem.” Philosophy of
Science 92 (2):285–303. https://doi.org/10.1017/psa.2024.25