
Journal of Clinical Epidemiology 174 (2024) 111481

ORIGINAL RESEARCH

How to develop, validate, and update clinical prediction models using multinomial logistic regression

Celina K. Gehringer a,b,*, Glen P. Martin c, Ben Van Calster d,e, Kimme L. Hyrich a,f, Suzanne M.M. Verstappen a,f, Jamie C. Sergeant a,b

a Centre for Epidemiology Versus Arthritis, Centre for Musculoskeletal Research, Division of Musculoskeletal and Dermatological Sciences, University of Manchester, Manchester, UK
b Centre for Biostatistics, Manchester Academic Health Science Centre, University of Manchester, Manchester, UK
c Division of Informatics, Imaging and Data Sciences, Centre for Health Informatics, University of Manchester, Manchester, UK
d Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, The Netherlands
e Department of Development & Regeneration, KU Leuven, Leuven, Belgium
f NIHR Manchester Biomedical Research Centre, Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, UK

Accepted 19 July 2024; Published online 25 July 2024
https://doi.org/10.1016/j.jclinepi.2024.111481

Abstract

Objectives: Multicategory prediction models (MPMs) can be used in health care when the primary outcome of interest has more than two categories. The application of MPMs is scarce, possibly due to added methodological complexities compared to binary outcome models. We provide a guide of how to develop, validate, and update clinical prediction models based on multinomial logistic regression.

Study Design and Setting: We present guidance and recommendations based on recent methodological literature, illustrated by a previously developed and validated MPM for treatment outcomes in rheumatoid arthritis. Prediction models using multinomial logistic regression can be developed for nominal outcomes, but also for ordinal outcomes. This article is intended to supplement existing general guidance on prediction model research.

Results: This guide is split into three parts: 1) outcome definition and variable selection, 2) model development, and 3) model evaluation (including performance assessment, internal and external validation, and model recalibration). We outline how to evaluate and interpret the predictive performance of MPMs. R code is provided.

Conclusion: We recommend the application of MPMs in clinical settings where the prediction of a multicategory outcome is of interest. Future methodological research could focus on MPM-specific considerations for variable selection and sample size criteria for external validation. © 2024 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Keywords: Clinical prediction model; Prognosis; Multinomial logistic regression; Calibration; Sample size; Validation; Multicategory; Prediction

1. Introduction

Clinical prediction models (CPMs) use patient characteristics to estimate an individual's risk of having (diagnostic models) or developing (prognostic models) an outcome of interest at a particular time point [1]. Across health care, there are many instances where the outcome of interest is polytomous, meaning that it has more than two mutually exclusive categories, which can be either ordinal (ordered) or nominal (unordered). For example, it is of interest to diagnose ovarian tumors as benign, borderline, stage I invasive, advanced stage invasive, or secondary metastatic, rather than as simply benign vs malignant [2], which acknowledges the heterogeneity between malignant tumors [3,4]. In the machine learning literature, such outcomes are referred to as multiclass or multicategory [5-11]. There are a range of methods that could be used to model polytomous outcomes [12], with multinomial logistic regression (MLR) being a common statistical approach for developing CPMs [12-14]. Multicategory prediction models (MPMs) refer to any risk prediction model for polytomous outcomes, and this article focuses exclusively on guidance for MPMs developed using MLR. MLR is typically used for nominal outcomes, but has also been recommended for ordinal outcomes when the aim is to develop a prediction model [12]. Note that only statistical modeling with MLR is considered in this paper, not machine learning. An overview of alternative methods for modeling polytomous outcomes is provided in articles by Edlinger et al [12] (ordinal outcomes), Pate et al [15] (nominal outcomes), Van Calster et al [9], and Kruppa et al [16,17] (machine learning).

MPMs have been applied to several clinical settings [2,18-20]; however, the adoption of this approach is still relatively scarce [13]. While there exist several guidance documents to advise researchers on how to develop, validate, and update CPMs for continuous, binary, and time-to-event outcomes [21-28], there is a lack of such resources for MPMs. This may have contributed to researchers preferring to dichotomize their outcome and overlooking alternative modeling strategies that allow modeling of the nominal outcome directly. Indeed, there has been a range of emerging methodological research around MPMs, including sample size considerations [29,30], the assessment of discrimination [31] and calibration [32], and guidance on validation and updating [33]. However, it can be challenging to navigate this methodological literature. We are aware of one paper that provides guidance regarding measures for discrimination performance [34]. Therefore, the aim of this article is to provide more extended and up-to-date guidance for statisticians, epidemiologists, and their collaborators on how to develop, externally validate, and update MPMs.

What is new?

Key findings
- This study provides a how-to guide for developing, validating, and updating clinical prediction models using multinomial logistic regression, using the prediction of methotrexate treatment outcomes in terms of disease activity and discontinuation due to adverse events in patients with rheumatoid arthritis as a case study.

What this adds to what is known?
- Although multinomial logistic regression models are recommended for predicting polytomous outcomes, their application is relatively scarce. We hope that the practical guidance presented in this article helps other researchers use this approach.

What is the implication and what should change now?
- Multinomial logistic regression models should be considered more often in prognosis and diagnosis research.
- Existing prediction model methodologies, such as a sample size calculation for external validation and fractional polynomials, could be extended to suit multinomial prediction models.

Funding: This work was funded by Versus Arthritis (grant number 21755). Kimme L. Hyrich is supported by the National Institute for Health and Care Research (NIHR) Manchester Biomedical Research Centre.

* Corresponding author. Centre for Epidemiology Versus Arthritis, University of Manchester, Stopford Building, Oxford Road, M13 9PG, Manchester, UK. E-mail address: celina.gehringer@postgrad.manchester.ac.uk (C.K. Gehringer).

2. Clinical example

In a previous publication [35], we reported the development and validation of an MPM for methotrexate (MTX) treatment outcomes in rheumatoid arthritis (RA). We now use this example to illustrate concepts in the current paper. RA is a heterogeneous autoimmune condition that causes swelling and stiffness in the joints of the hands and feet [36]. Pharmacological management of RA should be commenced as soon as diagnosis is made, with MTX being the recommended first-line therapy [37,38]. However, response to MTX is not universal, with one study reporting that 43% of patients do not respond to treatment by 6 months [39]. Discontinuation of treatment due to adverse events (AEs, ie, gastrointestinal events [40]) is also common, occurring in 20%-30% of patients in the first year of taking MTX [41-43]. Identification of patients that are at high risk of nonresponse or discontinuation due to AEs is therefore of clinical importance so that disease control can be expedited using alternative treatments. We previously developed an MPM [35] to estimate an individual's risk of 1) not achieving the defined state of low disease activity (LDA) [44] at 6 months, or 2) achieving LDA at 6 months, or 3) discontinuing due to AEs within 6 months of commencing MTX. The predictor variables included in this model were age, sex, rheumatoid factor, 28-joint disease activity score, and the Health Assessment Questionnaire Disability Index. This complementary paper aims to provide a broader view of translating the methodological advances surrounding MPMs into practical guidance for applied researchers. To help maximize the impact of this guidance, we encourage readers to also consult the previous paper [35] with details of the methodological approaches used in the RA case study.
3. How to develop, validate, and update a clinical prediction model using MLR

This section will provide a guide of how to develop, validate, and update MPMs, with computational guidance provided in Section 3.4. Note that the development and validation of any CPM, multinomial or otherwise, should follow best practice guidelines [45] and be reported using the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) checklist [45,46]. In this article, we focus solely on the additional considerations for MPMs based on MLR. All analyses were carried out in R [47] (version 4.1.2) and code for the clinical example is provided on GitHub [48].

3.1. Outcome definition and variable selection

3.1.1. Outcome definition
Before model development, the outcome of an MPM should be carefully defined. A key consideration is whether knowing the risks of each different outcome category would help inform different clinical management options; if so, then the outcome categories should be kept separate and modeled directly. For example, in the diagnostic assessment of ovarian tumors as benign, borderline, stage I invasive, advanced stage invasive, or secondary metastatic [2], different tumor subtypes are often managed differently. In a prognostic setting, a multinomial outcome could also incorporate potential competing risks to the primary outcome, if survival competing risks methods [49,50] are not feasible or necessary. This was the case in our RA example, where continuous-time competing risks approaches were inappropriate in what was essentially a discrete-time setting.

In an MPM, a reference outcome category will need to be defined, and the choice of this reference category depends on the clinical area. The choice of reference category is important for the interpretation of the regression coefficients, but does not affect the predicted probabilities of the model [51]. We recommend that patients and clinicians are included in the process to define the reference category.

3.1.2. Candidate predictor variables
Candidate predictor variables (ie, those for consideration in the model before any data-driven variable selection) can be chosen based on predictive relationships reported in the literature, clinical expertise, availability in clinical care, measurement heterogeneity [52], and causality (especially for prognostic outcomes) [53]. These should be considered at the protocol stage and before the study has begun. When using an existing dataset, the number of candidate predictors included during model development should be informed by the available sample size, relative to the minimum required sample size [29] (more in Section 3.2.2). It may be the case that developing a model is not feasible given the sample size, as it does not allow inclusion of known important predictors. To produce a parsimonious model, data-driven methods can be implemented, for which previously outlined principles [54,55] would also apply for MPMs. These include selecting variables based on significance level (ie, P value), information criteria (ie, Akaike Information Criterion, Bayesian Information Criterion), and penalized likelihood (ie, LASSO) (more in Section 3.2.4). Selecting variables for an MPM in a data-driven way can be considered for the model overall or, if the outcome categories are known to have different predictors, separately for each of the dichotomous submodels (or outcome pairs, Section 3.2.1) [2]. The final set of variables for the MPM would then consist of the variables that were selected in any of the submodels [2]. We note that, in general, data-driven variable selection should be avoided where possible, and that it is currently difficult to provide specific guidance as variable selection for MPMs is relatively under-researched.
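As a brief illustration of penalized selection for the model overall, the following is a minimal R sketch, assuming a hypothetical data frame dat with the five predictors of our example and a three-level outcome factor; the grouped multinomial penalty in glmnet (see Box 3) keeps or drops each variable across all submodels simultaneously. This is a generic sketch, not the selection procedure used in the case study.

    library(glmnet)

    # Predictor matrix (drop the intercept column) and outcome factor
    X <- model.matrix(~ age + sex + rf + das28 + haq, data = dat)[, -1]
    y <- dat$outcome  # factor with levels: no_LDA (reference), LDA, AEs

    # Cross-validated LASSO; "grouped" ties each variable's coefficients
    # together across submodels, so a variable is selected for the model overall
    cv_fit <- cv.glmnet(X, y, family = "multinomial",
                        type.multinomial = "grouped")

    coef(cv_fit, s = "lambda.1se")  # coefficients per outcome category;
                                    # variables shrunk exactly to zero are dropped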
3.2. Model development

3.2.1. Multinomial prediction model equation
Unlike a binary logistic regression model, there are multiple submodels (or outcome pairs) within an MPM developed using MLR. For an outcome with k categories, the MPM has k − 1 submodels, which compare each outcome category (Y = 2, Y = 3, …, Y = k) to the chosen reference category (typically Y = 1). Box 1 presents the general equation for calculating the linear predictors (LPs) for a three-category outcome, as well as applying this to the RA example, where five predictor variables were included.

Box 1. How to calculate LPs and predicted probabilities of a multinomial prediction model (using our multinomial model with 3 outcomes and 5 covariates as an illustrative example). LP, linear predictor; LDA, low disease activity; AEs, adverse events; DAS28, disease activity score based on 28 joints; RF, rheumatoid factor; HAQ, Health Assessment Questionnaire.

Each submodel has its own LP, with a model intercept $\beta_{0,i}$, predictor variables $x_1, \ldots, x_p$ and corresponding regression coefficients $\beta_{1,\ldots,p}$:

$$\mathrm{LP}_1 = \log\left(\frac{P(Y=2)}{P(Y=1)}\right) = \beta_{0,1} + \beta_{1,1}x_1 + \beta_{2,1}x_2 + \cdots + \beta_{p,1}x_p$$

$$\mathrm{LP}_2 = \log\left(\frac{P(Y=3)}{P(Y=1)}\right) = \beta_{0,2} + \beta_{1,2}x_1 + \beta_{2,2}x_2 + \cdots + \beta_{p,2}x_p$$

The LPs can then be used to calculate predicted probabilities for each outcome category:

$$P(Y=1) = \frac{1}{1 + \exp(\mathrm{LP}_1) + \exp(\mathrm{LP}_2)}$$

$$P(Y=2) = \frac{\exp(\mathrm{LP}_1)}{1 + \exp(\mathrm{LP}_1) + \exp(\mathrm{LP}_2)}$$

$$P(Y=3) = \frac{\exp(\mathrm{LP}_2)}{1 + \exp(\mathrm{LP}_1) + \exp(\mathrm{LP}_2)}$$

In our clinical example, where no LDA is the reference category, the model is written as:

$$\mathrm{LP}_1 = \log\left(\frac{P(\mathrm{LDA})}{P(\mathrm{no\ LDA})}\right) = \beta_{0,1}\, + \beta_{1,1}\,\mathrm{Age} + \beta_{2,1}\,\mathrm{DAS28} + \beta_{3,1}\,\mathrm{RF} + \beta_{4,1}\,\mathrm{sex} + \beta_{5,1}\,\mathrm{HAQ}$$

$$\mathrm{LP}_2 = \log\left(\frac{P(\mathrm{AEs})}{P(\mathrm{no\ LDA})}\right) = \beta_{0,2}\, + \beta_{1,2}\,\mathrm{Age} + \beta_{2,2}\,\mathrm{DAS28} + \beta_{3,2}\,\mathrm{RF} + \beta_{4,2}\,\mathrm{sex} + \beta_{5,2}\,\mathrm{HAQ}$$
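To make the arithmetic in Box 1 concrete, here is a small R sketch with invented coefficient values; the numbers below are purely illustrative and are not the published RA model.

    # Illustrative coefficients: intercept, age, DAS28, RF, sex, HAQ
    b1 <- c(-0.50, 0.01, -0.20, 0.30, 0.10, -0.25)  # submodel 1: LDA vs no LDA
    b2 <- c(-2.00, 0.02,  0.05, -0.10, 0.20,  0.15) # submodel 2: AEs vs no LDA

    # One new patient; the leading 1 multiplies the intercept (RF/sex coded 0/1)
    x <- c(1, 55, 4.6, 1, 1, 1.5)

    lp1 <- sum(b1 * x)  # linear predictor of submodel 1
    lp2 <- sum(b2 * x)  # linear predictor of submodel 2

    denom <- 1 + exp(lp1) + exp(lp2)
    c(no_LDA = 1 / denom,         # reference category
      LDA    = exp(lp1) / denom,
      AEs    = exp(lp2) / denom)  # the three risks sum to 1 by construction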
3.2.2. Sample size calculation
Sample size guidance for developing CPMs for continuous, binary and time-to-event models has been developed in the past few years [56-58], with extensions for the development of an MPM recently proposed [29]. Crucially, these sample size calculations help to minimize the level of overfitting of the CPM, and they ensure that there is sufficient sample size to precisely estimate key model parameters (such as the model intercept). A calculation should be performed before the development of an MPM, to determine the maximum number of predictor parameters relative to the number of participants, outcome prevalence and expected predictive performance. The total sample size required for an MPM depends on the number of outcome categories, and more outcome categories require larger sample sizes [29]. The calculation requires the number of events for each outcome category, the number of candidate predictor parameters, the targeted level of shrinkage, and either the expected concordance statistic (c-statistic) or the expected adjusted R² for each submodel. Alternatively, if data for model development are collected prospectively, a learning curve approach [59] can be used, potentially in addition to an a priori calculation. See Pate et al [29] for MPM sample size R code; a step-by-step calculation for our clinical example can be found in the Supplementary material.

3.2.3. Missing data
The proportion of missing data in predictor variables should be reported, per variable and overall [60-62]. The recommended strategy for handling missing data during model development depends on whether missingness will be present at deployment (ie, prediction time), as previously proposed by Sisk et al [63]. When no missing data are anticipated at model deployment, multiple imputation (including the outcome), or possibly single imputation methods such as regression imputation (omitting the outcome), is recommended for handling missing data during model development [63].

3.2.4. Model fitting
To develop a multinomial model, an MLR is fitted using maximum likelihood estimation (Box 1). The model could also be fit using penalized regression approaches [30], such as LASSO, ridge methods, or Firth's correction [64,65]. These may help limit overfitting when coupled with appropriate sample sizes [66]. However, the shrinkage parameter itself requires estimation [65,67], so ideally the sample size should be sufficiently large to estimate the shrinkage parameter or large enough to not need shrinkage.
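A minimal sketch of the maximum likelihood fit, assuming a hypothetical data frame dat whose outcome is a three-level factor with "no_LDA" as the reference category (variable names are ours, for illustration):

    library(nnet)

    dat$outcome <- relevel(factor(dat$outcome), ref = "no_LDA")  # set reference

    # Multinomial logistic regression fitted by maximum likelihood
    fit <- multinom(outcome ~ age + sex + rf + das28 + haq,
                    data = dat, trace = FALSE)

    summary(fit)       # one row of coefficients per non-reference category
    head(fitted(fit))  # predicted probabilities per category; rows sum to 1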
3.2.5. Nonlinearity
Continuous predictors should not be categorized during model fitting, to avoid loss of information [68,69]. Instead, nonlinearity in the relationship between continuous predictors and the outcome should be assessed. This is typically done in CPMs using restricted cubic splines (RCS) or fractional polynomials (FP). While some R packages enable the use of RCS with MLR [70,71], current R packages for FPs [72] do not include this functionality for MPMs. In our RA multinomial prediction study, we manually considered FP-type powers of continuous variables and compared the Akaike Information Criterion between the different formulations (described in Supplementary material). We did not identify any important nonlinear relationships. Alternatively, the FP [72] or RCS [70] function could be applied to each binary submodel, but more research, and implementable code, is needed in this area.
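As one possible sketch of allowing a nonlinear effect (mirroring the spirit, not the exact FP procedure, of our case study), assuming the same hypothetical dat: a natural cubic spline for DAS28 within VGAM's multinomial family, compared against the linear formulation by AIC.

    library(VGAM)
    library(splines)

    # Spline (3 df) for DAS28; refLevel = 1 keeps the first level as reference
    fit_ns  <- vglm(outcome ~ ns(das28, df = 3) + age + sex + rf + haq,
                    family = multinomial(refLevel = 1), data = dat)
    fit_lin <- vglm(outcome ~ das28 + age + sex + rf + haq,
                    family = multinomial(refLevel = 1), data = dat)

    AIC(fit_ns) - AIC(fit_lin)  # negative difference favours the spline model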
3.3. Model evaluation

This section will outline performance assessment of MPMs, internal and external validation, and model recalibration.

3.3.1. Performance assessment

3.3.1.1. Calibration. Calibration quantifies the agreement between predicted risks and observed proportions [73]. This is an important performance measure for any CPM as it estimates the accuracy of risk predictions. Calibration of an MPM can be assessed following the hierarchy of risk calibration, as previously outlined [74].

To assess mean calibration, the average predicted risk for each outcome is compared with the overall event rate [73], with overestimation occurring when the average predicted risk is higher than the overall outcome prevalence. For MPMs, this can be assessed by simply taking the average predicted risk for each outcome category and comparing this to the observed prevalence for that outcome. One can then calculate the observed/expected ratio: the total number of observed outcome events, divided by the total number of predicted events [57]. A ratio <1 suggests an overestimation of risk; a ratio >1 suggests underestimation.

Weak calibration is assessed using the calibration slope (c-slope) and calibration intercept (c-intercept) calculated using the multinomial calibration framework [32], as done in our clinical example. This quantifies whether, on average, each submodel in the MPM overestimates or underestimates risk and whether it gives risks that are too extreme or too modest. A c-slope <1 indicates that risk estimates are too extreme and the model may be overfitted, and a c-intercept >0 indicates that the model is underestimating risks [73]. C-slopes depend on the choice of reference category; although this can lead to slightly different results, similar conclusions can be drawn [32]. The c-slope is particularly important at model development, as it can be used to shrink model coefficients [75,76] (once adjusted for in-sample optimism). At external validation, the intercept and slope provide a general summary of potential problems with risk calibration [73,74].

In our clinical example, the optimism-adjusted c-slope for the disease activity outcomes (Submodel 1: LDA vs no LDA) was 1.01 [95% confidence interval: 0.87, 1.14] in the development data, which decreased to 0.78 [0.64, 0.93] upon external validation. A decrease in c-slope upon external validation could be linked to differences in underlying patient populations (case-mix) between the development and validation data [74]. The c-intercept for Submodel 1 was 0.00 [−0.11, 0.11] in the development data and 0.53 [0.41, 0.65] in the external data, where the difference could be linked to the difference in outcome prevalence between development and validation datasets (more in Section 3.3.3.2).

The c-slope and c-intercept can also be approximated by obtaining them for each outcome category as if it were binary [12]. This simplification yields results that do not depend on the reference category.

Moderate calibration is assessed using calibration plots. For an MPM, the calibration plots show scatter of the multidimensional relationship between predicted risks and observed proportions for each outcome category, rather than a one-on-one relationship as for a binary outcome [32]. This displays whether there is any overestimation or underestimation of risk. Calibration plots can be generated using vector spline smoothing [32,77,78]. We present the calibration plot at external validation of our case study in the Figure.

Figure. Multinomial calibration plot, using the external performance of our multinomial model for outcomes of methotrexate therapy as an example. LDA, low disease activity; AEs, adverse events.
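To make the observed/expected calculation of Section 3.3.1.1 concrete, a short sketch assuming obs is the observed outcome factor and pred the matrix of predicted risks, with columns aligned to the factor levels (both are hypothetical objects, eg, from the fit above):

    observed <- as.numeric(prop.table(table(obs)))  # observed proportion per category
    expected <- colMeans(pred)                      # average predicted risk per category

    oe <- observed / expected  # O/E < 1: risks overestimated; O/E > 1: underestimated
    round(data.frame(observed, expected, oe, row.names = colnames(pred)), 3)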
3.3.1.2. Discrimination. Discrimination is the ability of a CPM to distinguish between patients with different outcome categories. For MPMs, discrimination can be measured using the Polytomous Discrimination Index (PDI), an extension of the c-statistic to obtain simultaneous discrimination between all outcome categories [31]. The lower limit of the PDI, which indicates random discrimination, is calculated by 1/k (where k reflects the number of outcome categories) [31]. In our example, the PDI was 0.49 in the development setting (lower limit = 0.33). This indicates that there is an estimated 49% chance to correctly identify the patient with a randomly selected outcome category from a set of three patients, each with a different outcome (one with no LDA, one with LDA, and one with discontinuation).

To further investigate discrimination, it may be of interest to quantify the discrimination of each submodel. For MPMs, c-statistics between pairs of categories can be calculated, with values of 1 and 0.5 indicating perfect and random discrimination, respectively. In our example, Submodel 1 (LDA vs no LDA) had a pairwise c-statistic of 0.72 [0.70, 0.75] in the development setting and 0.68 [0.65, 0.71] in external validation. To obtain pairwise c-statistics that do not involve the reference category (as in the submodels), the easy-to-use conditional risk method is recommended [34].

3.3.1.3. Net benefit. The clinical utility of a CPM can be assessed using net benefit and decision curve analysis [79,80], and interpreted following the existing step-by-step guide [81]. Further research is needed to extend these metrics for MPMs.
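Returning to the pairwise c-statistics of Section 3.3.1.2, a minimal sketch using the conditional risk method, with the hypothetical obs and pred as above and the pROC package listed in Box 3:

    library(pROC)

    # Restrict to patients in the two categories of interest and rank them by
    # the conditional risk P(LDA) / (P(LDA) + P(no_LDA))
    keep   <- obs %in% c("no_LDA", "LDA")
    p_cond <- pred[keep, "LDA"] / (pred[keep, "LDA"] + pred[keep, "no_LDA"])

    auc(response = droplevels(obs[keep]), predictor = p_cond)
    # 0.5 indicates random and 1 perfect discrimination for this pair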

Box 2. How to recalibrate (update the slope and intercept of) a multinomial prediction model, using a model with three outcome categories (Y).

The LPs of the original multinomial model are denoted by:

$$\log\left(\frac{P(Y=2)}{P(Y=1)}\right) = \beta_{0,1} + \beta_{1,1}x_1 + \beta_{2,1}x_2 + \cdots + \beta_{p,1}x_p = \mathrm{LP}_1$$

$$\log\left(\frac{P(Y=3)}{P(Y=1)}\right) = \beta_{0,2} + \beta_{1,2}x_1 + \beta_{2,2}x_2 + \cdots + \beta_{p,2}x_p = \mathrm{LP}_2$$

where Y = 2 vs Y = 1 and Y = 3 vs Y = 1 represent the two submodels in the multinomial model (submodels 1 and 2, respectively). To update this model in the external dataset, the outcome is regressed onto the existing LPs to obtain the following output:

$$\log\left(\frac{P(Y=2)}{P(Y=1)}\right) = \alpha_{0,1} + \alpha_{1,1}\mathrm{LP}_1 + \alpha_{2,1}\mathrm{LP}_2$$

$$\log\left(\frac{P(Y=3)}{P(Y=1)}\right) = \alpha_{0,2} + \alpha_{1,2}\mathrm{LP}_1 + \alpha_{2,2}\mathrm{LP}_2$$

To recalibrate the model in terms of updating the slope and intercept, the alphas in the equations above are used to estimate the new beta coefficients of the recalibrated model. For submodel 1 this can be written as:

$$\log\left(\frac{P(Y=2)}{P(Y=1)}\right) = \alpha_{0,1} + \alpha_{1,1}\left(\beta_{0,1} + \beta_{1,1}x_1 + \beta_{2,1}x_2 + \cdots + \beta_{p,1}x_p\right) + \alpha_{2,1}\left(\beta_{0,2} + \beta_{1,2}x_1 + \beta_{2,2}x_2 + \cdots + \beta_{p,2}x_p\right)$$

and similarly for submodel 2. Therefore, the updated coefficient for 'age' in submodel 1 is equal to:

$$(\alpha_{1,1} \times \beta_{1,1}) + (\alpha_{2,1} \times \beta_{1,2}) = \gamma_{1,1}$$

This is repeated for each covariate in each submodel. The new/updated intercept for submodel 1 would be:

$$\alpha_{0,1} + (\alpha_{1,1} \times \beta_{0,1}) + (\alpha_{2,1} \times \beta_{0,2}) = \gamma_{0,1}$$

The recalibrated model can be written as:

$$\log\left(\frac{P(Y=2)}{P(Y=1)}\right) = \gamma_{0,1} + \gamma_{1,1}x_1 + \gamma_{2,1}x_2 + \cdots + \gamma_{p,1}x_p$$

$$\log\left(\frac{P(Y=3)}{P(Y=1)}\right) = \gamma_{0,2} + \gamma_{1,2}x_1 + \gamma_{2,2}x_2 + \cdots + \gamma_{p,2}x_p$$
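A minimal sketch of the recalibration in Box 2, assuming the development fit fit (from nnet::multinom, as above) and a hypothetical external data frame ext_dat containing the same variables with the same factor coding:

    library(nnet)

    B <- coef(fit)  # 2 x (p + 1) matrix of original betas, one row per submodel

    # Original LPs evaluated in the external data
    X  <- model.matrix(~ age + sex + rf + das28 + haq, data = ext_dat)
    LP <- X %*% t(B)
    ext_dat$LP1 <- LP[, 1]
    ext_dat$LP2 <- LP[, 2]

    # Regress the external outcome on the existing LPs to obtain the alphas
    recal <- multinom(outcome ~ LP1 + LP2, data = ext_dat, trace = FALSE)
    A <- coef(recal)  # columns: (Intercept), LP1, LP2

    # Combine alphas and betas into the gammas of Box 2
    Gamma <- A[, c("LP1", "LP2")] %*% B  # updated slopes (+ partial intercepts)
    Gamma[, "(Intercept)"] <- Gamma[, "(Intercept)"] + A[, "(Intercept)"]
    Gamma  # recalibrated coefficients, one row per submodel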
3.3.2. Internal validation
Following the fitting of the MPM, the model's predictive performance should be quantified through appropriate internal validation techniques to adjust for in-sample optimism. Bootstrapping is the recommended approach as it resamples with replacement, making all data available for model development (contrary to splitting data into development/validation portions). The process for bootstrap internal validation of an MPM is the same as for other types of CPMs; our internal validation procedure for the RA model is reported in the complementary paper [35]. One may also consider cross-validation for the internal validation process.
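As a schematic sketch of bootstrap internal validation under the assumptions above; the metric function perf() below is a hypothetical placeholder for whichever performance measure is being optimism-corrected (eg, a c-slope or pairwise c-statistic), not a real function:

    set.seed(2024)
    n_boot <- 200

    optimism <- replicate(n_boot, {
      boot  <- dat[sample(nrow(dat), replace = TRUE), ]  # resample with replacement
      fit_b <- nnet::multinom(outcome ~ age + sex + rf + das28 + haq,
                              data = boot, trace = FALSE)
      # apparent performance in the bootstrap sample minus the bootstrap model's
      # performance in the original data; perf() is a placeholder
      perf(fit_b, boot) - perf(fit_b, dat)
    })

    # optimism-adjusted estimate = apparent performance - mean(optimism)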
3.3.3. External validation
This section describes the external validation of an MPM. The concepts apply to any method, MLR or otherwise, that generates risk estimates for a nominal outcome.

3.3.3.1. Sample size calculation. Before the external validation of a CPM, it is recommended to conduct sample size calculations to obtain the minimum requirements for precise estimates of calibration (observed/expected, c-slope), discrimination (c-statistic), and clinical utility (net benefit) in the external data [57]. Sample size criteria for the external validation of CPMs have been proposed for binary and continuous outcomes [56,57], but not yet extended to MPMs. In our example, we therefore adapted the criteria by calculating the minimum required sample size for each submodel (one-vs-one) and choosing the larger value, which depends on the reference category. These calculations require the end user to specify measures such as outcome prevalence, target SE and/or mean and SD of the LP. The expected outcome prevalence and mean/SD of the LP can be informed by the development data, and the target SE relates to the confidence interval width (precision) that one would like to obtain in the external data (worked example in Supplementary material). We recommend that future research considers specific sample size criteria for externally validating an MPM.

3.3.3.2. External validation analysis. Summaries of baseline characteristics and outcome prevalence [82,83] should be produced to compare case-mix between development and validation settings (as per Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis). It is important to understand heterogeneity between settings as this impacts model performance at external validation. In our clinical example, patients in the validation data had greater disease burden, with a higher clinical disease activity score (median 4.6 [IQR: 3.6-5.4]) compared to the development data (4.2 [3.3-5.2]) and a higher disability index (1.6 [1.1-2.0] vs. 1.0 [0.5-1.6]). The outcome prevalence for no LDA, LDA, and discontinuation due to AEs was 40%, 45%, and 15% in the validation data and 45%, 46%, and 9%, respectively, at development. This shows a particular difference in the prevalence of the AEs outcome, which was ~1.5 times higher in the validation data. Higher heterogeneity between development and validation can impact model calibration, as reported in Section 3.3.1.1.

Missing data should be handled in the same way at validation as is intended when the model is used in practice [62,63]. If no missingness is allowed at deployment, but the external validation data contains missing values, this should be imputed using the same approach as for the model development data. Where missingness is allowed at deployment, a consistent imputation model between development, validation, and deployment is recommended [63,84].

For each individual in the external data, the LP and predicted probability for each submodel should be calculated using the equation exactly as developed (Box 1). The same performance metrics as described above for model development can be estimated, and we recommend a particular focus on calibration curves. We present the multinomial calibration plot of our RA model at external validation in the Figure, which highlights some overprediction of the 'no LDA' category, and underprediction of the 'LDA' category.
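A short sketch of applying the frozen model to the external cohort, with the hypothetical fit and ext_dat as before:

    # Predicted risks for each external patient, using the model exactly as developed
    ext_pred <- predict(fit, newdata = ext_dat, type = "probs")

    # Mean calibration at external validation: average predicted risk per category
    # against the observed prevalence (columns assumed aligned with factor levels)
    round(rbind(mean_predicted = colMeans(ext_pred),
                observed_prev  = as.numeric(prop.table(table(ext_dat$outcome)))), 3)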

Box 3. R packages for each stage of model development and performance assessment of a multinomial prediction model.

- Baseline characteristics: tableone [87] for generating baseline tables and quantifying the percentage of missing data.
- Variable selection: cv.glmnet() in glmnet [88] for LASSO and ridge; step() in stats [47] for forward, backward, and stepwise selection; logistf for Firth [89].
- Nonlinearity: mfp [72] for considering FPs for each binary submodel (a form of backward selection, not yet extended for MPMs); VGAM [71] for fitting vector splines; and rms [70] to fix knot locations across the MLR equations.
- Sample size calculations: as of December 2023, not yet implemented into an R package; code for the calculation can be found in the article by Pate et al [29], which requires the pROC [90] library.
- Missing data: mice [91] for multiple imputation.
- Model fitting: multinom() in nnet [92], and VGAM [71].
- Predictive performance:
  - Calibration: R code for the multinomial calibration framework can be found in the paper by Van Hoorde et al [32], which requires the VGAM [71] and bayesm [93] libraries.
  - Discrimination: pairwise c-statistics using pROC [90] and code for the PDI in the paper by Dover et al [94]. SEs can be computed using mcca [95].
3.3.4. Model recalibration
Upon externally validating a prediction model, it is common to find a deterioration in performance relative to the development setting. Model updating methods have been proposed as a way of tailoring an existing CPM to suit a new target population [85]. These include updating the model's intercept and/or slope, refitting the model in the external data, or adding predictors [82]. Model updating methods have recently been expanded into the multinomial setting [32,33]. In our example, we explored updating the model in two stages [33]: 1) recalibration, where the model's slopes and intercepts were updated in the external data, which involves proportionally adjusting the original coefficients [86], and 2) model refitting, where model coefficients were re-estimated in the external data. The approach to recalibration of an MPM is outlined in Box 2 for our RA example.

3.4. Computational guidance

Box 3 gives suggestions for R packages [47] for each stage of model development and validation. This is not intended to be an exhaustive list, but lists the packages that we used when developing and evaluating our RA model [35].

4. Conclusion

This article provides an overview and practical guidance on how to develop, externally validate, and update MPMs. Recent methodological advances, such as the sample size calculation for the development of an MPM [29] and ways to quantify discrimination [31] and calibration [32] of MPMs, are enabling the greater use of this approach, but a guide of how to implement many of the ideas and methods is currently lacking. We emphasize that existing best practice guidelines for CPMs should be followed [29,45,46,57,60,82]. In this article we outlined MPM-specific considerations only. Future research could focus on MPM considerations for variable selection, FPs, net benefit, and sample size criteria for external validation.

CRediT authorship contribution statement

Celina K. Gehringer: Writing - review & editing, Writing - original draft, Software, Resources, Methodology, Funding acquisition, Formal analysis, Data curation, Conceptualization. Glen P. Martin: Writing - review & editing, Supervision, Resources, Methodology, Conceptualization. Ben Van Calster: Writing - review & editing, Supervision, Resources, Methodology, Conceptualization. Kimme L. Hyrich: Writing - review & editing, Supervision, Resources, Investigation. Suzanne M.M. Verstappen: Writing - review & editing, Investigation, Supervision. Jamie C. Sergeant: Writing - review & editing, Supervision, Resources, Methodology, Conceptualization.

Data availability

The data that has been used is confidential.

Declaration of competing interest

K.L.H. has received speaker honoraria from Abbvie and grant income from Bristol Myers Squibb and Pfizer. There are no competing interests for any other author.

Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.jclinepi.2024.111481.

References

[1] Steyerberg EW, Moons KGM, van der Windt DA, Hayden JA, Perel P, Schroter S, et al. Prognosis research strategy (PROGRESS) 3: prognostic model research. PLoS Med 2013;10(2):e1001381.
[2] Van Calster B, Hoorde KV, Valentin L, Testa AC, Fischerova D, Holsbeke CV, et al. Evaluating the risk of ovarian cancer before surgery using the ADNEX model to differentiate between benign, borderline, early and advanced stage invasive, and secondary metastatic tumours: prospective multicentre diagnostic study. BMJ 2014;349:g5920.
[3] Tinelli R, Tinelli A, Tinelli FG, Cicinelli E, Malvasi A. Conservative surgery for borderline ovarian tumors: a review. Gynecol Oncol 2006;100(1):185-91.
[4] Vergote I, Brabanter JD, Fyles A, Bertelsen K, Einhorn N, Sevelda P, et al. Prognostic importance of degree of differentiation and cyst rupture in stage I invasive epithelial ovarian carcinoma. Lancet 2001;357(9251):176-82.
[5] Frank E, Kramer S. Ensembles of nested dichotomies for multi-class problems. Proc 21st ICML; 2004. Available at: https://icml.cc/Conferences/2004/proceedings/papers/128.pdf. Accessed March 15, 2024.
[6] Allwein EL, Schapire RE, Singer Y. Reducing multiclass to binary: a unifying approach for margin classifiers. J Mach Learn Res 2000;1:113-41.
[7] Duan K, Keerthi SS, Chu W, Shevade SK, Poo AN. Multi-category classification by soft-max combination of binary classifiers. In: Windeatt T, Roli F, editors. Multiple Classifier Systems. Berlin, Heidelberg: Springer; 2003:125-34. (Lecture Notes in Computer Science; vol. 2709). Available at: http://link.springer.com/10.1007/3-540-44938-8_13. Accessed October 23, 2023.
[8] Huang TK, Weng RC, Lin CJ. Generalized Bradley-Terry models and multi-class probability estimates. J Mach Learn Res 2006;7:85-115.
[9] Van Calster B, Luts J, Suykens JAK, Condous G, Bourne T, Timmerman D, et al. Comparing methods for multi-class probabilities in medical decision making using LS-SVMs and kernel logistic regression. In: De Sa JM, Alexandre LA, Duch W, Mandic D, editors. Artificial Neural Networks - ICANN 2007. Berlin, Heidelberg: Springer; 2007:139-48. (Lecture Notes in Computer Science; vol. 4669). Available at: http://link.springer.com/10.1007/978-3-540-74695-9_15. Accessed October 20, 2023.
[10] Van Calster B, Van Belle V, Condous G, Bourne T, Timmerman D, Van Huffel S. Multi-class AUC metrics and weighted alternatives. In: 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence). Hong Kong, China: IEEE; 2008:1390-6. Available at: http://ieeexplore.ieee.org/document/4633979/. Accessed October 20, 2023.
[11] Van Calster B, Condous G, Kirk E, Bourne T, Timmerman D, Van Huffel S. An application of methods for the probabilistic three-class classification of pregnancies of unknown location. Artif Intell Med 2009;46(2):139-54.
[12] Edlinger M, Smeden M, Alber HF, Wanitschek M, Van Calster B. Risk prediction models for discrete ordinal outcomes: calibration and the impact of the proportional odds assumption. Stat Med 2022;41(8):1334-60.
[13] Biesheuvel CJ, Vergouwe Y, Steyerberg EW, Grobbee DE, Moons KGM. Polytomous logistic regression analysis could be applied more often in diagnostic research. J Clin Epidemiol 2008;61(2):125-34.
[14] Martin GP, Sperrin M, Snell KIE, Buchan I, Riley RD. Clinical prediction models to predict the risk of multiple binary outcomes: a comparison of approaches. Stat Med 2021;40(2):498-517.
[15] Pate A, Riley RD, Sperrin M, Van Calster B, Sergeant JC, Peek N, et al. Developing clinical prediction models for nominal polytomous outcomes: a simulation study comparing available approaches [Internet]. Review 2023. Available at: https://www.researchsquare.com/article/rs-3121017/v1. Accessed July 26, 2023.
[16] Kruppa J, Liu Y, Biau G, Kohler M, König IR, Malley JD, et al. Probability estimation with machine learning methods for dichotomous and multicategory outcome: theory. Biom J 2014;56(4):534-63.
[17] Kruppa J, Liu Y, Diener HC, Holste T, Weimar C, König IR, et al. Probability estimation with machine learning methods for dichotomous and multicategory outcome: applications. Biom J 2014;56(4):564-83.
[18] Schuit E, Kwee A, Westerhuis M, Van Dessel H, Graziosi G, Van Lith J, et al. A clinical prediction model to assess the risk of operative delivery. BJOG 2012;119(8):915-23.
[19] Roukema J, van Loenhout RB, Steyerberg EW, Moons KGM, Bleeker SE, Moll HA. Polytomous regression did not outperform dichotomous logistic regression in diagnosing serious bacterial infections in febrile children. J Clin Epidemiol 2008;61(2):135-41.
[20] Barnes DE, Mehta KM, Boscardin WJ, Fortinsky RH, Palmer RM, Kirby KA, et al. Prediction of recovery, dependence or death in elders who become disabled during hospitalization. J Gen Intern Med 2013;28(2):261-8.
[21] Ramspek CL, Jager KJ, Dekker FW, Zoccali C, van Diepen M. External validation of prognostic models: what, why, how, when and where? Clin Kidney J 2021;14(1):49-58.
[22] Royston P, Altman DG. External validation of a Cox prognostic model: principles and methods. BMC Med Res Methodol 2013;13(1):33.
[23] Steyerberg EW, Vergouwe Y. Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur Heart J 2014;35(29):1925-31.
[24] Moons KGM, Kengne AP, Woodward M, Royston P, Vergouwe Y, Altman DG, et al. Risk prediction models: I. Development, internal validation, and assessing the incremental value of a new (bio)marker. Heart 2012;98(9):683-90.
[25] Moons KGM, Kengne AP, Grobbee DE, Royston P, Vergouwe Y, Altman DG, et al. Risk prediction models: II. External validation, model updating, and impact assessment. Heart 2012;98(9):691-8.
[26] Altman DG, Vergouwe Y, Royston P, Moons KGM. Prognosis and prognostic research: validating a prognostic model. BMJ 2009;338:b605.
[27] Moons KGM, Royston P, Vergouwe Y, Grobbee DE, Altman DG. Prognosis and prognostic research: what, why, and how? BMJ 2009;338:b375.
[28] Altman DG, Royston P. What do we mean by validating a prognostic model? Stat Med 2000;19(4):453-73.
[29] Pate A, Riley RD, Collins GS, van Smeden M, Van Calster B, Ensor J, et al. Minimum sample size for developing a multivariable prediction model using multinomial logistic regression. Stat Methods Med Res 2023;32(3):555-71.
[30] de Jong VMT, Eijkemans MJC, Van Calster B, Timmerman D, Moons KGM, Steyerberg EW, et al. Sample size considerations and predictive performance of multinomial logistic prediction models. Stat Med 2019;38(9):1601-19.
[31] Van Calster B, Van Belle V, Vergouwe Y, Timmerman D, Van Huffel S, Steyerberg EW. Extending the c-statistic to nominal polytomous outcomes: the Polytomous Discrimination Index. Stat Med 2012;31(23):2610-26.
[32] Van Hoorde K, Vergouwe Y, Timmerman D, Van Huffel S, Steyerberg EW, Van Calster B. Assessing calibration of multinomial risk prediction models. Stat Med 2014;33(15):2585-96.
[33] Van Calster B, Van Hoorde K, Vergouwe Y, Bobdiwala S, Condous G, Kirk E, et al. Validation and updating of risk models based on multinomial logistic regression. Diagn Progn Res 2017;1(1):2.
[34] Van Calster B, Vergouwe Y, Looman CWN, Van Belle V, Timmerman D, Steyerberg EW. Assessing the discriminative ability of risk models for more than two outcome categories. Eur J Epidemiol 2012;27(10):761-70.
[35] Gehringer CK, Martin GP, Hyrich KL, Verstappen SMM, Sexton J, Provan SA, et al. Developing and externally validating multinomial prediction models for methotrexate treatment outcomes in patients with rheumatoid arthritis: results from an international collaboration. J Clin Epidemiol 2023;166:111239.
[36] Bullock J, Rizvi SAA, Saleh AM, Ahmed SS, Do DP, Ansari RA, et al. Rheumatoid arthritis: a brief overview of the treatment. Med Princ Pract 2019;27(6):501-7.
[37] Smolen JS, Landewé RBM, Bergstra SA, Kerschbaumer A, Sepriano A, Aletaha D, et al. EULAR recommendations for the management of rheumatoid arthritis with synthetic and biological disease-modifying antirheumatic drugs: 2022 update. Ann Rheum Dis 2023;82(1):3-18.
[38] Recommendations | Rheumatoid arthritis in adults: management | Guidance | NICE [Internet]. NICE. Available at: https://www.nice.org.uk/guidance/ng100/chapter/Recommendations. Accessed March 2, 2022.
[39] Sergeant JC, Hyrich KL, Anderson J, Kopec-Harding K, Hope HF, Symmons DPM, et al. Prediction of primary non-response to methotrexate therapy using demographic, clinical and psychosocial variables: results from the UK Rheumatoid Arthritis Medication Study (RAMS). Arthritis Res Ther 2018;20(1):1-11.
[40] Sherbini AA, Sharma SD, Gwinnutt JM, Hyrich KL, Verstappen SMM. Prevalence and predictors of adverse events with methotrexate mono- and combination-therapy for rheumatoid arthritis: a systematic review. Rheumatology (Oxford) 2021;60(9):4001-17.
[41] Schnabel A, Herlyn K, Burchardi C, Reinhold-Keller E, Gross WL. Long-term tolerability of methotrexate at doses exceeding 15 mg per week in rheumatoid arthritis. Rheumatol Int 1996;15(5):195-200.
[42] Kinder AJ, Hassell AB, Brand J, Brownfield A, Grove M, Shadforth MF. The treatment of inflammatory arthritis with methotrexate in clinical practice: treatment duration and incidence of adverse drug reactions. Rheumatology 2005;44(1):61-6.
[43] Wang W, Zhou H, Liu L. Side effects of methotrexate therapy for rheumatoid arthritis: a systematic review. Eur J Med Chem 2018;158:502-16.
[44] van Gestel AM, Haagsma CJ, van Riel PLCM. Validation of rheumatoid arthritis improvement criteria that include simplified joint counts. Arthritis Rheum 1998;41(10):1845-50.
[45] Moons KGM, Altman DG, Reitsma JB, Ioannidis JPA, Macaskill P, Steyerberg EW, et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med 2015;162(1):W1-73.
[46] Collins GS, Reitsma JB, Altman DG, Moons KGM. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement. BMC Med 2015;13(1):1-10.
[47] R: The R project for statistical computing [Internet]. Available at: https://www.r-project.org/. Accessed April 25, 2023.
[48] Gehringer C. How to develop, externally validate, and update multinomial prediction models [Internet]. 2023. Available at: https://github.com/celinagehringer/multinomial_risk_prediction_RA. Accessed July 20, 2023.
[49] Fine JP, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc 1999;94(446):496.
[50] Latouche A, Allignol A, Beyersmann J, Labopin M, Fine JP. A competing risks analysis should report results on all cause-specific hazards and cumulative incidence functions. J Clin Epidemiol 2013;66(6):648-53.
[51] Van Hoorde K. Updating and calibration of multinomial risk prediction models: PhD thesis. KU Leuven.
[52] Wynants L, Timmerman D, Bourne T, Van Huffel S, Van Calster B. Screening for data clustering in multicenter studies: the residual intraclass correlation. BMC Med Res Methodol 2013;13:128.
[53] Sperrin M, Jenkins D, Martin GP, Peek N. Explicit causal reasoning is needed to prevent prognostic models being victims of their own success. J Am Med Inform Assoc 2019;26:1675-6.
[54] Chowdhury MZI, Turin TC. Variable selection strategies and its importance in clinical prediction modelling. Fam Med Community Health 2020;8(1):262.
[55] Heinze G, Wallisch C, Dunkler D. Variable selection - a review and recommendations for the practicing statistician. Biom J 2018;60(3):431-49.
[56] Archer L, Snell KIE, Ensor J, Hudda MT, Collins GS, Riley RD. Minimum sample size for external validation of a clinical prediction model with a continuous outcome. Stat Med 2021;40(1):133-46.
[57] Riley RD, Debray TPA, Collins GS, Archer L, Ensor J, van Smeden M, et al. Minimum sample size for external validation of a clinical prediction model with a binary outcome. Stat Med 2021;40(19):4230-51.
[58] Riley RD, Snell KIE, Ensor J, Burke DL, Harrell FE, Moons KGM, et al. Minimum sample size for developing a multivariable prediction model: PART II - binary and time-to-event outcomes. Stat Med 2019;38(7):1276-96.
[59] Christodoulou E, van Smeden M, Edlinger M, Timmerman D, Wanitschek M, Steyerberg EW, et al. Adaptive sample size determination for the development of clinical prediction models. Diagn Progn Res 2021;5(1):6.
[60] Sterne JAC, White IR, Carlin JB, Spratt M, Royston P, Kenward MG, et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ 2009;339:157-60.
[61] Kleinke K. Multiple imputation under violated distributional assumptions: a systematic evaluation of the assumed robustness of predictive mean matching. J Educ Behav Stat 2017;42(4):371-404.
[62] Tsvetanova A, Sperrin M, Peek N, Buchan I, Hyland S, Martin GP. Missing data was handled inconsistently in UK prediction models: a review of method used. J Clin Epidemiol 2021;140:149-58.
[63] Sisk R, Sperrin M, Peek N, van Smeden M, Martin GP. Imputation and missing indicators for handling missing data in the development and deployment of clinical prediction models: a simulation study. Stat Methods Med Res 2023;27:09622802231165001.
[64] Firth D. Bias reduction of maximum likelihood estimates. Biometrika 1993;80:27-38.
[65] Van Houwelingen JC. Shrinkage and penalized likelihood as methods to improve predictive accuracy. Stat Neerl 2001;55(1):17-34.
[66] Martin GP, Riley RD, Collins GS, Sperrin M. Developing clinical prediction models when adhering to minimum sample size recommendations: the importance of quantifying bootstrap variability in tuning parameters and predictive performance. Stat Methods Med Res 2021;30(12):2545-61.
[67] Van Calster B, Van Smeden M, De Cock B, Steyerberg EW. Regression shrinkage methods for clinical prediction models do not guarantee improved performance: simulation study. Stat Methods Med Res 2020;29(11):3166-78.
[68] Steyerberg EW, Uno H, Ioannidis JPA, van Calster B, Ukaegbu C, Dhingra T, et al. Poor performance of clinical prediction models: the harm of commonly applied methods. J Clin Epidemiol 2018;98:133-43.
[69] Collins GS, Ogundimu EO, Cook JA, Manach YL, Altman DG. Quantifying the impact of different approaches for handling continuous predictors on the performance of a prognostic model. Stat Med 2016;35(23):4124-35.
[70] Harrell FE. rms: regression modeling strategies. 2023. Available at: https://cran.r-project.org/web/packages/rms/rms.pdf. Accessed March 15, 2024.
[71] Yee T. VGAM: vector generalized linear and additive models. 2022. Available at: https://CRAN.R-project.org/package=VGAM. Accessed March 15, 2024.
[72] Ambler G, Benner A. mfp: multivariable fractional polynomials. 2023. Available at: https://CRAN.R-project.org/package=mfp. Accessed December 20, 2023.
[73] Van Calster B, McLernon DJ, Van Smeden M, Wynants L, Steyerberg EW, Bossuyt P, et al. Calibration: the Achilles heel of predictive analytics. BMC Med 2019;17(1):1-7.
[74] Van Calster B, Nieboer D, Vergouwe Y, De Cock B, Pencina MJ, Steyerberg EW. A calibration hierarchy for risk models was defined: from utopia to empirical data. J Clin Epidemiol 2016;74:167-76.
[75] Van Houwelingen JC, Le Cessie S. Predictive value of statistical models. Stat Med 1990;9(11):1303-25.
[76] Riley RD, Snell KIE, Martin GP, Whittle R, Archer L, Sperrin M, et al. Penalization and shrinkage methods produced unreliable clinical prediction models especially when sample size was small. J Clin Epidemiol 2021;132:88-96.
[77] Dalton JE. Flexible recalibration of binary clinical prediction models. Stat Med 2013;32(2):282-9.
[78] Austin PC, Steyerberg EW. Graphical assessment of internal and external calibration of logistic regression models by using loess smoothers. Stat Med 2014;33(3):517-35.
[79] Vickers AJ, Van Calster B, Steyerberg EW. Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests. BMJ 2016;352:i6.
[80] Chalkou K, Vickers AJ, Pellegrini F, Manca A. Decision curve analysis for personalized treatment choice between multiple options. Med Decis Making 2023;43:337-49.
[81] Vickers AJ, van Calster B, Steyerberg EW. A simple, step-by-step guide to interpreting decision curve analysis. Diagn Progn Res 2019;3(1):18.
[82] Debray TPA, Vergouwe Y, Koffijberg H, Nieboer D, Steyerberg EW, Moons KGM. A new framework to enhance the interpretation of external validation studies of clinical prediction models. J Clin Epidemiol 2015;68(3):279-89.
[83] Hailpern SM, Visintainer PF. Odds ratios and logistic regression: further examples of their use and interpretation. Stata J 2003;3(3):213-25.
[84] Hoogland J, van Barreveld M, Debray TPA, Reitsma JB, Verstraelen TE, Dijkgraaf MGW, et al. Handling missing predictor values when validating and applying a prediction model to new patients. Stat Med 2020;39(25):3591-607.
[85] Su TL, Jaki T, Hickey GL, Buchan I, Sperrin M. A review of statistical updating methods for clinical prediction models. Stat Methods Med Res 2018;27(1):185-97.
[86] Jenkins DA, Sperrin M, Martin GP, Peek N. Dynamic models to predict health outcomes: current status and methodological challenges. Diagn Progn Res 2018;2(1):23.
[87] Yoshida K, Bartel A. tableone: create 'Table 1' to describe baseline characteristics with or without propensity score weights. 2022. Available at: https://CRAN.R-project.org/package=tableone. Accessed March 15, 2024.
[88] Friedman J, Hastie T, Tibshirani R, Narasimhan B. glmnet: lasso and elastic-net regularized generalized linear models. 2023. Available at: https://cran.r-project.org/web/packages/glmnet/glmnet.pdf. Accessed March 15, 2024.
[89] Heinze G, Ploner M, Dunkler D, Southworth H, Jiricka L, Steiner G. logistf: Firth's bias-reduced logistic regression. 2023. Available at: https://cemsiis.meduniwien.ac.at/en/kb/science-research/software/statistical-software/firth-correction/. Accessed March 15, 2024.
[90] Robin X, Hainard A, Tiberti N, Lisacek F, Sanchez JC, Müller M, et al. pROC: display and analyze ROC curves. 2023. Available at: https://cran.r-project.org/web/packages/pROC/pROC.pdf. Accessed March 15, 2024.
[91] van Buuren S, Groothuis-Oudshoorn K. mice: multivariate imputation by chained equations in R [Internet]. J Stat Software 2011;45(3):1-67.
[92] Ripley B, Venables W. nnet: feed-forward neural networks and multinomial log-linear models [Internet]. 2023. Available at: https://cran.r-project.org/web/packages/nnet/index.html. Accessed July 27, 2023.
[93] Rossi P. bayesm: Bayesian inference for marketing/micro-econometrics. 2019. Available at: https://CRAN.R-project.org/package=bayesm. Accessed March 15, 2024.
[94] Dover DC, Islam S, Westerhout CM, Moore LE, Kaul P, Savu A. Computing the polytomous discrimination index. Stat Med 2021;40:3667-81.
[95] Gao M, Li J. mcca: multi-category classification accuracy. 2019. Available at: https://github.com/gaoming96/mcca. Accessed March 15, 2024.
