Case Report

Bayesian Model Averaging and Regularized Regression as Methods for Data-Driven Model Exploration, with Practical Considerations

Hyemin Han
Educational Psychology Program, University of Alabama, Tuscaloosa, AL 35487, USA; hyemin.han@ua.edu; Tel.: +1-205-348-0746

Abstract: Methodological experts suggest that psychological and educational researchers should employ appropriate methods for data-driven model exploration, such as Bayesian Model Averaging and regularized regression, instead of conventional hypothesis-driven testing, if they want to explore the best prediction model. I intend to discuss practical considerations regarding data-driven methods for end-user researchers without sufficient expertise in quantitative methods. I tested three data-driven methods, i.e., Bayesian Model Averaging, LASSO as a form of regularized regression, and stepwise regression, with datasets in psychology and education. I compared their performance in terms of cross-validity, which indicates robustness against overfitting, across different conditions. I employed functionalities widely available via R with default settings to provide information relevant to end users without advanced statistical knowledge. The results demonstrated that LASSO showed the best performance and that Bayesian Model Averaging outperformed stepwise regression when there were many candidate predictors to explore. Based on these findings, I discussed how to use the data-driven model exploration methods appropriately across different situations from laypeople's perspectives.

Keywords: data-driven analysis; model exploration; variable selection; Bayesian Model Averaging; regularized regression; LASSO; stepwise regression; cross-validation; overfitting

Citation: Han, H. Bayesian Model Averaging and Regularized Regression as Methods for Data-Driven Model Exploration, with Practical Considerations. Stats 2024, 7, 732–744. https://doi.org/10.3390/stats7030044
Academic Editor: Wei Zhu
Received: 24 June 2024; Revised: 11 July 2024; Accepted: 12 July 2024; Published: 18 July 2024
Copyright: © 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Researchers in psychology and education who use quantitative methods are often interested in exploring the best prediction or regression models with data-driven analysis methods [1]. Previously, many of them tended to test a priori hypotheses based on existing theories and literature via hypothesis-driven analysis [2,3]. However, recent advances in computer and data science allow researchers to gather large-scale datasets relevant to their research topics in a feasible manner. The open science movement also enables them to access various large open datasets for secondary analysis [4]. In such a situation, data-driven model exploration, which researchers could not easily perform in the past, has become a powerful research method for developing the most plausible explanatory model based on data and for generating research questions and hypotheses for subsequent research projects [1,5,6]. Despite the potential benefits of the data-driven approach in quantitative research, researchers often need help when implementing such an approach in their research projects.
One of the most fundamental challenges is that conventional hypothesis-driven analysis, such as testing one model based on statistical indicators (e.g., p-values) that most quantitative researchers are familiar with, is unsuitable for achieving this goal [7–9]. As mentioned, the intended use of conventional hypothesis testing is to test a specific model rather than to compare the suitability of multiple candidate models for model exploration [2,3]. Let us imagine an illustrative situation in which one mistakenly employs the hypothesis-driven analysis method to identify the best regression model from a collected dataset. One may eventually test one specific model, the full model with all candidate predictors in the dataset. Then, one may identify which predictors should be in the resultant model by examining whether the corresponding p-value is lower than 0.05. In such a case, the resultant model is likely erroneous, primarily due to inflated false positives and overestimated model parameters, e.g., coefficients [10,11].

Overfitting is another significant issue [5,8,12,13]. When researchers employ a full model, the model may include unnecessary candidate predictors that are not fundamentally associated with the dependent variable. In such a case, the apparent R² value might become higher. However, the full model may be overfitted to the data used for regression. Thus, it cannot accurately predict the dependent variable outside the used dataset, which results in suboptimal cross-validity. More specifically, in such a case, the estimated regression model might be statistically valid within the interval defined by the values in the employed dataset; outside that interval, however, it may extrapolate without any statistical guarantee for the estimates. This illustrative example suggests that researchers should use analysis methods developed for data-driven model exploration rather than conventional hypothesis testing in such a case.

Furthermore, we should also consider the examination of the relative importance of predictors as a purpose for regression-based analysis in psychological and educational studies [14]. Researchers in psychology and education are frequently interested in the extent to which each predictor contributes to predicting the outcome variable of interest [14]. Although conventional indicators, such as p-values and R², have been employed to examine the relative importance of variables, they do not demonstrate that importance accurately [15]. I explained above why p-values cannot be used for variable selection or evaluation [10,11]. In the case of R², due to the intercorrelation between predictors and multicollinearity, it is practically difficult to calculate the pure contribution of each predictor for evaluating its relative importance [14,16,17]. Hence, methods providing accurate information about the true relative importance of each variable after considering the intercorrelation and multicollinearity, such as dominance analysis and relative weight calculation [16,18], might be beneficial for quantitative studies in psychology and education [14].
Although their primary goal is not exploring the best candidate models and predictors, unlike data-driven exploration methods, their outcome, relative importance, can offer some insights into which predictors should be seriously considered in explaining the association between predictors and the outcome variable in regression models of interest. Hence, it would be informative to briefly overview and discuss methods to examine relative importance within the context of the current paper, which focuses on data-driven model exploration. Although data-driven model exploration methods pursue a different goal, identifying the best model and predictors, they can provide better information regarding the relative importance of variables compared with conventional hypothesis testing methods, particularly testing p-values [14,15]. For instance, it would be possible to determine that predictors which survived data-driven exploration may be more important in predicting an outcome variable than others [19]. I will discuss the implications of data-driven methods within the context of examining relative importance in the discussion section after overviewing the exploration methods and testing results. In the following subsection, I will briefly overview several data-driven methods widely used in the field for background information.

1.1. Background

According to recent studies in psychology and education performing model exploration with data-driven analysis methods, the most accessible methods available in widely used statistical software, such as R, are stepwise regression, Bayesian Model Averaging (BMA), and regularized regression [5,20–24]. Let us begin by overviewing the first method, stepwise regression. R implements stepwise regression as a native statistical function, step. When a researcher conducts stepwise regression, there are different modes of model selection [25,26]. First, forward stepwise regression starts with a null model and adds candidate predictors one step at a time until the process fulfills a criterion. Second, when one performs backward stepwise regression, one begins with a full model and removes one predictor per step until a criterion is satisfied. In the case of step in R [27], the default option employs both stepwise methods while examining the Akaike Information Criterion (AIC) as an indicator at each step. Compared with the conventional approach of testing a full model and selecting predictors using p-values [25], stepwise regression is a better model exploration method as it employs a data-driven approach in its rationale. Furthermore, it is straightforward to use even without knowledge of advanced statistics, e.g., Bayesian statistics. Because stepwise regression suggests only one model at the end [27], its result is easy to understand and interpret for a psychological or educational researcher, who is often interested in clearly identifying factors contributing to predicting the dependent variable of interest. However, there are also limitations warranting our caution. Depending on which method (e.g., forward vs. backward vs. both) or criterion (e.g., AIC, p-values, etc.) is employed, stepwise regression may recommend different models across trials [28,29]. Given that it selects only one model among all possible candidate models, model uncertainty becomes a significant concern.
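To make the default behavior of step() described above concrete, the following is a minimal sketch of the call in R. It is an illustration only; the data frame dat and the outcome y are hypothetical placeholders rather than objects from the present study.

```r
# Minimal sketch of default stepwise regression in R.
# `dat` is a hypothetical data frame whose column `y` is the outcome and whose
# remaining columns are the candidate predictors.
full_model <- lm(y ~ ., data = dat)

# step() defaults to direction = "both" and uses AIC as the selection criterion,
# adding or dropping one predictor at a time until AIC no longer improves.
selected_model <- step(full_model, direction = "both", trace = 0)

summary(selected_model)  # one single final model, as noted above
```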
Because researchers can only collect sample data from a part of the reality, that is, the population, in most cases, the model suggested by stepwise regression can vary significantly depending on which subset is chosen during the sampling and data collection processes. Although it is possible to alleviate overfitting, as the final model does not include all predictors, some methodology researchers still criticize stepwise regression on the grounds that it can overestimate model parameters [30,31].

Second, BMA employs Bayesian statistics to average multiple candidate models with high posterior probabilities (see the Supplementary Materials for theoretical details) [9,32]. In R, the BMA package implements this functionality [33]. BMA utilizes Bayesian statistics to calculate posterior probabilities indicating how likely it is that candidate predictors and models should be included. It starts by assigning prior probabilities, the probabilities of the abovementioned predictor or model inclusions before observing the data [2,3,34]. By default, each predictor and model has an equal chance of inclusion before data observation. The BMA procedure updates the prior distributions into posterior distributions while observing the collected data, following Bayes' Theorem [34]. If the examined data suggest that the inclusion of a specific predictor or model is more likely than the prior probability indicated before observation, then the posterior probability becomes higher. The opposite can also occur if the data do not support the inclusion. Once the procedure is completed, BMA calculates the mean coefficient of each predictor based on the posterior probabilities and the estimated coefficients in the candidate models [33,35]. The posterior probability of a specific model works as a weight during this calculation. If one model demonstrates a high posterior probability, the coefficients of predictors included in that model contribute strongly to the averaged coefficients, as the model weight is high. Likewise, coefficients in a model with a low posterior probability contribute relatively less to the mean coefficients in the final result.

Compared with the previously introduced methods, BMA has several benefits. First, given that it considers multiple candidate models with posterior probabilities, not one single model, BMA is robust against model uncertainty [9,29,36]. It is less susceptible to uncertainty and overfitting because it does not choose one specific model. Previous studies reporting the superior cross-validity of BMA compared with conventional model exploration methods support this point [20,22]. Moreover, since it implements Bayesian statistics, BMA can suggest which model is more plausible than others via posterior probabilities [2,3,34]. However, there are issues that researchers need to consider regarding this method. Unlike the conventional methods, BMA does not suggest one single model, nor does it provide familiar indicators for inference, such as p-values of coefficients [9,20]. Although such a feature can provide methodological benefits as mentioned above, from the perspective of researchers in psychology and education whose primary interest is not methodological advances, the results from BMA might be difficult to interpret. This can be particularly problematic if the researchers want to identify one specific model for explanation rather than prediction [37]. If they do not have background knowledge in Bayesian statistics, interpreting BMA results can be challenging.
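As an illustration of the BMA workflow just described, a minimal call to bicreg() in the BMA package might look as follows. The data frame dat and outcome y are hypothetical placeholders, and only default settings are shown, mirroring the end-user perspective taken in this paper.

```r
# Minimal sketch of Bayesian Model Averaging with the BMA package.
# `dat` is a hypothetical data frame with a numeric outcome `y` and numeric
# candidate predictors in the remaining columns (defaults only).
library(BMA)

x <- as.matrix(dat[, setdiff(names(dat), "y")])
fit_bma <- bicreg(x = x, y = dat$y)

summary(fit_bma)   # posterior inclusion probabilities (probne0), averaged
                   # coefficients (postmean), and the best candidate models
```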
Third, regularized regression applies regularization to penalize unnecessary candidate predictor coefficients so that they shrink toward zero, minimizing possible overfitting (see the Supplementary Materials for theoretical details) [38]. An R package, glmnet, is one of the most widely used packages implementing regularized regression [39,40]. Regularized regression attempts to penalize unnecessary coefficients by disfavoring models with larger coefficients. At the same time, it also considers the log-likelihood in order to prefer a model that maintains predictability [5,6,38]. Consequently, regularized regression suggests a model with reasonable predictability whose unnecessary coefficients are shrunk toward zero [8,41]. Among the various forms of regularized regression, the most frequently used is the least absolute shrinkage and selection operator (LASSO), which penalizes the sum of the absolute values of the coefficients [42]. LASSO attempts to minimize this sum to penalize unnecessary coefficients [8]. The most distinctive benefit of LASSO and regularized regression is its robustness against overfitting, as in the case of BMA [5]. Previous studies in psychology using LASSO (and similar regularized regression) have demonstrated that models selected via LASSO were more robust against overfitting, in terms of better cross-validity, compared with full models and models suggested by conventional methods [5,8,43]. The recommended models were accordingly more parsimonious. Despite these methodological advantages in model exploration, LASSO has several practical issues that should be considered by researchers in psychology and education. Like BMA, its result is more complicated to understand and interpret than the result of conventional regression when inference, not prediction, becomes a primary interest [7,8,37]. Regularized regression generates a model with regularized coefficients without other indicators for statistical inference, such as p-values. Even compared with BMA, which provides posterior probabilities [9,20], it might be less suitable for inference and explanation. Hence, psychological and educational researchers interested in an explanatory model rather than prediction may experience practical difficulties while employing this method.

As overviewed, data-driven model exploration methods, including BMA and LASSO (and other forms of regularized regression), provide significant methodological benefits when researchers are interested in exploring the best model with data instead of testing a priori hypotheses [1,7]. I especially note that they are relatively robust against overfitting compared with conventional full-model testing [20,30,43]. Hence, when researchers can only collect sample, not population, data from a part of the whole group of interest, the data-driven methods should be standard approaches [9,44]. The methodological benefit is maximized when the ratio of observations to the number of candidate predictors in the data is small [9]. When researchers are interested in exploring data with multiple competing candidate predictors, particularly when the sample size is small, the data-driven methods would be the only valid means to achieve their goal.

1.2. Current Study

Despite the abovementioned methodological benefits of data-driven methods, several issues warrant further examination, which I intend to conduct in this paper.
First, most methodological papers (e.g., simulation papers) reporting the superior performance of these methods are authored by methodology experts (e.g., Wang et al. (2004) [45]). From the perspective of end-user researchers with insufficient statistical knowledge, it may be difficult to understand and test such benefits within the context of their research projects. Second, the previous papers have focused on introducing specific methodologies and reporting their performance from the methodological experts' perspectives. Hence, they could not provide sufficient concrete information about how their methods can improve end users' research outcomes. Also, the lack of practical guidelines about choosing a specific method given the purpose of a research project and the nature of the collected data is concerning. End-user researchers may be interested in having such points explained and discussed from a practical point of view.

To address the abovementioned issues in applying data-driven methods, especially the insufficiency of practical information and guidelines for end-user researchers, I will examine the performance of the data-driven model exploration methods with data collected from studies in psychology and education, not simulated data. I will test stepwise regression, BMA, and LASSO as regularized regression with three datasets. I will employ representative R functionalities, i.e., step, BMA, and glmnet, with default settings for the tests to provide more practical insights to researchers who start with default functionalities without advanced statistical expertise. The performance of each method will be evaluated by cross-validation accuracy to examine its susceptibility to overfitting. Moreover, I will compare the performance of the three methods while the ratio of observations to candidate predictors varies. Although previous research suggested that the novel data-driven exploration methods (e.g., BMA and LASSO) might perform better than conventional or stepwise regression when the abovementioned ratio is low, more concrete information about this point is still warranted. Hence, I will test the three methods while changing the ratio by altering the subsample size within one dataset with a large number of observations.

With the evidence from the performance tests mentioned above, I will provide practical guidelines and considerations for employing data-driven model exploration methods, i.e., stepwise regression, BMA, and LASSO, to end-user researchers in psychology and education. I will discuss their strengths and limitations with evidence collected from tests with datasets in the fields. This paper will offer practical insights about utilizing these tools for psychological and educational researchers who want to perform model exploration instead of a priori hypothesis testing.

2. Materials and Methods

2.1. Test Datasets

I employed three datasets collected by studies in psychology to examine the performance of the three data-driven model exploration methods, i.e., stepwise regression, BMA, and LASSO. I acquired the datasets from open data repositories. I determined which datasets to employ based on the ratio of observations to the number of candidate predictors. Given that the previous literature recommends a minimum ratio ranging from 10 to 50 [46], I selected three datasets with ratios approximating those recommended numbers (90.42, 22.92, and 7.86, respectively).
The first dataset is from the field of positive psychology and examines which psychological factors predict one's sense of purpose contributing to flourishing (see Han (2024) [47] for the full information about this study). The original dataset and source codes for the analyses in the study are available at https://doi.org/10.17605/OSF.IO/6VUK3 (accessed on 11 July 2024). The study employed six candidate predictors, i.e., empathic concern, perspective taking (see Davis (1983) [48] for further details about the Interpersonal Reactivity Index, the measure for these factors), moral identity internalization, moral identity symbolization (see Aquino and Reed (2002) [49] for further details about the Moral Identity Scale), post-conventional moral reasoning (see Choi et al. (2019) [50] for further details about the behavioral Defining Issues Test), and moral growth mindset (see Han et al. (2020) [51] for further details about the Moral Growth Mindset Measure), to predict purpose among adolescents (see Bronk et al. (2018) [52] for the Claremont Purpose Scale to assess the sense of purpose).

I employed the second dataset in moral and personality psychology exploring which character strengths significantly predict developed moral reasoning in terms of post-conventional reasoning (see Han et al. (2022) [20] for the full information about this study). The original data and analysis source code files are openly available at https://doi.org/10.17605/OSF.IO/GCR9E (accessed on 11 July 2024). In this study, the Global Assessment of Character Strengths with 72 items was employed (see McGrath (2023) [53] for details about this assessment). In this case, the ratio of observations to the number of candidate predictors was 22.92 (1100 responses × 50%/24 candidate predictors).

Finally, the last dataset is from public health psychology (see Blackburn et al. (2022) [54] for details about the original study and Han (2022) [55] for the full descriptions of the dataset, including procedures for data collection and pre-processing). The original study's data and source code files are available at https://doi.org/10.17605/OSF.IO/Y4KGH. The study employed the COVID-19 vaccine intent measure (see Han (2022) [56] for the measurement). For the full descriptions of the three datasets, refer to the Supplementary Materials.

2.2. Test Procedures

I used cross-validation to test the performance of the three exploration methods with the three datasets [57]. During the cross-validation process, I randomly separated the whole dataset into two subsets, one for training (model exploration; 50%) and one for cross-validation (50%) [58]. Once the subsets were created, I performed model exploration with the three methods. Because I intended to provide practical insights for end-user researchers with insufficient methodological expertise, I employed the default settings and functionalities without setting any special parameters during the exploration process. I conducted stepwise regression (the default R function step), BMA (bicreg in the BMA package), and LASSO (cv.glmnet in the glmnet package) with the training dataset [27,35,39,40]. I extracted the estimated coefficients of the candidate predictors from the regression results. All estimated coefficients from each method were stored in matrices for cross-validation. Then, I performed cross-validation with the coefficients estimated during the training process [57]. I calculated the root mean squared error (RMSE) by comparing the estimated and actual dependent variable values [59].
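To illustrate the procedure described above, the following sketch shows one 50/50 cross-validation trial with the three default functionalities. It is a simplified illustration under assumed objects (a numeric data frame dat with outcome y), not the actual analysis code of this study, which is available in the OSF repository listed in the Data Availability Statement.

```r
# Illustrative sketch of one 50/50 cross-validation trial with the three methods
# (hypothetical numeric data frame `dat` with outcome `y`; default settings only).
library(BMA)
library(glmnet)

rmse <- function(pred, actual) sqrt(mean((pred - actual)^2))

set.seed(1)
idx   <- sample(nrow(dat), size = floor(nrow(dat) / 2))
train <- dat[idx, ]
test  <- dat[-idx, ]

# 1. Stepwise regression (default step(): both directions, AIC criterion)
fit_step  <- step(lm(y ~ ., data = train), trace = 0)
pred_step <- predict(fit_step, newdata = test)

# 2. Bayesian Model Averaging: predictions from the averaged (posterior-mean) coefficients
x_train <- as.matrix(train[, setdiff(names(train), "y")])
x_test  <- as.matrix(test[, setdiff(names(test), "y")])
fit_bma  <- bicreg(x = x_train, y = train$y)
pred_bma <- cbind(1, x_test) %*% fit_bma$postmean  # intercept + averaged coefficients

# 3. LASSO via cv.glmnet (alpha = 1 by default; lambda chosen by internal CV)
fit_lasso  <- cv.glmnet(x = x_train, y = train$y)
pred_lasso <- predict(fit_lasso, newx = x_test, s = "lambda.min")

# Cross-validation RMSE for each method. Repeating this trial over many random
# splits (e.g., 1000) and comparing the RMSE distributions, e.g., with
# BayesFactor::ttestBF(), mirrors the procedure described in the text.
c(step  = rmse(pred_step, test$y),
  bma   = rmse(as.numeric(pred_bma), test$y),
  lasso = rmse(as.numeric(pred_lasso), test$y))
```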
I deemed the exploration method demonstrating the smallest RMSE value to be the one with the best cross-validation performance [57,60], and thus the least susceptible to overfitting and the best exploration method in the present study. The estimated dependent variable values were calculated with the stored coefficients and the predictor values from the cross-validation dataset. I repeated this cross-validation procedure 1000 times. For each repetition, the random separation into training vs. cross-validation datasets was performed anew to obtain repeated performance measures. The whole training and cross-validation test process was performed for all three datasets.

I conducted statistical analyses of the performance data collected from the repeated cross-validation tests. I examined whether one method demonstrated a significantly smaller RMSE than another via the Bayesian t-test implemented in the BayesFactor package [61]. Consequently, three sets of Bayesian t-tests, i.e., BMA vs. LASSO, BMA vs. stepwise regression, and LASSO vs. stepwise regression, were performed for each dataset. I intentionally employed the Bayesian t-test instead of its conventional frequentist counterpart because Bayesian analysis is robust against inflated false positives when multiple comparisons are performed, even without any correction [34,62]. I used Bayes Factors (BFs), indicating the extent to which the alternative hypothesis was more strongly supported by evidence than the null hypothesis, for inference [63,64]. When 2log(BF) was 3 or higher [34,63,65], I concluded that there was a significant difference that was positively supported by the data.

I also conducted additional cross-validation tests with the first dataset (including the purpose and moral psychology indicators) to examine more closely how changes in the ratio of observations to the number of candidate predictors affect the performance of the methods. For this, I analyzed the first dataset with four different sizes of subsets, i.e., n = 100, 200, 400, and 800. For these four conditions, the ratio values were 8.33 (100 responses × 50%/6 candidate predictors; only 50% of the subset was used for training), 16.67 (200 responses × 50%/6 candidate predictors), 33.33 (400 responses × 50%/6 candidate predictors), and 66.67 (800 responses × 50%/6 candidate predictors), respectively. The four different subset sizes were determined to make the ratio values range from below 10 to above 50 [46]. I applied the same procedures for the training and cross-validity check used for the whole dataset tests to these randomly selected subsets. For example, in the case of the n = 100 condition, 100 out of 1085 responses were randomly selected for each trial. Then, 50 were used for training and 50 were reserved for cross-validation. Finally, I calculated RMSE values with the cross-validation subset based on the coefficients estimated with the training subset. I repeated the abovementioned process 1000 times for each condition. Once all RMSEs were calculated, I performed Bayesian t-tests to compare RMSEs between the three model exploration methods for each n. I examined how the performance trend changed across the different subset sizes, from 100 to 800, resulting in varied ratios of observations to the number of candidate predictors, from 8.33 to 66.67.

3. Results

3.1. First Dataset: Purpose and Moral Psychological Indicators

I reported the overall performance outcomes when the first dataset was examined in Table 1 (see CPS [full]) and Figure S1.
For additional information, I also reported the effect sizes for this and subsequent tests as Cohen's d. LASSO demonstrated the best performance (the smallest RMSE) compared to all other methods. Interestingly, BMA was not superior to stepwise regression in this condition.

Table 1. Results from data exploration method performance analyses.

Dataset       | BMA vs. LASSO        | BMA vs. Stepwise     | LASSO vs. Stepwise
              | 2log(BF)   Cohen's d | 2log(BF)   Cohen's d | 2log(BF)   Cohen's d
CPS (full)    | 1032.83    1.35      | 566.63     0.88      | 229.50     −0.52
GACS          | 16.67      0.15      | 380.59     −0.69     | 460.33     −0.77
Trust         | 29.76      0.19      | 94.39      −0.33     | 130.69     −0.38
CPS (n = 100) | 37.96      −0.21     | 332.11     −0.64     | 46.07      −0.23
CPS (n = 200) | 473.10     0.79      | 38.38      0.21      | 245.62     −0.54
CPS (n = 400) | 721.52     1.04      | 61.59      0.27      | 426.50     −0.74
CPS (n = 800) | 713.33     1.03      | 239.52     0.53      | 278.34     −0.58

3.2. Second Dataset: Character Strengths and Moral Reasoning

The results from the tests of the second dataset are presented in Table 1 (see GACS) and Figure S2. LASSO performed significantly better than BMA and stepwise regression. BMA also showed a significantly smaller RMSE than stepwise regression.

3.3. Third Dataset: Trust and COVID-19 Vaccine Intent

The results from the statistical analysis of the data exploration performance with the third dataset are presented in Table 1 (see Trust) and Figure S3. LASSO showed the best performance, BMA the second best, and stepwise regression the third.

3.4. Performance Trends across Different Sample Sizes

I examined how the performance varied as the ratio of observations to the number of candidate predictors changed with the first dataset. The results from the performance analyses when n = 100, 200, 400, and 800 are presented in the last four rows of Table 1 and in Figure S4. When n ≥ 200, the ratio was 16.67 or larger; as when the whole dataset was analyzed, LASSO demonstrated the best performance, while the performance of BMA was inferior to that of stepwise regression. However, when the sample size and the ratio were smallest (n = 100), BMA outperformed stepwise regression. The superiority of BMA to stepwise regression was only observed when n = 100.

4. Discussion

In this study, I examined the performance of data-driven model exploration methods, i.e., stepwise regression, BMA, and LASSO, with cross-validation. I also investigated how the ratio of observations to the number of predictors influenced the performance of each method. When the three datasets with different ratio values were tested, LASSO, based on regularized regression, demonstrated the best performance in all cases. BMA outperformed stepwise regression when the ratio values were low (the first dataset with n = 100, and the second and third datasets). When the first dataset with the relatively high ratio value (90.42) was analyzed, BMA demonstrated worse performance than stepwise regression, except when n = 100 (ratio = 8.33). Generally, these findings support my prediction that novel approaches for data-driven model exploration, i.e., BMA and regularized regression (LASSO in this study), would perform better than stepwise regression in terms of better cross-validity and robustness against overfitting. Among them, I found that LASSO performed best in all cases, even when there were many observations (the first dataset). Interestingly, BMA did not perform better than stepwise regression when the ratio of observations to the number of candidate predictors was large.

The first noteworthy point is that LASSO, a form of regularized regression, demonstrated the best performance in all cases.
This might be due to the nature and mechanism of cv.glmnet, which I used in the present study for LASSO [6,66]. When one runs cv.glmnet with the default options, this function in the glmnet package searches for the regression parameters that maximize cross-validation accuracy [39,40,67]. Consequently, the resultant coefficients become shrunken, while unnecessarily large coefficients are penalized, due to this behavior [5,6]. This might result in better cross-validation performance and robustness against overfitting compared with other methods that do not directly target optimizing cross-validity; BMA and stepwise regression do not aim to minimize cross-validation RMSE. Such a difference in the primary purpose of the functionality might contribute to the superior performance of LASSO over the others when cross-validity and overfitting are the main concerns. Hence, if a researcher's primary purpose is finding a prediction model that is most robust against overfitting and best explains and predicts phenomena outside of the collected data, LASSO may be the first candidate to be considered [37].

Despite the methodological benefits of regularized regression, as mentioned in the introduction, there is a significant practical limitation that end users may need to consider. The method was the most robust against overfitting, so it seems the most appropriate method for model exploration and predictor selection from the methodological perspective. However, it does not present any additional useful information for inference, such as p-values, that many psychological and educational researchers might be interested in [20,37]. Hence, it would be difficult to interpret the findings, particularly when researchers are interested in their statistical implications (e.g., which predictor is statistically significant, etc.). One possible solution might be for researchers to provide in-depth interpretations based on domain knowledge during discussion instead of merely relying on quantitative interpretations, particularly binary decision-making (e.g., p < 0.05) [68,69]. This might require additional labor during studies. However, given that many statisticians have raised concerns about using simple thresholds for statistical inference, this will be a more fundamental solution to deal with overfitting and to improve the quality of result interpretation during data-driven analysis [2,70–72].

BMA generally demonstrated good performance when the ratio of observations to the number of candidate predictors was low, particularly when it did not satisfy the most lenient cutoff, 10 [9,46,73]. In other cases, particularly when the number of candidate predictors was small (6, the first dataset), it did not demonstrate performance superior to stepwise regression. Interestingly, when the number of candidate predictors was high (24, the second dataset), BMA performed better than stepwise regression even though the ratio value, 22.92, was higher than the lenient cutoff. Given these results, BMA performs well when the collected data size is small relative to the number of candidate predictors, and this effect increases as the number of candidate predictors increases. This is consistent with what the BMA developers argued: BMA is good at addressing model uncertainty when the abovementioned ratio is small [9,23]. Because BMA does not directly aim to optimize cross-validity as regularized regression does, it is plausible that its cross-validation performance was worse than that of regularized regression.
Even if that is the case, BMA has a unique benefit compared with regularized regression for applied, not methodology, researchers. Because its results provide more information for inference, such as the posterior probabilities of candidate models and predictors, researchers interested in inference and explanation will find it useful [9,20]. Unlike LASSO and regularized regression, which do not usually provide such information, BMA indicates which models or predictors are more likely to be included. Compared with conventional frequentist indicators, such as p-values, the Bayesian indicators calculated by BMA are epistemologically superior because they directly indicate whether an alternative hypothesis, not a null hypothesis, is more strongly supported by evidence. Of course, the results from BMA are less straightforward to interpret than those from stepwise regression because it averages multiple models [20]. However, if researchers can understand and explain the Bayesian output indicators appropriately, BMA might be a good alternative when many candidate predictors are in a small dataset (e.g., when the ratio is low, <10, or when the number of candidate predictors is large, such as 24 in the second dataset).

Stepwise regression demonstrated acceptable performance only under limited circumstances, when there were plenty of observations in the analyzed dataset. Its performance was superior to BMA only when the number of candidate predictors was small. It never outperformed regularized regression. This result suggests that stepwise regression, which has been regarded as a traditional approach for model and variable selection, might not be optimal in many cases. If researchers intend to explore the best models and predictors among candidate predictors that are not specified a priori, either regularized regression or BMA should be considered. Selecting a single model from many candidates might seem attractive because the result is straightforward to understand. However, due to model uncertainty and the possibility of inflated false positives as discussed in the introduction, the model suggested by stepwise regression can be misleading [9,23,29,73]. Hence, instead of relying on conventional methods, including the widely used stepwise regression, researchers may start with the abovementioned methods appropriate for data exploration. Then, once a potential model is identified, they can employ hypothesis-focused analysis to acquire more information for inference [74].

I will also consider how data-driven methods are related to efforts to examine the relative importance of candidate predictors. As I mentioned in the introduction, data-driven analysis and analysis of relative importance (e.g., dominance analysis and relative weight analysis) pursue different goals [19]. However, I assumed that results from data-driven analysis may inform psychological and educational researchers who are interested in investigating the extent to which predictors of interest contribute to predicting the outcome variable of interest. At the least, the outcomes of data-driven exploration, the suggested best model and candidate predictors, can indirectly inform such researchers. We can expect that variables that survived the exploration process are deemed more important than those that did not. There are some caveats regarding the potential usefulness of data-driven methods in relative importance examination. Obviously, stepwise regression is likely to be the least informative.
It only suggests one candidate model and is susceptible to model uncertainty [9,29]. Also, it does not provide any additional information other than the suggested model. Thus, researchers cannot obtain more direct information about the relative importance of each surviving predictor. The same issue regarding the lack of direct importance indicators also applies to regularized regression [75]. BMA might be the best method among the three candidates for this purpose. In fact, Shou and Smithson [19] suggested that BMA can provide useful information about relative importance when compared with dominance analysis. Perhaps the posterior probabilities of candidate predictors, which concern the extent to which the predictors are supposed to be included in prediction models, can be used as more direct indicators of the relative importance of the predictors.

However, there are several issues that we need to consider while utilizing BMA as a method for relative importance analysis. Although BMA and dominance analysis behave similarly in most cases, as mentioned previously, they pursue two different goals in the first place: model averaging (or exploration) and testing relative importance [19]. Thus, BMA's relative superiority in examining relative importance should be considered a collateral benefit. Also, the consistency between BMA and dominance analysis becomes lower as multicollinearity increases [19]. If multicollinearity becomes a serious concern, then researchers may need to examine the multicollinearity issue before conducting BMA when relative importance is also of interest. Despite this issue when multicollinearity is severe, I would like to note that one positive aspect of BMA related to multicollinearity is that it can potentially (and perhaps partially) provide one way to address the issue [76]. Future studies should be conducted to investigate the benefits of BMA in relative importance analysis more accurately.

Although I could test the performance of different model exploration methods and discuss practical implications based on evidence in this study, several limitations warrant additional research. First, I did not test analysis methods addressing between-group differences, such as multilevel modeling, to examine data collected from different groups. Given that the employed R functionalities widely available to end users only allowed testing within-group population-level predictors, I could not address data-driven methods for multilevel modeling (e.g., explore.models [7]). Researchers interested in examining cross-cultural or multi-group differences with advanced statistical knowledge may need to test data-driven methods considering between-group effects in future studies. Second, because I was primarily interested in practical considerations for end users likely to use the functionalities with default settings, I did not examine the performance when customized settings were applied. How data-driven functions behave when different settings are employed was outside the scope of this study. Since fine-tuning settings and parameters for optimization can significantly influence analysis outcomes, such a possibility needs to be tested and discussed in future research.

5. Concluding Remarks

In this paper, I tried to suggest several practical guidelines about how to employ data-driven model exploration methods by examining the widely available methods of stepwise regression, BMA, and LASSO.
Researchers in psychology and education may refer to these suggestions to conduct more accurate data-driven analyses depending on the nature of the data to be examined. To provide additional practical information, future work should be done in the field. Perhaps, as I mentioned while discussing limitations, data collected from different groups may need to be further tested with data-driven methods to inform cross-cultural researchers. Furthermore, researchers may consider adjusting options for the R functionalities and then testing and comparing the outcomes to examine how to maximize the performance of each method. With the findings from the proposed future studies, practical researchers will obtain more useful insights into how to utilize the data-driven model exploration methods in an optimal way based on their research purposes and situational factors.

Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/stats7030044/s1, Additional Information about Datasets; Additional Information about Data-driven Methods; Figure S1: Performance (RMSE) test with the first dataset; Figure S2: Performance (RMSE) test with the second dataset; Figure S3: Performance (RMSE) test with the third dataset; Figure S4: Performance (RMSE) trends when the ratio of observations to the number of candidate predictors varied. Reference [77] is cited in the Supplementary Materials.

Author Contributions: Conceptualization, H.H.; methodology, H.H.; software, H.H.; formal analysis, H.H.; writing—original draft preparation, H.H.; writing—review and editing, H.H. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable, given that only secondary datasets openly available to the public were used in this study.

Informed Consent Statement: Not applicable, given that only secondary datasets openly available to the public were used in this study.

Data Availability Statement: All data and source files to run the tests conducted in this study are openly available to the public via the Open Science Framework repository: https://doi.org/10.17605/OSF.IO/ZSYTG (accessed on 11 July 2024).

Conflicts of Interest: The author declares no conflicts of interest.

References

1. Jack, R.E.; Crivelli, C.; Wheatley, T. Data-Driven Methods to Diversify Knowledge of Human Psychology. Trends Cogn. Sci. 2018, 22, 1–5.
2. Wagenmakers, E.-J. A Practical Solution to the Pervasive Problems of p Values. Psychon. Bull. Rev. 2007, 14, 779–804.
3. Wagenmakers, E.-J.; Marsman, M.; Jamil, T.; Ly, A.; Verhagen, J.; Love, J.; Selker, R.; Gronau, Q.F.; Šmíra, M.; Epskamp, S.; et al. Bayesian Inference for Psychology. Part I: Theoretical Advantages and Practical Ramifications. Psychon. Bull. Rev. 2018, 25, 35–57.
4. Weston, S.J.; Ritchie, S.J.; Rohrer, J.M.; Przybylski, A.K. Recommendations for Increasing the Transparency of Analysis of Preexisting Data Sets. Adv. Methods Pract. Psychol. Sci. 2019, 2, 214–227.
5. McNeish, D.M. Using Lasso for Predictor Selection and to Assuage Overfitting: A Method Long Overlooked in Behavioral Sciences. Multivar. Behav. Res. 2015, 50, 471–484.
6. Tibshirani, R. Regression Shrinkage and Selection via the Lasso. J. R. Stat. Soc. Ser. B (Methodol.) 1996, 58, 267–288.
7. Han, H. A Method to Explore the Best Mixed-Effects Model in a Data-Driven Manner with Multiprocessing: Applications in Public Health Research. EJIHPE 2024, 14, 1338–1350.
8. Han, H.; Dawson, K.J. Applying Elastic-Net Regression to Identify the Best Models Predicting Changes in Civic Purpose during the Emerging Adulthood. J. Adolesc. 2021, 93, 20–27.
9. Hoeting, J.A.; Madigan, D.; Raftery, A.E.; Volinsky, C.T. Bayesian Model Averaging: A Tutorial. Stat. Sci. 1999, 14, 382–401.
10. Lu, M.; Zhou, J.; Naylor, C.; Kirkpatrick, B.D.; Haque, R.; Petri, W.A.; Ma, J.Z. Application of Penalized Linear Regression Methods to the Selection of Environmental Enteropathy Biomarkers. Biomark. Res. 2017, 5, 9.
11. Feher, B.; Lettner, S.; Heinze, G.; Karg, F.; Ulm, C.; Gruber, R.; Kuchler, U. An Advanced Prediction Model for Postoperative Complications and Early Implant Failure. Clin. Oral Implants Res. 2020, 31, 928–935.
12. Babyak, M.A. What You See May Not Be What You Get: A Brief, Nontechnical Introduction to Overfitting in Regression-Type Models. Psychosom. Med. 2004, 66, 411–421.
13. Ng, A.Y. Preventing "Overfitting" of Cross-Validation Data. In Proceedings of the Machine Learning: Fourteenth International Conference (ICML 97), Nashville, TN, USA, 8–12 July 1997.
14. Johnson, J.W.; LeBreton, J.M. History and use of relative importance indices in organizational research. Organ. Res. Methods 2004, 7, 238–257.
15. Kruskal, W.; Majors, R. Concepts of relative importance in recent scientific literature. Am. Stat. 1989, 43, 2–6.
16. Budescu, D.V.; Azen, R. Beyond global measures of relative importance: Some insights from dominance analysis. Organ. Res. Methods 2004, 7, 341–350.
17. Lipovetsky, S.; Conklin, W.M. Predictor relative importance and matching regression parameters. J. Appl. Stat. 2015, 42, 1017–1031.
18. Johnson, J.W. A heuristic method for estimating the relative weight of predictor variables in multiple regression. Multivar. Behav. Res. 2000, 35, 1–19.
19. Shou, Y.; Smithson, M. Evaluating predictors of dispersion: A comparison of dominance analysis and Bayesian model averaging. Psychometrika 2015, 80, 236–256.
20. Han, H.; Dawson, K.J.; Walker, D.I.; Nguyen, N.; Choi, Y.-J. Exploring the Association between Character Strengths and Moral Functioning. Ethics Behav. 2022, 33, 286–303.
21. Galasso, V.; Pons, V.; Profeta, P.; Becher, M.; Brouard, S.; Foucault, M. Gender Differences in COVID-19 Attitudes and Behavior: Panel Evidence from Eight Countries. Proc. Natl. Acad. Sci. USA 2020, 117, 27285–27291.
22. Han, H.; Dawson, K.J. Improved Model Exploration for the Relationship between Moral Foundations and Moral Judgment Development Using Bayesian Model Averaging. J. Moral Educ. 2022, 51, 204–218.
23. Raftery, A.E.; Zheng, Y. Discussion: Performance of Bayesian Model Averaging. J. Am. Stat. Assoc. 2003, 98, 931–938.
24. Brown, D.L. Faculty Ratings and Student Grades: A University-Wide Multiple Regression Analysis. J. Educ. Psychol. 1976, 68, 573–578.
25. Henderson, D.A.; Denison, D.R. Stepwise Regression in Social and Psychological Research. Psychol. Rep. 1989, 64, 251–257.
26. Ghani, I.M.M.; Ahmad, S. Stepwise Multiple Regression Method to Forecast Fish Landing. Procedia-Soc. Behav. Sci. 2010, 8, 549–554.
27. DataCamp. Step: Choose a Model by AIC in a Stepwise Algorithm. 2024. Available online: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/step (accessed on 11 July 2024).
28. Clyde, M. Model Uncertainty and Health Effect Studies for Particulate Matter. Environmetrics 2000, 11, 745–763.
29. George, E.I.; Clyde, M. Model Uncertainty. Stat. Sci. 2004, 19, 81–94.
30. Hawkins, D.M. The Problem of Overfitting. J. Chem. Inf. Comput. Sci. 2004, 44, 1–12.
31. Kumar, S.; Attri, S.D.; Singh, K.K. Comparison of Lasso and Stepwise Regression Technique for Wheat Yield Prediction. J. Agrometeorol. 2021, 21, 188–192.
32. Raftery, A.E.; Madigan, D.; Hoeting, J.A. Bayesian Model Averaging for Linear Regression Models. J. Am. Stat. Assoc. 1997, 92, 179–191.
33. Raftery, A.E.; Hoeting, J.A.; Volinsky, C.T.; Painter, I.; Yeung, K.Y. Package "BMA". Available online: https://cran.r-project.org/web/packages/BMA/BMA.pdf (accessed on 11 July 2024).
34. Han, H. A Method to Adjust a Prior Distribution in Bayesian Second-Level fMRI Analysis. PeerJ 2021, 9, e10861.
35. Raftery, A.E.; Painter, I.S.; Volinsky, C.T. BMA: An R Package for Bayesian Model Averaging. Newsl. R Proj. 2005, 5, 2–8.
36. Hinne, M.; Gronau, Q.F.; Van den Bergh, D.; Wagenmakers, E.-J. A Conceptual Introduction to Bayesian Model Averaging. Adv. Methods Pract. Psychol. Sci. 2020, 3, 200–215.
37. Yarkoni, T.; Westfall, J. Choosing Prediction Over Explanation in Psychology: Lessons From Machine Learning. Perspect. Psychol. Sci. 2017, 12, 1100–1122.
38. Zou, H.; Hastie, T. Regularization and Variable Selection via the Elastic Net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 301–320.
39. Friedman, J.; Hastie, T.; Tibshirani, R.; Narasimhan, B.; Tay, K.; Simon, N.; Qian, J. Glmnet: Lasso and Elastic-Net Regularized Generalized Linear Models. Available online: https://cran.r-project.org/web/packages/glmnet/index.html (accessed on 11 July 2024).
40. Hastie, T.; Qian, J. Glmnet Vignette. Available online: https://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html (accessed on 11 July 2024).
41. Kim, M.-H.; Banerjee, S.; Park, S.M.; Pathak, J. Improving Risk Prediction for Depression via Elastic Net Regression-Results from Korea National Health Insurance Services Data. In AMIA Annual Symposium Proceedings; AMIA Symposium: San Francisco, CA, USA, 2016; Volume 2016, pp. 1860–1869.
42. Finch, W.H.; Hernandez Finch, M.E. Regularization Methods for Fitting Linear Models with Small Sample Sizes: Fitting the Lasso Estimator Using R. Pract. Assess. Res. Eval. 2019, 21, 7.
43. Doebler, P.; Doebler, A.; Buczak, P.; Groll, A. Interactions of Scores Derived from Two Groups of Variables: Alternating Lasso Regularization Avoids Overfitting and Finds Interpretable Scores. Psychol. Methods 2023, 28, 422–437.
44. Fei, S.; Chen, Z.; Li, L.; Ma, Y.; Xiao, Y. Bayesian Model Averaging to Improve the Yield Prediction in Wheat Breeding Trials. Agric. For. Meteorol. 2023, 328, 109237.
45. Wang, D.; Zhang, W.; Bakhai, A. Comparison of Bayesian Model Averaging and Stepwise Methods for Model Selection in Logistic Regression. Stat. Med. 2004, 23, 3451–3467.
46. Heinze, G.; Dunkler, D. Five Myths about Variable Selection. Transpl. Int. 2017, 30, 6–10.
47. Han, H. Exploring the Relationship between Purpose and Moral Psychological Indicators. Ethics Behav. 2022, 34, 28–39.
48. Davis, M.H. Measuring Individual Differences in Empathy: Evidence for a Multidimensional Approach. J. Personal. Soc. Psychol. 1983, 44, 113–126.
49. Aquino, K.; Reed, A. The Self-Importance of Moral Identity. J. Personal. Soc. Psychol. 2002, 83, 1423–1440.
50. Choi, Y.-J.; Han, H.; Dawson, K.J.; Thoma, S.J.; Glenn, A.L. Measuring Moral Reasoning Using Moral Dilemmas: Evaluating Reliability, Validity, and Differential Item Functioning of the Behavioural Defining Issues Test (bDIT). Eur. J. Dev. Psychol. 2019, 16, 622–631.
51. Han, H.; Dawson, K.J.; Choi, Y.R.; Choi, Y.-J.; Glenn, A.L. Development and Validation of the English Version of the Moral Growth Mindset Measure [Version 3; Peer Review: 4 Approved]. F1000Research 2020, 9, 256.
52. Bronk, K.C.; Riches, B.R.; Mangan, S.A. Claremont Purpose Scale: A Measure That Assesses the Three Dimensions of Purpose among Adolescents. Res. Hum. Dev. 2018, 15, 101–117.
53. McGrath, R.E. A Summary of Construct Validity Evidence for Two Measures of Character Strengths. J. Personal. Assess. 2023, 105, 302–313.
54. Blackburn, A.M.; Vestergren, S.; the COVIDiSTRESS II Consortium. COVIDiSTRESS Diverse Dataset on Psychological and Behavioural Outcomes One Year into the COVID-19 Pandemic. Sci. Data 2022, 9, 331.
55. Han, H. Trust in the Scientific Research Community Predicts Intent to Comply with COVID-19 Prevention Measures: An Analysis of a Large-Scale International Survey Dataset. Epidemiol. Infect. 2022, 150, e36.
56. Han, H. Testing the Validity of the Modified Vaccine Attitude Question Battery across 22 Languages with a Large-Scale International Survey Dataset: Within the Context of COVID-19 Vaccination. Hum. Vaccines Immunother. 2022, 18, 2024066.
57. De Rooij, M.; Weeda, W. Cross-Validation: A Method Every Psychologist Should Know. Adv. Methods Pract. Psychol. Sci. 2020, 3, 248–263.
58. Bengio, Y.; Grandvalet, Y. No Unbiased Estimator of the Variance of K-Fold Cross-Validation. Adv. Neural Inf. Process. Syst. 2003, 16, 513–520.
59. Tuarob, S.; Tucker, C.S.; Kumara, S.; Giles, C.L.; Pincus, A.L.; Conroy, D.E.; Ram, N. How Are You Feeling?: A Personalized Methodology for Predicting Mental States from Temporally Observable Physical and Behavioral Information. J. Biomed. Inform. 2017, 68, 1–19.
60. Lorenz, E.; Remund, J.; Müller, S.C.; Traunmüller, W.; Steinmaurer, G.; Pozo, D.; Ruiz-Arias, J.A.; Fanego, V.L.; Ramirez, L.; Romeo, M.G.; et al. Benchmarking of Different Approaches to Forecast Solar Irradiance. In Proceedings of the 24th European Photovoltaic Solar Energy Conference, Hamburg, Germany, 21–25 September 2009; pp. 21–25.
61. Morey, R.D.; Rouder, J.N.; Jamil, T.; Urbanek, K.; Ly, A. Package "BayesFactor". Available online: https://cran.r-project.org/web/packages/BayesFactor/BayesFactor.pdf (accessed on 11 July 2024).
62. Berry, D.A.; Hochberg, Y. Bayesian Perspectives on Multiple Comparisons. J. Stat. Plan. Inference 1999, 82, 215–227.
63. Wagenmakers, E.-J.; Love, J.; Marsman, M.; Jamil, T.; Ly, A.; Verhagen, J.; Selker, R.; Gronau, Q.F.; Dropmann, D.; Boutin, B.; et al. Bayesian Inference for Psychology. Part II: Example Applications with JASP. Psychon. Bull. Rev. 2018, 25, 58–76.
64. Meskó, N.; Kowal, M.; Láng, A.; Kocsor, F.; Bandi, S.A.; Putz, A.; Sorokowski, P.; Frederick, D.A.; García, F.E.; Aguilar, L.A.; et al. Exploring Attitudes Toward "Sugar Relationships" across 87 Countries: A Global Perspective on Exchanges of Resources for Sex and Companionship. Arch. Sex. Behav. 2024, 53, 811–837.
65. Kass, R.E.; Raftery, A.E. Bayes Factors. J. Am. Stat. Assoc. 1995, 90, 773–795.
66. Ahmed, S.E.; Hossain, S.; Doksum, K.A. LASSO and Shrinkage Estimation in Weibull Censored Regression Models. J. Stat. Plan. Inference 2012, 142, 1273–1284.
67. Scaliti, E.; Pullar, K.; Borghini, G.; Cavallo, A.; Panzeri, S.; Becchio, C. Kinematic Priming of Action Predictions. Curr. Biol. 2023, 33, 2717–2727.e6.
68. Štěrba, Z.; Šašinka, Č.; Stachoň, Z.; Kubíček, P.; Tamm, S. Mixed Research Design in Cartography: A Combination of Qualitative and Quantitative Approaches. Kartographische Nachrichten 2014, 64, 262–269.
69. Conn, V.S.; Chan, K.C.; Cooper, P.S. The Problem With p. West. J. Nurs. Res. 2014, 36, 291–293.
70. Berger, J.O.; Sellke, T. Testing a Point Null Hypothesis: The Irreconcilability of P Values and Evidence. J. Am. Stat. Assoc. 1987, 82, 112–122.
71. Wasserstein, R.L.; Lazar, N.A. The ASA's Statement on p-Values: Context, Process and Purpose. Am. Stat. 2016, 70, 129–133.
72. Cohen, J. The Earth Is Round (p < 0.05). Am. Psychol. 1994, 49, 997–1003.
73. Raftery, A.E. Bayesian Model Selection in Social Research. Sociol. Methodol. 1995, 25, 111.
74. Dreisbach, C.; Maki, K. A Comparison of Hypothesis-Driven and Data-Driven Research: A Case Study in Multimodal Data Science in Gut-Brain Axis Research. CIN Comput. Inform. Nurs. 2023, 41, 497–506.
75. Mizumoto, A. Calculating the relative importance of multiple regression predictor variables using dominance analysis and random forests. Lang. Learn. 2023, 73, 161–196.
76. Lee, Y.; Song, J. Robustness of model averaging methods for the violation of standard linear regression assumptions. Commun. Stat. Appl. Methods 2021, 28, 189–204.
77. Fragoso, T.M.; Bertoli, W.; Louzada, F. Bayesian model averaging: A systematic review and conceptual classification. Int. Stat. Rev. 2018, 86, 1–28.

Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.