BAYESIAN MODEL AVERAGING

Improved Model Exploration for the Relationship between Moral Foundations and Moral Judgment Development Using Bayesian Model Averaging

Hyemin Han
Kelsie J. Dawson
University of Alabama

Author Note

Hyemin Han and Kelsie J. Dawson, Educational Psychology Program, University of Alabama. Correspondence concerning this article should be addressed to Hyemin Han, Educational Psychology Program, University of Alabama, Tuscaloosa, AL 35487. Contact: hyemin.han@ua.edu. The authors report no conflicting interests.

Abstract

Although some previous studies have investigated the relationship between moral foundations and moral judgment development, the methods they used could not fully explore that relationship. In the present study, we used Bayesian Model Averaging (BMA) to address the limitations of the traditional regression methods used previously. Consistent with previous findings, results showed that binding foundations were negatively correlated with post-conventional moral reasoning and positively correlated with the maintaining norms and personal interest schemas. Going beyond previous studies, our results also showed a positive correlation between individualizing foundations and post-conventional moral reasoning. We discuss the implications and provide a detailed explanation of the BMA method so that others in the field of moral education can use it in their own studies.
Keywords: Bayesian Statistics; Bayesian Model Averaging; Moral Foundations; Moral Judgment Development

Introduction

In the present study, we explored the relationship between moral foundations and moral judgment development with the Bayesian Model Averaging (BMA) method, which allows the comparison of multiple candidate models. The theoretical frameworks of our study are Moral Foundations Theory (MFT), which proposes five different moral foundations, and the Neo-Kohlbergian theory, which presents three different moral schemas. Instead of testing a specific regression model based on a specific hypothesis, we employed the BMA method to better conduct data-driven exploration. We explored BMA-identified models that predicted the three individual schemas of moral reasoning from moral foundations as independent variables and evaluated the performance of those models. At the same time, we introduce a novel data-exploration method to the community of moral education researchers to improve research methodology in the field.

Moral Foundations and Moral Judgment Development

The MFT argues that the basis of moral judgment and moral decision-making is not unidimensional but multidimensional and diverse (Graham et al., 2011). In their original research, MFT researchers proposed five different foundations of morality: harm and care (HC; avoiding potential harms to others), fairness and reciprocity (FR; pursuing justice in sharing), ingroup and loyalty (IL; taking care of one's own group, family, and nation), authority and respect (AR; respecting legitimate authority and convention), and purity and sanctity (PS; avoiding disgusting entities and actions) (Graham et al., 2011; Graham, Haidt, & Nosek, 2009).
People value these five foundations differently when they consider morality and make moral decisions. According to the MFT, people's perspectives, such as political and religious orientations, significantly influence which foundations are highly regarded. For instance, the binding foundations (IL, AR, and PS), which concern binding social and community bonds, are highly valued by conservatives, whereas liberals focus more on the individualizing foundations (HC and FR) (Graham et al., 2009; Haidt & Graham, 2007). These foundations have been found to be associated with the pattern of one's moral judgment and decision-making (e.g., Vaughan, Bell Holleran, & Silver, 2019; Winterich, Zhang, & Mittal, 2012). Beyond moral judgment and decision-making, other domains of moral functioning, such as empathy, have also been found to be associated with moral foundations. For instance, Hannikainen, Hudson, Chopik, Briley, and Derringer (2020) reported that individualizing foundations were positively correlated with empathic concern and perspective taking.

Proponents of the MFT have argued that the cognitive developmental model of moral judgment, i.e., the Kohlbergian and Neo-Kohlbergian theories, cannot explain the diverse nature of morality (Graham et al., 2011). According to these theories, moral judgment development can be explained in terms of the sophistication of post-conventional (PC) moral reasoning (Kohlberg, 1981; Rest, Narvaez, Bebeau, & Thoma, 1999a). Neo-Kohlbergians, who updated the Kohlbergian theory, proposed that moral judgment development is associated with the likelihood of utilizing three schemas in making moral judgments: the personal-interest (PI), maintaining norms (MN), and PC schemas (Rest, Narvaez, Bebeau, & Thoma, 1999b).
A person with well-developed moral reasoning is likely to make judgments based on the PC schema and universal moral principles, and less likely to rely on existing social norms or to prioritize personal interests.

A widely used quantitative tool to measure moral judgment development is the Defining Issues Test (DIT). The DIT quantifies one's likelihood of utilizing each of the PC, MN, and PI schemas in moral dilemma solving. In particular, the PC schema score, the P-score, has been used as an index of the development of PC reasoning (Han, Dawson, Thoma, & Glenn, 2020). The updated version, the DIT-2, provides an additional index, the N2 score, for the overall sophistication of moral reasoning, which represents the extent to which one prefers the PC schema over the MN and PI schemas (Rest, Narvaez, Thoma, & Bebeau, 1999). Prior studies using these tools have shown that the presence of PC moral reasoning is positively associated with moral motivation and behavior (see Thoma (2002, 2006) for reviews).

However, MFT proponents have criticized such theoretical frameworks for focusing only on individualizing foundations, not binding foundations, in explaining the PC schema. They argued that the Kohlbergian and Neo-Kohlbergian theories are biased because the most sophisticated form of moral reasoning, PC reasoning, mainly addresses individualizing values, such as justice and fairness, while valuing other important foundations relatively less (Emler, Renwick, & Malone, 1983; Graham et al., 2011). For instance, conservatives are more likely to receive lower DIT scores, particularly PC scores, than liberals because they highly value binding foundations, not because their moral reasoning is less developed (Graham et al., 2011). Thus, examining the relationship between moral foundations and moral judgment development with empirical data is necessary to evaluate such concerns. Two previous studies have examined this relationship.
Correlational analysis in Baril and Wright's (2012) study (Study 1) showed that the DIT PC score was negatively associated with IL and AR, while the DIT MN score was positively associated with these two foundations; the DIT PI score was positively associated only with IL. In general, they found a significant association between two binding foundations and the DIT schemas. However, they could not find any direct association between moral reasoning and individualizing foundations, although the (individualizing – binding) difference score showed a significant correlation with the PC score. Glover et al. (2014) used structural equation modeling to examine the relationship between two MFQ latent variable scores, the individualizing and binding foundation scores, and the DIT-2 N2 and schema scores. They reported significant associations between DIT-2 scores and binding foundations. However, as in Baril and Wright's (2012) study, individualizing foundations did not show any significant association with DIT-2 scores.

Interestingly, these previous studies consistently reported non-significant associations between individualizing foundations and moral reasoning. PC moral reasoning is related to the consideration of universal moral principles (Kohlberg, 1981), and individualizing foundations, HC in particular, are deemed by several social psychologists to constitute the basis of such principles (Gray & Keeney, 2015; Gray & Schein, 2012; Schein & Gray, 2018). Hence, one would generally expect individualizing foundations, at least HC, to show a meaningful relation with PC reasoning. However, the two previous studies that employed both theoretical frameworks, the MFT and moral judgment development, could not find any direct association between individualizing foundations and PC reasoning. In addition, methodological considerations also warrant further investigation of the relationship between moral foundations and reasoning.
Although the previous studies examined the relationship with quantitative approaches, we employed a novel analysis method to address their methodological limitations. These limitations are elaborated in the following section, which introduces the BMA method as an alternative statistical approach.

Bayesian Model Averaging

To properly explore the relationship between moral foundations and moral judgment development, we employed the BMA method. The previous studies used frequentist analysis methods, which are suitable only for testing specific null hypotheses, not for model exploration. For instance, they used p-values to examine whether the variables of interest were significant. This is not an ideal approach, since p-values can only provide information about null hypotheses, not about the alternative hypotheses that researchers are mainly interested in (Wagenmakers et al., 2017). In addition, when frequentist analysis is applied in regression, only one regression model can be tested (Raftery, 1995). Thus, even if p < .05 is reported, it does not necessarily mean that the tested regression model is the best among the possible alternative models consisting of combinations of the tested independent variables. Although there are several methods to seek the best model from the frequentist perspective, such as stepwise regression, statisticians have warned that such methods are not ideal for this purpose because they cannot properly address several critical issues, such as inflation of false positives and potential multicollinearity (Hammami, Lee, Ouarda, & Lee, 2012; Harrell, 2015; Tibshirani, 1997), as well as overfitting (McNeish, 2015). Given that it is difficult to identify the best model with the traditional method, a predicted regression model is likely to be overfitted to the data used for regression.
Such an overfitted model cannot explain reality well beyond the boundary of the data used. Model selection methods based on a different perspective, the Bayesian perspective, are a possible way to address these issues. In particular, BMA is one of the most widely used Bayesian methods for model exploration (Hoeting, Madigan, Raftery, & Volinsky, 1999).

The BMA process starts by assigning prior probabilities to the candidate predictors. As a default option, the BMA package implemented in R employs 50% as the prior probability, P(H) (Raftery, Hoeting, Volinsky, Painter, & Yeung, 2020); this means that at the beginning, the probability that each candidate predictor is non-zero and included in a regression model is 50%. Then, through iterative observation processes, the prior probabilities are updated with data, and the posterior probabilities, P(H|D), are calculated with Bayes' theorem (Wagenmakers et al., 2018). P(H|D) is the probability that a specific candidate predictor is non-zero given the data (Raftery, 1995). At the end of the iterative updating process, the best model can be identified by comparing the probabilities of alternative models consisting of combinations of candidate predictors (Raftery et al., 2020). As a result, the model with the highest probability, which includes predictors with high P(H|D) values, can be selected as the best model.

The BMA method is better than the traditional regression method for model selection in several respects. First, it allows comparison among multiple alternative candidate models instead of testing only one model (Raftery, 1995). Related to this point, the BMA method allows us to address the issue of model uncertainty (Raftery, Madigan, & Hoeting, 1997).
If we test only a single model with the traditional regression method, we cannot address model uncertainty, that is, the extent to which the tested model is likely to be the true model given the data (George & Clyde, 2004), because no alternative model is tested. The BMA method provides Bayesian posterior probabilities by comparing multiple candidate models (George & Clyde, 2004), so it allows us to examine model uncertainty quantitatively. Second, because the Bayesian method utilizes the posterior probabilities of predictors, which directly address whether the predictors are non-zero (alternative hypotheses), unlike p-values that only address null hypotheses, it is epistemologically better than the traditional method (Han, Park, & Thoma, 2018). Third, in general, BMA tends to penalize complex models with unnecessary predictors, so it is robust against overfitting. Concrete examples have shown that BMA-identified models predict dependent variables beyond the boundary of the data used for regression more accurately than regression models estimated by ordinary linear regression (Hoeting et al., 1999). Thus, we expected BMA to be able to address the aforementioned methodological limitations of the previous studies that examined the relationship between moral foundations and reasoning.

Present Study

In the present study, we explored the best regression models for predicting the bDIT schema scores (PC, MN, and PI scores) from moral foundations. By employing the BMA method, we examined which moral foundation variables should be included in, and excluded from, the best regression models. To control for the effects of political and religious backgrounds, we used political and religious affiliation variables as covariates during the model selection process.
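As an illustration of the kind of model exploration described above, the following Python sketch enumerates all subsets of a small set of candidate predictors and approximates each model's posterior probability with BIC weights, a common large-sample approximation. This is not the R BMA package used in the study; the data and predictor names ("HC", "FR", "IL") here are simulated and purely illustrative.

```python
# Sketch of BMA-style model exploration over simulated data: enumerate all
# predictor subsets, weight each model by exp(-BIC/2), normalize to obtain
# approximate posterior model probabilities, and sum them to obtain each
# predictor's posterior inclusion probability (an analogue of P(H|D)).
import itertools
import math

import numpy as np


def fit_ols(X, y):
    """Ordinary least squares; returns the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def bma_explore(X, y, names):
    """Enumerate all predictor subsets and compare them with BIC weights."""
    n = len(y)
    results = []
    for k in range(len(names) + 1):
        for subset in itertools.combinations(range(len(names)), k):
            Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
            rss = fit_ols(Xs, y)
            # Gaussian log-likelihood up to constants: BIC = n*log(RSS/n) + p*log(n)
            bic = n * math.log(rss / n) + (len(subset) + 1) * math.log(n)
            results.append((subset, bic))
    best_bic = min(b for _, b in results)
    weights = [math.exp(-(b - best_bic) / 2) for _, b in results]
    total = sum(weights)
    post = [w / total for w in weights]
    # Posterior inclusion probability of each predictor: total probability of
    # all models that contain it.
    pip = {nm: sum(p for (s, _), p in zip(results, post) if j in s)
           for j, nm in enumerate(names)}
    (best_subset, _), best_prob = max(zip(results, post), key=lambda t: t[1])
    return pip, [names[j] for j in best_subset], best_prob


rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=300)  # third predictor is noise
pip, best_model, best_prob = bma_explore(X, y, ["HC", "FR", "IL"])
```

With these simulated data, the two predictors that actually generate the outcome receive inclusion probabilities near 1, while the noise predictor's stays low, and the model containing exactly the two real predictors has the highest posterior probability.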
Once the best models were identified by the BMA method, we examined whether the BMA-identified models were statistically better than the full models including all candidate predictors, which resemble the models tested with traditional regression methods in the prior studies. We compared the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), Bayes Factors, and cross-validation performance between the two types of models. Finally, we discuss the meanings and implications of the findings from the BMA process.

Methods

Participants

Participants were college students recruited from a university located in the Southern United States of America. Participants were enrolled in psychology or educational psychology courses, invited to sign up for an online subject pool, and provided with a link to a Qualtrics survey. When they completed the survey, they were offered course credit as compensation. The design of the present study and the consent form were reviewed and approved by the Institutional Review Board.

Data collected from a total of 461 participants (14.32% male, 85.68% female) were analyzed in the present study. Their mean age was 22.50 years (SD = 6.80 years). The composition of the participants' political affiliations was: 44.25% Republican, 27.77% Democrat, 14.53% independent, 3.25% Libertarian, .22% Green Party, and 9.98% other. The participants' religious affiliations were: 24.73% Catholic, 20.61% Evangelical Protestant, 13.02% non-Evangelical Protestant, 10.63% spiritual but not religious, 4.56% Agnostic, 3.25% Atheist, .65% Jewish, .43% Muslim, .22% Hindu, and 21.91% other.

Materials

Behavioral Defining Issues Test

We employed the Behavioral DIT (bDIT), a shortened version of the original DIT that is more suitable for behavioral experiments and online studies, to examine participants' moral judgment development (Han, Dawson, Thoma, et al., 2020).
The bDIT consists of three moral dilemmas (Heinz and the Drug, Escaped Prisoner, and Newspaper), and its overall organization is similar to that of the original DIT. First, participants were presented with each moral dilemma. Then, they were asked to choose a behavioral option. For instance, after the presentation of Heinz and the Drug, participants were asked whether Heinz should steal the drug or not. After making the behavioral choice, participants responded to eight items asking what the most important moral philosophical rationale in the decision-making was. For each of the eight items, three rationale options, one corresponding to each schema, were offered. As three dilemmas were presented, a total of 24 items (8 per dilemma) were administered.

Then, three individual schema scores were calculated: a PC (P-score equivalent), an MN, and a PI score. Each schema score was calculated as follows:

Schema Score = (number of times the corresponding schema option was selected / 24) × 100

For instance, when a participant selected PC options 12 times out of all 24 items, their PC score becomes 12 / 24 × 100 = 50. The resultant schema scores are similar to the schema scores calculated from the original DIT. For example, the bDIT PC score is conceptually equivalent to the P-score from the original DIT (Han, Dawson, Thoma, et al., 2020), which has been used as an indicator of sophisticated moral reasoning (Rest & Narvaez, 1994). Previous studies testing the reliability of the bDIT showed that its internal consistency was at least acceptable (> .60) (Choi, Han, Dawson, Thoma, & Glenn, 2019; Han, Dawson, Thoma, et al., 2020). The bDIT showed good internal consistency in the present study as well (α = .77). Additionally, the internal consistency was at least acceptable for all three schemas: α = .80 for PC, α = .80 for MN, and α = .74 for PI.
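The schema-score calculation described above can be sketched in a few lines of Python. The responses here are hypothetical; the actual scoring code used in the study is available on the authors' OSF page.

```python
# Sketch of the bDIT schema-score computation: each of the 24 items records
# which schema's rationale option the participant selected, and each schema
# score is the percentage of items on which that schema was chosen.
from collections import Counter


def schema_scores(responses):
    """responses: a list of 24 choices, each one of 'PC', 'MN', or 'PI'."""
    if len(responses) != 24:
        raise ValueError("the bDIT presents 24 rationale items (8 per dilemma)")
    counts = Counter(responses)
    return {schema: counts.get(schema, 0) / 24 * 100
            for schema in ("PC", "MN", "PI")}


# A participant who selected the PC option 12 times scores 12 / 24 * 100 = 50,
# matching the worked example in the text.
example = ["PC"] * 12 + ["MN"] * 7 + ["PI"] * 5
scores = schema_scores(example)  # scores["PC"] == 50.0
```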
Moreover, previous studies reported that the PC score calculated from the bDIT and that from the original DIT, the P-score, were very highly correlated with each other (Choi et al., 2019; Han, Dawson, Thoma, et al., 2020). The bDIT PC score also showed good convergent validity in terms of significant correlations with other moral psychological indicators (Han, Dawson, Choi, Choi, & Glenn, 2020), such as moral and general growth mindset (Dweck, 2000; Han, Choi, Dawson, & Jeong, 2018), empathic concern and perspective taking (Davis, 1983), moral internalization (Aquino & Reed II, 2002), and moral disengagement (Moore, Detert, Klebe Trevino, Baker, & Mayer, 2012). Given these findings, the bDIT can serve as a reliable and valid shortened version of the original test in situations where the DIT is difficult to administer, such as experimental and online studies that use multiple measures.

Moral Foundations Questionnaire

We used the Moral Foundations Questionnaire (MFQ) to measure participants' scores on the five moral foundations (Graham et al., 2011). This questionnaire consists of 30 items designed to assess participants' perceived importance of each moral foundation in moral judgment. Each subscale score was calculated by averaging the six items assigned to the subscale. Answers were anchored to a six-point Likert scale. The items asked whether each foundation was relevant to participants (e.g., "whether or not someone suffered emotionally"; 0: not at all relevant to 5: extremely relevant) or whether they agreed with a statement about the foundation (e.g., "compassion for those who are suffering is the most crucial virtue"; 0: strongly disagree to 5: strongly agree). All five subscales showed at least acceptable internal consistency: HC .70, FR .68, IL .73, AR .72, and PS .78.

Procedure

We recruited participants from psychology and educational psychology subject pool systems.
They signed up for our study and received a link to our Qualtrics survey. Once they completed the consent procedure, the bDIT and MFQ were presented in random order. After completing the tests, participants were asked to complete a demographics survey form. Then, they were referred back to the pool system to receive course credit as compensation.

Analysis

We used R to analyze the data. Before starting model exploration, we examined the descriptive statistics of the variables of interest, the bDIT schema and MFQ subscale scores. In addition, we conducted a correlation analysis. Since multiple correlation coefficients were tested, false discovery rate (FDR) correction was performed.

To search for the best model, we employed the BMA method implemented in an R package, BMA. Based on Bayesian statistics, BMA calculates the posterior probability of each candidate model, which indicates the extent to which the model is favored by the data (Raftery et al., 2020). Then, the model with the highest posterior probability was identified as the best model (Hoeting et al., 1999). We performed three BMA processes to search for the best models predicting the three dependent variables: the bDIT PC, MN, and PI scores. The five MFQ subscale scores were entered as candidate predictors. In addition, participants' political and religious affiliations were used as control variables and transformed into dummy variables. Once the BMA process was completed, we identified the model with the highest posterior probability as the best model.

To evaluate the identified models, we compared the BMA-identified and full models on AIC, BIC, Bayes Factors, and cross-validation performance. First, we created two linear regression models, the BMA-identified and full models, with the lm function.
Then, AIC and BIC were calculated; the model with the smaller AIC and BIC values is deemed better than the other (Zucchini, Claeskens, & Nguefack-Tsague, 2016). Second, we employed a Bayes Factor, which indicates which of two models is relatively more probable given the data; for instance, if two models, M1 and M2, are compared, the Bayes Factor BF12 = P(D|M1)/P(D|M2) quantifies the extent to which the evidence supports M1 over M2 (Han, Park, et al., 2018; Wagenmakers et al., 2018). We calculated and compared Bayes Factors for the BMA-identified and full models with the BayesFactor package (Morey, Rouder, Jamil, Urbanek, & Ly, 2018). Following the guidelines suggested by Bayesian statisticians (Kass & Raftery, 1995), we concluded that when BF(BMA/Full) = P(D|M_BMA)/P(D|M_Full) > 3, the evidence positively supports the BMA-identified model against the full model; when BF(BMA/Full) > 6, the evidence strongly supports it; and when BF(BMA/Full) > 10, the evidence very strongly supports it (Kass & Raftery, 1995).

Third, we examined whether the BMA-identified models were more robust against overfitting than the full models using K-fold cross-validation. K-fold cross-validation examines whether a regression model can predict phenomena beyond the boundary of the data used for regression (Han, Lee, & Soylu, 2020). With this method, 1/K of the whole dataset is randomly sampled and used for validation, and regression is performed with the rest of the dataset (1 - 1/K). In our study, we employed 2-fold cross-validation, using 50% of the dataset for validation and the remaining 50% for regression. The validation dataset, which was not used to estimate the coefficients, was entered into the regression model to predict the dependent variable. We used the mean squared error (MSE) on the validation dataset for model evaluation. The MSE in our study was calculated as follows:

MSE = (1/N) Σᵢ (ŷᵢ − yᵢ)²
where N is the size of the validation dataset, ŷᵢ is the predicted value, and yᵢ is the actual value of the dependent variable. When two models' MSEs are compared, the smaller MSE indicates the better model (Browne, 2000). We examined whether the MSE calculated with the validation dataset was smaller for the BMA-identified model than for the full model. We repeated this procedure 10,000 times with randomized fold assignments and performed frequentist and Bayesian t-tests to compare the models predicting the PC, MN, and PI scores. Readers interested in further technical details of the statistical analyses may refer to the Open Science Framework (OSF; https://osf.io/vtrxu/), where all data and source code files are shared.

Results

Descriptive Statistics and Correlation Analysis

The descriptive statistics are presented in Table 1, along with the results of the correlation analysis.

Table 1
Descriptive statistics and correlation analysis results

Variable   M (SD)         bDIT PC  bDIT MN  bDIT PI  MFQ HC  MFQ FR  MFQ IL  MFQ AR
bDIT PC    50.91 (21.44)  -
bDIT MN    29.25 (19.33)  -.49***  -
bDIT PI    18.99 (14.58)  -.27***  -.23***  -
MFQ HC     4.60 (.79)     .16**    -.08     -.14*    -
MFQ FR     4.46 (.72)     .21***   -.15*    -.11     .72***  -
MFQ IL     3.87 (.87)     -.23**   .15*     .15*     .43***  .34***  -
MFQ AR     4.13 (.83)     -.21***  .17**    .05      .48***  .41***  .76***  -
MFQ PS     3.99 (.96)     -.18**   .16**    .06      .45***  .39***  .65***  .73***

Note. *** p < .001, ** p < .01, * p < .05. All p-values were corrected for FDR.

Model Selection with Bayesian Model Averaging

Table 2 presents the best models predicting the bDIT PC, MN, and PI scores that were selected with the BMA method. The best models had the greatest posterior probability values. The estimated coefficients and standard errors of the predictors and covariates included in each best model are also presented in the table.
Table 2
BMA-identified regression models

Variable                      bDIT PC        bDIT MN        bDIT PI
                              B (SE)         B (SE)         B (SE)
Intercept                     35.41 (5.86)   32.25 (5.66)   23.89 (4.08)
MFQs
  MFQ HC                      5.47 (1.63)    -              -4.65 (.91)
  MFQ FR                      7.03 (1.72)    -4.77 (1.22)   -
  MFQ IL                      -6.19 (1.33)   -              4.30 (.83)
  MFQ AR                      -              6.93 (1.10)    -
  MFQ PS                      -7.12 (1.26)   -              -
Covariates
  Independent                 6.85 (2.55)    -              -
  Libertarian                 -              12.12 (4.70)   -
  Non-Evangelical Protestant  10.54 (2.44)   -              -

Note. Estimated coefficients and standard errors are displayed only when the respective predictors were included in the best models. Covariates other than being Independent, Libertarian, and Non-Evangelical Protestant were not selected in any best model. Reference groups: Republican (political affiliation) and Catholic (religious affiliation).

Comparison with the Full Models

We compared the BMA-identified models with the full models, which included all candidate predictors and covariates. First, AIC and BIC were compared between the two types of models. AIC was smaller for the BMA-identified model than for the full model when the bDIT PC (4,007.23 vs. 4,008.40) and PI scores (3,747.25 vs. 3,764.38) were predicted; however, the full model had the smaller AIC in predicting the bDIT MN score (3,987.37 vs. 3,991.13). When BIC was compared, the BMA-identified models had smaller BICs than the full models in all cases: predicting the bDIT PC (4,040.29 vs. 4,095.20), MN (4,011.80 vs. 4,074.17), and PI scores (3,763.78 vs. 3,851.18).

Second, we compared Bayes Factors between the two types of models. In all cases, the BMA-identified models were very strongly supported against the full models by the evidence (BF(BMA/Full) ≥ 10). In predicting the bDIT PC score, the calculated BF(BMA/Full) was 1,639.85; for the bDIT MN score, it was 463.64; and for the bDIT PI score, it was 528,770.60.
Finally, we performed 2-fold cross-validation with 10,000 iterations to test whether the BMA-identified models were more robust against overfitting than the full models. As reported in Table 3, the BMA-identified models showed smaller MSEs in all cases. Both frequentist and Bayesian t-tests consistently indicated that the non-zero differences were very strongly supported by the evidence.

Table 3
Results from 2-fold cross-validation

           BMA model MSE      Full model MSE
           Mean      SD       Mean      SD       t        p        Cohen's d  BF
bDIT PC    18.76     .64      19.10     .69      -35.70   < .001   -.50       2.35E+266
bDIT MN    18.37     .68      18.78     .75      -41.33   < .001   -.58       9.75E+353
bDIT PI    14.10     .47      14.57     .51      -68.10   < .001   -.96       2.42E+903

Note. Means and SDs were computed over 10,000 iterations.

Discussion

In the present study, we explored the best models predicting the bDIT schema scores from the five individual moral foundations using the BMA method. We controlled for the effects of political and religious backgrounds, since these are considered to be associated with one's moral foundations and reasoning. The results showed, first, that the PC schema was positively associated with both individualizing foundations, while two binding foundations, IL and PS, showed negative associations. Second, the MN schema was positively associated with AR but negatively associated with FR. Third, the PI schema score was positively correlated with IL and negatively correlated with HC. The BMA-identified models performed better than the full models, which resemble those employed in frequentist analyses, in terms of information criteria (except AIC in the case of the bDIT MN score), Bayes Factors, and robustness against overfitting (cross-validation performance).

In general, the findings regarding the relationship between binding foundations and moral judgment development from the present study are coherent with what has been proposed in moral psychology and moral development theories.
With the BMA method, we were able to explore the specific contribution of each individual binding foundation to each schema score, which could not be well examined in the prior studies using the traditional regression method. The findings from the BMA-identified models are consistent with previous studies in terms of the relationship between moral foundations and reasoning. As Baril and Wright (2012) and Glover et al. (2014) reported, we found negative associations between the binding foundations IL and PS and PC reasoning. The MN and PI schema scores were positively associated with AR and IL, respectively. One interesting point is that the two binding foundations showed significant associations with either the MN or the PI score. Given that the MN schema concerns abiding by and valuing social norms and conventions while making moral judgments (Thoma, 2014), it is not surprising that AR, which is about following legitimate authority and respecting conventions (Haidt & Graham, 2007), was a significant predictor of the MN score. IL, on the other hand, focuses on taking care of one's own group members, such as close others (Haidt & Graham, 2007). Since the PI schema is related to prioritizing close relationships over larger social systems (Thoma, 2014), it is also reasonable that IL significantly predicted the PI schema score.

Moreover, the reported associations between individualizing foundations and schema scores seem consistent with what Gray and Keeney (2015) argued about the universality of the HC foundation in moral judgment. Given that both the most and the least sophisticated schemas showed significant associations with HC, HC can be considered one of the most fundamental moral principles with developmental implications (Schein & Gray, 2018).
Also, the fact that the BMA-identified models, which are parsimonious models retaining only the most essential predictors (Clyde, 2000), included HC might empirically support this point. HC negatively predicted the PI schema score, perhaps because the PI schema is mainly concerned with welfare within the boundary of one's close others rather than potential harms to society beyond that boundary (Thoma, 2014), whereas HC is more about universal moral concerns (Schein & Gray, 2018). Such a trend was less obvious in the case of the MN schema, as this schema is at least partially concerned with potential harms to societal beings and conventions, which exist outside of that boundary, although the schema does not fully address universal issues (Thoma, 2014). Instead, FR negatively predicted the MN score. Given that the perspective of fairness constitutes a basis for critically deliberating upon existing norms and conventions (Emler, Tarry, & St. James, 2007; González, 2002), a strong perception of fairness would be negatively associated with the MN schema, which is more about maintaining existing norms than criticizing them (Thoma, 2014), as shown in the present study. The aforementioned associations existed even after controlling for political and religious affiliations, which were not fully considered in the models of previous studies. Although Graham et al. (2011) argued that the Neo-Kohlbergian theory might be biased toward political liberals and that conservatives are likely to receive low DIT scores, we found unique contributions of moral foundations to the bDIT schema scores even after controlling for political and religious backgrounds. This may suggest that the variance in moral judgment development is not completely attributable to political or religious orientation; rather, moral foundations per se explain a significant amount of the variance. 
This is consistent with Thoma, Narvaez, Rest, and Derryberry's (1999) argument that the DIT provides unique information about moral judgment development "above and beyond that accounted for by … political attitudes" (p. 338). The present study has several methodological implications for future research as well. We employed the BMA method, which allowed us to explore the best model among possible competing candidate models and to address model uncertainty (George & Clyde, 2004; Raftery et al., 1997). The traditional regression method can only test one model against the null model, so previous studies that used the traditional method perhaps could not fully explore the relationship between the variables of interest, moral foundations and reasoning. We successfully explored the best models predicting moral reasoning with individual moral foundations and control variables with the BMA method. The BMA-identified models showed better performance than the full models that have been widely tested with the traditional regression method. In particular, we found improved cross-validation performance when the BMA method was implemented. Given this, the BMA method will allow us to address the issue of overfitting, which may significantly threaten the generalizability and replicability of empirical studies based on regression (Babyak, 2004). Finally, we uploaded the data and R code files, which include line-by-line comments, to our OSF project space (https://osf.io/vtrxu/) so that readers can practice the BMA method with a concrete example. By doing so, researchers in moral education who are interested in exploring the relationships between variables will be able to employ the introduced method, which is more suitable for data exploration than the traditional regression method, in their projects. In addition, the findings from the present study might also suggest implications for moral education. 
Given the strong association between individualizing foundations and post-conventional reasoning, moral educators may consider emphasizing those foundations in moral educational activities, particularly those aiming at the development of moral judgment. Of the two individualizing foundations, HC in particular seemingly showed a consistent association with moral judgment development, as it correlated positively with PC and negatively with PI. In fact, Gray and Keeney (2015) and Gray and Schein (2012) argued that HC should constitute the basis of moral reasoning in general. Proponents of the MFT also acknowledged that HC is the foundation that is commonly valued across people with different political views (Graham et al., 2009). Related to this point, Greene and his colleagues, who proposed the dual-process model of moral judgment, suggested that HC can be utilized as a reference point in solving complex moral problems that involve conflicts between different values and views through the deliberative process (Cushman, Young, & Greene, 2010; Greene, 2014). Given that the modern world has become highly diversified and conflicts between different cultures and orientations are increasing, it would be necessary to emphasize HC as the core feature in moral education (e.g., perceiving potential harms to others' wellbeing (Rest & Narvaez, 1994), caring about others' pains and concerns (Colby & Damon, 1992)) to address these social problems in the long term. As a possible direction for further investigations using the bDIT, researchers may consider examining the association between automatic and behavioral aspects of moral judgment and moral foundations with the bDIT. Because the present study primarily aimed at exploring the association between moral foundations and moral judgment, we focused on the explicit outcomes of the bDIT, the schema scores. 
However, the bDIT has been developed to measure participants' behavioral responses to dilemmas, such as reaction time, which can be used as proxies for automatic, implicit, and unconscious aspects of moral judgment. In fact, the first study that employed the bDIT also examined the relationship between moral judgment development, moral competence, and behavioral responses (Han, Dawson, Thoma, et al., 2020). Thus, given that the MFT has been proposed conceptually based on the social intuitionist model (Graham et al., 2009), an approach that explains people's moral decision-making as automatic and intuitive processes (Haidt, 2001), examining the relationship between moral foundations and the intuitive and automatic aspects of moral judgment with the bDIT would be informative for improving our knowledge about how moral judgment occurs in reality. There are several limitations in the present study that warrant further studies. First, although prior research reported several preliminary findings suggesting an association between moral foundations and moral functioning in general, we only focused on moral judgment and reasoning in the present study. Given that other domains of moral functioning were beyond the scope of the current study, further investigations are necessary to examine the aforementioned association. Second, we collected data from college students, so findings from the present study might not fully generalize to other groups. Third, we used political and religious affiliation information instead of actual political and religious orientations as control variables. To address these limitations, future studies should recruit participants from diverse backgrounds and employ multidimensional scales for political and religious orientations.

References

Aquino, K., & Reed II, A. (2002). The self-importance of moral identity. Journal of Personality and Social Psychology, 83(6), 1423–1440. 
doi:10.1037/0022-3514.83.6.1423
Babyak, M. A. (2004). What You See May Not Be What You Get: A Brief, Nontechnical Introduction to Overfitting in Regression-Type Models. Psychosomatic Medicine, 66(3), 411–421. doi:10.1097/01.psy.0000127692.23278.a9
Baril, G. L., & Wright, J. C. (2012). Different types of moral cognition: Moral stages versus moral foundations. Personality and Individual Differences, 53(4), 468–473. doi:10.1016/j.paid.2012.04.018
Browne, M. W. (2000). Cross-Validation Methods. Journal of Mathematical Psychology, 44(1), 108–132. doi:10.1006/jmps.1999.1279
Choi, Y.-J., Han, H., Dawson, K. J., Thoma, S. J., & Glenn, A. L. (2019). Measuring moral reasoning using moral dilemmas: Evaluating reliability, validity, and differential item functioning of the behavioural defining issues test (bDIT). European Journal of Developmental Psychology, 1–10. doi:10.1080/17405629.2019.1614907
Clyde, M. (2000). Model uncertainty and health effect studies for particulate matter. Environmetrics, 11(6), 745–763. doi:10.1002/1099-095X(200011/12)11:6<745::AID-ENV431>3.0.CO;2-N
Colby, A., & Damon, W. (1992). Some do care: Contemporary lives of moral commitment. New York, NY: Free Press.
Cushman, F., Young, L., & Greene, J. D. (2010). Our multi-system moral psychology: Towards a consensus view. In J. D. Doris (Ed.), The Moral Psychology Handbook (pp. 47–71). Oxford, UK: Oxford University Press.
Davis, M. H. (1983). Measuring individual differences in empathy: Evidence for a multidimensional approach. Journal of Personality and Social Psychology, 44, 113–126. doi:10.1037/0022-3514.44.1.113
Dweck, C. S. (2000). Self-Theories: Their Role in Motivation, Personality, and Development. Philadelphia, PA: Psychology Press.
Emler, N., Renwick, S., & Malone, B. (1983). The relationship between moral reasoning and political orientation. Journal of Personality and Social Psychology, 45(5), 1073–1080. doi:10.1037/0022-3514.45.5.1073
Emler, N., Tarry, H., & St. James, A. (2007). Post-conventional moral reasoning and reputation. Journal of Research in Personality, 41(1), 76–89. doi:10.1016/j.jrp.2006.02.003
George, E. I., & Clyde, M. (2004). Model Uncertainty. Statistical Science, 19(1), 81–94. doi:10.1214/088342304000000035
Glover, R. J., Natesan, P., Wang, J., Rohr, D., McAfee-Etheridge, L., Booker, D. D., … Wu, M. (2014). Moral rationality and intuition: An exploration of relationships between the Defining Issues Test and the Moral Foundations Questionnaire. Journal of Moral Education, 43(4), 395–412. doi:10.1080/03057240.2014.953043
González, E. (2002). Defining a post-conventional corporate moral responsibility. Journal of Business Ethics, 39, 101–108. doi:10.1023/A:1016388102599
Graham, J., Haidt, J., & Nosek, B. A. (2009). Liberals and conservatives rely on different sets of moral foundations. Journal of Personality and Social Psychology, 96(5), 1029–1046. doi:10.1037/a0015141
Graham, J., Nosek, B. A., Haidt, J., Iyer, R., Koleva, S., & Ditto, P. H. (2011). Mapping the moral domain. Journal of Personality and Social Psychology, 101(2), 366–385. doi:10.1037/a0021847
Gray, K., & Keeney, J. E. (2015). Disconfirming Moral Foundations Theory on Its Own Terms. Social Psychological and Personality Science, 6(8), 874–877. doi:10.1177/1948550615592243
Gray, K., & Schein, C. (2012). Two Minds Vs. Two Philosophies: Mind Perception Defines Morality and Dissolves the Debate Between Deontology and Utilitarianism. Review of Philosophy and Psychology, 3(3), 405–423. doi:10.1007/s13164-012-0112-5
Greene, J. D. (2014). Beyond Point-and-Shoot Morality: Why Cognitive (Neuro)Science Matters for Ethics. Ethics, 124(4), 695–726. doi:10.1086/675875
Haidt, J. (2001). The emotional dog and its rational tail: A social intuitionist approach to moral judgment. Psychological Review, 108(4), 814–834.
Haidt, J., & Graham, J. (2007). When Morality Opposes Justice: Conservatives Have Moral Intuitions that Liberals may not Recognize. Social Justice Research, 20(1), 98–116. doi:10.1007/s11211-007-0034-z
Hammami, D., Lee, T. S., Ouarda, T. B. M. J., & Lee, J. (2012). Predictor selection for downscaling GCM data with LASSO. Journal of Geophysical Research: Atmospheres, 117(D17). doi:10.1029/2012JD017864
Han, H., Choi, Y.-J., Dawson, K. J., & Jeong, C. (2018). Moral growth mindset is associated with change in voluntary service engagement. PLOS ONE, 13(8), e0202327. doi:10.1371/journal.pone.0202327
Han, H., Dawson, K. J., Choi, Y. R., Choi, Y.-J., & Glenn, A. L. (2020). Development and validation of the English version of the Moral Growth Mindset measure. F1000Research, 9, 256. doi:10.12688/f1000research.23160.3
Han, H., Dawson, K. J., Thoma, S. J., & Glenn, A. L. (2020). Developmental Level of Moral Judgment Influences Behavioral Patterns During Moral Decision-Making. The Journal of Experimental Education, 88(4), 660–675. doi:10.1080/00220973.2019.1574701
Han, H., Lee, K., & Soylu, F. (2020). Applying the Deep Learning Method for Simulating Outcomes of Educational Interventions. SN Computer Science, 1(2), 70. doi:10.1007/s42979-020-0075-z
Han, H., Park, J., & Thoma, S. J. (2018). Why do we need to employ Bayesian statistics and how can we employ it in studies of moral education?: With practical guidelines to use JASP for educators and researchers. Journal of Moral Education, 47(4), 519–537. doi:10.1080/03057240.2018.1463204
Hannikainen, I. R., Hudson, N. W., Chopik, W. J., Briley, D. A., & Derringer, J. (2020). Moral migration: Desires to become more empathic predict changes in moral foundations. Journal of Research in Personality, 88, 104011. doi:10.1016/j.jrp.2020.104011
Harrell, F. E. (2015). Regression Modeling Strategies. Cham, Switzerland: Springer International Publishing. doi:10.1007/978-3-319-19425-7
Hoeting, J. A., Madigan, D., Raftery, A. E., & Volinsky, C. T. (1999). Bayesian model averaging: A tutorial. Statistical Science, 14(4), 382–401. doi:10.1214/ss/1009212519
Kass, R. E., & Raftery, A. E. (1995). Bayes Factors. Journal of the American Statistical Association, 90(430), 773–795. doi:10.2307/2291091
Kohlberg, L. (1981). The philosophy of moral development: Moral stages and the idea of justice. San Francisco, CA: Harper & Row.
McNeish, D. M. (2015). Using Lasso for Predictor Selection and to Assuage Overfitting: A Method Long Overlooked in Behavioral Sciences. Multivariate Behavioral Research, 50(5), 471–484. doi:10.1080/00273171.2015.1036965
Moore, C., Detert, J. R., Klebe Trevino, L., Baker, V. L., & Mayer, D. M. (2012). Why employees do bad things: Moral disengagement and unethical organizational behavior. Personnel Psychology, 65(1), 1–48. doi:10.1111/j.1744-6570.2011.01237.x
Morey, R. D., Rouder, J. N., Jamil, T., Urbanek, K., & Ly, A. (2018). Package ‘BayesFactor.’ Retrieved from https://cran.r-project.org/web/packages/BayesFactor/BayesFactor.pdf
Raftery, A. E. (1995). Bayesian Model Selection in Social Research. Sociological Methodology, 25, 111. doi:10.2307/271063
Raftery, A. E., Hoeting, J. A., Volinsky, C. T., Painter, I., & Yeung, K. Y. (2020). Package “BMA.”
Raftery, A. E., Madigan, D., & Hoeting, J. A. (1997). Bayesian Model Averaging for Linear Regression Models. Journal of the American Statistical Association, 92(437), 179–191. doi:10.1080/01621459.1997.10473615
Rest, J. R., & Narvaez, D. (1994). Moral development in the professions: Psychology and applied ethics. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Rest, J. R., Narvaez, D., Bebeau, M. J., & Thoma, S. J. (1999a). Postconventional moral thinking: A Neo-Kohlbergian approach. Mahwah, NJ: Lawrence Erlbaum Associates, Publishers.
Rest, J. R., Narvaez, D., Bebeau, M., & Thoma, S. (1999b). A Neo-Kohlbergian approach: The DIT and schema theory. Educational Psychology Review, 11(4), 291–324. doi:10.1023/a:1022053215271
Rest, J. R., Narvaez, D., Thoma, S. J., & Bebeau, M. J. (1999). DIT2: Devising and testing a revised instrument of moral judgment. Journal of Educational Psychology, 91(4), 644–659. doi:10.1037/0022-0663.91.4.644
Schein, C., & Gray, K. (2018). The Theory of Dyadic Morality: Reinventing Moral Judgment by Redefining Harm. Personality and Social Psychology Review, 22(1), 32–70. doi:10.1177/1088868317698288
Thoma, S. J. (2002). An Overview of the Minnesota Approach to Research in Moral Development. Journal of Moral Education, 31(3), 225–245. doi:10.1080/0305724022000008098
Thoma, S. J. (2006). Research on the Defining Issues Test. In M. Killen & J. G. Smetana (Eds.), Handbook of Moral Development (pp. 67–91). Mahwah, NJ: Psychology Press.
Thoma, S. J. (2014). Measuring moral thinking from a neo-Kohlbergian perspective. Theory and Research in Education, 12(3), 347–365. doi:10.1177/1477878514545208
Thoma, S. J., Narvaez, D., Rest, J., & Derryberry, P. (1999). Does Moral Judgment Development Reduce to Political Attitudes or Verbal Ability? Evidence Using the Defining Issues Test. Educational Psychology Review, 11, 325–341. doi:10.1023/A:1022005332110
Tibshirani, R. (1997). The lasso method for variable selection in the Cox model. Statistics in Medicine, 16(4), 385–395. doi:10.1002/(SICI)1097-0258(19970228)16:4<385::AID-SIM380>3.0.CO;2-3
Vaughan, T. J., Bell Holleran, L., & Silver, J. R. (2019). Applying Moral Foundations Theory to the Explanation of Capital Jurors’ Sentencing Decisions. Justice Quarterly, 36(7), 1176–1205. doi:10.1080/07418825.2018.1537400
Wagenmakers, E.-J., Love, J., Marsman, M., Jamil, T., Ly, A., Verhagen, J., … Morey, R. D. (2018). Bayesian inference for psychology. Part II: Example applications with JASP. Psychonomic Bulletin & Review, 25(1), 58–76. doi:10.3758/s13423-017-1323-7
Wagenmakers, E.-J., Marsman, M., Jamil, T., Ly, A., Verhagen, J., Love, J., … Morey, R. D. (2017). Bayesian inference for psychology. Part I: Theoretical advantages and practical ramifications. Psychonomic Bulletin & Review. doi:10.3758/s13423-017-1343-3
Winterich, K. P., Zhang, Y., & Mittal, V. (2012). How political identity and charity positioning increase donations: Insights from Moral Foundations Theory. International Journal of Research in Marketing, 29(4), 346–354. doi:10.1016/j.ijresmar.2012.05.002
Zucchini, W., Claeskens, G., & Nguefack-Tsague, G. (2016). Model selection. Encyclopedia of Mathematics. Retrieved from http://encyclopediaofmath.org/index.php?title=Model_selection&oldid=37771