  • Fayetteville, Arkansas, United States
Person-fit analyses are commonly used to detect aberrant responding in self-report data. Nonparametric person fit statistics do not require fitting a parametric test theory model and have performed well compared to other person-fit statistics. However, detection of aberrant responding has primarily focused on dominance response data, thus the effectiveness of person-fit statistics in detecting different aberrant behaviors in ideal point data is unclear. This study compares the performance of nonparametric person-fit statistics in unfolding and dominance model contexts. Results for dominance data indicate that increases in detection rates depend, among other factors, on type of aberrant responding and person-fit statistic used. The detection of aberrant responses in ideal point data was ineffective using four nonparametric person-fit statistics, with slightly higher type I error and power less than 0.25. Additional research is needed to identify or develop nonparametric or parametric...
The Common Core State Standards (CCSS) represent an unprecedented change in American education. As an increasingly integral part of the school accountability movement under No Child Left Behind and Race to the Top, responsibility for implementing CCSS rests largely with school leadership. One important factor in the success or failure of these efforts is the perceptions and experiences of the teachers who will ultimately employ CCSS in the classroom. This survey study examined teachers' views of CCSS implementation, teaching conditions, collaboration, and job satisfaction. Factor analysis revealed that the openness and activeness of school leadership had a significant effect on teachers' perceptions of implementation, suggesting that attention to these aspects of leadership is an important consideration during transition to CCSS.
People can respond to psychological items differently based on how items are presented. Gender differences on anxiety and stress have been documented when items are written using positive versus negative wording formats. One limitation of item wording research is that it is unclear whether differences are due to wording or to differences in constructs when positive and negative items are not matched on content. In this study, ten items from the Zung (1971) Anxiety Scale were selected, and ten items were created to be the reverse wording of the original items. The purpose was to investigate the functioning of reverse-worded items for males and females of different age groups. Data were collected from a volunteer sample of 2,540 adults and disaggregated by gender and age. Differential bundle functioning analyses were conducted using POLY-SIBTEST, with ten positively worded items used as the matching subtest to compare responses on ten negatively worded items (5-point scale). When women were matched with men, there was no significant difference on negatively worded items (β_uni = 0.070, p = .702), nor were there gender differences across age categories. However, when adults were compared by age, older adults were significantly less likely than young adults to agree with negatively worded anxiety items. To better understand the differences, women in three age groups were compared, and results indicated significant differences for all comparisons (β_uni = 0.554, p = .003 for 18-39 versus 40-59 years; β_uni = 0.686, p = .005 for 40-59 versus 60 years and older; β_uni = 1.030, p < .001 for 18-39 versus 60 years and older). The same age-related trend was observed for males, with the youngest males significantly more likely to agree with negative items than the oldest males (β_uni = 1.026, p = .003). Results indicate that previous gender studies on anxiety scales may be confounded by age.
The Perceived Stress Scale (PSS, Cohen & Williamson, 1988) is a widely used self-report measure of perceived stress. Principal components factor analysis often results in a two-factor solution composed of either positively worded (PW) or negatively worded (NW) items (see Lee [2012]). Cohen and Williamson stated that the distinction between factors was irrelevant for purposes of measuring perceived stress. For this reason, data tend to be viewed from a unidimensional standpoint. The purpose of this study was to analyze responses to the PSS using unidimensional and multidimensional graded response IRT models and to compare the estimated item parameters and trait scores. Responses to the 10-item PSS were gathered from a volunteer sample of 1,388 participants. The multidimensional model fit the data significantly better than the unidimensional model. On average, item parameters were estimated with smaller standard errors from the unidimensional model. Scores from the unidimensional and multidimensional models were not significantly different for the entire sample, but they were within subgroups. Males had significantly different estimated scores on the two dimensions from the multidimensional model; the score on the negative dimension was significantly different from the unidimensional score, but the score on the positive dimension was not. Females did not have significantly different scores on the two dimensions from the multidimensional model, but each was significantly different from the unidimensional score. Within different age categories, there was no significant difference in the two trait scores from the multidimensional model, but for certain groups there were significant differences between unidimensional scores and those on one of the two dimensions.
Though the unidimensional model is most often applied to psychometric scales with PW and NW items, a multidimensional model may provide a more valid measure of a construct when responses are influenced by the wording direction.
In the United States, legislation intended to limit abortion access based on fetal development markers (e.g., heartbeat, fetal pain) has become increasingly common. We found that people’s support for legal abortion decreases when survey items mention fetal developmental markers compared with items that do not. However, the majority of participants supported access to legal abortion in health-related circumstances or pregnancies as a result of rape at the detection of a fetal heartbeat. Using terms that personify the fetus may evoke responses from participants that limit their endorsement of abortion. Thus, including this terminology in the public and political discourse seems to influence abortion attitudes. This might have implications related to electoral outcomes which eventually determine whether pregnant people are guaranteed access to abortion.
A simulation study was conducted to investigate the heuristics of the SIBTEST procedure and how it compares with ETS classification guidelines used with the Mantel–Haenszel procedure. Prior heuristics have been used for nearly 25 years, but they are based on a simulation study that was restricted due to computer limitations and that modeled item parameters from estimates of ACT and ASVAB tests from 1987 and 1984, respectively. Further, suggested heuristics for data fitting a two-parameter logistic model (2PL) have essentially gone unused since their original presentation. This simulation study incorporates a wide range of data conditions to recommend heuristics for both 2PL and three-parameter logistic (3PL) data that correspond with ETS’s Mantel–Haenszel heuristics. Levels of agreement between the new SIBTEST heuristics and Mantel–Haenszel heuristics were similar for 2PL data and higher than prior SIBTEST heuristics for 3PL data. The new recommendations provide higher true-positive rates for 2PL data. Conversely, they displayed decreased true-positive rates for 3PL data. False-positive rates, overall, remained below the level of significance for the new heuristics. Unequal group sizes resulted in slightly larger false-positive rates than balanced designs for both prior and new SIBTEST heuristics: rates were below alpha for unbalanced designs with equal ability distributions and slightly above alpha for unbalanced designs with unequal ability distributions.
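For readers unfamiliar with the Mantel–Haenszel side of this comparison, the following is a minimal Python sketch of the Mantel–Haenszel common odds ratio, its conversion to the ETS delta scale, and the commonly cited A/B/C classification. It is not code from the study. The function names, the table layout, and the choice to pass the two significance tests in as booleans are assumptions made for illustration; the SIBTEST heuristics the abstract proposes are not reproduced here.

```python
import math

def mantel_haenszel_delta(tables):
    """Return the MH D-DIF statistic, -2.35 * ln(alpha_MH).

    tables: iterable of (a, b, c, d) counts, one tuple per matched score level:
        a = reference group correct,  b = reference group incorrect,
        c = focal group correct,      d = focal group incorrect.
    Assumes at least one level has b * c > 0 so the ratio is defined.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        num += a * d / n
        den += b * c / n
    alpha_mh = num / den
    return -2.35 * math.log(alpha_mh)

def ets_category(delta, differs_from_zero, exceeds_one):
    """Classify an item as A (negligible), B (moderate), or C (large) DIF.

    differs_from_zero: True if |delta| is significantly different from 0.
    exceeds_one:       True if |delta| is significantly greater than 1.
    Both flags are assumed to come from tests computed elsewhere.
    """
    size = abs(delta)
    if size < 1.0 or not differs_from_zero:
        return "A"
    if size >= 1.5 and exceeds_one:
        return "C"
    return "B"

# Hypothetical counts for two matched score levels.
delta = mantel_haenszel_delta([(45, 5, 30, 20), (40, 10, 35, 15)])
print(round(delta, 3), ets_category(delta, True, True))
```

A negative delta in this sketch means the item favors the reference group, because the odds ratio is formed with the reference group in the numerator.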
Aberrant responding on tests and surveys has been shown to affect the psychometric properties of scales and the statistical analyses from the use of those scales in cumulative model contexts. This study extends prior research by comparing the effects of four types of aberrant responding on model fit in both cumulative and ideal point model contexts using generalized partial credit (GPCM) and generalized graded unfolding (GGUM) models. When fitting models to data, model misfit can be a function of both misspecification and aberrant responding. Results demonstrate how varying levels of aberrant data can severely impact model fit for both cumulative and ideal point data. Specifically, longstring responses have a stronger impact on dimensionality for both ideal point and cumulative data, while random responding tends to have the most negative impact on data-model fit according to information criteria (AIC, BIC). The results also indicate that ideal point models such as the GGUM may be able to fit cumulative data as well as the cumulative model itself (GPCM), whereas cumulative models may not provide sufficient model fit for data simulated using an ideal point model.
A study was conducted to implement the use of a standardized effect size and corresponding classification guidelines for polytomous data with the POLYSIBTEST procedure and compare those guidelines with prior recommendations. Two simulation studies were included. The first identifies new unstandardized test heuristics for classifying moderate and large differential item functioning (DIF) for polytomous response data with three to seven response options. These are provided for researchers studying polytomous data using POLYSIBTEST software that has been published previously. The second simulation study provides one pair of standardized effect size heuristics that can be employed with items having any number of response options and compares true-positive and false-positive rates for the standardized effect size proposed by Weese with one proposed by Zwick et al. and two unstandardized classification procedures (Gierl; Golia). All four procedures retained false-positive rates generally below the level of significance at both moderate and large DIF levels. However, Weese’s standardized effect size was not affected by sample size and provided slightly higher true-positive rates than the Zwick et al. and Golia’s recommendations, while flagging substantially fewer items that might be characterized as having negligible DIF when compared with Gierl’s suggested criterion. The proposed effect size allows for easier use and interpretation by practitioners as it can be applied to items with any number of response options and is interpreted as a difference in standard deviation units.
Previous research indicates that abortion attitudes may vary across different contexts, such as the reason for abortion and gestational age of the pregnancy. To expand on these findings, we examine...
This study examines knowledge of and attitudes toward Roe v. Wade among a sample of 779 US Latinx adults. Survey response patterns were examined in relation to generational status and choice of survey language as well as to several demographic variables previously shown to influence abortion attitudes (e.g., age, religiosity, political affiliation). Differences were found in knowledge of Roe v. Wade by generational status and survey language, with those with higher generational statuses and those taking the survey in English exhibiting greater knowledge. Finally, greater knowledge of Roe v. Wade and choosing to take the survey in English predicted more positive attitudes toward Roe v. Wade controlling for other demographic variables; no effect on attitudes of generational status was observed. These findings contribute to our understanding of abortion attitudes among US Latinxs as well as the relationship between political socialization, knowledge, and attitudes toward social issues.
This study uses the nominal response model to investigate the effects of extreme response styles. The Zung Self-Rating Anxiety Scale (SAS) is a commonly used scale for the identification of anxiety disorders. In some cases, the response options are not extreme, ranging from A little of the time to Most of the time; in other cases, extreme responses are used: None or a little to Most or all of the time. The SAS was administered to two samples, each having a different response set. Results indicated that the items administered with extreme responses were more informative for those at the middle of the distribution, while items administered without the extreme responses were more informative at the ends of the distribution.
The generalized partial credit model (GPCM) is often used for polytomous data; however, the nominal response model (NRM) allows for the investigation of how adjacent categories may discriminate differently when items are positively or negatively worded. Ten items from three different self-reported scales were used (anxiety, depression, and perceived stress), and authors wrote an additional item worded in the opposite direction to pair with each original item. Sets of the original and reverse-worded items were administered, and responses were analyzed using the two models. The NRM fit significantly better than the GPCM, and it was able to detect category responses that may not function well. Positively worded items tended to be more discriminating than negatively worded items. For the depression scale, category boundary locations tended to have a larger range for the positively worded items than for the negatively worded items from both models. Some pairs of items functioned comparab...
This study investigated the effects of estimating unidimensional latent abilities for subgroups of a population across multiple test forms with confounding difficulty and number of items within sub-content areas. Examinees were grouped based on their true abilities; estimates within subgroups were compared across test forms having equal length and average item difficulty overall, but differing numbers of items and/or difficulty within subsets. Examinees with differing true abilities across the two dimensions had significantly different estimated scores across forms, depending on how true ability in each dimension aligned with the average item difficulty and the number of items within that dimension. These effects tended to decrease as the correlation between dimensions increased. The results of this study alert test developers to the need to control item specifications within sub-content areas.
When constructing multiple test forms, the number of items and the total test difficulty are often equivalent. Not all test developers match the number of items and/or average item difficulty within sub-content areas. In this simulation study, six test forms were constructed having an equal number of items and average item difficulty overall. Manipulated variables were the number of items and average item difficulty within subsets of items primarily measuring one of two dimensions. Datasets were simulated at four levels of correlation (0, .3, .6, and .9). Item parameters were estimated using the Rasch and 2PL unidimensional IRT models. Estimated discrimination and difficulty were compared across forms and within subsets of items. The average unidimensional estimated discrimination was consistent across forms having the same correlation. Forms having a larger set of easy items measuring one dimension were estimated as being more difficult than forms having a larger set of hard items. Estimates were also investigated within subsets of items, and measures of bias were reported. This study encourages test developers to not only maintain consistent test specifications across forms as a whole, but also within sub-content areas.
Assessment is critical to rehabilitation practice and research, and self-reports are a commonly used form of assessment. This study examines a gender effect according to item wording on the Perceived Stress Scale for adults with multiple sclerosis. Past studies have demonstrated two-factor solutions on this scale and other scales measuring stress-related constructs with factor loadings being determined by item wording. Moreover, women have typically scored higher on these measured constructs. However, a literature review reveals that this gender difference often manifests only on the factor composed of negatively worded items. This study extends this line of research by examining gender differences on the Perceived Stress Scale on the negatively worded items at both the item and bundle levels after controlling for responses on the positively worded items. Implications of this study on the field of rehabilitation are discussed.
The purpose of this study was to investigate the validity of the short and long forms of the Teacher Sense of Efficacy Scale (TSES; Tschannen-Moran & Woolfolk Hoy, 2001). Participants included 549 in-service and 423 pre-service teachers from Pakistan. Confirmatory factor analysis validated the three-factor model for in-service teachers, as had been observed with other cultures. However, it did not support the one-factor model for pre-service teachers. As a follow-up, exploratory factor analysis produced three factors for pre-service teacher groups, leading to the conclusion that a three-factor model is more appropriate for both pre-service and in-service teachers in Pakistan. The findings of this study provide significant benefits for Pakistani researchers who want to use a teacher efficacy instrument as a tool for their studies.
Multilanguage surveys are a vital component of comparative public health science. And, with dozens of tools available to guide the translation and design process, an open dialogue about key translation frameworks and design approaches and their strengths and limitations is needed. Herein, we briefly summarize the application and use of several popular translation frameworks and questionnaire design approaches. Our purpose is to draw attention to the complexities of multilanguage surveys by noting how the most appropriate framework or approach is entirely dependent on the context of a specific study. We conclude with a call encouraging the adoption of frameworks and approaches that value high degrees of cultural input, ideally among a large team of culture, language, and subject matter experts. And, as the implemented translation framework or questionnaire design approach may hold implications for the quality and validity of data, we also call on editors to create recommendations tha...
This paper provides a macro for test developers to assess content validity using the index of item-objective congruence (Rovinelli & Hambleton, 1977). The target audience is researchers and practitioners involved in the development of measurement instruments, in fields ranging from higher education to business and government. SAS/IML® is used to compute the classical unidimensional measure developed by Rovinelli and Hambleton (1977). This measure is limited to items that measure a single construct or a specific composite of constructs. In modern test theory, it is common to develop items that have multiple assessment targets. Thus, a macro for a newly developed index for evaluating items that measure multiple objectives or combinations of constructs is also provided in the paper.
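As an illustration of the classical index only, here is a short Python sketch (not the SAS/IML macro the paper provides, and not the newer multi-objective index). It uses the widely published form of the index, in which each judge rates an item as +1, 0, or -1 against each of N objectives and the index equals N / (2N - 2) times the difference between the mean rating on the target objective and the mean rating across all objectives. The function name and the example ratings are hypothetical.

```python
import numpy as np

def item_objective_congruence(ratings, target):
    """Index of item-objective congruence for a single item.

    ratings: array of shape (n_judges, n_objectives) with entries in
             {-1, 0, 1}; +1 means the judge sees the item as measuring
             that objective, -1 means clearly not, 0 means unsure.
    target:  column index of the objective the item was written for.
    Returns a value between -1 and 1; 1 indicates perfect congruence.
    """
    ratings = np.asarray(ratings, dtype=float)
    n_objectives = ratings.shape[1]
    if n_objectives < 2:
        raise ValueError("the index needs at least two objectives")
    mean_on_target = ratings[:, target].mean()
    mean_overall = ratings.mean()
    return n_objectives / (2 * n_objectives - 2) * (mean_on_target - mean_overall)

# Hypothetical ratings from three judges on three objectives.
example = [[1, -1, -1],
           [1, -1, -1],
           [1,  0, -1]]
print(item_objective_congruence(example, target=0))
```

The index reaches 1 only when every judge rates the item +1 on the target objective and -1 on all others, which is why it suits items meant to measure a single objective.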
With the recent changes to the composition of the Supreme Court in the USA, speculation that Roe v. Wade may be overturned abounds. Research assessing people’s knowledge and sentiment toward Roe v. Wade is limited. As such, we assessed the relationship between knowledge and sentiment regarding Roe v. Wade and whether the relationship is moderated by political affiliation and abortion identity (e.g., “pro-life,” “pro-choice”). In 2018, after Justice Brett Kavanaugh was nominated to the Supreme Court, we distributed an online survey to a quota-based sample of English- and Spanish-speaking adults in the USA. Roe v. Wade knowledge was significantly related to sentiment; higher knowledge was generally associated with greater support for upholding Roe v. Wade. However, both political affiliation and abortion identity moderated this relationship. Specifically, higher baseline knowledge was associated with lower sentiment scores among those identifying as Republican and “pro-life.” Those wh...
The purpose of this study was to investigate the potential effects of item direction and orientation across various uni- and multidimensional structures on model fit, item parameters, and scores, using the multidimensional graded response IRT model. Items from the Perceived Stress Scale were used, and additional items were created that were worded in the opposite direction. Item direction was categorized as positive or negative and by item orientation, e.g., the use of a contraction, such as “not” or “no,” or prefix, such as “un-,” “non-,” to denote direction. Data were collected from 3176 respondents. The data were fit to a unidimensional IRT model and to ten different multidimensional (including bifactor) model structures. Overall, the two-factor bifactor model structures fit the data best, having direction-specific factors for positive and negative items or having orientation-specific factors for items naturally worded using terms and items oriented using “not” and “un-”. Within ...
Objectives: Salient belief elicitations (SBEs), informed by the Reasoned Action Approach (RAA), are used to identify 3 sets of beliefs - behavioral, control, and normative - that influence attitudes toward a health behavior. SBEs ask participants about their own beliefs through open-ended questions. We adapted a SBE by focusing on abortion, which is infrequently examined through SBEs; we also included a survey version that asked participants their views on what a hypothetical woman would do if contemplating an abortion. Given these deviations from traditional SBEs, the purpose of this study was to assess if the adapted SBE was understood by participants in English and Spanish through cognitive interviewing. Methods: We examined participants' interpretations of SBE items about abortion to determine if they aligned with the corresponding RAA construct. We administered SBE surveys and conducted cognitive interviews with US adults in both English and Spanish. Results: Participants c...


The GPCM is often used for polytomous data; however, the NRM allows for the investigation of how adjacent categories may discriminate differently when items are positively or negatively worded. In this study, responses to reverse-worded items are analyzed using the two models, and the estimated parameters are compared.
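The relationship between the two models can be made concrete with a small Python sketch. It assumes the standard textbook parameterizations, which the abstract itself does not spell out: an NRM with one slope and one intercept per category, and a GPCM with a single item slope and a set of step locations. Under those assumptions the GPCM is the NRM with category slopes constrained to be equally spaced, which is what keeps it from letting adjacent categories discriminate differently. All function names and parameter values are illustrative.

```python
import numpy as np

def nrm_probs(theta, slopes, intercepts):
    """NRM: P(X = k) is proportional to exp(slopes[k] * theta + intercepts[k])."""
    z = np.asarray(slopes, dtype=float) * theta + np.asarray(intercepts, dtype=float)
    z = z - z.max()  # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum()

def gpcm_probs(theta, a, steps):
    """GPCM with item slope a and step locations steps (m steps, m + 1 categories)."""
    steps = np.asarray(steps, dtype=float)
    # Category k accumulates the first k terms a * (theta - step); category 0 gets 0.
    z = np.concatenate(([0.0], np.cumsum(a * (theta - steps))))
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def gpcm_as_nrm(a, steps):
    """Express GPCM parameters as the equivalent constrained NRM parameters."""
    steps = np.asarray(steps, dtype=float)
    slopes = a * np.arange(len(steps) + 1)  # equally spaced: 0, a, 2a, ...
    intercepts = np.concatenate(([0.0], -a * np.cumsum(steps)))
    return slopes, intercepts

# Hypothetical 5-category item.
a, steps, theta = 1.3, [-1.5, -0.4, 0.6, 1.7], 0.25
slopes, intercepts = gpcm_as_nrm(a, steps)
print(gpcm_probs(theta, a, steps).round(4))
print(nrm_probs(theta, slopes, intercepts).round(4))
```

The two printed vectors agree, so any difference in fit between the models comes from the NRM being free to estimate unequally spaced category slopes.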
Psychological scales, e.g., anxiety, depression, and stress inventories, tend to be a combination of positively and negatively worded items with ordered item responses using a Likert-type scale. The generalized partial credit model (GPCM) is often applied to ordinal response data, but little research uses the nominal response model (NRM) with these types of instruments. Preston, Reise, Cai, and Hays (2011) compared these models applied to psychological scales; that study focused on the item parameter estimates. We extend that work by comparing the latent trait estimates obtained when the GPCM and the NRM are applied to an instrument constructed with reverse-worded items. The purpose is to compare the estimated latent trait for the two models and for the subsets of positively or negatively worded items.
The Perceived Stress Scale (PSS; Cohen & Williamson, 1988) is a widely used self-report measure of perceived stress. Principal components factor analysis often results in a two-factor solution, with one factor composed of the positively worded (PW) items and the other of the negatively worded (NW) items. Cohen and Williamson stated that the distinction between factors was irrelevant for purposes of measuring perceived stress. For this reason, the data tend to be viewed from a unidimensional standpoint. Purpose: The purpose of this study was to analyze responses to the PSS using unidimensional and multidimensional graded response IRT models and to compare the estimated item parameters and trait scores. Results: The multidimensional model fit the data significantly better than the unidimensional model. On average, item parameters were estimated with smaller standard errors from the unidimensional model. Scores from the unidimensional and multidimensional models were not significantly different for the entire sample, but differed within subgroups. Implications: Though the unidimensional model is most often applied to psychometric scales with PW and NW items, a multidimensional model may provide a more valid measure of a construct when responses are influenced by the wording direction.

The survey was administered to a convenience sample. The instrument was the 10-item Perceived Stress Scale (PSS; Cohen, Kamarck, & Mermelstein, 1983; Cohen & Williamson, 1988), which is composed of four positively worded (PW) and six negatively worded (NW) items. The original 5-point response scale was used: 1 = Never, 2 = Almost Never, 3 = Sometimes, 4 = Fairly Often, and 5 = Very Often.

Graded Response Model. The unidimensional GRM (UGRM) is used when response data are ordinal. The foundation of the model is the dichotomous two-parameter logistic IRT model, which estimates the probability of responding in category $k$ or higher:

$$P(X_j \ge k \mid \theta, z_{jv}) = \frac{\exp(z_{jv})}{1 + \exp(z_{jv})}, \quad \text{where } z_{jv} = a_j\theta - d_{jv}.$$

In this form, $a_j$ is the slope, or discrimination, parameter of item $j$, and $d_{jv}$ is the intercept parameter of category $v$ ($v = 1, 2, \ldots, m$) of item $j$. Some may be more familiar with the IRT parameterization $z_{jv} = a_j(\theta - b_{jv})$, where $b_{jv}$ is the location parameter ($b_{jv} = d_{jv}/a_j$). The GRM partitions the set of response categories into a series of dichotomous responses. For an item with four response categories, these are 1 versus 2, 3, and 4; 1 and 2 versus 3 and 4; and 1, 2, and 3 versus 4. Therefore, the probability of responding in the $k$th category of item $j$ is given by (Samejima, 1969)

$$P(X_j = k \mid \theta, z_{jv}) = P(X_j \ge k \mid \theta, z_{jv}) - P(X_j \ge k + 1 \mid \theta, z_{jv}).$$

Multidimensional Model (MGRM). The MGRM estimates a discrimination parameter for each dimension and, for each category, a location parameter that is constant across all dimensions for an item:

$$z_{jv} = \mathbf{a}_j\boldsymbol{\theta} - d_{jv},$$

where $\mathbf{a}_j$ is a vector of slope parameters for item $j$ on each dimension, $\boldsymbol{\theta}$ is a vector of the latent trait scores on each dimension, and $d_{jv}$ is the intercept parameter of category $v$ of item $j$.

• The factor structure of the PSS was explored using a principal components factor analysis with promax rotation.
• The PW items were reverse-coded so that a lower value indicated a low level of perceived stress and a higher value indicated a high level of perceived stress.
– A secondary data set was made with the NW items reverse-coded instead, to evaluate the effect of recoding items worded in the opposite direction.
• A unidimensional GRM and an exploratory multidimensional GRM were fit to the response data.
• R version 3.2.1 (R Core Team, 2014) was used to perform the analysis.

Model Fit. The MGRM fit the data better than the UGRM based on AIC and BIC.

Item Parameters. The item parameter estimates from the UGRM and MGRM were compared using measures of correlation.
• The slope from the UGRM was more correlated with the estimated slope of the second dimension (corresponding to the PW dimension): $r_{a, a_{PW}} = 0.232$.
• The slope from the UGRM was negatively correlated with the estimated slope of the first dimension (corresponding to the NW dimension): $r_{a, a_{NW}} = -0.114$.
• The corresponding intercept parameters from the UGRM and MGRM were highly correlated for all categories: $r_{d_{U,1}, d_{M,1}} = 0.982$, $r_{d_{U,2}, d_{M,2}} = 0.999$, $r_{d_{U,3}, d_{M,3}} = 0.977$, and $r_{d_{U,4}, d_{M,4}} = 0.970$.
• When the NW items were reverse-coded, the estimated slopes did not change. The estimated intercepts were ordered in the opposite direction and had the opposite sign.

Latent Trait Scores. The estimated trait scores from the UGRM and MGRM were compared to the average score for the 10 items and within the subsets of NW and PW items.
• The scores from the GRMs were highly correlated.
• The observed scores and the estimated $\theta$ on equivalent item sets were also highly correlated, in absolute value.
• The estimated $\theta$ from the UGRM was more correlated with $\theta_{M,PW}$ than with $\theta_{M,NW}$.
• Standard errors of the estimated $\theta$ were smaller from the UGRM than from the MGRM.
• The estimates were equal in absolute value across the two datasets, regardless of the direction of item coding.
$\boldsymbol{\theta}$ Within Subgroups. Two-way repeated measures analysis was used to test the difference in estimated trait scores from the UGRM and MGRM for males and females in different age groups. There was no significant interaction between gender and age group.

Figure 1. Average score and estimated trait score to 10-item scale and subsets of NW and PW items within subgroups.

• Mean GRM scores from the UGRM and the PW dimension from the MGRM (1) followed a similar directional trend as the average observed response scores. Mean MGRM scores on the NW dimension (2) had the opposite trend, but were similar in absolute value to those on the PW dimension.
• MGRM Scores: Estimated $\theta$ scores on the two dimensions of the MGRM, in absolute value, were significantly different for those between 18 and 20 years of age, but not within other age groups. These estimates were significantly different for males but not for females.
• MGRM and UGRM Scores: $\theta$ scores from the MGRM were significantly different from scores using the UGRM for the 18-20 and 40+ subgroups and for females.
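The graded response model in this entry can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis (which was run in R): the function names `cumulative_prob` and `category_probs` are hypothetical, and the slope and intercept values in the usage lines are made up for demonstration rather than taken from the study. The sketch uses the intercept form $z_{jv} = \mathbf{a}_j\boldsymbol{\theta} - d_{jv}$, so a one-element slope vector gives the unidimensional model and a longer vector gives the multidimensional one.

```python
import math


def cumulative_prob(theta, a, d):
    """P(X >= k | theta) for one category boundary.

    theta and a are equal-length sequences (length 1 for the
    unidimensional model); d is the intercept of that boundary.
    """
    z = sum(a_i * t_i for a_i, t_i in zip(a, theta)) - d
    # exp(z) / (1 + exp(z)) written in its equivalent logistic form
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(theta, a, intercepts):
    """Return P(X = k | theta) for every category of one item.

    intercepts holds the m boundary intercepts in increasing order,
    which yields m + 1 response categories.
    """
    # P(X >= lowest category) = 1 and P(X > highest category) = 0
    cum = [1.0] + [cumulative_prob(theta, a, d) for d in intercepts] + [0.0]
    # Each category probability is a difference of adjacent cumulative curves
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


# Illustrative (assumed) parameters for a 5-category item
uni = category_probs(theta=[0.0], a=[1.5], intercepts=[-2.0, -0.5, 0.5, 2.0])
multi = category_probs(theta=[0.5, -0.5], a=[1.2, 0.8],
                       intercepts=[-2.0, -0.5, 0.5, 2.0])
print(uni)
print(multi)
```

Because the category probabilities are differences of adjacent cumulative probabilities, they sum to one for any trait value as long as the intercepts are ordered.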
The purpose of this study was to investigate the validity of the short and long forms of the Teacher Sense of Efficacy Scale (TSES; Tschannen-Moran & Woolfolk Hoy, 2001). Participants included 549 in-service and 423 pre-service teachers from Pakistan. Confirmatory factor analysis validated the three-factor model for in-service teachers, as had been observed in other cultures. However, it did not support the one-factor model for pre-service teachers. As a follow-up, exploratory factor analysis produced three factors for the pre-service teacher groups, suggesting that a three-factor model is more appropriate for both pre-service and in-service teachers in Pakistan. The findings of this study provide significant benefits for Pakistani researchers who want to use a teacher efficacy instrument in their studies.
When constructing multiple test forms, the number of items and the total test difficulty are often equivalent. Not all test developers match the number of items and/or average item difficulty within sub-content areas. In this simulation study, six test forms were constructed having an equal number of items and average item difficulty overall. Manipulated variables were the number of items and average item difficulty within subsets of items primarily measuring one of two dimensions. Datasets were simulated at four levels of correlation (0, .3, .6, and .9). Item parameters were estimated using the Rasch and 2PL unidimensional IRT models. Estimated discrimination and difficulty were compared across forms and within subsets of items. The average unidimensional estimated discrimination was consistent across forms having the same correlation. Forms having a larger set of easy items measuring one dimension were estimated as being more difficult than forms having a larger set of hard items. Estimates were also investigated within subsets of items, and measures of bias were reported. This study encourages test developers to not only maintain consistent test specifications across forms as a whole, but also within sub-content areas.
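The data-generation step of a simulation like this one can be sketched as follows. The abstract does not state the generating model, so everything here is an assumption made for illustration: abilities are drawn from a standard bivariate normal distribution with correlation `rho`, each item loads on exactly one dimension, and responses follow a 2PL model. The function names and the item parameters are hypothetical.

```python
import math
import random


def simulate_abilities(n, rho, rng):
    """Draw n ability pairs from a standard bivariate normal with correlation rho."""
    pairs = []
    for _ in range(n):
        t1 = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        # Mixing t1 with independent noise gives unit variance and correlation rho
        t2 = rho * t1 + math.sqrt(1.0 - rho ** 2) * e
        pairs.append((t1, t2))
    return pairs


def simulate_responses(abilities, items, rng):
    """Generate 0/1 responses; items is a list of (dimension, a, b) tuples."""
    data = []
    for thetas in abilities:
        row = []
        for dim, a, b in items:
            # 2PL probability of a correct response on the item's own dimension
            p = 1.0 / (1.0 + math.exp(-a * (thetas[dim] - b)))
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return data


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2024)
# Assumed form: three easy items on dimension 0, two hard items on dimension 1
items = [(0, 1.0, -1.0), (0, 1.0, -0.5), (0, 1.0, -0.8), (1, 1.0, 0.8), (1, 1.0, 1.2)]
for rho in (0.0, 0.3, 0.6, 0.9):
    abilities = simulate_abilities(2000, rho, rng)
    responses = simulate_responses(abilities, items, rng)
    observed = correlation([t[0] for t in abilities], [t[1] for t in abilities])
    print(rho, round(observed, 2), len(responses))
```

Varying the number of items and the average difficulty assigned to each dimension in `items`, while holding the overall totals fixed, mirrors the kind of form manipulation the abstract describes.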
This study investigated the effects of estimating unidimensional latent abilities for subgroups of a population across multiple test forms in which difficulty and number of items were confounded within sub-content areas. Examinees were grouped based on their true abilities; estimates within subgroups were compared across test forms having equal length and average item difficulty overall, but differing numbers of items and/or difficulty within subsets. Examinees with differing true abilities across the two dimensions had significantly different estimated scores across forms, depending on how their true ability in each dimension aligned with the average item difficulty and the number of items within that dimension. These effects tended to decrease as the correlation between dimensions increased. The results of this study alert test developers to the need to control item specifications within sub-content areas.