[go: up one dir, main page]

Academia.eduAcademia.edu
Running Head: GENDER BIASES IN PERSON IMPRESSIONS This paper was accepted for publication in Journal of Experimental Psychology: General. This is a non-final and non-copy-edited version of the paper. Gender Biases in Impressions from Faces: Empirical Studies and Computational Models DongWon Oh1* Ron Dotsch2 Jenny Porter1 Alexander Todorov1 1Department of Psychology, Princeton University, Princeton, NJ 08544 2Department of Psychology, Utrecht University, Utrecht, The Netherlands *Correspondence: dong.w.oh@gmail.com 1 Running Head: GENDER BIASES IN PERSON IMPRESSIONS Abstract Trustworthiness and dominance impressions summarize trait judgments from faces. Judgments on these key traits are negatively correlated to each other in impressions of female faces, implying less differentiated impressions of female faces. Here we test whether this is true across many trait judgments and whether less differentiated impressions of female faces originate in different facial information used for male and female impressions or different evaluation of the same information. Using multidimensional rating datasets and data-driven modeling, we show that (1) impressions of women are less differentiated and more valence-laden than impressions of men, and find that (2) these impressions are based on similar visual information across face genders. Female face impressions were more highly intercorrelated and were better explained by valence (Study 1). These intercorrelations were higher when raters more strongly endorsed gender stereotypes. Despite the gender difference, male and female impression models – derived from separate trustworthiness and dominance ratings of male and female faces – were similar to each other (Study 2). Further, both male and female models could manipulate impressions of faces of both genders (Study 3). The results highlight the high-level, evaluative effect of face gender in impression formation – women are judged negatively to the extent their looks do not conform to expectations, not because people use different facial information across genders, but because people evaluate the information differently across genders. Keywords: face perception, social perception, social cognition, gender stereotypes 2 Running Head: GENDER BIASES IN PERSON IMPRESSIONS People effortlessly attribute traits, such as competence and emotional stability, to others based on their facial appearance (Todorov, 2017; Todorov, Olivola, Dotsch, & Mende-Siedlecki, 2015). These impressions of traits affect a variety of important real-world outcomes, which range from voting behavior (Antonakis & Dalgas, 2009; Lenz & Lawson, 2011; Little, Burriss, Jones, & Roberts, 2007; Olivola & Todorov, 2010; Todorov, Mandisodza, Goren, & Hall, 2005), to court decisions (Blair, Judd, & Chapleau, 2004; Eberhardt, Davies, Purdie-Vaughns, & Johnson, 2006; Wilson & Rule, 2015; Zebrowitz & McDonald, 1991), to mating choices (Cooper, Dunne, Furey, & O'Doherty, 2012). Because the trait impressions are highly intercorrelated, by examining the relations among the perceived traits, researchers can succinctly describe the structure of face impression formation (Oosterhof & Todorov, 2008; Sutherland et al., 2013), and then using the impression structure, predict impressions on multiple traits and reveal the facial information at the basis of these impressions (Oh, Buck, & Todorov, 2019; Todorov, Dotsch, Porter, & Oosterhof, 2013; Walker & Vetter, 2009; 2016). There is a large body of research on the structure underlying the relations between traits in impressions (Fiske, Cuddy, Glick, & Xu, 2002; Imhoff & Koch, 2016; Koch, Imhoff, Unkelbach, & Alves, 2016; Oosterhof & Todorov, 2008; Osgood, Suci, & Tannenbaum, 1957; Rosenberg, Nelson, & Vivekananthan, 1968; Wiggins, 1979). In face-based first impressions, the impressions are reducible to a small number of summary dimensions – valence and physical power, approximated by judgments of trustworthiness and dominance (Oosterhof & Todorov, 2008; Walker & Vetter, 2016; but see Sutherland et al. (2013) which found three dimensions, including the new dimension of attractiveness). Importantly, the structure of impressions may vary across meaningful subcategories of 3 Running Head: GENDER BIASES IN PERSON IMPRESSIONS faces, such as men and women. However, previous research on face impressions has mostly neglected potential gender differences, implicitly assuming the same structure of impressions across genders. This assumption is inconsistent with both empirical evidence and theoretical reasoning. Gender Biases in Impressions Empirical data suggest that impressions from facial appearance are more highly correlated for women than for men. Trustworthiness and dominance impressions, for instance, are negatively correlated for female faces, but not for male faces (Sutherland, Young, Mootz, & Oldmeadow, 2015). Dominant female faces are perceived more negatively than non-dominant female faces, non-dominant male faces, and dominant male faces. These findings are inconsistent with the existing model of face trait attribution, which assumes that the two summary dimensions of valence/trustworthiness and power/dominance are orthogonal to each other (see Rosenberg et al. (1968) for the model of nonvisual person perception, where these dimensions are correlated). Given the high correlations of these two trait impressions with other trait impressions (Oosterhof & Todorov, 2008; Sutherland et al., 2013), it is likely that face impressions overall are more highly intercorrelated for women than for men.1 The idea of less differentiated face impressions of women aligns well with the rich literature on gender stereotypes. People expect men and women to be and behave in 1 Trustworthiness, in social-perception and gender-study literatures, is also referred to as communion (Abele, 2003; Cislak & Wojciszke, 2008), warmth (Fiske et al., 2003), or approachability (Sutherland et al., 2013). These concepts are highly similar in people’s minds (e.g., Sutherland, Oldmeadow, & Young, 2016). To avoid confusion, the current article uses the word valence to represent the positivity/negativity in social evaluation. 4 Running Head: GENDER BIASES IN PERSON IMPRESSIONS certain ways (e.g., women to be more submissive, dependent, and gentle than men; Bem, 1974; I. K. Broverman, Vogel, Broverman, Clarkson, & Rosenkrantz, 1972; Prentice & Carranza, 2002; Spence, Helmreich, & Holahan, 1979; Spence, Helmreich, & Stapp, 1975). These beliefs are widely held across cultures (Williams & Best, 1990) and difficult to change (Prentice & Carranza, 2004). Although the gender-associated expectations are held for both men and women, there are a larger number of traits that are considered typical or desirable for women than for men (Prentice & Carranza, 2002), including valence-related traits such as kindness and friendliness (Heilman, 2001; Rudman, 1998; Rudman & Glick, 2001). Relatedly, women are evaluated positively to the extent that they conform to the stereotypes associated with them (benevolent sexism; Glick & Fiske, 1996; Glick et al., 2000) unlike men, who are freer from the normative boundaries of stereotypes. If this principle applies to facial impressions, women whose appearance suggest traits inconsistent with the stereotypes (e.g., a woman with a face that makes other people intuitively judge that she has a domineering personality) are likely to be evaluated more negatively than men with the same degree of stereotype-inconsistency in their appearance. It would follow that in females’ impressions, more traits should be correlated with valence than in males’ impressions, consistent with the previously found negative correlation between perceived facial dominance and trustworthiness for women (Sutherland et al., 2015). For these reasons, we expect less differentiated face impressions for female than for male faces. Notably, the social cognition theories mentioned above (e.g., benevolent sexism) are about what behavior and traits people expect of others, and do not make a direct prediction about how people evaluate faces. However, facial appearance serves as a source of trait 5 Running Head: GENDER BIASES IN PERSON IMPRESSIONS inferences that in people’s minds are predictive of behavior: People effortlessly judge a person’s traits from their face with a high level of within- and cross-rater reliability and act on these inferences (for review, see Todorov et al., 2015). Moreover, some facial appearances reliably lead to trait inferences that are stereotypical (or counterstereotypical) given the person’s social category. In this article, we refer to such an appearance as a “stereotypical (or counterstereotypical) appearance”. For example, a woman with masculine facial features (e.g., strong chin) is inferred to be dominant, and thus could be described as having a “counterstereotypical appearance” in this sense. These kinds of inferences are influenced by both low-level visual information (e.g., masculine facial features) and high-level category inferences (e.g., gender expectations such as women are not dominant). The Present Research Using dimensionality reduction and computational modeling, the current paper examines the principles behind the gender difference in facial impressions, expanding prior research in two ways. First, by examining the degree of intercorrelations between the ratings of face impressions, we investigate whether and to what extent female face impressions are less differentiated than male face impressions. Across three datasets, we find that female impressions are indeed less differentiated. Further, by measuring a rater characteristic related to social perception and expectations, namely, the degree to which the raters endorse gender stereotypes, we test whether the degree of impression differentiation is related to how much raters endorse gender stereotypes. Specifically, if the gender-related difference in impression differentiation stems from gender stereotypes as 6 Running Head: GENDER BIASES IN PERSON IMPRESSIONS we argue, then perceivers who endorse gender stereotypes more strongly would show less differentiated impressions of faces. Second, by building separate data-driven computational models of impressions (Oosterhof & Todorov, 2008) of male and female faces and cross-validating the models across face genders, we investigate what is at the basis of the less differentiated impressions for women. Specifically, the gender difference in impression differentiation can stem from either (1) the same visual information used differently across genders to form an impression or (2) different visual information used to form the impression. For facial attractiveness, for example, the same visual information is used across genders but has the opposite evaluative outcome, in which masculine face reflectance increases the attractiveness of men but decreases the attractiveness of women (Said & Todorov, 2011). This supports the first hypothesis. However, it is yet to be tested whether this would generalize to key face impressions such as trustworthiness and dominance. A data-driven face model of a trait impression (e.g., competence) represents what visual information people use to form the impression (Dotsch & Todorov, 2012; Jack & Schyns, 2017; Todorov, Dotsch, Wigboldus, & Said, 2011). If people use the same information when forming impressions of male and female faces (e.g., masculine facial properties to infer dominance), then the models of male and female impressions should be similar. Such a result would imply that differences between male and female impression differentiation are caused by a different evaluative process resulting from gender categorization. Categorizing a face as female, for example, would lead to stronger correlations between impressions that reflect female gender stereotypes. On the other 7 Running Head: GENDER BIASES IN PERSON IMPRESSIONS hand, if people use different information when forming impressions of male and female faces, then models of male and female impressions should be different. We can test these possibilities by (1) looking directly at the similarity of the male and female models and (2) cross-validating the models of impressions on novel male and female faces. To the extent that the models for male and female faces are similar, the models would be highly correlated, and their effects on impressions would be similar irrespective of whether they are applied to male or female faces (e.g., impressions of the trustworthiness of a novel female face would be similar whether the face is manipulated by a male or a female model of trustworthiness). In contrast, even if the models for male and female faces are highly correlated, to the extent that they are based on different information, they would have different effects on impressions depending on whether they are applied to a male or a female face (e.g., impressions of the trustworthiness of a novel female face would be more successfully manipulated by a female than by a male model of trustworthiness). Before reporting individual studies, we would like to make a clear distinction between (a) the outcome of social perception, and the potential (b) low- and (c) high-level mechanisms underlying this outcome. In the case of the gender difference in facial impression differentiation, Study 1 investigates whether female impressions are less differentiated and more valence-laden than male impressions, findings that would imply counterstereotypical looks in women are negatively evaluated (the outcome). Facial appearance per se admittedly cannot be stereotypical or counterstereotypical, because a stereotype is described and prescribed at the level of traits and behaviors. However, as explained above, facial appearance in people’s minds serves as a reliable source of trait 8 Running Head: GENDER BIASES IN PERSON IMPRESSIONS inferences (e.g., assertiveness, tenderness) and expectations of behaviors (e.g., loud voice, gentle bodily gestures). Studies 2 and 3 investigate the potential mechanisms underlying the findings (less differentiated impressions in women’s faces): Do people use different visual information when forming male and female facial impressions (a low-level mechanism) or do people interpret visual information differently across genders (a highlevel mechanism)? Our findings suggest it is the latter. Study 1: Impression Differentiation in Male and Female Faces In Study 1, we compare two measures between genders to assess potential differences in the level of impression differentiation – (1) the degree to which multiple impressions in each gender are intercorrelated and (2) the degree to which specific impressions are explained by general valence of impressions. We expected less differentiated face impressions for women than for men, expressed in (1) a stronger correlation between specific trait impressions for female than for male faces, and (2) a larger variance explained by the first principal component (PC1) for female than for male faces to the extent that PC1 captures the valence of impressions. Study 1a: Reanalysis of Oosterhof & Todorov (2008) In Study 1a, we analyze preexisting rating datasets of male and female face images. In the original work, Oosterhof and Todorov (2008) conducted a PCA on the rating dataset without consideration of face gender, and found that the first two principal components, which could be interpreted as valence and power, accounted for over 80% of the variance. Methods 9 Running Head: GENDER BIASES IN PERSON IMPRESSIONS Participants. Three-hundred-and-one Princeton University undergraduate students were recruited by Oosterhof and Todorov (2008), and participated in the trait rating experiments for partial course credit or cash. Stimuli. Sixty-six (33 males, 33 females) naturalistic face photos with direct gaze and resting expressions were used (Lundqvist, Flykt, & Arne, 1998). The individuals in the photos were white amateur actors between the ages of 20–30 with no facial hair, earrings, eyeglasses, or visible make-up, all wearing grey T-shirts. Procedure. Participants rated 33 male and 33 female face photos on 14 traits – how aggressive, attractive, caring, confident, dominant, emotionally stable, intelligent, mean, responsible, sociable, threatening, trustworthy, unhappy, or weird each individual looked. These traits were selected due to their empirical and theoretical importance: The traits (except for dominance) explained about 68% of unconstrained, spontaneous person descriptions from face images (Oosterhof & Todorov, 2008). Dominance was included because of its importance in personality perception (Wiggins, 1979). To collect the face impression ratings, different groups of participants were assigned to form impressions of all 66 faces on a single trait (nrater ≥ 18). That is, each participant rated the faces on a single trait. Participants were told that the study was about first impressions and were encouraged to rely on their “gut feeling.” The faces were presented and rated in three separate blocks to reduce the measurement error for each participant’s answers. The average face rating of each participant served as the measure of their evaluation on the respective trait. This procedure also allows for screening out unreliable raters – those who show zero or negative test-retest within-rater reliability as calculated between ratings in different blocks. 10 Running Head: GENDER BIASES IN PERSON IMPRESSIONS Each face image was presented in color at the center of the screen (220 × 298 pixels with the height of the face being about 206 pixels) with a question above the face (“How [trait term] is this person?”) and a response scale below the face (“1 Not at all [trait term] 9 Extremely [trait term]”). Each face was visible until the participant responded, the intertrial interval (ITI) was 1,000 ms, and the order of faces was randomized. All 14 trait ratings showed moderate to high interrater agreement (rmin = .26) and interrater reliability (αmin = .90). To obtain impression measures for each face, the ratings on the 14 traits were averaged across raters. The mean rating dataset is available at Open Science Framework: https://osf.io/ycv72/. We conducted two main analyses to test for gender differences in the level of differentiations in impressions. In the first analysis, we calculated the extent to which traits are correlated to one another in each gender, and compared the level of correlations between ratings of male and female faces. Specifically, we computed pairwise correlational coefficients among all 14 trait ratings ( "#!$ = 91 pairs) separately for male and female faces (Fig. 1A, top). Because we were interested in contrasting the average strength of interimpression correlation between face genders, we converted all coefficients into positive values. To test whether female facial impressions were more strongly correlated, or less differentiated, to each other than male facial impressions were, we conducted a test of the matrix equality between the two gender-specific matrices (Jennrich, 1970). Higher absolute values of the correlational coefficients for female than male faces, along with a significant difference in the correlation matrices, would implicate higher dependency between perceptions of traits for female faces. 11 Running Head: GENDER BIASES IN PERSON IMPRESSIONS In the second analysis, we z-transformed the average trait ratings within each trait in each face gender, and then subjected the ratings to an orthogonal PCA for each gender. No rotation was made. We reported and visualized the components with eigenvalues bigger than 1 (the Kaiser rule), following the original study by Oosterhof and Todorov (Table S1, Supplemental Text). To test whether female facial impressions were more driven by valence than male facial impressions, we compared the amount of the variance explained by PC1 for each gender. A larger variance explained by PC1 would implicate greater dependency of impressions on valence to the extent that PC1 is loaded highly on by valenced impression ratings (e.g., “responsible”, “mean”). 12 Running Head: GENDER BIASES IN PERSON IMPRESSIONS Figure 1. Correlational analyses and principal component analysis (PCA) results for male and female faces in Study 1. Female impressions are less differentiated than male impressions (A), and female impressions load more highly on PC1 (valence) than male impressions do (B). For the correlational analysis, a Pearson correlational matrix was computed between all trait pairs within each face gender. The color of each cell represents the strength of the correlation (darker: stronger correlation) (A). For PCA, an orthogonal PCA was conducted for each face gender. The magnitude of each bar and the number on each bar represents the loading of the respective trait on PC1 (left) and PC2 (right) for each 13 Running Head: GENDER BIASES IN PERSON IMPRESSIONS face gender. The traits are sorted in the order of the loading strength in male faces ratings (B). Results and Discussion Consistent with the PCA results collapsing across male and female faces (Oosterhof & Todorov, 2008), the gender-specific PCAs revealed two key components (see Table S1 for the eigenvalues and the variance explained by each component). For both genders, PC1 was highly loaded on by all positive (e.g., trustworthy, responsible) or negative traits (e.g., threatening, weird; Fig. 1B, top). This is consistent with previous models of face impressions, in which the first component is summarized as valence (Oosterhof & Todorov, 2008) or approachability (Sutherland et al., 2015). New to our data, female face impressions were less differentiated and more valenceladen than male face impressions (Figs. 2A and 2B, top): When the cross-impression correlational matrices were compared between face genders using their absolute values (Jennrich test of matrix equality), the correlation was significantly different, with female ratings being more strongly intercorrelated (M|r| = 0.68, SD|r| = 0.19) than male ratings (M|r| = 0.55, SD|r| = 0.24; χ2(91) = 384.37, P < .001; Fig. 2A, top), indicating a higher level of dependency between impressions of female faces. This is consistent with a visual inspection of the PCA solutions: The trait loadings on PC1 have bigger absolute values for female than for male faces and the loadings on PC2 have smaller absolute values for female than for male faces, as shown by longer and darker bars for female faces on average (Figs. 1B, top). Correspondingly, the amount of variance explained by PC1, a proxy for valence, was larger for female than male ratings (71.69% vs. 58.40%; Fig. 2B, top; Table S1), 14 Running Head: GENDER BIASES IN PERSON IMPRESSIONS indicating a higher level of dependency of impressions on overall positivity/negativity in female face impressions. Figure 2. The level of the intercorrelations of trait impressions of men and women (A) and the amount of the variance explained by PC1 and PC2 in the impressions (B). In each study, face-level correlational analyses were conducted between impression ratings separately for male and female face images, and absolute values of the coefficients were 15 Running Head: GENDER BIASES IN PERSON IMPRESSIONS compared across genders. Each dot corresponds to the absolute value of the coefficient of the correlation between an impression pair (e.g., the “threatening” and the “unhappy” ratings). The violin plots show the distribution of the values in each face gender, and the dots on the side the raw values. The lower and upper hinges of each box correspond to the 25th and 75th percentiles. The black bar in each box denotes the median. A higher Y value represents a lower level of differentiation (or a higher level of intercorrelation) between trait impressions. Across studies, the impressions of women are significantly more highly intercorrelated than the impressions of men (Ps < .001; A). In each study, a PCA was conducted separately for the impressions of male and female images. A higher y value on PC1 represents a stronger relationship between valence of impressions and specific impressions. Across studies, the impressions of women are more valence-laden than the impressions of men (B). PCA = principal component analysis. Study 1b: Analysis of a New Dataset Study 1a revealed that impressions of women are less differentiated and more valence-laden than impressions of men. The objective of Study 1b was to replicate these findings and to test whether the differentiation in impressions is related to relevant stereotypes held by perceivers. Specifically, we expected that the more strongly a perceiver endorses conventional beliefs about genders, the more likely their ratings will show (1) less differentiated and (2) more highly valence-laden impressions. However, whether the effect of stereotype endorsement is stronger for impressions of women’s than men’s faces or independent of the effect of the face gender on impression differentiation is unclear. To 16 Running Head: GENDER BIASES IN PERSON IMPRESSIONS test these hypotheses, we collected new impression ratings of male and female faces and measured participants’ level of gender stereotype endorsement. We also tested whether the gender of participants is related to the gender difference in facial impression differentiation. Because the rater gender was missing in the dataset used in Study 1a, we could not include it in the analyses of Study 1a. In Study 1b, we recorded participants’ gender. A possibility is that male raters will show more simplified, valence-laden impressions of women than of men, given prior studies showing stronger endorsement of gender stereotypes by male than female raters (Glick & Fiske, 1996; Swim, Aikin, Hall, & Hunter, 1995; Williams & Best, 1990). Another possibility is that male and female raters will form similar impressions across face genders, given prior studies showing no effect of participant gender on gender-stereotyping (Costrich, Feinstein, Kidder, Marecek, & Pascale, 1975; Deaux & Lewis, 1984; Eagly & Steffen, 1984; Goldberg, 1968; Hagen & Kahn, 1975; Moss-Racusin, Dovidio, Brescoll, Graham, & Handelsman, 2012). A final possibility is that female raters will show more simplified, valence-laden impressions of women than of men, given prior studies showing a higher level of genderstereotyping (e.g., negative evaluation of women with counterstereotypical traits) by female than male raters (Garcia-Retamero & López-Zafra, 2006; Goldberg, 1968; ParksStamm, Heilman, & Hearns, 2007; Rudman, 1998). Methods Participants. Five-hundred-thirty-six online workers living in the U.S. (258 males, 278 females, 1 other gender) participated through Amazon Mechanical Turk (MTurk) for monetary reward. Required participant number for each trait was estimated from the 17 Running Head: GENDER BIASES IN PERSON IMPRESSIONS interrater reliabilities from Study 1a so that Cronbach’s α of the ratings would reach .90 for both male and female faces. Stimuli. Sixty-six (33 males, 33 females) face photos of white individuals used in Study 1a were employed again (Lundqvist et al., 1998). Procedure. The 14 traits rated in Study 1a were rated by a new group of participants. As in Study 1a, different groups of participants were assigned to form impressions of all 66 faces on a single trait (nrater ≥ 11). Participants were given the same instructions as in Study 1a. We asked each participant to judge the faces on a single trait in order to make the rating task design identical with Study 1a’s, avoid participant fatigue, and reduce a possible inflation of correlations between trait ratings (in contrast to a procedure where participant rate the same faces on multiple traits). We kept the same participants from participating in more than one task (e.g., participating in both the “aggressive” rating and the “dominant” rating tasks) using MTurk features. The face stimuli were presented in color at the center of the screen (369 × 500 pixels with the height of the face being about 345 pixels) and were rated twice in separate blocks to reduce the measurement error for each participant’s answers: Each participant’s ratings were averaged across blocks. As in Study 1a, we calculated each rater’s intra-rater reliability by correlating their ratings from different blocks. The ratings from participants with zero or negative reliability were excluded, which left us with 469 participants’ responses (235 males, 233 females, 1 other gender). Each face image was presented at the center of the screen with a question (“How [trait term] is this person?”) and a response scale below the face (“1 Not at all [trait term] 9 Extremely [trait term]”). Each face was visible until the participant responded, the ITI was 1,000 ms, and the order of faces was randomized. All 14 trait ratings showed moderate 18 Running Head: GENDER BIASES IN PERSON IMPRESSIONS to high interrater agreement (rmin = .32) and interrater reliability (αmin = .81). The mean rating dataset is available at Open Science Framework: https://osf.io/ycv72/. The same two analyses as in Study 1a were conducted (i.e., the correlational analyses and the PCAs) on the ratings averaged across raters to test for gender differences in the differentiation of impressions. First, we compared the degree of the intercorrelations across impressions between genders. Correlational coefficients between the ratings of every trait pair among all 14 traits ( "#!$ =91 pairs) were calculated for each gender. Second, we compared the amount of variance explained by PC1, a proxy of valence. To test for the effects of the raters’ gender and their gender stereotype endorsement, at the end of the study participants were asked to report their gender and to complete a questionnaire regarding gender stereotype endorsement (GSE) that measured the extent to which they agreed with conventional gender stereotypes (Cundiff & Vescio, 2016; Eagly & Mladinic, 1989). Each question (“How do the average man and the average woman compare with each other on how [trait term] they are?”) was presented with a response scale (“1 Men extremely more - … - 5 No difference - … - 9 Women extremely more”). The trait terms were 20 words describing traits either considered stereotypically female and positive (e.g., nurturing), female and negative (e.g., whiny), male and positive (e.g., competitive), or male and negative (e.g., egotistical; Table S2 for the list). The valence and the gender stereotypicality of these words have been validated (Eagly & Mladinic, 1989; Spence et al., 1979) and the GSE level measured using these words was shown to predict relevant individual characteristics, such as political orientation (Cundiff & Vescio, 2016; Eagly & Mladinic, 1989). 19 Running Head: GENDER BIASES IN PERSON IMPRESSIONS Responses to every question in the questionnaire were significantly different from the middle score (5 in the range of [1,9]) in the stereotype-consistent direction (ts > 7.02, Ps < .001; Fig. S1). Half the responses were reverse-coded (n = 10) so a larger value meant a higher level of GSE. We replaced each missing value with the weighted response averaged across 10 raters whose other responses were the most similar to the rater’s (the k-nearest neighbor imputation). Inter-rater similarity in the responses was determined by the Euclidean distance. The GSE questionnaire showed high internal consistency across questions (α = .90). Similarly, every item showed a moderately high item-whole correlation (e.g., the correlation between each trait question and the whole questionnaire; Mr = 0.59). Based on a high internal consistency across items, we used the sum of the responses to all items as the index of each rater’s GSE (“the GSE score”). The GSE score had a possible range of [20,180] with a higher score indicating higher level of stereotype endorsement. To test whether the raters who more strongly endorsed gender stereotypes showed less differentiated facial impressions, we used the rater GSE score to predict the two indices of impression differentiation from the main analyses per face gender – (1) the mean absolute value of inter-impression correlational coefficients and (2) the variance explained by PC1. Because we asked each participant to rate every face image only on a single trait (as opposed to all traits), we could not calculate the two indices for each participant. Instead, we subgrouped participants based on an overlapping participant window on the GSE score (see Results for details). We then predicted the two indices using the subgroup’s GSE score as a predictor. Results and Discussion 20 Running Head: GENDER BIASES IN PERSON IMPRESSIONS Before reporting the main results, we report how the present data replicate the findings of Oosterhof and Todorov (2008). When the impression ratings were collapsed across face genders, PC1 and PC2 accounted for over 80% of the variance in the ratings newly collected for Study 1b (65.57% and 19.90%, respectively). PC1 and PC2 were loaded highly on by trustworthiness (.95) and dominance ratings (.94), respectively. This replicates Oosterhof and Todorov (2008), where PC1 and PC2 explained over 80% of the variance in the ratings (63.3% and 18.3%, respectively) and were loaded highly on by trustworthiness (.92) and dominance ratings (.87). When the impression ratings were separated by the face gender, again, the PCA solution of each gender replicated Oosterhof and Todorov (2008). Specifically, between the 2008 data and the current data, the component loadings of the impressions were highly similar for both male (R = .97) and female face impressions (R = .98, Supplemental Text). In sum, these findings demonstrate high stability of face impressions as the ratings in Oosterhof and Todorov (2008) were collected over ten years ago. As in Study 1a, for both genders PC1 was highly loaded on by all positive or negative traits (Fig. 1B, middle). This replicates previous models of face impressions (Oosterhof & Todorov, 2008; Sutherland et al., 2013) and Study 1a. Again, female face impressions were less differentiated and more valence-laden than male face impressions. First, the correlational coefficients were significantly different, with female ratings being more strongly intercorrelated (M|r| = 0.63, SD|r| = 0.24) than male ratings (M|r| = 0.58, SD|r| = 0.26; χ2(91) = 3145.10, P < .001; Fig. 2A, middle), indicating a higher level of dependency between impressions in female face impressions. Second, PC1 explained a larger amount of 21 Running Head: GENDER BIASES IN PERSON IMPRESSIONS variance in female face ratings than in male face ratings (67.94% vs. 61.66%; Fig. 2B, middle; Table S1). These findings replicate the findings of Study 1a. Role of Raters’ Gender Stereotypes. To examine how the rater’s gender stereotypes were related to impression differentiation, we repeated the two analyses (i.e., the correlational analyses and PCAs) for each face gender using responses of multiple rater subsets. We varied the rater subset according to their GSE score. Specifically, starting from the raters who endorsed gender stereotypes the least (i.e., raters with the lowest GSE score, [75,134]) through the raters who endorsed gender stereotypes the most (i.e., raters with the highest GSE score, [119,168]), we subgrouped raters based on their GSE score. We performed the analyses using the data from the raters in a window of GSE score with a fixed window width of 49. We then slid the window by 1 GSE score (i.e., increased the start and end points of the window by 1), repeating the correlational analysis and PCA on the ratings averaged across raters per trait in each window. This sliding window’s width, the starting point, and the end point were determined so that each window had ≥ 10 raters per impression trait (minn = 245, maxn = 410 in total across traits per window). We used overlapping windows of raters (rather than nonoverlapping participant subgroups with various levels of GSE) in order to generate a dependent variable that is continuous. The same analyses with two participant groups divided by the median GSE score (i.e., low- and high-GSE groups) yielded identical results (Fig. S2). In order to (1) statistically compare impression differentiation indices across face genders, (2) compute confidence intervals (CIs) of each index per face gender, and (3) control for the number of raters across the sliding windows, we randomly selected 10 raters’ raw rating responses per trait from each window (which had 10 or more raters). 22 Running Head: GENDER BIASES IN PERSON IMPRESSIONS The raw trait ratings were averaged across the 10 raters per sample for each impression per face image. Using the average ratings per face image, we then calculated (1) the interimpression correlational coefficients and (2) the amount of variance explained by PC1 in the impression ratings, a proxy for impression valence, per face gender. We repeated the rater selection and index calculation 1,000 times for each rater window as a means of bootstrapping to estimate the 95% CIs. To understand the relationship between the rater GSE and the impression differentiation, we ran linear and quadratic regressions predicting the mean correlational coefficient and the amount of variance in male and female ratings explained by PC1 using the rater subgroup GSE score as the predictor. The GSE score was higher than the absolute middle score of the questionnaire, 100 in the range of [20,180] (M = 121.11, SD = 17.37; t(468) = 26.33, P < .001), indicating that participants on average endorsed gender stereotypical beliefs (e.g., “men are more likely to be aggressive than women.”, “women are more likely to be emotional than men.”). Male raters showed a higher level of gender stereotype endorsement (n = 235, M = 124.71, SD = 19.64) than female raters (n = 233, M = 117.58, SD = 13.85; t(420.83) = 4.54, P < .001). Importantly, when the rater GSE score increased, the correlations between impression ratings increased too for both male and female faces (Fig. 3A). Although the linear regression model was significant for the ratings of both genders (male faces: R2 = .78, F(1,43) = 154.81, P < .001; female faces: R2 = .75, F(1,43) = 130.63, P < .001), the quadratic model (male faces: R2 = .82, F(2,42) = 93.57, P < .001; female faces: R2 = .89, F(2,42) = 162.07, P < .001) explained significantly more variance than the linear model did (male faces: F(1,43) = 7.81, P = .008; female faces: F(1,43) = 48.68 P < .001). Correspondingly, the 23 Running Head: GENDER BIASES IN PERSON IMPRESSIONS amount of variance explained by PC1 in the ratings followed the same quadratic pattern of change across the GSE score: While the linear regression model was significant for both genders (male faces: R2 = .82, F(1,43) = 191.10, P < .001; female faces: R2 = .78, F(1,43) = 152.54, P < .001), the quadratic model (male faces: R2 = .84, F(2,42) = 110.62, P < .001; female faces: R2 = .89, F(2,42) = 175.10, P < .001) explained a significantly larger amount of variance than the linear models did (male faces: F(1,43) = 6.35, P = .016; female faces: F(1,43) = 44.25, P < .001). However, for both measures the increase was largely monotonic (Fig. 3) and the magnitude of the linear effect was larger than the magnitude of the quadratic effect. Across all rater GSE scores, female face ratings had higher correlational coefficients (Fig. 3A, ts > 15.15, Ps < .001) and larger amount of variance explained by PC1 than male face ratings (Fig. 3B, ts > 25.70, Ps < .001). We obtained consistent results when we divided the rater GSE level into four factor scores, each of which represented the level of endorsement for stereotypes about male × positive, male × negative, female × positive, or female × negative traits (Fig. S3, Supplemental Text). Taken together, these results show that those who more strongly endorsed gender stereotypes were more likely to form less differentiated impressions of both male and female faces. However, irrespective of this effect, raters were more likely to show less differentiated impressions of female faces than of male faces. 24 Running Head: GENDER BIASES IN PERSON IMPRESSIONS Figure 3. The level of intercorrelations across impressions (A) and the amount of variance in the impressions explained by valence (B) as a function of the raters’ gender stereotype endorsement (GSE) score. Each data point (A: the absolute value of correlational coefficients between all facial impression rating pairs, B: the amount of variance explained by PC1 in the PCA of facial impression ratings per gender) was calculated from a rater subgroup (nrater = 10 per trait, nrater = 140 in total per subgroup). Each subgroup was sampled from a sliding window on the rater GSE score (nrater ≥ 10 per trait), in which the X value is the middle point of the window. The shaded regions show 95% CIs estimated from 1,000 bootstrapped replications per data point. The intercorrelations of face impressions (ts > 15.15, Ps < .001) and the variance explained by PC1 were always significantly higher in female than in male impressions across the GSE score (ts > 25.70, Ps < .001). The vertical dotted line at X=100 represents no GSE bias. PCA = principal component analysis. CI = confidence interval. 25 Running Head: GENDER BIASES IN PERSON IMPRESSIONS Role of Raters’ Gender. To examine how the rater gender was related to differences in impression differentiation of male and female faces, we calculated correlational coefficients across impression ratings of male and female faces, separately for male and female raters. We conducted tests of matrix equality for any difference in the correlational coefficient matrices using the coefficients absolute values, across rater genders and face genders. For both male and female raters, female impressions were less differentiated than male impressions: The female face ratings were more strongly intercorrelated than male face ratings in male raters (male faces: M|r| = 0.50, SD|r| = 0.25; female faces: M|r| = 0.51, SD|r| = 0.21; χ2(91) = 1259.50, P < .001) and female raters (male faces: M|r| = 0.56, SD|r| = 0.25; female faces: M|r| = 0.61, SD|r| = 0.21; χ2(91) = 913.18, P < .001). For both male and female faces, female raters showed more strongly intercorrelated impressions of faces than male raters did (male faces: χ2(91) = 21840.47; female faces: χ2(91) = 3264.53). We obtained consistent results using a 2 [face gender] × 2 [rater gender] ANOVA (Supplemental Text). To assess how the rater gender was related to the valence dependency of impressions, we conducted PCAs separately for male and female faces, this time, using the mean impression ratings of male raters and female raters. For both male and female raters, PC1 explained more variance in female than male face impressions (male raters: 55.72% vs. 54.18%; female raters: 65.69% vs. 60.77%). Taken together, these findings suggest that female raters showed less differentiated impressions of faces, especially for female than for male faces. Study 1c: Analysis of Ma et al. (2015) Studies 1a and 1b used the same face images. To test the robustness of the results of the previous studies, in Study 1c we run the same analyses (i.e., correlational analyses, 26 Running Head: GENDER BIASES IN PERSON IMPRESSIONS PCAs) on a preexisting face rating dataset involving different sets of face images, impressions, and participants (Ma, Correll, & Wittenbrink, 2015). Methods Participants. For the impression trait ratings previously collected by Ma and colleagues (2015), over 1,087 participants (≥ 308 males, ≥ 552 females, ≥ 227 unreported) had rated face images on various traits. Rater gender was not included in the norming data set and was not included in our analyses. Stimuli. Naturalistic face photos with direct gaze and resting expressions were used. The individuals were amateur actors with no facial hair, earrings, eyeglasses, or visible make-up, all wearing grey T-shirts: 597 photos of 109 self-identified Asian (57 females), 197 Black (104 females), 108 Latino (56 females), and 183 white actors (90 females) between the ages of 17–65 were used (Ma et al., 2015). Non-white faces were included (69% of all faces), unlike Studies 1a and 1b. Prior work showed that facial evaluation space is largely common across different face races (e.g., Sutherland, Liu, Zhang, Chu, Oldmeadow, & Young, 2017a). Procedure. Participants rated 290 male and 307 female face photos on 15 traits – how afraid, angry, attractive, babyfaced, disgusted, dominant, feminine, happy, masculine, racially prototypical, sad, surprised, threatening, trustworthy, or unusual each individual looked. Ma et al. (2015) asked each participant to form impressions of individuals from photos on multiple attributes (e.g., “Consider the person pictured above and rate him/her with respect to other people of the same race and gender. – Fearful/Afraid (1=Not at all; 7=Extremely)”). As in Studies 1a and 1b, two analyses were conducted to test the gender differences in face impressions. First, to compare the in/dependency of perceived traits 27 Running Head: GENDER BIASES IN PERSON IMPRESSIONS between genders, we contrasted the absolute values of the coefficients between male and female face ratings. We calculated pairwise correlational coefficients among all 15 trait ratings ( "%!$ =105 pairs) for each gender. Second, to test whether raters showed less differentiated impressions for female than male faces, we calculated the amount of the variance explained by the first component for each gender. Results and Discussion Female face impressions were, again, less differentiated and more valence-laden than male face impressions. As in Studies 1a and 1b, the correlational coefficients were significantly different, with female ratings being more strongly intercorrelated (M|r| = 0.33, SD|r| = 0.23) than male ratings (M|r| = 0.27, SD|r| = 0.21; χ2(105) = 2223.24, P < .001; Fig. 2A, bottom), indicating a higher level of dependency between traits in female face impressions. As in Studies 1a and 1b, for both genders PC1 was highly loaded on by all positive or negative traits (Fig. 1B, bottom). Correspondingly, the amount of variance explained by PC1, a proxy of valence, was larger for female than male ratings (40.87% vs. 31.60%; Fig. 2B, bottom; Table S1), indicating a higher level of dependency of female ratings on valence. The 15 traits used in Study 1c were different from the 14 traits rated in Studies 1a and 1b. The traits in Studies 1a and 1b were determined based on the frequency of mention in unconstrained verbal descriptions of face images. This difference may explain the small difference in the results between the studies (e.g., overall lower level of inter-trait correlation in Study 1c). All in all, the results in Study 1c replicate what we found in Studies 1a and 1b: Face impressions of women are less differentiated and are more highly valenceladen than those of men. 28 Running Head: GENDER BIASES IN PERSON IMPRESSIONS Studies 2 and 3: Computational Models of Male and Female Face Impressions Study 2: Building Gender-Specific Impression Models Study 1 showed that face impressions of women are less differentiated, and are more highly valence-laden than those of men. Studies 2 and 3 examine the basis of this phenomenon: How is the gender difference in impressions related to people’s use of facial information when forming impressions (e.g., using facial masculinity to form an impression of dominance) of men and women? Does the gender difference stem from (1) earlier, lower-level, differences in perception of male and female faces or (2) later, higher-level, evaluative differences? Specifically, people may either (1) use different facial information when forming impressions of men and women or (2) use the same facial information when forming impressions of both men and women but evaluate this information differently. We built and validated impression models to test these possibilities. We model impressions for each gender rather than modeling them collapsed across genders as in previous work (Oosterhof & Todorov, 2008). To investigate the extent to which people use dis/similar facial information to form impressions of men and women, we built data-driven models of impressions for male and female faces and calculated the similarities between these gender-specific face impression models. Data-driven face modeling reveals facial information that correlates with an impression with little prior assumptions of what information matters (Funk, Walker, & Todorov, 2016; Jack & Schyns, 2017; Oosterhof & Todorov, 2008; Todorov et al., 2011; Todorov & Oosterhof, 2011; Walker & Vetter, 2009; 2016). In prior work (Oosterhof & Todorov, 2008), participants rated randomly generated faces on a trait (e.g., how competent they looked). The faces were generated from a multidimensional, statistical face 29 Running Head: GENDER BIASES IN PERSON IMPRESSIONS space, where each face is represented as a point in this space. In this approach, each impression model is a vector in the space that visualizes holistic changes in facial appearance that make the face appear more trait-like (e.g., more competent). All prior computational models were built irrespective of the gender of faces. Here, we built models of impressions of trustworthiness and dominance separately for male and female faces. Methods Participants. Five-hundred-and-ten MTurk online workers living in the U.S. (233 males, 256 females, 21 unreported) participated for monetary reward. Stimuli and Procedure. Previous data-driven computational models of face impressions were based on trait impression ratings of randomly generated faces from a multidimensional, statistical face space (Oosterhof & Todorov, 2008). To build new models separately for male and female faces, we generated 301 male faces and 301 female faces with FaceGen 3.2 (Singular Inversions). The FaceGen model is based on a database of actual male and female faces that were laser-scanned in 3D. These sample faces consist of individuals of diverse races (e.g., East Asian, Black, West Asian, white). In the FaceGen model, a face is a point in a 100-dimensional face space. The 100 dimensions are orthogonal to each other, and are chosen to capture large variance in the appearance of individual faces. Moving a face along a dimension in this space results in a holistic change in the shape and reflectance (i.e., texture and coloration) of the face in a specific way. The shape and the reflectance of a face are determined by 50 shape and 50 reflectance parameters. We generated male and female faces (300 each) by randomly selecting each parameter from a normal distribution. We used a single set of 300 source faces and made them more male- or female-like, so that all the male and all the female faces were centered 30 Running Head: GENDER BIASES IN PERSON IMPRESSIONS around the average male and the average female faces in the FaceGen database, respectively. This resulted in paired images of male and female faces (Fig. S4). All stimuli are available at Open Science Framework: https://osf.io/ycv72/. Each participant rated either 51 male or 51 female faces on one of two impressions – trustworthiness or dominance. We chose these two impressions, because they are the best approximations of the valence and power dimensions underlying face impressions, and they are highly distinctive from each other (Oosterhof & Todorov, 2008; Sutherland et al., 2013; Todorov, Said, Engell, & Oosterhof, 2008). We kept the same participants from participating in both “trustworthiness” and “dominance” rating tasks. The random 51 faces of each gender consisted of 50 random faces from the pool of 300 random faces and the gender-specific average face. Participants were told that there were no right or wrong answers and that we were interested in their first impression or “gut response”. The faces were presented twice in two separate blocks for the reduction of the measurement error and the screening of unreliable raters’ data. We calculated each rater’s intra-rater reliability by correlating their ratings from the two different blocks. The ratings from participants with zero or negative intra-rater reliability were excluded, which left us with responses of 418 participants (181 males, 217 females, and 20 unreported). Each face was presented in color (512 × 512 pixels with the height of the face being about 440 pixels) with a question below it, “How [trait term, e.g., trustworthy] is this [man/woman]? 1 Not at all [trait term] - 9 Extremely [trait term]”. Each face was visible until the participant responded, the ITI was 250 ms, and the order of faces was randomized. The ratings of trustworthiness and dominance were negatively correlated for both genders, but the correlation was stronger for female faces (Fig. S5). 31 Running Head: GENDER BIASES IN PERSON IMPRESSIONS To create gender-specific computational trait models – male trustworthiness model, female trustworthiness model, male dominance model, and female dominance model – we averaged the two ratings per face across participants for each trait. For each genderspecific model, we computed the contribution of each of the 50 shape and the 50 reflectance parameters to the average trait impression ratings of the 300 faces, following the previous data-driven statistical approach (Oosterhof & Todorov, 2008; Todorov & Oosterhof, 2011). The mean impression ratings of the 301 faces and the values for a single parameter (out of 100) of the 301 faces are essentially two vectors with 301 elements each. To create one parameter of an impression model, the cross-product of these two vectors were summed across faces, and then were normalized across parameters. The 100 parameters of the model represented the amount of variation that would induce a 1SD change in the trait impression rating. Results and Discussion The resulting gender-specific statistical impression models are shown in Figure 4. Both models derived from male and female face ratings are similar to existing statistical models of impressions (e.g., Oosterhof & Todorov, 2008). Further, the gender-specific models represent similar facial information found in prior research: As both male and female faces are manipulated to appear more trustworthy, their expressions become more positive and vice versa (Oosterhof & Todorov, 2008; Sutherland et al., 2013; Walker & Vetter, 2009). Likewise, as both male and female faces are manipulated to appear more dominant, they become more masculine and facially mature (Oosterhof & Todorov, 2008; Sutherland et al., 2013; Zebrowitz, 2005; Zebrowitz & Montepare, 2008). 32 Running Head: GENDER BIASES IN PERSON IMPRESSIONS 33 Running Head: GENDER BIASES IN PERSON IMPRESSIONS Figure 4. Gender-specific models of facial impressions of trustworthiness and dominance applied to novel synthetic male and female face images in Study 3a. The trustworthiness (A, B) and dominance impression models (C, D) derived from ratings of female (the top row of each subpanel) and male faces (the bottom row of each subpanel) were applied to a sample male (A, C) and female face (B, D). Both male and female models could manipulate the impression of both male and female faces, showing that facial information used to form these key impressions is similar across face genders. Within each impression, male and female models were highly positively correlated, suggesting that similar information is used when people form impressions of male and female faces (trustworthiness: ρ = .68, dominance: ρ = .85; Fig. S6, Table S3). However, trustworthiness and dominance models were more strongly negatively correlated in the female (ρ = -.38) than in the male models (ρ = -.16), suggesting that these models are more similar for female than for male faces. This is consistent with the data from Studies 1a–1c, in which face impressions of women were less differentiated than those of men. Role of Raters’ Gender. The correlation between the female trustworthiness and dominance models was stronger for female (ρ = -.42) than male raters (ρ = -.30), whereas the correlation between the male trustworthiness and dominance models was comparable for female (ρ = -.16) and male raters (ρ = -.16). These findings show that female raters relied on more similar visual information when forming impressions of trustworthiness and dominance of female faces than did male raters. This is consistent with the data from Study 1b, in which female raters showed less differentiated impressions of female than of male faces. 34 Running Head: GENDER BIASES IN PERSON IMPRESSIONS Study 3a: Cross-Gender Validation of Models with Synthetic Face Images In Study 3, we cross-validated the gender-specific impression models. Specifically, we manipulated the level of perceived traits of novel images of male and female faces (Fig. S7), and asked a new group of participants to rate the faces on the respective traits. The first objective of this study was to test whether the models of impressions successfully capture the changes in facial appearance that lead to changes in impressions. The second, more important objective was to test whether the gender-specific impression models work better when applied to a congruent face (e.g., when a male model is applied to a male face). We test two possible outcomes in Studies 3a and 3b. If the face models apply better to the gender-congruent faces (e.g., male models apply better to male than female faces) despite the similarity of the gender-specific models observed in Study 2, then it would suggest that the gender differences in impressions are likely due to lower-level, perceptual (rather than later, evaluative) differences in the usage of visual cues when forming facial impressions of men and women. In contrast, if the face models apply equally well to male and female faces (e.g., male models apply equally well to male and female faces), it would suggest that the gender differences in impressions are likely due to later, evaluative (rather than lower-level, perceptual) differences when forming facial impressions of men and women. Such a result would suggest that people use similar visual information when forming impressions of male and female faces, although they interpret this information differently (e.g., whereas a masculine male face is evaluated positively, a masculine female face is evaluated negatively; Sutherland et al. 2015; Oh, Buck, & Todorov, 2019). 35 Running Head: GENDER BIASES IN PERSON IMPRESSIONS Methods Participants. Two-hundred-and-sixty-eight MTurk online workers living in the U.S. (121 males, 147 females) participated for monetary reward. Stimuli and Procedure. To validate the gender-specific face models, we generated faces that reflected the impression change in each model. First, using a procedure similar to previous validation studies (Todorov et al., 2013), we generated 25 new faces: We generated 1,000 faces whose positions on the 100 parameters were independently sampled from 100 normal distributions. From the 1,000 random faces, 25 faces that physically differed maximally to one another were chosen (i.e., faces with highest average Euclidean distance to each other; Fig. S7). We used a single set of 25 faces and made them more male- or female-like. All stimuli are available at Open Science Framework: https://osf.io/ycv72/. Second, we manipulated each face with the trait models. We varied the face parameters of the 50 faces by adding -3, -2, -1, 0, 1, 2, and 3SDs, with the 0SD addition being null manipulation (Supplemental Text). There were 4 gender-specific models – male trustworthiness, female trustworthiness, male dominance, and female dominance models. This resulted in 1,400 faces (2 [face gender] × 25 [identities] × 7 [manipulation levels] × 4 [model]). Each participant rated either male or female faces manipulated by either male or female model of either trustworthiness or dominance. We kept the same participants from participating in more than one task (e.g., participating in both the “male trustworthiness” rating and the “female dominance” rating tasks). The face stimuli were presented in color on the screen (512 × 512 pixels with the height of the face being about 440 pixels). 36 Running Head: GENDER BIASES IN PERSON IMPRESSIONS Participants were told that there were no right or wrong answers and that we were interested in their first impression: 175 faces were presented in a random order (25 [identities] × 7 [levels]), followed by presentation of 25 randomly chosen faces from the previously presented faces without any noticeable break. The 25 faces were repeated for the calculation of test-retest reliability. The ratings from participants with zero or negative intra-rater reliability were excluded, which left us with ratings of 247 raters (107 males, 140 females). Each face gender × model trait × model gender condition had ≥ 30 raters. The ratings of trustworthiness and dominance were highly reliable irrespective of whether the gender of the original faces and the model were identical (αmin = .96) or not (αmin = .96; Table S4). To assess how well the models varied the intended impressions, we ran linear and quadratic regressions for the trait ratings of male and female faces with the level of model manipulation as the predictor. To determine whether the gender-specific models are more successful when applied to a congruent face (e.g., a female model applied to a female face), we then compared the predictive powers of the models across genders. We ran repeated measures ANOVAs on Fisher’s z scores, transformed from the correlations between the observed and predicted ratings of the regressions. Results and Discussion All linear and quadratic models explained significant amount of variance in the impression ratings regardless of whether the model gender and the face gender were congruent (linear: mean R2s > .94, quadratic: mean R2s > .96; Fig. 5) or not (linear: mean R2s > .94, quadratic: mean R2s > .97). Thus, the effectiveness of the trait model was not affected by the congruency between the face gender and the model gender. 37 Running Head: GENDER BIASES IN PERSON IMPRESSIONS To further assess the relative effectiveness of the models, we ran a repeated measures ANOVA on the correlations between the predicted and observed ratings, transformed to Fisher’s z scores, for both the linear and quadratic regression models. We found a main effect of model trait for the linear (F(1,24) = 5.19, P = .03, ηG2 = .05) and the quadratic regression models (F(1,24) = 5.98, P = .02, ηG2 = .03). Specifically, the dominance models predicted the ratings better (linear: Mz = 2.42, SDz = 0.36; quadratic: Mz = 2.75, SDz = 0.42) than the trustworthiness models did (linear: Mz = 2.26, SDz = 0.35; quadratic: Mz = 2.63, SDz = 0.35). We also found a main effect of face gender for the linear (F(1,24) = 10.52, P < .01, ηG2 = .04) and quadratic regression models (F(1,24) = 4.40, P < .05, ηG2 = .02). Specifically, the ratings of male faces were better predicted by the models (linear: Mz = 2.41, SDz = 0.37; quadratic: Mz = 2.74, SDz = 0.40) than the ratings of female faces (linear: Mz = 2.28, SDz = 0.35; quadratic: Mz = 2.65, SDz = 0.38). These two effects were not predicted and were relatively small in size. Only for the quadratic regression model, we found a significant model gender × face gender interaction (F(1,24) = 7.09, P = .01, ηG2 = .03), indicating that the male models were better at manipulating impressions of female faces (Mz = 2.75, SDz = 0.42) than male faces (Mz = 2.71, SDz = 0.32), whereas the female models were better at manipulating impressions of male faces (Mz = 2.76, SDz = 0.47) than female faces (Mz = 2.54, SDz = 0.32). In sum, the models could generate faces varying on the intended impressions within and across gender. The effect of gender congruency was found only in the quadratic models: The direction of the effect was the opposite of what was hypothesized, and the effect size was relatively small. We conducted another validation study to test the robustness of these effects. 38 Running Head: GENDER BIASES IN PERSON IMPRESSIONS Figure 5. Validation of models of trustworthiness (top) and dominance (bottom) with synthetic male (left) and female faces (right). Linear (gray) and quadratic (black) fit of ratings of trustworthiness as a function of the female (solid) and male (dashed) model values of the faces. The mean coefficient of determination (R2) and unstandardized coefficient (b1) averaged across faces per model are displayed. Error bars denote ±SE. 39 Running Head: GENDER BIASES IN PERSON IMPRESSIONS Study 3b: Cross-Gender Validation of Models with Real-Life Face Images In Study 3a, we manipulated male and female faces to look more/less trustworthy or dominant using male and female impression models, and calculated how well the models predicted the ratings of gender-congruent and incongruent faces. We found no evidence for gender specificity in the models’ capacity to manipulate impressions. If anything, for the quadratic models, the predicted interaction was in the opposite direction. These findings suggest that people may be using similar visual information when forming impressions of male and female faces. To assess the robustness of these results, in Study 3b, we conducted another validation study using real-life face images. Specifically, using the gender-specific models, we manipulated the level of perceived traits of novel real-life images of male and female faces. We then asked a new group of participants to rate the faces on the respective traits. Methods Participants. Two-hundred-and-ninety-two MTurk online workers living in the U.S. (127 males, 164 females, and 1 other gender) participated for monetary reward. Stimuli and Procedure. Photos of male and female faces (25 each) were randomly selected from a face database (DeBruine & Jones, 2017), consisting of naturalistic face photos with direct gaze and resting expressions without any eyeglasses or visible make-up, all wearing white T-shirts: 50 photos of 8 self-identified East Asian (4 females), 8 West Asian (3 females), 12 Black (5 females), and 22 white actors (13 females) between the ages of 19–37 were used (Fig. S8). Non-white faces comprised 56% of all faces. The face-space model on which the gender-specific models are built is based on the 3D scans of white and non-white faces, and the face-space model can capture the trait-related facial variance 40 Running Head: GENDER BIASES IN PERSON IMPRESSIONS within and across different facial races. All stimuli are available at Open Science Framework: https://osf.io/ycv72/. Using the impression models, we manipulated the faces along the respective traits. As in Study 3a, we prepared seven facial variations, including the original face, for each face identity × impression model. To apply the models, we transformed the initial face images along the gender-specific impression models from Study 2, using PsychoMorph (Tiddeman, Burt, & Perrett, 2001). First, we created synthetic faces that represented these models, extreme faces that are -4 and 4SD deviant from the average FaceGen face on each model. Next, we used the transformation function of PsychoMorph to change the initial 25 male and 25 female real-life face images (Fig. S8) along the continuum of the difference between the two extreme face images, for each impression. Unlike the standard morphing procedure, which is a direct transition between two images, the transformation procedure in PsychoMorph allows users to manipulate a single image along a continuum and generate photo-realistic images (Sutherland, Rhodes, & Young, 2017b). On the most extreme ends, each face image was transformed 40% towards the -4 or 4SD model face. The value of 40% was chosen because any stronger manipulation caused distortional artifacts on the images. The manipulation magnitude was identical for the intervals between the 7 facial variations: The final face images were transformed -40%, -26.67%, -13.33%, 0%, 13.33%, 26.67%, and 40% from the initial faces (Fig. 6). To maintain the ostensible gender and ethnicities of the original faces, we only used the variation in the face shape of the models. As in Study 3a, there were 4 gender-specific models. This resulted in 1,400 faces (2 [face gender] × 25 [identities] × 7 [manipulation levels] × 4 [model]). 41 Running Head: GENDER BIASES IN PERSON IMPRESSIONS As in Study 3a, each participant rated either male or female faces manipulated by either male or female model of either trustworthiness or dominance. We kept the same participants from participating in more than one rating task. The face stimuli were presented in color (512 × 512 pixels with the height of the face being about 440 pixels). The rating design was identical with Study 3a: 175 faces were presented first (25 [identities] × 7 [levels]), followed by presentation of 25 randomly chosen faces from the previously presented faces without any break. The 25 faces were repeated for the calculation of test-retest reliability. The ratings from participants with zero or negative intra-rater reliability were excluded, which left us with ratings of 239 raters (138 males, 100 females, and 1 other gender). Each face gender × model trait × model gender condition had ≥ 30 raters. The ratings of both trustworthiness and dominance were reliable irrespective of whether the gender of the original faces and the model were identical (αmin = .88) or not (αmin = .83; Table S5). To assess how well the models varied the intended impressions of faces, we adopted the same analyses as those in Study 3a – regressions for the trait ratings and repeated measures ANOVAs on Fisher’s z scores converted from the correlations between the observed and predicted ratings of the regressions. 42 Running Head: GENDER BIASES IN PERSON IMPRESSIONS 43 Running Head: GENDER BIASES IN PERSON IMPRESSIONS Figure 6. The gender-specific models of facial impressions of trustworthiness and dominance applied to real-life face images in Study 3b. The trustworthiness (A, B) and dominance impression models (C, D) derived from ratings of female (the top row of each subpanel) and male faces (the bottom row of each subpanel) were applied to a sample male (A, C) and female faces (B, D). Both male and female models could manipulate the impression of both male and female faces, showing that facial information used to form these key impressions is similar across face genders. Results and Discussion As in Study 3a, all models significantly explained the impression ratings regardless of whether the model gender and the face gender were congruent (linear: mean R2s >. 54, quadratic: mean R2s >. 68; Fig. 7) or not (linear: mean R2s >. 63, quadratic: mean R2s >. 68). To further assess the relative effectiveness of the models, we ran a 2 [face gender] × 2 [model trait] × 2 [model gender] repeated measures ANOVA on Fisher’s z scores converted from the correlations between the predicted and observed ratings for both the linear and quadratic regression models. We found a significant main effect of model trait for the linear (F(1,24) = 82.73, P < .001, ηG2 = .35) and the quadratic regression models (F(1,24) = 75.49, P < .001, ηG2 = .33). Specifically, the dominance models predicted the ratings better (linear: Mz = 1.87, SDz = 0.45; quadratic: Mz = 2.11, SDz = 0.48) than the trustworthiness models did (linear: Mz = 1.22, SDz = 0.49; quadratic: Mz = 1.46, SDz = 0.49). The same effect was observed in Study 3a. 44 Running Head: GENDER BIASES IN PERSON IMPRESSIONS We also found a main effect of model gender for the linear (F(1,24) = 4.67, P < .05, ηG2 = .02) and the quadratic regression models (F(1,24) = 4.93, P = .04, ηG2 = .02). Specifically, models derived from ratings of male faces were more effective in manipulating impressions (linear: Mz = 1.61, SDz = 0.52; quadratic: Mz = 1.86, SDz = 0.51) than models derived from ratings of female faces (linear: Mz = 1.49, SDz = 0.61; quadratic: Mz = 1.71, SDz = 0.64). We also found a significant model trait × face gender interaction for the linear (F(1,24) = 7.84, P < .01, ηG2 = .03) and the quadratic regression models (F(1,24) = 8.66, P < .01, ηG2 = .03), indicating that dominance models were more effective at manipulating female (linear: Mz = 1.91, SDz = 0.50; quadratic: Mz = 2.22, SDz = 0.52) than male faces (linear: Mz = 1.84, SDz = 0.39; quadratic: Mz = 2.00, SDz = 0.41), whereas the trustworthiness models were more effective at manipulating male (linear: Mz = 1.34, SDz = 0.49; quadratic: Mz = 1.51, SDz = 0.53) than female faces (linear: Mz = 1.09, SDz = 0.45; quadratic: Mz = 1.42, SDz = 0.46). None of these two effects were observed in Study 3a. Across the two validation studies with computer-generated and real-life faces, the only consistent finding was that the dominance impression models could generate faces varying on the intended impressions better than the trustworthiness impression models could, irrespective of the face gender or the model gender. All in all, the models were capable of generating faces varying on the intended impressions within and across gender, showing no evidence for gender specificity. 45 Running Head: GENDER BIASES IN PERSON IMPRESSIONS Figure 7. Validation of models of trustworthiness (top) and dominance (bottom) with real-life male (left) and female face images (right). Linear (gray) and quadratic (black) fit of ratings of trustworthiness as a function of the female (solid) and male (dashed) model values of the faces. The mean coefficient of determination (R2) and unstandardized coefficient (b1) averaged across faces per model are displayed. Error bars denote ±SE. 46 Running Head: GENDER BIASES IN PERSON IMPRESSIONS General Discussion People form appearance-based impressions of men and women differently. Women whose faces appear more dominant, for example, are perceived as less trustworthy, while the perceived dominance of men does not affect impressions of their trustworthiness (Sutherland et al., 2015; 2017a). In the current series of studies, we addressed two questions: Are impressions of women less differentiated than impressions of men in general and what is the source of this difference in impressions? Answering these questions is important, because simplified impressions of women can lead to unfair treatment: Women are evaluated negatively when they behave not “woman-like” (e.g., assertive; Moss-Racusin et al., 2012; Rudman, 1998; Rudman & Phelan, 2008). By analyzing multiple datasets of facial impressions and building separate face models of impressions of men and women, we found that (1) first impressions from faces are less differentiated for women and importantly that (2) visual information used to form impressions of trustworthiness and dominance is highly similar for men’s and women’s faces. Combined, these findings suggest that people use the same visual information when forming impressions of men and women, but that this information is evaluated differently. These results emphasize the role of gender categorization in impression formation. The present paper also shows that trait impressions from faces are more strongly intercorrelated in those with stronger stereotypes. Finally, the present paper introduces the first separate data-driven impression models for male and female faces. Our first main finding was that participants held less differentiated (i.e., more highly intercorrelated) trait impressions of women than of men (Studies 1–2). Specifically, trait impressions of women varied from each other to a smaller degree and were more tied to 47 Running Head: GENDER BIASES IN PERSON IMPRESSIONS overall positivity/negativity evaluation than impressions of men (Study 1). Consistent with this finding, the models of trustworthiness and dominance impressions were more similar for female than for male faces (Study 2). These findings confirm ambivalent sexism (Glick & Fiske, 1996, 2011) in the domain of visual perception, corroborating previous research (Sutherland et al., 2015). The theory of ambivalent sexism posits that women are evaluated positively as far as they are stereotyped into restricted traits and roles (e.g., being helpful), unlike men whose impression valence is less dependent on stereotypes. This theory would predict that women whose appearance evokes counterstereotypical trait impressions (e.g., assertiveness) are likely to be evaluated more negatively than men whose appearance evokes counterstereotypical trait impressions (e.g., tenderness). This gender difference will lead to a higher level of intercorrelations between the impressions of women than between those of men. In the same vein, our findings suggest that the backlash effect, a phenomenon in which women who violate prescriptions of feminine traits receive social/economic penalties (Rudman, 1998; Rudman & Glick, 2001; Rudman & Phelan, 2008), generalizes to visual perception. Qualities perceived as traditionally masculine lead to more negative impressions of women (e.g., less likable): Women with dominant facial looks (Sutherland et al., 2015), assertive attitude (Rudman, 1998; Rudman & Glick, 2001; Rudman & Phelan, 2008), or work competency (Hagen & Kahn, 1975) are evaluated more negatively than women with the opposite qualities. Although qualities perceived as feminine usually lead to more negative impressions of men (Derlega & Chaikin, 1976; Heilman & Wallen, 2010; Moss-Racusin et al., 2010), the effects of counterstereotypical looks on men are weaker (Sutherland et al., 2015) and sometimes even beneficial. Men (and women) with a feminine face shape, for example, are perceived as attractive (Perrett et al., 1998; Rhodes, Hickford, 48 Running Head: GENDER BIASES IN PERSON IMPRESSIONS & Jeffery, 2000; Said & Todorov, 2011; but see Rhodes, 2006). In sum, when gender norms are violated, women face harsher penalties than men. Because people judge traits from faces, including norm-related traits, and face evaluation broadly depends on the valence of the impression, the stronger negative consequences of norm violation for women can lead to stronger links between multiple trait judgments. The high level of dependency on valence (the first principal component) and the weak dependency on the second component in female impressions highlights an overall low level of dependency of impressions on dominance for female faces (Fig 2, Table. S1). While a smaller variance in dominance ratings in female faces could explain the low dominance dependency, dehumanization of women might also underlie the phenomenon. A person is perceived as less agentic and powerful, thus less dominant, when they are perceived as more object-like (Haslam, 2006). Women are often perceived as more objectlike (i.e., lacking autonomy) than men are (Nussbaum, 1999). The dehumanization account would also explain the simpler trait structure in impressions of female faces. The level of impression differentiation was affected by the characteristics of those who formed the impressions. Raters who were more likely to endorse gender stereotypes showed less differentiated impressions of both men and women than raters who were less likely to endorse these stereotypes. This finding supports the idea that gender differences in impression differentiation result from evaluative processes triggered by gender categorization and, more generally, the idea that trait impressions of a group can be more strongly or weakly intercorrelated depending on the perceiver’s stereotypes of the group’s traits (Secord & Berscheid, 1963; Stolier, Hehman, & Freeman, 2017). Those who strongly expect gender-stereotypic qualities in others would show a smaller within-gender 49 Running Head: GENDER BIASES IN PERSON IMPRESSIONS variation and a bigger between-gender variation in impressions of others than those who do not. This could lead to stronger intercorrelations between impressions in those who strongly endorse gender stereotypes. These individuals may evaluate others based on their gender category, because gender stereotypes are highly accessible to them during impression formation (Higgins, 1996; Lepore & Brown, 1997). Participant gender also affected the level of impression differentiation. Surprisingly, it was female raters who showed less differentiated impressions of women (Study 1b). Correspondingly, models of trustworthiness and dominance impressions derived from female ratings were also more similar than the models derived from male ratings (Study 2). This participant gender effect may seem surprising, because female participants are less likely to endorse gender stereotypes or traditional sex roles than male participants (Study 1b; Glick & Fiske, 1996; Swim et al., 1995; Williams & Best, 1990). However, female participants have previously been found to show person evaluation consistent with gender stereotypes (e.g., Garcia-Retamero & López-Zafra, 2006; Goldberg, 1968; Parks-Stamm et al., 2007; Rudman, 1998). In the case of visual person impressions, female participants’ more simplified female impressions might arise because of their sensitivity to the a/typicality of female faces. Female raters might have a clearer prototype of female faces than male raters and therefore might be more sensitive to the typicality in female faces. Given that facial typicality affects impression valence (Dotsch, Hassin, & Todorov, 2016; Sofer, Dotsch, Wigboldus, & Todorov, 2015), this may explain why female participants showed lower level of impression differentiation and higher level of valence-dependency for female (vs. male) faces. Bolstering such a possibility is women’s better recognition of female faces, relative to men (Ellis, Shepherd, & Bruce, 1973; Lewin & Herlitz, 2002; 50 Running Head: GENDER BIASES IN PERSON IMPRESSIONS Rehnman & Herlitz, 2006). Women have been repeatedly found to be better at recognizing female faces than men, while there have been little cross-gender observer differences in recognizing male faces. This specific female-on-female recognition superiority suggests that women may be better at noticing facial differences across individual females although this ability could be based on other mechanisms (e.g., better encoding of other females’ faces). If this ability of female raters contributes to their higher level of sensitivity to facial typicality of female faces, this could lead to their lower impression differentiation level and higher level of valence-dependency for female faces. Our second main finding was that the models of trustworthiness and dominance impressions of male and female faces were based on similar facial information (Studies 2 and 3). These are the first computational models of trustworthiness and dominance impressions built separately for men and women. Correlational analyses (Study 2) and cross-gender validations of these models (Study 3) consistently found that similar facial information is used in these key impressions of both genders. Specifically, the model visualizations (Figs. 4 and 6) show that resemblance to emotional expressions is a key input to trustworthiness impressions (Engell, Todorov, & Haxby, 2010; Hess, Blairy, & Kleck, 2000; Keating, Mazur, & Segall, 1981; Montepare & Dobish, 2003; Oosterhof & Todorov, 2008; 2009; Said, Sebe, & Todorov, 2009; Sutherland et al., 2013; Zebrowitz & Montepare, 2008) for both male- and female-based models. Consistent with prior models (Oosterhof & Todorov, 2008; Sutherland et al., 2013; Walker & Vetter, 2009), as the faces are manipulated to look more trustworthy, they acquire more positive expressions. In contrast, as the faces are manipulated to look less trustworthy, they acquire more negative expressions. For dominance impressions, the key inputs are masculinity and facial maturity 51 Running Head: GENDER BIASES IN PERSON IMPRESSIONS (Oosterhof & Todorov, 2008; Sutherland et al., 2013; Zebrowitz, 2005; Zebrowitz & Montepare, 2008) and to smaller extent similarity to angry facial expressions (Hareli, Shomrat, & Hess, 2009; Hess et al., 2000; Said et al., 2009). As the faces are manipulated to look more dominant, they become more masculine, facially mature, and acquire more negative expressions. In contrast, as the faces are manipulated to look less dominant, they become more feminine, babyfaced, and acquire more positive expressions. Comparing the male- and female-based models, it is also possible to see specific gender differences, especially for the models of trustworthiness impressions. For example, faces on the positive end of the female trustworthiness model are more light-skinned than faces on the positive end of the male trustworthiness model (Figs. 4 and 6), possibly reflecting real gender differences in the human face (Jablonski & Chaplin, 2000). However, as the cross-validation results showed, these gender differences did not matter for impressions. Impressions of male and female faces were successfully manipulated irrespective of whether the model was derived from ratings of female or male faces, and this was the case for both synthetic and real-life faces (Study 3)2. These findings clearly show that people use highly similar information when forming impressions of men and women on trustworthiness and dominance, but they evaluate this information differently. The high similarity in facial impression models is consistent with the findings of South Palomares and colleagues (2018) who built male- and female-specific face models that represent traits preferred in a romantic partner, and found that the two models are highly similar. By showing that there is little low-level perceptual differences in the While we did not observe differences in facial information used to form impressions of male and female faces, it is of course possible that more sensitive models could reveal such differences. 2 52 Running Head: GENDER BIASES IN PERSON IMPRESSIONS information people use to form impressions of male and female faces, the current findings highlight the importance of social categorization and the perceiver’s preconceptions about categories in person impressions (Fiske & Neuberg, 1990; Freeman, Penner, Saperstein, Scheutz, & Ambady, 2011; Hugenberg & Bodenhausen, 2004; Kramer, Young, Day, & Burton, 2017; Stolier & Freeman, 2016; 2017). Our conclusion that people evaluate information differently from male and female faces (rather than use different information for male and female faces) is inferred indirectly by combining the results of multiple studies: Despite the higher level of intercorrelations and valence-dependency in women’s impressions (Studies 1 and 2), both male- and femalespecific impression models could manipulate facial impressions irrespective of the gender of the face (Study 3); further, the gender stereotype endorsement of participants was predictive of the level of facial impression differentiation (Study 1b). Future research can test this conclusion in a more direct fashion. One, for example, can measure the stereotype endorsement level of participants whose ratings are used to build gender-specific models: Female impression models derived from those who strongly endorse gender stereotypes would be more similar to each other across traits than female models derived from those who do not endorse gender stereotypes. The gender difference in the structure of impressions has implications for both social perception theories and social justice. Although often visually ambiguous and conceptually continuous, gender is thought as categorical (Fiske & Neuberg, 1990) and people judge the gender of faces with ease and a high level of consensus (Hehman, Sutherland, Flake, & Slepian, 2017). Moreover, gender-related differences in facial features are easily detectable from faces (Burriss, Little, & Nelson, 2007; Schyns, Bonnar, & Gosselin, 53 Running Head: GENDER BIASES IN PERSON IMPRESSIONS 2016), and are processed in the early stages of face perception (Cellerino et al., 2007; Mouchetant-Rostaing, Giard, Bentin, Aguera, & Pernier, 2000; Mouchetant-Rostaing & Giard, 2003; Welling, Bestelmeyer, Jones, DeBruine, & Allan, 2017). Further, facial information that correlates with gender (e.g., facial masculinity) shapes the formation of person impressions (Oh et al., 2019; Oosterhof & Todorov, 2008; Sutherland et al., 2013; 2015). For social perception theories, our findings suggest that gender categorization shapes the overall pattern of person impression formation over and beyond associations between a few facial features and trait impressions. Beyond the differences in impressions of male and female faces, our methods can be extended to test for similar effects for other meaningful face subcategories in which one or more subcategories are more stereotyped or prejudiced against than the others (e.g., race and ethnicity). For instance, impressions of Black individuals might be less differentiated and more valence-laden than impressions of white individuals when rated by individuals living in the U.S. Further research could identify to what extent one could attenuate or reverse the gender difference in facial evaluation induced by the face category. While we employed static facial images here, non-static facial cues (e.g., dynamic facial gestures; Gill, Garrod, Jack, & Schyns, 2014) and non-facial cues (e.g., clothes; Freeman et al. 2011; Oh, Shafir, & Todorov, 2019) strongly affect social perception. Gill et al. (2014), for example, found that trait-related facial movements (e.g., smiling, which leads to trustworthiness impressions) override trait-related static facial information (e.g., untrustworthy-looking face). When a less female-stereotypic face, for example, presents a female-stereotypic facial gesture (e.g., 54 Running Head: GENDER BIASES IN PERSON IMPRESSIONS smiling), the negative effects of less impression differentiation in female impressions might diminish. For social justice, our findings suggest another contributing factor to gender discrimination. Women with counterstereotypical appearance are perceived more unfavorably and discriminated more harshly than men with counterstereotypical appearance (Rudman, 1998; Rudman & Phelan, 2008; Sutherland et al., 2015). Given the importance of first impressions (Ballew & Todorov, 2007; Blair et al., 2004; Eberhardt et al., 2006; Funk & Todorov, 2013; Olivola & Todorov, 2010; Todorov et al., 2005), less differentiated impressions of women would result in evaluative inferences that may penalize women more strongly than men for not fitting the expected stereotypes. In sum, our findings show that people have less differentiated and more valenceladen impressions of women than of men, although these impressions are based on similar visual information. These findings suggest that discrimination against women can start from the moment of forming first impressions, as women with counterstereotypical looks are likely to be evaluated negatively. 55 Running Head: GENDER BIASES IN PERSON IMPRESSIONS References Abele, A. E. (2003). The dynamics of masculine-agentic and feminine-communal traits. Findings from a Prospective Study. Journal of Personality and Social Psychology, 85, 768776. Antonakis, J., & Dalgas, O. (2009). Predicting elections: Child's play! Science, 323(5918), 1183–1183. http://doi.org/10.1126/science.1167748 Ballew, C. C., & Todorov, A. T. (2007). Predicting political elections from rapid and unreflective face judgments. Proceedings of the National Academy of Sciences of the United States of America, 104(46), 17948–17953. http://doi.org/10.1073/pnas.0705435104 Bem, S. L. (1974). The measurement of psychological androgyny. Journal of Consulting and Clinical Psychology, 42(2), 155–162. Blair, I. V., Judd, C. M., & Chapleau, K. M. (2004). The influence of Afrocentric facial features in criminal sentencing. Psychological Science, 15(10), 674–679. http://doi.org/10.1111/j.0956-7976.2004.00739.x Broverman, I. K., Vogel, S. R., Broverman, D. M., Clarkson, F. E., & Rosenkrantz, P. S. (1972). Sex-role stereotypes: A current appraisal. Journal of Social Issues, 28(2), 59–78. http://doi.org/10.1111/j.1540-4560.1972.tb00018.x Burriss, R. P., Little, A. C., & Nelson, E. C. (2007). 2D:4D and sexually dimorphic facial characteristics. Archives of Sexual Behavior, 36(3), 377–384. http://doi.org/10.1007/s10508-006-9136-1 Cellerino, A., Borghetti, D., Valenzano, D. R., Tartarelli, G., Mennucci, A., Murri, L., & Sartucci, F. (2007). Neurophysiological correlates for the perception of facial sexual dimorphism. 56 Running Head: GENDER BIASES IN PERSON IMPRESSIONS Brain Research Bulletin, 71(5), 515–522. http://doi.org/10.1016/j.brainresbull.2006.11.007 Cislak, A. & Wojciszke, B. (2008). Agency and communion are inferred from actions serving interests of self or others. European Journal of Social Psychology, 38, 1103-1110. Cooper, J. C., Dunne, S., Furey, T., & O'Doherty, J. P. (2012). Dorsomedial prefrontal cortex mediates rapid evaluations predicting the outcome of romantic Interactions. Journal of Neuroscience, 32(45), 15647–15656. http://doi.org/10.1523/JNEUROSCI.255812.2012 Costrich, N., Feinstein, J., Kidder, L., Marecek, J., & Pascale, L. (1975). When stereotypes hurt: Three studies of penalties for sex-role reversals. Journal of Experimental Social Psychology, 11(6), 520–530. Cundiff, J. L., & Vescio, T. K. (2016). Gender stereotypes influence how people explain gender disparities in the workplace. Sex Roles, 75(3-4), 126–138. http://doi.org/10.1007/s11199-016-0593-2 Deaux, K., & Lewis, L. L. (1984). Structure of gender stereotypes: Interrelationships among components and gender label. Journal of Personality and Social Psychology, 46(5), 991– 1004. DeBruine, L. M., & Jones, B. C. (2017). Face Research Lab London Set. Figshare. http://doi.org/10.6084/m9.figshare.5047666 Derlega, V. J., & Chaikin, A. L. (1976). Norms affecting self-disclosure in men and women. Journal of Consulting and Clinical Psychology, 44(3), 376–380. Dotsch, R., Hassin, R. R., & Todorov, A. T. (2016). Statistical learning shapes face evaluation. Nature Human Behaviour, 1, Article 0001. 57 Running Head: GENDER BIASES IN PERSON IMPRESSIONS http://doi.org/https://doi.org/10.1038/s41562-016-0001 Dotsch, R., & Todorov, A. T. (2012). Reverse correlating social face perception. Social Psychological and Personality Science, 3(5), 562–571. http://doi.org/10.1177/1948550611430272 Eagly, A. H., & Mladinic, A. (1989). Gender stereotypes and attitudes toward women and men. Personality and Social Psychology Bulletin, 15(4), 543–558. http://doi.org/10.1177/0146167289154008 Eagly, A. H., & Steffen, V. J. (1984). Gender stereotypes stem from the distribution of women and men into social roles. Journal of Personality and Social Psychology, 46(4), 735–754. Eberhardt, J. L., Davies, P. G., Purdie-Vaughns, V. J., & Johnson, S. L. (2006). Looking deathworthy: Perceived stereotypicality of black defendants predicts capitalsentencing outcomes. Psychological Science, 17(5), 383–386. http://doi.org/10.1111/j.1467-9280.2006.01716.x Ellis, H. D., Shepherd, J. W., & Bruce, A. (1973). The effects of age and sex upon adolescents’ recognition of faces. The Journal of Genetic Psychology, 123(1), 173–174. http://doi.org/10.1080/00221325.1973.10533202 Engell, A. D., Todorov, A. T., & Haxby, J. V. (2010). Common neural mechanisms for the evaluation of facial trustworthiness and emotional expressions as revealed by behavioral adaptation. Perception, 39(7), 931–941. http://doi.org/10.1068/p6633 Fiske, S. T., & Neuberg, S. L. (1990). A continuum of impression formation, from categorybased to individuating processes: Influences of information and motivation on attention and interpretation. Advances in Experimental Social Psychology, 23, 1–74. Fiske, S. T., Cuddy, A. J. C., Glick, P., & Xu, J. (2002). A model of (often mixed) stereotype 58 Running Head: GENDER BIASES IN PERSON IMPRESSIONS content: Competence and warmth respectively follow from perceived status and competition. Journal of Personality and Social Psychology, 82(6), 878–902. http://doi.org/10.1167/14.1.28 Freeman, J. B., Penner, A. M., Saperstein, A., Scheutz, M., & Ambady, N. (2011). Looking the part: Social status cues shape race perception. PLoS One, 6(9), e25107. http://doi.org/10.1371/journal.pone.0025107 Funk, F., & Todorov, A. T. (2013). Criminal stereotypes in the courtroom: Facial tattoos affect guilt and punishment differently. Psychology, Public Policy, and Law, 19(4), 466– 478. http://doi.org/10.1037/a0034736 Funk, F., Walker, M., & Todorov, A. T. (2016). Modelling perceptions of criminality and remorse from faces using a data-driven computational approach. Cognition and Emotion, 40(5), 1–13. Garcia-Retamero, R., & López-Zafra, E. (2006). Prejudice against Women in Male-congenial Environments: Perceptions of Gender Role Congruity in Leadership. Sex Roles, 55(1-2), 51–61. http://doi.org/10.1007/s11199-006-9068-1 Gill, D, Garrod, O. G. B., Jack, R. E., & Schyns, P. G. (2014). Facial movements strategically camouflage involuntary social signals of face morphology. Psychological Science, 25(5), 1079–1086. http://doi.org/10.1177/09567f97614522274 Glick, P., & Fiske, S. T. (1996). The ambivalent sexism inventory: Differentiating hostile and benevolent sexism. Journal of Personality and Social Psychology, 70(3), 491–512. Glick, P., & Fiske, S. T. (2011). Ambivalent sexism revisited. Psychology of Women Quarterly, 35(3), 530–535. http://doi.org/10.1177/0361684311414832 Glick, P., Fiske, S. T., Mladinic, A., Saiz, J. L., Abrams, D., Masser, B., et al. (2000). Beyond 59 Running Head: GENDER BIASES IN PERSON IMPRESSIONS prejudice as simple antipathy: Hostile and benevolent sexism across cultures. Journal of Personality and Social Psychology, 79(5), 763–775. http://doi.org/10.1037//00223514.79.5.763 Glick, P., Lameiras, M., Fiske, S. T., Eckes, T., Masser, B., Volpato, C., et al. (2004). Bad but bold: Ambivalent attitudes toward men predict gender inequality in 16 nations. Journal of Personality and Social Psychology, 86(5), 713–728. Goldberg, P. (1968). Are women prejudiced against women? Trans-Action, 5(5), 28–30. http://doi.org/10.1007/BF03180445 Gosselin, F., & Schyns, P. G. (2001). Bubbles: a technique to reveal the use of information in recognition tasks. Vision Research, 41(17), 2261–2271. Hagen, R. L., & Kahn, A. (1975). Discrimination against competent women. Journal of Applied Social Psychology, 5(4), 362–376. http://doi.org/10.1111/j.15591816.1975.tb00688.x Hareli, S., Shomrat, N., & Hess, U. (2009). Emotional versus neutral expressions and perceptions of social dominance and submissiveness. Emotion, 9(3), 378–384. http://doi.org/10.1037/a0015958 Haslam, N. (2006). Dehumanization: An integrative review. Personality and Social Psychology Review, 10(3), 252-264. http://doi.org/10.1207/s15327957pspr1003_4 Hehman, E., Sutherland, C. A. M., Flake, J. K., & Slepian, M. L. (2017). The unique contributions of perceiver and target characteristics in person perception. Journal of Personality and Social Psychology, 113(4), 513–529. http://doi.org/10.1037/pspa0000090 Heilman, M. E. (2001). Description and prescription: How gender stereotypes prevent 60 Running Head: GENDER BIASES IN PERSON IMPRESSIONS women's ascent up the organizational ladder. Journal of Social Issues, 57(4), 657–674. http://doi.org/10.1111/0022-4537.00234 Heilman, M. E., & Wallen, A. S. (2010). Wimpy and undeserving of respect: Penalties for men’s gender-inconsistent success. Journal of Experimental Social Psychology, 46(4), 664–667. http://doi.org/10.1016/j.jesp.2010.01.008 Hess, U., Blairy, S., & Kleck, R. E. (2000). The influence of facial emotion displays, gender, and ethnicity on judgments of dominance and affiliation. Journal of Nonverbal Behavior, 24(4), 265–283. http://doi.org/10.1023/A:1006623213355 Higgins, E. T. (1996). Knowledge activation: Accessibility, applicability, and salience. In E. T. Higgins & A. W. Kruglanski (Eds.), Social psychology: Handbook of basic principles (pp. 133–168). New York, NY. Hugenberg, K., & Bodenhausen, G. V. (2004). Ambiguity in social categorization: The role of prejudice and facial affect in race categorization. Psychological Science, 15(5), 342–345. http://doi.org/10.1111/j.0956-7976.2004.00680.x Imhoff, R., & Koch, A. (2016). How orthogonal are the Big Two of social perception? On the curvilinear relation between agency and communion. Perspectives on Psychological Science, 12(1), 122–137. Jablonski, N. G., & Chaplin, G. (2000). The evolution of human skin coloration. Journal of Human Evolution, 39(1), 57–106. http://doi.org/10.1006/jhev.2000.0403 Jack, R. E., & Schyns, P. G. (2017). Toward a social psychophysics of face communication. Annual Review of Psychology, 68(1), 269–297. http://doi.org/10.1146/annurev-psych010416-044242 Jennrich, R. I. (1970). An asymptotic χ2 test for the equality of two correlation matrices. 61 Running Head: GENDER BIASES IN PERSON IMPRESSIONS Journal of the American Statistical Association, 65(330), 904–912. http://doi.org/10.1080/01621459.1970.10481133 Keating, C. F., Mazur, A., & Segall, M. H. (1981). A cross-cultural exploration of physiognomic traits of dominance and happiness. Ethology and Sociobiology, 2(1), 41– 48. http://doi.org/10.1016/0162-3095(81)90021-2 Koch, A., Imhoff, R., Unkelbach, C., & Alves, H. (2016). The ABC of stereotypes about groups: Agency/socioeconomic success, conservative-progressive beliefs, and communion. Journal of Personality and Social Psychology, 110(5), 675–709. http://doi.org/10.1037/pspa0000046 Kramer, R. S. S., Young, A. W., Day, M. G., & Burton, A. M. (2017). Robust social categorization emerges from learning the identities of very few faces. Psychological Review, 124(2), 115–129. http://doi.org/10.1037/rev0000048 Lenz, G. S., & Lawson, C. (2011). Looking the part: Television leads less informed citizens to vote based on candidates’ appearance. American Journal of Political Science, 55(3), 574– 589. http://doi.org/10.1111/j.1540-5907.2011.00511.x Lepore, L., & Brown, R. (1997). Category and stereotype activation: Is prejudice inevitable? Journal of Personality and Social Psychology, 72(2), 275–287. Lewin, C., & Herlitz, A. (2002). Sex differences in face recognition – Women's faces make the difference. Brain and Cognition, 50(1), 121–128. Little, A. C., Burriss, R. P., Jones, B. C., & Roberts, S. C. (2007). Facial appearance affects voting decisions. Evolution and Human Behavior, 28(1), 18–27. http://doi.org/10.1016/j.evolhumbehav.2006.09.002 Lundqvist, D., Flykt, A., & Arne, Ö. (1998). The Karolinska directed emotional faces. 62 Running Head: GENDER BIASES IN PERSON IMPRESSIONS Stockholm, Sweden: Karolinska Hospital. Ma, D. S., Correll, J., & Wittenbrink, B. (2015). The Chicago face database: A free stimulus set of faces and norming data. Behavior Research Methods, 47(4), 1122–1135. http://doi.org/10.3758/s13428-014-0532-5 Mangini, M., & Biederman, I. (2004). Making the ineffable explicit: estimating the information employed for face classifications. Cognitive Science, 28(2), 209–226. http://doi.org/10.1016/j.cogsci.2003.11.004 Montepare, J. M., & Dobish, H. (2003). The contribution of emotion perception and their overgeneralizations to trait impressions. Journal of Nonverbal Behavior, 27(4), 237– 254. Moss-Racusin, C. A., Dovidio, J. F., Brescoll, V. L., Graham, M. J., & Handelsman, J. (2012). Science faculty's subtle gender biases favor male students. Proceedings of the National Academy of Sciences of the United States of America, 109(41), 16474–16479. http://doi.org/10.1073/pnas.1211286109 Moss-Racusin, C. A., Phelan, J. E., & Rudman, L. A. (2010). When men break the gender rules: Status incongruity and backlash against modest men. Psychology of Men & Masculinity, 11(2), 140–151. http://doi.org/10.1037/a0018093 Mouchetant-Rostaing, Y., Giard, M. H., Bentin, S., Aguera, P. E., & Pernier, J. (2000). Neurophysiological correlates of face gender processing in humans. European Journal of Neuroscience, 12(1), 303–310. http://doi.org/10.1046/j.1460-9568.2000.00888.x Mouchetant-Rostaing, Y., & Giard, M. H. (2003). Electrophysiological correlates of age and gender perception on human faces. Journal of Cognitive Neuroscience (Supplement), 15(6), 900–910. http://doi.org/10.1162/089892903322370816 63 Running Head: GENDER BIASES IN PERSON IMPRESSIONS Nussbaum, M. C. (1999). Sex and social justice. Oxford, England: Oxford University Press. Oh, D., Buck, E. A., & Todorov, A. T. (2019). Revealing hidden gender biases in competence impressions from faces. Psychological Science, 30(1), 65–79. http://doi.org/10.1177/0956797618813092 Oh, D., Shafir, E., & Todorov, A. (2019). Economic status cues from clothes affect perceived competence from faces. Retrieved from PsyArXiv: https://doi.org/10.31234/osf.io/saqnv Olejnik, S., & Algina, J. (2003). Generalized eta and omega squared statistics: Measures of effect size for some common research designs. Psychological Methods, 8(4), 434–447. http://doi.org/10.1037/1082-989X.8.4.434 Olivola, C. Y., & Todorov, A. T. (2010). Elected in 100 milliseconds: Appearance-based trait inferences and voting. Journal of Nonverbal Behavior, 34(2), 83–110. http://doi.org/10.1007/s10919-009-0082-1 Oosterhof, N. N., & Todorov, A. T. (2008). The functional basis of face evaluation. Proceedings of the National Academy of Sciences of the United States of America, 105(32), 11087–11092. http://doi.org/10.1073/pnas.0805664105 Oosterhof, N. N., & Todorov, A. T. (2009). Shared perceptual basis of emotional expressions and trustworthiness impressions from faces. Emotion, 9(1), 128–6. http://doi.org/10.1037/a0014520 Osgood, C. E., Suci, G. J., & Tannenbaum, P. H. (1957). The measurement of meaning. Urbana, IL: University of Illinois Press. Parks-Stamm, E. J., Heilman, M. E., & Hearns, K. A. (2007). Motivated to penalize: Women's strategic rejection of successful women. Personality and Social Psychology Bulletin, 64 Running Head: GENDER BIASES IN PERSON IMPRESSIONS 34(2), 237–247. http://doi.org/10.1177/0146167207310027 Perrett, D. I., Lee, K. J., Penton-Voak, I. S., Rowland, D., Yoshikawa, S., Burt, D. M., et al. (1998). Effects of sexual dimorphism on facial attractiveness. Nature, 394(6696), 884– 887. http://doi.org/10.1038/29772 Prentice, D. A., & Carranza, E. (2002). What women and men should be, shouldn“t be, are allowed to be, and don”t have to be: The contents of prescriptive gender stereotypes. Psychology of Women Quarterly, 26(4), 269–281. http://doi.org/10.1111/14716402.t01-1-00066 Prentice, D. A., & Carranza, E. (2004). Sustaining cultural beliefs in the face of their violation: The case of gender stereotypes. In M. Schaller & C. S. Crandall (Eds.), The psychological foundations of culture. Rehnman, J. & Herlitz, A. (2006). Higher face recognition ability in girls: Magnified by ownsex and own-ethnicity bias, Memory, 14(3), 289-296, http://doi.org/10.1080/09658210500233581 Rhodes, G. (2006). The evolutionary psychology of facial beauty. Annual Review of Psychology, 57(1), 199–226. http://doi.org/10.1146/annurev.psych.57.102904.190208 Rhodes, G., Hickford, C., & Jeffery, L. (2000). Sex typicality and attractiveness: Are supermale and superfemale faces super attractive? British Journal of Psychology, 91(1), 125–140. http://doi.org/10.1348/000712600161718 Rosenberg, S., Nelson, C., & Vivekananthan, P. S. (1968). A multidimensional approach to the structrue of persaonlity impressions. Journal of Personality and Social Psychology, 9(4), 283–294. http://doi.org/10.1037/h0026086 Rudman, L. A. (1998). Self-promotion as a risk factor for women: The costs and benefits of 65 Running Head: GENDER BIASES IN PERSON IMPRESSIONS counterstereotypical impression management. Journal of Personality and Social Psychology, 74(3), 629–645. http://doi.org/10.1037/0022-3514.74.3.629 Rudman, L. A., & Glick, P. (2001). Prescriptive gender stereotypes and backlash toward agentic women. Journal of Social Issues, 57(4), 743–762. http://doi.org/10.1111/00224537.00239 Rudman, L. A., & Phelan, J. E. (2008). Backlash effects for disconfirming gender stereotypes in organizations. Research in Organizational Behavior, 28, 61–79. http://doi.org/10.1016/j.riob.2008.04.003 Said, C. P., & Todorov, A. T. (2011). A statistical model of facial attractiveness. Psychological Science, 22(9), 1183–1190. http://doi.org/10.1177/0956797611419169 Said, C. P., Sebe, N., & Todorov, A. T. (2009). Structural resemblance to emotional expressions predicts evaluation of emotionally neutral faces. Emotion, 9(2), 260–264. http://doi.org/10.1037/a0014681 Schyns, P. G., Bonnar, L., & Gosselin, F. (2016). Show me the features! Understanding recognition from the use of visual information. Psychological Science, 13(5), 402–409. http://doi.org/10.1111/1467-9280.00472 Secord, P. F., & Berscheid, E. S. (1963). Stereotyping and the generality of implicit personality theory. Journal of Personality, 31(1), 65–78. http://doi.org/10.1111/j.14676494.1963.tb01841.x Sofer, C., Dotsch, R., Wigboldus, D. H. J., & Todorov, A. T. (2015). What is typical is good: the influence of face typicality on perceived trustworthiness. Psychological Science, 26(1), 39–47. http://doi.org/10.1177/0956797614554955 South Palomares, J. K., Sutherland, C. A. M., & Young, A. W. (2017). Facial first impressions 66 Running Head: GENDER BIASES IN PERSON IMPRESSIONS and partner preference models: Comparable or distinct underlying structures? British Journal of Psychology, 22(2), 173–26. http://doi.org/10.1111/bjop.12286 Spence, J. T., Helmreich, R. L., & Holahan, C. K. (1979). Negative and positive components of psychological masculinity and femininity and their relationships to self-reports of neurotic and acting out behaviors. Journal of Personality and Social Psychology, 37(10), 1673–1682. http://doi.org/10.1037/0022-3514.37.10.1673 Spence, J. T., Helmreich, R., & Stapp, J. (1975). Ratings of self and peers on sex role attributes and their relation to self-esteem and conceptions of masculinity and femininity. Journal of Personality and Social Psychology, 32(1), 29–39. Stolier, R. M., & Freeman, J. B. (2016). Neural pattern similarity reveals the inherent intersection of social categories. Nature Neuroscience, 19(6), 795–797. http://doi.org/10.1038/nn.4296 Stolier, R. M., & Freeman, J. B. (2017). A neural mechanism of social categorization. Journal of Neuroscience, 37(23), 5711–5721. http://doi.org/10.1523/JNEUROSCI.333416.2017 Stolier, R. M., Hehman, E., & Freeman, J. B. (2017). A dynamic structure of social trait space. Trends in Cognitive Sciences, 22(3), 197–200. Sutherland, C. A. M., Liu, X., Zhang, L., Chu, Y., Oldmeadow, J. A., & Young, A. W. (2017a). Facial first impressions across culture: Data-driven modeling of Chinese and British perceivers’ unconstrained facial impressions. Personality and Social Psychology Bulletin, 014616721774419. http://doi.org/10.1177/0146167217744194 Sutherland, C. A. M., Oldmeadow, J. A., & Young, A. W. (2016). Integrating social and facial models of person perception: Converging and diverging dimensions. Cognition. 67 Running Head: GENDER BIASES IN PERSON IMPRESSIONS Sutherland, C. A. M., Oldmeadow, J. A., Santos, I. M., Towler, J., Michael Burt, D., & Young, A. W. (2013). Social inferences from faces: Ambient images generate a three-dimensional model. Cognition, 127(1), 105–118. http://doi.org/10.1016/j.cognition.2012.12.001 Sutherland, C. A. M., Rhodes, G., & Young, A. W. (2017b). Facial image manipulation: A tool for investigating social perception. Social Psychological and Personality Science, 8(5), 538–551. http://doi.org/10.1177/1948550617697176 Sutherland, C. A. M., Young, A. W., Mootz, C. A., & Oldmeadow, J. A. (2015). Face gender and stereotypicality influence facial trait evaluation: Counter-stereotypical female faces are negatively evaluated. British Journal of Psychology, 106(2), 186–208. http://doi.org/10.1111/bjop.12085 Swim, J. K., Aikin, K. J., Hall, W. S., & Hunter, B. A. (1995). Sexism and racism: Old-fashioned and modern prejudices. Journal of Personality and Social Psychology, 68(2), 199–214. Tiddeman, B., Burt, M., & Perrett, D. I. (2001). Prototyping and transforming facial textures for perception research. IEEE Computer Graphics and Applications, 21(4), 42–50. http://doi.org/10.1109/38.946630 Todorov, A. T. (2017). Face value: The irresistible influence of first impressions. Princeton, NJ: Princeton University Press. Todorov, A. T., & Oosterhof, N. N. (2011). Modeling social perception of faces. IEEE Signal Processing Magazine, 28(2), 117–122. http://doi.org/10.1109/MSP.2010.940006 Todorov, A. T., Dotsch, R., Porter, J. M., & Oosterhof, N. N. (2013). Validation of data-driven computational models of social perception of faces. Emotion, 13(4), 724–738. http://doi.org/10.1037/a0032335 Todorov, A. T., Dotsch, R., Wigboldus, D. H. J., & Said, C. P. (2011). Data-driven methods for 68 Running Head: GENDER BIASES IN PERSON IMPRESSIONS modeling social perception. Social and Personality Psychology Compass, 5(10), 775–791. http://doi.org/10.1111/j.1751-9004.2011.00389.x Todorov, A. T., Mandisodza, A. N., Goren, A., & Hall, C. C. (2005). Inferences of competence from faces predict election outcomes. Science, 308(5728), 1623–1626. http://doi.org/10.1126/science.1110589 Todorov, A. T., Olivola, C. Y., Dotsch, R., & Mende-Siedlecki, P. (2015). Social attributions from faces: Determinants, consequences, accuracy, and functional significance. Annual Review of Psychology, 66(1), 519–545. Todorov, A. T., Said, C. P., Engell, A. D., & Oosterhof, N. N. (2008). Understanding evaluation of faces on social dimensions. Trends in Cognitive Sciences, 12(12), 455–460. http://doi.org/10.1016/j.tics.2008.10.001 Walker, M., & Vetter, T. (2009). Portraits made to measure: Manipulating social judgments about individuals with a statistical face model. Journal of Vision, 9(11), 1–13. http://doi.org/10.1167/9.11.12 Walker, M., & Vetter, T. (2016). Changing the personality of a face: Perceived Big Two and Big Five personality factors modeled in real photographs. Journal of Personality and Social Psychology, 110(4), 609–624. Welling, L. L. M., Bestelmeyer, P. E. G., Jones, B. C., DeBruine, L. M., & Allan, K. (2017). Effects of sexually dimorphic shape cues on neurophysiological correlates of women’s face processing. Adaptive Human Behavior and Physiology, 9(5), 1–14. http://doi.org/10.1007/s40750-017-0072-1 Wiggins, J. S. (1979). A psychological taxonomy of trait-descriptive terms: The interpersonal domain. Journal of Personality and Social Psychology, 37(3), 395–412. 69 Running Head: GENDER BIASES IN PERSON IMPRESSIONS http://doi.org/10.1037/0022-3514.37.3.395 Williams, J. E., & Best, D. L. (1990). Sex and psyche: Gender and self viewed cross-culturally. Sage Publications, Inc. Wilson, J. P., & Rule, N. O. (2015). Facial trustworthiness predicts extreme criminalsentencing outcomes. Psychological Science, 26(8), 1–7. http://doi.org/10.1177/0956797615590992 Zebrowitz, L. A. (2005). The Origin of First Impressions. Journal of Cultural and Evolutionary Psychology, 2(1), 93–108. http://doi.org/10.1556/JCEP.2.2004.1-2.6 Zebrowitz, L. A., & McDonald, S. M. (1991). The impact of litigants' baby-facedness and attractiveness on adjudications in small claims courts. Law and Human Behavior, 15(6), 603–623. http://doi.org/10.1007/BF01065855 Zebrowitz, L. A., & Montepare, J. M. (2008). Social psychological face perception: Why appearance matters. Social and Personality Psychology Compass, 2(3), 1497–1517. http://doi.org/10.1111/j.1751-9004.2008.00109.x 70 Revision of XGE-2018-0980R1 as invited by the action editor, Agneta Herlitz. Supplemental Material Gender Biases in Impressions from Faces: Empirical Studies and Computational Models DongWon Oh1* Ron Dotsch2 Jenny Porter1 Alexander Todorov1 1Department of Psychology, Princeton University Princeton, NJ 08544 2Department of Psychology, Utrecht University Utrecht, The Netherlands *Correspondence: dong.w.oh@gmail.com Supplemental Text Figures S1–S8 Tables S1–S5 1 Underlying Structure of Male and Female Face Impressions in Study 1 Principal Components Analysis Procedure When summarizing and visualizing the results of the orthogonal PCAs in Studies 1a and 1b, we followed the Kaiser rule: For both male and female faces, we reported the first two components, i.e., PC1 and PC2, because they had eigenvalues bigger than 1 (Table S1). This indicates that the third and following components were unable to explain a variance larger than a single input variable (i.e., a trait rating) alone, so we deemed it as unnecessary to include them. For consistency, in Study 1c we restricted the number of components to two although the PCA solution found four components with eigenvalue > 1 for both face genders (see Table S1 for details). However, it should be noted that the 14 trait impressions used in Studies 1a and 1b were selected in a data-driven fashion after recoding of free-response descriptions of person impressions from faces (Oosterhof & Todorov, 2008), whereas there is no report that the 15 traits used in Study 1c were chosen in a data-driven way (Ma, Correll, & Wittenbrink, 2015). Comparison between 2008 and 2018 Face Evaluation Structures In addition to the main analyses, we examined the similarity between the dataset of Oosterhof & Todorov (2008; Study 1a) and the newly collected data (Study 1b). When face genders were collapsed across, the two datasets led to highly similar PCA solutions, suggesting the same structure of impressions between the two datasets collected about ten years apart (see the main text for details). When face genders were considered and separate PCAs were conducted for male and female ratings, the two datasets led to highly similar PCA solutions again, suggesting the gender specific structures of impressions from the dataset of Oosterhof and Todorov (2008) and from the present dataset (Study 1b) are highly similar. Specifically, we ran correlational analyses of the PCA loadings of the trait ratings on PC1 and PC2 across the two datasets. A high correlation between the PCA loadings of the two datasets indicates high similarity between the impression structures. Between the 2008 data and the current data, the component loadings of the traits were highly similar for both male (R = .97) and female face impressions (R = .98), suggesting high stability of the impressions of male and female faces over time. 2 Analysis of the Effects of Raters’ Gender Stereotype Endorsement When testing for the effect of the raters’ GSE on facial impressions (see Table S2 for the traits used in the GSE questionnaire), we conducted an additional analysis on the structure of impressions using four factor scores of GSE (each of which represents specific subtypes of gender stereotypes) in addition to the analyses reported in the main text (Study 1b). In the analyses reported in the main text, we used the GSE score, the sum of each rater’s responses to all items in the questionnaire (see the main text for details). The additional analysis yielded consistent results with the main analyses. To examine if the relationship between the impression differentiation and the rater GSE is affected by gender- or valence-specificity of stereotypes, we computed four factor scores for each rater – stereotype gender [male/female] × stereotype valence [positive/negative]. Each factor score represented the extent to which each rater supported gender- and valence-specific stereotypes. The four-factor confirmatory factor model revealed an acceptable albeit minimum level of fitness (χ2(164) = 826.21, P < .001; CFI = 0.827; RMSEA = 0.093, 90% CI = [0.087,0.099]; SRMR = 0.067). The resulting factor loadings of this four-factor solution showed the expected relationships between the factors and the questionnaire responses (see Table S2 for details). We then simply replicated the analyses reported in the main text using each GSE factor score rather than the GSE sum score. Although we explored the relationship between impression differentiation and the rater GSE in relation to the specific subtypes of stereotype (gender × valence), people’s attitudes towards the four subtypes (i.e., male and positive, male and negative, female and positive, and female and negative) go hand in hand (Glick & Fiske, 1996; Glick et al., 2000, 2004). That is, if one holds one subtype of gender stereotypes, say, stereotypes about stereotypically male and positive traits (e.g., “men are more analytical than women”), then the person is likely to hold the other three gender stereotype subtypes as well (e.g., “men are more hostile than women”, “women are more gullible than men”, “women are more nurturing than men”). Thus, we did not expect any uniquely distinct effect of the stereotype subtypes on the effect of GSE on impression differentiation. As we expected, the results for all four subtypes (factor scores) were consistent with the results reported in the main text. First, when a rater GSE factor score increased, regardless of the 3 subtype of GSE, the correlation between impressions became stronger for both male and female faces, and the quadratic regression model explained more variance than the linear regression model (Fig. S2). Second, female face impressions had stronger inter-correlations and larger amount of variance explained by valence than male face ratings did (see below and Figure S2 for details). Role of Raters’ Gender Stereotypes about Male × Positive Traits. The linear regression model was significant for the ratings of both face genders (male faces: R2 = .93, F(1,136) = 1713.94, P < .001; female faces: R2 = .79, F(1,136) = 515.78, P < .001), but the quadratic model (male faces: R2 = .94, F(2,135) = 1070.56, P < .001; female faces: R2 = .86, F(2,135) = 428.11, P < .001) explained significantly more variance (male faces: F(1,136) = 32.33, P < .001; female faces: F(1,136) = 71.83, P < .001). Correspondingly, the amount of variance in the ratings explained by the valence component (PC1) followed the same quadratic pattern of change across the male × positive GSE factor score: The linear regression model was significant (male faces: R2 = .95, F(1,136) = 2468.34, P < .001; female faces: R2 = .84, F(1,136) = 717.14, P < .001), but the quadratic model (male faces: R2 = .95, F(2,135) = 1343.58, P < .001; female faces: R2 = .88, F(2,135) = 496.27, P < .001) explained a larger amount of variance than the linear models did (male faces: F(1,136) = 12.38, P < .001; female faces: F(1,136) = 44.74, P < .001). Across the factor score, female face ratings had higher correlational coefficients (ts > 6.32, Ps < .001) and larger amount of variance explained by the valence component than male face ratings (ts > 12.87, Ps < .001). Role of Raters’ Gender Stereotypes about Male × Negative Traits. The linear regression model was significant for the ratings of both face genders (male faces: R2 = .84, F(1,136) = 712.15, P < .001; female faces: R2 = .89, F(1,136) = 1076.63, P < .001), but the quadratic model (male faces: R2 = .91, F(2,135) = 697.08, P < .001; female faces: R2 = .96, F(2,135) = 1442.86, P < .001) explained significantly more variance (male faces: F(1,136) = 110.20, P < .001; female faces: F(1,136) = 203.78, P < .001). Correspondingly, the amount of variance in the ratings explained by valence followed the same quadratic pattern of change across the male × negative GSE factor score: The linear regression model was significant (male faces: R2 = .83, F(1,136) = 648.12, P < .001; female faces: R2 = .93, F(1,136) = 1725.88, P < .001), but the quadratic model (male faces: R2 = .90, F(2,135) = 579.89, P < .001; female faces: R2 = .97, F(2,135) = 2129.94, P < .001) explained a larger amount of 4 variance than the linear models did (male faces: F(1,136) = 89.57, P < .001; female faces: F(1,136) = 186.02, P < .001). Across the factor score, female face ratings had higher correlational coefficients (ts > 7.43, Ps < .001) and larger amount of variance explained by valence than male face ratings (ts > 14.35, Ps < .001). Role of Raters’ Gender Stereotypes about Female × Positive Traits. The linear regression model was significant for the ratings of both face genders (male faces: R2 = .91, F(1,136) = 1345.46, P < .001; female faces: R2 = .75, F(1,136) = 399.19, P < .001), but the quadratic model (male faces: R2 = .95, F(2,135) = 1387.28, P < .001; female faces: R2 = .87, F(2,135) = 449.27, P < .001) explained significantly more variance (male faces: F(1,136) = 132.10, P < .001; female faces: F(1,136) = 127.64, P < .001). Correspondingly, the amount of variance in the ratings explained by valence followed the same quadratic pattern of change across the female × positive GSE factor score: The linear regression model was significant for both genders (male faces: R2 = .88, F(1,136) = 953.66, P < .001; female faces: R2 = .78, F(1,136) = 482.23, P < .001), but the quadratic model (male faces: R2 = .95, F(2,135) = 1270.04, P < .001; female faces: R2 = .89, F(2,135) = 535.72, P < .001) explained a larger amount of variance than the linear models did (male faces: F(1,136) = 198.88, P < .001; female faces: F(1,136) = 130.40, P < .001). Across the factor score, female face ratings had higher correlational coefficients (ts > 10.60, Ps < .001) and larger amount of variance explained by valence than male face ratings (ts > 17.16, Ps < .001). Role of Raters’ Gender Stereotypes about Female × Negative Traits. The linear regression model was significant for the ratings of both face genders (male faces: R2 = .93, F(1,136) = 1715.09, P < .001; female faces: R2 = .56, F(1,136) = 176.45, P < .001), but the quadratic model (male faces: R2 = .93, F(2,135) = 954.42, P < .001; female faces: R2 = .64, F(2,135) = 121.59, P < .001) explained significantly more variance (male faces: F(1,136) = 15.16, P < .001; female faces: F(1,136) = 29.61, P < .001). Correspondingly, the amount of variance in the ratings explained by valence followed the same quadratic pattern of change across the female × negative GSE factor score: The linear regression model was significant (male faces: R2 = .94, F(1,136) = 2115.56, P < .001; female faces: R2 = .63, F(1,136) = 233.82, P < .001), but the quadratic model (male faces: R2 = .95, F(2,135) = 1258.17, P < .001; female faces: R2 = .70, F(2,135) = 157.86, P < .001) explained a larger amount of variance than the linear models did (male faces: F(1,136) = 25.15, P < .001; female faces: F(1,136) = 5 30.75, P < .001). Across the factor score, female face ratings had higher correlational coefficients (ts > 2.14, Ps < .033) and larger amount of variance explained by valence than male face ratings (ts > 7.69, Ps < .001). It should be noted that in all four cases, although the quadratic models explained significantly more variance than the linear models, the magnitude of the quadratic effects was much smaller than the magnitude of the linear effects, and the increase in the intercorrelations of trait ratings as a function of GSE was largely monotonic (Fig. S2). Analysis of the Effects of Rater Gender When testing for the effect of raters’ gender on facial impressions, we conducted an additional analysis using a 2 [face gender] × 2 [rater gender] repeated measures ANOVA on the absolute values of the interimpression correlational coefficients in addition to the analyses reported in the main text (Study 1b). In the analyses reported in the main text, we used Jennrich (1970) tests of matrix equality (see the main text for details) instead of an ANOVA because the dataset violates the assumption of sample independence. However, given that ANOVA is known to be rather robust to violations of independence, we report the additional result below in the Supplemental Material. The additional analysis yielded consistent results with the main analyses (we calculated a generalized eta-squared (ηG2) as the measure of the effect size of each effect (Olejnik & Algina, 2003) to account for the repeated measures design). The 2 [face gender] × 2 [rater gender] repeated measures ANOVA on the absolute values of the intercorrelational coefficients between trait ratings yielded a significant effect of the rater gender (F(1,90) = 24.36, P < .001, η2G = .03) with the female raters showing a higher level of correlations between trait ratings (M|r| = 0.59, SD|r| = 0.23) than the male raters (M|r| = 0.50, SD|r| = 0.23), indicating that female raters had less differentiated face impressions. This main effect was qualified by a significant interaction between the rater gender and face gender (F(1,90) = 4.87, P < .05, η2G = .01). Female raters showed a stronger cross-trait intercorrelations for female (M|r| = 0.61, SD|r| = 0.21) than for male faces (M|r| = 0.56, SD|r| = 0.25; t(90) = 2.50, P = .01, Bonferroni correction), whereas male raters showed the same level of cross-trait correlations for female (M|r| = 0.51, SD|r| = 0.21) and for male faces (M|r| = 0.50, SD|r| = 0.25; t(90) = 0.41). This finding suggests that the less differentiated impressions of female faces are primarily due to female raters. 6 Building of Male and Female Face Impression Models in Study 2 A data-driven face modeling approach allows one to build models of impressions without a priori assumptions about the effect of specific facial features (e.g., the size of the nose) for impressions (Dotsch & Todorov, 2012; Funk, Walker, & Todorov, 2016; Gosselin & Schyns, 2001; Jack & Schyns, 2017; Mangini & Biederman, 2004; Oosterhof & Todorov, 2008; Todorov, Dotsch, Wigboldus, & Said, 2011; Walker & Vetter, 2009; 2016). In the standard, hypothesis-driven approach, different facial features are manipulated. However, the combinations of features rapidly proliferate as the number of features increases (Jack & Schyns, 2017; Todorov et al., 2011), damaging the feasibility and/or the statistical power of the investigation. For example, a simple factorial design for the investigation of the effect of even only ten binary facial features (e.g., a long vs. short nose) would lead to 210 experimental conditions. The data-driven approach prevents this by presenting a relatively small number of faces (e.g., 300 faces), which randomly vary in their features. In Studies 2–3, we used the statistical face space model of FaceGen 3.2 (Singular Inversions) that captures the variance from a large sample of real human faces with 100 orthogonal dimensions (Todorov et al., 2011; Todorov & Oosterhof, 2011). Each dimension represents the variance in a holistic combination of features. A single face is represented as a vector in the statistical space (i.e., an array of 100 numbers). Using this approach, one can generate an unlimited number of faces by randomly sampling parameters and generating the corresponding faces as images. Participants then judge the randomly sampled faces on a trait of interest, e.g., trustworthiness. One can model the trait judgment by extracting the change in face parameters that are correlated with the change in the trait judgment. With the resulting model, we can visualize what aspects of facial appearance change when an impression of the trait changes. The trait model can be applied to any new face to make it appear more or less trait-like (e.g., trustworthy) by moving its corresponding parameters across the modeled judgment. With these manipulated faces, one can study what types of facial cues (e.g., emotional facial gestures, perceived physical strength) predict the perceived level of the trait. In addition, a data-driven statistical face model allows one to vary a particular perceived trait of faces while specifically controlling for another trait (Oh, Buck, & Todorov, 2019; Todorov, Dotsch, Porter, & Oosterhof, 2013). For example, Oh and colleagues (2019) manipulated the 7 perceived competence of faces while controlling for facial attractiveness, thereby effectively suppressing the halo effect underlying competence impressions. This procedure found that facial masculinity is one of the ingredients of competence impressions, revealing gender biases in competence impressions. 8 Validation of Male and Female Face Impression Models in Study 3 To create the face stimuli for the validation studies, we manipulated their level of perceived trustworthiness and dominance. We added -3, -2, -1, 0, 1, 2, and 3SDs to the trustworthiness or dominance value of the 25 randomly generated male and 25 randomly generated female faces, using either the male or the female model. In other words, we moved the coordinates of the faces in the face space along one of the four gender-specific trait models. This was a different approach from that of previous validation studies, which did not control for the gender of the faces: Todorov and colleagues (2013) manipulated the trait dimension of randomly generated faces to take specific values (i.e., -3, -2, -1, 0, 1, 2, and 3SDs on each trait model). These procedures are inappropriate when validating gender-specific trait models that are inherently correlated with gender in raters’ perception. For instance, male faces are perceived as more dominant than female faces, and female faces are perceived as more trustworthy than male faces (e.g., Sutherland et al., 2013; Studies 1a and 1b in the main text). Because of these correlations, these procedures would decrease gender-related differences between male and female face sets, as they would project all the faces, regardless of their gender, onto the trait dimensions with the same values. As a result, we would essentially be generating less male-like male faces and less female-like female faces for the validation stimulus set. Note that the parameters of the average male face and the parameters of the average female face used here were based on samples of actual male and female faces. That is, 3D laser scans of these male and female faces were used to construct the FaceGen statistical face space and extract the 100 face parameters. In the current project, by adding dimension values of [-3 to 3SDs] to the original faces (rather than assigning the faces to the dimension values as in the previous approach), we maintained the gender-related facial information in the male and female faces. When cross-validating gender-specific impression models using ANOVAs, we calculated a generalized eta-squared (ηG2) as the measure of the effect size of each effect (Olejnik & Algina, 2003) to account for the repeated measures design of the experiment. 9 Figure S1. The distribution of raters’ responses to gender stereotype endorsement (GSE) questions in Study 1b. In every question, raters showed a bias away from the middle score (blue dotted line) in the direction consistent with gender stereotypes (ts > 7.02, Ps < .001). The bigger purple (traits associate with women) and green dot (traits associate with men) denote the mean response, and the smaller black dots raw responses. All missing values were replaced using 10-nearest neighbor imputation. 10 Figure S2. The level of intercorrelations across impressions (A) and the amount of variance in the impressions explained by valence (B) as a function of the raters’ GSE score in Study 1b. Each data point (A: the absolute value of correlational coefficients between all impression rating pairs, B: the amount of variance explained by PC1 (valence) in the PCA of ratings per gender) was calculated from two rater subgroups (nhigh-GSE = 235, nlow-GSE = 234). We divided the participants into high- and low-GSE raters for each trait, using the median split per trait (range of median = [112.5,126.0] across traits). For the raters whose GSEs were exactly at the median per trait (n = 13), we categorized 7 of them with higher GSEs in the high-GSE group and 6 with lower GSEs in the low-GSE group, regardless of which trait they evaluated faces on. The intercorrelations and the variance explained were higher in female than in male impressions among both high- (M|r| = 0.57, SD|r| = 0.23 vs. M|r| = 0.55, SD|r| = 0.26, χ2(91) = 799.53, P < .001; 62.62 % vs. 58.30 %) and low-GSE raters (M|r| = 0.52, SD|r| = 0.20 vs. M|r| = 0.50, SD|r| = 0.23, χ2(91) = 545.05, P < .001; 57.59 % vs. 43.70 %). These results replicate the results in the main text: Figure 3 shows consistent results, with individual differences in the GSE level better preserved. The error bars denote ±SE. GSE = gender stereotype endorsement. PCA = principal component analysis. 11 Figure S3. The level of intercorrelations across impressions (the left column in each subpanel) and the amount of variance in the impressions explained by valence (the right column in each subpanel) as a function of the raters’ GSE factor scores in Study 1b. Each data point was calculated from a rater subgroup (nrater = 10 per trait, nrater = 140 in total per subgroup). Each subgroup was sampled from a sliding window on the rater GSE factors (nrater ≥ 10 per trait), in which the X value is the middle point of the sliding window. GSE factors represent the degree to which raters endorsed gender stereotypes about either stereotypically male and positive (e.g., analytical; top left), male and negative (e.g., hostile; top right), female and positive (e.g., nurturing; bottom left), or female and negative traits (e.g., nagging; bottom right). The factor scores were derived from a four-solution confirmation factor analysis. The shaded regions show 95% CIs estimated from 1,000 bootstrapped replications per face gender for each factor score. The intercorrelations of face 12 impressions (ts > 2.14, Ps < .033) and the amount of variance explained by PC1 (ts > 7.69, Ps < .001) were significantly higher in female than in male impressions across every GSE factor score. See Supplemental Text for details. GSE = gender stereotype endorsement. CI = confidential interval. PC = principal component. 13 Figure S4. A sample of randomly generated synthetic female faces (A) and male faces (B) in Study 2. For each gender, 300 faces were generated as variations of the gender-specific average face. The face shape and face reflectance were varied randomly. 14 Figure S5. A scatterplot of the dominance and trustworthiness ratings of faces as a function of gender in Study 2. The density functions along the X and Y axes represent the distributions of male (green) and female faces (purple) for the trustworthiness and dominance ratings, respectively. 15 Figure S6. Similarities between gender-specific trustworthiness and dominance models. Similarities between models are represented with angles, e.g., 0 rad when ρ = 1, π/2 rad when ρ = 0, π when ρ = -1. Each model was built on the ratings of either only male faces (green line) or only female faces (purple line). 16 Figure S7. Twenty-five female (A) and twenty-five male (B) synthetic face identities used in the validation of the data-driven, computational models in Study 3a. These faces were randomly generated by a statistical face model with the constraint to be maximally distinctive from each other. 17 A B Figure S8. Twenty-five female (A) and twenty-five male (B) real-life face identities used in the validation of the data-driven, computational models in Study 3b. These faces were randomly selected from a standardized face stimulus set (DeBruine & Jones, 2017). 18 Table S1. Eigenvalues of PCs and the Amount of Variance Explained by the PCs in Study 1. Study 1a Component 1 Female Faces % λ Variance Explained 10.04 71.69% Study 1b Male Faces 8.18 % Variance Explained 58.40% λ Study 1c Female Faces % λ Variance Explained 9.47 67.66% Male Faces 8.67 % Variance Explained 61.94% λ Female Faces % λ Variance Explained 6.13 40.87% Male Faces 4.74 % Variance Explained 31.60% λ 2 1.95 13.93% 3.38 24.16% 2.34 16.74% 3.45 24.65% 1.96 13.08% 3.13 20.84% 3 0.83 5.91% 0.86 6.17% 0.58 4.15% 0.55 3.91% 1.66 11.03% 1.94 12.91% 4 0.39 2.78% 0.60 4.32% 0.39 2.77% 0.37 2.62% 1.45 9.67% 1.30 8.65% 5 0.17 1.22% 0.32 2.26% 0.28 2.03% 0.24 1.70% 0.99 6.61% 0.85 5.67% 6 0.15 1.05% 0.16 1.17% 0.20 1.43% 0.17 1.20% 0.76 5.05% 0.78 5.20% 7 0.12 0.85% 0.14 1.03% 0.16 1.11% 0.14 0.98% 0.57 3.83% 0.60 3.97% 8 0.10 0.71% 0.10 0.68% 0.15 1.05% 0.11 0.82% 0.46 3.06% 0.38 2.50% 9 0.08 0.55% 0.07 0.53% 0.13 0.92% 0.08 0.56% 0.29 1.91% 0.31 2.09% 10 0.05 0.36% 0.06 0.41% 0.10 0.69% 0.07 0.49% 0.21 1.39% 0.27 1.83% 11 0.04 0.32% 0.04 0.29% 0.08 0.59% 0.05 0.37% 0.15 0.97% 0.22 1.45% 12 0.03 0.23% 0.03 0.22% 0.05 0.36% 0.04 0.31% 0.14 0.91% 0.19 1.23% 13 0.03 0.21% 0.02 0.18% 0.05 0.34% 0.04 0.27% 0.11 0.75% 0.13 0.84% 14 0.03 0.18% 0.02 0.17% 0.02 0.17% 0.03 0.19% 0.07 0.45% 0.11 0.76% 15 (n/a) (n/a) (n/a) (n/a) (n/a) (n/a) (n/a) (n/a) 0.06 0.40% 0.07 0.46% 19 Note. Boldface indicates eigenvalue > 1.00. PC = principal component. 20 Table S2. Factor Loadings of GSE Items from the CFA in Study 1b. Trait Gender Trait Valence Trait Item dominant Factor 1 .79 Factor 2 .00 Factor 3 .00 Factor 4 .00 competitive .68 .00 .00 .00 quantitative .53 .00 .00 .00 analytical .50 .00 .00 .00 aggressive .00 .75 .00 .00 hostile .00 .72 .00 .00 egotistical .00 .71 .00 .00 boastful .00 .70 .00 .00 arrogant .00 .67 .00 .00 cynical .00 .31 .00 .00 sensitive .00 .00 .78 .00 nurturing .00 .00 .75 .00 artistic .00 .00 .30 .00 intuitive .00 .00 .21 .00 emotional .00 .00 .00 .80 nagging .00 .00 .00 .71 subordinate .00 .00 .00 .58 whiny .00 .00 .00 .58 servile .00 .00 .00 .56 gullible .00 .00 .00 .48 Positive Stereotypical Male Traits Negative Positive Stereotypical Female Traits Negative Note. Boldface indicates factor loading > .3. Each question read “How do the average man and the average woman compare with each other on how [TRAIT TERM] they are?” GSE = Gender Stereotype Endorsement. CFA = Confirmatory Factor Analysis. 21 Table S3. Similarity between Face Impression Models in Studies 2–3. Male models Trustworthiness Trait Model Female models Trustworthiness Dominance Dominance Male Trustworthiness - - - - Male Dominance -.16 - - - Female Trustworthiness .68 -.44 - - Female Dominance -.14 .85 -.38 - Note. Numbers indicate pairwise Pearson coefficients between computational impression models (100 parameters each). Boldface indicates P < .001. Figure S5 visualizes the same information except for the bottom two rows in the table. The original trustworthiness and dominance models from Oosterhof & Todorov (2008) were built without taking into account face gender. 22 Table S4. Interrater Reliabilities of Ratings of Synthetic Faces Manipulated by Trustworthiness and Dominance Models in Study 3a. Trait model and original face gender Face × Model  Gender-Congruent Faces Face × Model  Gender-Incongruent Faces Cronbach’s α based on face ratings Male faces manipulated with male trustworthiness model .97 Male faces manipulated with male dominance model .96 Female faces manipulated with female trustworthiness model .96 Female faces manipulated with female dominance model .97 Male faces manipulated with female trustworthiness model .96 Male faces manipulated with female dominance model .98 Female faces manipulated with male trustworthiness model .97 Female faces manipulated with male dominance model .96 23 Table S5. Interrater Reliabilities of Ratings of Real-Life Faces Manipulated by Trustworthiness and Dominance Models in Study 3b. Trait model and original face gender Face × Model Gender-Congruent Faces Face × Model  Gender-Incongruent Faces Cronbach’s α based on face ratings Male faces manipulated with male trustworthiness model .88 Male faces manipulated with male dominance model .93 Female faces manipulated with female trustworthiness model .81 Female faces manipulated with female dominance model .92 Male faces manipulated with female trustworthiness model .83 Male faces manipulated with female dominance model .91 Female faces manipulated with male trustworthiness model .85 Female faces manipulated with male dominance model .92 24 References DeBruine, L. M., & Jones, B. C. (2017). Face Research Lab London Set. Figshare. http://doi.org/10.6084/m9.figshare.5047666 Dotsch, R., & Todorov, A. T. (2012). Reverse correlating social face perception. Social Psychological and Personality Science, 3(5), 562–571. http://doi.org/10.1177/1948550611430272 Funk, F., Walker, M., & Todorov, A. T. (2016). Modelling perceptions of criminality and remorse from faces using a data-driven computational approach. Cognition and Emotion, 40(5), 1– 13. Glick, P., & Fiske, S. T. (1996). The ambivalent sexism inventory: Differentiating hostile and benevolent sexism. Journal of Personality and Social Psychology, 70(3), 491–512. Glick, P., Fiske, S. T., Mladinic, A., Saiz, J. L., Abrams, D., Masser, B., et al. (2000). Beyond prejudice as simple antipathy: Hostile and benevolent sexism across cultures. Journal of Personality and Social Psychology, 79(5), 763–775. http://doi.org/10.1037//00223514.79.5.763 Glick, P., Lameiras, M., Fiske, S. T., Eckes, T., Masser, B., Volpato, C., et al. (2004). Bad but bold: Ambivalent attitudes toward men predict gender inequality in 16 nations. Journal of Personality and Social Psychology, 86(5), 713–728. Gosselin, F., & Schyns, P. G. (2001). Bubbles: a technique to reveal the use of information in recognition tasks. Vision Research, 41(17), 2261–2271. Jack, R. E., & Schyns, P. G. (2017). Toward a social psychophysics of face communication. Annual Review of Psychology, 68(1), 269–297. http://doi.org/10.1146/annurev-psych-010416044242 Jennrich, R. I. (1970). An asymptotic χ2 test for the equality of two correlation matrices. Journal 25 of the American Statistical Association, 65(330), 904–912. http://doi.org/10.1080/01621459.1970.10481133 Mangini, M., & Biederman, I. (2004). Making the ineffable explicit: estimating the information employed for face classifications. Cognitive Science, 28(2), 209–226. http://doi.org/10.1016/j.cogsci.2003.11.004 Oh, D., Buck, E. A., & Todorov, A. T. (2019). Revealing hidden gender biases in competence impressions from faces. Psychological Science, 30(1), 65–79. http://doi.org/10.1177/0956797618813092 Olejnik, S., & Algina, J. (2003). Generalized eta and omega squared statistics: Measures of effect size for some common research designs. Psychological Methods, 8(4), 434–447. http://doi.org/10.1037/1082-989X.8.4.434 Oosterhof, N. N., & Todorov, A. T. (2008). The functional basis of face evaluation. Proceedings of the National Academy of Sciences, 105(32), 11087–11092. http://doi.org/10.1073/pnas.0805664105 Sutherland, C. A. M., Oldmeadow, J. A., Santos, I. M., Towler, J., Michael Burt, D., & Young, A. W. (2013). Social inferences from faces: Ambient images generate a three-dimensional model. Cognition, 127(1), 105–118. http://doi.org/10.1016/j.cognition.2012.12.001 Todorov, A. T., & Oosterhof, N. N. (2011). Modeling social perception of faces. IEEE Signal Processing Magazine, 28(2), 117–122. http://doi.org/10.1109/MSP.2010.940006 Todorov, A. T., Dotsch, R., Porter, J. M., & Oosterhof, N. N. (2013). Validation of data-driven computational models of social perception of faces. Emotion, 13(4), 724–738. http://doi.org/10.1037/a0032335 Todorov, A. T., Dotsch, R., Wigboldus, D. H. J., & Said, C. P. (2011). Data-driven methods for modeling social perception. Social and Personality Psychology Compass, 5(10), 775–791. 26 http://doi.org/10.1111/j.1751-9004.2011.00389.x Walker, M., & Vetter, T. (2009). Portraits made to measure: Manipulating social judgments about individuals with a statistical face model. Journal of Vision, 9(11), 1–13. http://doi.org/10.1167/9.11.12 Walker, M., & Vetter, T. (2016). Changing the personality of a face: Perceived Big Two and Big Five personality factors modeled in real photographs. Journal of Personality and Social Psychology, 110(4), 609–624. 27