Measuring Creative Contribution:
The Creative Momentum Assessment Model
ABSTRACT
The Creative Momentum Assessment Model (CMAM) is a criterion-based assessment system that applies the
phenomenon of psychological momentum to creative ideas, products, and expressions. It embodies, and extends,
previous theoretical and experimental approaches to measuring creative contributions. It implements
accountable self-grading, critical reflection, expert assessment, and timely feedback in conjunction with a
criterion-baseline comprising paradigm-related creative movements, novelty, problem resolution, elaboration,
and synthesis. Multiple creative contributions, completed by seventy-seven undergraduates participating in a
general education creative thinking class, were both self-graded and assessed by a group of design experts using
the Consensual Assessment Technique (CAT) and by a small panel of tutors with access to criteria definitions,
rubrics, critical reflection reports, and self-grades. CAT assessors exhibited much higher inter-rater reliabilities
and creativity-grade alignment with the subjects than did tutors, highlighting the diversifying influence of
conceptual versus perceptual activity and of novelty detection. More general criteria exhibited higher reliabilities
than specific ones. Subjects tended to over-grade themselves at first, followed by more realistic assessments.
Assessors’ reliabilities and creativity-grade alignment increased with CMAM usage, even in combination with
subjects’ self-grades. CMAM criteria factored according to the nature of contribution and assessment
methodology, uncovering four basic types of criterion dependencies.
967-86915/210411
1
Background
An important creativity research direction has been to discover an effective, reliable method of assessing
creative contributions, including ideas, products, services, and expressions, across all fields of endeavor. The
purpose of this study is to investigate an assessment model that was designed to incorporate the strengths of
previous models, overcome their weaknesses, and provide an effective tool for assessing and enhancing
real-world creative contribution.
Previous models utilized in assessing creative contribution have focused primarily on tangible products.
MacKinnon (1978) proposed that the foundation of all studies of creativity lies in the analysis of creative
products. Surprisingly, the number of empirically-supported attempts to develop a creative contribution
measurement system has been limited. Early investigations evaluated criteria for art products (Barron, Gaines,
Lee, & Marlow, 1973); chemistry and engineering works (Taylor & Sandler, 1972); 6th grade creative works
(excluding novelty as a criterion, Pearlman, 1983); art, music, creative writing, performing arts and dance via
the Detroit Public Schools’ Creative Product Scales (Parke & Byrnes, 1984); and various creative products
using criteria from the Creative Product Analysis Matrix (CPAM, Besemer & Treffinger, 1981) which later
emerged as the Creative Product Semantic Scale (CPSS, Besemer & O’Quin, 1986, 1987, 1989). The most
versatile criterion models to date are the CPAM and CPSS. The strengths of CPAM/CPSS are that more facets
of creative achievement are covered and inter-rater reliability is generally acceptable (α>.70). Weaknesses
include: a) the use of an extensive item checklist based on the semantic differential work of Osgood, Suci and
Tannenbaum (1957) to calibrate criteria, a method which is both time-consuming and unwieldy outside of
research applications; b) criteria tend to focus on creative products to the exclusion of creative ideas or
expressions; c) creative products rated by these instruments are usually simple (e.g. magnifying glass, razor
blade and mechanical openers; cat pitcher; dog chair; steer desk), which raises doubt about their real-world
applicability; and d) multiple raters are necessary to attain reliable scoring.
Social-psychological approaches controlling for, or eliminating, within-group variances have also been used to
assess creative works. The leading model for this approach was developed by Amabile (1979, 1982, 1996) and
is called the Consensual Assessment Technique (CAT). It is based on an operational definition of creativity: a
work or response is creative to the extent that expert observers agree it is creative. CAT is not allied with any
theory of creativity. Its premise is that experts within a given domain can recognize creativity in that domain
when they see it (Baer, 1994). CAT typically uses a panel of expert judges who measure creative works based
on implicit definitions of set criteria. The judges are not allowed to confer, receive no prior training, and also
evaluate criteria other than creativity. Products are considered in random order and rated against each other, rather than
against some absolute standard. CAT has been used successfully in many contexts for over twenty years. Its
strengths are a) acceptable inter-rater reliabilities (α>.70) and b) its operational approach reflects real-world
assessment. CAT’s weaknesses are: a) most of the rated works (Amabile & Hennessey, 1999) tend (like
CPAM/CPSS) to be simple or generic (e.g. collages, completing sentences, writing essays, descriptive
paragraphs, captions for cartoons, telling stories based on a photograph, free-form poems); b) works are
typically produced under highly constrained experimental conditions (e.g. similar instructions, similar content)
which don’t reflect real-world conditions; c) assessment is time consuming and resource intensive; d) selection of
expert judges is problematic due to variations in students’ skill levels, judges’ abilities, a work’s specific domain
and purpose of assessment (Makel & Plucker, 2008); and e) CAT’s discriminant validity is questionable in that it
relies on implicit definitions of criteria (Runco & Mraz, 1992). Though CAT evaluation of real-world creative
works is less prevalent, there is evidence of high reliability in rating nonparallel creative works (Baer,
Kaufman & Gentile, 2004). In most cases, criterion-based models are observer-reliant and exclude
self-assessment.
In addition to measurement challenges, there are psychological hurdles in evaluating creative contributions. For
example, artistic and verbal creativity is negatively impacted in subjects who expect to be evaluated (Amabile,
1979; Amabile, Goldfarb & Brackfield, 1990). Assessment models that adversely influence creative output are
probably less valuable. External reward, in the form of grades or monetary gain, can also be problematic.
Creative individuals are intrinsically motivated and may respond negatively to extrinsic reward except where it
is informative or fosters additional creative endeavor (Amabile, 1996). Assessment models that enhance
intrinsic motivation are preferable. In fact, Torrance’s original concept for his Torrance Tests of Creative
Thinking, a popular divergent thinking test, was to develop a tool that enriched understanding and nurtured
qualities which assist individuals in expressing their creativity, and not for assessment purposes (Hébert,
Cramond, Neumeister, Millar, & Silvian, 2002).
Creative Momentum Assessment Model
The Creative Momentum Assessment Model (CMAM) was designed to overcome many of the hurdles in
previous assessment models by combining the psychological momenta (defined below) of the creator’s and
observer’s experience of a creative contribution in the context of a criterion-baseline and coupled with
accountable self-assessment, critical self-reflection, expert evaluation, and timely feedback.
A psychological force, called momentum, commonly noted in large sports events, has been observed to affect
observers’ (and players’) expectations of performance. That is, precipitating events like spectacular goals and
unexpected movements affect both players’ and audiences’ perceptions of a game. Markman and Guenther
(2007), applying Newtonian physics, hypothesized that psychological momentum (p) equals the product of mass
(m), or the ‘strength of contextual variables that connote value, immediacy and importance’ (p.801) and velocity
(v), or the movement of an attitude object, person, or group of people toward a target. They tested the perception
of psychological momentum and then extended the concept beyond performer/observer goal expectancies in the
athletic domain to task performance in a wider array of domains. Additional empirical support for p originated
in visual-cognitive psychophysical studies of representational momentum wherein an observer viewing a target
in actual or implied motion remembers the final location of the target as being positioned slightly forward in the
direction of target motion (Freyd & Finke, 1984; Hubbard, 1995; Thornton & Hubbard, 2002). Hubbard (2004)
noted that observers made causal inferences, or representations, regarding the physical properties of an object in
motion. These representations are based on expectations regarding the displacement of an object, or target.
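Markman and Guenther’s product formulation can be sketched in a few lines. The function name, the rating scales, and the numeric values below are illustrative assumptions made here, not values from their study:

```python
# Toy illustration of Markman & Guenther's (2007) formulation,
# psychological momentum p = m * v. The rating values are invented.

def psychological_momentum(mass: float, velocity: float) -> float:
    """mass: rated strength of contextual variables connoting value,
    immediacy and importance; velocity: rated movement of the attitude
    object toward a target (negative if moving away from it)."""
    return mass * velocity

# A spectacular goal late in a close match: high importance, strong
# movement toward the target, hence large positive momentum.
print(psychological_momentum(mass=6.5, velocity=5.0))   # 32.5
# The same contextual weight when the team drifts away from its target.
print(psychological_momentum(mass=6.5, velocity=-2.0))  # -13.0
```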
In the context of creativity-related tasks, it is assumed that both creators and external observers experience some
perceptual and/or conceptual displacement via the transformation of familiar associations through value-bearing
novelty. The creative target, similar to winning a sporting event, can be defined from the creator’s perspective as
finding or solving a creative problem, generating a new idea, product or service, or expressing something in an
original manner; or, from an assessor’s (observer’s) perspective, as the expectation of experiencing both novelty
and salience in creative contributions. The associated momentum manifests in creators as the ‘Aha!’ experience
and in observers as the ‘Wow!’ factor. Conceivably, the psychophysical force generated by both novelty and
salience, or meaningful resolution of some sort (Qui, Li et al., 2010) is experienced variously by creator and
observer. In creators, insightful moments may act as catalysts to enhance creative momentum leading to further
insights and ongoing achievement. In some instances, prolific creative output arises in individuals said to be
creatively ‘on a roll’. Csikszentmihalyi (1996) proposed that creativity is autotelic, a state where the joy of
creative endeavor stimulates further creative effort. Creative momentum may lie at the source of this
phenomenon.
In CMAM’s creative momentum (pc) assessment, creative mass (mc) is defined as the value or appropriateness
of a creative contribution while creative velocity (vc) is the magnitude and direction of that new-value’s motion
in idea space away from some reference point and toward a solution. Creative mass (herein ‘creative substance’)
represents the new value in idea space, is presumably related to memory retrieval/encoding, and is described
through CPAM/CPSS’s key criteria: type of novelty, problem resolution, elaboration and synthesis. Dobbins and
Wagner (2005), using fMRI, found that (in memory retrieval of novel figural stimuli) novelty detection was
faster and more accurate than perceptual or conceptual recollection, brain activation differed across these
retrieval types, and while perceptual and conceptual recollection of the familiar involved the left prefrontal
cortex (PFC), novelty shifted activity to the right PFC. The PFC is considered the decision-making center in the
brain. Furthermore, it has been noted that novelty items are more easily recognized than familiar items
(Kinsbourne & George, 1974; Tulving & Kroll, 1995). CMAM addresses the neural equivalent of novelty
activation through type of novelty criteria [novelty (ideas), novelty (use of materials), germinal], perceptual
activity through elaboration and synthesis criteria [coherence, complex, perfected, appealing, communicative,
elegant] and conceptual activity through criteria related to the capacity for problem resolution [logical, useful,
appropriate, valuable]. In defining creative substance, some CPAM/CPSS sub-criteria were either deleted, or
altered. For example, ‘transformative’ was dropped because it involved speculation on the future impact of
contributions; ‘surprising’ was deleted because it denoted arousal and the detection of unexpected stimuli which
may, or may not, involve ‘originality’; and ‘adequate’ was removed because CPAM/CPSS authors observed that
it elicited negative connotations. ‘Original’ was converted into novelty (ideas) and novelty (materials) to
delineate contributions whose ideas may not be unusual but whose packaging is. Coherent and perfected
extended CPAM/CPSS’s initial product focus (e.g. ‘organic’, ‘well-crafted’) to include creative ideas and
expressions. Appealing avoided ‘attractive’ superficiality connotations and potential visual perception bias.
Communicative was assumed to describe information transfer more universally than ‘expressive’ because it
included, for example, abstract concepts requiring considerable interpretation (e.g. complex geometry).
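As an illustration only, creative substance and creative movement might be combined as follows. The simple averaging, the 1-5 rating scale, and the function names are assumptions made for this sketch; CMAM specifies the criteria themselves but does not prescribe this numeric aggregation:

```python
# Illustrative-only sketch of creative momentum as substance x movement.
# Averaging criterion ratings into one substance score is an assumption
# made here; it is not a scoring rule defined by CMAM.

NOVELTY = ["novelty_ideas", "novelty_materials", "germinal"]
RESOLUTION = ["logical", "useful", "appropriate", "valuable"]
ELABORATION_SYNTHESIS = ["coherent", "complex", "perfected",
                         "appealing", "communicative", "elegant"]

def creative_substance(ratings: dict) -> float:
    """Average whichever substance criteria (1-5 ratings) are present."""
    keys = NOVELTY + RESOLUTION + ELABORATION_SYNTHESIS
    scores = [ratings[k] for k in keys if k in ratings]
    return sum(scores) / len(scores)

def creative_momentum(ratings: dict, movement: float) -> float:
    """movement: rated magnitude/direction of motion in idea space
    relative to the field's paradigms (the paradigm criterion)."""
    return creative_substance(ratings) * movement

ratings = {"novelty_ideas": 5, "logical": 4, "useful": 4, "coherent": 3}
print(round(creative_momentum(ratings, movement=4.0), 2))  # 16.0
```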
Creative velocity (herein ‘creative movement’) is motion in idea space relative to existing paradigms. A
paradigm has been defined variously as a set of rules and regulations which establish or define boundaries and
suggest behavior (Baker, 1992); a fundamental way of perceiving, thinking, valuing and acting in accordance
with a certain view of reality (Harman, 1970); and “a constellation of concepts, values, perceptions and
practices shared by a community, which forms a particular vision of reality that is the basis of the way a
community organizes itself" (Capra, 1996, p. 6). In this regard, CMAM adopts Sternberg’s (2003) Propulsion
Theory of Creativity (PTC) which describes creativity as movement from one position in idea space to another
or a field transitioning relative to its paradigms. For example, Kuhn’s (1962) notion of paradigm shift as a
change in basic assumptions governing a ruling theory of science can be construed as the result of creative
movement acting upon multiple observers in a scientific domain. Markman and Guenther (2007) noted that
psychological momentum is positive if movement is toward a target and negative if away from a target. In PTC,
all movements (called creative solution types) involve positive momentum while changes directed toward a
target vary in relation to a field’s paradigms and trends. CMAM employs seven of PTC’s creative solution types
as movement criteria which are governed by three basic paradigm relationships: paradigm acceptance
[contributions that move a field according to its present direction, i.e. replication, redefinition, forward
incrementation]; paradigm rejection [contributions that move a field in a new direction, i.e. redirection,
reconstruction/redirection, reinitiation]; and paradigm merger [contributions that combine both approaches, i.e.
integration]. [Refer to Sternberg (2003) for details on solution types.] PTC’s eighth solution type, advance
forward incrementation, was removed because it addressed future value retrospectively, thereby prompting
individuals to justify their creativity based on the ignorance of assessors.
According to the novelty/encoding hypothesis, novelty assessment is an early stage of encoding occurring in the
brain’s limbic, temporal/parietal regions followed by higher level meaning-based encoding in cortical regions
that include the left PFC (Craik & Lockhart, 1972; Tulving, 1983). Paradigms in a single contribution (e.g.
associated with form, function, performance, socio-cultural significance, etc.) that include novelty,
elaboration/synthesis and conceptual memory activation probably involve both low and high level cognitive
encoding operations. However, creative movement is more likely associated with higher-order conceptual
activity related to problem resolution than with novelty or elaboration/synthesis criteria. This remains to be seen. The
magnitude and direction of creative movement may change over time, in either creator or observer, leading to
what could be called creative acceleration, or a sense of creative exhilaration. For example, Darwin developed
the theory of evolution via a series of insights over twenty years (Ospovat, 1981) which suggests low
acceleration; yet, when The Origin of Species was first published, both scientists and clergymen experienced a
psychological jolt because the full impact of his insights into human origins struck them all at once. On the other
hand, Einstein took eleven years to develop his general theory of relativity, still a rather slow process; however,
its impact on the physics community was even slower as it took over forty years for the mathematics, which
underlay its creative significance, to be understood.
A criterion-baseline, however well-constructed, cannot of itself generate accurate measurement, especially in
single assessors, due to differences in ability and personal bias. Runco, McCarthy and Svenson (1994) noted that
professionals who produce excellent creative works (e.g. art) may be unreliable, biased and inaccurate in
assessing the works of others. Even if experts can accurately (by specific domain/field standards) assess certain
kinds of creative contributions, they may be less capable in evaluating others. Assessment models which don’t
address this weakness have limited usefulness. In this respect, CMAM adopts a more forward-thinking
approach. Creative momenta, appraised via the criterion-baseline, are first divided into self and expert scores
then combined proportionately in favor of self-ratings to support intrinsic motivation and ultimately enhance
reliability. Runco, McCarthy and Svenson (1994) and Amabile (1996) found a moderate relationship between
self and expert ratings of creative products with self rating generally higher than expert rating on similar rank
orderings of products. Self-ratings are often biased (e.g. affected by task difficulty, effort, memory, honesty,
etc.) but they do have advantages (Hocevar, 1981; Runco, Noble, & Luptak, 1990). For example, individuals are
better informed about their intent, the work, the creative process, nature of insights, and overall performance. In
short, individuals may value their work more because they understand the rationale behind it whereas experts
aren’t privy to this information, may not know how a solution solves a problem, or understand the creator’s
intent (Runco & Chand, 1995). Hocevar (1981), in fact, favored self-ratings. CMAM hypothetically
compensates for assessment bias by aligning individual and expert measurements of creative momenta over
time, and against a criterion baseline.
In order to minimize self-assessment bias, individuals critically reflect, via a Creativity Reflection Report
(CRR), upon their contributions while referencing CMAM criteria. The CRR is then judged by separate, non-creativity-related criteria to reduce inhibition stemming from external evaluation. Self-ratings are penalized,
however, for being unjustified; that is, personal objectivity is rewarded by higher CRR scores. Expert ratings are
offered simply as professional opinions. They include constructive feedback on creative achievement and
critical reflections. CMAM’s self-ratings carry the majority weight in creativity assessment. This is done to promote
sensible risk-taking and build self-confidence. Overall scoring emphasis on critical reflection provides a form of
assessment control for educators. Final contribution scores combine both self/expert(s) creativity ratings and
CRR scores.
Three hypotheses are proposed in using CMAM as a measurement tool. Hypothesis 1 declares that combined
self/expert creativity scores should be more reliable than experts’ creativity scores alone. This hypothesis
implies that Csikszentmihalyi’s (1999) domain/field expert assessments (e.g. observer-based momenta) as the
basis of his systems approach become more reliable if supported by ‘accountable’ creativity self-assessment,
provided that common criteria are used. Hypothesis 2, furthermore, proposes that familiarity with CMAM
criteria, coupled with accountable self-grading and expert feedback on both creativity and critical reflections,
serves to align self/expert creativity scores. That is, CMAM accuracy is not a ‘snapshot’ measurement
phenomenon, but evolves with model familiarity. Hypothesis 3 is that CMAM’s main criteria sections will
factor with each other according to the nature of creative contributions and assessors’ access to relevant
information. Evidence for this effect appeared when elaboration/synthesis criteria loaded inconsistently with
novelty and resolution criteria in the CPAM/CPSS studies (Besemer & O’Quin, 1987, 1989).
The Study
Method
CMAM criteria, in both English and Chinese versions, were introduced, with appropriate rubrics, to
undergraduate students at the Hong Kong Polytechnic University over a period of five weeks (within a
fourteen-week semester) as part of a general education creative thinking course. Criteria were explained
during weekly one-hour interactive lectures with examples from multiple disciplines coupled with
relevant exercises and tutorials teaching creative tools. Tutors were rotated weekly to lessen any
delivery style bias.
Three assignments were completed. They required making multiple associations, embodied figural and
verbal content, employed the full range of CMAM substance criteria and focused on one basic paradigm
relationship each. The assignments were completed over a period of seven weeks and included a 4-page
brochure describing a new form of entertainment (paradigm acceptance, Assign2), an innovative
inspirational ‘freedom’ poster (with attached explanation) that rejected subjects’ initial assumptions
about freedom (paradigm rejection, Assign3a) and a group PowerPoint presentation which integrated
group members’ new freedom constructs with paradigms from two unrelated fields (i.e. furniture, time
pieces, relaxation devices, food relief) into an original product, process, or performance (paradigm
merger, Assign3b). Subjects noted their specific creative movement attempts, self-graded their
creativity on a 9-point letter grade scale (F=0 to A+=4.5), completed a Creativity Reflection Report
(CRR), and reported the difficulty of each assignment on a 5-point Likert scale (1=very easy to 5=very
difficult).
To avoid assessor bias, subjects were identified by code numbers and both submissions and criteria were
rated in random order (except for the criterion overall creativity grade, see below). Three design-faculty
tutors (1 female, 2 male, average age=47.7years) who were familiar with CMAM criteria, their rubrics
and who had access to CRRs, self-grades and difficulty levels, assessed subjects on fifteen CMAM
criteria (i.e. one movement and fourteen substance criteria) plus five of Amabile & Hennessey’s (1982)
CAT criteria (i.e. creativity, effort evident, planning, organization and technical goodness) based on a
5-point Likert scale [5(high)-3(medium)-1(low)]. The construct creativity was scored twice: 1) creativity
as a quick intuitive response to contributions and 2) overall creativity grade as a summative response
after considering the other twenty criteria. The CMAM criterion, germinal, was excluded because the
term was unfamiliar to CAT judges. Overall creativity grade, self-grades and CRR subtotal (i.e.
comprised of theory application, research proficiency, organization, completeness) as well as the
composite scores tutor total score (CRR subtotal plus overall creativity grade) and final score (tutor
total score plus self-grades) were evaluated on the 9-point letter grade scale. Additionally, twelve
independent judges (11 female, 1 male, average age=26.4 years), deemed to be experts in rating
creativity and whose experience was not far removed from undergraduate study, scored the same
twenty-one criteria as tutors using the same scales. They had no access to CRRs and therefore could not
make composite scores. The weighting of each contribution’s final score was creativity self-grade
(30%), overall creativity grade (10%) and CRR grade (60%) which gave subjects a 75% say in their
creativity assessment.
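A minimal sketch of this weighting follows. The component grades in the example are hypothetical points on the 9-point scale, which the text anchors only at F=0 and A+=4.5:

```python
# Sketch of CMAM's final-score weighting: creativity self-grade 30%,
# tutors' overall creativity grade 10%, CRR grade 60%. Grades are on
# the paper's 9-point numeric scale (F = 0.0 ... A+ = 4.5); the sample
# values below are invented for illustration.

WEIGHTS = {"self_grade": 0.30, "overall_creativity_grade": 0.10,
           "crr_grade": 0.60}

def final_score(self_grade: float, overall_creativity_grade: float,
                crr_grade: float) -> float:
    """Weighted combination of the three component grades."""
    return (WEIGHTS["self_grade"] * self_grade
            + WEIGHTS["overall_creativity_grade"] * overall_creativity_grade
            + WEIGHTS["crr_grade"] * crr_grade)

# A subject who self-grades 3.5, receives 2.5 from tutors on overall
# creativity, and earns 3.0 on the Creativity Reflection Report:
print(round(final_score(3.5, 2.5, 3.0), 2))  # 3.1
```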
Results
Assessor Reliability. A total of seventy-seven subjects (M=37, F=40) participated in two individual
creativity assignments while ninety-two subjects (M=44, F=48), including the seventy-seven, formed
thirty-one groups to undertake a group creativity assignment. The overall sample included twenty-six
designers and sixty-six non-designers. Cronbach alpha was used across assignments to evaluate
assessors’ inter-rater reliabilities. Results are summarized in Table 1. Average reliabilities are cited for
interpretive convenience. Judges’ average reliabilities per criterion were very good to excellent (range:
α=.82 to α=.91) while tutors’ were low to marginal (range: α=.47 to α=.68). Across assignments, judges’
overall average reliability was .85 with tutors at .56. For all assessors, the highest average reliabilities
within each criteria section were overall creativity grade, elaboration & synthesis, and effort evident,
with the resolution criteria section highest on problem resolution for judges and on useful for tutors. Section
headings generally exhibited better reliabilities than their sub-criteria. All assessors exhibited weaker reliability in the problem
resolution section. The single most unreliable criterion was paradigm movement; however, it improved
with usage. Overall creativity grade reliability, like most novelty criteria, also increased with each
assignment. Across assignments, tutors’ average CRR alpha was higher than their solution criteria
(range: α=.60 to α=.68) with organization at the top. In the composite scores, tutors’ alpha averages
reached acceptability (α>.70) increasing as more variables, such as CRR scores, were added. When
subjects’ creativity self-grades were calculated in conjunction with assessors’ overall creativity grades,
reliability decreased somewhat; however, the assessors’ previous pattern of increased reliability across
assignments was maintained.
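For reference, Cronbach’s alpha applied to inter-rater reliability treats each assessor as an “item” and each submission as a case. A self-contained sketch follows; the ratings are invented, not data from this study:

```python
# Cronbach's alpha for inter-rater reliability: each rater is an "item",
# each submission a case. The sample ratings are invented.

def cronbach_alpha(ratings: list[list[float]]) -> float:
    """ratings[i][j] = rater j's score for submission i."""
    k = len(ratings[0])              # number of raters
    def var(xs):                     # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    rater_vars = [var([row[j] for row in ratings]) for j in range(k)]
    total_var = var([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum(rater_vars) / total_var)

# Three raters in near agreement across five submissions:
scores = [[4, 4, 5], [2, 3, 2], [5, 5, 5], [3, 3, 4], [1, 2, 1]]
print(round(cronbach_alpha(scores), 2))  # 0.96
```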
TABLE 1.
Judges’ and tutors’ inter-rater reliability across assignments (Cronbach Alpha)

                                 Assign2         Assign3a        Assign3b        Averages
                               Judges Tutors   Judges Tutors   Judges Tutors   Judges Tutors
Paradigm (movement)             .70    .13      .90    .33      .86    .78      .82    .41
Solution Criteria
  Novelty (idea)                .81    .39      .87    .52      .88    .76      .85    .56
  Novelty (materials)           .81    .52      .83    .38      .86    .77      .83    .56
  Creativity                    .81    .46      .86    .45      .89    .77      .86    .56
  Overall Creativity Grade      .88    .51      .93    .58      .93    .73      .91    .61
  Section Average               .83    .47      .87    .48      .88    .76      .86    .57
  Problem Resolution            .79    .47      .90    .46      .89    .60      .85    .51
  Logical                       .74    .47      .87    .48      .84    .61      .82    .52
  Useful                        .79    .33      .88    .65      .78    .79      .82    .59
  Appropriate                   .75    .41      .89    .37      .83    .62      .82    .47
  Valuable                      .74    .51      .87    .31      .85    .78      .82    .53
  Section Average               .76    .44      .88    .46      .82    .68      .83    .52
  Elaboration & Synthesis       .88    .57      .92    .65      .84    .90      .90    .68
  Coherent                      .80    .48      .91    .61      .88    .77      .86    .62
  Complex                       .82    .38      .92    .34      .90    .68      .88    .47
  Communicative                 .82    .56      .88    .44      .84    .69      .85    .56
  Appealing                     .82    .53      .82    .47      .84    .72      .83    .57
  Perfected                     .82    .54      .89    .23      .85    .82      .85    .53
  Elegant                       .83    .57      .86    .28      .84    .73      .85    .53
  Section Average               .83    .52      .89    .43      .88    .75      .86    .57
  Effort Evident                .87    .59      .94    .65      .93    .77      .91    .67
  Planning                      .84    .59      .91    .49      .88    .77      .88    .62
  Organization                  .83    .61      .91    .59      .86    .80      .87    .67
  Technical Goodness            .78    .53      .86    .50      .87    .53      .84    .52
  Section Average               .83    .58      .90    .56      .89    .72      .87    .62
Overall Average                 .81    .48      .89    .47      .87    .73      .85    .56
CRR Criteria
  Theory Application             –     .52       –     .64       –     .74       –     .64
  Research Proficiency           –     .54       –     .61       –     .64       –     .60
  Organization                   –     .65       –     .62       –     .77       –     .68
  Completeness                   –     .55       –     .56       –     .81       –     .64
  Section Average                –     .57       –     .61       –     .74       –     .64
CRR Subtotal                     –     .61       –     .65       –     .98       –     .74
Tutor Total Score                –     .60       –     .65       –    1.00       –     .75
Final Score                      –     .69       –     .74       –     .93       –     .79
Self/Assessor Combined
  Overall Creativity Grade      .81    .45      .86    .51      .89    .69      .85    .55

Note: high section averages are underlined.
Based on means analysis, judges scored all CMAM criteria higher (mean range= Assign2: 2.62-3.16;
Assign3a: 2.50-2.87; Assign3b: 2.50-3.09) than tutors (mean range= Assign2: 1.39-2.66; Assign3a:
1.71-2.65; Assign3b: 1.44-2.43) excepting overall creativity grade for Assign3a (mean
difference=0.07). Mean scores indicated that subject performance was considered, especially by tutors,
as below average. Tutors’ standard deviations for Assign2 and Assign3b were typically higher than
judges’. There was greater overall assessor agreement on Assign3a. In general, tutors were more critical
toward, and discriminating of, subjects’ contributions than judges.
Multivariate analysis of variance (MANOVA) was conducted across various conditions (judge x tutor; male
x female; design x nondesign) with CMAM criteria as dependent variables in order to determine
significant interactions in assessor scoring. See Table 2 for a detailed summary. Assign3b was excluded
because group conditions could not be statistically differentiated. Judges’ scores were significantly
higher than tutors’: 1) Assign2 (range: F(1,146)=21.07, p<.001 to F(1,146)=192.31, p<.001); 2) Assign3a
(range: F(1,146)=7.83, p<.01 to F(1,146)=97.44, p<.001). The most significant differences in criteria occurred
for elegant (F(1,146)=192.31, p<.001), perfected (F(1,146)=135.26, p<.001) and useful (F(1,146)=104.89, p<.001)
on Assign2, and elegant (F(1,146)=97.44, p<.001) on Assign3a. Certain criteria in Assign3a showed no
significant scoring differences (they also exhibited the least significance in Assign2): creativity, overall
creativity grade, appropriate, elaboration & synthesis, and effort evident. As noted previously, most of
these criteria exhibited higher alphas. Judges scored designers significantly better than non-designers
across both assignments (range: F(1,73)=11.61, p<.001 to F(1,73)=58.19, p<.001) while tutors found less
significant differences, especially in Assign3a. Judges scored females significantly higher than males on
most Assign2 criteria (except for the problem resolution section) but gender differences attenuated in
Assign3a. Tutors didn’t indicate significant gender differences, except for the novelty-related criteria,
planning in Assign2, and novelty (idea) in Assign3a.
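As a simplified stand-in for one cell of the reported MANOVA, the judge-versus-tutor contrast on a single criterion reduces to a two-group one-way ANOVA F statistic. A sketch with invented scores (not data from the study):

```python
# Two-group one-way ANOVA F statistic, a simplified stand-in for one
# judge-vs-tutor cell of the reported MANOVA. Scores are invented.

def anova_f(group_a: list[float],
            group_b: list[float]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for two independent groups."""
    all_scores = group_a + group_b
    grand = sum(all_scores) / len(all_scores)
    means = [sum(g) / len(g) for g in (group_a, group_b)]
    ss_between = sum(len(g) * (m - grand) ** 2
                     for g, m in zip((group_a, group_b), means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip((group_a, group_b), means) for x in g)
    df_between = 1                        # two groups
    df_within = len(all_scores) - 2
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

judges = [4, 5, 4, 4, 5, 3, 4, 5]   # higher scores
tutors = [2, 3, 2, 3, 2, 3, 2, 2]   # lower, more critical scores
f, dfb, dfw = anova_f(judges, tutors)
print(dfb, dfw)        # 1 14
print(round(f, 1))     # 36.6
```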
TABLE 2.
MANOVA results for Judge/Tutor, Design/Nondesign, and M/F conditions,
with CMAM criteria as dependent variables
[Table 2 values could not be reliably recovered from the source. The table reported F, df, p, and η² statistics for each CMAM criterion (dependent variables) under the Judge/Tutor, Design/Nondesign, and M/F conditions, separately for judges’ and tutors’ scores in Assign2 and Assign3a, with significant figures bolded.]
Grading Alignment. Grading alignment results are summarized in Table 3. Assessors’ mean differences
in scoring overall creativity grade were significant except for Assign3a (mean difference=-.07, p=.771).
Assign3b mean differences were less than a half grade-point. Differences between judges’ scoring of
overall creativity grade and subjects’ self-grading of creativity were also very significant and less than a
half grade-point for the first two assignments (Assign2: mean difference=-.36, p<.01; Assign3a: mean
difference=-.44, p<.001). Assign3b results, however, slightly exceeded a half grade-point. Differences
between tutors’ overall creativity grade and subjects’ self-grading were very significant: almost a grade-point
for Assign2 (mean difference=-.95, p<.001), less than a half grade-point for Assign3a (mean
difference=-.37, p<.001), and over a grade-point for Assign3b (mean difference=-1.07, p<.001); that is,
tutors were again harsher in grading than judges. Negative differences indicated subject over-grading,
which was higher in the group assignment. Mean grading differences between
assessors tend to support the half grade-point margin of error typically observed in expert panel
assessments. In the individual assignments, assessor/subject grade alignment increased. Furthermore,
subjects’ rating of assignment difficulty exhibited a significant inverse correlation with self-grades
(Assign2, r=-.279, p<.05; Assign 3a, r=-.239, p<.05); that is, greater perceived difficulty led to lower
self-grades, and vice-versa.
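The alignment statistics above can be reproduced in outline as follows. This sketch uses synthetic grades (all variable names and effect sizes are assumptions) to show the form of the computation, not the study's data:

```python
# Hypothetical sketch of the grade-alignment computations: a paired
# mean difference (as in Table 3) and the difficulty/self-grade
# correlation. All data are synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 77
self_grade = rng.normal(3.5, 0.6, size=n)
# Judges grade somewhat lower than subjects grade themselves.
judge_grade = self_grade - 0.4 + rng.normal(0, 0.5, size=n)
# Perceived difficulty varies inversely with self-grade.
difficulty = 4.0 - 0.8 * self_grade + rng.normal(0, 0.5, size=n)

# Mean difference (judges - self-grade) with a paired t-test.
t_stat, p_val = stats.ttest_rel(judge_grade, self_grade)
mean_diff = float(np.mean(judge_grade - self_grade))
print(f"mean difference = {mean_diff:.2f}, p = {p_val:.3f}")

# Pearson correlation between perceived difficulty and self-grades.
r, p_r = stats.pearsonr(difficulty, self_grade)
print(f"r = {r:.3f}, p = {p_r:.3f}")
```

A negative mean difference corresponds to subject over-grading, and a negative r corresponds to the inverse difficulty/self-grade relationship reported above.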
TABLE 3.
Mean differences between assessors’ grades and subjects’ self-grades
                              Assign2                          Assign3a                         Assign3b
I-J                    Mean Diff   p    Lower  Upper   Mean Diff   p    Lower  Upper   Mean Diff   p    Lower  Upper
Judges - Tutors           .60    .000    .36    .84      -.07    .771   -.32    .17       .40    .040    .01    .79
Judges - Self-Grade      -.36    .002   -.60   -.11      -.44    .000   -.69   -.20      -.67    .000  -1.06   -.28
Tutors - Self-Grade      -.95    .000  -1.20   -.71      -.37    .001   -.62   -.12     -1.07    .000  -1.46   -.68
Note: Bold figures represent mean differences of +/- a half grade point
Criteria Relations. Varimax rotation was calculated across assignments and assessors to determine
CMAM’s factor structure, which is summarized in Table 4. With the exception of tutors’ loadings in
Assign2 and Assign3a, two factors emerged. Excepting tutors’ Assign3b results, paradigm movement
loaded consistently with problem resolution criteria, suggesting that paradigm movement is, in fact, a
measure of creative solution type and allied to high-order conceptual processing. In terms of creativity
factors, judges’ Assign2/Assign3a novelty/creativity section tended to factor separately from all other
criteria except for overall creativity grade which, in Assign3a, loaded with the other criteria. Judges’
Assign3b novelty criteria factored with elaboration/synthesis and certain motivation/technical criteria.
Tutors’ Assign2 novelty/creativity section tended to load separately from other criteria except for
novelty (materials) which factored with both elaboration/synthesis and motivation/technical criteria.
Tutors’ Assign3a novelty criteria loaded primarily with problem resolution criteria while Assign3b
novelty generally factored separately from other criteria. These findings suggest that novelty-related
activity factors variously with both perceptual and conceptual activity.
In terms of elaboration and synthesis factors, judges’ Assign2 and Assign3a elaboration/synthesis
section tended to load with both problem resolution and motivation/technical criteria. Their Assign3b
elaboration/synthesis section also loaded with motivation/technical criteria. Tutors’ Assign2
elaboration/synthesis section also factored with motivation/technical criteria; however, the Assign3a
elaboration/synthesis section split into two subgroups described as follows: a) structure perception:
elaboration and synthesis, coherent, complex and b) content perception: communicative, appealing,
perfected. The latter subgroup tended to factor better with motivation/technical criteria. Tutors’
Assign3b elaboration/synthesis section factored similarly to judges’ Assign2 and Assign3a loading
patterns. As noted, motivation/technical criteria usually factored with elaboration/synthesis criteria. All
three criteria sections factored differently with each other, apparently according to a contribution’s
nature and assessment method. In this respect, four dependencies (see Discussion for further details)
tended to emerge. Nominating novelty criteria as (N), problem resolution as (R), elaboration and
synthesis as (E) and motivation/technical criteria as (T), the dependencies are: 1) [(N) + (R/E/T)]; 2)
[(N/E/T) + (R)]; 3) [(N/R) + (E/T)]; 4) [(N) + (R) + (E/T)], with T always associated with E (and
paradigm movement usually associated with R).
In terms of reflection report factors, CRR criteria tended to factor with novelty/creativity criteria in
Assign2, with the elaboration/synthesis structure perception subgroup in Assign3a, and were split
between novelty and other criteria in Assign3b. These variations conceivably reflect tutors’ more critical
solution considerations (e.g. Assign2: the existence, or not, of novelty within the accepted paradigm;
Assign3a: elaborations indicating, or not, radical approaches to freedom; Assign3b: the capacity, or not,
to integrate new associations, conduct research and organize material in resolving a merger).
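The factor-analytic procedure described above can be sketched as follows, again with synthetic data; the two latent factors and six illustrative criteria are assumptions standing in for the full CMAM criterion set:

```python
# Hypothetical sketch of the Varimax-rotated factor analysis behind
# Table 4, on synthetic criterion scores; the two latent factors and
# six observed criteria are illustrative stand-ins.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
n = 77
latent = rng.normal(size=(n, 2))          # e.g. novelty, appropriateness
loadings_true = np.array([
    [0.9, 0.1], [0.8, 0.2], [0.7, 0.3],   # novelty-type criteria
    [0.1, 0.9], [0.2, 0.8], [0.3, 0.7],   # resolution-type criteria
])
scores = latent @ loadings_true.T + rng.normal(0, 0.3, size=(n, 6))

fa = FactorAnalysis(n_components=2, rotation="varimax")
fa.fit(scores)
print(np.round(fa.components_, 2))  # rotated loadings, one row per factor
```

Varimax rotation maximizes the variance of squared loadings within each factor, which is what makes the loading patterns (and hence the criterion dependencies discussed here) interpretable.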
TABLE 4.
CMAM criteria factor analysis across assignments and assessors
[Table 4 loadings could not be reliably recovered from the source. The table reported Varimax-rotated factor loadings for each solution criterion (and, for tutors, the CRR criteria of theory application, research proficiency, organization and completeness, plus CRR subtotal, tutor total score and final score) across two judge factors per assignment and two to three tutor factors per assignment, with higher loadings bolded.]
Although paradigm movement was listed as a single criterion, assessors were informed prior to
evaluation that assignments variously denoted paradigm acceptance, rejection or merger. Although
subjects had no choice on Assign3b (as all were integrations), Assign2 and Assign3a allowed selections
of one of three creative solution types (i.e. specific movements). Not all subjects stated the specific
movement they were attempting in their CRRs (Assign2, N=76; Assign3a, N=66). In Assign2, forward
incrementation attracted more attempts (39.47%) than replication (31.58%) or redefinition (28.95%). In
Assign3a, redirections held the majority of attempts (65.15%) followed by reconstruction/redirection
(19.70%) and reinitiation (15.15%).
Discussion
Judges’ inter-rater reliability, using CAT, was on average higher (α=.85) across the three
assignments than the average reliabilities of Besemer and O’Quin’s (1986) a) CPAM investigation of
two creative contributions rated by 133 undergraduates using 70 bi-polar adjectives to describe eleven
solution subscales (α=.82); b) four CPAM studies (1987) involving 90 student raters using a 110 item
checklist to measure eight solution subscales (α=.76); and c) a later CPSS study (1989) where 194
undergraduates rated three products using 71 bi-polar adjectives defining ten solution subscales (α=.77).
That CMAM contributions were more complex than those in the above-cited studies, and that reliability
for overall creativity grade, a creativity summation (range: α=.88 to α=.93), exceeded CAT reliabilities
for creativity in simple artifacts such as collages (α=.70 to α=.82; Amabile & Hennessey, 1999),
indicates that CMAM criteria have considerable value in evaluating creative contributions.
Results further indicate that experts’ implicit definitions of CMAM criteria are more reliable than
non-expert ratings of multiple adjectives describing similar criteria. However, CAT, though quite reliable, is
not an optimal assessment tool for educators, being both time-consuming and resource-heavy.
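The inter-rater reliability figures quoted above are Cronbach's alphas computed across raters. A minimal sketch of that computation, with synthetic judge panels standing in for the study's data:

```python
# Hypothetical sketch of inter-rater reliability as Cronbach's alpha,
# treating each rater as an 'item'. Judge scores are synthetic.
import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    """ratings: array of shape (n_works, n_raters)."""
    k = ratings.shape[1]
    item_vars = ratings.var(axis=0, ddof=1).sum()
    total_var = ratings.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(3)
true_quality = rng.normal(3.0, 1.0, size=77)  # latent quality of 77 works
# Five judges whose scores track the same underlying quality plus noise.
judges = np.column_stack(
    [true_quality + rng.normal(0, 0.5, size=77) for _ in range(5)]
)
print(f"alpha = {cronbach_alpha(judges):.2f}")
```

With judges who track a common underlying quality, alpha lands well above the conventional .70 reliability threshold referenced in this study.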
The panel of tutors, for the most part, proved unreliable (α<.70). However, their most reliable criteria
were the same as the judges’: overall creativity grade, elaboration & synthesis, and effort evident. Tutors’
paradigm movement and the problem resolution section exhibited the least reliability, with the criteria
problem resolution and useful faring best within that section. According to Dobbins and Wagner (2005),
conceptual memory-retrieval activity involves higher meaning, more selectivity and slower processing
time than novelty and perceptual activity. That assessors exhibited less reliability in problem resolution
criteria, with the tutors less reliable overall, indicates that conceptual selectivity influenced scoring
differentiation. In this respect, tutors had the advantage of additional information via CRRs, criterion
definitions and rubrics, self-grades, level of difficulty, etc. Novelty detection and perceptual activity
criteria shared slightly more commonality among assessors. Tutors’ lower mean scores and higher
standard deviations in Assign2 and Assign3b support a more critical and differentiated purview of
contributions, which tallies with the reliability findings. Though somewhat speculative, general criterion
labels may involve more implicit sharing of default schemas (i.e. common conceptual patterns or
frameworks), leading to greater inter-rater reliability than more specific criterion labels, which require
refined interpretation and probably increased conceptual and perceptual selectivity. Similarly, criterion
labels in themselves (e.g. as used in CAT measurements) likely embody more default schemas, and
exhibit more inter-rater reliability, than the same labels coupled with specific definitions, descriptions
and rubrics. The higher reliabilities found in section headings, coupled with low tutor alphas, tend to
support these contentions. Greater reliability,
however, does not indicate enhanced discriminative validity. Because subjects tended to agree more
with judges’ implicitly-defined scoring of creativity, self-assessment (if not founded in critical
reflection) may tend to align more with CAT’s default schemas than with the more refined schemas of
tutors. In short, accurate, meaningful measurement of creative substance depends on CMAM’s capacity
to fine-tune and align creator/observer schemas. These conjectures, however, require further
investigation.
MANOVA results found the largest judge-tutor scoring differences in elegant, perfected and useful for
the first assignment and elegant in the second. Elegant is defined (in its Chinese label) as being both
simple and profound. Perfected implies some comparison with rater expectations of a finished work, or
of other similar works, and useful denotes the number of real or imagined uses. These criteria, two of
which are considered more perception-based, involve some conceptual activity, variations in which
might lead to the larger scoring gap. Again, conceptual activity seems to diversify evaluations more
than novelty detection or perceptual activity does. For all assessors, designers and females tended to
outperform their counterparts in creativity. Judges’ results in this respect were more significant. That
designers, whose fundamental training is in creativity, outperformed non-designers in novelty/creativity
criteria is not extraordinary. Gender differences, however, are more contentious. Runco, Cramond and
Pagnani (2009) noted that the majority of gender studies on creative potential found no significant
differences, with about a third favoring females. The question is whether these findings on creative
potential translate to creative works. Evidence herein indicates that females may outperform males in some
criteria (e.g. novelty ideas); but, a small sample confined to one institution is not indicative and further
work is required.
CMAM was designed to compensate for unreliable, biased tutor evaluation by incorporating
accountable self-grading, critical reflection, and timely feedback. When creativity self-grades were
combined with judges’ and tutors’ overall creativity grades, reliability decreased, providing evidence
that subjects were, indeed, using a different perspective in evaluating their own work, which probably
exacerbated over-grading. In this respect, Hypothesis 1, which proposes that reliability should increase,
was unsupported. However, the combination of self-grades and assessors’ grades did not undermine
reliability. In fact, alpha steadily increased across assignments, an effect likely due to familiarity with
CMAM criteria, grading accountability, and expert feedback. The combined ratings in Assign3b nearly
attained reliable levels (α=.69). In this longitudinal sense, Hypothesis 1 was supported because reliability did
increase over time. In the composite scores tutor total score (overall creativity grade and CRR subtotal)
and final score (tutor total score plus creativity self grade), alpha reached quite high levels indicating
that critical reflection, when included in the scoring mix, improved expert-subject inter-rater reliability.
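The reported drop in alpha when self-grades were pooled with assessors' grades is the expected behavior when one rater scores from a different perspective. A synthetic illustration (rater counts and noise levels are assumptions, not the study's figures):

```python
# Hypothetical illustration: pooling a rater who scores from a
# different perspective (weakly related to the panel) lowers the
# panel's Cronbach's alpha. All numbers are synthetic assumptions.
import numpy as np

def cronbach_alpha(ratings):
    k = ratings.shape[1]
    return k / (k - 1) * (
        1 - ratings.var(axis=0, ddof=1).sum() / ratings.sum(axis=1).var(ddof=1)
    )

rng = np.random.default_rng(4)
quality = rng.normal(3.0, 1.0, size=77)
experts = np.column_stack([quality + rng.normal(0, 0.5, 77) for _ in range(4)])
# A 'self-grade' driven mostly by a different, private criterion set.
self_grade = 0.1 * quality + rng.normal(0, 1.2, 77)

pooled = np.column_stack([experts, self_grade])
print(f"experts only:   alpha = {cronbach_alpha(experts):.2f}")
print(f"experts + self: alpha = {cronbach_alpha(pooled):.2f}")
```

Because alpha rewards agreement among raters, adding one weakly correlated rater dilutes the panel's average inter-rater covariance and lowers the coefficient, mirroring the pattern observed here.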
Except for Assign3b (and tutors’ assessment of Assign2), mean differences between creativity self-grades
and assessors’ overall creativity grade were generally less than a half grade point, a margin of
error often observed in panel-based assessments. Tutors’ larger grade margin in Assign2 was expected
because subjects were unfamiliar with CMAM and, as yet, had no access to feedback. Individual
assignment results tended to support Hypothesis 2; that is, CMAM enhanced alignment in creativity
scores. Reduced grade alignment in the group assignment can be attributed to assessors’ higher
expectations for group-work leading to lower mean scores. Subjects also struggled in organizing group
meetings within their highly variant class schedules which may have promulgated inferior work. Peer
pressure to elevate group grades may also have been a contributing factor. In general, groups are known
to rate their idea generation performance much higher than individuals (Paulus, Dzindolet, Poletes, &
Camacho, 1993; Paulus, Larey & Ortega, 1995; Stroebe, Diehl, & Abakoumkin, 1992). Groups
probably trusted their negotiating power in claiming higher grades and validating CRR responses. An
additional group assignment would have been useful to determine if CMAM promotes greater grade
alignment in groups over time; however, student/staff workload prohibited such an exercise. Although
this study did not investigate a single assessor’s grade alignment with students, a parallel (unpublished)
study, in which twenty-six design students completed final-year projects in visual communication,
evaluated the effects of CMAM engagement (or not) within a high-feedback environment.
The CMAM-engaged experimental group (both student and supervisor were engaged) had
significantly better, and more consistent, grade alignment (mean difference=0.06), while the CMAM-disengaged control group (both student and supervisor were disengaged) dropped in alignment between
reporting periods (mean difference=0.65 to 0.75). Additionally, the experimental group outperformed
the control group by 55.6% in the final grade, and garnered 66.7% of all awards and design merit for the
entire cohort. These findings further support Hypothesis 2. They also imply that creative momentum
assessed by experts without a criterion baseline or creator input is flawed.
Factor analysis of CMAM criteria exhibited some interesting patterns. Paradigm movement loaded
consistently with problem resolution criteria, except for tutors’ ratings of Assign3b. This indicates that
creative movement is more associated with context and problem solving, and therefore requires more high-order conceptual processing than, for example, novelty detection. As Assign3b involved paradigm
integration, movement itself was construed by tutors as a form of novelty. Additionally, Amabile and
Hennessey’s (1999) motivation/technical criteria section (effort evident, planning, organization and
technical goodness) loaded consistently with elaboration/synthesis which, in turn, factored variously
with novelty and resolution. This implies that motivation/technical criteria support both problem
resolution and novelty via elaboration and synthesis. Besemer and O’Quin (1989) observed that
elaboration/synthesis loaded with resolution in one study and with novelty in others (Besemer &
O’Quin, 1986). These studies employed non-expert assessors using adjective checklists as opposed to
experts using implicit definitions for measurement. Results suggested that elaboration/synthesis criteria
factor according to the nature of a creative work. If creativity is defined as novelty and appropriateness,
or social value (Amabile, 1987; Sternberg and Lubart, 1991, 1995, 1996) and appropriateness is further
determined by the degree and manner of problem resolution and/or elaboration/synthesis, then
Hypothesis 3 was supported. That is, all three major criteria sections, representing novelty and
appropriateness, tended to factor according to the nature of the contribution. Loading differences
suggest that manner of assessment also played a role. One explanation for the factoring variability is
that there are four basic types of criteria dependencies found in creative contributions which are
influenced by the sort of information available to assessors:
•
Type I. [(N) + (R/E/T)] Novelty is distinguished from appropriateness (i.e. problem resolution and
elaboration/synthesis). This dependency pattern was found in judges’ evaluations of Assign2 (a
brochure), Assign3a (a poster) as well as tutors’ assessment of Assign3b (a PowerPoint
presentation). Judges, who lacked CRR information and therefore details on the problems being
addressed, detected novelty separately in the first two assignments but bundled problem resolution
with elaboration/synthesis. Tutors, however, managed to differentiate these sections leading to
assessors’ Type differences. Assign3b presumably demanded more elaboration and synthesis due to
the additional associations required in paradigm merger attempts. In this assignment, the better
informed tutors factored resolution criteria (e.g. the merger problem) with elaboration/synthesis
while separating out novelty. Judges, alternatively, viewed these contributions as Type II (see
below). Besemer and O’Quin’s (1987) study of three letter openers generally followed Type I. All
letter opener designs involved paradigm acceptance as their novelty was determined within a
familiar context (e.g. the field of letter openers). A subsequent investigation of Le Vociférant, a
unique painted sheet-metal sculpture by Jean Dubuffet, followed the same dependency pattern;
that is, novelty was obvious. In Type I, novelty is self-evident but does not necessarily create value.
Value emerges when problem resolution is coupled with elaboration/synthesis;
•
Type II. [(N/E/T) + (R)] Novelty is generated through elaboration/synthesis while appropriateness
is determined via problem resolution. This Type was suggested in judges’ evaluations of Assign3b
and tutors’ evaluation of the criterion novelty (materials) in Assign2 and Assign3a. Whereas judges
could have estimated Assign3b novelty through PowerPoint elaborations, they had to determine the
existence of paradigm merger attempts (i.e. resolution) without access to CRRs, a difficult task. For
tutors, Assign2/Assign3a elaboration was combined with subjects’ novel use of materials in
expressing originality (e.g. layout and design of brochure/poster). Besemer and O’Quin (1986)
measured two t-shirt designs in which some elaboration/synthesis criteria (complex; well-crafted,
e.g. perfected) factored with novelty criteria. Investigators proposed that a certain level of
complexity makes a product more engaging whereas too much leads to confusion. Furthermore,
they suggested that novelty was likely associated with arousal during viewing of the shirts’
graphical images. In Dobbins and Wagner’s study (2005), an overlap between novelty detection and
perceptual recollection of simple visual objects led researchers to propose that these two memory
retrieval demands share a common visuo-perceptual attention component separate from conceptual
recollection. Their findings reflect Type II dependency. Studies with other kinds of stimuli (e.g.
text, touch) are needed to determine if the other dependencies suggested herein are also found in
interactions between these three major brain activities. Examples of Type II can be found in fine
arts, performance, novels, certain design works, etc. where novelty manifests through the processes
of elaboration and synthesis.
•
Type III. [(N/R) + (E/T)] Novelty is determined by the capacity to resolve a problem (and vice
versa) while elaboration/synthesis criteria factor independently. Except for novelty (materials), this
type emerged in tutors’ scoring of paradigm rejections (Assign3a). Subjects’ ability to reject their
personal freedom paradigms (i.e. the problem) was more obvious to tutors reading detailed CRRs
than for judges, who only had brief descriptions of subjects’ intents. In this assignment, tutors
factored elaboration/synthesis criteria into two subgroups (i.e. structure and content perception), a
result, perhaps, of better awareness of how solutions were generated. In this dependency, novelty
and resolution are interdependent. Besemer and O’Quin’s (1989) study involving four different key
chains, cat pitcher, dog chair and steer desk followed this Type as did most elaboration criteria
(excepting complex and well-crafted) in their earlier study of two t-shirt designs (Besemer and
O’Quin, 1986). Again, all artifacts exemplify paradigm acceptance, however, their originality was
determined through remote associations related to problem resolution (e.g. steer and desk, dog and
chair, cat and pitcher, etc.). Type III is prevalent in many design solutions as well as in major
discoveries like Einstein’s general relativity and Darwin’s theory of evolution, where novelty
emerges directly from the problem approach.
•
Type IV. [(N) + (R) + (E/T)] Novelty, problem resolution and elaboration/synthesis are all independent.
This type was found in tutors’ assessment of Assign2 (except for novelty, materials). Subjects’
brochures, though often well elaborated, left some doubt if a paradigm had been altered in any way
(i.e. resolution). Many solutions were considered unoriginal, as indicated by tutors’ low mean
creativity scores. This Type may also arise when a solution appears original, problem resolution (i.e.
value) is questionable, and elaboration somewhat extraneous. Many ‘silly’ inventions fall into this
category. Alternatively, Type IV could manifest in complex solutions involving diverse forms of
novelty, resolution and elaboration, such as found in certain global marketing campaigns.
Although criteria dependencies seem to vary with the nature of creative contributions, access to
information and assessment methodology, further research is required to test the four Types under more
controlled conditions. If validated, they would enhance our understanding of creative achievement.
The ability to take sensible risk is important in creativity (Sternberg, 2003). CMAM was originally
designed to encourage risk-taking. In Assign2, 68.42% of subjects selected forward incrementation and
redefinition. These are more challenging forms of paradigm acceptance as they significantly alter
paradigms, or others’ perceptions of them. On the other hand, in Assign3a, 65.15% selected redirection,
the least challenging rejection movement, especially in a personal context (i.e. awareness of freedom)
that does not require expertise. Results indicated that risk-taking is not a function of the weight given to
creativity self-assessment but of perceived challenges and skill levels related to creative movement (and
perhaps creative substance as well). For example, paradigm acceptance was generally less challenging
allowing subjects to take more risk. The reverse seemed to hold for paradigm rejection where basic
assumptions had to be over-turned. Csikszentmihalyi (1996) proposed that a balance between perceived
challenge and skill level results in the state of creative flow, an ‘intense experiential involvement in
moment-to-moment activity’ (Csikszentmihalyi et al., 2005, p.600). This balance, however, may be
difficult to achieve. Level of assignment difficulty (perceived challenge) correlated inversely yet
significantly with self-grades (perceived skill level) which tended to confirm the difficulty in attaining
flow. Increased creativity grade alignment between subjects and tutors implied that subjects’ perceived
level of challenge, and therefore risk, was indirectly validated. This does not mean, however, that
individuals will accept greater creative risk just by having their perceptions confirmed. It would be
interesting to investigate the relationship between self-generated creative momentum measurements,
risk-taking and attempts to attain, or maintain, flow. In fact, Markman and Guenther (2007) proposed that
flow contributes to psychological momentum; but, unlike psychological momentum, flow does not include the notion of exerting
an effect on the individual’s capacity to attain desired outcomes. The advantage of CMAM is that it
clarifies challenges, skill levels and risk to both creator and observer thereby allowing more accurate
measurements to be made. At the same time, creative momentum can be adjusted to maintain implicit
motivation and, on the other hand, perhaps contribute to flow.
The key weaknesses in this study were the lack of a control group, a sample restricted to general education
students (who are typically less motivated in GE subjects), and the lack of a second group assignment for comparison.
Nonetheless, results indicated that CMAM has potential for assessing creative contributions; however,
accuracy doesn’t lie in ‘snapshot’ measurements. That is, scoring reliability and grade accuracy improve
with CMAM familiarity. The study employed a full range of CMAM criteria which, though useful for
artists and designers, may not be as appropriate in other domains. Reliability results indicate that
creative momentum can be measured simply via three basic substance criteria: novelty (ideas or
materials), problem resolution (useful or appropriate) and elaboration/synthesis, plus one movement
criterion (i.e. paradigm acceptance, rejection, or merger). On the other hand, CMAM allows educators to
use relevant sub-criteria from these sections to design, and selectively measure, assignments and
projects. As a measurement tool, CMAM also has the potential to enrich understanding and nurture
qualities that assist individuals in expressing their creativity.
Acknowledgements – This research was supported under the Creativity Assessment Project by grants from the
Hong Kong Polytechnic University’s Working Group on Outcome-Based Education (WGOBE) and the School
of Design. The author would like to thank Dr. Lau Wing Chuen, who acted as project research associate, as well
as designers Siu King Chung, Dr. Yuen Man Wah, Lee Yu Hin, Chong Wai Yung, Tam Chi-hang, and Wai Hon
Wah, who served as subject tutors and course advisors.
References
Amabile, T.M. (1979). Effects of external evaluation on artistic creativity. Journal of
Personality and Social Psychology, 37, 221-233.
Amabile, T.M. (1982). Social psychology of creativity: A consensual assessment technique.
Journal of Personality and Social Psychology, 43, 997-1013.
Amabile, T.M. (1987). The motivation to be creative. In S.G. Isaksen (Ed.), Frontiers of
creativity research: beyond the basics (pp. 223-254). Buffalo, NY: Bearly.
Amabile, T.M. (1996). Creativity in context: Update to ‘The social psychology of creativity’.
Boulder, CO: Westview.
Amabile, T.M., & Hennessey, B.A. (1999). Consensual assessment. In M.A. Runco and S.R.
Pritzker (Eds.), Encyclopedia of Creativity, Vol. 1 (pp. 347-359), San Diego, CA:
Academic Press.
Amabile, T.M., Goldfarb, P., & Brackfield, S.C. (1990). Social influences on creativity:
Evaluation, coaction, and surveillance. Creativity Research Journal, 3, 6-21.
Baer, J. (1994). Performance assessments of creativity: Do they have long-term stability?
Roeper Review, 7 (1), 7-11.
Baer, J., Kaufman, J.C. & Gentile, C.A. (2004). Extension of consensual assessment
technique to nonparallel creative products. Creativity Research Journal, 16 (1), 113-117.
Barker, J.A. (1992). Paradigms: The business of discovering the future. New York:
HarperBusiness.
Barron, F., Gaines, R., Lee, D., & Marlowe, C. (1973). Problems and pitfalls in the use of
rating schemes to describe visual art. Perceptual and Motor Skills, 37, 523-530.
Besemer, S.P. & O’Quin, K. (1986). Analyzing creative products: Refinement and test of a
judging instrument. Journal of Creative Behavior, 20, 115-126.
Besemer, S.P. & O’Quin, K. (1987). Creative product analysis: Testing a model by
developing a judging instrument. In S.G. Isaksen (Ed.) Frontiers of Creativity
Research, (pp.341-357), Buffalo, NY: Bearly Limited.
Besemer, S.P. & O’Quin, K. (1989). The development, reliability, and validity of the
Revised Creative Product Semantic Scale. Creativity Research Journal, 2, 267-278.
Besemer, S.P. & Treffinger, D.J. (1981). Analysis of creative products: Review and
synthesis. Journal of Creative Behavior, 15 (3), 158-178.
Capra, F. (1996). The web of life: A new scientific understanding of living systems. New
York: Anchor Books.
Craik, F.I.M., & Lockhart, R.S. (1972). Levels of processing: a framework for memory research.
Journal of Verbal Learning and Verbal Behavior, 11, 671-684.
Csikszentmihalyi, M. (1996). Creativity: Flow and the psychology of discovery and
invention. New York: Harper Collins.
Csikszentmihalyi, M. (1999). Implications of a systems perspective for the study of
creativity. In R. J. Sternberg (Ed.), Handbook of creativity, (pp. 313-335), Cambridge,
UK: Cambridge University Press.
Csikszentmihalyi, M., Abuhamdeh, S., & Nakamura, J. (2005). Flow. In A. Elliot & C. Dweck (Eds.),
Handbook of competence and motivation (pp. 598-608). New York: Guilford.
Dobbins, I.G., & Wagner, A.D. (2005). Domain-general and domain-sensitive prefrontal
mechanisms for recollecting events and detecting novelty. Cerebral Cortex, 15, 1768-1778.
Freyd, J.J., & Finke, R.A. (1984). Representational momentum. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 10, 126-132.
Harman, W. (1970). An incomplete guide to the future. New York: W. W. Norton.
Hébert, T.P., Cramond, B., Neumeister, K.L.S., Millar, G., & Silvian, A.F. (2002). E. Paul
Torrance: His life, accomplishments and legacy. Storrs, CT: The University of
Connecticut, The National Research Center on the Gifted and Talented (NRC/GT).
Hocevar, D. (1981). Measurement of creativity: review and critique. Journal of Personality
Assessment, 45, 450-464.
Hubbard, T.L. (1995). Environmental invariants in the representation of motion: Implied
dynamics and representational momentum, gravity, friction and centripetal force.
Psychonomic Bulletin and Review, 2, 322-338.
Hubbard, T.L. (2004). The perception of causality: Insights from Michotte’s launching effect,
naïve impetus theory, and representational momentum. In A.M. Oliveira, M.P. Teixera,
G.F. Borges, & M.J. Ferro (Eds.), Fechner day 2004 (pp. 116-121). Coimbra, Portugal:
International Society for Psychophysics.
Kinsbourne, M. & George, J. (1974). The mechanism of the word-frequency effect on recognition
memory. Journal of Verbal Learning and Verbal Behavior, 13, 63-69.
Kuhn, T.S. (1962). The structure of scientific revolutions (1st ed.). Chicago: University of
Chicago Press.
MacKinnon, D.W. (1978). In search of human effectiveness: Identifying and developing
creativity. Buffalo, NY: Creative Education Foundation.
Makel, M.C. & Plucker, J.A. (2008). Creativity. In S.I. Pfeiffer (Ed.) Handbook of
Giftedness in Children, (pp. 247-270), New York: Springer Science + Business
Media.
Markman, K.D., & Guenther, C.L. (2007). Psychological momentum: Intuitive physics and
naïve beliefs. Personality and Social Psychology Bulletin, 33 (6), 800-812.
Osgood, C.E., Suci, G., & Tannenbaum, P. (1957). The measurement of meaning. Urbana, IL:
University of Illinois Press.
Ospovat, D. (1981). The development of Darwin's theory: Natural history, natural
theology, and natural selection, 1838-1859. Cambridge, UK: Cambridge
University Press.
Parke, B.N. & Byrnes, P. (1984). Toward objectifying the measurement of creativity. Roeper
Review, 6, 216-218.
Paulus, P.B., Dzindolet, M.T., Poletes, G. & Camacho, L.M. (1993). Perception of
performance in group brainstorming: The illusion of group productivity, Personality
and Social Psychology Bulletin, 19, 78-89.
Paulus, P.B., Larey, T.S., & Ortega, A.H. (1995). Performance and perceptions of
brainstormers in an organizational setting, Basic and Applied Social Psychology, 17,
249-265.
Pearlman, C. (1983). Teachers as an informational resource in identifying and rating student
creativity, Education, 103, 215-222.
Qiu, J., Li, H., Jou, J.W., Liu, J., Luo, Y.J., Feng, T.Y., Wu, Z.Z., & Zhang, Q.L. (2010). Neural
correlates of the 'Aha' experiences: Evidence from an fMRI study of insight problem
solving. Cortex, 46, 397-403.
Runco, M.A., & Chand, I. (1995). Cognition and creativity. Educational Psychology Review,
7 (3), 243-267.
Runco M.A., Cramond B., & Pagnani A. (2009). Sex differences in creative potential and
creative performance. In J.C. Chrisler and D. R. McCreary (Eds.), Handbook of
Gender Research in Psychology, (pp. 343-360), New York: Springer.
Runco, M.A., McCarthy, K.A., & Svenson, E. (1994). Judgments of the creativity of artwork
from students and professional artists, Journal of Psychology, 128 (1), 23-31.
Runco, M.A., & Mraz, W. (1992). Scoring divergent thinking tests using total ideational
output and a creativity index, Educational and Psychological Measurement, 52, 213-221.
Runco, M.A., Noble, E.P., & Luptak, Y. (1990). Agreement between mothers and sons on
ratings of creative activity. Educational and Psychological Measurement, 50, 673-680.
Sternberg, R.J. (2003) Wisdom, intelligence, and creativity synthesized. New York:
Cambridge University Press.
Sternberg, R. J., & Lubart, T. I. (1991). An investment theory of creativity and its development.
Human Development,34, 1–32.
Sternberg, R. J., & Lubart, T. I. (1995). Defying the crowd: Cultivating creativity in a culture of
conformity. New York: Free Press.
Sternberg, R. J., & Lubart, T. I. (1996). Investing in creativity. American Psychologist, 51, 677–688.
Stroebe, W., Diehl, M. & Abakoumkin, G. (1992). The illusion of group effectivity,
Personality and Social Psychology Bulletin, 18, 643-650.
Taylor, I.A. & Sandler, B.E. (1972). Use of a creative product inventory for evaluating
products of chemists. Proceedings of the 80th annual convention of the American
Psychological Association, 7 (part 1), 311-312.
Thornton, I.M., & Hubbard, T.L. (Eds.). (2002). Representational momentum: New findings,
new directions. New York: Psychology Press.
Tulving, E. (1983). Elements of episodic memory. New York: Oxford University Press.
Tulving, E., & Kroll, N.E.A. (1995). Novelty assessment in the brain and long-term memory
encoding. Psychonomic Bulletin and Review, 2, 387-390.