Measuring Creative Contribution:
The Creative Momentum Assessment Model
ABSTRACT
The Creative Momentum Assessment Model (CMAM) is a criterion-based assessment system that applies the
phenomenon of psychological momentum to creative ideas, products, and expressions. It embodies, and extends,
previous theoretical and experimental approaches to measuring creative contributions. It implements
accountable self-grading, critical reflection, expert assessment, and timely feedback in conjunction with a
criterion-baseline comprising paradigm-related creative movements, novelty, problem resolution, elaboration,
and synthesis. Multiple creative contributions, completed by seventy-seven undergraduates participating in a
general education creative thinking class, were both self-graded and assessed by a group of design experts using
the Consensual Assessment Technique (CAT) and by a small panel of tutors with access to criteria definitions,
rubrics, critical reflection reports, and self-grades. CAT assessors exhibited much higher inter-rater reliabilities
and creativity-grade alignment with the subjects than did tutors, highlighting the diversifying influence of
conceptual versus perceptual activity and of novelty detection. More general criteria exhibited higher reliabilities
than specific ones. Subjects tended to over-grade themselves at first, followed by more realistic assessments.
Assessors’ reliabilities and creativity-grade alignment increased with CMAM usage, even in combination with
subjects’ self-grades. CMAM criteria factored according to the nature of contribution and assessment
methodology, uncovering four basic types of criterion dependencies.
967-86915/210411
1
Background
An important creativity research direction has been to discover an effective, reliable method of assessing
creative contributions, including ideas, products, services, and expressions, across all fields of endeavor. The
purpose of this study is to investigate an assessment model that was designed to incorporate the strengths of
previous models, overcome their weaknesses, and provide an effective tool for assessing and enhancing
real-world creative contribution.
Previous models utilized in assessing creative contribution have focused primarily on tangible products.
MacKinnon (1978) proposed that the foundation of all studies of creativity lies in the analysis of creative
products. Surprisingly, the number of empirically-supported attempts to develop a creative contribution
measurement system has been limited. Early investigations evaluated criteria for art products (Barron, Gaines,
Lee, & Marlow, 1973); chemistry and engineering works (Taylor & Sandler, 1972); 6th grade creative works
(excluding novelty as a criterion, Pearlman, 1983); art, music, creative writing, performing arts and dance via
the Detroit Public Schools’ Creative Product Scales (Parke & Byrnes, 1984); and various creative products
using criteria from the Creative Product Analysis Matrix (CPAM, Besemer & Treffinger, 1981) which later
emerged as the Creative Product Semantic Scale (CPSS, Besemer & O’Quin, 1986, 1987, 1989). The most
versatile criterion models to date are the CPAM and CPSS. The strengths of CPAM/CPSS are that more facets
of creative achievement are covered and inter-rater reliability is generally acceptable (α>.70). Weaknesses
include: a) the use of an extensive item checklist based on the semantic differential work of Osgood, Suci and
Tannenbaum (1957) to calibrate criteria, a method which is both time-consuming and unwieldy outside of
research applications; b) criteria tend to focus on creative products to the exclusion of creative ideas or
expressions; c) creative products rated by these instruments are usually simple (e.g. magnifying glass, razor
blade and mechanical openers; cat pitcher; dog chair; steer desk), which raises doubt about their real-world
applicability; and d) multiple raters are necessary to attain reliable scoring.
Social-psychological approaches controlling for, or eliminating, within-group variances have also been used to
assess creative works. The leading model for this approach was developed by Amabile (1979, 1982, 1996) and
is called the Consensual Assessment Technique (CAT). It is based on an operational definition of creativity: a
work or response is creative to the extent that expert observers agree it is creative. CAT is not allied with any
theory of creativity. Its premise is that experts within a given domain can recognize creativity in that domain
when they see it (Baer, 1994). CAT typically uses a panel of expert judges who measure creative works based
on implicit definitions of set criteria. The judges are not allowed to confer, receive no prior training, and also
evaluate criteria other than creativity. Products are considered in random order and rated against each other, rather than
against some absolute standard. CAT has been used successfully in many contexts for over twenty years. Its
strengths are a) acceptable inter-rater reliabilities (α>.70) and b) its operational approach reflects real-world
assessment. CAT’s weaknesses are: a) most of the rated works (Amabile & Hennessey, 1999) tend (like
CPAM/CPSS) to be simple or generic (e.g. collages, completing sentences, writing essays, descriptive
paragraphs, captions for cartoons, telling stories based on a photograph, free-form poems); b) works are
typically produced under highly constrained experimental conditions (e.g. similar instructions, similar content)
which don’t reflect real-world conditions; c) assessment is time consuming and resource intensive; d) selection of
expert judges is problematic due to variations in students’ skill levels, judges’ abilities, a work’s specific domain
and purpose of assessment (Makel & Plucker, 2008); and e) CAT’s discriminant validity is questionable in that it
relies on implicit definitions of criteria (Runco & Mraz, 1992). Though CAT evaluation of real-world creative
works is less prevalent, there is evidence of high reliability in rating nonparallel creative works (Baer,
Kaufman & Gentile, 2004). In most cases, criterion-based models are observer-reliant and exclude
self-assessment.
In addition to measurement challenges, there are psychological hurdles in evaluating creative contributions. For
example, artistic and verbal creativity is negatively impacted in subjects who expect to be evaluated (Amabile,
1979; Amabile, Goldfarb & Brackfield, 1990). Assessment models that adversely influence creative output are
probably less valuable. External reward, in the form of grades or monetary gain, can also be problematic.
Creative individuals are intrinsically motivated and may respond negatively to extrinsic reward except where it
is informative or fosters additional creative endeavor (Amabile, 1996). Assessment models that enhance
intrinsic motivation are preferable. In fact, Torrance’s original concept for his Torrance Tests of Creative
Thinking, a popular divergent thinking test, was to develop a tool that enriched understanding and nurtured
qualities which assist individuals in expressing their creativity, and not for assessment purposes (Hébert,
Cramond, Neumeister, Millar, & Silvian, 2002).
Creative Momentum Assessment Model
The Creative Momentum Assessment Model (CMAM) was designed to overcome many of the hurdles in
previous assessment models by combining the psychological momenta (defined below) of the creator’s and
observer’s experience of a creative contribution in the context of a criterion-baseline and coupled with
accountable self-assessment, critical self-reflection, expert evaluation, and timely feedback.
A psychological force, called momentum, commonly noted in large sports events, has been observed to affect
observers’ (and players’) expectations of performance. That is, precipitating events like spectacular goals and
unexpected movements affect both players’ and audiences’ perceptions of a game. Markman and Guenther
(2007), applying Newtonian physics, hypothesized that psychological momentum (p) equals the product of mass
(m), or the ‘strength of contextual variables that connote value, immediacy and importance’ (p.801) and velocity
(v), or the movement of an attitude object, person, or group of people toward a target. They tested the perception
of psychological momentum and then extended the concept beyond performer/observer goal expectancies in the
athletic domain to task performance in a wider array of domains. Additional empirical support for p originated
in visual-cognitive psychophysical studies of representational momentum wherein an observer viewing a target
in actual or implied motion remembers the final location of the target as being positioned slightly forward in the
direction of target motion (Freyd & Finke, 1984; Hubbard, 1995; Thornton & Hubbard, 2002). Hubbard (2004)
noted that observers made causal inferences, or representations, regarding the physical properties of an object in
motion. These representations are based on expectations regarding the displacement of an object, or target.
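Markman and Guenther’s product formulation can be sketched in a few lines. The function name, the rating scales, and the numeric values below are illustrative assumptions made here, not values from their study:

```python
# Toy illustration of Markman & Guenther's (2007) formulation,
# psychological momentum p = m * v. The rating values are invented.

def psychological_momentum(mass: float, velocity: float) -> float:
    """mass: rated strength of contextual variables connoting value,
    immediacy and importance; velocity: rated movement of the attitude
    object toward a target (negative if moving away from it)."""
    return mass * velocity

# A spectacular goal late in a close match: high importance, strong
# movement toward the target, hence large positive momentum.
print(psychological_momentum(mass=6.5, velocity=5.0))   # 32.5
# The same contextual weight when the team drifts away from its target.
print(psychological_momentum(mass=6.5, velocity=-2.0))  # -13.0
```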
In the context of creativity-related tasks, it is assumed that both creators and external observers experience some
perceptual and/or conceptual displacement via the transformation of familiar associations through value-bearing
novelty. The creative target, similar to winning a sporting event, can be defined from the creator’s perspective as
finding or solving a creative problem, generating a new idea, product or service, or expressing something in an
original manner; or, from an assessor’s (observer’s) perspective, as the expectation of experiencing both novelty
and salience in creative contributions. The associated momentum manifests in creators as the ‘Aha!’ experience
and in observers as the ‘Wow!’ factor. Conceivably, the psychophysical force generated by both novelty and
salience, or meaningful resolution of some sort (Qui, Li et al., 2010) is experienced variously by creator and
observer. In creators, insightful moments may act as catalysts to enhance creative momentum leading to further
insights and ongoing achievement. In some instances, prolific creative output arises in individuals said to be
creatively ‘on a roll’. Csikszentmihalyi (1996) proposed that creativity is autotelic, a state where the joy of
creative endeavor stimulates further creative effort. Creative momentum may lie at the source of this
phenomenon.
In CMAM’s creative momentum (pc) assessment, creative mass (mc) is defined as the value or appropriateness
of a creative contribution while creative velocity (vc) is the magnitude and direction of that new-value’s motion
in idea space away from some reference point and toward a solution. Creative mass (herein ‘creative substance’)
represents the new value in idea space, is presumably related to memory retrieval/encoding, and is described
through CPAM/CPSS’s key criteria: type of novelty, problem resolution, elaboration and synthesis. Dobbins and
Wagner (2005), using fMRI, found that (in memory retrieval of novel figural stimuli) novelty detection was
faster and more accurate than perceptual or conceptual recollection, brain activation differed across these
retrieval types, and while perceptual and conceptual recollection of the familiar involved the left prefrontal
cortex (PFC), novelty shifted activity to the right PFC. The PFC is considered the decision-making center in the
brain. Furthermore, it has been noted that novelty items are more easily recognized than familiar items
(Kinsbourne & George, 1974; Tulving & Kroll, 1995). CMAM addresses the neural equivalent of novelty
activation through type of novelty criteria [novelty (ideas), novelty (use of materials), germinal], perceptual
activity through elaboration and synthesis criteria [coherence, complex, perfected, appealing, communicative,
elegant] and conceptual activity through criteria related to the capacity for problem resolution [logical, useful,
appropriate, valuable]. In defining creative substance, some CPAM/CPSS sub-criteria were either deleted, or
altered. For example, ‘transformative’ was dropped because it involved speculation on the future impact of
contributions; ‘surprising’ was deleted because it denoted arousal and the detection of unexpected stimuli which
may, or may not, involve ‘originality’; and ‘adequate’ was removed because CPAM/CPSS authors observed that
it elicited negative connotations. ‘Original’ was converted into novelty (ideas) and novelty (materials) to
delineate contributions whose ideas may not be unusual but whose packaging is. Coherent and perfected
extended CPAM/CPSS’s initial product focus (e.g. ‘organic’, ‘well-crafted’) to include creative ideas and
expressions. Appealing avoided ‘attractive’ superficiality connotations and potential visual perception bias.
Communicative was assumed to describe information transfer more universally than ‘expressive’ because it
included, for example, abstract concepts requiring considerable interpretation (e.g. complex geometry).
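As an illustration only, creative substance and creative movement might be combined as follows. The simple averaging, the 1-5 rating scale, and the function names are assumptions made for this sketch; CMAM specifies the criteria themselves but does not prescribe this numeric aggregation:

```python
# Illustrative-only sketch of creative momentum as substance x movement.
# Averaging criterion ratings into one substance score is an assumption
# made here; it is not a scoring rule defined by CMAM.

NOVELTY = ["novelty_ideas", "novelty_materials", "germinal"]
RESOLUTION = ["logical", "useful", "appropriate", "valuable"]
ELABORATION_SYNTHESIS = ["coherent", "complex", "perfected",
                         "appealing", "communicative", "elegant"]

def creative_substance(ratings: dict) -> float:
    """Average whichever substance criteria (1-5 ratings) are present."""
    keys = NOVELTY + RESOLUTION + ELABORATION_SYNTHESIS
    scores = [ratings[k] for k in keys if k in ratings]
    return sum(scores) / len(scores)

def creative_momentum(ratings: dict, movement: float) -> float:
    """movement: rated magnitude/direction of motion in idea space
    relative to the field's paradigms (the paradigm criterion)."""
    return creative_substance(ratings) * movement

ratings = {"novelty_ideas": 5, "logical": 4, "useful": 4, "coherent": 3}
print(round(creative_momentum(ratings, movement=4.0), 2))  # 16.0
```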
Creative velocity (herein ‘creative movement’) is motion in idea space relative to existing paradigms. A
paradigm has been defined variously as a set of rules and regulations which establish or define boundaries and
suggest behavior (Baker, 1992); a fundamental way of perceiving, thinking, valuing and acting in accordance
with a certain view of reality (Harman, 1970); and “a constellation of concepts, values, perceptions and
practices shared by a community, which forms a particular vision of reality that is the basis of the way a
community organizes itself" (Capra, 1996, p. 6). In this regard, CMAM adopts Sternberg’s (2003) Propulsion
Theory of Creativity (PTC) which describes creativity as movement from one position in idea space to another
or a field transitioning relative to its paradigms. For example, Kuhn’s (1962) notion of paradigm shift as a
change in basic assumptions governing a ruling theory of science can be construed as the result of creative
movement acting upon multiple observers in a scientific domain. Markman and Guenther (2007) noted that
psychological momentum is positive if movement is toward a target and negative if away from a target. In PTC,
all movements (called creative solution types) involve positive momentum while changes directed toward a
target vary in relation to a field’s paradigms and trends. CMAM employs seven of PTC’s creative solution types
as movement criteria which are governed by three basic paradigm relationships: paradigm acceptance
[contributions that move a field according to its present direction, i.e. replication, redefinition, forward
incrementation]; paradigm rejection [contributions that move a field in a new direction, i.e. redirection,
reconstruction/redirection, reinitiation]; and paradigm merger [contributions that combine both approaches, i.e.
integration]. [Refer to Sternberg (2003) for details on solution types.] PTC’s eighth solution type, advance
forward incrementation, was removed because it addressed future value retrospectively, thereby prompting
individuals to justify their creativity based on the ignorance of assessors.
According to the novelty/encoding hypothesis, novelty assessment is an early stage of encoding occurring in the
brain’s limbic, temporal/parietal regions followed by higher level meaning-based encoding in cortical regions
that include the left PFC (Craik & Lockhart, 1972; Tulving, 1983). Paradigms in a single contribution (e.g.
associated with form, function, performance, socio-cultural significance, etc.) that include novelty,
elaboration/synthesis and conceptual memory activation probably involve both low and high level cognitive
encoding operations. However, creative movement is more likely associated with higher-order conceptual
activity related to problem resolution than with novelty or elaboration/synthesis criteria. This remains to be seen. The
magnitude and direction of creative movement may change over time, in either creator or observer, leading to
what could be called creative acceleration, or a sense of creative exhilaration. For example, Darwin developed
the theory of evolution via a series of insights over twenty years (Ospovat, 1981) which suggests low
acceleration; yet, when The Origin of Species was first published, both scientists and clergymen experienced a
psychological jolt because the full impact of his insights into human origins struck them all at once. On the other
hand, Einstein took eleven years to develop his general theory of relativity, still a rather slow process; however,
its impact on the physics community was even slower as it took over forty years for the mathematics, which
underlay its creative significance, to be understood.
A criterion-baseline, however well-constructed, cannot of itself generate accurate measurement, especially in
single assessors, due to differences in ability and personal bias. Runco, McCarthy and Svenson (1994) noted that
professionals who produce excellent creative works (e.g. art) may be unreliable, biased and inaccurate in
assessing the works of others. Even if experts can accurately (by specific domain/field standards) assess certain
kinds of creative contributions, they may be less capable in evaluating others. Assessment models which don’t
address this weakness have limited usefulness. In this respect, CMAM adopts a more forward-thinking
approach. Creative momenta, appraised via the criterion-baseline, are first divided into self and expert scores
then combined proportionately in favor of self-ratings to support intrinsic motivation and ultimately enhance
reliability. Runco, McCarthy and Svenson (1994) and Amabile (1996) found a moderate relationship between
self and expert ratings of creative products with self rating generally higher than expert rating on similar rank
orderings of products. Self-ratings are often biased (e.g. affected by task difficulty, effort, memory, honesty,
etc.) but they do have advantages (Hocevar, 1981; Runco, Noble, & Luptak, 1990). For example, individuals are
better informed about their intent, the work, the creative process, nature of insights, and overall performance. In
short, individuals may value their work more because they understand the rationale behind it whereas experts
aren’t privy to this information, may not know how a solution solves a problem, or understand the creator’s
intent (Runco & Chand, 1995). Hocevar (1981), in fact, favored self-ratings. CMAM hypothetically
compensates for assessment bias by aligning individual and expert measurements of creative momenta over
time, and against a criterion baseline.
In order to minimize self-assessment bias, individuals critically reflect, via a Creativity Reflection Report
(CRR), upon their contributions while referencing CMAM criteria. The CRR is then judged by separate, non-creativity-related criteria to reduce inhibition stemming from external evaluation. Self-ratings are penalized,
however, for being unjustified; that is, personal objectivity is rewarded by higher CRR scores. Expert ratings are
offered simply as professional opinions. They include constructive feedback on creative achievement and
critical reflections. CMAM’s self-ratings carry the majority weight in creativity assessment. This is done to promote
sensible risk-taking and build self-confidence. Overall scoring emphasis on critical reflection provides a form of
assessment control for educators. Final contribution scores combine both self/expert(s) creativity ratings and
CRR scores.
Three hypotheses are proposed in using CMAM as a measurement tool. Hypothesis 1 declares that combined
self/expert creativity scores should be more reliable than experts’ creativity scores alone. This hypothesis
implies that Csikszentmihalyi’s (1999) domain/field expert assessments (e.g. observer-based momenta) as the
basis of his systems approach become more reliable if supported by ‘accountable’ creativity self-assessment,
provided that common criteria are used. Hypothesis 2, furthermore, proposes that familiarity with CMAM
criteria, coupled with accountable self-grading and expert feedback on both creativity and critical reflections,
serves to align self/expert creativity scores. That is, CMAM accuracy is not a ‘snapshot’ measurement
phenomenon, but evolves with model familiarity. Hypothesis 3 is that CMAM’s main criteria sections will
factor with each other according to the nature of creative contributions and assessors’ access to relevant
information. Evidence for this effect appeared when elaboration/synthesis criteria loaded inconsistently with
novelty and resolution criteria in the CPAM/CPSS studies (Besemer & O’Quin, 1987, 1989).
The Study
Method
CMAM criteria, in both English and Chinese versions, were introduced, with appropriate rubrics, to
undergraduate students at the Hong Kong Polytechnic University over a period of five weeks (within a
fourteen-week semester) as part of a general education creative thinking course. Criteria were explained
during weekly one-hour interactive lectures with examples from multiple disciplines coupled with
relevant exercises and tutorials teaching creative tools. Tutors were rotated weekly to lessen any
delivery style bias.
Three assignments were completed. They required making multiple associations, embodied figural and
verbal content, employed the full range of CMAM substance criteria and focused on one basic paradigm
relationship each. The assignments were completed over a period of seven weeks and included a 4-page
brochure describing a new form of entertainment (paradigm acceptance, Assign2), an innovative
inspirational ‘freedom’ poster (with attached explanation) that rejected subjects’ initial assumptions
about freedom (paradigm rejection, Assign3a) and a group PowerPoint presentation which integrated
group members’ new freedom constructs with paradigms from two unrelated fields (i.e. furniture, time
pieces, relaxation devices, food relief) into an original product, process, or performance (paradigm
merger, Assign3b). Subjects noted their specific creative movement attempts, self-graded their
creativity on a 9-point letter grade scale (F=0 to A+=4.5), completed a Creativity Reflection Report
(CRR), and reported the difficulty of each assignment on a 5-point Likert scale (1=very easy to 5=very
difficult).
To avoid assessor bias, subjects were identified by code numbers and both submissions and criteria were
rated in random order (except for the criterion overall creativity grade, see below). Three design-faculty
tutors (1 female, 2 male, average age=47.7years) who were familiar with CMAM criteria, their rubrics
and who had access to CRRs, self-grades and difficulty levels, assessed subjects on fifteen CMAM
criteria (i.e. one movement and fourteen substance criteria) plus five of Amabile & Hennessey’s (1982)
CAT criteria (i.e. creativity, effort evident, planning, organization and technical goodness) based on a
5-point Likert scale [5(high)-3(medium)-1(low)]. The construct creativity was scored twice: 1) creativity
as a quick intuitive response to contributions and 2) overall creativity grade as a summative response
after considering the other twenty criteria. The CMAM criterion, germinal, was excluded because the
term was unfamiliar to CAT judges. Overall creativity grade, self-grades and CRR subtotal (i.e.
comprised of theory application, research proficiency, organization, completeness) as well as the
composite scores tutor total score (CRR subtotal plus overall creativity grade) and final score (tutor
total score plus self-grades) were evaluated on the 9-point letter grade scale. Additionally, twelve
independent judges (11 female, 1 male, average age=26.4 years), deemed to be experts in rating
creativity and whose experience was not far removed from undergraduate study, scored the same
twenty-one criteria as tutors using the same scales. They had no access to CRRs and therefore could not
make composite scores. The weighting of each contribution’s final score was creativity self-grade
(30%), overall creativity grade (10%) and CRR grade (60%) which gave subjects a 75% say in their
creativity assessment.
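A minimal sketch of this weighting follows. The component grades in the example are hypothetical points on the 9-point scale, which the text anchors only at F=0 and A+=4.5:

```python
# Sketch of CMAM's final-score weighting: creativity self-grade 30%,
# tutors' overall creativity grade 10%, CRR grade 60%. Grades are on
# the paper's 9-point numeric scale (F = 0.0 ... A+ = 4.5); the sample
# values below are invented for illustration.

WEIGHTS = {"self_grade": 0.30, "overall_creativity_grade": 0.10,
           "crr_grade": 0.60}

def final_score(self_grade: float, overall_creativity_grade: float,
                crr_grade: float) -> float:
    """Weighted combination of the three component grades."""
    return (WEIGHTS["self_grade"] * self_grade
            + WEIGHTS["overall_creativity_grade"] * overall_creativity_grade
            + WEIGHTS["crr_grade"] * crr_grade)

# A subject who self-grades 3.5, receives 2.5 from tutors on overall
# creativity, and earns 3.0 on the Creativity Reflection Report:
print(round(final_score(3.5, 2.5, 3.0), 2))  # 3.1
```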
Results
Assessor Reliability. A total of seventy-seven subjects (M=37, F=40) participated in two individual
creativity assignments while ninety-two subjects (M=44, F=48), including the seventy-seven, formed
thirty-one groups to undertake a group creativity assignment. The overall sample included twenty-six
designers and sixty-six non-designers. Cronbach alpha was used across assignments to evaluate
assessors’ inter-rater reliabilities. Results are summarized in Table 1. Average reliabilities are cited for
interpretive convenience. Judges’ average reliabilities per criterion were very good to excellent (range:
α=.82 to α=.91) while tutors’ were low to marginal (range: α=.47 to α=.68). Across assignments, judges’
overall average reliability was .85 with tutors at .56. For all assessors, the highest average reliabilities
within each criteria section were overall creativity grade, elaboration & synthesis, and effort evident,
with the resolution criteria section highest on problem resolution for judges and on useful for tutors. Section
headings generally exhibited better reliabilities than their sub-criteria. All assessors exhibited weaker reliability in the problem
resolution section. The single most unreliable criterion was paradigm movement; however, it improved
with usage. Overall creativity grade reliability, like most novelty criteria, also increased with each
assignment. Across assignments, tutors’ average CRR alpha was higher than their solution criteria
(range: α=.60 to α=.68) with organization at the top. In the composite scores, tutors’ alpha averages
reached acceptability (α>.70) increasing as more variables, such as CRR scores, were added. When
subjects’ creativity self-grades were calculated in conjunction with assessors’ overall creativity grades,
reliability decreased somewhat; however, the assessors’ previous pattern of increased reliability across
assignments was maintained.
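For reference, Cronbach’s alpha applied to inter-rater reliability treats each assessor as an “item” and each submission as a case. A self-contained sketch follows; the ratings are invented, not data from this study:

```python
# Cronbach's alpha for inter-rater reliability: each rater is an "item",
# each submission a case. The sample ratings are invented.

def cronbach_alpha(ratings: list[list[float]]) -> float:
    """ratings[i][j] = rater j's score for submission i."""
    k = len(ratings[0])              # number of raters
    def var(xs):                     # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    rater_vars = [var([row[j] for row in ratings]) for j in range(k)]
    total_var = var([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum(rater_vars) / total_var)

# Three raters in near agreement across five submissions:
scores = [[4, 4, 5], [2, 3, 2], [5, 5, 5], [3, 3, 4], [1, 2, 1]]
print(round(cronbach_alpha(scores), 2))  # 0.96
```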
TABLE 1.
Judges’ and tutors’ inter-rater reliability across assignments (Cronbach Alpha)

                                 Assign2         Assign3a        Assign3b        Averages
                               Judges Tutors   Judges Tutors   Judges Tutors   Judges Tutors
Paradigm (movement)             .70    .13      .90    .33      .86    .78      .82    .41
Solution Criteria
  Novelty (idea)                .81    .39      .87    .52      .88    .76      .85    .56
  Novelty (materials)           .81    .52      .83    .38      .86    .77      .83    .56
  Creativity                    .81    .46      .86    .45      .89    .77      .86    .56
  Overall Creativity Grade      .88    .51      .93    .58      .93    .73      .91    .61
  Section Average               .83    .47      .87    .48      .88    .76      .86    .57
  Problem Resolution            .79    .47      .90    .46      .89    .60      .85    .51
  Logical                       .74    .47      .87    .48      .84    .61      .82    .52
  Useful                        .79    .33      .88    .65      .78    .79      .82    .59
  Appropriate                   .75    .41      .89    .37      .83    .62      .82    .47
  Valuable                      .74    .51      .87    .31      .85    .78      .82    .53
  Section Average               .76    .44      .88    .46      .82    .68      .83    .52
  Elaboration & Synthesis       .88    .57      .92    .65      .84    .90      .90    .68
  Coherent                      .80    .48      .91    .61      .88    .77      .86    .62
  Complex                       .82    .38      .92    .34      .90    .68      .88    .47
  Communicative                 .82    .56      .88    .44      .84    .69      .85    .56
  Appealing                     .82    .53      .82    .47      .84    .72      .83    .57
  Perfected                     .82    .54      .89    .23      .85    .82      .85    .53
  Elegant                       .83    .57      .86    .28      .84    .73      .85    .53
  Section Average               .83    .52      .89    .43      .88    .75      .86    .57
  Effort Evident                .87    .59      .94    .65      .93    .77      .91    .67
  Planning                      .84    .59      .91    .49      .88    .77      .88    .62
  Organization                  .83    .61      .91    .59      .86    .80      .87    .67
  Technical Goodness            .78    .53      .86    .50      .87    .53      .84    .52
  Section Average               .83    .58      .90    .56      .89    .72      .87    .62
Overall Average                 .81    .48      .89    .47      .87    .73      .85    .56
CRR Criteria
  Theory Application             –     .52       –     .64       –     .74       –     .64
  Research Proficiency           –     .54       –     .61       –     .64       –     .60
  Organization                   –     .65       –     .62       –     .77       –     .68
  Completeness                   –     .55       –     .56       –     .81       –     .64
  Section Average                –     .57       –     .61       –     .74       –     .64
CRR Subtotal                     –     .61       –     .65       –     .98       –     .74
Tutor Total Score                –     .60       –     .65       –    1.00       –     .75
Final Score                      –     .69       –     .74       –     .93       –     .79
Self/Assessor Combined
  Overall Creativity Grade      .81    .45      .86    .51      .89    .69      .85    .55

Note: high section averages are underlined.
Based on means analysis, judges scored all CMAM criteria higher (mean range= Assign2: 2.62-3.16;
Assign3a: 2.50-2.87; Assign3b: 2.50-3.09) than tutors (mean range= Assign2: 1.39-2.66; Assign3a:
1.71-2.65; Assign3b: 1.44-2.43) excepting overall creativity grade for Assign3a (mean
difference=0.07). Mean scores indicated that subject performance was considered, especially by tutors,
as below average. Tutors’ standard deviations for Assign2 and Assign3b were typically higher than
judges’. There was greater overall assessor agreement on Assign3a. In general, tutors were more critical
toward, and discriminating of, subjects’ contributions than judges.
Multivariate analysis of variance (MANOVA) was conducted across various conditions (judge x tutor; male
x female; design x nondesign) with CMAM criteria as dependent variables in order to determine
significant interactions in assessor scoring. See Table 2 for a detailed summary. Assign3b was excluded
because group conditions could not be statistically differentiated. Judges’ scores were significantly
higher than tutors’: 1) Assign2 (range: F(1,146)=21.07, p<.001 to F(1,146)=192.31, p<.001); 2) Assign3a
(range: F(1,146)=7.83, p<.01 to F(1,146)=97.44, p<.001). The most significant differences in criteria occurred
for elegant (F(1,146)=192.31, p<.001), perfected (F(1,146)=135.26, p<.001) and useful (F(1,146)=104.89, p<.001)
on Assign2, and elegant (F(1,146)=97.44, p<.001) on Assign3a. Certain criteria in Assign3a showed no
significant scoring differences (they also exhibited the least significance in Assign2): creativity, overall
creativity grade, appropriate, elaboration & synthesis, and effort evident. As noted previously, most of
these criteria exhibited higher alphas. Judges scored designers significantly better than non-designers
across both assignments (range: F(1,73)=11.61, p<.001 to F(1,73)=58.19, p<.001) while tutors found less
significant differences, especially in Assign3a. Judges scored females significantly higher than males on
most Assign2 criteria (except for the problem resolution section) but gender differences attenuated in
Assign3a. Tutors didn’t indicate significant gender differences, except for the novelty-related criteria,
planning in Assign2, and novelty (idea) in Assign3a.
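As a simplified stand-in for one cell of the reported MANOVA, the judge-versus-tutor contrast on a single criterion reduces to a two-group one-way ANOVA F statistic. A sketch with invented scores (not data from the study):

```python
# Two-group one-way ANOVA F statistic, a simplified stand-in for one
# judge-vs-tutor cell of the reported MANOVA. Scores are invented.

def anova_f(group_a: list[float],
            group_b: list[float]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for two independent groups."""
    all_scores = group_a + group_b
    grand = sum(all_scores) / len(all_scores)
    means = [sum(g) / len(g) for g in (group_a, group_b)]
    ss_between = sum(len(g) * (m - grand) ** 2
                     for g, m in zip((group_a, group_b), means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip((group_a, group_b), means) for x in g)
    df_between = 1                        # two groups
    df_within = len(all_scores) - 2
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

judges = [4, 5, 4, 4, 5, 3, 4, 5]   # higher scores
tutors = [2, 3, 2, 3, 2, 3, 2, 2]   # lower, more critical scores
f, dfb, dfw = anova_f(judges, tutors)
print(dfb, dfw)        # 1 14
print(round(f, 1))     # 36.6
```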
TABLE 2.
MANOVA results for Judge/Tutor, Design/Nondesign, and M/F conditions,
with CMAM criteria as dependent variables
[Table 2 values could not be reliably recovered from the source. The table reported F, df, p, and η² statistics for each CMAM criterion (dependent variables) under the Judge/Tutor, Design/Nondesign, and M/F conditions, separately for judges’ and tutors’ scores in Assign2 and Assign3a, with significant figures bolded.]
Grading Alignment. Grading alignment results are summarized in Table 3. Assessors’ mean differences
in scoring overall creativity grade were significant except for Assign3a (mean difference=-.07, p=.771).
Assign3b mean differences were less than a half grade-point. Differences between judges’ scoring of
overall creativity grade and subjects’ self-grading of creativity were also very significant and less than a
half grade-point for the first two assignments (Assign2: mean difference=-.36, p<.01; Assign3a: mean
difference=-.44, p<.001). Assign3b results, however, slightly exceeded a half grade-point. Differences
between tutors’ overall creativity grade and subjects’ self-grading were very significant: almost a grade-point
for Assign2 (mean difference=-.95, p<.001), less than a half grade-point for Assign3a (mean
difference=-.37, p<.001), and over a grade-point for Assign3b (mean difference=-1.07, p<.001); that is,
tutors were again harsher in grading than judges. Negative differences indicated subject over-grading,
which was higher in the group assignment. Mean grading differences between
assessors tend to support the half grade-point margin of error typically observed in expert panel
assessments. In the individual assignments, assessor/subject grade alignment increased. Furthermore,
subjects’ rating of assignment difficulty exhibited a significant inverse correlation with self-grades
(Assign2, r=-.279, p<.05; Assign 3a, r=-.239, p<.05); that is, greater perceived difficulty led to lower
self-grades, and vice-versa.
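The alignment statistics above can be reproduced in outline as follows. This sketch uses synthetic grades (all variable names and effect sizes are assumptions) to show the form of the computation, not the study's data:

```python
# Hypothetical sketch of the grade-alignment computations: a paired
# mean difference (as in Table 3) and the difficulty/self-grade
# correlation. All data are synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 77
self_grade = rng.normal(3.5, 0.6, size=n)
# Judges grade somewhat lower than subjects grade themselves.
judge_grade = self_grade - 0.4 + rng.normal(0, 0.5, size=n)
# Perceived difficulty varies inversely with self-grade.
difficulty = 4.0 - 0.8 * self_grade + rng.normal(0, 0.5, size=n)

# Mean difference (judges - self-grade) with a paired t-test.
t_stat, p_val = stats.ttest_rel(judge_grade, self_grade)
mean_diff = float(np.mean(judge_grade - self_grade))
print(f"mean difference = {mean_diff:.2f}, p = {p_val:.3f}")

# Pearson correlation between perceived difficulty and self-grades.
r, p_r = stats.pearsonr(difficulty, self_grade)
print(f"r = {r:.3f}, p = {p_r:.3f}")
```

A negative mean difference corresponds to subject over-grading, and a negative r corresponds to the inverse difficulty/self-grade relationship reported above.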
TABLE 3.
Mean differences between assessors’ grades and subjects’ self-grades
                              Assign2                          Assign3a                         Assign3b
I-J                    Mean Diff   p    Lower  Upper   Mean Diff   p    Lower  Upper   Mean Diff   p    Lower  Upper
Judges - Tutors           .60    .000    .36    .84      -.07    .771   -.32    .17       .40    .040    .01    .79
Judges - Self-Grade      -.36    .002   -.60   -.11      -.44    .000   -.69   -.20      -.67    .000  -1.06   -.28
Tutors - Self-Grade      -.95    .000  -1.20   -.71      -.37    .001   -.62   -.12     -1.07    .000  -1.46   -.68
Note: Bold figures represent mean differences of +/- a half grade point
Criteria Relations. Varimax rotation was calculated across assignments and assessors to determine
CMAM’s factor structure, which is summarized in Table 4. With the exception of tutors’ loadings in
Assign2 and Assign3a, two factors emerged. Excepting tutors’ Assign3b results, paradigm movement
loaded consistently with problem resolution criteria, suggesting that paradigm movement is, in fact, a
measure of creative solution type and allied to high-order conceptual processing. In terms of creativity
factors, judges’ Assign2/Assign3a novelty/creativity section tended to factor separately from all other
criteria except for overall creativity grade which, in Assign3a, loaded with the other criteria. Judges’
Assign3b novelty criteria factored with elaboration/synthesis and certain motivation/technical criteria.
Tutors’ Assign2 novelty/creativity section tended to load separately from other criteria except for
novelty (materials) which factored with both elaboration/synthesis and motivation/technical criteria.
Tutors’ Assign3a novelty criteria loaded primarily with problem resolution criteria while Assign3b
novelty generally factored separately from other criteria. These findings suggest that novelty-related
activity factors variously with both perceptual and conceptual activity.
In terms of elaboration and synthesis factors, judges’ Assign2 and Assign3a elaboration/synthesis
section tended to load with both problem resolution and motivation/technical criteria. Their Assign3b
elaboration/synthesis section also loaded with motivation/technical criteria. Tutors’ Assign2
elaboration/synthesis section also factored with motivation/technical criteria; however, the Assign3a
elaboration/synthesis section split into two subgroups described as follows: a) structure perception:
elaboration and synthesis, coherent, complex and b) content perception: communicative, appealing,
perfected. The latter subgroup tended to factor better with motivation/technical criteria. Tutors’
Assign3b elaboration/synthesis section factored similarly to judges’ Assign2 and Assign3a loading
patterns. As noted, motivation/technical criteria usually factored with elaboration/synthesis criteria. All
three criteria sections factored differently with each other, apparently according to a contribution’s
nature and assessment method. In this respect, four dependencies (see Discussion for further details)
tended to emerge. Nominating novelty criteria as (N), problem resolution as (R), elaboration and
synthesis as (E) and motivation/technical criteria as (T), the dependencies are: 1) [(N) + (R/E/T)]; 2)
[(N/E/T) + (R)]; 3) [(N/R) + (E/T)]; 4) [(N) + (R) + (E/T)], with T always associated with E (and
paradigm movement usually associated with R).
In terms of reflection report factors, CRR criteria tended to factor with novelty/creativity criteria in
Assign2, with the elaboration/synthesis structure perception subgroup in Assign3a, and were split
between novelty and other criteria in Assign3b. These variations conceivably reflect tutors’ more critical
solution considerations (e.g. Assign2: the existence, or not, of novelty within the accepted paradigm;
Assign3a: elaborations indicating, or not, radical approaches to freedom; Assign3b: the capacity, or not,
to integrate new associations, conduct research and organize material in resolving a merger).
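The factor-analytic procedure described above can be sketched as follows, again with synthetic data; the two latent factors and six illustrative criteria are assumptions standing in for the full CMAM criterion set:

```python
# Hypothetical sketch of the Varimax-rotated factor analysis behind
# Table 4, on synthetic criterion scores; the two latent factors and
# six observed criteria are illustrative stand-ins.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
n = 77
latent = rng.normal(size=(n, 2))          # e.g. novelty, appropriateness
loadings_true = np.array([
    [0.9, 0.1], [0.8, 0.2], [0.7, 0.3],   # novelty-type criteria
    [0.1, 0.9], [0.2, 0.8], [0.3, 0.7],   # resolution-type criteria
])
scores = latent @ loadings_true.T + rng.normal(0, 0.3, size=(n, 6))

fa = FactorAnalysis(n_components=2, rotation="varimax")
fa.fit(scores)
print(np.round(fa.components_, 2))  # rotated loadings, one row per factor
```

Varimax rotation maximizes the variance of squared loadings within each factor, which is what makes the loading patterns (and hence the criterion dependencies discussed here) interpretable.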
TABLE 4.
CMAM criteria factor analysis across assignments and assessors
[Table 4 loadings could not be reliably recovered from the source. The table reported Varimax-rotated factor loadings for each solution criterion (and, for tutors, the CRR criteria of theory application, research proficiency, organization and completeness, plus CRR subtotal, tutor total score and final score) across two judge factors per assignment and two to three tutor factors per assignment, with higher loadings bolded.]
Although paradigm movement was listed as a single criterion, assessors were informed prior to
evaluation that assignments variously denoted paradigm acceptance, rejection or merger. Although
subjects had no choice on Assign3b (as all were integrations), Assign2 and Assign3a allowed selections
of one of three creative solution types (i.e. specific movements). Not all subjects stated the specific
movement they were attempting in their CRRs (Assign2, N=76; Assign3a, N=66). In Assign2, forward
incrementation attracted more attempts (39.47%) than replication (31.58%) or redefinition (28.95%). In
Assign3a, redirections held the majority of attempts (65.15%) followed by reconstruction/redirection
(19.70%) and reinitiation (15.15%).
Discussion
Judges’ inter-rater reliability, using CAT, was on average higher (α=.85) across the three
assignments than the average reliabilities of Besemer and O’Quin’s (1986) a) CPAM investigation of
two creative contributions rated by 133 undergraduates using 70 bi-polar adjectives to describe eleven
solution subscales (α=.82); b) four CPAM studies (1987) involving 90 student raters using a 110 item
checklist to measure eight solution subscales (α=.76); and c) a later CPSS study (1989) where 194
undergraduates rated three products using 71 bi-polar adjectives defining ten solution subscales (α=.77).
That CMAM contributions were more complex than those in the above-cited studies, and that reliability
for overall creativity grade, a creativity summation (range: α=.88 to α=.93), exceeded CAT reliabilities
for creativity in simple artifacts such as collages (α=.70 to α=.82; Amabile & Hennessey, 1999),
indicates that CMAM criteria have considerable value in evaluating creative contributions.
Results further indicate that experts’ implicit definitions of CMAM criteria are more reliable than
non-expert ratings of multiple adjectives describing similar criteria. However, CAT, though quite reliable, is
not an optimal assessment tool for educators, being both time-consuming and resource-heavy.
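The inter-rater reliability figures quoted above are Cronbach's alphas computed across raters. A minimal sketch of that computation, with synthetic judge panels standing in for the study's data:

```python
# Hypothetical sketch of inter-rater reliability as Cronbach's alpha,
# treating each rater as an 'item'. Judge scores are synthetic.
import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    """ratings: array of shape (n_works, n_raters)."""
    k = ratings.shape[1]
    item_vars = ratings.var(axis=0, ddof=1).sum()
    total_var = ratings.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(3)
true_quality = rng.normal(3.0, 1.0, size=77)  # latent quality of 77 works
# Five judges whose scores track the same underlying quality plus noise.
judges = np.column_stack(
    [true_quality + rng.normal(0, 0.5, size=77) for _ in range(5)]
)
print(f"alpha = {cronbach_alpha(judges):.2f}")
```

With judges who track a common underlying quality, alpha lands well above the conventional .70 reliability threshold referenced in this study.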
The panel of tutors, for the most part, proved unreliable (α<.70). However, their most reliable criteria
were the same as the judges’: overall creativity grade, elaboration & synthesis, and effort evident. Tutors’
paradigm movement and the problem resolution section exhibited the least reliability, with the criteria
problem resolution and useful faring best within that section. According to Dobbins and Wagner (2005),
conceptual memory-retrieval activity involves higher meaning, more selectivity and slower processing
time than novelty and perceptual activity. That assessors exhibited less reliability in problem resolution
criteria, with the tutors less reliable overall, indicates that conceptual selectivity influenced scoring
differentiation. In this respect, tutors had the advantage of additional information via CRRs, criterion
definitions and rubrics, self-grades, level of difficulty, etc. Novelty detection and perceptual activity
criteria shared slightly more commonality among assessors. Tutors’ lower mean scores and higher
standard deviations in Assign2 and Assign3b support a more critical and differentiated purview of
contributions, which tallies with the reliability findings. Though somewhat speculative, general criterion
labels may involve more implicit sharing of default schemas (i.e. common conceptual patterns or
frameworks), leading to greater inter-rater reliability than more specific criterion labels, which require
refined interpretation and probably increased conceptual and perceptual selectivity. Similarly, criterion
labels in themselves (e.g. as used in CAT measurements) likely embody more default schemas, and
exhibit more inter-rater reliability, than the same labels coupled with specific definitions, descriptions
and rubrics. The higher reliabilities found in section headings, coupled with low tutor alphas, tend to
support these contentions. Greater reliability,
however, does not indicate enhanced discriminative validity. Because subjects tended to agree more
with judges’ implicitly-defined scoring of creativity, self-assessment (if not founded in critical
reflection) may tend to align more with CAT’s default schemas than with the more refined schemas of
tutors. In short, accurate, meaningful measurement of creative substance depends on CMAM’s capacity
to fine-tune and align creator/observer schemas. These conjectures, however, require further
investigation.
MANOVA results found the largest judge-tutor scoring differences in elegant, perfected and useful for
the first assignment and elegant in the second. Elegant is defined (in its Chinese label) as being both
simple and profound. Perfected implies some comparison with rater expectations of a finished work, or
of other similar works, and useful denotes the number of real or imagined uses. These criteria, two of
which are considered more perception-based, involve some conceptual activity, variations in which
might lead to the larger scoring gap. Again, conceptual activity seems to diversify evaluations more
than novelty detection or perceptual activity does. For all assessors, designers and females tended to
outperform their counterparts in creativity. Judges’ results in this respect were more significant. That
designers, whose fundamental training is in creativity, outperformed non-designers in novelty/creativity
criteria is not extraordinary. Gender differences, however, are more contentious. Runco, Cramond and
Pagnani (2009) noted that the majority of gender studies on creative potential found no significant
differences, with about a third favoring females. The question is whether these findings on creative
potential translate to creative works. Evidence herein indicates that females may outperform males in some
criteria (e.g. novelty ideas); but, a small sample confined to one institution is not indicative and further
work is required.
CMAM was designed to compensate for unreliable, biased tutor evaluation by incorporating
accountable self-grading, critical reflection, and timely feedback. When creativity self-grades were
combined with judges’ and tutors’ overall creativity grades, reliability decreased, providing evidence
that subjects were, indeed, using a different perspective in evaluating their own work, which probably
exacerbated over-grading. In this respect, Hypothesis 1, which proposes that reliability should increase,
was unsupported. However, the combination of self-grades and assessors’ grades did not undermine
reliability. In fact, alpha steadily increased across assignments, an effect likely due to familiarity with
CMAM criteria, grading accountability, and expert feedback. The combined ratings in Assign3b nearly
attained reliable levels (α=.69). In this longitudinal sense, Hypothesis 1 was supported because reliability did
increase over time. In the composite scores tutor total score (overall creativity grade and CRR subtotal)
and final score (tutor total score plus creativity self grade), alpha reached quite high levels indicating
that critical reflection, when included in the scoring mix, improved expert-subject inter-rater reliability.
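The reported drop in alpha when self-grades were pooled with assessors' grades is the expected behavior when one rater scores from a different perspective. A synthetic illustration (rater counts and noise levels are assumptions, not the study's figures):

```python
# Hypothetical illustration: pooling a rater who scores from a
# different perspective (weakly related to the panel) lowers the
# panel's Cronbach's alpha. All numbers are synthetic assumptions.
import numpy as np

def cronbach_alpha(ratings):
    k = ratings.shape[1]
    return k / (k - 1) * (
        1 - ratings.var(axis=0, ddof=1).sum() / ratings.sum(axis=1).var(ddof=1)
    )

rng = np.random.default_rng(4)
quality = rng.normal(3.0, 1.0, size=77)
experts = np.column_stack([quality + rng.normal(0, 0.5, 77) for _ in range(4)])
# A 'self-grade' driven mostly by a different, private criterion set.
self_grade = 0.1 * quality + rng.normal(0, 1.2, 77)

pooled = np.column_stack([experts, self_grade])
print(f"experts only:   alpha = {cronbach_alpha(experts):.2f}")
print(f"experts + self: alpha = {cronbach_alpha(pooled):.2f}")
```

Because alpha rewards agreement among raters, adding one weakly correlated rater dilutes the panel's average inter-rater covariance and lowers the coefficient, mirroring the pattern observed here.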
Except for Assign3b (and tutors’ assessment of Assign2), mean differences between creativity self-grades
and assessors’ overall creativity grade were generally less than a half grade point, a margin of
error often observed in panel-based assessments. Tutors’ larger grade margin in Assign2 was expected
because subjects were unfamiliar with CMAM and, as yet, had no access to feedback. Individual
assignment results tended to support Hypothesis 2; that is, CMAM enhanced alignment in creativity
scores. Reduced grade alignment in the group assignment can be attributed to assessors’ higher
expectations for group-work leading to lower mean scores. Subjects also struggled in organizing group
meetings within their highly variant class schedules which may have promulgated inferior work. Peer
pressure to elevate group grades may also have been a contributing factor. In general, groups are known
to rate their idea generation performance much higher than individuals (Paulus, Dzindolet, Poletes, &
Camacho, 1993; Paulus, Larey & Ortega, 1995; Stroebe, Diehl, & Abakoumkin, 1992). Groups
probably trusted their negotiating power in claiming higher grades and validating CRR responses. An
additional group assignment would have been useful to determine if CMAM promotes greater grade
alignment in groups over time; however, student/staff workload prohibited such an exercise. Although
this study did not investigate a single assessor’s grade alignment with students, a parallel (unpublished)
study, in which twenty-six design students completed final-year projects in visual communication,
evaluated the effects of CMAM engagement (or not) within a high-feedback environment.
The CMAM-engaged experimental group (both student and supervisor were engaged) had
significantly better, and more consistent, grade alignment (mean difference=0.06), while the CMAM-disengaged control group (both student and supervisor were disengaged) dropped in alignment between
reporting periods (mean difference=0.65 to 0.75). Additionally, the experimental group outperformed
the control group by 55.6% in the final grade, and garnered 66.7% of all awards and design merit for the
entire cohort. These findings further support Hypothesis 2. They also imply that creative momentum
assessed by experts without a criterion baseline or creator input is flawed.
Factor analysis of CMAM criteria exhibited some interesting patterns. Paradigm movement loaded
consistently with problem resolution criteria, except for tutors’ ratings of Assign3b. This indicates that
creative movement is more associated with context and problem solving, and therefore requires more high-order conceptual processing than, for example, novelty detection. As Assign3b involved paradigm
integration, movement itself was construed by tutors as a form of novelty. Additionally, Amabile and
Hennessey’s (1999) motivation/technical criteria section (effort evident, planning, organization and
technical goodness) loaded consistently with elaboration/synthesis which, in turn, factored variously
with novelty and resolution. This implies that motivation/technical criteria support both problem
resolution and novelty via elaboration and synthesis. Besemer and O’Quin (1989) observed that
elaboration/synthesis loaded with resolution in one study and with novelty in others (Besemer &
O’Quin, 1986). These studies employed non-expert assessors using adjective checklists as opposed to
experts using implicit definitions for measurement. Results suggested that elaboration/synthesis criteria
factor according to the nature of a creative work. If creativity is defined as novelty and appropriateness,
or social value (Amabile, 1987; Sternberg and Lubart, 1991, 1995, 1996) and appropriateness is further
determined by the degree and manner of problem resolution and/or elaboration/synthesis, then
Hypothesis 3 was supported. That is, all three major criteria sections, representing novelty and
appropriateness, tended to factor according to the nature of the contribution. Loading differences
suggest that manner of assessment also played a role. One explanation for the factoring variability is
that there are four basic types of criteria dependencies found in creative contributions which are
influenced by the sort of information available to assessors:
•
Type I. [(N) + (R/E/T)] Novelty is distinguished from appropriateness (i.e. problem resolution and
elaboration/synthesis). This dependency pattern was found in judges’ evaluations of Assign2 (a
brochure), Assign3a (a poster) as well as tutors’ assessment of Assign3b (a PowerPoint
presentation). Judges, who lacked CRR information and therefore details on the problems being
addressed, detected novelty separately in the first two assignments but bundled problem resolution
with elaboration/synthesis. Tutors, however, managed to differentiate these sections leading to
assessors’ Type differences. Assign3b presumably demanded more elaboration and synthesis due to
the additional associations required in paradigm merger attempts. In this assignment, the better
informed tutors factored resolution criteria (e.g. the merger problem) with elaboration/synthesis
while separating out novelty. Judges, alternatively, viewed these contributions as Type II (see
below). Besemer and O’Quin’s (1987) study of three letter openers generally followed Type I. All
letter opener designs involved paradigm acceptance as their novelty was determined within a
familiar context (e.g. the field of letter openers). A subsequent investigation of Le Vociférant, a
unique painted sheet-metal sculpture by Jean Dubuffet, followed the same dependency pattern;
that is, novelty was obvious. In Type I, novelty is self-evident but does not necessarily create value.
Value emerges when problem resolution is coupled with elaboration/synthesis;
•
Type II. [(N/E/T) + (R)] Novelty is generated through elaboration/synthesis while appropriateness
is determined via problem resolution. This Type was suggested in judges’ evaluations of Assign3b
and tutors’ evaluation of the criterion novelty (materials) in Assign2 and Assign3a. Whereas judges
could have estimated Assign3b novelty through PowerPoint elaborations, they had to determine the
existence of paradigm merger attempts (i.e. resolution) without access to CRRs, a difficult task. For
tutors, Assign2/Assign3a elaboration was combined with subjects’ novel use of materials in
expressing originality (e.g. layout and design of brochure/poster). Besemer and O’Quin (1986)
measured two t-shirt designs in which some elaboration/synthesis criteria (complex; well-crafted,
e.g. perfected) factored with novelty criteria. Investigators proposed that a certain level of
complexity makes a product more engaging whereas too much leads to confusion. Furthermore,
they suggested that novelty was likely associated with arousal during viewing of the shirts’
graphical images. In Dobbins and Wagner’s study (2005), an overlap between novelty detection and
perceptual recollection of simple visual objects led researchers to propose that these two memory
retrieval demands share a common visuo-perceptual attention component separate from conceptual
recollection. Their findings reflect Type II dependency. Studies with other kinds of stimuli (e.g.
text, touch) are needed to determine if the other dependencies suggested herein are also found in
interactions between these three major brain activities. Examples of Type II can be found in fine
arts, performance, novels, certain design works, etc. where novelty manifests through the processes
of elaboration and synthesis.
•
Type III. [(N/R) + (E/T)] Novelty is determined by the capacity to resolve a problem (and vice
versa) while elaboration/synthesis criteria factor independently. Except for novelty (materials), this
type emerged in tutors’ scoring of paradigm rejections (Assign3a). Subjects’ ability to reject their
personal freedom paradigms (i.e. the problem) was more obvious to tutors reading detailed CRRs
than for judges, who only had brief descriptions of subjects’ intents. In this assignment, tutors
factored elaboration/synthesis criteria into two subgroups (i.e. structure and content perception), a
result, perhaps, of better awareness of how solutions were generated. In this dependency, novelty
and resolution are interdependent. Besemer and O’Quin’s (1989) study involving four different key
chains, cat pitcher, dog chair and steer desk followed this Type as did most elaboration criteria
(excepting complex and well-crafted) in their earlier study of two t-shirt designs (Besemer and
O’Quin, 1986). Again, all artifacts exemplify paradigm acceptance, however, their originality was
determined through remote associations related to problem resolution (e.g. steer and desk, dog and
chair, cat and pitcher, etc.). Type III is prevalent in many design solutions as well as in major
discoveries like Einstein’s general relativity and Darwin’s theory of evolution, where novelty
emerges directly from the problem approach.
•
Type IV. [(N) + (R) + (E/T)] Novelty, problem resolution and elaboration/synthesis are all independent.
This type was found in tutors’ assessment of Assign2 (except for novelty, materials). Subjects’
brochures, though often well elaborated, left some doubt if a paradigm had been altered in any way
(i.e. resolution). Many solutions were considered unoriginal, as indicated by tutors’ low mean
creativity scores. This Type may also arise when a solution appears original, problem resolution (i.e.
value) is questionable, and elaboration somewhat extraneous. Many ‘silly’ inventions fall into this
category. Alternatively, Type IV could manifest in complex solutions involving diverse forms of
novelty, resolution and elaboration, such as found in certain global marketing campaigns.
Although criteria dependencies seem to vary with the nature of creative contributions, access to
information and assessment methodology, further research is required to test the four Types under more
controlled conditions. If validated, they would enhance our understanding of creative achievement.
The ability to take sensible risk is important in creativity (Sternberg, 2003). CMAM was originally
designed to encourage risk-taking. In Assign2, 68.42% of subjects selected forward incrementation and
redefinition. These are more challenging forms of paradigm acceptance as they significantly alter
paradigms, or others’ perceptions of them. On the other hand, in Assign3a, 65.15% selected redirection,
the least challenging rejection movement, especially in a personal context (i.e. awareness of freedom)
that does not require expertise. Results indicated that risk-taking is not a function of the weight given to
creativity self-assessment but of perceived challenges and skill levels related to creative movement (and
perhaps creative substance as well). For example, paradigm acceptance was generally less challenging
allowing subjects to take more risk. The reverse seemed to hold for paradigm rejection where basic
assumptions had to be over-turned. Csikszentmihalyi (1996) proposed that a balance between perceived
challenge and skill level results in the state of creative flow, an ‘intense experiential involvement in
moment-to-moment activity’ (Csikszentmihalyi et al., 2005, p.600). This balance, however, may be
difficult to achieve. Level of assignment difficulty (perceived challenge) correlated inversely yet
significantly with self-grades (perceived skill level) which tended to confirm the difficulty in attaining
flow. Increased creativity grade alignment between subjects and tutors implied that subjects’ perceived
level of challenge, and therefore risk, was indirectly validated. This does not mean, however, that
individuals will accept greater creative risk just by having their perceptions confirmed. It would be
interesting to investigate the relationship between self-generated creative momentum measurements,
risk-taking and attempts to attain, or maintain, flow. In fact, Markman and Guenther (2007) proposed that
flow contributes to psychological momentum; but, unlike psychological momentum, flow does not include the notion of exerting
an effect on the individual’s capacity to attain desired outcomes. The advantage of CMAM is that it
clarifies challenges, skill levels and risk to both creator and observer thereby allowing more accurate
measurements to be made. At the same time, creative momentum can be adjusted to maintain implicit
motivation and, on the other hand, perhaps contribute to flow.
The key weaknesses in this study were the lack of a control group, a sample restricted to general education
students (who are typically less motivated in GE subjects), and the lack of a second group assignment for comparison.
Nonetheless, results indicated that CMAM has potential for assessing creative contributions; however,
accuracy doesn’t lie in ‘snapshot’ measurements. That is, scoring reliability and grade accuracy improve
with CMAM familiarity. The study employed a full range of CMAM criteria which, though useful for
artists and designers, may not be as appropriate in other domains. Reliability results indicate that
creative momentum can be measured simply via three basic substance criteria: novelty (ideas or
materials), problem resolution (useful or appropriate) and elaboration/synthesis, plus one movement
criterion (i.e. paradigm acceptance, rejection, or merger). On the other hand, CMAM allows educators to
use relevant sub-criteria from these sections to design, and selectively measure, assignments and
projects. As a measurement tool, CMAM also has the potential to enrich understanding and nurture
qualities that assist individuals in expressing their creativity.
Acknowledgements – This research was supported under the Creativity Assessment Project by grants from the
Hong Kong Polytechnic University’s Working Group on Outcome-Based Education (WGOBE) and the School
of Design. The author would like to thank Dr. Lau Wing Chuen, who acted as project research associate, as well
as designers Siu King Chung, Dr. Yuen Man Wah, Lee Yu Hin, Chong Wai Yung, Tam Chi-hang, and Wai Hon
Wah, who served as subject tutors and course advisors.
References
Amabile, T.M. (1979). Effects of external evaluation on artistic creativity. Journal of
Personality and Social Psychology, 37, 221-233.
Amabile, T.M. (1982). Social psychology of creativity: A consensual assessment technique.
Journal of Personality and Social Psychology, 43, 997-1013.
Amabile, T.M. (1987). The motivation to be creative. In S.G. Isaksen (Ed.), Frontiers of
creativity research: beyond the basics (pp. 223-254). Buffalo, NY: Bearly.
Amabile, T.M. (1996). Creativity in context: Update to ‘The social psychology of creativity’.
Boulder, CO: Westview.
Amabile, T.M., & Hennessey, B.A. (1999). Consensual assessment. In M.A. Runco and S.R.
Pritzker (Eds.), Encyclopedia of Creativity, Vol. 1 (pp. 347-359), San Diego, CA:
Academic Press.
Amabile, T.M., Goldfarb, P., & Brackfield, S.C. (1990). Social influences on creativity:
Evaluation, coaction, and surveillance. Creativity Research Journal, 3, 6-21.
Baer, J. (1994). Performance assessments of creativity: Do they have long-term stability?
Roeper Review, 7 (1), 7-11.
Baer, J., Kaufman, J.C. & Gentile, C.A. (2004). Extension of consensual assessment
technique to nonparallel creative products. Creativity Research Journal, 16 (1), 113-117.
Barker, J.A. (1992). Paradigms: The business of discovering the future. New York:
HarperBusiness.
Barron, F., Gaines, R., Lee, D., & Marlowe, C. (1973). Problems and pitfalls in the use of
rating schemes to describe visual art. Perceptual and Motor Skills, 37, 523-530.
Besemer, S.P. & O’Quin, K. (1986). Analyzing creative products: Refinement and test of a
judging instrument. Journal of Creative Behavior, 20, 115-126.
Besemer, S.P. & O’Quin, K. (1987). Creative product analysis: Testing a model by
developing a judging instrument. In S.G. Isaksen (Ed.) Frontiers of Creativity
Research, (pp.341-357), Buffalo, NY: Bearly Limited.
Besemer, S.P. & O’Quin, K. (1989). The development, reliability, and validity of the
Revised Creative Product Semantic Scale. Creativity Research Journal, 2, 267-278.
Besemer, S.P. & Treffinger, D.J. (1981). Analysis of creative products: Review and
synthesis. Journal of Creative Behavior, 15 (3), 158-178.
Capra, F. (1996). The web of life: A new scientific understanding of living systems. New
York: Anchor Books.
Craik, F.I.M., & Lockhart, R.S. (1972). Levels of processing: a framework for memory research.
Journal of Verbal Learning and Verbal Behavior, 11, 671-684.
Csikszentmihalyi, M. (1996). Creativity: Flow and the psychology of discovery and
invention. New York: Harper Collins.
Csikszentmihalyi, M. (1999). Implications of a systems perspective for the study of
creativity. In R. J. Sternberg (Ed.), Handbook of creativity, (pp. 313-335), Cambridge,
UK: Cambridge University Press.
Csikszentmihalyi, M., Abuhamdeh, S., & Nakamura, J. (2005). Flow. In A. Elliot & C. Dweck (Eds.),
Handbook of competence and motivation (pp. 598-608). New York: Guilford.
Dobbins, I.G., & Wagner, A.D. (2005). Domain-general and domain-sensitive prefrontal
mechanisms for recollecting events and detecting novelty. Cerebral Cortex, 15, 1768-1778.
Freyd, J.J., & Finke, R.A. (1984). Representational momentum. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 10, 126-132.
Harman, W. (1970). An incomplete guide to the future. New York: W. W. Norton.
Hébert, T.P., Cramond, B., Neumeister, K.L.S., Millar, G., & Silvian, A.F. (2002). E. Paul
Torrance: His life, accomplishments and legacy. Storrs, CT: The University of
Connecticut, The National Research Center on the Gifted and Talented (NRC/GT).
Hocevar, D. (1981). Measurement of creativity: review and critique. Journal of Personality
Assessment, 45, 450-464.
Hubbard, T.L. (1995). Environmental invariants in the representation of motion: Implied
dynamics and representational momentum, gravity, friction and centripetal force.
Psychonomic Bulletin and Review, 2, 322-338.
Hubbard, T.L. (2004). The perception of causality: Insights from Michotte’s launching effect,
naïve impetus theory, and representational momentum. In A.M. Oliveira, M.P. Teixera,
G.F. Borges, & M.J. Ferro (Eds.), Fechner day 2004 (pp. 116-121). Coimbra, Portugal:
International Society for Psychophysics.
Kinsbourne, M. & George, J. (1974). The mechanism of the word-frequency effect on recognition
memory. Journal of Verbal Learning and Verbal Behavior, 13, 63-69.
Kuhn, T.S. (1962). The structure of scientific revolutions (1st ed.). Chicago: University of
Chicago Press.
MacKinnon, D.W. (1978). In search of human effectiveness: Identifying and developing
creativity. Buffalo, NY: Creative Education Foundation.
Makel, M.C. & Plucker, J.A. (2008). Creativity. In S.I. Pfeiffer (Ed.) Handbook of
Giftedness in Children, (pp. 247-270), New York: Springer Science + Business
Media.
Markman, K.D., & Guenther, C.L. (2007). Psychological momentum: Intuitive physics and
naïve beliefs. Personality and Social Psychology Bulletin, 33 (6), 800-812.
Osgood, C.E., Suci, G., & Tannenbaum, P. (1957). The measurement of meaning. Urbana, IL:
University of Illinois Press.
Ospovat, D. (1981). The development of Darwin's theory: Natural history, natural
theology, and natural selection, 1838-1859. Cambridge, UK: Cambridge
University Press.
Parke, B.N. & Byrnes, P. (1984). Toward objectifying the measurement of creativity. Roeper
Review, 6, 216-218.
Paulus, P.B., Dzindolet, M.T., Poletes, G. & Camacho, L.M. (1993). Perception of
performance in group brainstorming: The illusion of group productivity, Personality
and Social Psychology Bulletin, 19, 78-89.
Paulus, P.B., Larey, T.S., & Ortega, A.H. (1995). Performance and perceptions of
brainstormers in an organizational setting, Basic and Applied Social Psychology, 17,
249-265.
Pearlman, C. (1983). Teachers as an informational resource in identifying and rating student
creativity, Education, 103, 215-222.
Qiu, J., Li, H., Jou, J.W., Liu, J., Luo, Y.J., Feng, T.Y., Wu, Z.Z., & Zhang, Q.L. (2010). Neural
correlates of the 'Aha' experiences: Evidence from an fMRI study of insight problem
solving. Cortex, 46, 397-403.
Runco, M.A., & Chand, I. (1995). Cognition and creativity. Educational Psychology Review,
7 (3), 243-267.
Runco M.A., Cramond B., & Pagnani A. (2009). Sex differences in creative potential and
creative performance. In J.C. Chrisler and D. R. McCreary (Eds.), Handbook of
Gender Research in Psychology, (pp. 343-360), New York: Springer.
Runco, M.A., McCarthy, K.A., & Svenson, E. (1994). Judgments of the creativity of artwork
from students and professional artists, Journal of Psychology, 128 (1), 23-31.
Runco, M.A., & Mraz, W. (1992). Scoring divergent thinking tests using total ideational
output and a creativity index, Educational and Psychological Measurement, 52, 213-221.
Runco, M.A., Noble, E.P., & Luptak, Y. (1990). Agreement between mothers and sons on
ratings of creative activity. Educational and Psychological Measurement, 50, 673-680.
Sternberg, R.J. (2003) Wisdom, intelligence, and creativity synthesized. New York:
Cambridge University Press.
Sternberg, R. J., & Lubart, T. I. (1991). An investment theory of creativity and its development.
Human Development,34, 1–32.
Sternberg, R. J., & Lubart, T. I. (1995). Defying the crowd: Cultivating creativity in a culture of
conformity. New York: Free Press.
Sternberg, R. J., & Lubart, T. I. (1996). Investing in creativity. American Psychologist, 51, 677–688.
Stroebe, W., Diehl, M. & Abakoumkin, G. (1992). The illusion of group effectivity,
Personality and Social Psychology Bulletin, 18, 643-650.
Taylor, I.A. & Sandler, B.E. (1972). Use of a creative product inventory for evaluating
products of chemists. Proceedings of the 80th annual convention of the American
Psychological Association, 7 (part 1), 311-312.
Thornton, I.M., & Hubbard, T.L. (Eds.). (2002). Representational momentum: New findings,
new directions. New York: Psychology Press.
Tulving, E. (1983). Elements of episodic memory. New York: Oxford University Press.
Tulving, E., & Kroll, N.E.A. (1995). Novelty assessment in the brain and long-term memory
encoding. Psychonomic Bulletin and Review, 2, 387-390.