
Development and evaluation of a model of programming errors. Conference Paper, January 2003. DOI: 10.1109/HCC.2003.1260196. Source: DBLP.


Carnegie Mellon University Research Showcase
Human-Computer Interaction Institute, School of Computer Science
1-1-2003

Recommended Citation:
Ko, Andrew J. and Myers, Brad A., "Development and Evaluation of a Model of Programming Errors" (2003). Human-Computer Interaction Institute. Paper 184. http://repository.cmu.edu/hcii/184
Development and Evaluation of a Model of Programming Errors

Andrew J. Ko and Brad A. Myers


Human-Computer Interaction Institute
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213 USA
ajko@cmu.edu, bam+@cs.cmu.edu

Abstract

Models of programming and debugging suggest many causes of errors, and many classifications of error types exist. Yet, there has been no attempt to link causes of errors to these classifications, nor is there a common vocabulary for reasoning about such causal links. This makes it difficult to compare the abilities of programming styles, languages, and environments to prevent errors. To address this issue, this paper presents a model of programming errors based on past studies of errors. The model was evaluated with two observational studies of Alice, an event-based programming system, revealing that most errors were due to attentional and strategic problems in implementing algorithms, language constructs, and uses of libraries. In general, the model can support theoretical, design, and educational programming research.

1. Introduction

In the past three decades, there has been little work in classifying and describing programmers' errors. Yet, the work that has been done was largely successful in motivating many novel and effective tools to help programmers identify and fix errors. For example, in the early '80s, the Lisp Tutor drew heavily from analyses of novices' errors [1], and nearly approached the effectiveness of a human tutor. The testing and debugging features of the Forms/3 visual spreadsheet language [3] were largely motivated by the type and prevalence of spreadsheet errors [18].

Recently however, the event-based style has become widely used, and no comparable description and classification of errors common in event-based systems exists. Not only do expert programmers widely use Java and C# to create interactive and enterprise applications, but many end users use Macromedia's Director, Microsoft's Visual Basic, Carnegie Mellon's Alice [4], and other event-based languages to create domain-specific interactive applications. To complicate matters, there is no common vocabulary for discussing the distribution of errors within the event-based style, or for describing and comparing errors in other styles, languages, tasks, domains, and environments. This makes it difficult to analyze what aspects of event-based programming are difficult, and to design programming environments that help prevent errors.

To address this issue, we have developed a general model of programming errors that ties the cognitive causes of programming errors to specific errors, integrating prior research on causes and classifications of errors. In this paper, we describe the model, and evaluate it using two observational studies of Alice [4], an event-based programming system. The model was helpful in describing and explaining errors, as well as in eliciting design guidelines for new programming tools.

2. What is a Programming Error?

We must first decide on an appropriate definition of a programming error. In this paper, we use definitions of error, fault and failure from IEEE standard 610.12-1990. A failure occurs when a program's output does not match documented output requirements, or the programmer's mental model of output requirements. Failures are ultimately the result of a fault, which is a runtime state of a program that either is or appears to be incorrect (as in assuming a lack of output from a debugging print statement to mean the code was not reached). Faults occur as a result of errors, which are program fragments that do not comply with documented program specifications or the programmer's mental model of specifications (such as a missing increment or misspelled variable name). Failures are usually the first indication to the programmer that one or more errors exist in a program, although some errors are found before they cause failures, since they may be similar to errors already found or may be apparent when inspecting code. While a failure guarantees one or more faults exist and a fault guarantees one or more errors exist, errors do not always cause faults, and faults do not always cause failures.

Proceedings of the IEEE 2003 Symposia on Human Centric Computing Languages and
Environments (HCC’03), Auckland, New Zealand, October 28-31, pp. 1-8. © 2003 IEEE
3. Classifications and Causes of Errors

Past classifications of errors identify a variety of types and causes of errors in many languages, environments, and levels of expertise. Table 1 summarizes classifications often cited in the literature chronologically. The summary is meant to be a representative sample of past classifications, rather than an exhaustive list.

There are many interesting patterns in the studies: failures due to errors can occur at compile-time and run-time; a given error has many possible causes, including lack of knowledge and forgetting; errors are made during specification, implementation, and debugging activities; there are a variety of artifacts which are error prone.

Many of these patterns were found in empirical studies of programming activity, and generalized into models that are better able to suggest the causes of errors. For example, recent models of programming activity suggest that programmers form a mental model of a program's documented specifications [17]. This mental model may be insufficient because of a lack of domain or task knowledge [9], an inadequate comprehension of the specifications [5], or a poor description of a program's requirements. This lack of knowledge can cause programmers to make a variety of errors. For example, a programmer may intentionally sort a list in increasing order, forgetting that the specifications called for decreasing order. Of course, creating and modifying documented specifications may cause unforeseen problems, leading to errors as well.
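The sorting example can be sketched in code (a hypothetical illustration of our own): the program below is internally consistent, and the error stems entirely from the programmer's faulty mental model of the specification.

```python
# Sketch (ours) of a specification-knowledge error: the documented
# specification calls for *decreasing* order, but the programmer,
# working from a faulty mental model of that specification,
# intentionally sorts in increasing order. The code does exactly what
# its author intended; the error lies in the intention itself.

def rank_scores(scores):
    """Documented specification: return scores from highest to lowest."""
    return sorted(scores)  # ERROR: sorts lowest to highest

output = rank_scores([70, 95, 82])
# Specification demands [95, 82, 70]; the program produces [70, 82, 95].
```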

Gould [10], novice, Fortran, 1975
- Assignment bug: Errors assigning variables values ("Requires understanding of language & behavior")
- Iteration bug: Errors iterating ("Requires only an understanding of the language")
- Array bug: Errors accessing data in arrays

Eisenberg [7], novice, APL, 1983
- Visual bug: Clustering semantically related parts of expression
- Naive bug: Using branching & iteration instead of parallel processing ("…because of need to think step-by-step")
- Logical bug: Omitting or misusing logical connectives or relationals
- Dummy bug: Experience with other languages interfering ("…seem to be syntax oversights")
- Inventive bug: Inventing syntax
- Illiteracy bug: Difficulties with order of operations
- Gestalt bug: Not foreseeing side effects of commands ("…failure to see the whole picture")

Johnson et al. [14], novice, Pascal, 1983
- Missing: Omitting required program element
- Spurious: Including unnecessary program element
- Misplaced: Putting necessary program element in wrong place
- Malformed: Putting incorrect program element in right place
Authors' comment: "Errors have contexts: input/output, declaration, initialization and update of variables, conditionals, scope delimiters, or combinations of these contexts."

Spohrer and Soloway [19], novice, Basic, 1986
- Data-type inconsistency problem: Misunderstanding differences between data types
- Natural language problem: Applying natural language semantics to commands
- Human-interpreter problem: Assuming computer has similar interpretation of code
- Negation & whole-part problem: Difficulties with constructing logical Boolean statements
- Duplicate tail-digit problem: Incorrectly typing constant values
- Knowledge interference problem: Domain knowledge interfering w/ entering constants
- Coincidental ordering problem: Malformed statements produced correct output
- Boundary problem: Not anticipating problems with extreme values
- Plan dependency problem: Unforeseen dependencies in program statements
- Expectation & interpretation problem: Misunderstandings of the problem specification
Authors' comment: "All bugs are not created equal. Some occur over and over again in many novice programs, while others are more rare…Most bugs result because novices misunderstand the semantics of some particular programming language construct."

Knuth [15], writing TeX in SAIL & Pascal, 1989
- Algorithm awry: Improperly implemented algorithms ("method proved to be incorrect or inadequate")
- Blunder or botch: Accidentally writing code not to specifications ("not…enough brainpower left to get the…details")
- Data structure debacle: Errors using and changing data structures ("I did not preserve the appropriate invariants")
- Forgotten function: Missing implementation ("I did not remember to do everything I had intended")
- Language liability: Misusing or misunderstanding language/environment ("I misused or misunderstood the…language")
- Mismatch between modules: Imperfectly knowing specs, interface; reversed arguments ("I forgot the conventions I had built")
- Reinforcement of robustness: Not handling erroneous input ("I tried to make the code bullet-proof")
- Surprise scenario: Unforeseen interactions in program elements ("forced me to change my original ideas")
- Trivial typos: Incorrect syntax, reference, etc. ("although my original pencil draft was correct")

Eisenstadt [8], industry experts, COBOL, Pascal, Fortran, C, 1993
- Clobbered memory bugs: Overwriting memory, subscript out of bounds
- Vendor problems: Buggy compilers, faulty hardware
- Design logic bugs: Unanticipated case, wrong algorithm
- Initialization bugs: Erroneous type or initialization of variables
- Variable bugs: Wrong variable or operator used
- Lexical bugs: Lexical problem, bad parse, ambiguous syntax
- Language: Misunderstandings of language semantics
Authors' comment: Also identified why errors were difficult to find: cause/effect chasm; tools inapplicable; failure did not actually happen; faulty knowledge of specs; "spaghetti" code.

Panko [18], novice, Excel, 1998
Quantitative errors ("errors that lead to an incorrect, bottom line value"):
- Omission error: "Facts to be put into the model…but are omitted"
- Logic error: Incorrect algorithm or incorrectly implemented algorithm
- Mechanical error: "Typing the wrong number…or pointing to the wrong cell"
Qualitative errors ("design errors and other problems that lead to quantitative errors in the future"):
- Overload error: Working memory unable to complete task without error
- Strong but wrong error: Functional fixedness (a fixed mindset)
- Translation error: Misreading of specification

Table 1. Studies classifying errors, bugs and causes in various languages, expertise, and contexts, in chronological order.
Even if a programmer's specification knowledge is sufficient, a programmer may create errors while implementing a program's specification because of a problem with implementation knowledge and strategies, as in von Mayrhauser and Vans' model [17]. Factors causing these errors may include working memory overload, as in Green's parsing-gnisrap model of programming [12], unfamiliarity with a programming language and environment [16], inadequate knowledge of programming concepts, algorithms, and data structures, or an inability to apply the appropriate knowledge or strategies [6, 19]. Implementation errors include simple syntax errors, malformed Boolean logic, and scoping problems. The space of implementation errors overlaps greatly with that of specification errors. For example, a programmer may unintentionally sort a list in increasing order, not because she misunderstood specifications, but because of an inadequate understanding of the algorithm.

Vessey demonstrates that programmers can create errors even in the process of testing and debugging [21]. Her model of debugging suggests that only after programmers observe a failure do they realize that one or more errors exist, and the range of possible errors causing the failure is highly unconstrained. Further complicating the situation is that a failure may be caused by independent or interacting errors. Other models of debugging [9, 17] suggest that as programmers try to close the gap between failures and errors, they may falsely determine faults and errors, possibly leading to erroneous modifications. For example, in response to a program displaying an unsorted list because the sort procedure was not called, a programmer might instead determine the error was an incorrect swap algorithm, and erroneously modify the swap code.

These studies are very good at describing specific situations in which errors can be created. However, models of error in human factors research can significantly enhance our ability to reason about programming errors in general. Most notable is Reason's latent failure model of error [20]. In his model, he introduces the concept of breakdowns, which are problems using knowledge and strategies. Reason argues that strengthened knowledge and strategies make breakdowns less likely, but problems with knowledge, attention, and strategies can cause cascading breakdowns, each breakdown making error more likely. He discusses three cognitive problems that lead to breakdowns:

- Knowledge problems: inadequate, inert, heuristic, oversimplified, or interfering content or organization.
- Attentional problems: fixation, loss of situational awareness, or working memory strain.
- Strategic problems: unforeseen interactions from goal conflict resolution or bounded rationality.

4. A Model of Programming Errors

We use Reason's model as a basis for our model of programming errors. In our model, breakdowns occur in specification, implementation, and debugging activities, and consist of a cognitive problem, an action, and an artifact. Cognitive problems are Reason's knowledge, strategic, and attentional problems discussed earlier. Available actions depend on the type of artifacts being acted upon. Documented and mental models of specifications can be created, understood, and modified. The meanings of these actions are different for each artifact. For example, understanding a documented specification is a software engineering skill, while understanding a mental model of a specification involves recall and reasoning. Implementation artifacts such as algorithms, data structures, and style-specific artifacts (such as events in Alice) can be perceived, understood, implemented, modified, and reused. The meanings of these actions depend on the environment and language.

Failures, faults, and errors can be determined. Determining failure involves perceiving and understanding output; determining a fault involves searching for what faults caused a failure; determining an error involves searching for what error caused a fault. Failures, faults, and errors have visual representations, so they can also be perceived. For example, determining if a program failed to sort a list may involve visually perceiving the list on a display; whether this is easy or not depends on the representation. These representations also have an underlying meaning, thus failures, faults and errors can also be understood. For example, understanding an error may involve understanding language semantics. Understanding a fault involves understanding a runtime state.

Our model hypothesizes many causal links between breakdowns. Breakdowns in an activity can cause more breakdowns within the activity, because actions within an activity often depend on each other. For example, problems in creating specifications can cause problems modifying them, and problems understanding algorithms can cause problems in implementing them.

Breakdowns in an activity can also cause breakdowns in another activity. Specification breakdowns can cause implementation breakdowns (this is what software engineers aim to prevent). Implementation breakdowns can cause specification breakdowns, since perceiving and understanding code can change mental models of specifications. Errors can cause implementation breakdowns before causing faults or failures (as in a programmer making a variable of Boolean instead of integer type and trying to increment it). Errors, faults, and failures can cause debugging breakdowns, and debugging breakdowns can cause implementation breakdowns (since programmers can create errors while debugging).
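The erroneous-modification scenario described earlier, blaming correct swap code when the real error is that the sort procedure is never called, can be sketched as follows (a hypothetical program of our own construction, not code from the studies):

```python
# Sketch (ours) of a false fault determination: the program displays an
# unsorted list because the sort procedure is never called, but a
# programmer may falsely determine that the swap algorithm is at fault.

def swap(items, i, j):
    """Correct swap: exchanges items[i] and items[j] in place."""
    items[i], items[j] = items[j], items[i]

def selection_sort(items):
    """Correct selection sort, built on swap()."""
    for i in range(len(items)):
        smallest = min(range(i, len(items)), key=lambda k: items[k])
        swap(items, i, smallest)

def display_scores(items):
    data = list(items)
    # ERROR: selection_sort(data) is never called here, so the displayed
    # list is unsorted. This missing call is the actual cause of the failure.
    return data

shown = display_scores([3, 1, 2])  # failure: [3, 1, 2] instead of [1, 2, 3]
```

A "fix" applied to swap() here would itself be a new error, which is the sense in which a debugging breakdown can cause an implementation breakdown.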
While our model suggests many links between actions in breakdowns, it makes no assumptions about their ordering. High-level models of software development such as the waterfall or extreme programming models assume a particular sequence of specification, implementation, and debugging activities, and low-level models of programming, program comprehension, testing, and debugging assume a particular sequence of programming actions. Our model hopes to describe errors created in any of these programming processes.

A diagram of our model appears in Figure 1. The grey regions denote programming activities. The elements in these regions represent possible breakdowns comprised of cognitive problems (left column), actions (center), and artifacts (right column). In the figure, 'x' means that any problem can apply to any action, which can apply to any artifact. The arrows denote a "can cause" relationship. Note that the figure does not portray every detail. There are many types of knowledge, attentional, and strategic problems, as described earlier, and there are many ways to perform actions depending on the environment and language. The figure only intends to provide a small, standard vocabulary for categories of cognitive problems, actions, and artifacts.

In general, the model assumes that as programmers work, cognitive problems cause them to traverse these causal links, generating a chain of breakdowns that may lead to errors. To illustrate these traversals, imagine this scenario. A programmer has inadequate knowledge for understanding documented specifications for a recursive sorting algorithm. This instigating breakdown causes a strategic problem in implementing the swap algorithm, which causes an erroneous variable reference, and the sort fails. When noticing the failure, the programmer has attentional strain in determining the fault, which leads to inadequate knowledge for modifying the swap algorithm. The programmer reads the documented specification again to mend this, but has attentional fixation in understanding it and mistakenly modifies his mental specification of the algorithm's recursion. This leads to unforeseen strategic problems when later modifying the recursive call, eventually leading to infinite recursion.

Figure 1. A model of programming errors, showing causal links between breakdowns in programming activities. Breakdowns are defined by combinations of cognitive problems, actions, and artifacts.
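The model's standard vocabulary, and the breakdown chain in the scenario above, can be encoded as a small data sketch (our own encoding; the names mirror the model's categories, but the code is purely illustrative):

```python
# A minimal encoding (ours) of the model's vocabulary: a breakdown is a
# combination of an activity, a cognitive problem, an action, and an
# artifact, and breakdowns are linked by "can cause" relationships.
from dataclasses import dataclass

ACTIVITIES = {"specification", "implementation", "debugging"}
PROBLEMS = {"knowledge", "attentional", "strategic"}
ACTIONS = {"create", "perceive", "understand", "implement",
           "modify", "reuse", "determine"}

@dataclass
class Breakdown:
    activity: str
    problem: str
    action: str
    artifact: str
    caused_by: "Breakdown | None" = None

# The paper's illustrative chain: inadequate knowledge understanding a
# documented specification instigates a cascade of further breakdowns.
b1 = Breakdown("specification", "knowledge", "understand", "documented spec")
b2 = Breakdown("implementation", "strategic", "implement", "swap algorithm",
               caused_by=b1)
b3 = Breakdown("debugging", "attentional", "determine", "fault",
               caused_by=b2)
b4 = Breakdown("implementation", "knowledge", "modify", "swap algorithm",
               caused_by=b3)

def chain(b):
    """Walk the causal links back to the instigating breakdown."""
    links = []
    while b is not None:
        links.append(b)
        b = b.caused_by
    return list(reversed(links))
```

Here chain(b4) recovers the whole cascade, beginning with the instigating specification breakdown, much as the analysis in Section 5 reconstructs breakdown chains from transcripts.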

5. Evaluation
To evaluate the utility of the model, we performed two
observational studies of programmers using the Alice 3D
event-based programming system [4]. Alice provides
objects (but does not support typical object-oriented
features such as inheritance and polymorphism), provides
explicit support for event-based constructs, and provides a
drag-and-drop, structured editing environment that
prevents syntax errors. A view of the Alice environment
can be seen in Figure 2. See www.alice.org for details.

Figure 2. Alice v2.0: (1) objects in the world, (2) the 3D worldview, (3) events, (4) details of the selected object, and (5) the method being edited.
Building Virtual Worlds students:
- B1: 20 hours; Ave. expertise; C++, Visual Basic, Java. Tasks: parameterize rabbit hop animation; make tractor beam catch rabbit; programmatically animate camera moving down stairs; prevent goat from penetrating ground after falling; play sound in parallel with character swinging bat.
- B2: 10 hours; High expertise; C++, Java, Perl. Task: randomly resize & move 20 handlebars using Jython, the Alice scripting language.
- B3: 30 hours; High expertise; C, Java. Task: import, arrange, & programmatically animate objects involved in camera animation.

Pac-Man participants:
- P1: 5 hours; High expertise; Java, C.
- P2: <1 hour; Low expertise; C++, Java.
- P3: 10 hours; High expertise; Java, C++.
- P4: <1 hour; High expertise; Visual Basic.
Shared tasks: (1) make Pac-Man move perpetually & change direction when player presses arrow keys; (2) make ghost move in random direction half the time & towards Pac-Man otherwise; (3) if chasing ghost eats Pac-Man, make Pac-Man flatten & stop; (4) if Pac-Man eats big dot, make ghost run for 5 sec, then chase; (5) if Pac-Man eats running ghost, make ghost stop for 5 sec & flatten; (6) if Pac-Man eats all dots, make ghost stop & Pac-Man bounce.

Table 2. Total hours programmers spent programming the week of observation, self-rated language expertise, and tasks.

5.1 Experiment Descriptions


We were interested in describing programmers’
breakdowns and errors, and the time spent debugging due
to these errors. Though the studies involved a variety of
tasks and expertise, they were not designed to control for
these two factors. Both observational studies used the
method of Contextual Inquiry [13]. As programmers
worked, the experimenter tracked their goals, and asked
programmers about their strategies and intents when not
apparent. Programmers were also asked to think aloud
about their programming decisions and were videotaped
while they worked.
The first study involved 3 programmers enrolled in the
“Building Virtual Worlds” course offered at Carnegie
Mellon. In the course, students created complex Alice
programs, working on a variety of programming tasks.
Programmers had 6 weeks of prior experience with Alice.
The second study involved 4 programmers creating a
simplified Pac-Man game with one ghost, four small dots,
and one big dot, after a 15-minute tutorial on how to
create code, methods and events. Programmers were
given the objects and layout seen in Figure 2, and were
instructed to follow the specifications listed in Table 2.
Table 2 shows the programmers’ tasks and experience.

5.2 Experiment Results

Each of the videotapes was analyzed for programming tasks, task goals, goal start and stop times, strategies for achieving goals (as described by programmers), and results of using each strategy. From these transcripts, every breakdown involved in a failure was identified, along with the resulting errors, faults, failures, and times at which these occurred. From these breakdowns, the causal chain leading to each failure was constructed, like the one in Figure 3. In the figure, the instigating attentional breakdown in creating the specifications for the Boolean logic led to a strategic breakdown implementing the logic, which led to two errors. These errors led to a fault and failure, and further breakdowns.

Figure 3. One of P2's longer breakdown chains, showing 6 breakdowns, 4 errors, 3 faults, and 3 failures.

The analysis of the transcripts did not find chains that did not lead to failure, so we were unable to reason about breakdowns not involved in failures. Furthermore, due to a lack of data for comparing tasks and expertise, our analyses combined the datasets from the two studies.

There are many variables that could split such data, including activity, problem, action, artifact, task, environment, language, and expertise. There are also many useful measures: errors per minute, time spent debugging, percent of errors causing errors, number of instigating breakdowns, and chain length. For this paper, we were interested in a subset of these analyses.

Over 895 minutes of observations, there were 69 instigating and 159 total breakdowns. These caused 102 errors, 33 of which led to one or more new errors. The average chain had 2.3 breakdowns (standard deviation 2.3) and caused 1.5 errors (standard deviation 1.1).
Specification activity
- Knowledge: Create 1 (0.6) / 1 (0.6); Understand 1 (0.6) / 1 (0.6); Total 1 (0.6) / 1 (0.6)
- Attentional: n/a
- Strategic: Create 5 (3.1) / 5 (3.1); Total 5 (3.1) / 5 (3.1)
- Total: Create 6 (3.8) / 6 (3.8); Understand 1 (0.6) / 1 (0.6); Total 7 (4.4) / 7 (4.4)

Implementation activity
- Knowledge: Understand 16 (10.1) / 17 (10.7); Implement 9 (5.7) / 17 (10.7); Modify 0 (0) / 6 (3.8); Reuse 1 (0.6) / 3 (1.9); Total 26 (16.3) / 43 (27.0)
- Attentional: Perceive 1 (0.6) / 2 (1.3); Implement 10 (6.3) / 15 (9.4); Modify 2 (1.3) / 14 (8.8); Reuse 3 (1.9) / 4 (2.5); Total 16 (10.1) / 35 (22.0)
- Strategic: Implement 14 (8.8) / 23 (14.5); Modify 4 (2.5) / 18 (11.3); Reuse 1 (0.6) / 4 (2.5); Total 19 (11.9) / 45 (28.3)
- Total: Perceive 1 (0.6) / 2 (1.3); Understand 16 (10.1) / 17 (10.7); Implement 33 (20.8) / 55 (34.6); Modify 6 (3.8) / 38 (23.9); Reuse 5 (3.1) / 11 (6.9); Total 61 (38.4) / 123 (77.4)

Debugging activity
- Knowledge: Determine 0 (0) / 16 (10.1); Total 0 (0) / 16 (10.1)
- Attentional: Perceive 0 (0) / 1 (0.6); Determine 0 (0) / 12 (7.5); Total 0 (0) / 13 (8.2)
- Strategic: none
- Total: Perceive 0 (0) / 1 (0.6); Determine 0 (0) / 28 (17.6); Total 0 (0) / 29 (18.2)

All activities
- Knowledge: Create 1 (0.6) / 1 (0.6); Understand 17 (10.7) / 18 (11.3); Implement 9 (5.7) / 17 (10.7); Modify 0 (0) / 6 (3.8); Reuse 1 (0.6) / 3 (1.9); Determine 0 (0) / 16 (10.0); Total 27 (17.0) / 62 (38.4)
- Attentional: Perceive 2 (1.2) / 3 (1.9); Implement 10 (6.3) / 15 (9.4); Modify 2 (1.3) / 14 (8.8); Reuse 3 (1.9) / 4 (2.5); Determine 0 (0) / 12 (7.5); Total 16 (10.1) / 47 (30.2)
- Strategic: Create 5 (3.1) / 5 (3.1); Implement 14 (8.8) / 23 (14.5); Modify 4 (2.5) / 18 (11.3); Reuse 1 (0.6) / 4 (2.5); Total 24 (15.1) / 50 (31.4)
- Total: Create 6 (3.8) / 6 (3.8); Perceive 2 (1.2) / 3 (1.9); Understand 17 (10.7) / 18 (11.3); Implement 33 (20.8) / 55 (34.6); Modify 6 (3.8) / 38 (23.9); Reuse 5 (3.1) / 11 (6.9); Determine 0 (0) / 28 (17.6); Total 67 (42.1) / 159 (100)

Table 4. Frequency and percent of each combination of activity, problem, and action. In each cell, the first value pair counts instigating breakdowns and the second counts all breakdowns, instigating or not. All percents are out of all 159 breakdowns.
Frequencies of breakdowns by activity, problems, and actions are shown in Table 4. Total proportions of knowledge, attentional, and strategic breakdowns were similar, but proportions of activities were not. Implementation breakdowns were 77% of breakdowns, and tended to be attentional and strategic breakdowns in implementing and modifying artifacts, and knowledge problems with understanding and implementing artifacts. Debugging breakdowns were 18% of breakdowns, and tended to be knowledge or attentional problems in determining errors and faults. Table 4 also shows frequencies of instigating breakdowns: most were knowledge problems understanding, and attentional and strategic problems implementing artifacts.

Table 5 shows that breakdowns tended to be on algorithms, language constructs, uses of Alice libraries, and style-specific artifacts such as events. Note that about 19% of breakdowns were on faults and errors, and there were no breakdowns perceiving, understanding, or determining failures. Debugging times were highest for uses of libraries, algorithms, and language constructs.

Artifact Type: Breakdowns (#, % of breakdowns); Errors (#, % of errors); Debugging time, mean (SD) in minutes
- Algorithms: 37, 23.3%; 34, 33.3%; 4.8 (6.2)
- Language Constructs: 35, 22.0%; 31, 30.4%; 4.6 (5.5)
- Libraries: 21, 13.2%; 19, 18.6%; 7.1 (6.9)
- Faults: 20, 12.6%; -; -
- Style-specific: 18, 11.3%; 10, 9.8%; 3.6 (4.2)
- Errors: 9, 5.7%; -; -
- Data Structures: 8, 5.0%; 7, 6.9%; 3.3 (4.1)
- Run-Time Specification: 5, 3.1%; -; -
- Environment: 4, 2.5%; 1, 1.0%; 1.0 ( - )
- Code Specification: 2, 1.3%; -; -
- Failures: 0, 0%; -; -

Table 5. Frequency and percent of breakdowns and errors by artifact, and debugging time for errors.

Table 6 shows the errors and time spent debugging by problem and action. Most errors were caused by strategic problems implementing, modifying, and reusing artifacts (rather than understanding or perceiving artifacts). The variance in debugging times was high, and the longest debugging times were on strategic problems reusing and knowledge problems understanding artifacts.

Problem / Action: Errors (#, % of errors); Debugging time, mean (SD) in minutes
Attentional problem:
- Implementing: 3, 2.9%; 5.2 (4.3)
- Modifying: 4, 3.9%; 4.6 (7.1)
- Reusing: 2, 2.0%; 1.2 (1.2)
- Total: 9, 8.8%; 4.0 (5.1)
Knowledge problem:
- Implementing: 4, 3.9%; 4.2 (4.8)
- Modifying: 4, 3.9%; 5.4 (4.0)
- Reusing: 1, 1.0%; 5.0 ( - )
- Understanding: 3, 2.9%; 6.8 (5.7)
- Total: 12, 11.8%; 5.3 (4.2)
Strategic problem:
- Implementing: 11, 10.8%; 4.2 (3.4)
- Modifying: 13, 12.7%; 4.7 (5.1)
- Reusing: 6, 5.9%; 6.6 (9.3)
- Total: 33, 32.4%; 5.1 (5.4)

Table 6. Errors and debugging time by cognitive problem and action. Only actions causing errors are shown.
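As a sanity check on the denominators in Tables 5 and 6, the percentages follow directly from the totals reported earlier in this section (159 breakdowns, 102 errors); for example, for the three most breakdown-prone artifact types:

```python
# Recomputing Table 5's leading rows from the totals reported in the text
# (159 breakdowns, 102 errors), to make the denominators explicit.
TOTAL_BREAKDOWNS = 159
TOTAL_ERRORS = 102

# artifact: (breakdowns, errors), taken from Table 5
rows = {
    "Algorithms": (37, 34),
    "Language Constructs": (35, 31),
    "Libraries": (21, 19),
}

for artifact, (breakdowns, errors) in rows.items():
    pct_break = 100 * breakdowns / TOTAL_BREAKDOWNS
    pct_err = 100 * errors / TOTAL_ERRORS
    print(f"{artifact}: {pct_break:.1f}% of breakdowns, {pct_err:.1f}% of errors")
# Algorithms: 23.3% of breakdowns, 33.3% of errors, matching Table 5.
```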
Table 7 shows that 46 percent of programmers' time was spent debugging. Programming time was correlated with debugging time (r=.96, p<.001) and number of breakdowns (r=.88, p<.01). Errors were correlated with the number of breakdowns (r=.95, p<.001) and number of breakdown chains (r=.90, p<.01), and programming (r=.88, p<.01) and debugging time (r=.91, p<.01). Number of breakdowns was correlated with programming (r=.88, p<.01) and debugging time (r=.84, p<.05).

ID: Prog. time (min); Debugging time (min, % of time); Errors; Breakdowns; Chains; Chain length, mean (SD)
- B1: 245; 142, 58.0%; 23; 41; 10; 4.1 (3.5)
- B2: 110; 35, 32.8%; 16; 32; 7; 4.6 (3.3)
- B3: 50; 11, 22.0%; 3; 5; 4; 1.2 (0.5)
- P1: 95; 23, 36.8%; 14; 23; 11; 2.1 (1.7)
- P2: 90; 30, 33.3%; 7; 7; 7; 1.0 (0.0)
- P3: 215; 165, 76.7%; 34; 44; 25; 1.8 (1.2)
- P4: 90; 27, 30.0%; 5; 7; 5; 1.4 (0.5)
- Total: 895; 554, 46.4%; 102; 159; 69; 2.3 (2.2)

Table 7. Programming and debugging time, and errors, breakdowns, chains, and chain length by programmer.
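Two of the reported correlations can be recomputed directly from Table 7's per-programmer columns; the short sketch below (our own consistency check, using the textbook Pearson formula) reproduces them:

```python
# Recomputing two reported correlations from Table 7's per-programmer data
# (a consistency check on the table, not the paper's own analysis script).
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Columns of Table 7, in programmer order B1, B2, B3, P1, P2, P3, P4:
prog_time = [245, 110, 50, 95, 90, 215, 90]
debug_time = [142, 35, 11, 23, 30, 165, 27]
errors = [23, 16, 3, 14, 7, 34, 5]
breakdowns = [41, 32, 5, 23, 7, 44, 7]

r1 = pearson_r(prog_time, debug_time)  # reported as r = .96
r2 = pearson_r(errors, breakdowns)     # reported as r = .95
```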
5.3 Experiment Discussion

The majority of errors in these studies were (1) knowledge and attentional problems understanding implementation artifacts and (2) attentional and strategic problems implementing and modifying algorithms, language constructs, and uses of libraries. These errors forced programmers to spend nearly 50% of their time debugging on average, and caused 29 knowledge and attentional breakdowns determining faults and errors, leading to further errors. This suggests that, at least in the tasks observed in this study, even a small number of debugging breakdowns lead to significant time costs. It also suggests a likely reason for the cost: because Alice provides few facilities for inspecting the execution state of programs, programmers were unable to attain knowledge about failures, which led to knowledge and attentional breakdowns in determining faults and errors, leading to further errors. This data suggests that Alice needs better support for inspecting the state of execution and run-time interactions between program elements.

Another interesting pattern was evident in comparing P3 and P4, who both finished the task, but had vastly different strategies. For example, in creating a “while condition is true” event, P3 asked himself “How would I do this in Java?” while P4 asked the experimenter, “Just to be clear, the ‘begin’ part of the while event only executes once, right?” In these examples, P4 was obtaining knowledge about event concurrency, preventing insufficient specification and implementation knowledge breakdowns. Not only did P3 lack the knowledge to prevent these breakdowns, but also his experience with Java caused interfering knowledge problems, leading to strategic breakdowns and errors. These observations show that some strategies of acquiring knowledge about an unfamiliar programming system are error-prone, while others are protective.

Although the analyses in this paper limit the conclusions we can draw, they demonstrate how our model of programming errors is helpful in forming hypotheses about errors in programming systems and in designing better environments. Furthermore, the data we gathered in these observations is far from limited. Future analyses inspecting programmers’ specific errors will reveal more specific design guidelines for more helpful programming and debugging tools. Such analyses will also provide more insight into precisely what aspects of event-based programming make it difficult.

6. Discussion

We believe our model of programming errors supports theoretical, educational and design research by helping to describe, predict, and explain programming errors.

6.1 Supporting Reasoning

The model supports theoretical reasoning in a number of ways. First, it provides a vocabulary for reasoning about programming errors and their causes, much like Green’s Cognitive Dimensions of Notations [11] supports reasoning about dimensions of programming languages. Like Green’s contribution, our model makes aspects of programming errors explicit. Future studies could identify relationships between dimensions of notations and the causes of programming errors. For example, 24% of breakdowns in our study were modification breakdowns. This suggests that programming systems with structured-editing environments that have high resistance to local changes (which Green would call “viscous”) may be particularly prone to modification breakdowns and errors.

Our model also supports reasoning about programming and debugging models. For example, von Mayrhauser and Vans’ Integrated Comprehension Model [17] lacks any mention of breakdowns in forming mental models of specifications or code. Identifying areas where specification breakdowns can occur may help future studies of program comprehension explicitly link aspects of the comprehension process to specific error types. Our model could augment Blackwell’s Attention Investment model of programming activity [2], describing how breakdowns and errors influence programmers’ perception of cost, risk, and investment. Our model also supports models of debugging, such as Davies’ [9]. He argues that programmers compare mental representations of the problem and program, but does not account for breakdowns in knowledge formation or mismatch correction, which may affect debugging.

Our model also supports logical reasoning about the errors within and between environments, languages, tasks, and expertise. The studies reported in this paper are a small example of how the model is used to reason about errors within an environment, helping identify the most common breakdowns and error-prone artifacts. Future studies can compare different programming systems’ abilities to prevent breakdowns, which would allow statements such as “language A is more prone to strategic problems reusing data structures than language B.”

Finally, the model makes explicit what can prevent breakdowns. Software engineering focuses on preventing unforeseen strategic problems in understanding, creating and modifying specifications. Programming systems focus on preventing implementation and debugging breakdowns with support such as online documentation and colored syntax highlighting. Education focuses on avoiding knowledge breakdowns. In fact, teaching this model of errors to programmers might even prevent some breakdowns, by strengthening knowledge and providing foresight about programming and debugging strategies.
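As a sketch of how the between-system comparisons suggested above might be tested, the following compares two breakdown proportions with a plain two-proportion z-test. All counts, system names, and the helper function are invented for illustration and are not data from this study.

```python
# Hedged sketch: testing whether hypothetical "system A" shows a higher
# proportion of (say) modification breakdowns than "system B".
# All counts below are invented for illustration only.
from math import erf, sqrt

def two_proportion_z(k1, n1, k2, n2):
    """Return (z, two-sided p) for H0: the two proportions are equal."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))  # standard normal CDF at |z|
    return z, 2 * (1 - phi)

# Hypothetical: system A logs 24/100 modification breakdowns, system B 10/100.
z, p = two_proportion_z(24, 100, 10, 100)
print(f"z = {z:.2f}, p = {p:.4f}")  # a small p would support "A is more prone"
```

A study with larger samples would likely prefer a chi-square or exact test, but the sketch shows the shape of the comparison the model makes possible.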
6.2 Supporting Design

The model helps design programming systems by helping to identify the breakdowns that cause specific errors. For example, from the small number of observations presented in this paper, the authors learned two important lessons about Alice: (1) on average, programmers spent 50% of their time debugging errors that were caused by unforeseen strategic problems, and (2) debugging was aggravated by knowledge and attentional breakdowns in determining faults. This suggests that a visualization of concurrent threads of execution may prevent debugging breakdowns by showing information that helps programmers better perceive and understand failures.

Also, in reference to her difficulties in modifying a complex Boolean logic statement (the breakdown chain in Figure 3), P2 remarked, “I’m really having trouble reading this…I think it’s right, but I can’t really tell…” This suggests that a more readable and less viscous interface for creating logical statements may prevent perception and modification errors.

Using this model of programming errors to analyze languages, environments and documentation standards could also suggest better design guidelines for tools and notations. For example, an analysis of errors in C++ would likely support the belief that operator overloading can cause attentional problems understanding language constructs. An analysis of textual specifications would likely reveal they cause unforeseen strategic problems later in development, supporting the value of runtime views in UML notation. Studying the use of print statements, breakpoints, and watches might reveal that these techniques are helpful in determining faults, but are prone to a variety of debugging breakdowns.

7. Conclusion

This paper presents a model of programming errors derived from past classifications of error types and studies of programming. We believe the model will be valuable for future research on programming errors because it provides a common vocabulary for reasoning about programming errors, while supporting the description, prediction, and explanation of programmers’ errors.

8. References

[1] J. R. Anderson and R. Jeffries, "Novice LISP Errors: Undetected Losses of Information from Working Memory," Human-Computer Interaction, 1, pp. 107-131, 1985.
[2] A. Blackwell, "First Steps in Programming: A Rationale for Attention Investment Models," at IEEE Symposia on Human-Centric Computing Languages and Environments, Arlington, VA, pp. 2-10, 2002.
[3] M. Burnett, et al., "Forms/3: A First-Order Visual Language to Explore the Boundaries of the Spreadsheet Paradigm," Journal of Functional Programming, 11, 2, pp. 155-206, 2001.
[4] M. Conway, et al., "Alice: Lessons Learned from Building a 3D System For Novices," at Proceedings of CHI 2000, The Hague, The Netherlands, pp. 486-493, 2000.
[5] C. L. Corritore and S. Wiedenbeck, "Mental Representations of Expert Procedural and Object-Oriented Programmers in a Software Maintenance Task," International Journal of Human-Computer Studies, 50, pp. 61-83, 1999.
[6] S. P. Davies, "Knowledge Restructuring and the Acquisition of Programming Expertise," International Journal of Human-Computer Studies, 40, pp. 703-726, 1994.
[7] M. Eisenberg and H. A. Peelle, "APL Learning Bugs," at APL Conference, 1983.
[8] M. Eisenstadt, "Tales of Debugging from the Front Lines," at Empirical Studies of Programmers, 5th Workshop, Palo Alto, CA, pp. 86-112, 1993.
[9] D. J. Gilmore, "Models of Debugging," Acta Psychologica, pp. 151-173, 1992.
[10] J. D. Gould, "Some Psychological Evidence on How People Debug Computer Programs," International Journal of Man-Machine Studies, 7, pp. 151-182, 1975.
[11] T. R. G. Green, "Cognitive Dimensions of Notations," in People and Computers V, A. Sutcliffe and L. Macaulay, Eds. Cambridge, UK: Cambridge University Press, 1989, pp. 443-460.
[12] T. R. G. Green, et al., "Parsing-gnisrap: A Model of Device Use," at Empirical Studies of Programmers: 2nd Workshop, 1987.
[13] K. Holtzblatt and H. Beyer, Contextual Design: Defining Customer-Centered Systems. San Francisco, CA: Morgan Kaufmann, 1998.
[14] W. L. Johnson, et al., "Bug Catalogue: I," Yale University, New Haven, CT, Technical Report 286, 1983.
[15] D. Knuth, "The Errors of TeX," Software: Practice and Experience, 19, 7, pp. 607-685, 1989.
[16] A. J. Ko and B. Uttl, "Individual Differences in Program Comprehension Strategies in an Unfamiliar Programming System," at International Workshop on Program Comprehension, Portland, OR, (to appear), 2003.
[17] A. v. Mayrhauser and A. M. Vans, "Program Understanding Behavior During Debugging of Large Scale Software," at Empirical Studies of Programmers, 7th Workshop, Alexandria, VA, pp. 157-179, 1997.
[18] R. Panko, "What We Know About Spreadsheet Errors," Journal of End User Computing, pp. 302-312, 1998.
[19] D. N. Perkins and F. Martin, "Fragile Knowledge and Neglected Strategies in Novice Programmers," at Empirical Studies of Programmers, 1st Workshop, Washington, DC, pp. 213-229, 1986.
[20] J. Reason, Human Error. Cambridge, England: Cambridge University Press, 1990.
[21] I. Vessey, "Toward a Theory of Computer Program Bugs: An Empirical Test," International Journal of Man-Machine Studies, 30, pp. 23-46, 1989.
