Informal PUIs: No Recognition Required
James A. Landay, Jason Hong, Scott Klemmer, James Lin, Mark Newman
Department of Electrical Engineering and Computer Sciences
University of California, Berkeley
Berkeley, CA 94720-1776
landay@cs.berkeley.edu
Abstract
The limitations of graphical user interfaces have slowed
the spread of computer usage to the entire population.
Perceptual user interfaces are one approach that can
overcome many of these limitations. Adding perceptual
capabilities, such as speech, sketching, and vision, is the
key to making interfaces more effective. We argue that
informal user interfaces, which do little or no up-front
recognition of the perceptual input, have important
applications and should not be forgotten by perceptual user
interface researchers.
1. Introduction
Today’s graphical user interfaces are limiting the use of
computers to only a segment of our society and in only a
few locations: the desktops of information workers and the
homes of a large, though noninclusive, portion of our
population. Current interfaces do not let users communicate
in ways that they naturally do with other human beings
(National Research Council 1997). Moreover, interfaces do
not take advantage of many of our innate perceptual, motor,
and cognitive abilities. As our society attempts to bring
about universal access to a National Information
Infrastructure and better-paying jobs to a larger percentage
of the population, interfaces will need to rely more on these
abilities.
Overcoming the limitations of GUIs is leading us
towards a future of Perceptual User Interfaces (PUIs).
These interfaces try to “perceive” what the user is doing
using computer vision, speech recognition, and sketch
recognition. Applications can then take advantage of a wide
variety of human abilities, such as speech or gesture, to
achieve a more natural human-computer communication.
We maintain that informal user interfaces, those without
much up-front recognition or interpretation of the input, will
be valuable in future perceptual applications. In particular,
applications in support of creative design or human-human
communication may be better served by preserving the
ambiguity and lack of rigid structure inherent in
uninterpreted perceptual input.
2. WIMPy User Interfaces
Although easier to use than the cryptic command-line
interfaces that preceded them, conventional WIMP
(windows, icons, menus, and pointing) interfaces are still
too hard to use and limited in when and where they can be
used. These problems come mainly from attempts to make
them overly general and from relying on only a few
perceptual and motor skills.
A non-trivial percentage of the population is blind or
has trouble seeing words and letters in ordinary newsprint
[5% of those over age 15 (National Research Council
1997)], and this percentage will only increase as our
population ages. The GUI has been far from a boon for
these users. Many others have limited literacy skills [21%
of Americans over age 16 (U.S. Department of Education
1992)], typing skills, or use of their hands. The latter is often
a result of using GUIs.
More importantly, the activities required by today’s
interfaces (e.g., sitting, looking at a screen, typing, and
pointing with a mouse) are too awkward for many
situations. Workers and consumers alike perform tasks in
many different locations. For instance, doctors, delivery
people, and salespeople are often mobile while performing
their jobs. Likewise, in our homes, we often perform
different tasks in different parts of the house (e.g., we might
cook in the kitchen, entertain in the living room, and read in
the bedroom). The standard GUI does not work well when
users are standing up, are using their hands for something
else, or are interacting with another person.
A computational infrastructure that restricts users to one
device in one fixed location is simply that: restrictive. The
problems stated above are a result of relying on an interface
paradigm that was fine in its day, but has grown old,
bloated, and incapable of meeting the needs of the future.
3. Perceptual UIs Can Break Out of the GUI Box
How can we move beyond the 25-year-old GUI legacy
of the Xerox Star? During an invited talk at the CHI ’97
conference, Bill Buxton urged researchers to quit “taking
the GUI as a given” and instead push harder from other
starting points in the design space. In particular, he
encouraged researchers to move towards the goal of an
“invisible computer.”
One such push has come from researchers working on
perceptual user interfaces (PUIs) (Pentland and Darrell
1994, Turk 1997, Huang et al. 1995). These interfaces
support natural modes of input by having the computer try
to “perceive” what the user is doing. The perceptual
processing has generally been restricted to speech
recognition and computer vision. This definition of
perceptual user interface is overly restrictive. Humans
perceive sounds and light and then recognize them as
speech and visual objects, respectively. Likewise, using our
visual system we perceive handwritten words and sketches
and then recognize what the symbols mean. People are
comfortable using their sketching and writing skills
everyday for many different tasks. We are missing an
opportunity to bring natural interfaces to more people by
neglecting these modes. Thus, the definition of perceptual
user interface should also include interfaces using
handwritten and sketched input.
4. Informal and Formal User Interfaces
An important but orthogonal question is how and when
these natural modes should be interpreted. Immediate
recognition imposes structure on the input that can often get
in the way during creative or communications-oriented
tasks, such as brainstorming (Moran et al. 1995) or design
(Landay and Myers 2001). An alternative, informal
approach has recently gained research interest. We use the
term informal user interfaces to describe interfaces
designed to support natural, ambiguous forms of human-computer interaction without much up-front recognition or
transformation of the input. Informal interfaces are a
reaction to the “overly restrictive computational formats”
(National Research Council 1997) inherent in current
applications.
These restrictive interfaces are often a result of
tradition. Computers have found their greatest success in
applications where they have replaced humans in tasks at
which humans are poor, such as performing complex,
redundant mathematical calculations, or storing and
searching large amounts of data. We use the term formal
user interfaces to describe these precision-oriented
interfaces and perceptual interfaces that try to recognize
human input immediately.
As computers grow more powerful, less expensive, and
more widespread, we expect them to assist us in tasks that
humans do well, such as writing, drawing, and designing.
Unfortunately, the historical strengths of computers have
led to a design bias towards precise computation, and away
from the more human properties of creativity and
communication. Consequently, interfaces are designed to
facilitate structured input rather than natural human
communication, which consists of the imprecise modes of
speaking, writing, gesturing, and sketching.
Most of the work on perceptual user interfaces and
sketch understanding has been biased towards precise
computation and so tries to unambiguously interpret natural
input as soon as possible and structure the data in a form
that is best for the computer. There are several domains that
are better served by a more informal approach.
4.1 Informal Sketching UIs
Design is one domain where an informal approach may
be more appropriate. Current design applications generally
require the user to drag objects from a “palette” to a
“canvas” on which they are manipulated using a mouse.
These objects, such as lines and circles, are often placed
very precisely with an exact location and size. This
precision makes it easy for the computer to represent the
objects in the system. A precise representation is also
important for some tasks, such as constructing the
mechanical drawings for a part that is to be machined.
However, this drawing interface is quite different from the
way we construct drawings without computers: we often
sketch rough, ambiguous drawings using pencil and paper.
Sketching is one mode of informal, perceptual
interaction that has been shown to be especially valuable for
creative design tasks (Gross and Do 1996, Wong 1992). For
designers, the ability to sketch ambiguous objects (i.e.,
those with uncertain types, sizes, shapes, and positions)
rapidly is very important to the creative process. Ambiguity
encourages the designer to explore more ideas in the early
design stages without being burdened by concern for
inappropriate details such as colors, fonts, and precise
alignment. Leaving a sketch uninterpreted, or at least in its
rough state, is the key to preserving this fluidity (Goel
1995).
In this early phase, ambiguity also improves
communication, both with collaborators and the target
audience of the designed artifact. For example, an audience
examining a sketched interface design will be inclined to
focus on the important issues at this early stage, such as the
overall structure and flow of the interaction, while not being
distracted by the details of the look, such as colors, fonts,
and alignment. When the designer is ready to move past this
stage and focus on the details, the interface can be recreated
in a more formal and precise way. We have built several
design tools that incorporate this style of interface. SILK
(Landay and Myers 2001) is targeted at graphical user
interface design and DENIM (Lin et al. 2000) is targeted at
web site design (see Figure 1). DENIM has little built-in
sketch recognition.
Figure 1. DENIM is a sketch-based web design tool that takes advantage of an informal user
interface. In DENIM’s storyboard view, multiple pages and the links between them are visible.
DENIM tries to group ink on pages into links or graphical objects. Recognizing links between
items and pages is the only other sketch recognition performed by DENIM.
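The kind of minimal, uninterpreted processing DENIM performs can be illustrated by clustering strokes into candidate objects purely by the spatial proximity of their bounding boxes; the ink itself is never recognized. The sketch below is a hypothetical toy, not DENIM's actual grouping algorithm; the `Stroke` type, the `gap` threshold, and the greedy merge are all our own assumptions:

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]

@dataclass
class Stroke:
    points: List[Point]

    def bbox(self) -> Tuple[float, float, float, float]:
        xs = [p[0] for p in self.points]
        ys = [p[1] for p in self.points]
        return min(xs), min(ys), max(xs), max(ys)

def boxes_near(a, b, gap: float) -> bool:
    """True if two bounding boxes lie within `gap` units of each other."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    dx = max(bx0 - ax1, ax0 - bx1, 0.0)
    dy = max(by0 - ay1, ay0 - by1, 0.0)
    return dx <= gap and dy <= gap

def group_ink(strokes: List[Stroke], gap: float = 10.0) -> List[List[Stroke]]:
    """Greedily merge strokes whose bounding boxes fall within `gap` of an
    existing group; the ink is only clustered, never interpreted."""
    groups: List[List[Stroke]] = []
    for s in strokes:
        placed = None
        for g in groups:
            if any(boxes_near(s.bbox(), t.bbox(), gap) for t in g):
                if placed is None:
                    g.append(s)
                    placed = g
                else:
                    # The stroke bridges two groups: merge them into one.
                    placed.extend(g)
                    g.clear()
        groups = [g for g in groups if g]
        if placed is None:
            groups.append([s])
    return groups
```

A later, deferred pass could ask whether a group looks like a link or a label; the point is that nothing forces that decision while the designer is sketching.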
4.2 Informal Speech UIs
Speech is another perceptual style of interaction that is
starting to become popular. Although speech is natural and
inherently human, current speech recognition systems do
not have informal user interfaces. Consider that these
systems generally fail when faced with the ambiguity
inherent in human speech (e.g., “recognize speech” vs.
“wreck a nice beach”). Also, current speech recognition
systems try to recognize human speech and translate it into
an accurate, internal representation that machines can deal
with. These tools also often require or encourage the user to
enter a dialog with the machine to correct mistakes as they
occur1. Although this behavior might be appropriate for
purchasing an airline ticket over the phone, it gets in the
way when the user wants to write, design, or brainstorm.
Several systems from the MIT Media Lab Speech
Group had interfaces that could be considered informal
speech user interfaces. For example, this group produced a
handheld speech notetaker that only uses recognition for
1
Some dictation systems have a mode that allows the user to
ignore corrections until after the capture session.
organizing the uninterpreted notes (Stifelman et al. 1993).
They also built a paper-based notebook that synchronizes
uninterpreted speech with handwritten notes (Stifelman
1996). This area warrants more exploration.
Our research group at Berkeley has developed SUEDE,
a speech UI design tool that uses an informal interface
(Klemmer et al. 2000). SUEDE allows designers to record
their own voice for system prompts and then plays those
prompts back when testing the interface with test
participants (see Figure 2). During a test, the designer acts
as a “wizard” by listening to the test participant’s responses
and then “recognizes” and chooses the next legal state of
the interaction. By not using a speech recognizer and an
associated grammar, the wizard can accommodate multiple
responses from the participant and can use the log of those
responses later in the design process to develop a robust
grammar.
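This Wizard-of-Oz loop can be sketched in a few lines: the machine plays designer-recorded prompts and logs raw utterances, while a human wizard, not a recognizer, chooses the next legal state. The `State` and `WizardSession` types and the flight-booking dialog below are illustrative assumptions of ours, not SUEDE's actual implementation:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class State:
    prompt: str                  # designer-recorded prompt (text stands in for audio)
    transitions: Dict[str, str]  # wizard-chosen label -> name of next state

@dataclass
class WizardSession:
    """The wizard 'recognizes' each participant utterance by choosing which
    legal transition it matches; raw utterances are logged so a robust
    grammar can be developed from them later."""
    states: Dict[str, State]
    current: str
    log: List[Tuple[str, str, str]] = field(default_factory=list)

    def play_prompt(self) -> str:
        return self.states[self.current].prompt

    def wizard_choose(self, utterance: str, label: str) -> str:
        # Record (state, raw utterance, wizard's interpretation), then advance.
        self.log.append((self.current, utterance, label))
        self.current = self.states[self.current].transitions[label]
        return self.current

# A tiny flight-booking dialog (hypothetical example domain).
states = {
    "greet": State("Where would you like to fly?", {"city": "confirm"}),
    "confirm": State("Please confirm your destination.", {"yes": "done", "no": "greet"}),
    "done": State("Thank you!", {}),
}
session = WizardSession(states, "greet")
session.wizard_choose("um, Boston I guess", "city")
session.wizard_choose("yeah that's right", "yes")
```

Because no grammar constrains the participant, the log captures the full variety of responses, which is exactly the data a designer needs before committing to a real recognizer.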
By not requiring the designer to deploy real speech
recognition and synthesis, the interface design process is
expedited in the early stages of design. This allows
designers to make several iterations on their early design
ideas. Fast design iteration has been found to be one of the
keys to creating high quality user interfaces.
Figure 2. SUEDE is a speech user interface design tool that allows a designer to create, test,
and analyze early interface design ideas. This tool does no speech recognition or synthesis. It
instead relies on the designer’s own voice for prompts and their own intelligence to
recognize the end-user’s utterances.
4.3 Informal Handwriting UIs
Handwriting recognition is another style of interaction
that suffers from some of the same problems as speech
recognition. Again, many systems try to recognize writing
as the user writes. This might be useful for taking down
someone’s phone number, but begins to get in the way when
the user is trying to focus on the task at hand (e.g., taking
notes in a talk or writing up a new idea). The dialog with
the recognition system requires users to focus their attention
on the machine, rather than on their task. Deferred
recognition is one way of dealing with this problem. We
have built systems that treat electronic ink as a first-class
type and let users write like they naturally do on paper
(Davis et al. 1999). Others have built similar informal
systems that allow searching of electronic ink (Poon,
Weber, and Cass 1995).
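One way to realize "ink as a first-class type" with deferred recognition is to store raw strokes verbatim and invoke a recognizer only on demand, for example when the user later searches their notes. The `InkNote` class and its pluggable `recognizer` callback below are a minimal hypothetical sketch, not the design of the systems cited above:

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class InkNote:
    """Electronic ink as a first-class type: strokes are captured and stored
    verbatim; recognition, if it ever runs, happens later and on demand."""
    strokes: List[List[Point]] = field(default_factory=list)
    timestamps: List[float] = field(default_factory=list)
    _text: Optional[str] = None  # cache of the deferred recognition result

    def add_stroke(self, points: List[Point]) -> None:
        # Capture is instantaneous and never interrupts the writer.
        self.strokes.append(list(points))
        self.timestamps.append(time.time())

    def recognize(self, recognizer: Callable[[List[List[Point]]], str]) -> str:
        # Deferred: invoked only when the user asks (e.g., to search),
        # never while they are writing.
        if self._text is None:
            self._text = recognizer(self.strokes)
        return self._text
```

The user's task (writing) never waits on a dialog with the machine; any recognition errors surface only in the secondary task (search), where they are far less disruptive.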
5. Conclusions
The primary tenet of work in informal user interfaces is
to bend computers to people’s mode of interaction, not the
other way around. For some domains it is clear that the
formal approach imposed by conventional interfaces
presents an obstacle to effective task performance. One
such domain that we have investigated is user interface
design itself, and this led to our work on SILK (Landay and
Myers 2001) and DENIM (Lin et al. 2000), sketching tools
for early stage interface and web design, respectively. We
believe that informal speech and handwriting interfaces may
also be useful. Any application that requires interaction
between humans and computers, and in which absolute
precision is unnecessary, could benefit from an informal
approach.
PUIs relying on a single input modality will likely see
success only in limited domains. This is for the same reasons
that GUIs are limited: using only a single input modality
will be good for some tasks and not for others. On the other
hand, a multimodal interface with multiple perceptual and
GUI capabilities will often be more natural to use (Mignot,
Valot and Carbonell 1993, Waibel et al. 1997). Successful
multimodal user interfaces may support uninterpreted
sketches and other informal styles, in addition to interpreted
speech, recognized sketches and other more-studied
modalities.
References
Davis, R.C., et al. 1999. NotePals: Lightweight Note
Sharing by the Group, for the Group. In Proceedings of
Human Factors in Computing Systems: CHI '99, 338-345.
Pittsburgh, PA.
Goel, V. 1995. Sketches of Thought. Cambridge, MA: The
MIT Press.
Gross, M.D. and Do, E.Y. 1996. Ambiguous Intentions: A
Paper-like Interface for Creative Design. In Proceedings of
ACM Symposium on User Interface Software and
Technology, 183-192. Seattle, WA.
Huang, X.; Acero, A.; Alleva, F.; Hwang, M.Y.; Jiang, L.;
and Mahajan, M. 1995. Microsoft Windows highly
intelligent speech recognizer: Whisper, in Proceedings of
1995 International Conference on Acoustics, Speech, and
Signal Processing, vol. 1, 93-96.
Klemmer, S.R.; Sinha, A.K.; Chen, J.; Landay, J.A.;
Aboobaker, N.; and Wang, A., SUEDE: A Wizard of Oz
Prototyping Tool for Speech User Interfaces. CHI Letters,
2000. 2(2): 1-10.
Landay, J.A. and Myers, B.A., Sketching Interfaces:
Toward More Human Interface Design. IEEE Computer,
2001. 34(3): 56-64.
Lin, J.; Newman, M.W.; Hong, J.I.; and Landay, J.A.,
DENIM: Finding a tighter fit between tools and practice for
web site design. CHI Letters, 2000. 2(1): 510-517.
Mignot, C.; Valot, C.; and Carbonell, N. 1993. An
Experimental Study of Future 'Natural' Multimodal Human-Computer
Interaction, in Proceedings of ACM
INTERCHI'93 Conference on Human Factors in
Computing Systems -- Adjunct Proceedings, 67-68.
Moran, T.P.; Chiu, P.; Melle, W.v.; and Kurtenbach, G.
1995. Implicit Structures for Pen-Based Systems Within a
Freeform Interaction Paradigm. In Proceedings of Human
Factors in Computing Systems, 487-494. Denver, CO.
National Research Council. 1997. More than screen deep:
toward every-citizen interfaces to the nation's information
infrastructure. Washington, D.C.: National Academy Press.
Pentland, A.P. and Darrell, T. 1994. Visual perception of
human bodies and faces for multi-modal interfaces, in 1994
International Conference on Spoken Language Processing,
vol. 2, 543-546.
Poon, A.; Weber, K.; and Cass, T. 1995. Scribbler: A Tool
for Searching Digital Ink. In Proceedings of Human
Factors in Computing Systems, 252-253. Denver, CO.
Stifelman, L.J. 1996. Augmenting Real-World Objects: A
Paper-Based Audio Notebook. In Proceedings of Human
Factors in Computing Systems, 199-200. Vancouver,
Canada.
Stifelman, L.J.; Arons, B.; Schmandt, C.; and Hulteen, E.A.
1993. VoiceNotes: A Speech Interface for a Hand-Held
Voice Notetaker, in Proceedings of ACM INTERCHI'93
Conference on Human Factors in Computing Systems, 179-186.
Turk, M., ed. 1997. First Workshop on Perceptual User
Interfaces. Banff, Alberta, Canada.
U.S. Department of Education. 1992. 1992 National Adult
Literacy Survey. Washington, DC: U.S. Government
Printing Office.
Waibel, A.; Suhm, B.; Vo, M.T.; and Yang, J. 1997.
Multimodal interfaces for multimedia information agents, in
Proceedings of 1997 IEEE International Conference on
Acoustics, Speech, and Signal Processing, vol. 1, 167-170.
Wong, Y.Y. 1992. Rough and Ready Prototypes: Lessons
From Graphic Design. In Proceedings of Human Factors in
Computing Systems, 83-84. Monterey, CA.