
FOLEYAUTOMATIC:

Physically-based Sound Effects for Interactive Simulation and Animation


Kees van den Doel, Paul G. Kry, and Dinesh K. Pai
University of British Columbia
{kvdoel | pgkry | pai}@cs.ubc.ca

(a) Real rock in wok (b) Virtual rock in wok (c) Rolling and Sliding (d) Interaction

Figure 1: Animations for which sound effects were automatically added by our system, demonstrated in the accompanying
video. (a) A real wok into which a pebble is thrown; the pebble rattles around the wok and comes to rest after wobbling. (b)
A simulation of a pebble thrown into a wok, with all sound effects automatically generated. (c) A ball rolling back and forth
on a ribbed surface. (d) Interaction with a sonified object.

Abstract

We describe algorithms for real-time synthesis of realistic sound effects for interactive simulations (e.g., games) and animation. These sound effects are produced automatically, from 3D models, using dynamic simulation and user interaction. We develop algorithms that are efficient, physically-based, and can be controlled by users in natural ways. We develop effective techniques for producing high quality continuous contact sounds from dynamic simulations running at video rates, which are slow relative to audio synthesis. We accomplish this using modal models driven by contact forces modeled at audio rates, which are much higher than the graphics frame rate. The contact forces can be computed from simulations or can be custom designed. We demonstrate the effectiveness with complex realistic simulations.

CR Categories and Subject Descriptors: I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism - Animation, Virtual reality; I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling - Physically based modeling; I.6.5 [Simulation and Modeling]: Model Development - Modeling methodologies; I.6.8 [Simulation and Modeling]: Types of Simulation - Animation, Combined, Gaming; H.5.5 [Information Interfaces and Presentation (e.g., HCI)]: Sound and Music Computing - Methodologies and techniques, Modeling; H.5.2 [Information Interfaces and Presentation (e.g., HCI)]: User Interfaces - Auditory (non-speech) feedback.

Additional Key Words: Animation Systems, Computer Games, Multimedia, Physically Based Animation, Physically Based Modeling, Sound Visualization, Virtual Reality, Head Mounted Displays.

1 Introduction

The importance of sound in computer graphics and interaction has been recognized for a long time. Sounds are known to be useful for human-computer interfaces in general [3, 4, 15, 16]; Buxton [4] points out that sound can be used for alarms and warnings, status and monitoring indicators, and encoded messages. Sounds play a similar but more subtle role in animation and interaction, conveying, in addition to quantitative information, a sense of presence, realism, and quality. Sound effects, sometimes called Foley sounds, are therefore widely used in the animation, film, and games industries.

However, the creation of sound effects remains a slow, labor intensive process; sound effects need to be added by hand by talented sound designers. With the system described in this paper, many types of sound effects due to contact interaction can be synthesized automatically. Fig. 1 shows images of some examples, which are presented with audio on the accompanying video tape.
The desire to remedy this situation, by automatically syn-
thesizing sound effects based on the physics of interaction,
also has a long history. For instance, Gaver, in his pioneering
work [14, 15] discussed how contact sounds are used in “ev-
eryday listening” and can be synthesized for simple objects
like bars. In another pioneering step, Takala and Hahn [39]
described a general methodology for producing sound effects
for animation. A sound could be attached to each object,
triggered by events and synchronized with the animation,
and rendered using a sound pipeline analogous to the usual
image rendering pipeline. They also describe synthesis of
collision sounds using a version of modal synthesis that we
describe below. We discuss other relevant work in §1.1.

Despite these early successes nearly a decade ago, automatic sound synthesis is still not an integral part of animation or interactive simulations such as games. Why is this? We speculate that there are several reasons.

First, the models of the physical interaction used previously correspond to simple impacts which triggered sounds. This works well for animations of bouncing objects but fails to capture the subtleties of continuous contact sounds produced by sliding and rolling, which inevitably accompany the interaction. It is precisely these kinds of continuous contact sounds, which depend on the physical behavior of the animation, that are most difficult for humans to create by hand and would benefit from automatic synthesis.

Second, the physical simulation used to drive the sound synthesis had the wrong temporal and spatial scales, having been originally designed for visual simulation. High quality audio synthesis occurs at a sampling rate of 44.1 kHz, about 1000 times faster than the graphics. It is necessary to bridge this gap, since many surface properties such as roughness produce force (and perceptible sound) variations that cannot be captured at the visual simulation rate. Running a detailed simulation using FEM at audio rates is an interesting possibility [28], but it is expensive, requiring extremely small time-steps. It is also difficult to integrate such an approach with rigid body simulators, which are widely used in animation. On the spatial scale, physical simulations were also designed for polyhedral models, which produce annoying auditory artifacts due to discontinuities in the contact force. For rolling, methods for dynamic simulation of contact between smooth surfaces are needed.

Third, the models associated single sounds with objects, and failed to account for the variation in sound timbre and surface properties over the surface. The importance of timbral variation was described by [45], but they only described simple impact interactions, and did not integrate the surface variation with 3D geometric models in a dynamics simulation. While it has been believed that it should be possible to associate timbre and roughness textures with the 3D geometry, just as one associates color and material information for visual rendering, this had not been effectively demonstrated previously, in part because it was not clear exactly what values must be stored in the texture.

Finally, previous work attempted to synthesize too large a category of sounds from a single class of models, ranging from contact sounds to animal cries. This often resulted in poor quality sounds, because the underlying model does not sufficiently constrain the sounds to be realistic. Such models also force the user to think about sound simulation at too low a level, in terms of waveforms and filter graphs. While experienced sound designers can use these tools to produce good sounds, most users find it extremely difficult.

All these factors add up to the fact that automatically generated sounds could not compete with the realism and subtlety of recorded sounds massaged by an experienced sound designer. While we believe that there will always be a need for hand construction of some sound effects, we believe it is not necessary for many animations, and impossible for interactive simulations where the interaction is not known in advance. We will show in this paper how the problems listed above can be effectively addressed, making automatic synthesis of contact sounds practical and attractive for animation and interactive simulation.

Specifically, we (1) show why the modal synthesis technique is an appropriate model for interaction sounds including continuous contact sounds; (2) describe a dynamic simulation algorithm for computing contact forces suitable for sound synthesis; (3) develop micro-simulation techniques for producing a high resolution "audio force" from nominal contact forces produced by dynamics simulation at video rates; and (4) demonstrate an implemented system which combines all these techniques and utilizes sound maps that include variations in timbre and roughness over the surface of an object.

By focusing on the specific, but large, category of interaction sounds due to contact, we are able to incorporate more of the relevant physical phenomena to constrain the sound synthesis; the resulting sounds are not only synchronized to subtle contact variations, but are also of high quality. We believe that the work described in this paper has the potential to finally realize the dream of automatic synthesis of interaction sounds in animation and interactive simulations such as games.

1.1 Related Work

Apart from the pioneering work mentioned in the introduction, a number of studies dealing with sound effects in animations have appeared.

Sound effects for virtual musical instrument animation were investigated by Cook [7]. In [17] a number of synthesis algorithms for contact sounds are introduced, including a scraping algorithm. Synthesizing sounds from FEM simulations is described in [28]. Synthesizing sound effects tightly synchronized with haptic force display is described in [10]. Automated measurement of sound synthesis parameters is described in [30].

Many studies of room acoustics and three-dimensional sound in graphical environments have appeared. These studies are orthogonal to the work presented here and provide algorithms to further process the contact sounds described in this paper to add additional perceptual cues about the environment and sound location. See, for instance, [42, 13, 32], the book [2], and references therein.

Numerous publications on the synthesis of musical instrument sounds have appeared. Most musical instrument synthesis algorithms use the unit-generator paradigm introduced by Mathews [24]. In [21] audio synthesis is discussed from a theoretical point of view and synthesis techniques are analyzed using various criteria. Additive synthesis algorithms based on the FFT were described in [11]. Synthesis of xylophone sounds, which are close to the sounds we are interested in, is described in [5]. A number of algorithms for percussion sounds were described in [8]. In [34, 31] an analysis-synthesis approach was given using a decomposition into sinusoidal and residual components.

Tribology research on noise generated by sliding and rolling contacts has focused mainly on machine noises. In [1] the role of surface irregularities in the production of noise in rolling and sliding contacts was investigated. In [19, 26] the nonlinear vibrations at a Hertzian contact were studied.

The perception of object properties from their sounds is studied, for instance, in [14, 16, 23].

1.2 Overview

The new contributions of this paper are the theoretical models for contact interactions for impact, rolling, and sliding, and their close integration with the dynamics of simulations, and the extension of modal synthesis to incorporate multiple simultaneous continuous interactions at different locations with realistic timbre shifts depending on the contact point.
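To make the rate gap behind contribution (3) concrete: a dynamics simulation produces a nominal contact force at video rate, while audio synthesis needs samples at 44.1 kHz. The sketch below simply upsamples a dynamics-rate force by linear interpolation. This is only an illustration, not the paper's implementation (the authors' system is written in Java); the 60 Hz rate and the sinusoidal force profile are assumed values.

```python
import numpy as np

# Illustrative sketch: upsampling a dynamics-rate contact force to audio rate.
# Rates and the force profile are assumptions for demonstration only.
DYNAMICS_RATE = 60        # Hz, a typical video/simulation rate
AUDIO_RATE = 44100        # Hz, CD-quality audio rate

def upsample_force(dyn_force, dyn_rate=DYNAMICS_RATE, audio_rate=AUDIO_RATE):
    """Linearly interpolate a dynamics-rate force signal to audio rate."""
    n = len(dyn_force)
    t_dyn = np.arange(n) / dyn_rate
    t_audio = np.arange(int(n * audio_rate / dyn_rate)) / audio_rate
    return np.interp(t_audio, t_dyn, dyn_force)

# One second of a slowly varying contact force, sampled at the dynamics rate.
dyn_force = np.abs(np.sin(2 * np.pi * np.arange(60) / 60))
audio_force = upsample_force(dyn_force)   # 60 samples become 44100 samples
```

Note that interpolation only smooths the dynamics-force; the point of the micro-simulation techniques developed in Section 4 is to add back the high-frequency "audio force" detail that such an upsampled signal lacks.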
We have implemented a software system called FoleyAutomatic, composed of a dynamics simulator, a graphics renderer, and an audio modeler, to create interactive simulations with high-quality synthetic sound.

The remainder of this paper is organized as follows. In Section 2 we review modal synthesis, describe how we use it to obtain contact location dependent sounds using the modal gains mapped to the geometry (the "audio-texture"), and describe how we obtain parameters for physical objects. Section 3 discusses the use of a dynamics simulator in animation and simulation and how it can be used to provide control parameters which can drive realistic responsive audio synthesis. In Section 4 we introduce novel algorithms for physically parameterized impact forces, scraping and sliding forces, and rolling forces, which are used to excite the modal resonance models. Section 5 describes the implementations we made to demonstrate the effectiveness of the algorithms, and we present our conclusions in Section 6.

2 Modal Resonance Models

Interacting solid objects can make a variety of different sounds depending on how and where they are struck, scraped, or rolled. During such interactions, the rapidly varying forces at the contact points cause deformations to propagate through the solid, causing its outer surfaces to vibrate and emit sound waves.

A good physically motivated synthesis model for solid objects is modal synthesis [46, 15, 25, 8, 44, 45], which models a vibrating object by a bank of damped harmonic oscillators which are excited by an external stimulus.

The modal model M = {f, d, A} consists of a vector f of length N whose components are the modal frequencies, a vector d of length N whose components are the decay rates, and an N × K matrix A, whose elements a_nk are the gains for each mode at different locations. The modeled response for an impulse at location k is given by

    y_k(t) = Σ_{n=1}^{N} a_nk e^{−d_n t} sin(2π f_n t),    (1)

for t ≥ 0 (and is zero for t < 0). The frequencies and dampings of the oscillators are determined by the geometry and material properties (such as elasticity) of the object, and the coupling gains of the modes are related to the mode shapes and are dependent on the contact location on the object.

This model is physically well motivated, as the linear partial differential equation for a vibrating system, with appropriate boundary conditions, has these solutions. Another attractive feature of modal models is that the number of modes can be changed dynamically, depending on available computational resources, with a graceful degradation in audio quality. A modal resonator bank can be computed very efficiently with an O(N) algorithm [15, 45, 43] for a model of N modes.

The modal model parameters can be obtained from first principles for some very simple geometries [45], or they could be created by "hand and hear", for example using a modal model editor such as described in [6], but for realistic complex objects we obtain them by fitting the parameters to recorded sounds of real objects (see [30]).

Points on the surface of the object where we have sampled impulse responses are mapped to points in the "sound-space" of the object, which we define in this context as the space spanned by the gains of the modes. At intermediate points we use a barycentric interpolation scheme. Audible artifacts in the form of too abrupt changes in timbre can result from too coarse a surface sampling. We set the density of the surface sampling by trial and error. This map of gains on the surface samples the absolute value of the mode-shapes [45], and with sufficiently dense sampling the mode-shapes could be recovered. The timbre of the sound can also be affected by changing the spectral content of the excitation force, as is customary in waveguide models [35] for bowed and plucked strings, for example; however, this is an expensive operation to perform in real-time on the audio-force, whereas changing the gain vector has negligible computational cost.

3 Dynamic Simulation for Sound

The audio synthesis techniques described in this paper can be incorporated with most multi-body methods, since the parameters needed for audio synthesis are often directly available or easily computable from the simulation. For example, multi-body methods commonly compute constraint forces during periods of continuous contact and impulses to resolve collisions when transient contacts occur. In addition, the speed of the contact on each surface and the slip velocity necessary for rolling and sliding sounds, though not readily available as by-products in most simulations, are easily computable from the relative velocity and the contact location.

We observe nevertheless that there are features which make some multi-body methods more desirable than others. Because continuous contact audio synthesis is parameterized by contact forces and velocities, multi-body methods that accurately simulate rolling and sliding contacts are preferable. More importantly, smooth surface models should be used, because the discontinuities that arise in dealing with polyhedral approximations do not lead to sufficiently smooth rolling forces.

We have developed a simulation method for continuous contact which is particularly suited to sound generation for exactly these reasons. Our technique uses rigid bodies defined by piecewise parametric surfaces. In particular, we use Loop subdivision surfaces with parametric evaluation as described in [36].

3.1 Contact Evolution

Our method evolves a system of contacting bodies in a reduced set of coordinates. Suppose body 1 and body 2 are in contact. Let the shape of the contacting patch on body 1 be described by the function c : (s, t) → R³, and on body 2 by d : (u, v) → R³. We can describe any contact configuration between patches c and d of bodies 1 and 2 by the 2-dimensional location of the contact in the domain of each patch, along with the angle of rotation ψ shown in Fig. 2. Assembled in a column vector, we call these 5 independent variables the contact coordinates and denote them q; i.e., q = (s t u v ψ)^T.

As described in [29], we use contact kinematics equations which relate the relative motion between two smooth contacting bodies to a change in the contact coordinates. For a given contact configuration q, these equations can be written as φ = H q̇, where φ is the relative spatial velocity and H is a linear transformation from the 5-dimensional space of contact coordinate velocities into a 5-dimensional subspace of the 6-dimensional space of spatial velocities.
The Newton-Euler equations for a rigid body can be combined with the derivative of the contact kinematics equations to form an ordinary differential equation which we solve for contact coordinate accelerations. The reduced coordinate dynamics equations, though slightly more complex, can be integrated easily using explicit integrators without the need for stabilization, as truncation errors do not cause interpenetration. Since the constraint in this formulation is bilateral, we check for when the constraint force goes negative and allow the objects to separate. Once separate, we allow the bodies to evolve freely until we observe transient collisions occurring in close succession and proximity, at which time we switch back to reduced coordinates. We use sphere trees built from polyhedral approximations of our models for collision detection. Approximate contact coordinates are computed from the minimum distance reported by the sphere tree. Before using these coordinates we apply a few Newton iterations, as described in [27], to find a better approximation of the contact point on the smooth surfaces.

Figure 2: Contact coordinates are defined by the parameter space location of the contact on each patch along with the angle ψ. (The axes of the contact frames 1c and 2c on the two bodies are shown.)

4 Micro-simulation of Contact Interactions

Even though modal models are a useful class of models, by themselves they are not sufficient. For realistic contact sounds we need good, physically-based models of both the resonators and the contact interactions. The simulation of the contact interactions needs to be time stepped at least at the audio sampling rate, which is generally much higher than the simulation rate or the graphics frame-rate. Therefore the models need to be fast and simple. Often stochastic models are appropriate, since most contact interactions involve some random element. We will refer to the contact force sampled at audio rate as the "audio-force", to distinguish it from the "dynamics-force", which is sampled much more coarsely. See Fig. 3.

Figure 3: The dynamics force (blue) and the audio force (red) change in time at different rates. Resonance models are excited by the difference between the audio-force and the dynamics force, i.e., the rapidly varying part. Scales are exaggerated in this picture. (Force in newtons versus time in seconds.)

The simulation of the audio-force is similar to audio synthesis, but it does not need to be of high auditory quality, since it will not be heard directly, but is used to excite resonance models.

4.1 Impact

When two solid bodies collide, large forces are applied for a short period of time. The precise details of the contact force will depend on the shape of the contact areas as well as on the elastic properties of the involved materials.

The two most important distinguishing characteristics of an impact on an object are the energy transfer in the strike and the "hardness" of the contact [12]. The hardness affects the duration of the force, and the energy transfer relates directly to the magnitude of the force profile.

A generic model of contact forces based on the Hertz model and the radii of curvature of the surfaces in contact was considered in [22]. A Hertzian model was also used to create a detailed model of the interaction forces between the mallet and the bars of a xylophone [5].

A simple function which has the qualitatively correct form for an impact force is, for example, 1 − cos(2πt/τ) for 0 ≤ t ≤ τ, with τ the total duration of the contact. The force increases slowly in the beginning, representing a gradual increase in contact area, and then rises rapidly, representing the elastic compression of the materials.

We have experimented with a number of force profiles and found that the exact details of the shape are relatively unimportant; the hardness is conveyed well by the duration.

For "very hard" collisions such as a marble on a stone floor, the impact events are very fast (times of ∼50 µs for a single contact break were measured in [38]). However, such contacts sound too "clean" in practice when represented by a single non-zero sample. Experimental data [38] shows that there are sequences of very fast contact separations and collisions during hard impacts. These micro-collisions are caused by modal vibrations of the objects involved, and we have simulated this with a short burst of impulse trains at the dominant modal frequencies. Informal experiments showed that the micro-collisions take place within 15 ms for the objects we studied. An impact audio-force composed of impulses at the first 4 modal resonance frequencies sounded very convincing and would be appropriate for hitting a large resonating object with a small hard object.

4.2 Scraping and Sliding

During scraping and sliding the audio-force is generated by combining an effective surface roughness model and an interaction model. The audio-force will depend on the combined surface properties of the objects in contact.

Interaction Models

A simple and effective model of scraping a surface is the phonograph needle model, whereby a needle exactly follows a given surface track. In order to generate the audio-force at a sampling rate of f_s, we generate an effective surface profile at a spatial resolution of v_max/f_s, where v_max is the maximum contact velocity for which the model is applicable. This ensures that at the maximum contact speed the surface profile can be sampled at the audio sampling rate.
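The phonograph-needle playback can be sketched as a wave-table read at a speed-dependent rate, with linear interpolation for resampling as in the text. This is only an illustrative sketch, not the FoleyAutomatic code; the random noise table stands in for a measured or synthetic surface profile, and the velocities are assumed values.

```python
import numpy as np

# Illustrative sketch of the phonograph-needle model: a surface-profile
# wave-table is "played back" at rate v / v_max, with linear interpolation.
rng = np.random.default_rng(0)
profile = rng.standard_normal(44100)   # stand-in effective surface profile
v_max = 1.0                            # max contact speed the model supports

def scrape_audio_force(v, n_samples):
    """Read n_samples from the wave-table at playback rate v / v_max."""
    rate = v / v_max                   # rate 1.0 = audio-force at max speed
    pos = np.arange(n_samples) * rate  # fractional read positions
    i = np.floor(pos).astype(int) % len(profile)
    j = (i + 1) % len(profile)
    frac = pos - np.floor(pos)
    return (1.0 - frac) * profile[i] + frac * profile[j]

force = scrape_audio_force(v=0.25, n_samples=1000)  # a slow scrape
```

At v = v_max the table is read back sample for sample; at lower speeds the same profile is stretched out in time, which is why the text recommends oversampling the wave-table to limit quality loss on slowdown.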
For a homogeneous surface this is easy to implement with a wave-table of a surface profile which will be "played back" at the rate v/v_max for a contact moving at speed v, where a rate of one represents the audio-force at maximum speed. This wave-table can be constructed from a mathematical roughness model, as explained below, or obtained experimentally. The wave-table should be oversampled in order to decrease the reduction of quality on slowdown.

For resampling we use a linear interpolation algorithm. Spurious high frequency components resulting from the discontinuities of the first derivative can be reduced by using a computationally more expensive quadratic interpolation scheme (or even more sophisticated algorithms to filter out artifacts), but this offered no audible improvement in practice. Such a model is appropriate for scraping a rough surface with an edge, or for a sliding interaction as depicted in Fig. 4.

Figure 4: Sliding involves multiple micro-collisions at the contact area.

For Coulomb friction, F_friction = µF_normal, the scraping audio-force volume is proportional to √(v F_normal), assuming the force is proportional to the frictional power loss.

Effective Profile Models

We want to create a simple synthetic model of a scraping force which has the perceptual dimensions of roughness, contact speed, and average contact force.

We characterize the profile by noise with an overall spectral shape, on which one or more peaks are superimposed. The spectral peak frequencies are moved with the contact velocity and provide the perception of changing pitch at different speeds.

We implemented this by generating fractal noise, which is noise with a power spectrum proportional to ω^β, passed through a reson filter [37]. The parameter β provides a surface roughness parameter and is related to the fractal dimension D by D = β/2 + 2. Empirical data shows [33, 40] that many surfaces are fractal over a wide range of scales. Values of D between 1.17 and 1.39 were reported for various machined surfaces at a length scale of 10⁻⁶ m. At the CD quality audio sampling rate of 44100 Hz this would correspond to a sliding speed of about 5 cm/s. The frequency of the reson is scaled with the contact velocity to produce the illusion of scraping at different speeds. The width of the resonance influences the simulated randomness of the surface, with narrow peaks sounding more pitched. Informal experiments suggest that the fractal dimension correlates well with the perception of auditory roughness.

We can obtain the fractal dimension from a recording of a scrape interaction with a contact microphone by fitting the power spectrum; see Fig. 5. The reson parameters were obtained by measuring the surface structure of objects at a resolution of 0.03 mm and making an autoregressive fit using the covariance method.

Figure 5: Power spectrum of scraping smooth plastic, for slow and fast scrapes. The fractal dimension extracted from a linear fit is roughly D = 0.88. The fractal model appears valid up to about 1000 Hz. (G(f) versus frequency, 10¹ to 10⁵ Hz.)

4.3 Rolling

Like scraping forces, rolling forces are produced by irregularities of the surfaces in contact, but because the surfaces have no relative speed at the contact point, the contact interactions are different, and this is reflected in a difference in sound.

Rolling seems to be a less understood interaction as far as audio is concerned. Experimental studies of rolling sounds have focused mainly on very specific machine sounds like train wheels [41] and do not suggest a universal rolling model.

What exactly causes the perception of rolling versus sliding is also not known. Some studies [20] suggest that the periodicity of rolling sounds plays an important role in distinguishing it from sliding. However, we believe rolling sounds are different in other respects too.

Why do we hear anything when a smooth ball rolls on a rough table? A possible reason is that the surface of the ball rides on the asperities of the surface and constantly collides with them, but because of the large (with respect to the length scale of the roughness) radius, the ball only "sees" the large scale surface structure. The smaller the ball, the more small details are "felt" by the rolling. Because the collisions in rolling occur just in front of the contact area (a very small area), the downward motion of the ball is very small, so the collisions will be very soft, i.e., drawn out in time. See Fig. 6. This suggests rolling interactions are similar to scraping interactions, but because of this size effect only the low frequency content of the effective profile plays a role.

This suggests a similar model as for scraping, but with an additional low-pass filter with adjustable cutoff frequency capturing the rolling quality.
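The scraping-plus-low-pass rolling model just described can be sketched minimally as follows. The one-pole filter form, the 300 Hz cutoff, and the white-noise stand-in for the scraping-style profile are all illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative sketch of the rolling model: a scraping-style audio-force
# passed through a low-pass filter whose cutoff captures the rolling quality.
fs = 44100
rng = np.random.default_rng(1)
scrape_force = rng.standard_normal(fs)   # stand-in scraping audio-force

def lowpass(x, cutoff_hz, fs=fs):
    """One-pole low-pass: y[n] = a*x[n] + (1 - a)*y[n-1]."""
    a = 1.0 - np.exp(-2.0 * np.pi * cutoff_hz / fs)
    y = np.empty_like(x)
    acc = 0.0
    for n in range(len(x)):
        acc = a * x[n] + (1.0 - a) * acc
        y[n] = acc
    return y

rolling_force = lowpass(scrape_force, cutoff_hz=300.0)
```

Only the low-frequency content of the effective profile survives the filter, consistent with the size-effect argument above; raising the cutoff moves the result back toward the scraping sound.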
vc ~ d v / R We created a detailed dynamic simulation of a pebble in
R a metal wok, that can bounce around, roll, and slide. A
A v modal model of the wok was created from measurements of
imminent collision a real wok. It would be nice to use a scientific method to
contact region objectively select parameters like the density of the surface
d sampling or the number of modes rendered, but this is not
possible due to the lack of objective criteria to determine how
well a given synthetic sound is perceived as approximating a
B target sound. Lacking such a perceptual metric, we had to
resort to “earballing” the approximations, with user input.
The construction of such a measure would greatly advance
Figure 6: Rolling audio-force. The collision velocity vc is related to the contact region d as indicated.

ject is no longer applicable, as the rolling audio-force seems to "know" the modes of the object. We speculate that this is because the surfaces are not moving relative to each other and are therefore in stronger contact than during sliding, which could generate a kind of feedback mechanism leading to this effect.

This is consistent with the observation in [18] that a gamma-tone model with γ = 2 driven by noise generates better rolling sounds than a pure modal model. A γ = 2 model driven by noise has the same spectrum as a reson filter driven by noise with spectral envelope S(ω) = 1/√((ω − ρ)² + d²), with ρ and d the frequency and damping of the reson. This spectral envelope is enhanced near the object's resonance modes, as we observed in our data.

We have experimented with this real-time spectral modification of the rolling-force to obtain what appear to be better rolling sounds, at the price of the extra computation required for filtering the audio-force. Further study is needed to carefully evaluate the improvement.

5 Results

We have implemented the ideas presented above in our FoleyAutomatic system. An audio synthesis toolkit was developed in pure Java. The toolkit consists of three layers of software objects: a filter-graph layer which provides objects to build unit-generator graphs [24], a generator layer with basic audio processing blocks such as wave-tables and filters, and a model layer which contains implementations of the physical models described above. For efficiency, the audio processing elements are all vectorized, i.e., they process audio in frames (typically 20 ms blocks) of samples rather than on a sample-by-sample basis as in, for example, [9]. On a 900 MHz Pentium III we found that we can synthesize about 800 modes at a sampling rate of 44100 Hz, enough for about 10 reasonably complex objects, with cycles to spare for the graphics and the dynamics.

Our dynamics simulation is implemented in Java and uses Java3D for the graphics rendering. We use Loop subdivision surfaces to describe the boundaries of objects in our simulations. Sphere trees detect collisions between the surfaces. The simulator runs in real time for the relatively simple examples shown in the accompanying video, and can also record simulations along with audio parameters, to be played back later in real time for sound design and higher quality graphics rendering.

Fig. 1(d) shows real-time interaction with a model of a bell. Using a PHANToM haptic interface we can tap and scrape the sides of the bell with a virtual screwdriver. Audio for the bell is generated with the 20 largest modes and sounds very convincing.

The pebble thrown into a wok (Fig. 1(a,b)) demonstrates the state of the art of our models. The impulse response of the wok was recorded at 5 locations at equidistant radial points, and a modal model of 50 modes was extracted (of which only 10 contribute significantly). A set of 5 gain vectors a (of dimension 50) was obtained. In Fig. 7 we display the "brightness" of the sound-map, as measured by the frequency-weighted average of the gains.

Figure 7: Sound map on wok. Visual brightness is mapped onto audio brightness.

The surface roughness was measured with a contact mike and was used to construct audio surface profiles. Our dynamics simulator was set up to simulate a small, irregularly shaped rock being thrown into the wok. No more than one contact could occur at a time. This simulation was then used to drive the audio synthesis and graphics display simultaneously. In Fig. 8 we show the dynamics variables driving the audio. Various other examples, demonstrating impacts, sliding, and rolling, are shown in the accompanying video.

5.1 Discussion

Although our audio algorithms are physically well motivated, they are not derived from first principles using known physical laws. The physical phenomena responsible for the generation of contact sounds, such as the detailed micro-collision dynamics at the interface of sliding surfaces, are so complex that such an approach to interactive simulation seems infeasible.

We evaluated the quality of our generated audio informally, by comparing the animation with reality. See, for example, the video of the real versus the animated pebble thrown in a wok. As there are currently no existing real-time audio synthesis systems to compare our results with, our evaluation remains qualitative.

Ultimately, one would like to assign an audio quality metric to synthesized sound-effects, which measures how good the audio model is. There are two obstacles preventing us from constructing such a measure. First, phenomena such as an object falling and rolling have sensitive dependence on initial conditions, and it is impossible to compute the exact trajectories using Newton's laws. In order to compare the real sound with the synthesized sound one therefore has to somehow extract the perceptually relevant statistical properties of complex sounds. This is an unsolved and important problem.
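The frame-based (vectorized) modal synthesis described above can be sketched as follows. This is illustrative Python, not the Java toolkit; the class and parameter names are ours. Each mode is a two-pole resonator whose state is carried across frames so that ringing continues between blocks:

```python
import math

class ModalBank:
    """Bank of exponentially decaying sinusoidal modes, rendered one
    frame at a time, with filter state carried across frames."""

    def __init__(self, freqs_hz, dampings, gains, fs):
        self.modes = [
            # Two-pole recurrence coefficients for each mode.
            (2.0 * math.exp(-d / fs) * math.cos(2.0 * math.pi * f / fs),
             -math.exp(-2.0 * d / fs),
             g)
            for f, d, g in zip(freqs_hz, dampings, gains)
        ]
        self.state = [(0.0, 0.0) for _ in self.modes]

    def render(self, excitation):
        """Render one frame; `excitation` is the audio-force frame."""
        out = [0.0] * len(excitation)
        for i, (c1, c2, g) in enumerate(self.modes):
            y1, y2 = self.state[i]
            for n, x in enumerate(excitation):
                y = c1 * y1 + c2 * y2 + g * x
                out[n] += y
                y1, y2 = y, y1
            self.state[i] = (y1, y2)
        return out
```

At 44100 Hz a 20 ms frame is about 880 samples; an impulse in one frame keeps ringing in the next because the per-mode state persists.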
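As a deliberately simple example of such a statistical property (our own illustration, not a validated perceptual metric), one could summarize each sound by its spectral centroid and compare the summaries rather than the raw samples:

```python
import cmath
import math

def spectral_centroid(samples, sample_rate):
    """Magnitude-weighted mean frequency of the signal's spectrum,
    a crude perceptually motivated 'brightness' statistic."""
    n = len(samples)
    num = den = 0.0
    # Naive O(n^2) DFT over positive-frequency bins; fine for a sketch.
    for k in range(1, n // 2):
        bin_val = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                      for t in range(n))
        mag = abs(bin_val)
        num += (k * sample_rate / n) * mag
        den += mag
    return num / den if den > 0 else 0.0

def centroid_distance(real, synth, sample_rate):
    """Compare two sounds by a summary statistic instead of a
    sample-by-sample error measure."""
    return abs(spectral_centroid(real, sample_rate) -
               spectral_centroid(synth, sample_rate))
```

A real metric would need many such statistics, weighted by their perceptual relevance; the point of the sketch is only the shift from sample-wise distance to statistical comparison.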
Figure 8: Dynamics variables driving audio for a single object dropped in a wok. Shown are the normal force, the sliding speed, the rolling speeds (speed of the contact point with respect to the surface) on both objects, and the impact force. The motion (also shown on the video) clearly shows that the object first bounces several times, then performs a roll-slide motion on the curved surface of the wok, and finally wobbles around at the bottom.

Second, even for reproducible events such as single impacts (for example, striking a bell) there is no obvious metric with which to compare synthesized and real sound. Obvious measures such as the least-squares distance of the sampled audio are not satisfactory because they do not correspond at all to perception. We believe constructing such a measure would require substantial research on audio perception, and would significantly enhance the power of the methods presented here.

6 Conclusions

We have described a collection of methods to automatically generate high-quality, realistic contact sounds driven by physical parameters obtained from a dynamics simulation with contacts, using physically motivated sound synthesis algorithms integrated with the simulation. Once the model parameters are defined, the sounds are created automatically. This enables the user of an interactive simulation to experience the realistic, responsive auditory feedback expected in real life when touching, sliding, or rolling objects.

Solid objects were modeled by modal resonance banks, and realistic timbre changes depending on the location of the interactions were demonstrated. Because the contact forces which excite the modal resonance models need to be computed at different rates, the "dynamics-force" and the "audio-force" have to be modeled with different models, and we have presented several algorithms for impact, sliding, and rolling forces, using physically based models of surface textures and contact interactions designed specifically for audio synthesis.

We have implemented the algorithms using a dynamics simulator and an audio synthesis package written in Java. The audio synthesis and simulation can run in real time on desktop computers without special hardware.

We believe these algorithms have the potential to dramatically increase the feeling of realism and immersion in interactive simulations by providing high-quality audio effects: these supply important perceptual cues which, combined with high-quality graphics and dynamics simulation, create a much more compelling illusion of reality than the sum of their individual contributions.

References

[1] T. Ananthapadmanaban and V. Radhakrishnan. An investigation on the role of surface irregularities in the noise spectrum of rolling and sliding contacts. Wear, 83:399–409, 1982.

[2] D. R. Begault. 3-D Sound for Virtual Reality and Multimedia. Academic Press, London, 1994.

[3] W. Buxton. Introduction to this special issue on nonspeech audio. Human Computer Interaction, 4(1):1–9, 1989.

[4] W. Buxton. Using our ears: an introduction to the use of nonspeech audio cues. In E. Farrell, editor, Extracting Meaning from Complex Data: Processing, Display, Interaction. Proceedings of the SPIE, volume 1259, pages 124–127, 1990.

[5] A. Chaigne and V. Doutaut. Numerical simulations of xylophones. I. Time domain modeling of the vibrating bars. J. Acoust. Soc. Am., 101(1):539–557, 1997.

[6] A. Chaudhary, A. Freed, S. Khoury, and D. Wessel. A 3-D graphical user interface for resonance modeling. In Proceedings of the International Computer Music Conference, Ann Arbor, 1998.

[7] P. R. Cook. Integration of physical modeling for synthesis and animation. In Proceedings of the International Computer Music Conference, pages 525–528, Banff, 1995.

[8] P. R. Cook. Physically informed sonic modeling (PhISM): Percussive synthesis. In Proceedings of the International Computer Music Conference, pages 228–231, Hong Kong, 1996.

[9] P. R. Cook and G. Scavone. The Synthesis ToolKit (STK), version 2.1. In Proceedings of the International Computer Music Conference, Beijing, 1999.

[10] D. DiFilippo and D. K. Pai. The AHI: An audio and haptic interface for contact interactions. In UIST '00 (13th Annual ACM Symposium on User Interface Software and Technology), 2000.

[11] A. Freed and X. Depalle. Synthesis of hundreds of sinusoidal partials on a desktop computer without custom hardware. In Proceedings of the International Conference on Signal Processing Applications and Technology, Santa Clara, 1993.

[12] D. J. Fried. Auditory correlates of perceived mallet hardness for a set of recorded percussive sound events. J. Acoust. Soc. Am., 87(1):311–321, 1990.
[13] T. A. Funkhouser, P. Min, and I. Carlbom. Real-time acoustic modeling for distributed virtual environments. Proc. SIGGRAPH 99, ACM Computer Graphics, 1999.

[14] W. W. Gaver. Everyday listening and auditory icons. PhD thesis, University of California in San Diego, 1988.

[15] W. W. Gaver. Synthesizing auditory icons. In Proceedings of the ACM INTERCHI 1993, pages 228–235, 1993.

[16] W. W. Gaver. What in the world do we hear?: An ecological approach to auditory event perception. Ecological Psychology, 5(1):1–29, 1993.

[17] J. K. Hahn, H. Fouad, L. Gritz, and J. W. Lee. Integrating sounds and motions in virtual environments. In Sound for Animation and Virtual Reality, SIGGRAPH 95 Course 10 Notes, 1995.

[18] D. J. Hermes. Synthesis of the sounds produced by rolling balls. Internal IPO report no. 1226, IPO, Center for User-System Interaction, Eindhoven, The Netherlands, 2000.

[19] D. P. Hess and A. Soom. Normal vibrations and friction under harmonic loads. II. Rough planar contacts. Transactions of the ASME, Journal of Tribology, 113:87–92, 1991.

[20] M. M. J. Houben, D. J. Hermes, and A. Kohlrausch. Auditory perception of the size and velocity of rolling balls. In IPO Annual Progress Report, volume 34, 1999.

[21] D. Jaffe. Ten criteria for evaluating synthesis and processing techniques. Computer Music Journal, 19(1):76–87, 1995.

[22] K. L. Johnson. Contact Mechanics. Cambridge University Press, Cambridge, 1985.

[23] R. L. Klatzky, D. K. Pai, and E. P. Krotkov. Hearing material: Perception of material from contact sounds. PRESENCE: Teleoperators and Virtual Environments, 9(4):399–410, 2000.

[24] M. V. Mathews. The Technology of Computer Music. MIT Press, Cambridge, 1969.

[25] J. D. Morrison and J.-M. Adrien. Mosaic: A framework for modal synthesis. Computer Music Journal, 17(1), 1993.

[26] P. R. Nayak. Contact vibrations. Journal of Sound and Vibration, 22:297–322, 1972.

[27] D. D. Nelson, D. E. Johnson, and E. Cohen. Haptic rendering of surface-to-surface sculpted model interaction. In Proceedings of the ASME Dynamic Systems and Control Division, volume DSC-Vol. 67, pages 101–108, 1999.

[28] J. F. O'Brien, P. R. Cook, and G. Essl. Synthesizing sounds from physically based motion. In SIGGRAPH 01, 2001.

[29] D. K. Pai, U. M. Ascher, and P. G. Kry. Forward dynamics algorithms for multibody chains and contact. In Proceedings of the 2000 IEEE International Conference on Robotics and Automation, pages 857–863, 2000.

[30] D. K. Pai, K. van den Doel, D. L. James, J. Lang, J. E. Lloyd, J. L. Richmond, and S. H. Yau. Scanning physical interaction behavior of 3D objects. In Computer Graphics (ACM SIGGRAPH 2001 Conference Proceedings), 2001.

[31] X. Rodet. Musical sound signal analysis/synthesis: Sinusoidal + residual and elementary waveform models. In IEEE Time-Frequency and Time-Scale Workshop 97, Coventry, Great Britain, 1997.

[32] L. Savioja, J. Huopaniemi, T. Lokki, and R. Väänänen. Virtual environment simulation: Advances in the DIVA project. In Proc. Int. Conf. Auditory Display, Palo Alto, USA, 1997.

[33] R. S. Sayles and T. R. Thomas. Surface topography as a non-stationary random process. Nature, 271:431–434, 1978.

[34] X. Serra. A System for Sound Analysis/Transformation/Synthesis Based on a Deterministic Plus Stochastic Decomposition. PhD thesis, Dept. of Music, Stanford University, 1989.

[35] J. O. Smith. Physical modeling using digital waveguides. Computer Music Journal, 16(4):75–87, 1992.

[36] J. Stam. Evaluation of Loop subdivision surfaces. In SIGGRAPH 98, 1998. Included on course notes CD-ROM.

[37] K. Steiglitz. A Digital Signal Processing Primer with Applications to Digital Audio and Computer Music. Addison-Wesley, New York, 1996.

[38] D. Stoianovici and Y. Hurmuzlu. A critical study of the applicability of rigid-body collision theory. ASME Journal of Applied Mechanics, 63:307–316, 1996.

[39] T. Takala and J. Hahn. Sound rendering. Proc. SIGGRAPH 92, ACM Computer Graphics, 26(2):211–220, 1992.

[40] T. R. Thomas. Rough Surfaces. Imperial College Press, London, second edition, 1999.

[41] D. J. Thompson, D. F. Fodiman, and H. Mahe. Experimental validation of the TWINS prediction program for rolling noise, parts I and II. Journal of Sound and Vibration, 193:123–147, 1996.

[42] N. Tsingos, T. Funkhouser, A. Ngan, and I. Carlbom. Modeling acoustics in virtual environments using the uniform theory of diffraction. In SIGGRAPH 01, 2001.

[43] K. van den Doel. Sound Synthesis for Virtual Reality and Computer Games. PhD thesis, University of British Columbia, 1998.

[44] K. van den Doel and D. K. Pai. Synthesis of shape dependent sounds with physical modeling. In Proceedings of the International Conference on Auditory Displays 1996, Palo Alto, 1996.

[45] K. van den Doel and D. K. Pai. The sounds of physical shapes. Presence, 7(4):382–395, 1998.

[46] J. Wawrzynek. VLSI models for real-time music synthesis. In M. Mathews and J. Pierce, editors, Current Directions in Computer Music Research. MIT Press, 1989.