FoleyAutomatic
Figure 1: Animations for which sound effects were automatically added by our system, demonstrated in the accompanying video. (a) A real wok into which a pebble is thrown; the pebble rattles around the wok and comes to rest after wobbling. (b) A simulation of a pebble thrown into a wok, with all sound effects automatically generated. (c) A ball rolling back and forth on a ribbed surface. (d) Interaction with a sonified object.
the microcollisions take place within 15 ms for objects we studied. An impact audio-force composed of impulses at the first 4 modal resonance frequencies sounded very convincing and would be appropriate for hitting a large resonating object with a small hard object.
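As an illustration, here is a minimal sketch (not the implementation described in the text) of one way such an impact audio-force could be generated: a burst on the order of the quoted 15 ms containing energy at the first four modal frequencies. The raised-cosine envelope, the example mode frequencies, and the function name are assumptions made for the sketch.

```python
import numpy as np

def impact_audio_force(modal_freqs_hz, duration_s=0.015, fs=44100, amplitude=1.0):
    """Illustrative impact audio-force: a short burst whose energy sits at the
    first few modal resonance frequencies, shaped by a raised-cosine window so
    the force starts and ends at zero (duration and window are assumptions)."""
    n = int(duration_s * fs)
    t = np.arange(n) / fs
    window = 0.5 * (1.0 - np.cos(2.0 * np.pi * np.arange(n) / max(n - 1, 1)))
    force = np.zeros(n)
    for f in modal_freqs_hz[:4]:              # first 4 modal resonance frequencies
        force += np.cos(2.0 * np.pi * f * t)
    return amplitude * window * force / max(len(modal_freqs_hz[:4]), 1)

# Hypothetical mode frequencies for a resonating object struck by a small hard object.
f_impact = impact_audio_force([180.0, 414.0, 811.0, 1210.0])
```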
Interaction Models
Modal models are a useful class of models, but by themselves they are not sufficient. For realistic contact sounds we need good, physically-based models of both the resonators and the contact interactions. The simulation of the contact interactions needs to be time stepped at least at the audio sampling rate, which is generally much higher than the simulation rate or the graphics frame rate. Therefore the models need to be fast and simple. Often stochastic models are appropriate, since most contact interactions are too complex to be modeled deterministically in detail.

A simple and effective model of scraping a surface is the phonograph needle model, whereby a needle exactly follows a given surface track. In order to generate the audio-force at a sampling rate of fs, we generate an effective surface profile at a spatial resolution of vmax/fs, where vmax is the maximum contact velocity for which the model is applicable. This ensures that at the maximum contact speed the surface profile can be sampled at the audio sampling rate. For a homogeneous surface this is easy to implement with a wave-table of a surface profile which is "played back" at the rate v/vmax for a contact moving at speed v, where a rate of one represents the audio-force at maximum speed. This wave-table can be constructed from a mathematical roughness model, as explained below, or obtained experimentally. The wave-table should be oversampled in order to decrease the reduction of quality on slowdown.

For resampling we use a linear interpolation algorithm. We also experimented with a computationally more expensive quadratic interpolation scheme (or even more sophisticated algorithms to filter out artifacts), but this offered no audible improvement in practice.
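A minimal sketch of this wave-table playback, assuming an oversampled profile array and a per-frame rendering call; the class and parameter names are ours, and the wrap-around read at the end of the table is an assumption rather than a detail given in the text.

```python
import numpy as np

class ScrapeWavetable:
    """Sketch of phonograph-needle playback: the stored surface profile is read
    at rate v/vmax, so the maximum contact speed plays it back sample-for-sample
    and slower contacts stretch it out in time."""

    def __init__(self, profile, v_max):
        self.profile = np.asarray(profile, dtype=float)   # oversampled surface profile
        self.v_max = float(v_max)
        self.phase = 0.0                                   # fractional read position

    def render(self, v, n_samples):
        """Generate n_samples of scraping audio-force for contact speed v <= v_max."""
        rate = v / self.v_max
        m = len(self.profile)
        out = np.empty(n_samples)
        for i in range(n_samples):
            i0 = int(self.phase)
            frac = self.phase - i0
            # linear interpolation between neighbouring profile samples
            out[i] = (1.0 - frac) * self.profile[i0] + frac * self.profile[(i0 + 1) % m]
            self.phase = (self.phase + rate) % m
        return out

# Example: a white-noise profile standing in for a measured or synthetic surface.
table = ScrapeWavetable(np.random.randn(44100), v_max=0.5)
frame = table.render(v=0.1, n_samples=882)   # one 20 ms frame at 44.1 kHz
```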
Figure 4: Sliding involves multiple micro-collisions at the contact area.

For Coulomb friction, F_friction = μ F_normal, the scraping audio-force volume is proportional to √(v F_normal), assuming the audio-force power is proportional to the frictional power loss F_friction · v = μ F_normal v.

Effective Profile Models

We want to create a simple synthetic model of a scraping force, which has the perceptual dimensions of roughness, contact speed, and average contact force.

We characterize the profile by noise with an overall spectral shape, on which one or more peaks are superimposed. The spectral peak frequencies are moved with the contact velocity and provide the perception of changing pitch at different speeds.

We implemented this by generating fractal noise, which is noise with a power spectrum proportional to ω^β, passed through a reson filter [37]. The parameter β provides a surface roughness parameter and is related to the fractal dimension D by D = β/2 + 2. Empirical data show [33, 40] that many surfaces are fractal over a wide range of scales. Values of D between 1.17 and 1.39 were reported for various machined surfaces at a length scale of 10^-6 m; at a CD-quality audio sampling rate of 44100 Hz this would correspond to a sliding speed of about 5 cm/s. The frequency of the reson is scaled with the contact velocity to produce the illusion of scraping at different speeds. The width of the resonance influences the simulated randomness of the surface, with narrow peaks sounding more pitched. Informal experiments suggest that the fractal dimension correlates well with the perception of auditory roughness.

We can obtain the fractal dimension from a recording of a scrape interaction with a contact microphone by fitting the power spectrum; see Fig. 5. The reson parameters were obtained by measuring the surface structure of objects at a resolution of 0.03 mm and making an autoregressive fit using the covariance method.

[Figure 5: measured power spectrum G(f) of a plastic surface scraped at slow and fast speeds, plotted against frequency f (10^1 to 10^5 Hz).]
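A sketch of this effective-profile synthesis, assuming NumPy and a standard two-pole reson; the base frequency, bandwidth, and velocity-to-pitch mapping are illustrative choices of ours, and β = -1.4 is picked only because it corresponds to D ≈ 1.3, inside the reported 1.17–1.39 range.

```python
import numpy as np

def fractal_noise(n, beta, fs=44100):
    """Noise with power spectrum proportional to omega**beta, made by reshaping
    the spectrum of white noise; beta relates to the fractal dimension via
    D = beta/2 + 2 as stated in the text."""
    white = np.random.randn(n)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    shape = np.zeros_like(freqs)
    shape[1:] = freqs[1:] ** (beta / 2.0)   # amplitude ~ f^(beta/2)  =>  power ~ f^beta
    return np.fft.irfft(spectrum * shape, n)

def reson(x, center_hz, bandwidth_hz, fs=44100):
    """Two-pole reson filter; a narrower bandwidth sounds more pitched."""
    r = np.exp(-np.pi * bandwidth_hz / fs)
    theta = 2.0 * np.pi * center_hz / fs
    a1, a2 = 2.0 * r * np.cos(theta), -r * r
    y = np.zeros_like(x)
    y1 = y2 = 0.0
    for i, xi in enumerate(x):
        y[i] = xi + a1 * y1 + a2 * y2
        y1, y2 = y[i], y1
    return y

def scrape_force(v, v_max, n, beta=-1.4, base_hz=800.0, bandwidth_hz=150.0):
    """Effective-profile scraping force: fractal noise through a reson whose
    centre frequency scales with contact velocity (mapping is illustrative)."""
    return reson(fractal_noise(n, beta), base_hz * v / v_max, bandwidth_hz)
```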
4.3 Rolling

Like scraping forces, rolling forces are produced by irregularities of the surfaces in contact, but because the surfaces have no relative speed at the contact point, the contact interactions are different and this is reflected in a difference in sound.

Rolling seems to be a less well understood interaction as far as audio is concerned. Experimental studies of rolling sounds have focused mainly on very specific machine sounds like train wheels [41] and do not suggest a universal rolling model.

What exactly causes the perception of rolling versus sliding is also not known. Some studies [20] suggest that the periodicity of rolling sounds plays an important role in distinguishing it from sliding. However, we believe rolling sounds are different in other respects too.

Why do we hear anything when a smooth ball rolls on a rough table? A possible reason is that the surface of the ball rides on the asperities of the surface and constantly collides with them, but because of its large radius (with respect to the length scale of the roughness) the ball only "sees" the large-scale surface structure. The smaller the ball, the more small details are "felt" by the rolling. Because the collisions in rolling occur just in front of the contact area (a very small area), the downward motion of the ball is very small, so the collisions will be very soft, i.e., drawn out in time. See Fig. 6. This suggests that rolling interactions are similar to scraping interactions, but because of this size effect only the low-frequency content of the effective profile plays a role.

Figure 6: Rolling audio-force. The collision velocity vc is related to the contact region d as indicated.

This suggests a similar model as for scraping, but with an additional low-pass filter with adjustable cutoff frequency capturing the rolling quality.
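A minimal sketch of that additional step, assuming the scraping force is available as an array; the one-pole filter form and the 400 Hz cutoff are our own illustrative choices, and in the full model the cutoff would presumably be tied to the size and speed of the rolling object.

```python
import numpy as np

def one_pole_lowpass(x, cutoff_hz, fs=44100):
    """One-pole low-pass with adjustable cutoff: only the low-frequency content
    of the effective profile survives, which is what separates the rolling force
    from the sliding force in this model."""
    a = np.exp(-2.0 * np.pi * cutoff_hz / fs)
    y = np.empty(len(x))
    prev = 0.0
    for i, xi in enumerate(x):
        prev = (1.0 - a) * xi + a * prev
        y[i] = prev
    return y

# Stand-in scraping force (white noise); in the full model this would be the
# effective-profile audio-force generated for the current contact speed.
rolling_force = one_pole_lowpass(np.random.randn(4410), cutoff_hz=400.0)
```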
Though this simple model provides reasonably convincing rolling sounds, they are not as convincing as the scraping and sliding sounds. Analysis of recorded rolling sounds suggested that the rolling force couples more strongly to the modes than the sliding force. This would mean the linear model of an independent audio-force being applied to a linearly vibrating object is no longer applicable, as the rolling audio-force seems to "know" the modes of the object. We speculate this is because the surfaces are not moving relative to each other and are therefore in stronger contact than during sliding, which could generate a kind of feedback mechanism leading to this effect.

This observation is consistent with the finding in [18] that a gamma-tone model with γ = 2 driven by noise generates better rolling sounds than a pure modal model. A γ = 2 model driven by noise has the same spectrum as a reson filter driven by noise with spectral envelope S(ω) = 1/√((ω − ρ)² + d²), with ρ and d the frequency and damping of the reson. This spectral envelope is enhanced near the object's resonance modes, as we observed in data.

We have experimented with this real-time spectral modification of the rolling force to obtain what appear to be better rolling sounds, at the price of the extra computation required for the filtering of the audio-force. Further study is needed to carefully evaluate the improvement.
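A sketch of one way to apply such a spectral modification in real time, assuming the object's modal frequencies ρ and dampings d are available from the modal model; mixing reson-filtered copies of the force realizes the 1/√((ω − ρ)² + d²) envelope around each mode, and the mix amount is an illustrative parameter of ours.

```python
import numpy as np

def enhance_near_modes(force, mode_freqs_hz, mode_dampings_hz, mix=0.5, fs=44100):
    """Emphasise the rolling audio-force near the object's resonance modes by
    mixing in reson-filtered copies of it; each reson realises the spectral
    envelope 1/sqrt((w - rho)^2 + d^2) around one mode (rho, d)."""
    force = np.asarray(force, dtype=float)
    shaped = np.zeros_like(force)
    for rho, d in zip(mode_freqs_hz, mode_dampings_hz):
        r = np.exp(-np.pi * d / fs)                 # pole radius from mode damping
        c1, c2 = 2.0 * r * np.cos(2.0 * np.pi * rho / fs), -r * r
        y1 = y2 = 0.0
        for i, xi in enumerate(force):
            y = xi + c1 * y1 + c2 * y2
            shaped[i] += y
            y1, y2 = y, y1
    shaped /= max(len(mode_freqs_hz), 1)
    return (1.0 - mix) * force + mix * shaped

# Example with hypothetical mode data; in practice these come from the modal model.
modified = enhance_near_modes(np.random.randn(4410), [180.0, 414.0, 811.0], [3.0, 5.0, 8.0])
```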
5 Results
We have implemented the ideas presented above in our FoleyAutomatic system. An audio synthesis toolkit was developed in pure Java. The toolkit consists of three layers of software objects: a filter-graph layer which provides objects to build unit-generator graphs [24], a generator layer with basic audio processing blocks such as wave-tables and filters, and a model layer which contains implementations of the physical models described above. For efficiency, the audio processing elements are all vectorized, i.e., they process audio in frames (typically 20 ms blocks) of samples rather than on a sample-by-sample basis as in, for example, [9]. On a 900 MHz Pentium III we found that we can synthesize about 800 modes at a sampling rate of 44100 Hz, good enough for about 10 reasonably complex objects, with enough cycles to spare for the graphics and the dynamics.
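For illustration, a small Python/NumPy sketch of the kind of frame-based, vectorized modal synthesis described here (the actual toolkit is written in Java); the class name and the example mode parameters are ours.

```python
import numpy as np

class ModalBank:
    """Sketch of a vectorized modal resonator bank: each mode is a two-pole
    resonator y[n] = 2 r cos(theta) y[n-1] - r^2 y[n-2] + gain * x[n]; the
    per-sample update runs as array operations over all modes at once, and
    audio is rendered one frame (e.g. 20 ms) at a time."""

    def __init__(self, freqs_hz, dampings_hz, gains, fs=44100):
        freqs = np.asarray(freqs_hz, dtype=float)
        damp = np.asarray(dampings_hz, dtype=float)
        r = np.exp(-np.pi * damp / fs)
        self.c1 = 2.0 * r * np.cos(2.0 * np.pi * freqs / fs)
        self.c2 = -r * r
        self.gains = np.asarray(gains, dtype=float)
        self.y1 = np.zeros_like(self.c1)
        self.y2 = np.zeros_like(self.c1)

    def render(self, force_frame):
        """Drive all modes with one frame of audio-force and mix them down."""
        out = np.empty(len(force_frame))
        for n, x in enumerate(force_frame):
            y = self.c1 * self.y1 + self.c2 * self.y2 + self.gains * x
            out[n] = y.sum()
            self.y2, self.y1 = self.y1, y
        return out

# Example: a hypothetical 50-mode object excited by one 20 ms noise frame.
rng = np.random.default_rng(0)
bank = ModalBank(freqs_hz=rng.uniform(100, 4000, 50),
                 dampings_hz=rng.uniform(1, 20, 50),
                 gains=rng.uniform(0.1, 1.0, 50))
audio = bank.render(rng.standard_normal(882))   # 882 samples = 20 ms at 44.1 kHz
```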
Our dynamics simulation is implemented in Java and uses Java3D for the graphics rendering. We use Loop subdivision surfaces to describe the boundaries of objects in our simulations. Sphere trees detect collisions between the surfaces. The simulator runs in real time for the relatively simple examples shown on the accompanying video, and can also record simulations along with audio parameters to be played back later in real time for sound design and higher quality graphics rendering.

Fig. 1(d) shows real-time interaction with a model of a bell. Using a PHANToM haptic interface we can tap and scrape the sides of the bell with a virtual screwdriver. Audio for the bell is generated with the 20 largest modes and sounds very convincing.

We created a detailed dynamic simulation of a pebble in a metal wok that can bounce around, roll, and slide. A modal model of the wok was created from measurements of a real wok. It would be nice to use a scientific method to objectively select parameters like the density of the surface sampling or the number of modes rendered, but this is not possible due to the lack of objective criteria to determine how well a given synthetic sound is perceived as approximating a target sound. Lacking such a perceptual metric, we had to resort to "earballing" the approximations, with user input. The construction of such a measure would greatly advance the state of the art of our models.

The impulse response was recorded at 5 locations at equidistant radial points, and a modal model of 50 modes was extracted (of which only 10 contribute significantly). A set of 5 gain vectors a (of dimension 50) was obtained. We display the "brightness" of the sound-map, as measured by the frequency-weighted average of the gains, in Fig. 7. The surface roughness was measured with a contact mike and was used to construct audio surface profiles. Our dynamics simulator was set up to simulate a small, irregularly shaped rock being thrown into the wok. No more than one contact could occur at a time. This simulation was then used to drive the audio synthesis and graphics display simultaneously. In Fig. 8 we show the dynamics variables driving the audio. Various other examples are shown in the accompanying video, and demonstrate impacts, sliding, and rolling.
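One plausible reading of that frequency-weighted average, written out as a small helper; the exact weighting used to colour the sound map of Fig. 7 is not specified in the text, so this is an assumption.

```python
import numpy as np

def brightness(gains, mode_freqs_hz):
    """'Brightness' at one contact location: frequency-weighted average of the
    (absolute) modal gains, so locations that excite high modes read as bright."""
    g = np.abs(np.asarray(gains, dtype=float))
    f = np.asarray(mode_freqs_hz, dtype=float)
    return float(np.sum(f * g) / np.sum(g))

# Example with a hypothetical 50-mode gain vector measured at one location.
rng = np.random.default_rng(1)
b = brightness(rng.uniform(0.0, 1.0, 50), rng.uniform(100.0, 4000.0, 50))
```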
Figure 7: Sound map on wok. Visual brightness is mapped onto audio brightness.

Figure 8: Dynamics variables driving audio for a single object dropped in a wok. Shown are the normal force, the sliding speed, the rolling speeds (speed of the contact point w.r.t. the surface) on both objects, and the impact force. The motion (also shown on the video) clearly shows that the object first bounces several times, then performs a roll-slide motion on the curved surface of the wok, and finally wobbles around at the bottom.

5.1 Discussion

Although our audio algorithms are physically well motivated, they are not derived using first principles from known physical laws. The physical phenomena responsible for the generation of contact sounds, such as the detailed micro-collision dynamics at the interface of sliding surfaces, are so complex that such an approach to interactive simulation seems infeasible.

We evaluated the quality of our generated audio informally, by comparing the animation with reality. See, for example, the video of the real versus the animated pebble thrown in a wok. As there are currently no existing real-time audio synthesis systems to compare our results with, our evaluation remains qualitative.

Ultimately, one would like to assign an audio quality metric to the synthesized sound effects which measures how good the audio model is. There are two obstacles preventing us from constructing such a measure. First, phenomena such as an object falling and rolling have a sensitive dependency on initial conditions, and it is impossible to compute the exact trajectories using Newton's laws. In order to compare the real sound with the synthesized sound one therefore has to somehow extract the perceptually relevant statistical properties of complex sounds. This is an unsolved and important problem.
[...] desktop computers without special hardware. We believe these algorithms have the potential to dramatically increase the feeling of realism and immersion in interactive simulations, by providing high-quality audio effects with important perceptual cues which, combined with high-quality graphics and dynamics simulation, create a much more compelling illusion of reality than the sum of their contributions.

References

[3] W. Buxton. Introduction to this special issue on non-speech audio. Human Computer Interaction, 4(1):1–9, 1989.

[4] W. Buxton. Using our ears: an introduction to the use of nonspeech audio cues. In E. Farrell, editor, Extracting meaning from complex data: processing, display, interaction, Proceedings of the SPIE, volume 1259, pages 124–127, 1990.
[18] D. J. Hermes. Synthesis of the sounds produced by rolling balls. Internal IPO report no. 1226, IPO, Center for User-System Interaction, Eindhoven, The Netherlands, 2000.

[19] D. P. Hess and A. Soom. Normal vibrations and friction under harmonic loads. II. Rough planar contacts. Transactions of the ASME, Journal of Tribology, 113:87–92, 1991.

[20] M. M. J. Houben, D. J. Hermes, and A. Kohlrausch. Auditory perception of the size and velocity of rolling balls. In IPO Annual Progress Report, volume 34, 1999.

[21] D. Jaffe. Ten criteria for evaluating synthesis and processing techniques. Computer Music Journal, 19(1):76–87, 1995.

[22] K. L. Johnson. Contact Mechanics. Cambridge University Press, Cambridge, 1985.

[23] R. L. Klatzky, D. K. Pai, and E. P. Krotkov. Hearing material: Perception of material from contact sounds. PRESENCE: Teleoperators and Virtual Environments, 9(4):399–410, 2000.

[24] M. V. Mathews. The Technology of Computer Music. MIT Press, Cambridge, 1969.

[25] J. D. Morrison and J.-M. Adrien. Mosaic: A framework for modal synthesis. Computer Music Journal, 17(1), 1993.

[26] P. R. Nayak. Contact vibrations. Journal of Sound and Vibration, 22:297–322, 1972.

[27] D. D. Nelson, D. E. Johnson, and E. Cohen. Haptic rendering of surface-to-surface sculpted model interaction. In Proceedings of the ASME Dynamic Systems and Control Division, volume DSC-Vol. 67, pages 101–108, 1999.

[28] J. F. O'Brien, P. R. Cook, and G. Essl. Synthesizing Sounds from Physically Based Motion. In SIGGRAPH 01, 2001.

[29] D. K. Pai, U. M. Ascher, and P. G. Kry. Forward Dynamics Algorithms for Multibody Chains and Contact. In Proceedings of the 2000 IEEE International Conference on Robotics and Automation, pages 857–863, 2000.

[34] X. Serra. A System for Sound Analysis/Transformation/Synthesis Based on a Deterministic Plus Stochastic Decomposition. PhD thesis, Dept. of Music, Stanford University, 1989.

[35] J. O. Smith. Physical modeling using digital waveguides. Computer Music Journal, 16(4):75–87, 1992.

[36] J. Stam. Evaluation of Loop subdivision surfaces. In SIGGRAPH 98, 1998. Included on course notes CD-ROM.

[37] K. Steiglitz. A Digital Signal Processing Primer with Applications to Digital Audio and Computer Music. Addison-Wesley, New York, 1996.

[38] D. Stoianovici and Y. Hurmuzlu. A critical study of the applicability of rigid-body collision theory. ASME Journal of Applied Mechanics, 63:307–316, 1996.

[39] T. Takala and J. Hahn. Sound rendering. Proc. SIGGRAPH 92, ACM Computer Graphics, 26(2):211–220, 1992.

[40] T. R. Thomas. Rough Surfaces. Imperial College Press, London, second edition, 1999.

[41] D. J. Thompson, D. F. Fodiman, and H. Mahe. Experimental validation of the TWINS prediction program for rolling noise, parts I and II. Journal of Sound and Vibration, 193:123–147, 1996.

[42] N. Tsingos, T. Funkhouser, A. Ngan, and I. Carlbom. Modeling acoustics in virtual environments using the uniform theory of diffraction. In SIGGRAPH 01, 2001.

[43] K. van den Doel. Sound Synthesis for Virtual Reality and Computer Games. PhD thesis, University of British Columbia, 1998.

[44] K. van den Doel and D. K. Pai. Synthesis of shape dependent sounds with physical modeling. In Proceedings of the International Conference on Auditory Displays 1996, Palo Alto, 1996.

[45] K. van den Doel and D. K. Pai. The sounds of physical shapes. Presence, 7(4):382–395, 1998.

[46] J. Wawrzynek. VLSI models for real-time music synthesis. In M. Mathews and J. Pierce, editors, Current Directions in Computer Music Research. MIT Press, 1989.