Spectral Processing for
Ambient Film Scoring and Sound Design
Table of Contents
Part III: Current FFT-based Software Applications for the Mac
• The Phase Vocoder for Time-Stretching and Pitch Shifting
• Cross Synthesis (Convolution and Morphing)
• Convolution Reverb
• Frequency Band Manipulation and Filtering
• Granular Synthesis
• Other Applications for Spectral Processing
Part I: Introduction - Computer Music and the FFT in Film
In 1965, the discovery and subsequent publication of the fast Fourier transform (FFT) by James W. Cooley and John W. Tukey were instrumental events in the establishment of the field of digital signal processing (DSP). Of particular significance was the fact that the implementation of the FFT algorithm, for the first time, validated the use of computers for accomplishing long and complex tasks. In fact, while the very technique of implementing the FFT as described by Cooley and Tukey had previously been discovered and used by German mathematician Carl Friedrich Gauss (1777-1855) in his own work, it "was largely forgotten because it lacked the tool to make it practical: the digital computer." What makes Cooley and Tukey's discovery most memorable is the fact that "they discovered the FFT at the right time," namely at the beginning of the computer age.

Alan Oppenheim, professor at MIT, recalls, "The excitement really got generated when the Cooley-Tukey paper came out. When the FFT hit, then there was a big explosion, because then you could see that by using a computer you could do some things incredibly efficiently. You could start thinking about doing things in real time..." The birth of the DSP field can be traced to this moment.
However, while spectral processing applications were not to be widely available for use in digital (music) applications until many years after the discovery of the FFT by Cooley and Tukey, the framework was quickly falling into place. By the mid-1960s, research concerning music and computers had been underway for almost ten years at facilities such as Bell Laboratories. In fact, it had been reported that music was generated by computer in Australia as early as 1951: "the CSIR Mk1, and later known as CSIRAC, was one of the world's earliest stored-program electronic digital computers" and was programmed to play popular musical melodies. [4] Later, in the form of a 17-second piece, the first computer-music composition was generated on an IBM 704 computer in 1957 by Max Mathews of Bell Laboratories. These milestones marked the beginnings of a quickly evolving computer music community nurtured by the academic and scientific communities.
As the fledgling DSP and computer-music communities continued to grow and develop through the 1960s, the Hollywood motion picture industry during this time relied almost exclusively on traditional acoustic and analog techniques for realizing both film composition and sound design. Computer- (and FFT-) based applications for these purposes would not be available to composers for over twenty years to come. Although the ability to process digital audio signals (and algorithms such as the FFT) existed, computers were not yet fast enough to perform the many thousands of calculations per second required for practical audio work. Moreover, computers were not yet readily available to the mainstream and lacked general-purpose audio hardware and software.
Within the film industry, a small number of productions over time had featured newly available tools for added visual and aural effect. Specifically, with regard to sound, as early as the 1940s, film productions had been known to utilize unconventional electronic musical instruments and sound effects for a variety of compositional and sound design purposes.
The first example of a Hollywood motion picture featuring the use of an electronic musical instrument can be heard in 1945, as part of the soundtrack for Alfred Hitchcock's film "Spellbound." Miklos Rozsa's use of the theremin, with its unsteady, eerie electronic tone, heightened the psychological tension of specific scenes in the film. Following in 1956, the science fiction thriller "Forbidden Planet" became the first major motion picture to feature an all-electronic film score. Louis and Bebe Barron composed the score using only a tape recorder that they had received as a wedding gift. Louis Barron also reportedly "built electronic circuits, which he manipulated to generate sounds." [6] Next, in 1963, Alfred Hitchcock's "The Birds" featured the use of the "Trautonium, a novel musical instrument that produced many of the sound effects." The Trautonium, a "precursor to the synthesizer," was "billed as the world's first electronic musical instrument on its invention in 1929." [53] Finally, Jerry Goldsmith's memorable score for "Planet of the Apes" in 1968 was also important for its use of an Echoplex, which created percussive delay effects from pizzicato strings. [7]
During the late 1960s, synthesizers became important musical composition and performance tools. Factors that led to the synthesizer's continued use in film scores included its gained popularity in the pop industry as well as its relative cost-effectiveness against that of hiring a 60-piece (or larger) orchestra. [8] This phenomenon led to numerous electronic film scores, [34] such as the award-winning score for Giorgio Moroder's 1978 classic "Midnight Express," which featured the use of the synthesizer for the creation of ambient and electronic music. With the proliferation and abundance of synthesizers and later MIDI, it soon became a common occurrence for a film score to be realized partly or entirely with electronic instruments.
Additionally during the 1960s, a specific type of synthesizer called the vocoder came into musical use, one that relied on a crude form of spectral analysis. The vocoder synthesizer was first used compositionally in the 1971 film "A Clockwork Orange" and additionally for sound-design purposes in films such as the 1977 classic "Star Wars." As an application, a vocoder can be defined as "an electronic device for analyzing and" synthesizing speech. It was originally developed at Bell "Laboratories in Murray Hill, New Jersey, for telephonic applications" and later used by musicians. Early analog vocoders were constructed using capacitors and inductors [10] to divide up the incoming signal into between ten and twenty frequency bands before processing and re-synthesis of the signal occurred. While early vocoder applications were not designed to use an FFT as part of their implementation, future software implementations would incorporate the FFT as a technique for dividing the incoming signal into as many as 512 or 1024 frequency bands.

As a musical application, the vocoder was first incorporated into synthesizers such as the Siemens Synthesizer around 1960, and was further implemented in popular models such as the Korg VC-10, Roland VP-330, and Moog Vocoder. [11]
Computer Music Comes of Age
Through the 1970s, computer music programs were written for large mainframe computers in variants of assembly language and Fortran. This meant that there was no portability: if a program was written in the assembly language of one of these computers, and that computer ceased to be further developed, the language faced the possibility of becoming obsolete. A program written for an obsolete language would then require the potentially monumental task of a complete re-write of the code. However, it became clear by 1984 that "microprocessors would eventually become the affordable" source of "machine power, that un-ported assembler code would lose its usefulness, and that ANSI C would become the lingua franca." [12] Tom Erbe recalls that up until 1987, he was mainly "working with mainframes – these million dollar computers running long signal" processing programs.
In 1986, Csound, a Music V variant, written by Barry Vercoe of MIT, and the Max
program, a similar application developed at IRCAM by Miller Puckette for working with
audio and MIDI, were written in the C language and publicly released. A graphical
version of Max would later be released for the Macintosh and would be MIDI
compatible. [13]
In 1987, Apple introduced the Mac II. This would mark the first time that a consumer-
level personal computer would ship with the required level of CPU power needed to
calculate DSP algorithms such as software utilizing the FFT. This would quickly lead to
support for and commercial release of numerous sound/music production and editing applications. Spectral tools packaged within music production software such as ProTools (released as Sound Tools by Digidesign in 1989), Logic (1992) and Performer would soon provide composers and sound designers with an entire set of tools for performing a number of audio-based spectral processing tasks.
It was not until 1991, however, that two important applications would be released on a
consumer level for the PC or Macintosh that would include a number of spectral
processing algorithms. First, Csound would once again be ported, this time to Microsoft
DOS and available for purchase. Of particular significance was that this version of
Csound introduced the inclusion of spectral data types for sensing and analyzing audio
input. [14] Additionally, Tom Erbe’s SoundHack would be released as freeware for the
Macintosh platform.
With the availability of these and other applications specifically designed to run on personal computers, the potential for artistic creativity and editing with audio-based software had reached a new plateau: composers and sound designers now had direct and quick access to spectral processing tools for creating and manipulating sound.
Spectral Processing In Major Motion Pictures
The first notable example where FFT-based processing was used as a technique for ambient film scoring can be taken from Jeff Rona's compositions for a 1997 motion picture, in which spectral techniques such as time-stretching were used. Software applications that were heavily utilized include Csound, Logic and Max/MSP. [J. Rona, personal communication, May 2005]
Additionally, a number of more-recent films such as Exit Wounds, Black Hawk Down,
Mothman Prophecies, Traffic, Narc and the TV series The Dead Zone include musical
scores suggesting the use of spectral processing techniques. [33] For example, composer
and sound designer Tobias Enhus assisted in composing the score to Narc in 2002 by
processing struck-metal source material: turbines, metal sheets, steel drums, and even the
suspended back end of a fork lift. Starting from these non-harmonic sources, Tobias used
a spectrum editor and tuned filter banks to create atmospheres that matched the key of the
musical score. [16] Jeff Rona, who did significant film scoring for Black Hawk Down,
Mothman Prophecies, Traffic as well as television programs such as The Dead Zone
states, “I've used … Logic, Reaktor, Max/MSP, Kyma, Peak, SoundHack, and every
plug-in I can get my hands on.” [J. Rona, personal communication, May 2005] All of
these programs mentioned above offer various types of spectral processing functions.
Just when it seemed that the full potential of spectral processing had been tapped, in 1997 a company called Opcode released a virtual software instrument/effect unit plug-in called "Fusion: Vocode" that would run on either a Macintosh or PC platform inside a number of host applications. [17] Fusion: Vocode not only provided basic spectral processing
(most likely the 1936 non-FFT-based algorithm), but also presented a case where a software instrument could now replace its soon-to-be-obsolete hardware counterpart. For
example, film-composer Rob Arbittier in 1998 reported the benefits of this: "I mostly use
plug-ins as outboard effects toys…Opcode Fusion: Vocode can do some really cool
effects, I usually send a sound to it, make a new .WAV file out of it, and then use that file
as an element in the Cubase sequence. I used to bring a sound in, manipulate it, and then
send it back out to a sampler: now I can just do it all in my computer." [18]
In March 1999, Cycling74, using objects from their already available Max/MSP
architecture, released a set of 74 VST plug-ins bundled into a single offering called
Pluggo. Included in this release were spectral modification plug-ins. This event presented the first instance in which a plug-in running inside a popular digital audio workstation could perform a real-time FFT on an incoming audio signal, apply spectral processing based on user input, and then transform the result back into a time-domain output signal. For the first time, composers and sound designers were able to take full advantage of the tremendous potential of the FFT algorithm, thanks to the significant boosts in available CPU power.
The Kyma workstation from Symbolic Sound is a visual sound design environment that
runs alongside a dedicated processing computer called the Capybara and has been used in
many types of sound media including film since the 1990s. Using a number of its included spectral tools while working on the film "Finding Nemo," sound designer Gary Rydstrom states: "Kyma allowed me to modulate and morph my own voice into other sounds." [31] He explains, "A lot of the ocean ambience
was just me making a sound and using that to modulate a whole variety of sounds so I
could shape them into tonalities that I liked. None of my actual voice sounds are heard.
But, I could run sounds through the Kyma Vocoder and shape the sound of the water into
something interesting." [31] Rydstrom adds, "By growling into the microphone, I could use the Kyma to create the sound of the 'growling' water rush as Bruce, the shark, charges through the water."
In 2001, Native Instruments announced their first real-time effects plug-in called Spektral Delay for the Macintosh and Windows operating systems. Spektral Delay, likely
the first commercial plug-in to perform complex spectral operations in real-time, would
reportedly split a channel into a maximum of 160 separately modifiable frequency bands,
each of which would have individual delay and feedback settings, an input attenuation
filter, and the ability to apply modulation effects to the various parameters. Spektral
Delay would not actually be released until 2003, the same year in which a number of
other plug-ins and stand-alone applications performing real-time spectral processing reached the market.
While only a few years have passed since real-time spectral processing tools for audio manipulation became available inside today's digital audio workstations, the FFT has come to the forefront of digital audio via full-blown software applications such as Sony's Acid and Ableton Live, which offer advanced real-time time-stretching and pitch-shifting algorithms to control and match any desired tempo for loop-based composition.
As a topic for this graduate project, I will discuss the inner workings of the FFT, closely examine current FFT-based spectral processing algorithms, examine aesthetics behind spectral processing in motion picture films, and discuss personal, aesthetic and technical choices as a film composer-sound designer using such tools in Erik Ryerson's graduate thesis film, "Red Letters." The sections that follow also develop the mathematical background on which these tools rest.

Part II: The Mathematics of the Fourier Transform

The groundwork for spectral analysis was laid by French mathematician Jean Baptiste Joseph Fourier, who showed that series of sine and "cosine terms can be used to analyze heat conduction in solid bodies." [36] More generally, "he made the remarkable discovery that any periodic waveform can be" represented as a sum of sinusoids, each having its own amplitude, frequency "and phase." [1] Additionally, Fourier "derived a mathematical expression that" allows a waveform to be converted into a spectrum "showing the amplitudes and phases of the sinusoids that comprise it." This was likely the first formal statement of what we now call the Fourier transform.

The sections below will explore the mathematical concepts behind the Fourier transform, beginning with the trigonometric functions on which it rests.
Trigonometric Functions – The Sine Wave
Before taking an in-depth look at the Fourier transform, it will first be necessary to define some of the trigonometric functions that form the basis of Fourier analysis. We begin with π, the ratio of the circumference of a circle over its diameter. The radius of a circle is defined as half its diameter. There are therefore 2π radians in a full circle, where the radian is a natural measure for angles based on π. To represent a right angle (90 degrees of a possible 360), we can take one quarter of 2π, or π/2 radians.

(Figure 1)
Next, we will define the sine of an angle θ as “the ratio of the length of the side of the
right triangle opposite angle θ (O) to the length of its hypotenuse (H).” [1] This can be
expressed as follows:
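sin(θ) = O / H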
The above expression allows us to generalize the measure of an angle as it increases counterclockwise from 0 degrees on the positive horizontal axis (x) of the circle (see figure 2 below). When the opposite side (O) of the angle θ points upward from the
figure 2 below). When the opposite side (O) of the angle θ points upward from the
horizontal axis (between 0 and 180 degrees), its length is expressed as a positive number.
However, when it points downward (for angles measuring between 181 degrees and 359
degrees), its length can be expressed as negative. [1] By graphing the sine of angle θ as it
travels around a circle (as illustrated in figure 2 below), we can see that our angle traces one full cycle of a sinusoidal curve as the angle increases from zero degrees to a full circle (360 degrees or 2π radians).

(Figure 2: the opposite side (O) of the angle θ traveling around circle from 0º)
While we have now seen how to trace a sinusoidal waveshape, in order to create a sinusoidal waveform we must factor in a unit of time, represented in seconds (t). This allows us the ability to further graph our sine function in terms of frequency (f), defined as the number
of times O travels around the circle per second (measured in Hertz). Now, if the circle's radius is given an arbitrary length of one (known as a unit circle), the quantity 2πt sweeps once around the circle as t goes from zero to one, giving us a direct relationship between time and angle.

It can therefore be observed that the quantity 2πft goes through exactly f cycles each time t increases by one unit. This can be viewed as a sinusoidal waveform and is represented by the expression sin(2πft).
Before accomplishing the goal of fully representing an arbitrary sine wave, there are two final points that must be taken into consideration. The first is the starting point for the rotation of the waveform when t is equal to zero. This is known as the phase offset and is represented by the symbol φ (phi). When a sinusoid has a phase offset measuring one-quarter of a circle (90º or π/2 radians), it is known as a cosine wave.
Finally, in order to limit the peak amplitude of a sinusoid to A (a normal sine curve has a peak amplitude of one), we must consider a peak-amplitude scaling factor. With these final considerations in place, an arbitrary sinusoid can be expressed as:

f(t) = A sin(ωt + φ)

It should be noted that going forward, we substitute ω (the Greek letter omega), known as the radian frequency, for the quantity 2πf.
As detailed above, the sine and cosine waveforms are quite similar, the only exception
being a 90-degree phase offset. Based on these similarities, they can be extremely useful
when considered together since sine and cosine functions exhibit the properties of
oddness and evenness [1]. An odd function can generally be defined as that which can be
inverted when reversed. A sine is an odd function of angle θ. An even function can be
defined by its retrograde being identical to the original. A cosine is an even function of
angle θ. As will be discussed, these properties of sine and cosine functions are important to Fourier analysis. When a sine and a cosine of the same frequency are added together, the resulting waveform will also be sinusoidal in shape and have the same frequency, but with a potentially different amplitude and/or phase:
A sin(ωt + φ) = a cosωt + b sinωt
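Expanding A sin(ωt + φ) with the angle-sum identity shows how the two coefficients relate directly to amplitude and phase: a = A sin(φ) and b = A cos(φ).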
The above relationship shows that if the phase offset (φ) is equal to zero, then a will also be equal to zero and b will be equal to A. However, if φ equals π/2 radians (90 degrees), then a will equal A and b will equal zero, yielding a cosine wave (Figure 3).
Until now, only properties of analog or continuous waveforms have been discussed. A digital signal differs from an analog signal in that it consists of a discrete set of samples taken at linearly spaced points in time. In contrast to our previous discussion of analog signals, a digital signal can be described mathematically by replacing the continuous time variable (t) with the product nT, which represents discrete integer multiples of the sample period for a digital
time-domain signal x(n). [1] In this regard, T is defined as the sample period (equal to
1/R, where R is the sample rate) and n is the integer value for the current sample number.
n successively increases as each new sample enters the system (for example: …-2, -1, 0,
1, 2, 3, …). As commonly notated, the sampling rate (R) and sampling period (T) can be suppressed in the notation for simplicity, giving:

x(n) = A sin(ωn + φ)
To gain a full understanding of the concepts behind the fast Fourier transform (FFT), it is helpful to first see how a complex signal can be represented using less complex signals. The most basic such signal is the sinusoid, and sinusoids can be "summed to produce an approximation of the original signal." [22] This arithmetic sum is described by the principle of superposition. [35] As more terms (sinusoids) are
added, the approximation of the original waveform becomes more accurate. However, in
order for a function to be approximated by a Fourier series, it is required that the function be periodic. Sine waves, being periodic signals themselves, are the natural building blocks for such a series.
The following example shows how we can use the Fourier series to generate a square
wave:
(Figure 5 [39])
In the above illustration, we begin with a sine wave having an arbitrary amplitude value
of 1.0 and radian frequency ω or (2π * f), in this case the fundamental frequency. The
first sine wave is added to a second sine wave that has three times frequency ω and one-
third the amplitude (center image) as compared with the first sine wave. While not shown
above, a third sine wave is added to the first two with the properties of five times the
fundamental frequency ω and one-fifth the amplitude. The third image above shows the
emerging square-shaped waveform after a fourth sine wave is summed (which has a frequency seven times the fundamental and an amplitude one-seventh of the fundamental). If additional odd harmonics were summed in this manner, the waveform would converge ever closer to an ideal square wave. [42]
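Written as a Fourier series, this square-wave approximation is simply the sum of odd harmonics, each with an amplitude inversely proportional to its harmonic number:

x(t) = sin(ωt) + (1/3) sin(3ωt) + (1/5) sin(5ωt) + (1/7) sin(7ωt) + …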
The Fourier Transform
The Fourier series allows us to see how a complex waveform can be created out of simple sinusoids.
One of the defining properties of the Fourier Transform is that it assumes that any signal,
created by adding a series of sine waves, is periodic and therefore must exist for an
infinite duration. Because sine waves are the building block of Fourier analysis and are
periodic signals, the Fourier Transform works as if the data, too, were periodic for all
time. Therefore, when we measure a dynamically changing signal for even a short time, the Fourier transform offers nothing regarding what happens to that signal at any point before or after the portion of the signal being analyzed.
In analyzing even a portion of a signal, we must first take a sample of that signal over some finite span of time. The simplest way to do this is with a rectangular window function, where all parts of the signal falling within that window are kept and everything outside it is discarded. Although the present discussion considers only analogue signals, the figure below illustrates an example of a rectangular window being taken from a discrete signal (which will be discussed later).
(Figure 6)
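The transform under discussion is the continuous Fourier transform, which in standard form reads:

X(f) = ∫ x(t) e^(–i2πft) dt,  with the integral taken over all time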
From the above equation, the lowercase x(t) is a periodic function of time (t), while the uppercase X(f) represents the frequency domain spectrum of x(t) and is a function of frequency (f). The differential dt can be defined as "an infinitely small change in t." [B. Greenhut, personal communication, May 2005] It is important to note from our earlier discussions that if t is measured in seconds, then f is measured in cycles per second, or Hertz.
By making a few small adjustments to the above equation, it can be shown how the
Fourier transform can be reversed using the inverse Fourier transform as calculated
below:
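x(t) = ∫ X(f) e^(i2πft) df,  with the integral taken over all frequencies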
In comparing the two equations above, the only significant difference (in addition to
solving for a time domain signal instead of a frequency spectrum) is the sign of the
exponent of e. However, returning to our earlier discussion of sine and cosine functions,
Euler’s relation states, “the quantity e, when raised to an imaginary exponent is equal to
the complex sum of cosine and sine functions” [1], and is expressed as follows:
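e^(iθ) = cos(θ) + i sin(θ)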
Euler’s relation directly relates to the Fourier transform when we set θ to 2πft.
Taking this one step further, because cos(–x) is equal to cos x and sin(–x) is equal to –sin x, we can also write e^(–iθ) = cos(θ) – i sin(θ), which is exactly the exponential appearing in the forward transform.
Because the cosine function cos(–x) = cos x exhibits the property f(–x) = f(x), it is known as an even function and "is left-right symmetrical around the point x = 0." [1] Similarly, because the sine function sin(–x) = –sin x exhibits the property f(–x) = –f(x), it is called an odd function and "is left-right anti-symmetrical around the point x = 0." [1] This is
important because any function can “be broken down into the sum of two unique
functions, one being purely even and one being purely odd” [1]:
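f(x) = fe(x) + fo(x),  where fe(x) = [f(x) + f(–x)] / 2 and fo(x) = [f(x) – f(–x)] / 2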
From the above expression, it is also possible to solve for fe(x) or fo(x). With regards to the Fourier transform, this is useful since the complex exponential separates a given waveform or spectrum into its even and odd parts. The real (cosine) part affects only the even part of x(t) or X(f), while the imaginary (sine) part affects only the odd part.
The Discrete Fourier Transform (DFT) can be described as an algorithm for computing
the Fourier transform on a set of discretely sampled data. While the above discussion of
the Fourier Transform specifically deals only with continuous or analog waveforms, we
will now extend our discussion to deal with discretely sampled, digital signals.
The discrete Fourier transform allows us a change in representation from the discrete time-domain signal to its discrete frequency-domain spectrum, and may be calculated as follows:
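X(k) = (1/N) Σ x(n) e^(–i2πkn/N),  with the sum taken over n = 0, 1, …, N–1

(The 1/N normalization is placed in the forward transform here to match the C implementation shown later, which divides each X[k] by N.)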
where n is the discretely valued sample index and the sampled time-domain waveform is known as x(n). Additionally, from the above equation, k, equal to 0, 1, 2, …, N–1, is the frequency-domain index, so that X(k) describes the amplitude and phase content of the kth frequency band.
Specifically, the discrete Fourier transform partitions a given windowed input signal into
separate frequency bands, bins or channels. [38] The DFT “operates by constructing the
analyzed waveform out of a unique set of harmonics…" and is calculated "in terms of
complex coefficients from which amplitudes and phases” for each frequency band are
extracted from the real and imaginary parts of each complex number. [1]
The inverse discrete Fourier transform (IDFT) can be similarly calculated as follows:
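x(n) = Σ X(k) e^(i2πkn/N),  with the sum taken over k = 0, 1, …, N–1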
The output of the discrete Fourier transform can be viewed as a filter bank. The fundamental frequency of this analysis is called the "analysis frequency" and can be computed, in Hertz, as the sampling rate divided by the window size as measured in number of samples.
For example, as part of the sampling theory, the Nyquist criterion states that the upper
frequency range is half the sampling rate. Therefore, if the sampling rate is 44,100
samples per second, we know that the upper range of the DFT is 22,050 Hz. Additionally,
if the DFT window size (N) is chosen to be 512 samples, the resulting DFT will have 256 positive-frequency spectrum lines, since the available harmonics lie between –22,050 and +22,050 Hertz. Finally, we can calculate the frequency resolution by dividing the sample rate by the window size. In this example, 44,100 / 512 yields an analysis frequency of approximately 86 Hertz. This means that the frequency bins will be evenly spaced about 86 Hz apart across the positive frequency range of 0 to 22,050 Hertz. For example, the first frequency bin will cover roughly 0-86 Hz, the next bin roughly 86-172 Hz, and so on.
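The center frequency of any bin follows directly from this relationship; a minimal sketch in C (the helper name is illustrative, not from the thesis code):

    /* center frequency in Hz of DFT bin k (illustrative helper) */
    double bin_freq(int k, double sample_rate, int N)
    {
        return k * sample_rate / N;   /* e.g. k = 1: 44100.0 / 512 = 86.13 Hz */
    }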
The above example allowed us to see how the size of each frequency bin is directly related to the chosen window size. Since each bin shows us the amount of spectral energy that falls within the frequency range it covers, we can say that if we choose a larger window size (N), we in turn get a higher frequency resolution. However, as discussed below, this gain comes at the expense of time resolution.
So far, our discussion of Fourier analysis has been limited to treating a signal as infinitely repeating and periodic in nature. We can summarize our previous discussions by stating that the DFT handles a non-periodic signal by taking a slice or window of some specified length and treating that slice as if it were infinitely periodic.
Since musical sounds are not periodic or infinite, it is necessary when performing Fourier analysis to take successive windows of some number of samples and then treat each window as if it were an infinitely repeating signal. The Fourier transform can be calculated for each window in turn, and the resulting spectra assembled in sequence, producing a sampled and therefore slightly misrepresented signal in the frequency domain. [38] For example, if our chosen window size (N) is equal to 512 samples, a new Fourier transform must be calculated every 512 samples, or 86 times every second (if our sample rate is 44,100). In this regard, a tradeoff exists between time and frequency resolution: if the window size is enlarged, our time resolution becomes correspondingly degraded, since fewer transforms are calculated per second.
(Figure 7: Trumpet tone represented over time in the frequency domain)
Because the Fourier transform treats a given windowed signal as if it were one cycle of a
periodic waveform, abrupt changes in amplitude can take place where one cycle of a
window ends and the edge of the window where the next cycle begins, causing
undesirable artifacts in the sound. [25] By smoothing each window around its edges (see
figure 8 below), these artifacts can be prevented. This smoothing is accomplished by multiplying our windowed time-domain signal by a tapered window function. The window function most commonly used is the Hamming window (see
figure 9), which is known for its fine resolution of spectral peaks.
(Figure 8 [24])
However, if successive windows are simply placed end to end after having all been tapered toward zero at their edges, information near the window boundaries is lost. It is therefore necessary to overlap windows. An additional benefit of overlapping windows is improved frequency resolution.
A desirable offset for the overlap between windows can generally be considered to fall between 50 and 75% of the window size. [37, 38] For example, if a window has a length of 1024 samples and a Hamming window is used, successive windows might be spaced approximately 44,100 / 172 ≈ 256 samples apart. Calculation of this 75% overlap factor is as follows:

1. Window bandwidth (b) = 2 * sample rate (44,100) / window size (1024) = 86.13 Hz
2. Required window rate = 2 * b = 172.26, or approximately 172 windows per second
3. 44,100 / 172 ≈ 256, i.e., one new window every 256 samples, or four windows per 1024 samples [41]
(Figure 9: Left: Hamming Window; Right: Product of a Hamming window with a windowed signal [40])
Although the results of using a window function such as a Hamming window are not perfect, they represent a dramatic improvement over the rectangular window.
It should be noted that one side effect of using a Hamming or similar window function “is
that partials which are exact harmonics of the FFT fundamental frequency will no longer
show up as a single spectral line” because, in using a window function, we are
introducing amplitude modulation: “the amplitude is made to rise from and fall to zero at
the edges of the window." However, "given the rarity of exact harmonics in 'real'" sounds, this side effect is seldom a problem in practice.

Finally, while we have only considered the Hamming window in this discussion, there are a number of window functions in common use. These include the Hanning, Blackman, and Kaiser windows, among others.
It has so far been discussed how we can perform Fourier analysis on a digital signal using
the discrete Fourier transform. In this section, I will demonstrate how to implement the
DFT on a discrete set of data using an adapted C language DFT implementation [1].
Based on this implementation, it will be shown why the discrete Fourier transform is impractical for real-world Fourier analysis, leading to a discussion of the vastly more efficient fast Fourier transform (FFT) algorithm. Complete source code can be found in
Appendix I.
In looking at the code example below, the first seven lines set up and initialize the
necessary variables for our DFT program. In the second eight lines, a waveform is
created, specifically located within the ‘for’ loop. As previously discussed, the square
wave here is represented by the sum of four sine waves, all having different frequency and amplitude values, as detailed in the previous section on the Fourier series (see figure 5 above).
    #include <math.h>
    #include "complex.h"                    // header defining the 'complex' struct
    #define LENGTH 16

    main() {
        int N = LENGTH;                     // DFT window size
        double pi2oN = 8. * atan(1.)/N;     // (2 pi/N)
        double pi2 = 8. * atan(1.);         // (2 pi)
        float x1[LENGTH], x2[LENGTH];       // stores time domain waves
        complex X[LENGTH];                  // stores re and im spectral values
        int n;                              // current sample index: from 0 to N-1
        for (n = 0; n < N; n++)             // square wave = sum of four sines
            x1[n] = sin(pi2oN*n) + sin(3*pi2oN*n)/3. + sin(5*pi2oN*n)/5. + sin(7*pi2oN*n)/7.;
From the above code, it will be assumed that N, our DFT window size, is equal to 16. However, if we were processing samples for a real audio application, it would be realistic to use a much larger window size. For purposes of discussion, however, a smaller window keeps the output manageable.
If we compute the results of our time-domain square wave, which is stored in an array
called ‘x1’, we can see how 16 amplitude values are created and stored. These values are
listed as part of the output from this program and can be graphically represented as
follows:
(Figure 10: Left: data from x1 array / values for square wave; Right: graphed discrete x1 waveform)
Once a discrete waveform has been created, we can perform a transformation using our
DFT function to compute real and imaginary arrays. It should be noted that a structure
named ‘complex’ was created in the C header file (see below) that allows us to store real
and imaginary float values for each corresponding sample that is processed by our DFT
function:
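A minimal version of that structure, reconstructed here to match how it is used by the program, might read:

    typedef struct {
        float re;    /* real part */
        float im;    /* imaginary part */
    } complex;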
The 'dft' function below accepts our 'x1' square wave and performs the discrete Fourier transform:
    // dft function ----------------------------------------------------
    void dft(float x[], int N, complex X[])
    {
        double pi2oN = 8.*atan(1.)/N;           // (2 pi/N)
        int k;                                  // frequency index
        int n;                                  // time-domain sample index

        for (k = 0; k < N; k++) {               // for each frequency band
            X[k].re = X[k].im = 0.;
            for (n = 0; n < N; n++) {           // apply the Fourier integral
                X[k].re += x[n] * cos(pi2oN * k * n);
                X[k].im -= x[n] * sin(pi2oN * k * n);
            }
            X[k].re /= N;                       // real / N
            X[k].im /= N;                       // imag / N
        }
    }
As above, the DFT is computed by applying the Fourier integral (see embedded ‘for’
loop) divided by the window interval N (see final commented lines above) of the time-
domain waveform (our square wave) x[n]. The output data set prints as follows:
(Figure 11)
Using the above data that represents spectral values for our time-domain waveform, we
can convert each complex value of X, consisting of a real and imaginary part to further
represent amplitude and phase values for specific frequencies. But before we can do this,
we also must make sure we have the following information: our window size N, the
number of samples of the spectrum of waveform ‘x1’, and the specific frequencies to
which successive values of X correspond. [1] With this information, we can use the two-
dimensional complex plane (see figure 12 below) to interpret each complex value of X.
From the image below, the horizontal axis represents the real part (where we can place
our real output value a) and the vertical axis represents the imaginary part (where we can place our imaginary output value b). Measuring the length of vector A, drawn from the plane's origin, gives us the amplitude.
From a mathematical point of view, we can compute the length of the vector using the
Pythagorean theorem:
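A = √(a² + b²)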
Additionally, measuring angle θ of vector A gives us the phase, which can be computed using the arctangent function: θ = arctan(b / a), with the quadrant determined by the signs of a and b.
What we have seen here is that these conversions allow us to express the complex spectrum as two real-valued "functions of frequency called the amplitude (or magnitude) spectrum and the phase spectrum." [1] From our C program, the following two lines calculate these conversions
for us:
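Reconstructed from the definitions just given, those two lines take roughly the following form (the array names are illustrative):

    amp[k]   = sqrt(X[k].re * X[k].re + X[k].im * X[k].im);  /* magnitude */
    phase[k] = atan2(X[k].im, X[k].re);                      /* phase angle */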
If we print the results from the above two lines of code, we can view and interpret the
(Figure 13: Left: converted data from complex X arrays; Right: positive-frequency amplitude spectrum)
However, if we look at the program output on the left (above), we see that there are 16
complex values representing amplitude and phase. By contrast, there are only eight
amplitude values plotted on the right. This is because our frequency information on the
left “corresponds to both minus and plus half the sampling rate.” [1] What this translates
to is that all frequency components falling above half the sampling rate are “equivalent to
those lying between minus the Nyquist rate and 0 Hz.” [1]
For example, for our 16-point DFT, if we have a sampling rate of 44,100 samples per second, our analysis frequency would be 44,100 / 16, or 2,756.25 Hz. This would place our frequency components at ±22,050 (the Nyquist frequency), ±19,293.75, ±16,537.5, ±13,781.25, ±11,025, ±8,268.75, ±5,512.5, ±2,756.25, and 0 Hz. In our frequency graph (see figure 13, right) on the previous page, we are purposely plotting only the positive frequencies that lie between 0 Hz and the Nyquist frequency.
One final point regarding the DFT that we must consider is that, although we sent a
square wave with a peak amplitude value near 1.0 into our DFT program, the outputted
frequency spectrum yields a peak amplitude value of 0.5. In fact, with the exception of frequency components at 0 Hz (D.C.) and the Nyquist rate, all post-DFT component amplitudes "are half their 'true' values." [1] The reason for this is related to the fact that
the Fourier transform divides each single amplitude value between positive and negative
frequency components. For example, if we look at the output (see figure 13, left) of our
DFT program above, we can see that X[1] (see second line of output) has an amplitude of
0.5 and a phase of –90º (where –90º represents a sine wave as measured with respect to a cosine). Looking at X[15] (see final line of output), the amplitude value is again 0.5; however, the phase this time is inverted (positive 90º) when compared with X[1], and therefore represents an
inverted sine wave. These corresponding positive and negative frequency components
have the same amplitude and inverted phase relationships. We can therefore say that the
Fourier transform does not distinguish between positive and negative frequency
components except for the phase of sine components (since sine is an odd function as
previously discussed). [1] In looking again at figure 13 (above), we can now conclude
that we only need to graph frequency components zero through eight because "the negative frequency amplitude spectrum is just the (left-right) mirror image of the positive" spectrum. [1]

The major shortcoming of the DFT is computation time. We can say that if our window
size is equal to N and we want to determine the amplitude of N separate sinusoids using the DFT, then computation time is proportional to N², the number of multiplications. [1, 26,
44] In many applications, N must increase to numbers ranging between 256 and 2048.
(Figure 14: Comparison of required operations for N-sized window with DFT and FFT [40, 26])
In short, the DFT requires excessive machine time for a large window size N. This is precisely why the fast Fourier transform was developed.
For example, in looking at the DFT equation, an eight-sample signal would require 64
complex multiplications for computation. [37] While manageable at this level, a more
common scenario would see a window size of either 512 or 1024 samples. When N is
chosen to be 512, the DFT “requires 200 times more complex multiplications than those
required by the FFT.” [1] Furthermore, if N is increased to 8192, over 1000 times as
many complex multiplications would be required as compared with the FFT. Therefore,
"on a computer where the FFT takes 30 seconds to compute a transform this size, the" equivalent DFT would take on the order of eight hours to complete.

The fast Fourier transform (FFT) can be described as a computationally efficient version
of the DFT that works on powers of 2. [45] This limits our window size N to the
following values: 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, etc. As with the DFT, N
represents a periodic sampling taken from a real-world signal and the FFT expresses the
output data in terms of its component frequencies. The FFT also solves the near identical
inverse problem (via the iFFT) of reconstructing a signal from its frequency-domain data.
As previously stated, the FFT is a DFT algorithm published by Cooley (then at IBM) and Tukey in 1965 which reduces the number of computations from N² to N·log2(N) when N is chosen to be a power of 2. [45] If N is not chosen to be a power of 2, the FFT can still be
used as long as the data is "padded" with zero-valued samples up to the next power of 2. [1] The fast Fourier transform essentially takes the DFT and evaluates it much
faster. In figure 14 (above), the number of operations for both the DFT and the FFT are
shown. For a large window size N, using the FFT allows a monumental reduction in
calculation time.
Calculation of the FFT utilizes a recursive approach for dividing up the input windowed
signal into N individual signals. Specifically, the FFT continuously breaks down the
window into two (N / 2) sequences until all samples are represented as a unique signal.
An interlaced decomposition is used each time a signal is broken in two. Note that the set
on the left consists of even-numbered samples while the set on the right consists only of odd-numbered samples.

(Figure 15 [46])
In understanding how samples are reordered in the decomposed signal, it is useful to note: "the binary representations of these sample values are the reversals of each other. For example, sample 3," [46] represented in binary as 0011, is exchanged with sample number 12, represented in binary as 1100. "Likewise, sample number 14 (1110) is swapped with sample number 7 (0111), and so forth." [46] The FFT carries out this reordering with a bit-reversal sorting algorithm.
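A minimal sketch of such a bit-reversal computation appears below (the function name and layout are illustrative, not taken from the thesis code):

    /* reverse the lowest 'bits' bits of n; e.g. reverse_bits(3, 4) == 12 */
    unsigned reverse_bits(unsigned n, int bits)
    {
        unsigned r = 0;
        int i;
        for (i = 0; i < bits; i++) {
            r = (r << 1) | (n & 1);   /* shift result left, append low bit of n */
            n >>= 1;                  /* move to the next bit */
        }
        return r;
    }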
Once our window (N) has been decomposed into N individual time domain signals, each composed of a single sample value, the next step is to calculate the N frequency spectra corresponding to those signals. Because "the frequency spectrum of a 1 point signal is equal to itself," [46] no calculations are required to complete this step. It should be noted, however, that "each of the 1 point signals is now a frequency spectrum," and not a time domain signal. [46]

Based on our knowledge that the DFT, when calculated on a window with a very small value for N, is indeed fast, it is easy to see why the Cooley-Tukey algorithm aims to break a potentially enormous DFT calculation into many minute-sized DFT calculations.
The final “step in calculating the FFT is to combine the N frequency spectra in the exact
reverse order [from which] the time domain decomposition took place.” [46] For this
step, the bit reversal shortcut is no longer applicable and calculation must take place one
stage at a time. For example, “in the first stage, 16 frequency spectra (1 point each) are
synthesized into 8 frequency spectra (2 points each). In the second stage, the 8 frequency
spectra (2 points each) are synthesized into 4 frequency spectra (4 points each), and so
on. The last stage results in the output of the FFT, a 16-point frequency spectrum.” [46]
Part III: Current FFT-based Software Applications for the Mac

Having now examined the inner workings of the DFT and FFT, I will discuss a number of applications that utilize the FFT as part of their core architecture and outline techniques for using these applications creatively.

The Phase Vocoder for Time-Stretching and Pitch Shifting
The phase vocoder is one of the most powerful methods of manipulating sounds in the
frequency domain and has been more recently implemented using the FFT for increased
frequency resolution. This process allows for pitch changes without adjusting the length
of a sound file or length changes without adjusting pitch. A phase vocoder, as used for analysis, employs a bank of band-pass filters evenly spaced across the frequency band of interest. Originally, it was hoped such a device would be able to reduce the bandwidth necessary for the transmission of voice telephony. Another example of a phase vocoder application would be one in which a voice signal is analyzed for frequency content by a filter bank in real time, and the output applied to a bank of oscillators for resynthesis, yielding a signal closely resembling the original.
As a software implementation, "pitch-scale modifications can be implemented as a combination of time-scaling and sampling rate conversion." [27] For example, to raise
the pitch of a signal by a factor 2, we could first time-stretch the signal by a factor 2 to
increase its duration and then resample it at half the sampling rate. This would restore the
original duration of the sample while also modifying the frequency content of the signal as desired.
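As a conceptual sketch of the second step (resampling), assume a buffer that has already been time-stretched to twice its length; keeping every other sample restores the original duration while doubling all frequencies. In practice the signal would be low-pass filtered first to avoid aliasing. The variable names below are illustrative:

    /* naive 2:1 decimation of a time-stretched buffer */
    int n;
    for (n = 0; n < stretched_len / 2; n++)
        out[n] = stretched[2 * n];   /* duration halves; pitch rises one octave */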
For time-stretching, the phase vocoder relies on the FFT to extract amplitude and phase
information from 8 to 4096 frequency bands (depending on the desired window size)
with a bank of filters or frequency bins. If time-stretching is desired, phase and amplitude
envelopes are lengthened (or shortened, for time compression) and then passed to a bank of oscillators for resynthesis. Additional techniques employed by the phase vocoder "allow direct manipulation of the" spectrum, permitting spectral interpolation, partial stretching and other exotic modifications which cannot be achieved by conventional time-domain processing. SoundHack's Phase Vocoder offers several of these useful processing tools; perhaps the most useful for creating evocative sounds are time-stretching and pitch-scaling. The steps for performing time and pitch-scale modifications are outlined below.
To use the phase vocoder in SoundHack for time-stretching, we can first open a sound
file under the ‘File’ menu and then choose ‘Phase Vocoder…’ under the ‘Hack’ menu.
Next, we can set desired parameters such as the number of ‘Bands’ (see above). This
corresponds to the number of FFT filter banks/oscillator pairs desired. After choosing the
‘Time Scale’ radio button, choose ‘Edit Function…’ to open a new window where we
can set our desired stretch amount. In this example, a value of 10.0 is chosen from the
‘Set’ button dialog to specify a desired stretch factor of 10. After hitting ‘Done’ and then
‘Process’ in the main ‘Phase Vocoder’ window, a new sample will be created that
stretches the signal from our original sample with a length of 5.8 seconds to
approximately 58 seconds.
Pitch shifting can also be performed with SoundHack's Phase Vocoder. "In this technique, the sound to be shifted is sent through" an FFT and into "a bank of band-pass filters which are evenly spaced from 0 Hz to half the sample rate. SoundHack measures the amplitude and phase for each frequency at the output of this filter bank. These amplitudes, phases and frequencies are used to control a bank of oscillators. Pitch shifting simply involves multiplying each frequency by a factor." [47]
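In the oscillator-bank view, that last step is a single multiply per band; a rough sketch (variable names are illustrative, not SoundHack's):

    /* shift every analyzed frequency by a constant ratio */
    int k;
    for (k = 0; k < bands; k++)
        osc_freq[k] = analysis_freq[k] * ratio;   /* ratio = 2.0 -> up one octave */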
With our sample loaded in SoundHack with the ‘Phase Vocoder…’ window open, we can
achieve pitch-shifting by choosing the ‘Pitch Scale’ radio button. We will now follow
similar steps as above but this time choose a value of 12.0 from the ‘Set’ button within
the 'Edit Function…' window. When we process this Hack, a new sample will be created that sounds one octave, or 12 semitones, above the original.
Time-stretching and pitch-shifting tools have also been incorporated into popular digital audio workstations including Digital Performer, Logic Pro and ProTools. Of particular interest is TimeToyPro, which allows a user to set variable amounts of stretching and listen to the results in real time.
(Figure 17: TimeToyPro allows automation control over time-stretching amount)
As seen below, applications such as Logic Pro include a single window for destructive tempo and pitch editing of individual sound files.
(Figure 18: Logic Pro Factory window allows for tempo and pitch editing of a single sound file)
Native Instruments' Vokator, for example, can take the frequency spectrum of two real-time input channel signals (referred to as 'A' and 'B') and continuously analyze them over time. The analyzer measures the amplitude of "each frequency band for both signals." [48] The phase information from input A is then combined with the spectral envelope from input B to generate a unique signal at the output.

For example, it is possible to speak into a microphone that is routed to input A of the vocoder and then shape and control the digitized signal at input B by playing a C major chord on the included synthesizer into channel B, thus superimposing the vocal signal onto the chord.

(Figure 19: Vokator set up for a live input on input A to be cross-synthesized with a synth on input B)
Cross Synthesis (Convolution and Morphing)
Cross synthesis refers to the combining of the spectra of two sounds to create a new sound. One such process, convolution, is accomplished by taking the FFT of two separate signals and then multiplying their spectra to create a new signal. It should be noted that convolution "emphasizes frequencies which are held in common and greatly reduces frequencies which are not." [47] The best results can therefore be obtained when convolving two sound sources that share a substantial amount of common spectral content.
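At the level of a single FFT bin k, this spectral multiplication is a complex multiply; a minimal sketch (array names illustrative):

    /* multiply spectra A and B bin-by-bin to convolve the two signals */
    out[k].re = A[k].re * B[k].re - A[k].im * B[k].im;
    out[k].im = A[k].re * B[k].im + A[k].im * B[k].re;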
To perform spectral convolution in SoundHack, we first open a sound file and then
choose ‘Convolution…’ from the ‘Hack’ menu bar. In order to choose a second sound
file that will be convolved with the first, choose ‘Pick Impulse’ and scroll to a desired
sample. Finally, hit ‘Process’. In my example (above), I chose to convolve a flute sample
by the same flute sample that was pitch-shifted an octave up from the phase vocoder.
Had I instead processed the two signals with the 'Ring Modulate' option checked, "the spectrum of the product of" the two samples would consist only "of the sum and difference frequencies of the original" sounds. For example, "if we multiply a 100-Hz sinusoid by a 10-Hz [sinusoid], we would expect to hear" a new signal consisting of frequencies of 90 and 110 Hz, each at half the original amplitude.
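This follows from the product-to-sum identity for sinusoids:

sin(2π·100t) × sin(2π·10t) = ½ [cos(2π·90t) – cos(2π·110t)]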
BIAS Peak is another application that offers easy-to-use convolution functions that often produce interesting and effective results. Simply pre-loading a desired impulse file followed by a target file and choosing 'Convolve' from the 'DSP' menu are the only steps required.
Spectral morphing using a real-time plug-in developed by iZotope called Spectron allows a user to spectrally modify one signal based on the spectrum of another signal. Spectron is particularly effective when morphing, for example, a percussive audio file with a harmonically-rich audio file. [49] It is also useful when
combining two files similar in harmonic and/or rhythmic content. Using the ‘Morph’
module, found along the bottom Spectron window menu, an audio file can be loaded to
become the target signal that will be morphed with the inputted audio stream from the
DAW. The target signal is continuously looped in the background for processing in real-time.
“As the two signals play, Spectron compares the spectrum or frequency response of the
two signals. When the spectrum of the input signal is different than the target signal,
Spectron adjusts the frequency content of the Input Signal to match the instantaneous
spectrum of the Target signal. For example, if a 100-Hz tone appears in the Target signal
at 0 dB, Spectron looks at the 100-Hz band of the Input signal, and adjusts the gain to be
0 dB. Spectron does this for each frequency band -- up to 2048 individual bands are available." [49]
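Conceptually, this per-band matching is just a gain applied to each analysis band; a rough sketch (names illustrative, not iZotope's implementation):

    /* drive each input band toward the target band's level */
    int k;
    for (k = 0; k < bands; k++) {
        float gain = target_mag[k] / (in_mag[k] + 1e-9f);  /* guard against /0 */
        out_mag[k] = in_mag[k] * gain;
    }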
Spectron also allows the user to control which frequency ranges are morphed by
adjusting a number of nodes within the ‘Morph’ window. Furthermore, a threshold can
also be set to “limit frequencies that the Morph module tries to match, based on their
level." [49] If a frequency band is located below the threshold, the Morph module will not attempt to match it.
Convolution Reverb
Over the past few years, a number of convolution reverb software plug-ins have entered
the market that allow a user to add custom impulses to be convolved in real-time with an
incoming audio signal and that run inside most of today's digital audio workstations. In figure 22 (below), Logic Pro's Space Designer plug-in is used unconventionally by adding any
non-reverb sample as an impulse response to be convolved with an incoming audio
stream. While similar to SoundHack’s convolution functionality where both applications
require the use of the FFT, the implementation found in plug-ins such as Space Designer
and AudioEase Altiverb differ in that processing takes place on real-time signals,
requiring increased but available CPU power.
A custom impulse response can consist of any sampled sound and can be loaded into
Space Designer by simply clicking the “IR Sample” button in the upper left portion of the
screen and adjusting parameters as desired.
(Figure 22: Space Designer convolution reverb plugin for Logic Pro)
Frequency Band Manipulation and Filtering

One of the first software applications to offer complex real-time spectral processing was Native Instruments' Spektral Delay plug-in and stand-alone application. Spektral Delay
allows the user to send the input signal through an FFT of up to 1024 frequency bands
followed by further processing options via effects, filters, delays and feedbacks on each
separate band before resynthesizing the signal back into the time domain.
(Figure 23: Spektral Delay user interface)
The ‘Harmonic Rotate’ function (see figure 24 below) “allows the frequency spectrum in
a selected range of audio to be rotated around a horizontal axis, which has the effect of
taking frequencies that were previously associated with one section of a file with a
particular amplitude, and assigning them to different areas of audio with different
amplitudes.” [50]
(Figure 24: Harmonic Rotate dialog box in Peak)
Additionally, the user has the option of choosing Real and/or Imaginary calculations.
Finally, a slider and text field are available for setting the desired amount of rotation. The
'Harmonic Rotate' tool is available from the 'DSP' window after a sound file has been opened.
The next group of plug-ins to be examined is the SoundHack Spectral Shapers bundle,
which includes +spectralcompand, +morphfilter, +binaural and +spectralgate.
(Figure 25: +spectralcompand plugin from SoundHack Spectral Shapers)
The Spectral Shapers plug-ins can be described as filters that “emphasize the reshaping of
the timbre of sound.” [51] Each spectral shaper uses the FFT to divide “the frequency
spectrum into 513 bands, and processes each band individually.” While all plug-ins can
be used in real-time to process an audio stream, I will focus on two of my favorites from the set. The first, +spectralcompand, allows the dynamics of each band to be smoothly adjusted from a compression ratio of 5:1 to an expansion ratio of 1:5. "In
expansion mode, +spectralcompand becomes a highly tunable broadband noise remover,
capable of removing hiss, hum and machine noise, without damaging the original sound.
The compression mode can be used as a spectral flattener or re-shaper, changing the
spectral profile of a sound file to another template.” [51] While the invert control is
designed to allow the user to invert the compression or expansion output and hear the
difference between the processed sound and the input, it is also capable of rendering unusual effects in its own right.
The +morphfilter plug-in allows the user to draw or learn a filter shape (see “shaping
line” in figure 26 below) using the mouse on the filter graph as well as available plug-in
controls. “Each +morphfilter setting contains two filter shapes and morphing can occur
between them. Filter depth can be applied to increase, decrease or even invert the filter
shape. This is an easily controllable, yet complex, filter plug-in capable of some
(Figure 26: Three of the Spectral Shapers plug-ins have drawable filter graphs)
An included LFO (low frequency oscillator) setting allows for modulation of the filter
number, which allows the user the option to morph between the two filter shapes. This
changing of the filter number can result in a smooth fade between the shapes.
Granular Synthesis
Granular synthesis has its roots in the work of Nobel prize-winning physicist Dennis Gabor, who showed that "any sound can be analyzed and" represented as a succession of brief acoustic particles, or grains, each described by a handful of parameters "(frequency of the waveform inside the grain, spectrum of the waveform and envelope)." [52] A grain is an extremely short event, lasting near the minimum perceivable event time for duration, frequency, and amplitude discrimination. In practice, a grain is a brief sample taken from a sound file or audio stream, in much the same way that a window is extracted from an audio file before being applied to a smoothing function and sent through an FFT. Granular synthesis builds complex sounds by generating and scattering these acoustic particles in space and in time. While many grains are necessary to create a complex sound, they can be grouped into larger units called clouds, lasting anywhere from a fraction of a second to several seconds.
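A single grain can be sketched as a short, smoothly enveloped excerpt of a source buffer (the names 'src', 'offset' and 'grain_len' below are illustrative):

    /* build one grain: windowed slice of the source sample */
    int n;
    double two_pi = 8. * atan(1.);
    for (n = 0; n < grain_len; n++) {
        double env = 0.5 * (1. - cos(two_pi * n / (grain_len - 1)));  /* Hanning envelope */
        grain[n] = src[offset + n] * env;
    }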
While a number of methods for organizing grains are available for implementing granular
synthesis, only a few require an FFT as part of their implementation. These include the
following methods: Fourier and wavelet grids (where frequency content is measured
versus time so “each point in the analysis grid is associated with a unit of time-frequency
energy” [38]), and pitch synchronous granular synthesis (where tones are generated “with
one or more formant regions in their spectra" [38]). Additionally, it is common to see granular techniques used by current applications: programs such as Reaktor (above) have extremely useful
tools for implementing granular synthesis on both existing digital audio samples as well
as real-time audio streams. Composer Jeff Rona states: “Tobias (Enhus) and I both have
taken advantage of [granular synthesis] for projects.” [J. Rona, personal communication,
May 2005]
Other Applications for Spectral Processing

While not discussed in detail, it should be noted that programming environments such as
Csound, Max/MSP and SuperCollider all offer significant spectral objects for building
custom audio processing tools. Additionally, Ableton Live and Melodyne are applications
featuring efficient and advanced time-stretching and pitch-shifting capabilities for
quickly adjusting audio files.
As film composer and creative sound designer for a 17-minute independent film by Erik Ryerson entitled Red Letters, I was given the opportunity to compose a score using the spectral processing tools described above.
After meeting the director Erik Ryerson and viewing Red Letters, it was evident that he
was specifically interested in generally non-pitched, non-melodic material for the score.
In fact, another film composer had previously worked on his film and had written a score
using orchestral sounds. Erik was looking for something different and played for me a classical piece that was mostly composed of un-pitched, low rumbling. Originally, I was brought into the project as a sound designer. However, it soon became apparent that a sound-design approach could also serve as the basis for the score itself.
My inclination was therefore to base a non-musical score around techniques utilized in musique concrète, composition built from the manipulation of "pre-recorded sounds." [54] In this regard, I first viewed the film a number of times to
gain a feel for the overall mood of the picture and then decided on some existing sound
sources that I believed would convey the feeling behind Erik’s film.
With the intent of utilizing limited raw sound materials for purposes of unifying the
overall composition, I settled on three audio samples to form a basis for the score. My
goal would then be to process these sounds as much as desired or needed so they would
fit nicely into the film while having only remote similarities to the original sound files
that they were derived from. The raw sound sources I chose to work with were as
follows:
1. 29 seconds from Miles Davis' introduction solo from the album Jack Johnson. Although this piece was played and recorded in the key of D, I used a pitch-shifted copy, since a convolution between this sample and the Miles Davis sample (above) and my other source material, when in the same key, would produce improved results as well as create an ambient texture that would have a tonal center (around Eb).

2. A short solo acoustic guitar piece.

3. An 11-second unpitched trombone growl.
Once the samples were chosen, I processed each of them separately as well as together. This served two purposes: to make the samples original-sounding, and to build a palette of processed audio that I could begin to insert into my Logic Pro session, where I was composing to picture using a QuickTime movie.
To accomplish these goals, I first time-stretched each of my three samples to different durations and listened to the results. For example, my Miles Davis sample now existed as a 14-minute, 45-second sequence of bass rumbles, cymbal swooshes, and screaming trumpet tones that sounded more like a roaring elephant. Secondly, my acoustic guitar piece was stretched to almost 21 minutes and now sounded like rich textures of slowly vibrating strings centered around various tonalities, with dynamically varying passages where a single attack could last many seconds and a decay could last minutes. Finally, my 11-second trombone sample was stretched to almost 14 minutes, transforming what had been a single, unpitched growl into a slowly evolving texture.
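SoundHack performs its stretching with the phase vocoder described in Part III; as a rough illustration of how a sound can be lengthened without transposing it, the sketch below uses a much simpler windowed overlap-add scheme. It is my own simplification rather than SoundHack's algorithm, the window and hop sizes are assumptions, and it produces exactly the kind of audible artifacts discussed later:

/* Simplified overlap-add time-stretch sketch: Hann-windowed frames are
 * read from the input at a slower rate than they are written to the
 * output, so the sound lasts "factor" times longer at roughly the same
 * pitch. out must start zeroed and hold in_len * factor samples. */
#include <math.h>

#define WIN 1024         /* analysis window length (assumed) */
#define HOP (WIN / 4)    /* synthesis hop: 75% overlap       */

void stretch(const float *in, int in_len, float *out, double factor)
{
    double pi2 = 8. * atan(1.);   /* (2 pi) */
    int frames = (int) ((in_len - WIN) * factor / HOP);
    int f, n;
    for (f = 0; f < frames; f++) {
        int rpos = (int) (f * HOP / factor);   /* slowed read pointer */
        int wpos = f * HOP;                    /* normal-rate write   */
        for (n = 0; n < WIN; n++) {
            float w = 0.5f - 0.5f * (float) cos(pi2 * n / (WIN - 1));
            /* 2*HOP/WIN compensates for the gain of overlapping windows */
            out[wpos + n] += w * in[rpos + n] * (2.0f * HOP / WIN);
        }
    }
}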
Before I introduced these samples into my Logic Pro session, I was also interested in convolving the time-stretched guitar piece with the time-stretched Miles Davis sample. The result was a new sample exactly the length of the time-stretched guitar piece (21 minutes) that included harmonic and rhythmic aspects of both samples, but with fewer direct amplitude variations and an overall envelope that could be described as glassy around the edges. This sample would form the foundation of the score.
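Conceptually, this kind of convolution amounts to multiplying the two sounds' spectra. The sketch below shows that core operation on a single short frame, reusing the complex type and the dft/idft routines listed in Appendix I; it is a schematic of the principle rather than SoundHack's actual implementation, which processes long files frame by frame with an FFT:

/* Spectral convolution sketch: convolution in the time domain equals
 * multiplication in the frequency domain. Both sources are analyzed,
 * the spectra are multiplied bin by bin (a complex multiply), and the
 * product is resynthesized. Assumes N == LENGTH from Appendix I. */
#include "DFTCalculation.h"

void spectral_convolve(float a[], float b[], float out[], int N)
{
    complex A[LENGTH], B[LENGTH], C[LENGTH];
    int k;

    dft(a, N, A);   /* spectrum of the first sample  */
    dft(b, N, B);   /* spectrum of the second sample */
    for (k = 0; k < N; k++) {
        C[k].re = A[k].re * B[k].re - A[k].im * B[k].im;   /* (A x B) real part */
        C[k].im = A[k].re * B[k].im + A[k].im * B[k].re;   /* (A x B) imag part */
    }
    idft(out, N, C);   /* back to a time-domain waveform */
}

Because only frequency content common to both sources survives the multiplication, the output takes on the shared harmonic character of the two samples, which is why matching their keys improved the result.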
Scoring Notes
A particularly useful aspect of having longer samples to work with was that a single portion of a given sample could be extracted from the overall performance and used by itself wherever it fit a scene. Specifically, during the opening credits of Red Letters, I carefully chose a section of the newly convolved sample that fit the cinematography and added it in Logic Pro, while also using a portion of the time-stretched guitar composition (pre-convolution). I then used volume automation to set the desired amplitude envelope for each tone. This allowed me to get the exact sound I was after: something dark.
Approximately 1 minute and 56 seconds into Red Letters, there is a sequence where the camera fades from day to night. The director asked me to avoid all “swooshes” that were created as natural artifacts of time-stretching in SoundHack. With the necessary goal of continuing to use existing material, I would have to find a portion of processed audio, or create a new one, whose sustain and decay included no noticeably loud artifacts.
In this scene, the camera seems to waver as it fades from day to night and slowly zooms in as the main character, David, sits in his bedroom, saddened by the news of his sister’s death. This presented a compositional opportunity to create a new tone that would have a slightly unsettled feeling while not exhibiting any of these undesirable artifacts. I did this by first taking a portion of my convolved sample and making a backup copy of a select number of seconds of the sample in Logic Pro. After ensuring that this newly created sample was added back into my project via the ‘Audio’ window, I opened the file in Logic’s ‘Sample Editor’ window, reversed the sample, and pitch-shifted it slightly. When added back into my project, the processed sound very much resembled the earlier tones and could be cross-faded with them to give the listener a sense of sound unfolding naturally. At the same time, this new sample offered a subtle trembling in the audio as the camera wavered and faded from day to night.
The next significant challenge in composing for Red Letters was to underscore the diner scene, where the supporting character, Chloe, tells David how her father had been shot and killed by her mother as she watched as a young girl. While Chloe explains to David that her mother is not dead as previously assumed, and proceeds to detail the gory events, I was asked to add an ominous-sounding ambience that would slowly draw in the viewers’ attention while increasingly producing a feeling of departure from the general
surroundings of the diner. I accomplished this in two ways. First, I inserted the trombone-effect sample (described above), allowing the time-stretched attack to start at the beginning of Chloe’s monologue and slowly crescendo before fading abruptly as she changes the subject. Additionally, with the goal of dissipating the natural ambience of the diner, I inserted an instance of the Space Designer convolution reverb plug-in that comes native with Logic Pro. However, instead of using a conventional impulse response of an acoustic space, I used a sample of street and residential traffic noise with footsteps, typical of what might be heard outside the diner, as the impulse response to be convolved with the naturally recorded ambience in the film. The result was a quieter, slightly muffled, yet organic ambience, and I gradually faded the convolution effect over the diner ambience as the scene progressed.
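Space Designer's engine is proprietary, but the effect of substituting a recorded ambience for a room impulse response can be sketched as a direct-form convolution. This is illustrative only; real convolution reverbs use far more efficient FFT-based partitioned convolution:

/* Direct-form convolution reverb sketch: every input sample launches a
 * scaled copy of the impulse response (here a recording of street
 * traffic rather than a room), and all the copies sum in the output.
 * out must start zeroed and hold sig_len + ir_len - 1 samples. */
void convolve_ir(const float *sig, int sig_len,
                 const float *ir, int ir_len, float *out)
{
    int n, m;
    for (n = 0; n < sig_len; n++)
        for (m = 0; m < ir_len; m++)
            out[n + m] += sig[n] * ir[m];
}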
The climax of Red Letters sees David confronting his past as he steps through the scene of the crime described earlier by Chloe. For this scene, the director had previously inserted a temp score using ambience from the band Shellac as well as a number of additional sources. This scene proved to be the most difficult to score, since Erik was already quite satisfied with the sounds over the scene but did not want to create any licensing issues by using existing licensed material. I therefore had to create something completely original that used complex processing techniques similar to those used in the temp score.
My first inclination was to re-process existing materials. However, any sort of spectral processing of the existing sounds did not seem to work, since it took much of the excitement out of the existing ambience. After reworking the scene and further discussing ideas with Erik, I decided to create completely new audio material and then process it.
As the lead character, David, first enters the apartment where his father has been shot, it was necessary to showcase the fact that the mother was flipping through stations on the television set, mostly consisting of static. My first step was therefore to create a background static that would gradually become louder as David walked closer to the living room. I began by placing a different section of the time-stretched trombone sample used in the diner scene into this scene, but at a very low volume so that there would remain a significant amount of headroom for additional sounds and layers to be added. I also set up a large-diaphragm microphone in my living room and recorded myself flipping through static on my television set to give some ‘realistic’ ambience to the scene as well.
In order to capture additional elements that could be heard above the static of the television and trombone effect, and to mimic certain sounds desired from the Shellac temp score, I recorded the following sounds:
1. pressure from our water heater that was causing a small metal washer to vibrate;
2. the evening news on the television, which included voices but no music;
3. a surprise bark from our then-three-month-old puppy, Miles;
4. myself whispering the types of thoughts that I imagined David might be having as he stepped through the scene of the crime;
5. beeps from my cell phone.
After treating the whispered recording and sending the water heater sample through a heavy phase distortion with some LFO modulation via Logic, I was also able to successfully process my television news recording using the ‘Varispeed’ function in SoundHack. While this did not make the news recording completely unrecognizable, the ‘Varispeed’ function did perform enough of a variable time shift to make it impossible to tell what was being presented in the dialog. Finally, by inserting the Spektral Delay plug-in with the ‘smear’ algorithm enabled over my recorded cell-phone beeps, I was able to create a rhythmic yet crackling pulse that would effectively build tension as the scene grew to its climax, where David’s mother looks over at David before he faints back to reality.
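SoundHack's 'Varispeed' internals are not documented here; the sketch below only illustrates the general idea of a playback rate that drifts over time, shifting pitch and timing together, which is what renders the dialog unintelligible. The linear interpolation and the modulation depth and rate are my assumptions:

/* Variable-speed playback sketch: a fractional read pointer advances at
 * a rate that itself changes over time, so pitch and duration warp
 * together. Linear interpolation smooths between input samples. */
#include <math.h>

int varispeed(const float *in, int in_len, float *out, int out_len)
{
    double pi2 = 8. * atan(1.);   /* (2 pi) */
    double pos = 0.;              /* fractional read position */
    int n;
    for (n = 0; n < out_len && pos < in_len - 1; n++) {
        int i = (int) pos;
        double frac = pos - i;
        out[n] = (float) ((1. - frac) * in[i] + frac * in[i + 1]);
        /* rate sweeps between 0.5x and 1.5x about once per second (assumed) */
        pos += 1. + 0.5 * sin(pi2 * n / 44100.);
    }
    return n;   /* number of output samples actually written */
}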
Finally, at various hit points in the film, I also utilized the familiar time-stretched acoustic guitar sample. Careful placement and mixing of all the above-described sounds made for a complete ambience that would properly complement the events onscreen and satisfy the director’s requests.
The final piece of music required for the score to Red Letters would be faded in towards the final shot of the film and last throughout the credits. While Erik gave me the option of creating a theme song for the film, I decided that the theme would more appropriately be something time-stretched. I chose to take a short sample from an electric guitar performance, reverse the sample, and time-stretch it using an application called TimeToy Pro. The reason I chose this time-stretching program over SoundHack or Logic Pro’s stretching algorithm was that I believed TimeToy Pro to have a cleaner-sounding time-stretching algorithm that creates fewer artifacts. While the artifacts created using SoundHack were desired in my raw processed files, I felt that a more resolved tone would be effective for the final credits and would complement other pre-existing sound material that would also be used for this portion of the film. I used this sample in conjunction with the convolution sample used at the beginning of the film.
Part V: Conclusion
The advent of the fast Fourier transform (FFT) brought about both the potential for and the eventual birth of a new era of DSP applications based on spectral processing. Since the 1980s, programs such as Csound, Max/MSP, Kyma and SoundHack have provided composers and sound designers with a means of performing spectral analysis and re-synthesis on digitized audio, first through the processing of non-realtime audio files, and more recently on real-time digital audio streams. This can be generally attributed to the steady increase in available computing power. Applications that can run inside a digital audio workstation’s plug-in format are now commonplace, allowing spectral processing to take place alongside real-time audio and MIDI file playback. For example, Native Instruments’ Vokator and Spektral Delay plug-ins, SoundHack’s Spectral Shapers plug-ins and iZotope’s Spectron plug-in, as well as numerous convolution reverb plug-ins such as Apple’s Space Designer and Audio Ease’s Altiverb, have come to market and are readily available for use as composition and sound design tools. Additionally, almost all commercial digital audio workstations now offer spectral processing functions, including time stretching and expansion and pitch-shifting, to name a few. Finally, programs such as SoundHack and TimeToy Pro offer the user advanced functionality for performing a variety of spectral processing tasks. To conclude, it has only been over the past two years that the power and potential of spectral processing have finally reached maturity.
One of the greatest advantages of using spectral processing techniques in film is that it allows a composer to take organically created sounds and create new, unfamiliar sounds or groups of sounds from these reference files. Additionally, from the listener’s point of view, there is oftentimes no way to tell exactly what they are hearing, thus no “direct emotional connection” can be made from past experience with a particular and otherwise familiar sound. [J. Rona, personal communication, May 2005] This makes spectral processing ideal when scoring for films where unfamiliar or ambiguous moods and settings may be desired.
Additionally, spectral processing lends itself as an ideal bridge between sound design and film scoring. Due to the potentially un-pitched nature of many spectrally processed sounds, such a score can blend effectively with environmental ambience while not drawing too much attention to itself, and music in film is often most effective when it is hardly noticeable but is rather felt. Finally, spectral processing can also be quite effective as a means for creating sound effects. By utilizing many of the techniques discussed in this paper, it is possible, when using the correct source material, to create a range of percussive, screeching or industrial sound effects.
Appendix I: C source code for DFT
/*
 * DFTCalculation.h
 * DFT
 *
 * Created by Mike Raznick on Thu Feb 24 2005.
 * 2005 Solarfunk Studios.
 */
#define LENGTH 16

// complex spectral value: real and imaginary parts
typedef struct { float re, im; } complex;

void dft(float x[], int N, complex X[]);
void idft(float x[], int N, complex X[]);

/*
 * DFTCalculation.c
 *
 * Created by Mike Raznick on Thu Feb 24 2005.
 * Adapted from Richard Moore’s
 * Elements of Computer Music
 */
#include "DFTCalculation.h"
#include <stdio.h>
#include <math.h>

int main() {
    int N = LENGTH;                  // DFT window size
    double pi2oN = 8. * atan(1.)/N;  // (2 pi/N)
    double pi2 = 8. * atan(1.);      // (2 pi)
    float x1[LENGTH], x2[LENGTH];    // stores time domain waves
    complex X[LENGTH];               // stores re and im spectral values
    int n;                           // current sample index: from 0 - N-1

    // Ex. 1: fill x1 with one cycle of a cosine wave as a test signal
    for (n = 0; n < N; n++)
        x1[n] = (float) cos(pi2oN * n);

    // The complex spectrum (real and imag parts) X(k) of a waveform x(n)
    // computed with the DFT may be equivalently expressed as a pair of
    // real-valued functions of frequency known as the amplitude
    // (magnitude) spectrum and the phase spectrum
    dft(x1, N, X);

    // Ex. 2: print amp and phase from real and imag. parts on 2D plane
    printf("\n Pre-DFT| Spectral values (re and im) | Ampl and Phase");
    printf("\n ---------------------------------------------------\n");
    for (n = 0; n < N; n++) {
        printf("%2d: x1 = %6.3f, X(re, im) = (%6.3f, %6.3f), A = %6.3f, P = %6.3f\n",
            n,
            x1[n],                                                   // print time domain plot
            X[n].re, X[n].im,                                        // real and imag spectrals
            sqrt( (double) X[n].re*X[n].re + X[n].im*X[n].im ),      // Amplitude
            360.*atan2( (double) X[n].im, (double) X[n].re ) / pi2); // Phase (degrees)
    }

    // Ex. 3: resynthesize the time-domain waveform into x2 from X
    idft(x2, N, X);
    return 0;
}

// dft function -----------------------------------------------------
void dft( float x[], int N, complex X[])
{
    double pi2oN = 8.*atan(1.)/N;  // (2 pi/N)
    int k;                         // freq. index = N samples of spectrum
    int n;                         // sample index = time-domain waveform

    for (k = 0; k < N; k++) {
        X[k].re = X[k].im = 0.;
        for (n = 0; n < N; n++) {
            X[k].re += x[n] * cos(pi2oN*k*n);  // real (cosine) component
            X[k].im -= x[n] * sin(pi2oN*k*n);  // imag (sine) component
        }
        X[k].re /= N; // real / N
        X[k].im /= N; // imag / N
    }
}

// idft function ----------------------------------------------------
void idft( float x[], int N, complex X[])
{
    double pi2oN = 8.*atan(1.)/N;  // (2 pi/N)
    double real, imag;
    int k;                         // freq. index = N samples of spectrum
    int n;                         // sample index = time-domain waveform

    for (n = 0; n < N; n++) {
        real = imag = 0.;
        for (k = 0; k < N; k++) {
            real += X[k].re*cos(pi2oN*k*n) - X[k].im*sin(pi2oN*k*n);
            imag += X[k].re*sin(pi2oN*k*n) + X[k].im*cos(pi2oN*k*n);
        }
        x[n] = (float) real;  // imag sums to ~0 for a real input signal
    }
}
References:
1. Moore, F. Richard, Elements of Computer Music, Prentice Hall, Englewood Cliffs, New Jersey (1990)
2. Smith, Steven W., The Scientist and Engineer's Guide to Digital Signal Processing, Second Edition, California Technical Publishing (1999)
3. IEEE History Center, Alan Oppenheim Oral History Transcript (Sept 13, 1999)
http://www.ieee.org/organizations/history_center/sloan/ASSR_Oral_Histories/aoppenheim_transc
ript.htm
4. Doornbusch, Paul, Computer Sound Synthesis in 1951: The Music of CSIRAC, Computer
Music Journal, Vol. 28, Issue 1 - March 2004
5. Boulanger, Richard (Editor), The Csound Book: Perspectives in Software Synthesis, Sound
Design, Signal Processing, and Programming, MIT Press (March 06, 2000)
6. Stone, Susan, The Barrons: Forgotten Pioneers of Electronic Music, NPR Morning Edition
(Feb. 07, 2005) http://www.npr.org/templates/story/story.php?storyId=4486840
7. Jerry Goldsmith Online, Soundtrack Release: Planet of the Apes (Expanded) / Escape from the
Planet of the Apes (Suite), (1997)
http://www.jerrygoldsmithonline.com/planet_of_the_apes_expanded_1997_soundtrack.html
8. Prendergast, Roy, Film Music: A Neglected Art, W. W. Norton & Co., New York (1977)
12. Vercoe, Barry, History of Csound, in The Csound Book: Perspectives in Software Synthesis, Sound Design, Signal Processing, and Programming, MIT Press (March 06, 2000)
13. Burns, Dr. Kristine H, History of Electronic and Computer Music Including Automatic
Instruments and Composition Instruments, Florida International University (2004)
http://eamusic.dartmouth.edu/~wowem/electronmedia/music/eamhistory.html
15. Edge, Douglas, Interview with Tom Erbe: Sound Hacker, audioMIDI.com
(May 12, 2004)
16. Film, Television, and Radio News: Here are just a few of the sound tracks where you can
hear Kyma in action..., (2003) www.symbolicsound.com
17. Press Release: Opcode Releases fusion: VOCODE Effects Plug-In, Harmony Central
(September 26, 1997) http://news.harmony-central.com/Newp/103AES/Opcode/Vocode.html
18. Westfall, Lachlan, Computers in the Movies: How Desktop PCs Help Create Hollywood's
Amazing Music and Sound Effects, Music & Computers 4:4 (May-June 1998)
19. Harmony Central, Cycling '74 Releases Pluggo: Technology That Enables Custom VST Plug-
Ins Comes With 74 Plug-Ins, (May 3, 1999) http://www.harmony-
central.com/Events/MusikMesse99/Cycling74/Pluggo.html
20. Makovsek, Janez, FFT Properties 3.5 Spectrum Analyzer Tutorial, Dew Research (2003)
21. Preater, Richard W. T.; Swain, Robin C., Fourier transform fringe analysis of electronic
speckle pattern interferometry fringes from high-speed rotating components, Optical Engineering
(1994)
22. Multi-Semester Interwoven Project for Teaching Basic Core STEM Material (Science,
Technology, Engineering, Mathematics) Critical for Solving Dynamic Systems Problems,
Dynamic Systems Tutorial: Fourier Series, NSF Engineering Education Division Grant EEC-
0314875 (2004)
24. Lyons, Richard, Windowing Functions Improve FFT Results, Part I, Test & Measurement
World (June 1998)
25. Dobson, Richard, The Operation of the Phase Vocoder: A non-mathematical introduction to
the Fast Fourier Transform, Composers' Desktop Project, (June 1993)
26. Keith, Murphy and Butler, FFT Basics and Applications, Oregon State University (2005)
http://me.oregonstate.edu/classes/me452/winter95/ButlerKeithMurphy/insth.html
27. Laroche, Jean, Time and pitch scale modification of audio signals in Applications of Digital
Signal Processing to Audio and Acoustics, M. Kahrs and K. Brandenburg,
Eds. Kluwer, Norwell, MA, (1998)
28. Laroche, Jean and Dolson, Mark, New Phase-Vocoder Techniques for Pitch-Shifting, Harmonizing and Other Exotic Effects, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics,
New Paltz, New York (October 1999)
29. Baxter, Michael, Csound: An Interview with Dr. Richard Boulanger by Michael Baxter, Linux
Gazette, Issue 96 (November 2003) http://www.linuxgazette.com/node/125
30. Ramirez, Robert W., The FFT: Fundamentals and Concepts, Prentice-Hall, Inc., Englewood Cliffs, NJ (1985)
31. The Editors Guild Magazine, Gary Rydstrom and Michael Silvers: Finding Nemo, Vol. 25,
No. 3 (May/June 2004)
http://www.editorsguild.com/Newsletter/MayJun04/best_sound_editing/fn_sound_edit.htm
Cycling 74 (2004) http://www.cycling74.com/community/kleinsasser.html
34. Brown, Royal S., Overtones and Undertones, University of California Press, Berkeley, Ca.
(1994)
35. Broesch, James D., Digital Signal Processing Demystified (Engineering Mentor Series),
Newnes (March 1, 1997)
36. Brigham, E. Oran, The Fast Fourier Transform, Prentice-Hall, New Jersey (1974)
37. Lyons, Richard, Understanding Digital Signal Processing, Prentice Hall PTR (2001)
38. Roads, Curtis, The Computer Music Tutorial, The MIT Press, Cambridge, Mass. (1996)
39. Bernsee, Stephan, Tutorial: The DFT à Pied - Mastering Fourier in One Day, The DSP
Dimension (1999) http://www.dspdimension.com
40. Angoletta, Maria Elena, Fourier Analysis Part II: Technicalities, FFT & System Analysis,
AB/BDI (February 27, 2003)
41. Gerhardt, Lester A. & Zou, Jie, Lecture Notes: Short Time Fourier Analysis, RPI ECSE,
Rensselaer’s School of Engineering (2004)
42. Bores Signal Processing, Introduction to DSP: Frequency analysis: Fourier transforms
(2004)
43. Bracewell, Ronald N., The Fourier Transform and Its Applications, McGraw-Hill (1986)
44. Johnson, D., Fast Fourier Transform (FFT), (April 15, 2005) Retrieved from the Connexions
Web site: http://cnx.rice.edu/content/m10250/2.14/
45. Cooley, James W. and Tukey, John W., An Algorithm for the Machine Calculation of Complex Fourier Series, Mathematics of Computation (April 1965)
46. Smith, Steven W., The Scientist and Engineer's Guide to Digital Signal Processing: The Fast
Fourier Transform, California Technical Publishing (1997) http://www.dspguide.com/
47. Erbe, Tom, SoundHack Users Manual, Version 0.888, School of Music, CalArts
48. Haas, Joachim and Sippel, Stephan, Vokator Operation Manual, Native Instruments Software Synthesis (2004)
50. Wheatcroft, Zac, Berkley, Steve and Bennett, Bruce, Peak Version 4.0 Software User’s
Guide, BIAS (Berkley Integrated Audio Software), Inc., Petaluma, Ca. (2003)
52. Alexander, John and Roads, Curtis, Granular Synthesis, Keyboard (June 1997)
54. Dodge, Charles and Jerse, Thomas, Computer Music: Synthesis, Composition and
Performance, Schirmer (1997)
Acknowledgements
To Jeff Rona for inspiration, for giving me the opportunity to interview him regarding
personal aesthetics and ambient film-scoring techniques, and for taking the time to review this work.