
Spectral Processing for

Ambient Film Scoring and


Sound Design
By: Mike Raznick

Submitted in partial fulfillment of the requirements for the


Masters of Music in Music Technology
in the Department of Music and Performing Arts Professions
in the Steinhardt School of Education
New York University

Advisors: Kenneth J. Peacock, Robert J. Rowe


Spring 2005

Spectral Processing for
Ambient Film Scoring and Sound Design

Table of Contents

Part I: Introduction - Computer Music and the FFT in Film ……………... 3


• Computer Music Comes of Age ………………………………………... 7
• Spectral Processing In Major Motion Pictures ………………………… 9

Part II: The Fourier Transform – Declassified ……………………………... 12


• Trigonometric Functions – The Sine Wave ……………………………. 13
• Digital Waveform Representation ……………………………………... 17
• The Fourier Series ……………………………………………………… 18
• The Fourier Transform …………………………………………………. 20
• The Discrete Fourier Transform ……………………………………….. 23
• Windowing of Non-Periodic Signals …………………………………... 25
• Software Implementation of DFT ……………………………………… 28
• The fast Fourier Transform …………………………………………….. 36

Part III: Current FFT-based Software Applications for the Mac ………… 40
• The Phase Vocoder for Time-Stretching and Pitch Shifting …………… 40
• Cross Synthesis (Convolution and Morphing) …………………………. 46
• Convolution Reverb ……………………………………………………. 49
• Frequency Band Manipulation and Filtering …………………………... 50
• Granular Synthesis ………………………………………………….….. 55
• Other Applications for Spectral Processing ……………………………. 57

Part IV: Scoring for Erik Ryerson’s “Red Letters” ………………………... 57


• Initial Processing of Raw Material ……………………………………... 58
• Scoring Notes …………………………………………………………... 60

Part V: Conclusion ……………………………………………………………… 65

Appendix I: C source code for DFT ………………………………………….. 68

Part I: Introduction - Computer Music and the FFT in Film

In 1965, the discovery and subsequent publication of the fast Fourier transform (FFT) by

James W. Cooley and John W. Tukey were instrumental events in the establishment of

the field of digital signal processing (DSP). However, of particular significance was the

fact that the implementation of the FFT algorithm, for the first time, validated the use of

computers for accomplishing long and complex tasks. In fact, while the very technique

of implementing the FFT as described by Cooley and Tukey had previously been

discovered and used by German mathematician Carl Friedrich Gauss (1777-1855) in his

own work, it “was largely forgotten because it lacked the tool to make it practical: the

digital computer.” What makes Cooley and Tukey’s discovery most memorable is the

fact that “they discovered the FFT at the right time, namely the beginning of the

computer revolution.” [2]

Alan Oppenheim, professor at MIT, recalls, “The excitement really got generated when

the Cooley-Tukey paper came out. When the FFT hit, then there was a big explosion,

because then you could see that by using a computer you could do some things incredibly

efficiently. You could start thinking about doing things in real time... The birth of the

FFT was a very significant event.” [3]

While spectral processing applications would not become widely available for use in digital music applications until many years after the discovery of the FFT by Cooley and Tukey, the framework was quickly falling into place. By the mid-1960s, research

concerning music and computers had been underway for almost ten years at facilities

such as Bell Laboratories. In fact, it had been reported that music was generated using

computers as early as 1951. The “Australian-built ‘automatic computer’ initially known as the CSIR Mk1, and later known as CSIRAC, was one of the world’s earliest stored-program electronic digital computers” and was programmed to play popular musical melodies. [4] Later, the first computer-music composition, a 17-second piece, was generated on an IBM 704 computer in 1957 by Max Mathews of Bell

Laboratories [5]. From this period, early telecommunications experiments led to a

quickly evolving computer music community nurtured by the academic and scientific

communities.

As the fledgling DSP and computer-music communities continued to grow and develop

through the 1960s, the Hollywood motion picture industry during this time relied almost

exclusively on analog or natural sound design elements as well as orchestral instruments

for realizing both film composition and sound design. Computer- (and FFT-) based

applications for these purposes would not be available to composers for over twenty

years to come. Although the ability to process digital audio signals (and algorithms such

as the FFT) existed, computers were not yet fast enough to perform the many thousands

of computations needed to process digital audio by today’s standards. Additionally,

computers were not yet readily available to the mainstream and lacked general-purpose

software applications for processing audio.

Within the film industry, a small number of productions over time had featured newly

available tools for added visual and aural effect. Specifically, with regard to sound, as

early as the 1940s, film productions had been known to utilize unconventional electronic

musical instruments and sound effects for a variety of compositional and sound design

purposes.

The first example where a Hollywood motion picture featured the use of an electronic

musical instrument can be heard in 1945, as part of the soundtrack for Alfred Hitchcock's

film "Spellbound.” Miklos Rozsa’s use of the theremin, with its unsteady electronic and

constantly modulating timbre, added an intentional and particular eeriness to enhance

specific scenes in the film. Following in 1956, the science fiction thriller Forbidden Planet “became the first major motion picture to feature an all-electronic film score.”

Louis and Bebe Barron composed the score using only a tape recorder that they had

received as a wedding gift. Louis Barron also reportedly “built electronic circuits, which

he manipulated to generate sounds.” [6] Next, in 1963, Alfred Hitchcock's "The Birds"

featured the use of the “Trautonium, a novel musical instrument that produced many of

the sound effects.” The Trautonium, a “precursor to the synthesizer”, was “billed as the

world's first electronic musical instrument on its invention in 1929.” [53] Finally, Jerry

Goldsmith’s memorable score for “Planet of the Apes” in 1968 was also important for its

use of an Echoplex, which created percussive delay effects from pizzicato strings. [7]

During the late 1960s, synthesizers became important musical composition and

performance tools. Factors that led to the synthesizer’s continued use in film scores

included its growing popularity in the pop industry as well as its relative cost-effectiveness compared with that of hiring a 60-piece (or more) orchestra. [8] This phenomenon

led to numerous electronic film scores [34] such as Giorgio Moroder’s award-winning score for the 1978 classic “Midnight Express”, which used the synthesizer to create ambient and electronic music. With the proliferation and abundance of

synthesizers and later MIDI, it was soon considered a common occurrence for a film

score to feature synthesizers as an alternative to the traditional symphony orchestra.

Additionally during the 1960s, a specific type of synthesizer called the phase-vocoder

was developed that relied on a crude form of spectral analysis. The vocoder synthesizer

was first used compositionally in the 1971 film “A Clockwork Orange” and additionally

for sound-design purposes in films such as the 1978 classic “Star Wars.” As an

application, a vocoder can be defined as “an electronic device for analyzing and

resynthesizing sounds. It was originally developed in 1936 at the Bell Telephone

Laboratories in Murray Hill, New Jersey, for telephonic applications” and later used by

the military [9]. Specifically, hardware-based vocoders utilized analogue filters

constructed using capacitors and inductors [10] to divide up the incoming signal into

between ten and twenty frequency bands before processing and re-synthesis of the signal

occurred. While early phase-vocoder applications were not designed to use an FFT as

part of their implementation, future software implementations would incorporate the FFT

as a technique for dividing the incoming signal into as many as 512 or 1024 frequency

bands for improved frequency resolution.

As a musical application, the phase-vocoder was first incorporated into synthesizers such

as the Siemens Synthesizer around 1960, and was further implemented in popular models

such as the Korg VC-10, Roland VP-330, and Moog Vocoder. [11]

Computer Music Comes of Age

Through the 1970s, computer music programs were written for large mainframe

computers in variants of assembly language and Fortran. This meant that there was no

portability of code between computers. If a software application was written in one

variant of assembly language that was specific to a single computer or group of

computers, and that computer ceased to be further developed, the language faced the

possibility of becoming obsolete. A program written for an obsolete language would then

require the potentially monumental task of a complete re-write of the code. However, it

became clear by 1984 that “microprocessors would eventually become the affordable

machine power, that un-ported assembler code would lose its usefulness, and that ANSI C would become the lingua franca.” [12] Tom Erbe recalls that up until 1987, he was

mainly “working with mainframes – these million dollar computers running long signal

processing jobs.” [15]

In 1986, Csound, a Music V variant, written by Barry Vercoe of MIT, and the Max

program, a similar application developed at IRCAM by Miller Puckette for working with

audio and MIDI, were written in the C language and publicly released. A graphical

version of Max would later be released for the Macintosh and would be MIDI

compatible. [13]

In 1987, Apple introduced the Mac II. This would mark the first time that a consumer-

level personal computer would ship with the required level of CPU power needed to

calculate DSP algorithms such as the FFT in software. This would quickly lead to

support for and commercial release of numerous sound/music production and

manipulation applications. Applications such as Cycling74’s Max/MSP, James

McCartney’s SuperCollider as well as built-in spectral processing tools that would be

packaged within music production software such as ProTools (released as Sound Tools

by Digidesign in 1989), Logic (1992) and Performer would soon provide composers and

sound designers with an entire set of tools for performing a number of audio-based spectral

processing transformations such as time-stretching, pitch-shifting, FFT-based phase-

vocoding, morphing as well as convolution on digitized audio.

It was not until 1991, however, that two important applications including a number of spectral processing algorithms would be released at a consumer level for the PC or Macintosh. First, Csound would once again be ported, this time to Microsoft DOS, and made available for purchase. Of particular significance was that this version of

Csound introduced the inclusion of spectral data types for sensing and analyzing audio

input. [14] Additionally, Tom Erbe’s SoundHack would be released as freeware for the

Macintosh platform.

With the availability of these and other applications designed specifically to run on personal computers, the potential for artistic creativity and editing with audio-based software had reached a new plateau where composers and

sound designers had direct and quick access to spectral processing tools for creating and

manipulating new and existing digitized sound files.

Spectral Processing In Major Motion Pictures

The first notable example where FFT-based processing was used as a technique for

ambient film scoring can be taken from Jeff Rona’s compositions for the 2000 motion picture “Traffic.” Specifically, FFT-based processing techniques such as phase-vocoding

and time-stretching were used. Software applications that were heavily utilized include

Csound, Logic and Max/MSP. [J. Rona, personal communication, May 2005]

Additionally, a number of more-recent films such as Exit Wounds, Black Hawk Down,

Mothman Prophecies, Traffic, Narc and the TV series The Dead Zone include musical

scores suggesting the use of spectral processing techniques. [33] For example, composer

and sound designer Tobias Enhus assisted in composing the score to Narc in 2002 by

creating musical atmospheres with applications such as Kyma and Csound, processing struck-metal source material: turbines, metal sheets, steel drums, and even the suspended back end of a forklift. Starting from these non-harmonic sources, Tobias used

a spectrum editor and tuned filter banks to create atmospheres that matched the key of the

musical score. [16] Jeff Rona, who did significant film scoring for Black Hawk Down,

Mothman Prophecies, Traffic as well as television programs such as The Dead Zone

states, “I've used … Logic, Reaktor, Max/MSP, Kyma, Peak, SoundHack, and every

plug-in I can get my hands on.” [J. Rona, personal communication, May 2005] All of

these programs mentioned above offer various types of spectral processing functions.

Just when it seemed that the full potential of spectral processing had been tapped, in 1997, a

company called Opcode released a virtual software instrument/effect unit plug-in called

“Fusion: Vocode” that would run on either a Macintosh or PC platform inside a number

of host applications. [17] Fusion: Vocode not only provided basic spectral processing

(most likely the 1936 non-FFT-based algorithm), but also presented a case where a software instrument could now replace its soon-to-be-obsolete hardware counterpart. For example, film composer Rob Arbittier in 1998 reported the benefits of this: “I mostly use

plug-ins as outboard effects toys…Opcode Fusion: Vocode can do some really cool

effects, I usually send a sound to it, make a new .WAV file out of it, and then use that file

as an element in the Cubase sequence. I used to bring a sound in, manipulate it, and then

send it back out to a sampler: now I can just do it all in my computer.” [18]

In March 1999, Cycling74, using objects from their already available Max/MSP

architecture, released a set of 74 VST plug-ins bundled into a single offering called

Pluggo. Included in this release were spectral modification plug-ins. This event presented

the first instance where a real-time FFT could be performed on an incoming audio signal

inside a popular digital audio workstation, spectral processing applied based on user input, and the signal then transformed back into the time-domain output. For the first

time, composers and sound designers were able to take full advantage of the tremendous

potential that the FFT algorithm provides, thanks to significant boosts in the CPU processing power of available computers.

The Kyma workstation from Symbolic Sound is a visual sound design environment that

runs alongside a dedicated processing computer called the Capybara and has been used in

many types of sound media including film since the 1990s. Using a number of the real-time spectral functions included in Kyma while working on the film ‘Finding Nemo,’ sound designer Gary Rydstrom explains: “Kyma allowed me to modulate and morph my own voice into other sounds.” [31] “A lot of the ocean ambience

was just me making a sound and using that to modulate a whole variety of sounds so I

could shape them into tonalities that I liked. None of my actual voice sounds are heard.

But, I could run sounds through the Kyma Vocoder and shape the sound of the water into

something interesting.” [31] Rydstrom said, “By growling into the microphone, I could

use the Kyma to create the sound of the ‘growling’ water rush as Bruce, the shark,

whooshed by.” [31]

In 2001, Native Instruments announced their first real-time effects plug-in called

Spektral Delay for the Macintosh and Windows operating systems. Spektral Delay, likely

the first commercial plug-in to perform complex spectral operations in real-time, would

reportedly split a channel into a maximum of 160 separately modifiable frequency bands,

each of which would have individual delay and feedback settings, an input attenuation

filter, and the ability to apply modulation effects to the various parameters. Spektral

Delay would not actually be released until 2003, the same year in which a number of

other plug-ins and stand-alone applications that performed real-time spectral processing

would come to market.

While only a few years have passed since real-time spectral processing tools for audio

manipulation became available to run inside today’s digital audio

workstations, the FFT has come to the forefront of digital audio via full-blown software

applications such as Sony’s Acid and Ableton Live, which offer advanced real-time time-

stretching and pitch-shifting algorithms to control and match any desired tempo for

sample loop-based playback.

As a topic for this graduate project, I will discuss the inner workings of the FFT, closely

examine a select number of current representative applications that utilize FFT-based

algorithms, examine aesthetics behind spectral processing in motion picture films, and

discuss personal, aesthetic and technical choices as a film composer-sound designer for

using such tools in Erik Ryerson’s Graduate thesis film, “Red Letters”. I will also include

an example of a discrete Fourier transform (DFT) software implementation.

Part II: The Fourier Transform - Declassified


Jean Baptiste Fourier, born in 1768, demonstrated that a “mathematical series of sine and

cosine terms can be used to analyze heat conduction in solid bodies.” [36] More

generally, “he made the remarkable discovery that any periodic waveform can be

represented as a sum of harmonically related sinusoids, each with a particular amplitude

and phase.” [1] Additionally, Fourier “derived a mathematical expression” that allows a transformation between a time-domain waveform and its frequency spectrum, precisely showing the amplitudes and phases of the sinusoids that comprise it. This was likely the

first systematic application of a trigonometric series to a problem solution. Fourier would

[eventually] expand it to include the Fourier integral. [36]

The sections below will explore the mathematical concepts behind the Fourier transform,

leading to a full understanding of how it is implemented. As an example, I will examine a C language implementation of the discrete Fourier transform (DFT).

Trigonometric Functions – The Sine Wave

Before taking an in-depth look at the Fourier transform, it will first be necessary to define

some of the trigonometric functions that form the basis of Fourier analysis. In looking at

the below representation of a circle, π can be defined as 3.14159265… and is derived by

the ratio of the circumference of a circle over the diameter of a circle. Additionally, the

radius of a circle is defined as half its diameter (cross section). Therefore, there are 2π

radians in a full circle where radian is a natural measure for angles and is based on π. To

represent a right angle (90 degrees of a possible 360), we can take one quarter of 2π

radians and get π/2 radians.

(Figure 1)

Next, we will define the sine of an angle θ as “the ratio of the length of the side of the

right triangle opposite angle θ (O) to the length of its hypotenuse (H).” [1] This can be

expressed as follows:

sin θ = O / H

The above expression allows us to generalize that an angle can be measured by increasing

counterclockwise from 0 degrees on the positive horizontal axis (x) on the circle (see

figure 2 below). When the opposite side (O) of the angle θ points upward from the

horizontal axis (between 0 and 180 degrees), its length is expressed as a positive number.

However, when it points downward (for angles measuring between 181 degrees and 359

degrees), its length can be expressed as negative. [1] By graphing the sine of angle θ as it

travels around a circle (as illustrated below), we can see that our angle traces one full cycle of a sinusoidal curve as the angle increases from zero degrees to a full circle (360 degrees or 2π radians).

(Figure 2: the opposite side (O) of the angle θ traveling around circle from 0º)

While it can be seen how to trace a sinusoidal waveshape, in order to create a sinusoidal

waveform, we must factor in a unit of time, represented in seconds (t ). This allows us the

ability to further graph our sine function in terms of frequency (f ), defined as the number

of times O travels around the circle per second (measured in Hertz). Now, if the circle's radius is given an arbitrary length of one (known as a unit circle), the quantity 2π multiplied by t travels once around the circle as t goes from zero to one, giving us a direct relationship between frequency and time in seconds. [1]

It can therefore be observed that the quantity 2πft goes through exactly f cycles each time

t increases by one unit. This can be viewed as a sinusoidal waveform and is presented by

the non-bold of the two wave-shapes in figure 3 (see page 17).

Before accomplishing the goal of fully representing an arbitrary sine wave, there are two

final points that must be taken into consideration. The first is the starting point for the

rotation of the waveform when t is equal to zero. This is known as the phase offset and is

represented by φ. It is important to note at this time that if a sinusoidal waveform has a

phase offset measuring one-quarter of a circle (90º or π/2 radians), it is known as a cosine

waveform. This can be shown using the following expression:

cos(2πft) = sin(2πft + π/2)

Finally, in order to set the peak amplitude of the sinusoid to an arbitrary value A, we must include a peak-amplitude scaling factor. With the final remaining considerations

for a time-domain waveform representation, we can construct a sinusoidal waveform

showing amplitude (A), frequency (f ), and phase offset (φ ) described as a function of

continuous time (t ) according to the calculation:

f(t) = A sin(ωt + φ)

It should be noted that, going forward, we substitute ω (the Greek letter omega), known as the radian frequency, for 2π multiplied by the frequency f; that is, ω = 2πf.

As detailed above, the sine and cosine waveforms are quite similar, the only exception

being a 90-degree phase offset. Based on these similarities, they can be extremely useful

when considered together since sine and cosine functions exhibit the properties of

oddness and evenness [1]. An odd function is one whose reversal equals its inversion, so that f(-x) = -f(x); the sine is an odd function of angle θ. An even function is one that is identical to its reversal, so that f(-x) = f(x); the cosine is an even function of

angle θ. As will be discussed, these properties of sine and cosine functions are important

since they “may be combined to represent amplitude and phase information” in a

frequency-domain representation of a complex waveform. [1]

An important point to consider is: if we combine any two sinusoidal waveforms

measuring the same frequency, the resulting waveform will also be sinusoidal in shape

and have the same frequency, but with a potentially different amplitude and/or phase

offset. This can be expressed as follows:

A sin(ωt + φ) = a cosωt + b sinωt

The above calculation shows that if the phase offset (φ) is equal to a value of zero, then a

will also be equal to zero and b will be equal to A. However, if φ equals π/2 radians (90-

degree offset), then b will equal zero and a will be equal to A.

(Figure 3)

Digital Waveform Representation

Until now, only properties of analog or continuous waveforms have been discussed. A

digital signal differs from an analog signal in that it is represented by a discrete set of sample values taken at uniform time intervals. [37]

Contrary to our previous discussion of analog signals, a digital signal can be described

mathematically by replacing the continuous time variable (t ) with the product nT, which takes on discrete values corresponding to the current sample number of a digital time-domain signal x(n). [1] In this regard, T is defined as the sample period (equal to

1/R, where R is the sample rate) and n is the integer value for the current sample number.

n successively increases as each new sample enters the system (for example: …-2, -1, 0,

1, 2, 3, …). As is commonly done in the notation, the sample period (T) can be omitted and we can describe a digital sinusoidal waveform signal as follows:

x(n) = A sin(ωn + φ)

Figure 4 (below) illustrates a typical digital waveform as graphed in the time-domain.

(Figure 4: Trumpet tone represented as a discretely sampled time-domain signal)
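As an aside, the digital sinusoid equation above can be realized directly in a few lines of C. The following is a minimal sketch (the amplitude, frequency and phase values are arbitrary example values, not taken from this paper), using the same 8*atan(1) idiom for 2π that appears in the DFT listing later on:

#include <math.h>
#include <stdio.h>

int main()
{
    double pi2 = 8. * atan(1.);   // (2 pi)
    double R = 44100.;            // sample rate in samples per second
    double A = 0.8;               // peak amplitude (example value)
    double f = 440.;              // frequency in Hertz (example value)
    double phi = 0.;              // phase offset in radians (example value)
    double w = pi2 * f / R;       // radian frequency per sample
    int n;                        // current sample index

    // print the first 16 samples of x(n) = A sin(w*n + phi)
    for (n = 0; n < 16; n++)
        printf("x[%2d] = % f\n", n, A * sin(w * n + phi));

    return 0;
}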

The Fourier Series

To gain a full understanding of the concepts behind the fast Fourier transform (FFT), it is

useful to begin by considering the foundation of Fourier theory with a description of a

Fourier series. A Fourier series is a method of representing a complex periodic signal

using less complex signals. The most basic signal that we can use is the sinusoid, which

can be “summed to produce an approximation of the original signal.” [22] This arithmetic

sum is described by the principle of superposition. [35] As more terms (sinusoids) are

added, the approximation of the original waveform becomes more accurate. However, in

order for a function to be approximated by a Fourier series, the function must be periodic. Because sine waves are periodic signals, the Fourier Transform

treats the data as if it were periodic for an infinite duration. [1]

The following example shows how we can use the Fourier series to generate a square

wave:

(Figure 5 [39])

In the above illustration, we begin with a sine wave having an arbitrary amplitude value

of 1.0 and radian frequency ω or (2π * f), in this case the fundamental frequency. The

first sine wave is added to a second sine wave that has three times frequency ω and one-

third the amplitude (center image) as compared with the first sine wave. While not shown

above, a third sine wave is added to the first two with the properties of five times the

fundamental frequency ω and one-fifth the amplitude. The third image above shows the

emerging square-shaped waveform after a fourth sine wave is summed (which has a

frequency seven times the fundamental and an amplitude one-seventh that of the fundamental). If

we continue to proceed as discussed, a near-perfect square wave will eventually emerge.

[42]

The Fourier Transform

The Fourier series allows us to see how a complex waveform can be created out of

sinusoidal waveform components. What is significant about Joseph Fourier’s discovery is

that “any periodic waveform can be represented as a sum of harmonically related

sinusoids, each with a particular amplitude and phase.” [1]

One of the defining properties of the Fourier Transform is that it assumes that any signal,

created by adding a series of sine waves, is periodic and therefore must exist for an

infinite duration. Because sine waves are the building block of Fourier analysis and are

periodic signals, the Fourier Transform works as if the data, too, were periodic for all

time. Therefore, in order to measure a dynamically changing signal for even a short time,

the Fourier transform offers nothing regarding what happens to that signal at any point

before or after the portion of the signal for which we are analyzing.

In analyzing even a portion of a signal, we must first take a sample of that signal

represented by an arbitrary number of consecutive samples. This can be defined by a

rectangular window function, where all parts of the signal falling within that window

represent a single cycle of an infinitely repeating periodic waveform. While this

discussion considers only analogue signals, the figure below illustrates an example of a

rectangular window being taken from a discrete signal (which will be later discussed).

(Figure 6)

Mathematically, the (continuous or non-discrete) Fourier transform can be expressed

using the following equation:

X(f) = ∫ x(t) e^(–i2πft) dt, with the integral taken over all time t (from –∞ to +∞)

From the above equation, the lowercase x(t) is a periodic function of time (t ), the

uppercase X(f ) represents the frequency domain spectrum of x(t) and is a function of

frequency (f ), e is referred to as the “natural base of logarithms” and is equal to 2.7182818…, i is referred to as the imaginary unit and is defined by i² = -1. Finally, d is

defined as "an infinitely small change in t." [B. Greenhut, personal communication, May

2005] It is important to note from our earlier discussions that if t is measured in seconds,

then f will be measured in cycles per second, or Hertz.

By making a few small adjustments to the above equation, it can be shown how the

Fourier transform can be reversed using the inverse Fourier transform as calculated

below:

x(t) = ∫ X(f) e^(+i2πft) df, with the integral taken over all frequencies f (from –∞ to +∞)

In comparing the two equations above, the only significant difference (in addition to

solving for a time domain signal instead of a frequency spectrum) is the sign of the

exponent of e. However, returning to our earlier discussion of sine and cosine functions,

Euler’s relation states, “the quantity e, when raised to an imaginary exponent is equal to

the complex sum of cosine and sine functions” [1], and is expressed as follows:

e^(iθ) = cos θ + i sin θ

Euler’s relation directly relates to the Fourier transform when we set θ to 2πft.

This allows the above equation to be expressed as:

e^(i2πft) = cos(2πft) + i sin(2πft)

Taking this one step further, because cos(-x) is equal to cos x and sin (-x) is equal to -sin x

[1] we can similarly write:

e^(–i2πft) = cos(2πft) – i sin(2πft)

Because the cosine function cos(-x) = cos x exhibits the property f(-x) = f(x), it is known as

an even function and “is left-right symmetrical around the point x = 0.” [1] Similarly,

because the sine function sin(-x) = -sin x exhibits the property f(-x) = -f(x), it is called an

odd function and “is left-right anti-symmetrical around the point x = 0.” [1]. This is

important because any function can “be broken down into the sum of two unique

functions, one being purely even and one being purely odd” [1]:

f(x) = fe(x) + fo(x), where fe(x) = [f(x) + f(-x)] / 2 and fo(x) = [f(x) - f(-x)] / 2

From the above expression, it is also possible to solve for fe(x) or fo(x). With regards to the Fourier transform, this is useful since the complex exponential e^(–i2πft) separates a

given waveform or spectrum into its even and odd parts. The real (cosine) part affects

only the even part of x(t) or X(f) while the imaginary (sine) part affects only the odd part

of x(t) or X(f). [1]

The Discrete Fourier Transform

The Discrete Fourier Transform (DFT) can be described as an algorithm for computing

the Fourier transform on a set of discretely sampled data. While the above discussion of

the Fourier Transform specifically deals only with continuous or analog waveforms, we

will now extend our discussion to deal with discretely sampled, digital signals.

The discrete Fourier transform allows us a change in representation from the digital or

discrete time-domain to a frequency-domain representation of a given signal. For this, it

is necessary to use the DFT, defined by the following expression:

X(k) = (1/N) Σ x(n) e^(–i2πkn/N), summed over n = 0, 1, 2, …, N-1 (the 1/N scaling matches the C implementation presented below)

where n is the discretely valued sample index. The sampled time-domain waveform is

known as x(n). Additionally, from the above equation, k, equal to 0, 1, 2, …, N-1, is the

frequency index. X(k) therefore “represents N samples of a continuous frequency

spectrum” [1] where N represents the window size.

Specifically, the discrete Fourier transform partitions a given windowed input signal into

separate frequency bands, bins or channels. [38] The DFT “operates by constructing the

analyzed waveform out of a unique set of harmonics…” and is calculated “in terms of

complex coefficients from which amplitudes and phases” for each frequency band are

extracted from the real and imaginary parts of each complex number. [1]

The inverse discrete Fourier transform (IDFT) can be similarly calculated as follows:

x(n) = Σ X(k) e^(+i2πkn/N), summed over k = 0, 1, 2, …, N-1

The output of the discrete Fourier transform can be viewed as a filter bank. The fundamental frequency of the analysis is called the “analysis frequency” and can be computed in Hertz as the sampling rate divided by the window size as measured in samples.

For example, as part of the sampling theory, the Nyquist criterion states that the upper

frequency range is half the sampling rate. Therefore, if the sampling rate is 44,100

samples per second, we know that the upper range of the DFT is 22,050 Hz. Additionally,

if the DFT window size (N) is chosen to be 512 samples, the resulting DFT will have 256

spectrum lines since the available harmonics lie between –22,050 and +22,050 Hertz.

Finally, we can calculate the frequency resolution by dividing the sample rate by the

window size. In this example, 44,100 / 512 would yield an analysis frequency of

approximately 86 Hertz. This means that each frequency bin will be evenly spaced

approximately 86 Hz apart for a positive frequency range of 0 to 22,050 Hertz. For example, the

first frequency bin will include the frequencies 0-86 Hz, the next bin will include 87-173

Hz and so on.
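The bin arithmetic from this example can be verified with a short sketch (assuming, as above, a 44,100 Hz sample rate and a 512-sample window; both are example values only):

#include <stdio.h>

int main()
{
    double sample_rate = 44100.;          // samples per second
    int N = 512;                          // DFT window size
    double bin_width = sample_rate / N;   // analysis frequency, approximately 86 Hz
    int k;                                // frequency bin index

    // print the frequency range covered by the first few positive-frequency bins
    for (k = 0; k < 4; k++)
        printf("bin %d covers %.2f Hz to %.2f Hz\n", k, k * bin_width, (k + 1) * bin_width);

    return 0;
}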

The above example allowed us to see how the size of each frequency bin is directly

related to the chosen window size. Since each bin allows us to see the amount of spectral

energy that falls within the frequency range covered by that bin, we can say that, if we

choose a larger window size (N), we in turn get a higher frequency resolution. However,

as discussed below, there are costs associated with this.

Windowing of Non-Periodic Signals

So far, our discussion of Fourier analysis has been limited to treating a signal as infinitely

repeating and periodic in nature. We can summarize our previous discussions by stating

that the DFT treats a non-periodic signal by taking a slice or window of some specified

number of samples and transforming that window or block of samples as if it were

periodic.

Since musical sounds are not periodic or infinite, it is necessary when performing Fourier

analysis to closely approximate a dynamic signal by dividing it into successive windows

of some number of samples and then treat each window as if it were an infinitely

repeating signal. The Fourier transform can be calculated for each window and

overlapped consecutively with previously transformed windows to recreate, at best, a

slightly misrepresented signal in the frequency domain. [38] For example, if our chosen

window size (N) is equal to 512 samples, a new Fourier transform must be calculated

every 512 samples or 86 times every second (if our sample rate is 44,100). In this regard,

it is important to consider that, while frequency resolution gets better as a choice of

window size is enlarged, our time resolution becomes correspondingly degraded since

DFT analyses will be calculated at fewer time intervals.

(Figure 7: Trumpet tone represented over time in the frequency domain)

Because the Fourier transform treats a given windowed signal as if it were one cycle of a

periodic waveform, abrupt changes in amplitude can take place at the window edges, where one cycle ends and the next begins, causing undesirable artifacts in the sound. [25] These artifacts can be largely prevented by smoothing each window around its edges (see figure 8 below), which is easily accomplished by multiplying our windowed time-domain waveform by a window function as illustrated below. The window function most commonly used is the Hamming window (see

figure 9), which is known for its fine resolution of spectral peaks.

(Figure 8 [24])

However, if successive windows are placed next to each other while having all been

multiplied by a window function, an audible tremolo will occur. It therefore becomes

necessary to overlap windows. An additional benefit of overlapping windows is improved

frequency resolution.

A desired offset for the overlap between windows can be generally considered to fall

between 50 and 75% of the window size. [37, 38] For example, if a window has a length of

1024 samples and a Hamming window is used, successive windows might be spaced

approximately 44,100 / 172 = 256 samples apart. Calculation of this 75% overlap factor

is as follows:

1. Window Bandwidth (b) = 2 * sample rate (44,100) / window size (1024) = 86.13

2. To avoid aliasing, multiply b * 2 = 172 (overlaps per second)

3. 44,100 / 172 = overlap every 256 samples or four times every 1024 samples [41]
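The windowing step itself is straightforward to sketch in C. The function below is a hypothetical helper (not part of any listing in this paper) that multiplies one 1024-sample frame by a Hamming window using the standard textbook coefficients; successive frames would be taken HOP_SIZE samples apart to obtain the 75% overlap calculated above:

#include <math.h>

#define WINDOW_SIZE 1024
#define HOP_SIZE 256                 // 75% overlap: a new window every 256 samples

// Multiply one frame of input samples by a Hamming window before transforming it.
void apply_hamming(float frame[], float windowed[])
{
    double pi2 = 8. * atan(1.);      // (2 pi)
    int n;

    for (n = 0; n < WINDOW_SIZE; n++) {
        double w = 0.54 - 0.46 * cos(pi2 * n / (WINDOW_SIZE - 1));  // Hamming coefficient
        windowed[n] = frame[n] * (float) w;
    }
}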

(Figure 9: Left: Hamming Window; Right: Product of a Hamming window with a windowed signal [40])

Although the results of using a window function such as a Hamming window are not

perfect, a significant improvement is made compared with using only a rectangular window.

It should be noted that one side effect of using a Hamming or similar window function “is

that partials which are exact harmonics of the FFT fundamental frequency will no longer

show up as a single spectral line” because, in using a window function, we are

introducing amplitude modulation: “the amplitude is made to rise from and fall to zero at

the edges of the window.” However, “given the rarity of exact harmonics in 'real'

signals,” this is well worth it. [25]

Finally, while we have only considered the Hamming window in this discussion, there are a number of window functions that are commonly used. These include the Hanning, Gaussian, Kaiser, and Blackman window functions, among others.

Software Implementation of DFT

It has so far been discussed how we can perform Fourier analysis on a digital signal using

the discrete Fourier transform. In this section, I will demonstrate how to implement the

DFT on a discrete set of data using an adapted C language DFT implementation [1].

Based on this implementation, it will be discussed how the discrete Fourier transform is

impractical for performing Fourier analysis, leading to a discussion of the far more

efficient fast Fourier transform (FFT) algorithm. Complete source code can be found in

Appendix I.

In looking at the code example below, the first seven lines set up and initialize the

necessary variables for our DFT program. In the next eight lines, a waveform is

created, specifically located within the ‘for’ loop. As previously discussed, the square

wave here is represented by the sum of four sine waves, all having different frequencies

and amplitude values and was detailed in the previous section on the Fourier series (see

figure 5 on page 19).

main() {
int N = LENGTH; // DFT window size
double pi2oN = 8. * atan(1.)/N; // (2 pi/N)
double pi2 = 8. * atan(1.); // (2 pi)
float x1[LENGTH], x2[LENGTH]; // stores time domain waves
complex X[LENGTH]; // stores re and im spectral values
int n; // current sample index: from 0 - N-1

// create square wave with odd partials 1, 3, 5, 7 to be transformed


for (n = 0; n < N; n++) {
x1[n] =
sin(pi2oN * 1 * n) + // fundamental sine = sin((2*pi/N)*f*n)
.33 * sin(pi2oN * 3 * n) + // second sine = 3 * freq and 1/3 ampl
.2 * sin(pi2oN * 5 * n) + // third sine = 5 * freq and 1/5 ampl
.143 * sin(pi2oN * 7 * n); // fourth sine = 7 * freq and 1/7 ampl
}

From the above code, it will be assumed that N, our DFT window size, is equal to 16.

However, if we were processing samples for a real audio application, it would be realistic

to use a much larger window size. For purposes of discussion, however, a smaller

window size is practical.

If we compute the results of our time-domain square wave, which is stored in an array

called ‘x1’, we can see how 16 amplitude values are created and stored. These values are

listed as part of the output from this program and can be graphically represented as

follows:

(Figure 10: Left: data from x1 array / values for square wave; Right: graphed discrete x1 waveform)

Once a discrete waveform has been created, we can perform a transformation using our

DFT function to compute real and imaginary arrays. It should be noted that a structure

named ‘complex’ was created in the C header file (see below) that allows us to store real

and imaginary float values for each corresponding sample that is processed by our DFT

function:

// first define complex structure/record


typedef struct {
float re;
float im;
} complex;

The ‘dft’ function, which accepts our ‘x1’ square wave and performs the discrete Fourier

transform on its sample values, is listed below:

// dft function ----------------------------------------------------
void dft( float x[], int N, complex X[])
{
double pi2oN = 8.*atan(1.)/N; // (2 pi/N)
int k; // frequency index
int n; // time-domain sample index

for (k = 0; k < N; k++) {


X[k].re = X[k].im = 0.0; // init real and imag arrays

for (n = 0; n < N; n++) {


X[k].re += x[n] * cos(pi2oN*k*n); // compute real array
X[k].im -= x[n] * sin(pi2oN*k*n); // compute imaginary array
}

X[k].re /= N; // real / N
X[k].im /= N; // imag / N
}
}
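For completeness, an inverse routine can be sketched along the same lines. The function below is not part of the original listing; it simply follows the IDFT expression given earlier (note that no division by N is needed here, since dft() above already divided by N):

// idft function (sketch) -------------------------------------------------
void idft( float x[], int N, complex X[])
{
    double pi2oN = 8.*atan(1.)/N; // (2 pi/N)
    int k; // frequency index
    int n; // time-domain sample index

    for (n = 0; n < N; n++) {

        x[n] = 0.0; // init output sample

        for (k = 0; k < N; k++) {

            // accumulate the real part of X[k] * e^(+i 2 pi k n / N)
            x[n] += X[k].re * cos(pi2oN*k*n) - X[k].im * sin(pi2oN*k*n);
        }
    }
}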

As above, the DFT is computed by applying the Fourier sum (see the inner ‘for’ loop) to the time-domain waveform (our square wave) x[n] and dividing by the window size N (see the final commented lines above). The output data set prints as follows:

(Figure 11)

Using the above data that represents spectral values for our time-domain waveform, we

can convert each complex value of X, consisting of a real and imaginary part to further

represent amplitude and phase values for specific frequencies. But before we can do this,

we also must make sure we have the following information: our window size N, the

number of samples of the spectrum of waveform ‘x1’, and the specific frequencies to

which successive values of X correspond. [1] With this information, we can use the two-

dimensional complex plane (see figure 12 below) to interpret each complex value of X.

From the image below, the horizontal axis represents the real part (where we can place

our real output value a) and the vertical axis represents the imaginary part of the value

(where we can place our imaginary output value b).

(Figure 12 [1, 37])

Measuring the length of vector A, drawn from the plane’s origin, gives us the amplitude. From a mathematical point of view, we can compute the length of the vector using the Pythagorean theorem:

A = √(a² + b²)

Additionally, measuring angle θ of vector A gives us the phase, which can be computed using the following equation:

θ = arctan(b / a) (in practice the atan2(b, a) function is used so that the angle falls in the correct quadrant)

What we have seen here is that these conversions allow us to express the complex

frequency spectrum X(k) of a time-domain waveform x(n) “as a pair of real-valued

functions of frequency called the amplitude (or magnitude) spectrum and the phase

spectrum.” [1] From our C program, the following two lines calculate these conversions

for us:

sqrt( (double) X[n].re*X[n].re + X[n].im*X[n].im), // Amplitude


360.*atan2( (double) X[n].im, (double) X[n].re ) / pi2); // Phase
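In context, these two expressions would sit inside a loop over the N spectral values, for example (a sketch assuming the same variables as the listings above; the exact printf formatting of the original program is not reproduced here):

// print bin number, amplitude and phase (in degrees) for each spectral value
for (n = 0; n < N; n++) {
    printf("X[%2d]  amplitude = %8.4f   phase = %8.2f\n", n,
        sqrt( (double) X[n].re*X[n].re + X[n].im*X[n].im), // Amplitude
        360.*atan2( (double) X[n].im, (double) X[n].re ) / pi2); // Phase
}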

If we print the results from the above two lines of code, we can view and interpret the

output in the following ways (see figure 13 below):

(Figure 13: Left: converted data from complex X arrays; Right: positive-frequency amplitude spectrum)

However, if we look at the program output on the left (above), we see that there are 16

complex values representing amplitude and phase. By contrast, there are only eight

amplitude values plotted on the right. This is because our frequency information on the

left “corresponds to both minus and plus half the sampling rate.” [1] What this translates

to is that all frequency components falling above half the sampling rate are “equivalent to

those lying between minus the Nyquist rate and 0 Hz.” [1]

For example, for our 16-point DFT, if we have a sampling rate of 44,100 samples per

second, our analysis frequency would be 44,100 / 16 or 2756.25 Hz. This would

correspond to the following 16 harmonic frequencies (measured in Hz) being examined

by our DFT: 0, 2756.25, 5512.5, 8268.75, 11,025, 13,781.25, 16,537.5, 19,293.75, ±22,050 (Nyquist), -19,293.75, -16,537.5, -13,781.25, -11,025, -8268.75, -5512.5, -2756.25. In our frequency graph (see figure 13, right) above, we are

purposely only plotting positive frequencies that lie between 0 Hz and the Nyquist

frequency (22,050 Hz).

One final point regarding the DFT that we must consider is that, although we sent a

square wave with a peak amplitude value near 1.0 into our DFT program, the outputted

frequency spectrum yields a peak amplitude value of 0.5. In fact, with the exception of frequency components at 0 Hz (D.C.) and the Nyquist rate, all post-DFT component amplitudes “are half their ‘true’ values.” [1] The reason for this is related to the fact that

the Fourier transform divides each single amplitude value between positive and negative

frequency components. For example, if we look at the output (see figure 13, left) of our

DFT program above, we can see that X[1] (see second line of output) has an amplitude of

0.5 and a phase of –90º (where -90º represents a sine wave as measured with respect to

the cosine function). Additionally, if we look at the corresponding negative frequency

X[15] (see final line of output), the amplitude value is again 0.5, however, the phase this

time is inverted (positive 90º) when compared with X[1] and therefore represents an

inverted sine wave. These corresponding positive and negative frequency components

have the same amplitude and inverted phase relationships. We can therefore say that the

Fourier transform does not distinguish between positive and negative frequency

components except for the phase of sine components (since sine is an odd function as

previously discussed). [1] In looking again at figure 13 (above), we can now conclude

that we only need to graph frequency components zero through eight because “the

negative frequency amplitude spectrum is just the (left-right) mirror image of the positive

frequency amplitude spectrum.” [1]

The fast Fourier Transform

The major shortcoming of the DFT is computation time. We can say that if our window

size is equal to N and we want to determine the amplitude of N separate sinusoids using the DFT, then computation time is proportional to N², the number of multiplications. [1, 26, 44] In many applications, N must increase to numbers ranging between 256 and 2048. Even by today’s standards using high-speed computers, computation of the DFT requires excessive machine time for a large window size N. This is precisely why the Fast Fourier transform (FFT) is so important.

(Figure 14: Comparison of required operations for N-sized window with DFT and FFT [40, 26])

For example, in looking at the DFT equation, an eight-sample signal would require 64

complex multiplications for computation. [37] While manageable at this level, a more

common scenario would see a window size of either 512 or 1024 samples. When N is

chosen to be 512, the DFT “requires 200 times more complex multiplications than those

required by the FFT.” [1] Furthermore, if N is increased to 8192, over 1000 times as

many complex multiplications would be required as compared with the FFT. Therefore,

“on a computer where the FFT takes 30 seconds to compute a transform this size, the

DFT would take over five hours.” [1]

The fast Fourier Transform (FFT) can be described as a computationally efficient version

of the DFT that works on powers of 2. [45] This limits our window size N to the

following values: 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, etc. As with the DFT, N

represents a periodic sampling taken from a real-world signal and the FFT expresses the

output data in terms of its component frequencies. The FFT also solves the near identical

inverse problem (via the iFFT) of reconstructing a signal from frequency-domain data.

As previously stated, the FFT is a DFT algorithm published by Cooley and Tukey in 1965 which reduces the number of computations from N² to N log2 N when N is chosen to be a power of 2. [45] If N is not chosen to be a power of 2, the FFT can still be used as long as the data is “padded” with zero-valued samples up to the next power of 2. [1] The fast Fourier Transform essentially takes the DFT and evaluates it much

faster. In figure 14 (above), the number of operations for both the DFT and the FFT are

shown. For a large window size N, using the FFT allows a monumental reduction in

calculation time.

Calculation of the FFT utilizes a recursive approach for dividing up the input windowed

signal into N individual signals. Specifically, the FFT continuously breaks down the

window into two (N / 2) sequences until all samples are represented as a unique signal.

An interlaced decomposition is used each time a signal is broken in two. Note that the set

on the left consists of even-numbered samples while the set on the right consists only of

odd-numbered samples. For example, figure 15 (below) illustrates how a window is

broken down (where N = 16):

(Figure 15 [46])

While the above-described decomposition is merely a reordering of the samples in the

signal, it is useful to note: “the binary representations of these sample values are the

reversals of each other. For example, sample 3,” [46] represented in binary as 0011 is

exchanged with sample number 12, represented in binary as 1100. “Likewise, sample

number 14 (1110) is swapped with sample number 7 (0111), and so forth.” [46] The FFT

time-domain decomposition is therefore often computed using a bit-reversal sorting

algorithm.
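A minimal sketch of such a bit-reversal reordering is given below (an illustrative helper, not from the original program; it assumes N is a power of 2 and, for simplicity, reorders an array of real samples):

// Reorder the N samples of x[] into bit-reversed index order (N must be a power of 2).
void bit_reverse( float x[], int N)
{
    int i, j, m;
    float tmp;

    j = 0;
    for (i = 0; i < N; i++) {

        if (i < j) { // swap each pair of mirrored indices only once
            tmp = x[i];
            x[i] = x[j];
            x[j] = tmp;
        }

        m = N >> 1; // advance j by one in bit-reversed counting
        while (m >= 1 && j >= m) {
            j -= m;
            m >>= 1;
        }
        j += m;
    }
}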

Once our window (N) has been decomposed into N individual time domain signals, each

composed of a single sample value, the next step is to calculate the N frequency spectra

corresponding to these N time domain signals. However, “ the frequency spectrum of a 1

point signal is equal to itself.” [46] Therefore, no calculations are required to complete

this step. It should be noted, however that “each of the 1 point signals is now a frequency

spectrum, and not a time domain signal.” [46]

Based on our knowledge that the DFT, when calculated on a window with a very small

value for N, is indeed fast, it is easy to see why the Cooley-Tukey algorithm aims to

break a potentially enormous DFT calculation into many minute-sized DFT calculations

where N is always equal to 1.

The final “step in calculating the FFT is to combine the N frequency spectra in the exact

reverse order [from which] the time domain decomposition took place.” [46] For this

step, the bit reversal shortcut is no longer applicable and calculation must take place one

stage at a time. For example, “in the first stage, 16 frequency spectra (1 point each) are

synthesized into 8 frequency spectra (2 points each). In the second stage, the 8 frequency

spectra (2 points each) are synthesized into 4 frequency spectra (4 points each), and so

on. The last stage results in the output of the FFT, a 16-point frequency spectrum.” [46]
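To make the divide-and-conquer idea concrete, the following is a compact recursive radix-2 FFT sketch. It is an illustration only: it reuses the complex structure from the DFT listing, assumes N is a power of 2, omits the 1/N normalization used by the dft() function above, and uses recursion rather than the bit-reversal, stage-by-stage form described in the quoted passage:

// Recursive radix-2 FFT sketch: X[] holds N complex values and is overwritten with
// its spectrum; scratch[] must hold at least N/2 complex values of working space.
void fft( complex X[], int N, complex scratch[])
{
    double pi2 = 8.*atan(1.); // (2 pi)
    int half = N / 2;
    int i, k;
    double c, s;
    complex t, e;

    if (N <= 1) return; // the spectrum of a 1-point signal is the signal itself

    // interlaced decomposition: even-indexed samples first, odd-indexed samples after
    for (i = 0; i < half; i++) {
        scratch[i] = X[2*i + 1]; // odd samples
        X[i] = X[2*i];           // even samples
    }
    for (i = 0; i < half; i++)
        X[half + i] = scratch[i];

    fft(X, half, scratch);        // transform the even half
    fft(X + half, half, scratch); // transform the odd half

    // combine the two half-size spectra using the factors e^(-i 2 pi k / N)
    for (k = 0; k < half; k++) {
        c = cos(pi2 * k / N);
        s = -sin(pi2 * k / N);
        t.re = c * X[half + k].re - s * X[half + k].im;
        t.im = c * X[half + k].im + s * X[half + k].re;
        e = X[k];
        X[k].re = e.re + t.re;
        X[k].im = e.im + t.im;
        X[half + k].re = e.re - t.re;
        X[half + k].im = e.im - t.im;
    }
}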

Part III: Current FFT-based Software Applications for the Mac

With a deeper understanding of Fourier theory and implementation considerations of the

DFT and FFT, I will now discuss a number of applications that utilize the FFT as part of

their core architecture and outline techniques for using these applications creatively.

The Phase Vocoder for Time-Stretching and Pitch Shifting

The phase vocoder is one of the most powerful methods of manipulating sounds in the

frequency domain and has been more recently implemented using the FFT for increased

frequency resolution. This process allows for pitch changes without adjusting the length

of a sound file or length changes without adjusting pitch. A phase vocoder, as used for

pitch-shifting, is traditionally an electronic signal processor consisting of a bank of filters

spaced across the frequency band of interest. Originally, it was hoped such a device

would be able to reduce the bandwidth necessary for the transmission of voice telephony,

but it rapidly found other applications in popular music.

Another example of a phase vocoder application would be where a voice signal could be

analyzed for frequency content by a filter bank in real time, and the output applied to a

voltage-controlled filter bank or an oscillator bank to produce a distorted reproduction of

the original.

As a software implementation, “pitch-scale modifications can be implemented as a combination of time-scaling and sampling rate conversion.” [27] For example, to raise

the pitch of a signal by a factor 2, we could first time-stretch the signal by a factor 2 to

increase its duration and then resample it at half the sampling rate. This would restore the

original duration of the sample while also modifying the frequency content of the signal

(via sample rate conversion) as desired. [28]
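More generally (a small arithmetic sketch, not tied to any particular application), a pitch shift of s semitones corresponds to a frequency ratio of 2^(s/12); the signal is time-stretched by that ratio and then resampled by its reciprocal to restore the original duration:

#include <math.h>
#include <stdio.h>

int main()
{
    double semitones = 12.;                  // desired shift, e.g. one octave up
    double ratio = pow(2., semitones / 12.); // frequency ratio = 2^(s/12)

    // time-stretch by 'ratio', then resample by 1/ratio to restore the duration
    printf("shift of %.1f semitones: stretch by %.4f, then resample by %.4f\n",
           semitones, ratio, 1. / ratio);

    return 0;
}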

For time-stretching, the phase vocoder relies on the FFT to extract amplitude and phase

information from 8 to 4096 frequency bands (depending on the desired window size)

with a bank of filters or frequency bins. If time-stretching is desired, phase and amplitude

envelopes are lengthened (or shortened, for time compression), and then passed to a bank

of oscillators with corresponding frequencies to each filter.

Additional techniques employed by the phase vocoder “allow direct manipulation of the

signal in the frequency-domain, enabling such applications as chorusing, harmonizing,

partial stretching and other exotic modifications which cannot be achieved by the

standard time-scale sampling rate conversion scheme.” [28]

In looking at a commercial software vocoder-based implementation, SoundHack includes

an early example of an FFT-based phase vocoder implementation as well as many other useful processing tools. Perhaps the most useful of these for creating evocative sounds are the included functions for time-stretching, pitch-shifting and cross-synthesis (to be discussed later). Below are outlined steps for performing time- and pitch-scale modifications.

(Figure 16: Time-stretching in SoundHack)

To use the phase vocoder in SoundHack for time-stretching, we can first open a sound

file under the ‘File’ menu and then choose ‘Phase Vocoder…’ under the ‘Hack’ menu.

Next, we can set desired parameters such as the number of ‘Bands’ (see above). This

corresponds to the number of FFT filter banks/oscillator pairs desired. After choosing the

‘Time Scale’ radio button, choose ‘Edit Function…’ to open a new window where we

can set our desired stretch amount. In this example, a value of 10.0 is chosen from the

‘Set’ button dialog to specify a desired stretch factor of 10. After hitting ‘Done’ and then

‘Process’ in the main ‘Phase Vocoder’ window, a new sample will be created that

stretches the signal from our original sample with a length of 5.8 seconds to

approximately 58 seconds.

Next, if we are interested in pitch-shifting a given sample, we can again utilize

SoundHack's Phase Vocoder. "In this technique, the sound to be shifted is sent through" an FFT and into "a bank of band-pass filters which are evenly spaced from 0 Hz to half the sample rate. SoundHack measures the amplitude and phase for each frequency at the output of this filter bank. These amplitudes, phases and frequencies are used to control a bank of oscillators. Pitch shifting simply involves multiplying each frequency by a factor." [47]

With our sample loaded in SoundHack with the ‘Phase Vocoder…’ window open, we can

achieve pitch-shifting by choosing the ‘Pitch Scale’ radio button. We will now follow

similar steps as above but this time choose a value of 12.0 from the ‘Set’ button within

the ‘Edit Function…’ window. When we process this Hack, a new sample will be created

that will now sound one octave, or 12 semitones, above the original.
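
Whatever units a particular application's dialog expects, the underlying math is simple: pitch shifting multiplies every bin frequency by a factor, and for a shift expressed in equal-tempered semitones that factor is 2 raised to (semitones/12), so 12 semitones corresponds to a factor of exactly 2. The small illustration below is general math, not a claim about SoundHack's internals.

#include <math.h>
#include <stdio.h>

int main(void)
{
    double semitones = 12.0;
    double factor = pow(2.0, semitones / 12.0);   /* 2^(12/12) = 2.0 */

    printf("%.1f semitones -> frequency factor %.3f (one octave up)\n",
           semitones, factor);
    return 0;
}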

Additional applications that provide similar functionality as described above include

stand-alone applications such as TimeToyPro as well as most native digital audio

workstations including Digital Performer, Logic Pro and ProTools. Of particular interest

is TimeToyPro, which allows a user to set variable amounts of stretching and listen to

results before recording a file to disk.

(Figure 17: TimeToyPro allows automation control over time-stretching amount)

As seen below, applications such as Logic Pro include a single window for destructive

tempo and pitch editing of sound files.

(Figure 18: Logic Pro Factory window allows for tempo and pitch editing of a single sound file)

One final vocoder-based application to be discussed is Native Instruments Vokator.

Available as both a stand-alone application and a plug-in, Vokator includes a range of

FFT and vocoder-based features.

For example, Vokator can take the frequency spectrum of two real-time input channel

signals (referred to as ‘A’ and ‘B’) and continuously analyze them over time. The

“spectral envelope is computed together with the corresponding phase information of

each frequency band for both signals.” [48] The phase information from input A is then

combined with the spectral envelope from input B to generate a unique signal at the

output containing properties from both inputs A and B.

For example, it is possible to speak into a microphone that is routed to input A of the

vocoder and then shape and control the digitized signal at input B by playing a C major

chord on the included synthesizer into channel B, thus superimposing the vocal signal

onto the C major synthesizer chord. [48]

(Figure 19: Vokator set up for a live input on input A to be cross-synthesized with a synth on input B)

Cross Synthesis (Convolution and Morphing)

Convolution, and more generally cross-synthesis, can be defined as the combining of two sounds to create a new sound. The process of convolution is

accomplished by taking the FFT of two separate signals and then multiplying their

spectra to create a new signal. However, it should be noted that convolution “emphasizes

frequencies which are held in common and greatly reduces frequencies which are not.”

[47] The best results can therefore be obtained when convolving two sound sources that

have at least some harmonic and rhythmic similarities.
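
A minimal sketch of that spectral step follows, using a complex type like the one in Appendix I and assuming both spectra come from transforms of the same size: convolution reduces to a bin-by-bin complex multiplication, after which an inverse transform yields the new time-domain signal. The multiplication also makes the behavior described above clear, since a bin is large in the product only when it is large in both inputs.

typedef struct { float re; float im; } bin;   /* one FFT bin */

/* out[k] = A[k] * B[k] for every bin of two N-point spectra */
void spectral_multiply(const bin A[], const bin B[], bin out[], int N)
{
    int k;
    for (k = 0; k < N; k++) {
        /* (a + jb)(c + jd) = (ac - bd) + j(ad + bc) */
        out[k].re = A[k].re * B[k].re - A[k].im * B[k].im;
        out[k].im = A[k].re * B[k].im + A[k].im * B[k].re;
    }
}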

(Figure 20: Convolution)

To perform spectral convolution in SoundHack, we first open a sound file and then

choose ‘Convolution…’ from the ‘Hack’ menu bar. In order to choose a second sound

file that will be convolved with the first, choose ‘Pick Impulse’ and scroll to a desired

sample. Finally, hit ‘Process’. In my example (above), I chose to convolve a flute sample

with the same flute sample after it was pitch-shifted an octave up by the phase vocoder.

This created an interesting flute-related timbre. Of special note, if we chose to convolve

two signals with the ‘Ring Modulate’ option checked, “the spectrum of the product of”

the two samples would consist only “of the sum and difference frequencies of the

original” sounds. For example, “if we multiply a 100-Hz sinusoid by a 10-Hz [sinusoid],

we would expect to hear” a new signal consisting of frequencies of 90 and 110 Hz, each

having half the amplitude of the original sinusoids. [1]
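
The ring-modulation case can be checked directly in the time domain. The sketch below (assuming a 44.1 kHz sample rate) simply multiplies the two sinusoids sample by sample; by the identity sin(a)sin(b) = 0.5*(cos(a-b) - cos(a+b)), the product contains only the 90 Hz and 110 Hz components, each at half the original amplitude.

#include <math.h>

#define SAMPLE_RATE 44100.0

/* fills out[] with the product of a 100 Hz and a 10 Hz sinusoid */
void ring_modulate(float out[], int numSamples)
{
    double twoPi = 8.0 * atan(1.0);
    int n;

    for (n = 0; n < numSamples; n++)
        out[n] = (float)(sin(twoPi * 100.0 * n / SAMPLE_RATE) *
                         sin(twoPi *  10.0 * n / SAMPLE_RATE));
}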

Regarding convolution performed on existing sound files, Bias Peak is another

application that offers easy-to-use functions that produce often interesting and effective

results. Simply pre-loading a desired impulse file followed by a target file and choosing

‘Convolve’ from the ‘DSP’ menu are the only steps required.

Spectral morphing using a real-time plug-in developed by iZotope called Spectron allows

a user to spectrally modify one signal based on the spectrum of another signal. Spectron

is particularly useful for creating complex rhythmic textures when combining a

percussive audio file with a harmonically-rich audio file. [49] It is also useful when

combining two files similar in harmonic and/or rhythmic content. Using the ‘Morph’

module, found along the menu at the bottom of the Spectron window, an audio file can be loaded to become the target signal that will be morphed with the incoming audio stream from the DAW. The target signal is continuously looped in the background for processing in real-time.

(Figure 21: Spectron plugin with morph window enabled)

"As the two signals play, Spectron compares the spectrum or frequency response of the two signals. When the spectrum of the input signal is different than the target signal, Spectron adjusts the frequency content of the Input Signal to match the instantaneous spectrum of the Target signal. For example, if a 100-Hz tone appears in the Target signal at 0 dB, Spectron looks at the 100-Hz band of the Input signal, and adjusts the gain to be 0 dB. Spectron does this for each frequency band -- up to 2048 individual bands are compared and adjusted in realtime." [49]
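
A conceptual sketch of that per-band matching follows. It is not iZotope's code; the per-band magnitude arrays, threshold handling and parameter names are assumptions made for illustration. Each band of the input is scaled toward the corresponding band of the target, and bands whose target level falls below the threshold are left alone.

/* inputMag and targetMag hold per-band magnitudes for the current frame */
void morph_bands(float inputMag[], const float targetMag[],
                 int numBands, float thresholdMag)
{
    int k;
    for (k = 0; k < numBands; k++) {
        if (targetMag[k] < thresholdMag)
            continue;                         /* below threshold: ignore band */
        if (inputMag[k] > 0.0f) {
            float gain = targetMag[k] / inputMag[k];
            inputMag[k] *= gain;              /* match the target's level */
        }
    }
}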

Spectron also allows the user to control which frequency ranges are morphed by

adjusting a number of nodes within the ‘Morph’ window. Furthermore, a threshold can

also be set to “limit frequencies that the Morph module tries to match, based on their

level.” [49] If a frequency band is located below the threshold, the Morph module will

ignore it and will not attempt to boost the level.

Convolution Reverb

Over the past few years, a number of convolution reverb software plug-ins have entered the market. These plug-ins run inside most of today's digital audio workstations and allow a user to load custom impulses to be convolved in real time with an incoming audio signal. In figure 22 (below), Logic Pro's Space Designer plug-in is used unconventionally by adding a non-reverb sample as an impulse response to be convolved with an incoming audio stream. While similar to SoundHack's convolution functionality in that both applications require the use of the FFT, the implementation found in plug-ins such as Space Designer and Audio Ease Altiverb differs in that processing takes place on real-time signals, requiring increased but now readily available CPU power.

A custom impulse response can consist of any sampled sound and can be loaded into

Space Designer by simply clicking the “IR Sample” button in the upper left portion of the
screen and adjusting parameters as desired.
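
Conceptually, convolution reverb is just the convolution of the input with the sampled impulse response. The direct time-domain form is sketched below; plug-ins such as Space Designer and Altiverb rely on FFT-based convolution instead, which is mathematically equivalent but far cheaper for impulse responses that run to several seconds.

/* direct convolution: out must be able to hold inLen + irLen - 1 samples */
void convolve(const float input[], int inLen,
              const float ir[], int irLen, float out[])
{
    int n, m;

    for (n = 0; n < inLen + irLen - 1; n++) {
        float acc = 0.0f;
        for (m = 0; m < irLen; m++) {
            int idx = n - m;
            if (idx >= 0 && idx < inLen)
                acc += ir[m] * input[idx];    /* y[n] = sum of ir[m] * x[n-m] */
        }
        out[n] = acc;
    }
}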

(Figure 22: Space Designer convolution reverb plugin for Logic Pro)

Frequency Band Manipulation and Filtering

One of the first software applications to offer complex real-time spectral processing was Native Instruments' Spektral Delay, available as both a plug-in and a stand-alone application. Spektral Delay allows the user to send the input signal through an FFT of up to 1024 frequency bands, followed by further processing options via effects, filters, delays and feedback on each separate band, before resynthesizing the signal back into the time domain.

(Figure 23: Spektral Delay user interface)

The signal path of Spektral Delay is as follows (a rough per-band sketch follows the list):


1. FFT performed using user-defined FFT frame size
2. Input sonogram (above left)
3. Separate input modulation options for left and right channels
4. Attenuation matrix
5. Delay matrix (above center)
6. Feedback matrix
7. Output sonogram
8. Inverse FFT is performed for resynthesis of signal
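
The sketch below is a rough illustration of steps 4 through 6, not Native Instruments' implementation; the frame history, band count and parameter arrays are assumptions. Each incoming FFT frame is written into a circular history, and every band reads its output from an older frame (a per-band delay measured in frames, which must stay below the history length), scales it, and feeds part of it back into the history.

#define NUM_BANDS  1024
#define MAX_FRAMES  128                  /* history length, in FFT frames */

typedef struct { float re; float im; } bin;

static bin history[MAX_FRAMES][NUM_BANDS];
static int writeFrame = 0;

/* process one FFT frame: per-band delay (in frames), attenuation, feedback */
void process_frame(const bin in[], bin out[],
                   const int delayFrames[], const float atten[],
                   const float feedback[])
{
    int k;
    for (k = 0; k < NUM_BANDS; k++) {
        int readFrame = (writeFrame - delayFrames[k] + MAX_FRAMES) % MAX_FRAMES;

        out[k].re = atten[k] * history[readFrame][k].re;
        out[k].im = atten[k] * history[readFrame][k].im;

        /* store the current input plus a portion of the delayed output */
        history[writeFrame][k].re = in[k].re + feedback[k] * out[k].re;
        history[writeFrame][k].im = in[k].im + feedback[k] * out[k].im;
    }
    writeFrame = (writeFrame + 1) % MAX_FRAMES;
}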

Another application (previously mentioned) that allows creative manipulation of


frequency bands is the harmonic rotate tool found within the BIAS Peak application.

The ‘Harmonic Rotate’ function (see figure 24 below) “allows the frequency spectrum in

a selected range of audio to be rotated around a horizontal axis, which has the effect of

taking frequencies that were previously associated with one section of a file with a

particular amplitude, and assigning them to different areas of audio with different

amplitudes.” [50]

(Figure 24: Harmonic Rotate dialog box in Peak)

Additionally, the user has the option of choosing Real and/or Imaginary calculations.

Finally, a slider and text field are available for setting the desired amount of rotation. The

‘Harmonic Rotate’ tool is available from the ‘DSP’ window after a sound file has been

loaded into Peak.

The next group of plug-ins to be examined is the SoundHack Spectral Shapers bundle, which includes +spectralcompand, +morphfilter, +binaural and +spectralgate.

(Figure 25: +spectralcompand plugin from SoundHack Spectral Shapers)

The Spectral Shapers plug-ins can be described as filters that “emphasize the reshaping of

the timbre of sound.” [51] Each spectral shaper uses the FFT to divide “the frequency

spectrum into 513 bands, and processes each band individually.” While all plug-ins can

be used in real-time to process an audio stream, I will focus on two of my favorites from

this collection, +spectralcompand and +morphfilter.

+spectralcompand (see figure 25 above) is a spectral version of the standard

expander/compressor. Each transformed frequency band is processed with a combined expander and compressor unit (commonly known as a "compander"). It can be

smoothly adjusted from a compression ratio of 5:1 to an expansion ratio of 1:5. “In

expansion mode, +spectralcompand becomes a highly tunable broadband noise remover,

capable of removing hiss, hum and machine noise, without damaging the original sound.

The compression mode can be used as a spectral flattener or re-shaper, changing the

spectral profile of a sound file to another template.” [51] While the invert control is

designed to allow the user to invert the compression or expansion output and hear the

difference between the processed sound and the input, it also is capable of rendering

interesting and evocative timbres.
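
A simplified per-band downward expander in the spirit of the expansion mode described above is sketched below; it is not Tom Erbe's code, and the threshold and ratio handling are assumptions. Bands whose magnitude falls below the threshold are pulled further down in proportion to the expansion ratio, which is how broadband hiss can be reduced without touching louder, wanted material.

#include <math.h>

/* mag: per-band magnitudes; a ratio of 5.0 corresponds to 1:5 expansion */
void expand_bands(float mag[], int numBands, float threshold, float ratio)
{
    int k;
    for (k = 0; k < numBands; k++) {
        if (mag[k] > 0.0f && mag[k] < threshold) {
            /* distance below the threshold, in dB */
            double dbBelow = 20.0 * log10(threshold / mag[k]);
            /* a 1:ratio expander adds (ratio - 1) dB of cut per dB below */
            double cutDb = (ratio - 1.0) * dbBelow;
            mag[k] *= (float)pow(10.0, -cutDb / 20.0);
        }
    }
}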

The +morphfilter plug-in allows the user to draw or learn a filter shape (see “shaping

line” in figure 26 below) using the mouse on the filter graph as well as available plug-in

controls. “Each +morphfilter setting contains two filter shapes and morphing can occur

between them. Filter depth can be applied to increase, decrease or even invert the filter

shape. This is an easily controllable, yet complex, filter plug-in capable of some

extremely evocative sounds.” [51]

(Figure 26: Three of the Spectral Shapers plug-ins have drawable filter graphs)

An included LFO (low frequency oscillator) setting modulates the filter number, giving the user the option to morph between the two filter shapes; changing the filter number can result in a smooth fade between the shapes.

Granular Synthesis

A final processing technique to be discussed is granular synthesis. In 1946, the Nobel-

prize-winning physicist Dennis Gabor showed that “any sound can be analyzed and

reconstructed by means of acoustical quanta or grains" since "the grain is an apt representation for sound" ... "it combines time-domain information (starting time,

duration, envelope shape, waveform shape) with frequency-domain information

(frequency of the waveform inside the grain, spectrum of the waveform and envelope).”

[52]

A grain can be defined as "a brief moment (1 to 100 milliseconds), which approaches the minimum perceivable event time for duration, frequency, and amplitude discrimination." [38] Specifically, a grain can be described as a discretely windowed sample taken from a sound file or audio stream, in the same way that a window is extracted from an audio file before being applied to a smoothing function and sent through a Fourier transformation. However, instead of applying a Hamming or similar window as a smoothing function, a typical grain envelope might be a Gaussian, quasi-Gaussian, three-stage linear or pulse curve.
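
As a small illustration of the difference (the frequency, duration and envelope width below are arbitrary choices, not values from any particular granular synthesizer), the sketch generates a single grain: a short sinusoid shaped by a Gaussian envelope rather than by an analysis window such as a Hamming.

#include <math.h>

#define SAMPLE_RATE 44100.0

/* fills out[] with one Gaussian-windowed sinusoidal grain */
void make_grain(float out[], int grainLen, double freqHz)
{
    double twoPi  = 8.0 * atan(1.0);
    double center = (grainLen - 1) / 2.0;
    double sigma  = grainLen / 6.0;          /* controls envelope width */
    int n;

    for (n = 0; n < grainLen; n++) {
        double t   = (n - center) / sigma;
        double env = exp(-0.5 * t * t);      /* Gaussian envelope */
        out[n] = (float)(env * sin(twoPi * freqHz * n / SAMPLE_RATE));
    }
}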

Granular synthesis works by building up various densities of grains and streaming or

scattering these acoustic particles in space and in time. While many grains are necessary to create a complex sound, they can be grouped into larger units called clouds, lasting anywhere from a few seconds to several minutes. [52]

While a number of methods for organizing grains are available for implementing granular

synthesis, only a few require an FFT as part of their implementation. These include the

following methods: Fourier and wavelet grids (where frequency content is measured

versus time so “each point in the analysis grid is associated with a unit of time-frequency

energy” [38]), and pitch synchronous granular synthesis (where tones are generated “with

one or more formant regions in their spectra” [38]). Additionally, it is common to see

time-stretching of grains, which would additionally require the use of an FFT.

(Figure 27: A basic ensemble using Reaktor’s Grain Cloud sampler)

While the specific granular synthesis methods implemented by current applications are not always documented, programs such as Reaktor (above) offer extremely useful tools for applying granular synthesis to both existing digital audio samples and real-time audio streams. Composer Jeff Rona states: "Tobias (Enhus) and I both have taken advantage of [granular synthesis] for projects." [J. Rona, personal communication, May 2005]

Other Applications for Spectral Processing

While not discussed in detail, it should be noted that programming environments such as
Csound, Max/MSP and SuperCollider all offer significant spectral objects for building
custom audio processing tools. Additionally, Ableton Live and Melodyne are applications
featuring efficient and advanced time-stretching and pitch-shifting capabilities for
quickly adjusting audio files.

Part IV: Scoring for Erik Ryerson’s “Red Letters”

As film composer and creative sound designer for a 17-minute independent film by Erik

Ryerson entitled Red Letters, I was given the opportunity to compose a score using

many of the above-described spectral processing techniques and applications.

After meeting the director Erik Ryerson and viewing Red Letters, it was evident that he

was specifically interested in generally non-pitched, non-melodic material for the score.

In fact, another film composer had previously worked on his film and had written a score

using orchestral sounds. Erik was looking for something different and played for me a

classical piece that was mostly composed of un-pitched, low rumbling. Originally, I was

brought into the project as a sound designer. However, it soon became apparent that a

sound design-based ambient film score was desired.

My inclination was therefore to base a non-musical score around techniques utilized in

the spirit of musique concrete composition. By definition, musique concrete, invented by

French composer Pierre Schaeffer, is based on “splicing, mixing, and modification of

pre-recorded sounds.” [54] In this regard, I first viewed the film a number of times to

gain a feel for the overall mood of the picture and then decided on some existing sound

sources that I believed would convey the feeling behind Erik’s film.

Initial Processing of Raw Material

With the intent of utilizing limited raw sound materials for purposes of unifying the

overall composition, I settled on three audio samples to form a basis for the score. My

goal would then be to process these sounds as much as desired or needed so they would

fit nicely into the film while having only remote similarities to the original sound files

that they were derived from. The raw sound sources I chose to work with were as

follows:

1. 29 seconds from Miles Davis’ introduction solo from the album Jack Johnson.

This was performed in the key of Eb.

2. Opening notes from an acoustic-guitar composition I wrote and recorded.

However, although this piece was played and recorded in the key of D, I used a

high-quality pitch-shifting algorithm provided by an application called Melodyne

to shift the digital recording up a half-step to Eb. Because I wished to perform

convolution between this sample and the Miles Davis sample (above), combining them in the same key would produce improved results as well as create an ambient texture with a tonal center (around Eb).

3. The final sample I chose would be a pre-processed, unpitched trombone ‘growl’

included in the SAM trombones orchestral sample collection.

Once samples were chosen, I went ahead and processed each of them separately as well

as together. This would serve two purposes: to make the samples original-sounding and

unrecognizable, and to give me an immediate palette of compositional elements that I

could begin to insert into my Logic Pro session, where I was composing to picture using

a Quicktime movie.

Using SoundHack, I was able to accomplish much of my initially desired processing

goals. I time-stretched each of my three samples to different durations and listened to the

results. For example, my 19-second Miles Davis sample now existed as a 14:45 minute

sequence of bass rumbles, cymbal swooshes, screaming trumpet tones that sounded more

like a roaring elephant, and much more. Secondly, my acoustic guitar piece was stretched to almost 21 minutes and now sounded like rich textures of slowly vibrating strings

centered about various tonalities with dynamically varying passages where a single attack

could last many seconds and decay could last minutes. Finally, my 11-second trombone

sample was stretched to almost 14 minutes. What was a single, unpitched growl now

grew to a mountainous rumble.

Before I would introduce these samples into my Logic Pro session, I also was interested

in performing some cross-synthesis between my samples. Specifically, I again used

SoundHack to perform convolution between my time-stretched guitar sample and my

time-stretched Miles Davis sample. The result was a new sample exactly the length of the

time-stretched guitar piece (21 minutes) that included harmonic and rhythmic aspects of

both samples but with less direct amplitude variations while exhibiting an overall

envelope that could be described as glassy around the edges. This sample would form the

basis for the first two scenes in the film.

Scoring Notes

A particularly useful aspect of having longer samples to work with was that a single

portion of a given sample could be extracted from the overall performance and used by

itself wherever it fit a scene. Specifically, during the opening credits of Red Letters, I carefully

chose a section of the newly convolved sample that fit with the cinematography and

added it in Logic Pro while also using a portion of the time-stretched guitar composition

(pre-convolved). I then used volume automation to set desired amplitude envelopes for

each tone. This allowed me to get the exact sound I was after: something dark,

mysterious, non-melodic, slightly-pitched and generally obscure. These compositional

elements would fade out to natural environmental ambience.

Approximately 1:56 minutes into Red Letters, there is a sequence where the camera fades

from day to night. The director asked me to avoid all “swooshes” that were created as a

natural artifact from time-stretching in SoundHack. With the necessary goal of continuing

to use existing material, I would have to find a portion of processed audio or create a new

section of sound from existing time-stretched samples where I could allow a

sustain/decay section to be heard over time that included no noticeably loud artifacts.

In this scene, the camera seems to waver as it fades from day to night and slowly zooms

in as the main character, David, sits in his bedroom, saddened by the news of his sister's

death. This presented a compositional opportunity to create a new tone

that would have a slightly unsettled feeling while not exhibiting any of these undesirable

artifacts. I did this by first taking a portion of my convolved sample and making a backup

copy of a select number of seconds of the sample in Logic Pro. After ensuring that this

newly created sample was added back into my project via the ‘Audio’ window, I opened

the file in Logic’s ‘Sample Editor’ window, reversed the sample, pitch-shifted it slightly

and then time-stretched it again by a factor of about 2.

When added back into my project, the sound had been processed just enough that it closely resembled the earlier tones and could be cross-faded with them to give the listener a sense of the sound unfolding naturally. In addition, this new sample would offer a subtle trembling in the audio as the camera wavered and zoomed slowly towards our main character.

The next significant challenge in composing for Red Letters was to underscore the diner

scene, where the supporting character, Chloe, tells David how her father had been shot

and killed by her mother as she watched as a young girl. While Chloe explains to David

that her mother is not dead as previously assumed, and proceeds to detail the gory events,

I was asked to add an ominous sounding ambience that would slowly draw in the

viewers' attention while increasingly producing a feeling of departure from the general

surroundings of the diner. I accomplished this in two ways. First, I inserted the trombone-

effect sample (described above) by allowing the time-stretched attack to start at the

beginning of Chloe's monologue and slowly crescendo until fading abruptly as she eventually changes the subject. Additionally, with the goal of dissipating the

natural ambience of the diner, I inserted an instance of the Space Designer convolution

reverb plug-in that comes native with Logic Pro. However, instead of using an impulse

response customarily found to recreate natural sounding reverbs, I inserted a one-second

sample of street and residential traffic noise with footsteps, typical of what might be

heard outside the diner, as an impulse response to be convolved with the naturally

recorded ambience in the film. The result was a quieter, slightly muffled, yet

organic ambience. I gradually faded the convolution effect over the diner ambience

following the shape of the crescendo of the trombone effect.

The climax of Red Letters sees David confronting his past as he steps through the scene

of the crime described earlier by Chloe. For this scene, the director had previously

inserted a temp-score using ambience from the band Shellac as well as a number of

additional sources. This scene proved to be the most difficult to score since Erik was already quite satisfied with the sounds over the scene but did not want to create any licensing

issues by using existing licensed material. I therefore had to create something completely

original that used complex but similar processing techniques as those used in the temp

score.

My first inclination was to re-process existing materials. However, any sort of spectral

processing using existing sounds did not seem to work, since it took much of the excitement out of the existing ambience. After reworking the scene and further discussing ideas with Erik, I decided to create completely new audio material and then

use processes similar to what I believed Shellac might have done.

As the lead character David first enters the apartment where his father has been shot, it

was necessary to showcase the fact that the mother was flipping through stations on the

television set, mostly consisting of static. My first step was therefore to create a

background static that would gradually become louder as David walked closer to the

living room. I began by placing a different section of the time-stretched trombone sample

used in the diner scene into this scene but with a very low volume so that there would

remain a significant amount of headroom for additional sounds and layers to be added.

I also set up a large-diaphragm microphone in my living room and flipped through some

static on my television set while recording to give some 'realistic' ambience to the scene

as well.

In order to capture additional elements that could be heard above the static of the

television and trombone effect, and to mimic certain sounds desired from the Shellac

recording, I also recorded the following elements in my living room:

1. pressure from our water heater that was causing a small metal washer to vibrate

significantly when loosened.

2. the evening news on the television that included voices but no music.

3. beeping from continual pressing of the buttons of my cell phone, including a

surprise bark from our then-three-month-old puppy, Miles.

4. myself whispering the types of thoughts that I imagined David might be having as

he revisited dark and previously suppressed memories of his childhood.

While performing non-spectral modifications such as reversing and overlapping my

whispered recording and sending the water heater sample through a heavy phase

distortion with some LFO modulation via Logic, I was also able to successfully process

my television news recording using the ‘Varispeed’ function in SoundHack. While this

did not make the news recording completely unrecognizable, the ‘Varispeed’ function did

perform enough of a variable time-shift to make it impossible to tell what was being

presented in the dialog. Finally, by inserting the Spektral Delay plug-in with the ‘smear’

algorithm enabled over my recorded cell-phone beeps, I was able to create a rhythmic,

yet crackling pulse that would effectively build tension as the scene would grow to a

climax as David's mother looks over at David just before he faints back to reality.

Finally, at various hit points in the film, I also utilized the familiar time-stretched acoustic

guitar sample. Careful placement and mixing of all above-described sounds would make

for a complete ambience that would properly complement the events onscreen and satisfy

the director’s vision for the film.

The final piece of music required for the score to Red Letters would be faded in towards

the final shot of the film and last throughout the credits. While Erik gave me the option of

creating a theme song for the film, I decided that the theme would more appropriately be

something that was time-stretched. I chose to take a short sample from an electric guitar

performance, reverse the sample and time-stretch it using TimeToyPro. The reason I chose this program over SoundHack or Logic Pro's stretching algorithm is that I believe TimeToyPro has a cleaner-sounding time-stretching algorithm that creates fewer artifacts. While the artifacts created from

using SoundHack were desired in my raw processed files, I felt that a more resolved tone

would be effective for the final credits and would complement other pre-existing sound

material that would also be used for this portion of the film. I used this sample in

conjunction with the convolution sample used at the beginning of the film.

Part V: Conclusion

The advent of the Fast Fourier Transform (FFT) has brought both the potential for and the eventual birth of a new era of DSP applications based on spectral processing. Since the 1980s, programs such as Csound, Max/MSP, Kyma and SoundHack have continued to provide composers and sound designers with a means for performing spectral analysis and re-synthesis

on digitized audio files, first through processing of non-realtime audio files, and more

recently on real-time digital audio streams. This can be generally attributed to the

phenomenal increase in CPU processing power that accompanies today’s computers.

Applications that can run inside a digital audio workstation’s plug-in format are

additionally successful in performing a variety of spectral processing techniques on real-

time audio and MIDI file playback. For example, Native Instruments Vokator and

Spectral Delay plug-ins, SoundHack’s Spectral Shapers plug-ins and Izotope’s Spectron

plugin as well as the numerous convolution reverb plug-ins such as Apple’s Space

Designer and Audio Ease Altiverb have come to market and are readily available for use

as composition and sound design tools. Additionally, almost all commercial digital audio

workstations today offer functionality for performing non-realtime spectral processing,

including time-stretching, time compression and pitch-shifting, to name a few. Finally, programs such as SoundHack and TimeToyPro offer the user advanced functionality for performing a variety of spectral processing tasks. To conclude, it has only been over the past two years that the power and potential of spectral processing have finally reached maturity.

One of the greatest advantages of using spectral processing techniques in film is that it

allows a composer to take organically created sounds and create new, unfamiliar sounds

or groups of sounds from these reference files. Additionally, from the listener’s point of

view, there is oftentimes no way to tell exactly what they are hearing, thus no “direct

emotional connection” can be made from past experience with a particular and otherwise

familiar sound. [J. Rona, personal communication, May 2005] This makes spectral

processing ideal when scoring for films where general settings including but not limited

to the unknown, obscurity, suspense, horror, science-fiction, mystery and/or darkness

may be desired.

Additionally, spectral processing lends itself as an ideal bridge between sound

design and film scoring. Due to the potentially un-pitched nature of many spectrally

processed sounds, it is possible to create a semi-naturalistic ambience that can coexist

effectively with environmental ambience while not drawing too much attention to itself.

Since music in film is often most effective when it is hardly noticeable but rather functions to complement the onscreen drama, spectral processing is an ideal creative tool for film sound when appropriate timbres are chosen.

Finally, spectral processing can also be quite effective as a means for creating sound

effects. By utilizing many of the techniques discussed in this paper, it is possible when

using the correct source material to create a range of percussive, screeching or industrial

sounding effects as well as more aquatic and similarly liquid-based sounds.

Appendix I: C source code for DFT
/*
* DFTCalculation.h
* DFT
*
* Created by Mike Raznick on Thu Feb 24 2005.
* 2005 Solarfunk Studios.
*/

#define LENGTH 16

// first define complex structure/record


typedef struct {
float re;
float im;
} complex;

void dft( float x[], int N, complex X[]);


void idft( float x[], int N, complex X[]);

/*
* DFTCalculation.c
*
* Created by Mike Raznick on Thu Feb 24 2005.
* Adapted from Richard Moore’s
* Elements of Computer Music
*/

#include "DFTCalculation.h"
#include <stdio.h>
#include <math.h>

int main() {
int N = LENGTH; // DFT window size
double pi2oN = 8. * atan(1.)/N; // (2 pi/N)
double pi2 = 8. * atan(1.); // (2 pi)
float x1[LENGTH], x2[LENGTH]; // stores time domain waves
complex X[LENGTH]; // stores re and im spectral values
int n; // current sample index: from 0 - N-1

// create square wave with odd partials 1, 3, 5, 7 to be transformed


for (n = 0; n < N; n++) {
x1[n] =
sin(pi2oN * 1 * n) + // fundamental sine = sin((2*pi/N)*f*n)
.33 * sin(pi2oN * 3 * n) + // second sine = 3 * freq and 1/3 ampl
.2 * sin(pi2oN * 5 * n) + // third sine = 5 * freq and 1/5 ampl
.143 * sin(pi2oN * 7 * n); // fourth sine = 7 * freq and 1/7 ampl
}

// The complex spectrum (real and imag parts) X(k) of a waveform x(n)
// computed with the DFT may be equivalently expressed as a pair of
// real-valued functions of frequency known as the amplitude
// (magnitude) spectrum and the phase spectrum

dft(x1, N, X); // feed x1 array into dft function


idft(x2, N, X); // feed x2 array into idft function

// Ex. 1: print (input signal | spectral values | recreated signal)


printf("\n Pre-DFT | Spectral values (re and im) | Post-iDFT");
printf("\n ----------------------------------------------------\n");
for (n = 0; n < N; n++) {
printf("%2d: x1 = %6.3f, X(re, im) = (%6.3f, %6.3f), x2 = %6.3f\n",
n, x1[n], X[n].re, X[n].im, x2[n]);
}

// Ex. 2: print amp and phase from real and imag. parts on 2D plane
printf("\n Pre-DFT| Spectral values (re and im) | Ampl and Phase");
printf("\n ---------------------------------------------------\n");
for (n = 0; n < N; n++) {
printf("%2d: x1 = %6.3f, X(re, im) = (%6.3f, %6.3f), A = %6.3f, P =
%6.3f\n", n,
x1[n], // print time domain plot
X[n].re, X[n].im, // real and imag spectrals
sqrt( (double) X[n].re*X[n].re + X[n].im*X[n].im),// Amplitude
360.*atan2( (double) X[n].im, (double) X[n].re ) / pi2);// Phase
}
}

// dft function ----------------------------------------------------


void dft( float x[], int N, complex X[])
{
double pi2oN = 8.*atan(1.)/N; // (2 pi/N)
int k; // frequency index
int n; // time-domain sample index

for (k = 0; k < N; k++) {


X[k].re = X[k].im = 0.0; // init real and imag arrays

for (n = 0; n < N; n++) {


X[k].re += x[n] * cos(pi2oN*k*n); // compute real array
X[k].im -= x[n] * sin(pi2oN*k*n); // compute imaginary array
}

X[k].re /= N; // real / N
X[k].im /= N; // imag / N
}
}

// idft function ----------------------------------------------------
void idft( float x[], int N, complex X[])
{
double pi2oN = 8.*atan(1.)/N; // (2 pi/N)
double imag;
int k; // freq. index = N samples of spectrum
int n; // sample index = time-domain waveform

for (n = 0; n < N; n++) {


imag = x[n] = 0.0; // initialize imaginary array

for (k = 0; k < N; k++) {


// recompute time domain signal
x[n] += X[k].re*cos(pi2oN*k*n) - X[k].im*sin(pi2oN*k*n);

// check imag part of waveform - roundoff


imag += X[k].re*sin(pi2oN*k*n) + X[k].im*cos(pi2oN*k*n);
}

if (fabs (imag) > 1.e-5) {


fprintf( stderr, "warning: nonzero imaginary (%f) in waveform\n",
imag);
}
}
}

References:
1. Moore, Richard F., Elements of Computer Music, Prentice Hall, Englewood Cliffs, New Jersey
(1990)

2. Smith, Steven W., The Scientist and Engineer's Guide to Digital Signal Processing, Second
Edition, California Technical Publishing

3. IEEE History Center, Alan Oppenheim Oral History Transcript (Sept 13, 1999)
http://www.ieee.org/organizations/history_center/sloan/ASSR_Oral_Histories/aoppenheim_transc
ript.htm

4. Doornbusch, Paul, Computer Sound Synthesis in 1951: The Music of CSIRAC, Computer
Music Journal, Vol. 28, Issue 1 - March 2004

5. Boulanger, Richard (Editor), The Csound Book: Perspectives in Software Synthesis, Sound
Design, Signal Processing, and Programming, MIT Press (March 06, 2000)

6. Stone, Susan: The Barrons: Forgotten Pioneers of Electronic Music, NPR Morning Edition
(Feb. 07, 2005) http://www.npr.org/templates/story/story.php?storyId=4486840

7. Jerry Goldsmith Online, Soundtrack Release: Planet of the Apes (Expanded) / Escape from the
Planet of the Apes (Suite), (1997)
http://www.jerrygoldsmithonline.com/planet_of_the_apes_expanded_1997_soundtrack.html

8. Prendergast, Roy, Film Music: A Neglected Art, W. W. Norton & Co., New York (1977)

9. Logic Pro 7 Reference Manual, Apple Computer, Inc. (2004)

10. Max Mathews: Personal Correspondence (April 2005)

11. Davies, Richard, Analogue Vocoder Information Page (2002)


http://web.inter.nl.net/hcc/davies/vocpage.htm

12. Barry Vercoe's History of Csound... The Csound Book: Perspectives in Software Synthesis,
Sound Design, Signal Processing, and Programming, MIT Press, (March 06, 2000)

13. Burns, Dr. Kristine H, History of Electronic and Computer Music Including Automatic
Instruments and Composition Instruments, Florida International University (2004)
http://eamusic.dartmouth.edu/~wowem/electronmedia/music/eamhistory.html

14. Gogins, Michael: Csound 5 User’s Guide, www.csounds.com

15. Edge, Douglas, Interview with Tom Erbe: Sound Hacker, audioMIDI.com
(May 12, 2004)

16. Film, Television, and Radio News: Here are just a few of the sound tracks where you can
hear Kyma in action..., (2003) www.symbolicsound.com

17. Press Release: Opcode Releases fusion: VOCODE Effects Plug-In, Harmony Central
(September 26, 1997) http://news.harmony-central.com/Newp/103AES/Opcode/Vocode.html

18. Westfall, Lachlan, Computers in the Movies: How Desktop PCs Help Create Hollywood's
Amazing Music and Sound Effects, Music & Computers 4:4 (May-June 1998)

19. Harmony Central, Cycling '74 Releases Pluggo: Technology That Enables Custom VST Plug-
Ins Comes With 74 Plug-Ins, (May 3, 1999) http://www.harmony-
central.com/Events/MusikMesse99/Cycling74/Pluggo.html

20. Makovsek, Janez, FFT Properties 3.5 Spectrum Analyzer Tutorial, Dew Research (2003)

21. Preater, Richard W. T.; Swain, Robin C., Fourier transform fringe analysis of electronic
speckle pattern interferometry fringes from high-speed rotating components, Optical Engineering
(1994)

22. Multi-Semester Interwoven Project for Teaching Basic Core STEM Material (Science,
Technology, Engineering, Mathematics) Critical for Solving Dynamic Systems Problems,
Dynamic Systems Tutorial: Fourier Series, NSF Engineering Education Division Grant EEC-
0314875 (2004)

23. Dobrian, Chris, MSP Manual, Cycling 74 (2003)

24. Lyons, Richard, Windowing Functions Improve FFT Results, Part I, Test & Measurement
World (June 1998)

25. Dobson, Richard, The Operation of the Phase Vocoder: A non-mathematical introduction to
the Fast Fourier Transform, Composers' Desktop Project, (June 1993)

26. Keith, Murphy and Butler, FFT Basics and Applications, Oregon State University (2005)
http://me.oregonstate.edu/classes/me452/winter95/ButlerKeithMurphy/insth.html

27. Laroche, Jean, Time and pitch scale modification of audio signals in Applications of Digital
Signal Processing to Audio and Acoustics, M. Kahrs and K. Brandenburg,
Eds. Kluwer, Norwell, MA, (1998)

28. Laroche, Jean and Dolson, Mark, New Phase-Vocoder Techniques For Pitch-Shifting, Other
Exotic Effects, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics,
New Paltz, New York (October 1999)

29. Baxter, Michael, Csound: An Interview with Dr. Richard Boulanger by Michael Baxter, Linux
Gazette, Issue 96 (November 2003) http://www.linuxgazette.com/node/125

30. Ramirez, Robert W., The FFT Fundamentals and Concepts. Englewood Cliffs, NJ: Prentice-
Hall, Inc., (1985)

31. The Editors Guild Magazine, Gary Rydstrom and Michael Silvers: Finding Nemo, Vol. 25,
No. 3 (May/June 2004)
http://www.editorsguild.com/Newsletter/MayJun04/best_sound_editing/fn_sound_edit.htm

32. Zicarelli, David, Cycling74 Community: An Interview With William Kleinsasser,

Cycling 74 (2004) http://www.cycling74.com/community/kleinsasser.html

33. M-Audio: Artists: Tobias Enhus, Avid Technology (2005)


http://www.m-audio.com/artists/en_us/TobiasEnhus.html

34. Brown, Royal S., Overtones and Undertones, University of California Press, Berkeley, Ca.
(1994)

35. Broesch, James D., Digital Signal Processing Demystified (Engineering Mentor Series),
Newnes (March 1, 1997)

36. Brigham, E. Oran, The Fast Fourier Transform, Prentice-Hall, New Jersey (1974)

37. Lyons, Richard, Understanding Digital Signal Processing, Prentice Hall PTR (2001)

38. Roads, Curtis, the computer music tutorial, The MIT Press, Cambridge, Mass. (1996)

39. Bernsee, Stephan, Tutorial: The DFT à Pied - Mastering Fourier in One Day, The DSP
Dimension (1999) http://www.dspdimension.com

40. Angoletta, Maria Elena, Fourier Analysis Part II: Technicalities, FFT & System Analysis,
AB/BDI (February 27, 2003)

41. Gerhardt, Lester A. & Zou, Jie, Lecture Notes: Short Time Fourier Analysis, RPI ECSE,
Rensselaer’s School of Engineering (2004)

42. Bores Signal Processing, Introduction to DSP: Frequency analysis: Fourier transforms
(2004)

43. Bracewell, Ronald N., The Fourier Transform and Its Applications, McGraw-Hill (1986)

44. Johnson, D., Fast Fourier Transform (FFT), (April 15, 2005) Retrieved from the Connexions
Web site: http://cnx.rice.edu/content/m10250/2.14/

45. Cooley, J. W. and Tukey, J. W., An algorithm for the machine calculation of complex Fourier
series, Mathematics of Computation (April 1965)

46. Smith, Steven W., The Scientist and Engineer's Guide to Digital Signal Processing: The Fast
Fourier Transform, California Technical Publishing (1997) http://www.dspguide.com/

47. Erbe, Tom, SoundHack Users Manual, Version 0.888, School of Music, CalArts

48. Haas, Joachim and Sippel, Stephan, Vokator Operation Manual, Native Instruments Software
Synthesis (2004)

49. iZotope Spectron Help Guide, iZotope, Inc. (2004)

50. Wheatcroft, Zac, Berkley, Steve and Bennett, Bruce, Peak Version 4.0 Software User’s
Guide, BIAS (Berkley Integrated Audio Software), Inc., Petaluma, Ca. (2003)

51. Erbe, Tom, SoundHack Spectral Shapers User Manual (2003)

52. Alexander, John and Roads, Curtis, Granular Synthesis, Keyboard (June 1997)

53. Doepfer Musikelektronik, The Trautonium Project http://www.doepfer.de/traut/traut_e.htm

54. Dodge, Charles and Jerse, Thomas, Computer Music: Synthesis, Composition and
Performance, Schirmer (1997)

Acknowledgements

To Jeff Rona for inspiration, for giving me the opportunity to interview him regarding

personal aesthetics and ambient film-scoring techniques, for taking the time to review

this paper and for his positive feedback.
