
Current Generation Parallelism In Games

Jon Olick
id Software

Beyond Programmable Shading: In Action

Brief History of Parallelism
• 1 Processor
– The good old days.
– Why parallelize? Just wait a little and your programs
will get faster.


Brief History of Parallelism
• 2 to 3 Processors
– Logical splitting of game process into pipelined
pieces.
• Game
• Rendering
• Sound
• Loading/Decompression


Brief History of Parallelism
• About 6 to 8 Processors
– The transition to a job scheduling type architecture
– 1st order parallelism
• Game
• Rendering
• Sound
• Physics
• Collision
• Loading/Decompression
• Etc…


Brief History of Parallelism
• About 8 to 16 Processors
– End of CPU history.
– Enter 1998 in GPU history.
• Approx # of processors as average parallel scalar
operations.
– 2nd order parallelism
– Jobs which create and manage the resources of
other jobs.
• GPU Command Processor (DMA engine)


Brief History of Parallelism
• About 16+ processors
– 3rd order parallelism
– GPU Vertex Processors
– Jobs which create and manage the resources of
other jobs which create and manage the resources of
other jobs

Brief History of Parallelism
[Chart: approximate number of CPUs in Nvidia GPUs by year, 1997 to 2008, on a vertical axis from 0 to 300; labeled points include Riva 128, Riva TNT, and GeForce 256]

Current State of Parallelism
• Desktop Processors
– Intel Core 2 Quad, 4 processors, 3.2 GHz, 102 GFLOPS
• Soon to be 8 core?
• Multimedia Processors
– Cell Processor, 8 processors, 3.2 GHz, 192.0 GFLOPS
• 1 main, 7 co-processors
• Graphics Accelerators
– GTX 280, 1.296 GHz, 0.933 TFLOPS
• 240 stream processors


CELL BROADBAND ENGINE™


PLAYSTATION®3 Processor Overview
• Game
• Animation
• Geometry Processing
• Post Processing
• Occlusion Rasterization
• Sorting
• Collision Detection
• Fourier Transform
• (De)Compression
• Not going to cover all of these…

PLAYSTATION®3 Processor Overview
• Parallelize ordinarily sequential CPU
processing
• Assist in what is typically considered GPU
processing


Primary Programming Challenges
• Fitting code and data in the 256 KB local co-processor memory
– Best solutions are ones that don't treat the 256 KB
local store as a typical on-demand caching
architecture
• Scattered reads/writes bad, sequential reads/writes
good
• Software Pipelining
• Only 16 byte aligned reads/writes
• Synchronization


MD6 ANIMATION PROCESSING


MD6 Animation Processing
Serial
• Game Logic
• Blending Tree Generation
• Low Level Operation List Generation
Parallel
• Low Level Operation Execution

MD6 Animation Webs
• Separates Thinking from Representation
– Game Object says what it wants to look like.
– Animation Webs take care of the rest.
• Unstructured graph
– Each node has a blend tree
• Designed with simplicity in mind
– Animators should animate, not fiddle with nodes.
– Extract as much information as possible directly
from the animation data.


MD6 Animation Processing
• Additive Blending
• Subtractive Blending
• Animation Algebra
– Blend Equations
• Animation blending trees in the form of an equation.
• Example equation:
– (animA + animB) – animC


Partial Animation Blending
• Generalizes playing an animation only on the
face, torso, etc…
• One weight per joint per animation
• Compute alpha for slerp via the following
equation:
– For each joint
• Let w0 = weight of joint in animation A
• Let w1 = weight of joint in animation B
• If(w1 > w0)
– Let alpha = (alpha * w1) / w0
• Else
– Let alpha = ((w1 – w0) + alpha * w0) / w1

Varying parameter treatment

16k 16k 16k

GEOMETRY PROCESSING


Two modes of usage
• Primary mode
– Use offline tools
– Partition into vertex sets
– Use indexed triangles
– All features of pipeline can be used

SPU


Two modes of usage (cont)
• Secondary mode
– Data generated by other tools
– Formats other than indexed triangles
– Non-partitioned objects
– Subset of pipeline features can be used

SPU


SPU Geometry Pipeline Stages

SPU Pipeline
• Vertex Decompress
• Index Decompress
• Blend Shapes
• Skinning
• Progressive Mesh
• Triangle Culling
• Compression
• Output


Vertex Decompression


Vertex Attributes

Unique Vertex
Array 0

Instance Vertex
Array 1


Vertex Decompression

Unique Vertex Float Tables

Array 0

Instance Vertex
Array 1


24bit Unit Vector

• Smallest 2 compression
– Two smallest components with 10 bits each
• Encoded from –sqrt(2)/2 to +sqrt(2)/2
– Largest component reconstructed via
• Largest = sqrt(1 – smallestA² – smallestB²)
• One additional bit for sign of largest component.
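
A minimal C++ sketch of the smallest-two scheme described above. The slide accounts for 21 bits (two 10-bit components plus one sign bit); this sketch assumes that 2 further bits record which component is the largest. The names `EncodeUnitVector` and `DecodeUnitVector` and the exact bit layout are illustrative assumptions, not taken from the talk.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

struct Vec3 { float v[3]; };

// The two smallest components of a unit vector lie in [-sqrt(2)/2, +sqrt(2)/2].
const float kRange = 0.70710678f;

// Map [-kRange, +kRange] onto a 10-bit integer (0..1023).
inline uint32_t Quantize10(float x) {
    float t = x / kRange * 0.5f + 0.5f;
    t = std::min(1.0f, std::max(0.0f, t));
    return static_cast<uint32_t>(std::lround(t * 1023.0f));
}

inline float Dequantize10(uint32_t q) {
    return (static_cast<float>(q) / 1023.0f * 2.0f - 1.0f) * kRange;
}

// Assumed layout: bits 0-9 and 10-19 hold the two small components,
// bit 20 the sign of the largest, bits 21-22 the index of the largest.
uint32_t EncodeUnitVector(const Vec3& n) {
    int largest = 0;
    for (int i = 1; i < 3; ++i)
        if (std::fabs(n.v[i]) > std::fabs(n.v[largest])) largest = i;
    const int a = (largest + 1) % 3;
    const int b = (largest + 2) % 3;
    uint32_t bits = Quantize10(n.v[a]) | (Quantize10(n.v[b]) << 10);
    if (n.v[largest] < 0.0f) bits |= 1u << 20;
    bits |= static_cast<uint32_t>(largest) << 21;
    return bits;
}

Vec3 DecodeUnitVector(uint32_t bits) {
    const int largest = static_cast<int>((bits >> 21) & 3u);
    assert(largest < 3);
    const int a = (largest + 1) % 3;
    const int b = (largest + 2) % 3;
    Vec3 n{};
    n.v[a] = Dequantize10(bits & 1023u);
    n.v[b] = Dequantize10((bits >> 10) & 1023u);
    // Largest = sqrt(1 - smallestA^2 - smallestB^2), clamped against rounding.
    const float rest = 1.0f - n.v[a] * n.v[a] - n.v[b] * n.v[b];
    n.v[largest] = std::sqrt(std::max(rest, 0.0f));
    if (bits & (1u << 20)) n.v[largest] = -n.v[largest];
    return n;
}
```

With this layout the code occupies 23 bits, so it fits in the 24-bit budget with one bit to spare.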


N-bit Fixed Point with integer offsets

• Simple n.x fixed point values

– Per-segment integer offset
• Bit count may vary from attribute to attribute
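
As a rough illustration of the bullets above, the sketch below decodes one n.x fixed-point value. The struct `FixedPointFormat`, its field names, and the choice to add the per-segment offset to the integer part are assumptions made for this example; the talk does not spell out the layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of how one attribute is encoded.
struct FixedPointFormat {
    int fractionalBits;     // the "x" in n.x
    int32_t segmentOffset;  // per-segment integer offset (assumed additive)
};

// Rebuild a float from a raw fixed-point value and its segment offset.
inline float DecodeFixed(uint32_t raw, const FixedPointFormat& fmt) {
    assert(fmt.fractionalBits >= 0 && fmt.fractionalBits < 32);
    const float scale = 1.0f / static_cast<float>(1u << fmt.fractionalBits);
    return static_cast<float>(fmt.segmentOffset) + static_cast<float>(raw) * scale;
}
```

Because the bit count may differ per attribute, each attribute would carry its own `FixedPointFormat` in a scheme like this.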


Index Decompression


Index Table Construction
• Index table is created by a vertex cache
optimizer
– Based on K-cache algorithm
• Number of vertex program outputs affects
Vertex Cache size.
• Four vertex mini cache most important
optimization factor


Index Cache Optimizer
• Our vertex cache optimizer produces very
regular index data

0 1 2
0 2 3
3 2 4
3 4 5
6 7 8
9 6 8
10 9 11
9 8 11


Index Decompression
[Figure: one triangle whose index order is rotated without changing the triangle]
Triangle Indexes: 0 1 2
Triangle Indexes: 2 0 1

Index Decompression
• Before Rotation

0 1 2
0 2 3
3 2 4
3 4 5
6 7 8
9 6 8
10 9 11
9 8 11


Index Decompression
• After Rotation

0 1 2
0 2 3
3 2 4
3 4 5
6 7 8
6 8 9
9 11 10
11 9 8


Index Decompression

85% compression
6.5 : 1


Blend Shapes


Blend Shapes in MLB® 08 The Show


Skinning


Skinning on SPUs

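void SkinVs(float4 inPosition : ATTR0, float4 weights : ATTR3,
            float4 matrixIndex : ATTR4,
            out float4 position : POSITION,
            uniform float4 joints[72], uniform float4x4 modelViewProj)
{
    // Accumulate the weighted joint transforms of the input position.
    float3 skinned = 0;
    for (int i = 0; i < 4; i++)
    {
        // matrixIndex gives the first of the three rows of each joint matrix.
        int idx = (int)matrixIndex[i];
        float3x4 joint = float3x4(joints[idx+0], joints[idx+1],
                                  joints[idx+2]);
        // mul(float3x4, float4) yields a float3.
        skinned += weights[i] * mul(joint, inPosition);
    }
    // Project the skinned position into clip space.
    position = mul(modelViewProj, float4(skinned, 1));
}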


Skinning on SPUs

30% Performance Improvement


Shadow map generation.... 70%!


Discrete Progressive Mesh

• Smoothly reduces the triangle count as a model moves into the distance
• With discrete progressive mesh, the LOD calculation is done once for an entire object


At an LOD there are two types of
vertexes

Parent Vertex
Child Vertex

LOD = 0.0


As the level of detail decreases (the LOD value rises), the
children “slide” towards their parents

Parent Vertex
Child Vertex

LOD = 0.2


The children continue to move towards
their parents

Parent Vertex
Child Vertex

LOD = 0.7


At the next integral LOD, all child
vertexes disappear as do the triangles

Parent Vertex
Child Vertex

LOD = 1.0
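
The sliding behaviour shown on the last four slides can be sketched as a linear interpolation from each child to its parent, driven by the fractional part of a single per-object LOD value. This is a sketch under stated assumptions: the `parentOf` array, the `Float3` type, and the use of a plain lerp are illustrative choices, not details confirmed by the talk.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Float3 { float x, y, z; };

inline Float3 Lerp(const Float3& a, const Float3& b, float t) {
    return { a.x + (b.x - a.x) * t,
             a.y + (b.y - a.y) * t,
             a.z + (b.z - a.z) * t };
}

// parentOf[i] is the index of vertex i's parent (hypothetical layout;
// a parent vertex simply points at itself and therefore never moves).
void ApplyDiscreteLod(std::vector<Float3>& verts,
                      const std::vector<std::size_t>& parentOf,
                      float lod) {
    assert(verts.size() == parentOf.size());
    // Fractional part of the LOD: 0 leaves children in place, values
    // approaching 1 put them almost on top of their parents.
    const float t = lod - std::floor(lod);
    const std::vector<Float3> rest = verts;  // read from an unmodified copy
    for (std::size_t i = 0; i < verts.size(); ++i)
        verts[i] = Lerp(rest[i], rest[parentOf[i]], t);
}
```

At the next integral LOD the child vertexes and their triangles are dropped from the mesh altogether; that step is not shown here.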


Continuous Progressive Mesh

• Like discrete progressive mesh, child vertexes move smoothly toward their parents
• However, the LOD is calculated for each vertex instead of just once for the object


Vertex set about to undergo continuous
progressive mesh

Parent Vertex
Child Vertex, LOD 1
Child Vertex, LOD 0


A single vertex set can straddle several
LOD ranges

Parent Vertex
Child Vertex, LOD 1
Child Vertex, LOD 0

LOD = 1.0

LOD = 0.0


Vertexes move depending on their
distance

Parent Vertex
Child Vertex, LOD 1
Child Vertex, LOD 0

LOD = 1.0

LOD = 0.0
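
For the continuous variant, each vertex gets its own blend factor from its distance to the viewer. The sketch below assumes a simple linear ramp between two distances, `lodStart` and `lodEnd`; that mapping, and every name in the code, is hypothetical and only illustrates the per-vertex idea.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Float3 { float x, y, z; };

// Hypothetical mapping from viewer distance to a blend factor in [0, 1].
inline float VertexLod(const Float3& p, const Float3& eye,
                       float lodStart, float lodEnd) {
    assert(lodEnd > lodStart);
    const float dx = p.x - eye.x, dy = p.y - eye.y, dz = p.z - eye.z;
    const float dist = std::sqrt(dx * dx + dy * dy + dz * dz);
    const float t = (dist - lodStart) / (lodEnd - lodStart);
    return std::min(std::max(t, 0.0f), 1.0f);
}

// Slide one child toward its parent by the child's own LOD factor.
inline Float3 SlideChild(const Float3& child, const Float3& parent,
                         const Float3& eye, float lodStart, float lodEnd) {
    const float t = VertexLod(child, eye, lodStart, lodEnd);
    return { child.x + (parent.x - child.x) * t,
             child.y + (parent.y - child.y) * t,
             child.z + (parent.z - child.z) * t };
}
```

Two children in the same vertex set can therefore end up at different points along the way to their parents, which is what lets one set straddle several LOD ranges.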


Triangle Culling


Up to 70% of triangles do not contribute to
final image.

• Off Screen Triangles
• Back Facing Triangles
• Zero Area Triangles
• No Pixel Triangles

Multisampling adds some complications…

Triangle Culling

10% to 20%
Performance Improvement
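
The four culling categories named above can be sketched as a single screen-space test. The assumptions here are loud ones: vertices are already projected to pixel coordinates, counter-clockwise triangles face the viewer, and pixels are sampled once at their centers. `ShouldCull` and `MissesPixelCenters` are names invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

// True if no pixel center (integer + 0.5) lies inside [lo, hi].
inline bool MissesPixelCenters(float lo, float hi) {
    return std::ceil(lo - 0.5f) > std::floor(hi - 0.5f);
}

bool ShouldCull(Vec2 a, Vec2 b, Vec2 c, float width, float height) {
    // Back facing or zero area: twice the signed area is not positive
    // (assumes counter-clockwise triangles face the viewer).
    const float area2 = (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
    if (area2 <= 0.0f) return true;

    const float minX = std::min({a.x, b.x, c.x});
    const float maxX = std::max({a.x, b.x, c.x});
    const float minY = std::min({a.y, b.y, c.y});
    const float maxY = std::max({a.y, b.y, c.y});

    // Off screen: the bounding box lies entirely outside the viewport.
    if (maxX < 0.0f || maxY < 0.0f || minX > width || minY > height) return true;

    // No pixel: the bounding box slips between pixel centers on either axis.
    return MissesPixelCenters(minX, maxX) || MissesPixelCenters(minY, maxY);
}
```

With multisampling the sample points no longer sit at pixel centers, so the last test would have to be made more conservative; this sketch does not attempt that.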


Compression for Output


Float Tables


When done, the vertex attributes are
compressed into one output stream

Float Tables

Output
Vertex Array


Output Buffering Schemes


Double Buffer

• Each buffer stores vertex and index data for an entire frame
• SPUs atomically access a mutex which is used to allocate memory from a buffer
• Uses lots of memory
[Figure: two buffers, one holding vertex and index data for frame 0 and one for frame 1]
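
A portable sketch of per-frame allocation from a shared output buffer. The slide describes SPUs atomically accessing a mutex; this example swaps in a `std::atomic` compare-and-swap loop as a stand-in, so it illustrates the idea rather than the PS3 mechanism. `FrameBuffer` and `kBufferFull` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Returned when the request does not fit in the remaining space.
const std::size_t kBufferFull = static_cast<std::size_t>(-1);

// One frame's output buffer; producers reserve byte ranges from it.
class FrameBuffer {
public:
    explicit FrameBuffer(std::size_t capacity) : capacity_(capacity), used_(0) {}

    // Reserve `bytes` and return the offset of the range, or kBufferFull.
    std::size_t Allocate(std::size_t bytes) {
        std::size_t old = used_.load(std::memory_order_relaxed);
        do {
            if (bytes > capacity_ - old) return kBufferFull;
        } while (!used_.compare_exchange_weak(old, old + bytes,
                                              std::memory_order_relaxed));
        return old;
    }

    // Called once the consumer has finished with the whole frame.
    void Reset() { used_.store(0, std::memory_order_relaxed); }

private:
    std::size_t capacity_;
    std::atomic<std::size_t> used_;
};
```

A double-buffered setup would keep two such objects and swap them each frame. A caller has to handle `kBufferFull`, since a buffer sized for a typical frame can still be exhausted.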


It is possible to completely fill a buffer

Vertex
and
Index
Data

Data SPU


Double buffering adds a frame of lag
Standard Pipeline
• Build Jobs on PPU → Render on RSX™ → Scan Out
Pipeline with Double Buffered SPU Processing
• Build Jobs on PPU → Process Jobs on SPU → Render on RSX™ → Scan Out
[Figure: each pipeline drawn for three overlapping frames]


Single Buffering

• Uses only half the memory!
• Still possible to completely fill the buffer
[Figure: one buffer holding vertex and index data for a single frame]


Single Buffering has a shorter pipeline

• Build Jobs on PPU → SPU Processing/RSX™ Rendering → Scan Out
• Vertex and index data is created just-in-time for the RSX™
• Draw commands are inserted into the command buffer while the RSX™ is rendering
• Requires tight SPU↔RSX™ synchronization
[Figure: the pipeline drawn for three overlapping frames]


SPU↔RSX™ Synchronization Using Local Stalls
• Place local stalls in the command buffer where necessary
• RSX™ will stop processing at a local stall until it is overwritten by new commands
• SPUs will generally stay ahead of the RSX™, so stalls rarely occur
[Figure: command buffer containing Draw 17 followed by a run of local stalls]


SPU will overwrite local stalls when it
outputs a set of new commands

[Figure: command buffer with Draw 17 and a run of local stalls; the SPU writes new commands over the local stalls; the put pointer is marked]


Ring Buffers

• Small memory footprint
• Will not run out of memory
[Figure: ring of vertex and index data with the start and the end of the free area marked]


RSX™ writes a semaphore once a
chunk of data has been consumed

• A command to write a semaphore needs to be added to the command buffer after all commands that use the data
– The value of the semaphore to be written is the new end of free area pointer
[Figure: command buffer with Draw 5, Draw 6, and Draw 7, each followed by a semaphore write; the ring holds Data 19, Data 6, and Data 14; the current RSX™ execution point, the start of the free area, and the new end of the free area are marked]
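
The producer side of such a ring can be sketched as follows. The consumer (the RSX™ in the talk) is represented only by `OnSemaphore`, which receives the new end-of-free-area value. The class name, the one-byte gap kept between the two pointers to tell "empty" from "full", and the wrap-to-front policy are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstdint>

class RingAllocator {
public:
    explicit RingAllocator(uint32_t size) : size_(size), start_(0), end_(0) {}

    // Reserve `bytes` contiguous bytes. Returns the offset, or -1 when the
    // consumer has not yet released enough space.
    int64_t Allocate(uint32_t bytes) {
        const uint64_t next = static_cast<uint64_t>(start_) + bytes;
        if (start_ >= end_) {
            // Free space is [start_, size_) plus, after wrapping, [0, end_).
            if (next < size_ || (next == size_ && end_ != 0)) {
                const uint32_t offset = start_;
                start_ = static_cast<uint32_t>(next % size_);
                return offset;
            }
            if (bytes < end_) {  // wrap to the front of the ring
                start_ = bytes;
                return 0;
            }
            return -1;
        }
        // Free space is [start_, end_), keeping one byte unused.
        if (next < end_) {
            const uint32_t offset = start_;
            start_ = static_cast<uint32_t>(next);
            return offset;
        }
        return -1;
    }

    // Called with the value the consumer wrote to the semaphore:
    // the new end of the free area.
    void OnSemaphore(uint32_t newEnd) { end_ = newEnd; }

private:
    uint32_t size_;
    uint32_t start_;  // start of free area (next write position)
    uint32_t end_;    // end of free area (oldest data still in use)
};
```

When `Allocate` returns -1 the producer waits for the next semaphore write and retries, which is why a ring of this kind does not run out of memory so long as no single request exceeds its size.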


Each SPU has its own buffer

[Figure: six ring buffers, Buffer 0 to Buffer 5, one for each of SPU 0 to SPU 5, each holding several numbered data chunks]

Geometry Performance


Software Pipelined C with SPU Intrinsics

do
{
    // Keep the quadwords loaded on the previous trip in m1..m3
    // and issue the loads for the next trip.
    m1 = in1;
    in1 = si_lqx(pIn1, offset);
    m2 = in2;
    in2 = si_lqx(pIn2, offset);
    m3 = in3;
    in3 = si_lqx(pIn3, offset);
    // Stores write out1..out3 as computed on the previous trip; they are
    // interleaved with the selects that work on m1..m3.
    temp2 = si_selb(m3, m1, mask_0X00);
    si_stqx(out1, pOut1, offset);
    temp3 = si_selb(m2, m1, mask_00X0);
    si_stqx(out2, pOut2, offset);
    temp1 = si_selb(m1, m2, mask_0X00);
    si_stqx(out3, pOut3, offset);
    offset = si_ai(offset, 0x30);    // advance by 48 bytes (three quadwords)
    // Finish this trip's results; they are stored on the next trip.
    out2 = si_shufb(m2, temp2, qs_bCaD);
    out1 = si_selb(temp1, m3, mask_00X0);
    out3 = si_shufb(m3, temp3, qs_caBD);
} while(si_to_int(offset) != 0);     // offset counts up to zero

Up to 20x faster than naive C/C++

1 SPU

800,000+
Triangles Per Frame
at 60 Frames per Second

60% of which are culled!


Next Generation Parallelism
In Games

Jon Olick
id Software


GAME ENTITY PROCESSING


Game Entity Processing
• Current Generation
– Serial Processing of entities in a giant for loop.

for(int i = 0; i < numEntities; ++i) {
    entity[i]->Think();
}


Game Entity Processing
• Current Generation
– Serial Processing of entities in a giant for loop.
• Next Generation
– Parallelism via Double Buffering
– All entities run in parallel with one another, with no
dependency stalls.
– Each entity can only read from the previous frame’s
results
– Each entity can only write to itself
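
The double-buffering rules above can be sketched in a few lines. Each entity reads only the previous frame's array and writes only its own slot in the next frame's array, so the update order does not matter. `EntityState`, `Think`, `StepFrame`, and the toy "steer toward a target" behaviour are all hypothetical; a real engine would hand the loop body to a job system instead of running it serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct EntityState {
    float x;     // position
    float v;     // velocity
    int target;  // index of the entity this one follows
};

// Reads only `prev` (last frame) and returns the new state of entity i.
EntityState Think(const std::vector<EntityState>& prev, std::size_t i, float dt) {
    EntityState next = prev[i];
    const EntityState& tgt = prev[static_cast<std::size_t>(prev[i].target)];
    next.v = (tgt.x - prev[i].x) * 0.5f;  // steer toward last frame's target
    next.x = prev[i].x + next.v * dt;
    return next;
}

// Process entities in the given order; each writes only to its own slot.
void StepFrame(const std::vector<EntityState>& prev,
               std::vector<EntityState>& next,
               const std::vector<std::size_t>& order, float dt) {
    next.resize(prev.size());
    for (std::size_t i : order)
        next[i] = Think(prev, i, dt);
}
```

Because no entity observes another entity's current-frame writes, running `StepFrame` with a shuffled `order` produces the same `next` array, which is the property that makes the loop safe to run in parallel.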


Game Entity Processing
• Record the progress of the game and replay
to debug.
• Single thread and randomize processing of
entities to help find bugs.
• Can protect memory so that bad accesses
cause exceptions to enforce double
buffering rules.


Game Entity Processing
• What about entities which have dependent
entities?
• Bucketing and Synchronization Points


RAY
CASTING
THE NEXT GENERATION


Why Ray Casting?
• A good question…


• Back in Quake 1
– If you had to make a decision between an additional
CPU and a Graphics Card, which would you choose?
– Why is this any different today?
– It's not any different.


Why Ray Casting?
• What value does it provide to developers?
– Shorter & Cheaper Development
– Higher Quality Games


Why Ray Casting?
• What value does it provide to end users?


Screenshot From E3 Rage Video


Current State of Rasterization

[Figure: Command Buffer → Vertex Processing → Triangle Setup → Fragment Processing, with several vertex processing and fragment processing units]


Future of Rasterization

Multiple Cores
Command Buffer
Vertex Processing
Vertex Processing

Triangle Setup

Fragment Processing
Fragment Processing


Future of Rasterization
28 31 29 28 32 29 28 33 34 35 36 37 38 39 37 38 40 37 38 41

GPU Triangle Setup


Future of Rasterization

Command Buffer

Vertex
Vertex Processing
Processing

Triangle Sorting

Triangle Setup
Fragment
Processing
Fragment Processing


Future of Rasterization x2
[Figure: the index stream below feeding GPU Triangle Setup]
0 1 2
0 2 3
3 2 4
3 4 5
6 7 8
9 6 8
10 9 11
9 8 11
9 8 12
13 14 15
15 14 16
16 14 17
16 14 18
16 14 19


Why voxels, and not triangles?
• Unique Texturing is possible with
rasterization
– Rage – idTech 5
• Unique Geometry is possible with
rasterization
– Progressive Mesh
• SVO Solves Two Problems in One
– Unique Texturing & Unique Geometry


Why is the control flow efficient?


Voxel Mip Mapping – Thin Walls


Caveats of Ray-Tracing?
• “Primary rays cache, secondary rays thrash”™
– Importance sampling to the rescue!
• Ray Tracing != Ray Casting


Sparse Voxel Oct-trees
• Oct-trees as collection of maximal blocks.
– Related to run-length encoding.
– Variable splitting planes


Data Structure
• Disk Caching with Virtual and Physical Pages


Is Disk Caching a Valid Lever?



Hot Data Structure
• Disk Caching with Virtual and Physical Pages
– Start out with a single virtual page.
– Render some voxels into the tree until page capacity
is reached.
– Split page into 8 sub-pages and attempt to add the
overflow voxel again.
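
The build procedure above (fill a page, split it into 8 sub-pages on overflow, retry) can be sketched with a small recursive structure. This is a simplification under stated assumptions: a page here stores raw voxel coordinates rather than oct-tree nodes, `kPageCapacity` is an arbitrary small number, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

struct Voxel { uint32_t x, y, z; };

const std::size_t kPageCapacity = 4;  // hypothetical; tiny to force splits

struct Page {
    uint32_t ox = 0, oy = 0, oz = 0;  // minimum corner of the page's cube
    uint32_t size = 1;                // edge length, a power of two
    std::vector<Voxel> voxels;        // payload while the page is a leaf
    std::unique_ptr<Page> child[8];   // all null while the page is a leaf
    bool IsLeaf() const { return child[0] == nullptr; }
};

void Insert(Page& page, const Voxel& v);

// Split a full leaf into 8 sub-pages and push its voxels down.
void Split(Page& page) {
    const uint32_t half = page.size / 2;
    for (int i = 0; i < 8; ++i) {
        page.child[i] = std::make_unique<Page>();
        page.child[i]->ox = page.ox + ((i & 1) ? half : 0u);
        page.child[i]->oy = page.oy + ((i & 2) ? half : 0u);
        page.child[i]->oz = page.oz + ((i & 4) ? half : 0u);
        page.child[i]->size = half;
    }
    std::vector<Voxel> old;
    old.swap(page.voxels);
    for (const Voxel& v : old) Insert(page, v);
}

void Insert(Page& page, const Voxel& v) {
    if (page.IsLeaf()) {
        // Unit-size pages cannot split further, so they simply grow.
        if (page.voxels.size() < kPageCapacity || page.size == 1) {
            page.voxels.push_back(v);
            return;
        }
        Split(page);  // page overflowed: split, then retry below
    }
    const uint32_t half = page.size / 2;
    const int idx = (v.x >= page.ox + half ? 1 : 0) |
                    (v.y >= page.oy + half ? 2 : 0) |
                    (v.z >= page.oz + half ? 4 : 0);
    Insert(*page.child[idx], v);
}

std::size_t CountVoxels(const Page& page) {
    std::size_t n = page.voxels.size();
    for (const auto& c : page.child)
        if (c) n += CountVoxels(*c);
    return n;
}
```

In a real system the capacity would come from one of the limits listed on the next slide rather than from a constant.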


Hot Data Structure
• Page capacity can be based on...
– CUDA's shared memory size
– SPU local store size
– Optimum disk streaming performance
– Minimum physical page memory


Virtual Page Fragmentation
• Traverse indexing oct-tree
– Write out pages according to optimal layout (breadth
first, depth first, etc...)


Physical Page Fragmentation
• Constantly loading / unloading data
fragments memory over time
• Bucket memory into sections and assign
each page to a section.



Page Optimization
• Execution time proportional to number of
blocks.
• Number of blocks can be reduced through
translation.
• Translating by 2^n doesn’t affect any oct-tree
level smaller than 2^n


Page Optimization
• Create scratch page with enlarged region
– 2^(n+1) x 2^(n+1) x 2^(n+1)
• Apply successive translations of magnitude
power of 2 in the x, y, & z directions and
keep track of the number of nodes.
• Store off total translation for ray casting
adjustment.
• O(n * 2^(2n))
– n is the number of levels in the oct-tree


Page Optimization
• Minimize outside nodes for faster casting


Data Structure
• Different structures for editing, runtime and
storage.


Runtime Data Structure
• child offsets : 32
• diffuse rgb : 3
• specular scale/power : 1
• planes : 12
• normal xyz : 3
• pad : 1
• total : 52 bytes per node


Storage Data Structure
• children bit mask : 1
• diffuse rgb : 3
• specular scale/power : 1
• normal xyz : 3
• total : 8 bytes per node
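
The two byte budgets above can be checked by writing them down as structs. The slides give only byte counts per field, so the element types (eight 32-bit child offsets, byte-sized color and normal channels, an opaque 12-byte plane block) and all field names are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Runtime node: 52 bytes in total.
struct RuntimeNode {
    uint32_t childOffsets[8];  // 32 bytes (assumed: one 32-bit offset per child)
    uint8_t  diffuseRgb[3];    //  3 bytes
    uint8_t  specular;         //  1 byte  (scale/power, packing not specified)
    uint8_t  planes[12];       // 12 bytes (layout not specified, kept opaque)
    uint8_t  normalXyz[3];     //  3 bytes
    uint8_t  pad;              //  1 byte
};
static_assert(sizeof(RuntimeNode) == 52, "runtime node should be 52 bytes");

// Storage node: 8 bytes in total.
struct StorageNode {
    uint8_t childMask;         // 1 byte, one bit per child
    uint8_t diffuseRgb[3];     // 3 bytes
    uint8_t specular;          // 1 byte
    uint8_t normalXyz[3];      // 3 bytes
};
static_assert(sizeof(StorageNode) == 8, "storage node should be 8 bytes");

// Number of children present according to the storage bit mask.
inline int ChildCount(const StorageNode& n) {
    int count = 0;
    for (int bit = 0; bit < 8; ++bit)
        count += (n.childMask >> bit) & 1;
    return count;
}
```

The storage form replaces 32 bytes of child offsets with a 1-byte mask and drops the planes and padding, which accounts for the drop from 52 to 8 bytes.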


Data Compression
• Compressing child bits
• Compressing Colors


Compressing Child Bits
[Chart: histogram over child bit-mask values, horizontal axis 0 to 250, vertical axis 0 to 0.06]

Compressing Child Bits
[Charts: the same histogram drawn separately for oct-tree Level 0 through Level 7]

Compressing Child Bits
• Split by oct-tree level.
• Entropy Encoding


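Compressing Color Data
[Figure: tree of color values]
Parents: 15 35 90 63 29
Children: 10 20 35 87 98 81 95 61 65 29 35 26 30 25

Compressing Color Data
[Figure: the same tree with each value stored as the difference from its parent]
Parents: -31 -11 44 17 -17
Children: -5 +5 0 -3 8 -9 5 -2 2 0 6 -3 1 -4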


Compressing Color Data
• Split by oct-tree level.
• Quantization
• Entropy Encoding
• 8:1 expected compression ratio
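
The parent-relative step that the color figures illustrate can be sketched as below; quantization and entropy coding are left out. The grouping of children under parents is passed in as `childCounts`, and the function name and signature are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Replace each child value with (child - parent). childCounts[i] says how
// many consecutive entries of `children` belong to parents[i].
std::vector<int> DeltaEncodeChildren(const std::vector<int>& parents,
                                     const std::vector<std::size_t>& childCounts,
                                     const std::vector<int>& children) {
    assert(parents.size() == childCounts.size());
    std::vector<int> deltas;
    deltas.reserve(children.size());
    std::size_t next = 0;
    for (std::size_t p = 0; p < parents.size(); ++p) {
        for (std::size_t k = 0; k < childCounts[p]; ++k) {
            assert(next < children.size());
            deltas.push_back(children[next] - parents[p]);
            ++next;
        }
    }
    return deltas;
}
```

Feeding in the parent row 15 35 90 63 29 and the child row from the figure, with the children grouped 2, 1, 4, 2, 5 (a grouping inferred from the numbers, not stated on the slide), yields small values clustered around zero, which is what makes the later entropy coding pay off.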


Data Storage Size
• 1.15 bits of positional data per voxel
• Cost savings improve as triangle size
decreases.
• 160 bits per triangle in traditional format
– x,y,z,s,t all 32-bits
• 80 bits per triangle in compressed format
– x,y,z,s,t all 16-bits
• 72 bits equivalent per triangle in oct-tree
– (for next generation)

Generating the Data
• Every surface can enumerate into voxels.
– Triangles
• 3D Scan Conversion, Volume Projection, Subdivision


3D Scan Conversion
[Figure: a shape converted into a stack of slices, Z=0 through Z=4]

Volume Projection

Subdivision

Generating the Data
• Every surface can enumerate into voxels.
– 3D Scan Conversion, Volume Projection, Subdivision
• Thick surfaces are unnecessary
– Flood fill world and remove unnecessary voxels.


Using the Data
• For each pixel on the screen
– Shoot out a ray into the oct-tree and write out the
node number (and depth)


Oct-tree Ray Traversal
• Similar to KD-tree traversal. Clip the line
with the mid-planes only.
• Tree traversal with two lookup tables.
– One to find which nodes to intersect with a given ray
direction in a worst-case scenario.
– The other to determine the order of intersection.
• Faster than most stackless traversal
methods for CUDA.
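
One well-known way to build an order-of-intersection table is sketched below: take a front-to-back child order that is valid for a ray travelling in the positive direction on all axes, and mirror it with an XOR for the other seven direction octants. This is not necessarily the table used in the talk, and the other table (which children a ray can intersect) is not shown. The child-index convention and all names are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumed child index convention: bit 0 = +x half, bit 1 = +y half,
// bit 2 = +z half of the parent cube.

// Bit i is set when the ray travels in the negative direction on axis i.
inline uint8_t DirectionMask(float dx, float dy, float dz) {
    return static_cast<uint8_t>((dx < 0.0f ? 1 : 0) |
                                (dy < 0.0f ? 2 : 0) |
                                (dz < 0.0f ? 4 : 0));
}

// A valid front-to-back order for a ray with an all-positive direction:
// such a ray can only move from a child to one with additional bits set.
const std::array<uint8_t, 8> kBaseOrder = {{0, 1, 2, 4, 3, 5, 6, 7}};

// table[mask][k] is the k-th child to test for rays in direction octant `mask`.
std::array<std::array<uint8_t, 8>, 8> BuildOrderTable() {
    std::array<std::array<uint8_t, 8>, 8> table{};
    for (int mask = 0; mask < 8; ++mask)
        for (int k = 0; k < 8; ++k)
            table[mask][k] = static_cast<uint8_t>(kBaseOrder[k] ^ mask);
    return table;
}
```

At traversal time the direction mask is computed once per ray, and the same table row is reused at every node the ray visits.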


LOD
• How to handle oct-tree mip-mapping and
when is it necessary to load additional detail
levels?


LOD - Stop Depth
• Stop Depth based on pixel and voxel size
[Wald07]
– Oblique surfaces have unnecessary extra detail
• Hurts casting performance by traversing detail that
you won’t see
• Hurts streaming performance by loading unnecessary
data
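
A stop depth based on pixel and voxel size can be sketched by descending until a voxel is no larger than the footprint of one pixel at the hit distance. The pinhole projection model and the parameter names are assumptions of this example, and it deliberately ignores surface orientation, which is the weakness the bullets above point out for oblique surfaces.

```cpp
#include <cassert>
#include <cmath>

// Depth at which a voxel becomes no larger than one pixel's footprint.
// rootSize: edge length of the oct-tree root; distance: distance along the
// ray; fovY: vertical field of view in radians.
int StopDepth(float rootSize, float distance, float fovY,
              int screenHeight, int maxDepth) {
    assert(screenHeight > 0 && distance > 0.0f);
    // World-space size covered by one pixel at this distance.
    const float pixelSize = distance * 2.0f * std::tan(fovY * 0.5f) /
                            static_cast<float>(screenHeight);
    float voxelSize = rootSize;
    int depth = 0;
    while (depth < maxDepth && voxelSize > pixelSize) {
        voxelSize *= 0.5f;  // each level halves the voxel edge length
        ++depth;
    }
    return depth;
}
```

Nearby geometry therefore descends deeper into the tree than distant geometry, which also bounds how much detail has to be streamed in.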


LOD - Post Process
• Ray casting outputs node indexes
• A post process which looks at ratios of nodes
to pixels.
– Small feedback buffer (320x180) contains list of
pages which require additional detail.


Post Process Blur
• Fixes the “Jam your head into a wall”
scenario.
• Width of blur kernel related to size of voxel
on screen.


Rendering Dynamic Geometry
• With voxels
– Option 1
• Ray cast or rasterize a triangle mesh
• Transform to base pose
• Trace with local oct-tree
• Allows instancing of geometry
– Option 2
• Have two different oct-trees: static & dynamic
• Render both and merge results together with depth information
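Option 2 amounts to a per-pixel depth test between the two renders. A minimal sketch, assuming each pass produces a color buffer and a depth buffer of the same size, with smaller depth meaning closer to the camera. The buffer layout and the name `merge_by_depth` are illustrative.

```python
def merge_by_depth(static_color, static_depth, dynamic_color, dynamic_depth):
    """Combine two renders, keeping the nearer sample at every pixel."""
    merged_color, merged_depth = [], []
    for sc_row, sd_row, dc_row, dd_row in zip(
            static_color, static_depth, dynamic_color, dynamic_depth):
        color_row, depth_row = [], []
        for sc, sd, dc, dd in zip(sc_row, sd_row, dc_row, dd_row):
            if dd < sd:  # dynamic geometry is in front
                color_row.append(dc)
                depth_row.append(dd)
            else:        # static geometry is in front, or the depths tie
                color_row.append(sc)
                depth_row.append(sd)
        merged_color.append(color_row)
        merged_depth.append(depth_row)
    return merged_color, merged_depth
```

Pixels where a ray missed everything can carry `float("inf")` as depth so the other pass always wins there.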

Depth Advance Optimization
• Render a coarse hull of the geometry into a
depth-buffer.
– Automatically calculate from voxel geometry.
• Start the ray casting at the depth-buffer
values.
• Skips most of the traversal process.
– Up to 2x speed improvement
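The effect of starting at the depth-buffer value can be illustrated with a toy fixed-step ray march. This stands in for the real oct-tree traversal, which the talk does not show. `hit_test`, the step size and the distances used below are made up for illustration, and the coarse hull is assumed to lie in front of the true surface.

```python
def march(hit_test, t_start, t_max, step):
    """Fixed-step march along a ray; returns (hit distance or None, steps)."""
    t = t_start
    steps = 0
    while t <= t_max:
        steps += 1
        if hit_test(t):
            return t, steps
        t += step
    return None, steps


def surface(t):
    """Toy scene: the ray enters solid geometry at distance 80."""
    return t >= 80.0


# Without the optimization the ray starts at the camera.
full_hit, full_steps = march(surface, 0.0, 100.0, 1.0)
# With it, the ray starts at the coarse hull depth read from the depth buffer.
hull_hit, hull_steps = march(surface, 70.0, 100.0, 1.0)
```

Both rays find the same hit, but the second one skips the empty space in front of the hull.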

Adaptive Sub-Sampling
• After rendering the scene, perform a Sobel
edge filter over the frame buffer to figure
out where additional rays would improve
the quality of the image.
• Cast additional rays.
• Repeat until 16 ms have elapsed.
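The edge-detection step can be sketched as below, assuming a grayscale frame buffer stored as a 2D list and an arbitrary `threshold`. The time-budgeted repeat loop and the actual ray casting are left out, and the function names are illustrative.

```python
def sobel_magnitude(image):
    """Approximate gradient magnitude |gx| + |gy|; border pixels stay 0."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient: right column minus left column.
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            out[y][x] = abs(gx) + abs(gy)
    return out


def pixels_to_resample(image, threshold):
    """Return (x, y) coordinates where the edge response exceeds threshold."""
    edges = sobel_magnitude(image)
    return [(x, y)
            for y, row in enumerate(edges)
            for x, value in enumerate(row)
            if value > threshold]
```

Each pass would cast extra rays at the returned pixels, update the frame buffer, and run the filter again while time remains.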

Adaptive Sub-Sampling Problems
• Inherently always sampling the most
divergent parts of the scene
• Can manage the performance hit by sampling
in chunks, from most aliased to least aliased

Infinite Surface Detail
• Oct-tree nodes recursively point back in on
themselves to create an infinite amount of
detail
• Create detail oct-tree sub-segments to
simulate rough, smooth, porous, sharp
edges, etc.
• Programmatically simulate virtual detail
levels.
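The self-referencing idea can be shown with a toy pointer-based oct-tree. The `Node` class, the `material` field and `descend` are hypothetical and only illustrate the structure; the talk does not describe the actual node layout.

```python
class Node:
    """Toy oct-tree node with eight child slots."""

    def __init__(self, material):
        self.material = material
        self.children = [None] * 8


def descend(root, path):
    """Follow child indexes from root; stop early at a missing child."""
    node = root
    for child_index in path:
        child = node.children[child_index]
        if child is None:
            break
        node = child
    return node


# A detail node whose children all point back at itself: traversal can
# keep descending to any depth without storing any additional nodes.
detail = Node("rough stone")
detail.children = [detail] * 8

root = Node("wall")
root.children[3] = detail
```

Here `descend(root, [3] + [0] * 1000)` still returns `detail`, so the apparent depth is limited only by where the traversal chooses to stop.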

How much time to innovate?
• 1 year tools
• 3 months runtime

Expected Runtime Performance
• 33% of the time rendering characters / etc
• 66% of the time rendering world
• Ray-casting the world must complete in
~20ms for 30 FPS
• Theoretically possible on today's technology
at 720p and 30 fps (GeForce 8800 Series)
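The budget figures can be checked with a few lines of arithmetic. The sketch assumes 720p means 1280x720 and one primary ray per pixel; the function names are illustrative.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps


def world_budget_ms(fps, world_fraction):
    """Share of the frame time left for ray casting the world."""
    return frame_budget_ms(fps) * world_fraction


def primary_rays_per_second(width, height, fps):
    """Rays per second when casting one primary ray per pixel."""
    return width * height * fps
```

At 30 FPS a frame lasts about 33.3 ms, and 66% of that is about 22 ms, in line with the ~20ms target. One ray per pixel at 1280x720 and 30 FPS is 27,648,000 primary rays per second.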

How would this affect a platform
launch?
• Generational skip in geometric complexity
• Next-gen platforms at least 4 times better
• 60 FPS at 1080p with Anti-aliasing

Special Thanks
• Paul Debevec
– Light probes used with permission
• Dimitry Parkin
– www.parkparkin.com
• John Carmack
• Cass Everitt
• Mark Harris
• Nathaniel Duca
• Aaron Lefohn
• Mike Houston
• Tom Forsyth
• Sony
• Intel
• Nvidia
Questions

Jon Olick (jon.olick@gmail.com)

id Software
References
– [Wald07] http://www.sci.utah.edu/~wald/Publications/2007/MROct/download/mroct.pdf
