Nerdy Nights NES Tutorials v1

About This PDF (v1.
0)
This PDF is an adaption of content created and curated that together forms the Nerdy Nights tutorial
series for the NES.
Original Tutorials By: Brian Parker / bunnyboy
Curated By: Christopher Parker / cppchriscpp / http://cpprograms.net/
Curation URL: https://nerdy-nights.nes.science
PDF adaption: Fuzzy Mannerz / https://fuzzytek.ml. / fuzzytek@fuzzymannerz.co.uk
Overview
Nerdy Nights is a tutorial series to help people write games for the NES. It introduces the concepts
in a user-friendly way, and for years has been the go-to solution for many people to learn NES
programming. It was originally written by Brian Parker (aka BunnyBoy), with some additional
audio tutorials written by MetalSlime.
Links to the audio series fell off the internet due to a domain expiring. Luckily, members of the
community kept these links alive. However, this exposed the fragility of having these tutorials on a
bunch of forum posts, which could one day also disappear. MetalSlime's site also went offline,
though there is a mirror of that available thanks to archive.org. In addition, the entire NintendoAge
site was recently merged with gocollect's forums, breaking most of the links, images, avatars, etc.
Since some of the images from the audio tutorial have fully disappeared from the internet, they have
been removed from this PDF, however, they are not strictly necessary for understanding the tutorial.
There is also a port of the main tutorial series to ca65 done by Dave Dribin, (ddbribin) which is
good for getting your start with that assembler. Not all tutorials are ported, but it's a good start, and
a good way to get familiar with that syntax. Find it on his BitBucket repo.
Finally, there are a few other popular topics and tutorials in the Miscellaneous section, all taken
from popular topic on NintendoAge. These are by various authors, who are credited in their
individual posts.
Table of Contents
Main Tutorial Series
• Nerdy Nights intro
• Nerdy Nights week 1: number systems and core programming ideas
• Nerdy Nights week 2: NES architecture overview
• Nerdy Nights week 3: 6502 ASM, first app
• Nerdy Nights week 4: Color Palettes, Sprites, second app
• Nerdy Nights week 5: multiple sprites, reading controllers, more instructions
• Nerdy Nights week 6: Backgrounds
• Nerdy Nights week 7: subroutines, game layout, starting Pong
• Nerdy Nights week 8: 16 bit math, pointers, nested loops
• Nerdy Nights week 9: Numbers, Bin to Dec
Advanced Tutorial Series

• Advanced Nerdy Nights #1: CHR Bank switching
• Advanced Nerdy Nights #2: MMC1 CHR and PRG Bank switching, WRAM + Battery
• Advanced Nerdy Nights #3: Horizontal background scrolling
• Advanced Nerdy Nights #4: Sprite 0 hit for a status bar
Audio Tutorial Series

• Nerdy Nights Sound intro: About the Nerdy Nights Sound series
• Nerdy Nights Sound: Part 1: make a music/sfx engine
• Nerdy Nights Sound: Part 2: Square 2 and Triangle Basics
• Nerdy Nights Sound: Part 3: Periods and lookup tables
• Nerdy Nights Sound: Part 4: sound engine skeleton
• Nerdy Nights Sound: Part 5: Sound Data, Pointer Tables, Headers
• Nerdy Nights Sound: Part 6: Tempo, Note Lengths, Buffering and Rests
• Nerdy Nights Sound: Part 7: Volume Envelopes
• Nerdy Nights Sound: Part 8: Opcodes and Looping
• Nerdy Nights Sound: Part 9: Finite Loops, Key Changes, Chord Progressions
• Nerdy Nights Sound: Part 10: Simple Drums
Miscellaneous Articles
• Nerdy Nights: hex editing: Alter your own title screens!
• AtariAge: 6502 Killer hacks
• MMC1 Memory Mapping: NintendoAge Programming Resources - MMC1 Memory
Mapping
• World Building with SGP: Introductions
• World Building with SGP 1: Foundations of NES Graphics
• World Building with SGP 2: Introduction to NES Graphics Editors
• Top Status Bar and Controller Question
• Background Horizontal Scrolling Buffer
Nerdy Nights intro
This is going to be a series of weekly NES programming lessons, starting from absolutely no
knowledge. Right now the plan is 16-20 lessons ending with a complete game like pong or
breakout. The first lessons may be easy but they will get harder! People who have done any type of
programming before may have an easier time but anyone should be able to make it through. All the
tools will be Windows based, so Linux users will have to use wine and MacOS users will have to
use Parallels, Boot Camp, or VirtualPC.
Many things will simply not be covered in this series. No audio or scrolling will be done. Only the
NROM mapper will be used. After you make it through all the lessons those will be much more
simple than when you are first learning.
And finally these will not be without errors to begin with and may not be the absolute best way to
do anything. People develop their own programming styles and this is just mine, which tends to be
quickly written and not super efficient.
Nerdy Nights week 1:
number systems and core programming ideas
Number Systems
Decimal
The decimal system is base 10. Every digit can be 0-9. Each digit place is a power of 10. Each digit
place to the left is 10 times more than the previous digit place. If you take the number 10 and put a
0 to the right, it becomes 100 which is 10 times more. Remove the 0 from the right, it becomes 1
which is 10 times less.

100's place 10's place 1's place
0 0 1 = 001
0 1 0 = 010
1 0 0 = 100
To get the value of a number, you multiply each digit by it's place value and add them all together.

100's place 10's place 1's place
3 8 0 = 3*100 + 8*10 + 0*1 = 380
0 4 1 = 0*100 + 4*10 + 1*1 = 41
Binary
Everything in computers is done in base 2, binary. This is because the lowest level of computing is a
switch; on/off, 1/0.
Base 2 binary works the same way, except each digit can be 0-1 and the place values are powers of
2 instead of 10. Insert a 0 to the right of a number and it becomes 2 times bigger. Remove a 0 and it
becomes 2 times smaller.

8's place 4's place 2's place 1's place
0 1 0 0 = 0*8 + 1*4 + 0*2 + 0*1 = 4
1 1 1 1 = 1*8 + 1*4 + 1*2 + 1*1 = 15
The NES is an 8 bit system, which means the binary number it works with are 8 binary digits long.
8 bits is one byte. Some examples are:

Binary Decimal
00000000 = 0
00001111 = 15
00010000 = 16
10101010 = 170
11111111 = 255
Eventually you become fast at reading binary numbers, or at least recognizing patterns. You can see
that one byte can only range from 0-255. For numbers bigger than that you must use 2 or more
bytes. There are also no negative numbers. More on that later.
Hexadecimal
Hexadecimal or Hex is base 16, so each digit is 0-15 and each digit place is a power of 16. The
problem is anything 10 and above needs 2 digits. To fix this letters are used instead of numbers
starting with A:

Decimal Hex
0 = 0
1 = 1
9 = 9
10 = A
11 = B
12 = C
13 = D
14 = E
15 = F
As with decimal and hex the digit places are each a power of 16:

16's place 1's place

6 A = 6*16 + A(10)*1 = 106
1 0 = 1*16 + 0*1 = 16
Hex is largely used because it is much faster to write than binary. An 8 digit binary number turns
into a 2 digit hex number:

Binary 01101010
split | |
in half / \
0110 1010
into | |
hex 6 A
| |
put \ /
back 6A
01101010 = 6A
And more examples:

Binary Hex Decimal
00000000 = 00 = 0
00001111 = 0F = 15
00010000 = 10 = 16
10101010 = AA = 170
11111111 = FF = 255
For easy converting open up the built in Windows calculator and switch it to scientific mode.
Choose the base (Hex, Dec, or Bin), type the number, then switch to another base.
When the numbers are written an extra character is added so you can tell which base is being used.
Binary is typically prefixed with a %, like %00001111. Hex is prefixed with a $ like $2A. Some
other conventions are postfixing binary with a b like 00001111b and postfixing hex with an h like
2Ah.
The NES has a 16 bit address bus (more on that later), so it can access 2^16 bytes of memory. 16
binary digits turns into 4 hex digits, so typical NES addresses look like $8000, $FFFF, and $4017.
Core Programming Concepts

All programming languages have three basic concepts. They are instructions, variables, and control
flow. If any of those three are missing it is no longer a true programming language. For example
HTML has no control flow so it is not a programming language.
Instructions
An instruction is the smallest command that the processor runs. Instructions are run one at a time,
one after another. In the NES processor there are only 56 instructions. Typically around 10 of those
will be used constantly, and at least 10 will be completely ignored. Some examples of these would
be addition, loading a number, or comparing a variable to zero.
Variables
A variable is a place that stores data that can be modified. An example of this would be the vertical
position of Mario on the screen. It can be changed any time during the game. Variables in source
code all have names you set, so it would be something like MarioHorizPosition.
Control Flow
Normally your instructions run in sequential order. Sometimes you will want to run a different
section of code depending on a variable. This would be a control flow statement which changes the
normal flow of your program. An example would be if Mario is falling, jump to the code that
checks if he hit the ground yet.
Nerdy Nights week 2: NES architecture overview
This week: general overview of the NES architecture with the major components covered. All
general purpose computers are arranged the same way with a place to store code (ROM), a place to
store variables (RAM), and a processor to run code (CPU). The NES also adds another processor to
generate the graphics (PPU) and a section of the CPU to generate audio (APU). Everything here is
very general and will have more details than you want in the next few weeks.
NES System Architecture
KB - Memory size is listed in KiloBytes or KB. 1KB = 1024 bytes. Everything is powers of 2, so
2^10 = 1024 is used instead of 1000. If the capitalization is different, the meaning can change. Kb
is Kilobits. Divide Kb by 8 to get KB, because 1 byte = 8 bits.
ROM - Read Only Memory, holds data that cannot be changed. This is where the game code or
graphics is stored on the cart.
RAM - Random Access Memory, holds data that can be read and written. When power is removed,
the chip is erased. A battery can be used to keep power and data valid.
PRG - Program memory, the code for the game
CHR - Character memory, the data for graphics
CPU - Central Processing Unit, the main processor chip
PPU - Picture Processing Unit, the graphics chip
APU - Audio Processing Unit, the sound chip inside the CPU
System Overview
The NES includes a custom 6502 based CPU with the APU and controller handling inside one chip,
and a PPU that displays graphics in another chip. Your code instructions run on the CPU and sends
out commands to the APU and PPU. The NOAC (NES On A Chip) clones like the Yobo and NEX
put all of these parts onto one chip.
There is only 2KB of RAM connected to the CPU for storing variables, and 2KB of RAM
connected to the PPU for holding two TV screens of background graphics. Some carts add extra
CPU RAM, called Work RAM or WRAM. If a cart needs to store saved games, this WRAM will
have a battery attached to make sure it isn't erased. A few carts add extra PPU RAM to hold four
screens of background graphics at once. This is not common. The rest of this tutorial will not use
WRAM or four screen RAM.
Each cart includes at least three chips. One holds the program code (PRG), another holds the
character graphics (CHR), and the last is the lockout. The graphics chip can be RAM instead of
ROM, which means the game code would copy graphics from the PRG ROM chip to the CHR
RAM. PRG is always a ROM chip.
Lockout Chip
Inside the NES and the cart are also two lockout chips. The lockout chip controls resetting the
console. First the NES lockout sends out a stream ID, 0-15. The cart lockout records this number.
Then both lockout chips run a complex equation using that number and send the results to each
other. Both chips know what the other is supposed to send so they both know when something is
wrong. If that happens the system enters the continuous reseting loop. This is the screen flashing
you see with a dirty cart.
When you cut pin 4 of the NES lockout chip, you are making it think it is inside the cart. It sits there
waiting for the ID from the NES which never happens, so the system is never reset. If you were to
completely remove the NES lockout chip the system would not work because it controls the reset
button.
Most lockout defeaters used by the unlicensed game companies used large voltage spikes sent from
the cart to the NES lockout. When timed right those would crash the NES lockout, preventing it
from resetting the system. Nintendo slowly added protection against those on the NES board. Next
time you open your NES, check the board for the revision number. Right in the middle it will say
NES-CPU- then a number. That number is the revision. If you have 05 it is an early one. 07 and 09
added some lockout protection. 11 was the last version with the most lockout protection. Almost all
unlicensed carts that use lockout defeaters will not work on a NES-CPU-11 system.
CPU Overview
The NES CPU is a modified 6502, an 8 bit data processor similar to the Apple 2, Atari 2600, C64,
and many other systems. By the time the Famicom was created it was underpowered for a computer
but great for a game system.
The CPU has a 16 bit address bus which can access up to 64KB of memory. 2^16 = 65536, or
64KB. Included in that memory space is the 2KB of CPU RAM, ports to access
PPU/APU/controllers, WRAM (if on the cart), and 32KB for PRG ROM. The 16 bit addresses are
written in hex, so they become 4 digits starting with a $ symbol. For example the internal RAM is at
$0000-0800. $0800 = 2048 or 2KB. 32KB quickly became too small for games, which is why
memory mappers were used. Those mappers can swap in different banks of PRG code or CHR
graphics. Mappers like the MMC3 allowed up to 512KB of PRG, and 256KB of CHR. There is no
limit to the memory size if you create a new mapper chip, but 128KB PRG and 64KB CHR was the
most common size.
PPU Overview
The NES PPU is a custom chip that does all the graphics display. It includes internal RAM for
sprites and the color palette. There is RAM on the NES board that holds the background, and all
actual graphics are fetched from the cart CHR memory.
Your program does not run on the PPU, the PPU always goes through the same display order. You
only set some options like colors and scrolling. The PPU processes one TV scanline at a time. First
the sprites are fetched from the cart CHR memory. If there are more than 8 sprites on the scanline
the rest are ignored. This is why some games like Super Dodge Ball will blink when there is lots
happening on screen. After the sprites the background is fetched from CHR memory. When all the
scanlines are done there is a period when no graphics are sent out. This is called VBlank and is the
only time graphics updates can be done. PAL has a longer VBlank time (when the TV cathode ray
gun is going back to the top of the screen) which allows more time for graphics updates. Some PAL
games and demos do not run on NTSC systems because of this difference in VBlank time. Both the
NTSC and PAL systems have a resolution of 256x240 pixels, but the top and bottom 8 rows are
typically cut off by the NTSC TV resulting in 256x224. TV variations will cut off an additional 0-8
rows, so you should allow for a border before drawing important information.
NTSC runs at 60Hz and PAL runs at 50Hz. Running an NTSC game on a PAL system will be
slower because of this timing difference. Sounds will also be slower.
Graphics System Overview
Tiles
All graphics are made up of 8x8 pixel tiles. Large characters like Mario are made from multiple 8x8
tiles. All the backgrounds are also made from these tiles. The tile system means less memory is
needed (was expensive at the time) but also means that things like bitmap pictures and 3d graphics
aren't really possible. To see all the tiles in a game, download Tile Molester and open up your .NES
file. Scroll down until you see graphics that don't look like static. You can see that small tiles are
arranged by the game to make large images.
Sprites
The PPU has enough memory for 64 sprites, or things that move around on screen like Mario. Only
8 sprites per scanline are allowed, any more than that will be ignored. This is where the flickering
comes from in some games when there are too many objects on screen.
Background
This is the landscape graphics, which scrolls all at once. The sprites can either be displayed in front
or behind the background. The screen is big enough for 32x30 background tiles, and there is enough
internal RAM to hold 2 screens. When games scroll the background graphics are updated off screen
before they are scrolled on screen.
Pattern Tables
These are where the actual tile data is stored. It is either ROM or RAM on the cart. Each pattern
table holds 256 tiles. One table is used for backgrounds, and the other for sprites. All graphics
currently on screen must be in these tables.
Attribute Tables
These tables set the color information in 2x2 tile sections. This means that a 16x16 pixel area can
only have 4 different colors selected from the palette.
Palettes
These two areas hold the color information, one for the background and one for sprites. Each palette
has 16 colors.
To display a tile on screen, the pixel color index is taken from the Pattern Table and the Attribute
Table. That index is then looked up in the Palette to get the actual color.
To see all the graphics, download the FCEUXD SP emulator. Open up your .NES game and choose
PPU Viewer from the Tools menu. This will show you all the active background tiles, all the active
sprite tiles, and the color palettes. Then choose Name Table Viewer from the Tools menu. This will
show you the backgrounds as they will appear on screen. If you choose a game that scrolls like
SMB you can see the off screen background sections being updated.
Nerdy Nights week 3: 6502 ASM, first app
This Week: starts getting into more details about the 6502 and intro to assembly language. The
lessons for asm usage and NES specifics will be done in sections together. There are many other
6502 websites and good books which may help you learn better.
6502 Assembly
Bit - The smallest unit in computers. It is either a 1 (on) or a 0 (off), like a light switch.
Byte - 8 bits together form one byte, a number from 0 to 255. Two bytes put together is 16 bits,
forming a number from 0 to 65535. Bits in the byte are numbered starting from the right at 0.
Instruction - one command a processor executes. Instructions are run sequentially.
Code Layout
In assembly language there are 5 main parts. Some parts must be in a specific horizontal position
for the assembler to use them correctly.
Directives
Directives are commands you send to the assembler to do things like locating code in memory. They
start with a . and are indented. Some people use tabs, or 4 spaces, and I use 2 spaces. This sample
directive tells the assembler to put the code starting at memory location $8000, which is inside the
game ROM area:
.org $8000
Labels
The label is aligned to the far left and has a : at the end. The label is just
something you use to organize your code and make it easier to read. The
assembler translates the label into an address. Sample label:
.org $8000
MyFunction:
When the assembler runs, it will do a find/replace to set MyFunction to $8000. The if you have any
code that uses MyFunction like:
STA MyFunction
It will find/replace to:

STA $8000
Opcodes
The opcode is the instruction that the processor will run, and is indented like
the directives. In this sample, JMP is the opcode that tells the processor to
jump to the MyFunction label:
.org $8000
MyFunction:
JMP MyFunction
Operands
The operands are additional information for the opcode. Opcodes have between one and three
operands. In this example the #$FF is the operand:
.org $8000
MyFunction:
LDA #$FF
JMP MyFunction
Comments
Comments are to help you understand in English what the code is doing. When you
write code and come back later, the comments will save you. You do not need a
comment on every line, but should have enough to explain what is happening.
Comments start with a ; and are completely ignored by the assembler. They can be
put anywhere horizontally, but are usually spaced beyond the long lines.
.org $8000
MyFunction: ; loads FF into accumulator
LDA #$FF
JMP MyFunction
This code would just continually run the loop, loading the hex value $FF into the accumulator each
time.
6502 Processor Overview

The 6502 is an 8 bit processor with a 16 bit address bus. It can access 64KB of memory without
bank switching. In the NES this memory space is split up into RAM, PPU/Audio/Controller access,
and game ROM.
$0000-0800 - Internal RAM, 2KB chip in the NES

$2000-2007 - PPU access ports
$4000-4017 - Audio and controller access ports
$6000-7FFF - Optional WRAM inside the game cart
$8000-FFFF - Game cart ROM
Any of the game cart sections can be bank switched to get access to more memory, but memory
mappers will not be included in this tutorial.
6502 Assembly Overview

The assembly language for 6502 starts with a 3 character code for the instruction "opcode". There
are 56 instructions, 10 of which you will use frequently. Many instructions will have a value after
the opcode, which you can write in decimal or hex. If that value starts with a # then it means use the
actual number. If the value doesn't have then # then it means use the value at that address. So LDA
#$05 means load the value 5, LDA $0005 means load the value that is stored at address $0005.
6502 Registers
A register is a place inside the processor that holds a value. The 6502 has three 8 bit registers and a
status register that you will be using. All your data processing uses these registers. There are
additional registers that are not covered in this tutorial.
Accumulator
The Accumulator (A) is the main 8 bit register for loading, storing, comparing, and doing math on
data. Some of the most frequent operations are:
LDA #$FF ;load the hex value $FF (decimal 256) into A
STA $0000 ;store the accumulator into memory location $0000, internal RAM
Index Register X
The Index Register X (X) is another 8 bit register, usually used for counting or memory access. In
loops you will use this register to keep track of how many times the loop has gone, while using A to
process data. Some frequent operations are:
LDX $0000 ;load the value at memory location $0000 into X
INX ;increment X X = X + 1
Index Register Y
The Index Register Y (Y) works almost the same as X. Some instructions (not covered here) only
work with X and not Y. Some operations are:
STY $00BA ;store Y into memory location $00BA

TYA ;transfer Y into Accumulator
Status Register
The Status Register holds flags with information about the last instruction. For example when doing
a subtract you can check if the result was a zero.
6502 Instruction Set

These are just the most common and basic instructions. Most have a few different options which
will be used later. There are also a few more complicated instructions to be covered later.
Common Load/Store opcodes
LDA #$0A ; LoaD the value 0A into the accumulator A

; the number part of the opcode can be a value or an address
; if the value is zero, the zero flag will be set.
LDX $0000 ; LoaD the value at address $0000 into the index register X
LDY #$FF ; LoaD the value $FF into the index register Y
STA $2000 ; STore the value from accumulator A into the address $2000
; the number part must be an address
STX $4016 ; STore value in X into $4016
STY $0101 ; STore Y into $0101
TAX ; Transfer the value from A into X
; if the value is zero, the zero flag will be set
TAY ; Transfer A into Y
TXA ; Transfer X into A
TYA ; Transfer Y into A
Common Math opcodes
ADC #$01 ; ADd with Carry

; A = A + $01 + carry
; if the result is zero, the zero flag will be set
SBC #$80 ; SuBtract with Carry
; A = A - $80 - (1 - carry)
CLC ; CLear Carry flag in status register
; usually this should be done before ADC
SEC ; SEt Carry flag in status register
; usually this should be done before SBC
INC $0100 ; INCrement value at address $0100
DEC $0001 ; DECrement $0001
INY ; INcrement Y register
INX ; INcrement X register
DEY ; DEcrement Y
DEX ; DEcrement X
ASL A ; Arithmetic Shift Left
; shift all bits one position to the left
; this is a multiply by 2
LSR $6000 ; Logical Shift Right
; shift all bits one position to the right
; this is a divide by 2
Common Comparison opcodes

CMP #$01 ; CoMPare A to the value $01
; this actually does a subtract, but does not keep the result
; instead you check the status register to check for equal,
; less than, or greater than
CPX $0050 ; ComPare X to the value at address $0050
CPY #$FF ; ComPare Y to the value $FF
Common Control Flow opcodes

JMP $8000 ; JuMP to $8000, continue running code there
BEQ $FF00 ; Branch if EQual, contnue running code there
; first you would do a CMP, which clears or sets the zero flag
; then the BEQ will check the zero flag
; if zero is set (values were equal) the code jumps to $FF00 and runs
there
; if zero is clear (values not equal) there is no jump, runs next
instruction
BNE $FF00 ; Branch if Not Equal - opposite above, jump is made when zero flag
is clear
NES Code Structure
Getting Started
This section has a lot of information because it will get everything set up to run your first NES
program. Much of the code can be copy/pasted then ignored for now. The main goal is to just get
NESASM to output something useful.
iNES Header
The 16 byte iNES header gives the emulator all the information about the game including mapper,
graphics mirroring, and PRG/CHR sizes. You can include all this inside your asm file at the very
beginning.
.inesprg 1 ; 1x 16KB bank of PRG code

.ineschr 1 ; 1x 8KB bank of CHR data
.inesmap 0 ; mapper 0 = NROM, no bank swapping
.inesmir 1 ; background mirroring (ignore for now)
Banking
NESASM arranges everything in 8KB code and 8KB graphics banks. To fill the 16KB
PRG space 2 banks are needed. Like most things in computing, the numbering
starts at 0. For each bank you have to tell the assembler where in memory it
will start.
.bank 0
.org $C000
;some code here
.bank 1
.org $E000
; more code here
.bank 2
.org $0000
; graphics here
Adding Binary Files Additional data files are frequently used for graphics data
or level data. The incbin directive can be used to include that data in
your .NES file. This data will not be used yet, but is needed to make the .NES
file size match the iNES header.
.bank 2
.org $0000
.incbin "mario.chr" ;includes 8KB graphics file from SMB1
Vectors
There are three times when the NES processor will interrupt your code and jump to a new location.
These vectors, held in PRG ROM tell the processor where to go when that happens. Only the first
two will be used in this tutorial.
NMI Vector - this happens once per video frame, when enabled. The PPU tells the processor it is
starting the VBlank time and is available for graphics updates.
RESET Vector - this happens every time the NES starts up, or the reset button is pressed.
IRQ Vector - this is triggered from some mapper chips or audio interrupts and will not be covered.
These three must always appear in your assembly file the right order. The .dw directive is used to
define a Data Word (1 word = 2 bytes):
.bank 1
.org $FFFA ;first of the three vectors starts here
.dw NMI ;when an NMI happens (once per frame if enabled) the
;processor will jump to the label NMI:
.dw RESET ;when the processor first turns on or is reset, it will jump
;to the label RESET:
.dw 0 ;external interrupt IRQ is not used in this tutorial
Reset Code
The reset vector was set to the label RESET, so when the processor starts up it will start from
RESET: Using the .org directive that code is set to a space in game ROM. A couple modes are set
right at the beginning. We are not using IRQs, so they are turned off. The NES 6502 processor does
not have a decimal mode, so that is also turned off. This section does NOT include everything
needed to run code on the real NES, but will work with the FCEUXD SP emulator. More reset code
will be added later.
.bank 0
.org $C000
RESET:
SEI ; disable IRQs
CLD ; disable decimal mode
Completing The Program

Your first program will be very exciting, displaying an entire screen of one color! To do this the first
PPU settings need to be written. This is done to memory address $2001. The 76543210 is the bit
number, from 7 to 0. Those 8 bits form the byte you will write to $2001.
PPUMASK ($2001)
76543210
||||||||
|||||||+- Grayscale (0: normal color; 1: AND all palette entries
||||||| with 0x30, effectively producing a monochrome display;
||||||| note that colour emphasis STILL works when this is on!)
||||||+-- Disable background clipping in leftmost 8 pixels of screen
|||||+--- Disable sprite clipping in leftmost 8 pixels of screen
||||+---- Enable background rendering
|||+----- Enable sprite rendering
||+------ Intensify reds (and darken other colors)
|+------- Intensify greens (and darken other colors)
+-------- Intensify blues (and darken other colors)
So if you want to enable the sprites, you set bit 3 to 1. For this program bits 7, 6, 5 will be used to
set the screen color:
LDA %10000000 ;intensify blues
STA $2001
Forever:
JMP Forever ;infinite loop
Putting It All Together
Download and unzip the background.zip sample files. All the code above is in the background.asm
file. Make sure that file, mario.chr, and background.bat is in the same folder as NESASM3, then
double click on background.bat. That will run NESASM3 and should produce background.nes. Run
that NES file in FCEUXD SP to see your background color! Edit background.asm to change the
intensity bits 7-5 to make the background red or green.
You can start the Debug... from the Tools menu in FCEUXD SP to watch your code run. Hit the
Step Into button, choose Reset from the NES menu, then keep hitting Step Into to run one
instruction at a time. On the left is the memory address, next is the hex opcode that the 6502 is
actually running. This will be between one and three bytes. After that is the code you wrote, with
the comments taken out and labels translated to addresses. The top line is the instruction that is
going to run next. So far there isn't much code, but the debugger will be very helpful later.
Nerdy Nights week 4: Color Palettes, Sprites, second app
This Week: now that you can make and run a program, time to put something on screen!
Palettes
Before putting any graphics on screen, you first need to set the color palette. There are two separate
palettes, each 16 bytes. One palette is used for the background, and the other for sprites. The byte in
the palette corresponds to one of the 64 base colors the NES can display. $0D is a bad color and
should not be used. These colors are not exact and will look different on emulators and TVs.
The palettes start at PPU address $3F00 and $3F10. To set this address, PPU address port $2006 is
used. This port must be written twice, once for the high byte then for the low byte:
LDA $2002 ; read PPU status to reset the high/low latch to high
LDA #$3F
STA $2006 ; write the high byte of $3F10 address
LDA #$10
STA $2006 ; write the low byte of $3F10 address
That code tells the PPU to set its address to $3F10. Now the PPU data port at $2007 is ready to
accept data. The first write will go to the address you set ($3F10), then the PPU will automatically
increment the address ($3F11, $3F12, $3F13) after each read or write. You can keep writing data
and it will keep incrementing. This sets the first 4 colors in the palette:
LDA #$32 ;code for light blueish
STA $2007 ;write to PPU $3F10
LDA #$14 ;code for pinkish
LDA #$2A ;code for greenish
LDA #$16 ;code for redish
You would continue to do writes to fill out the rest of the palette. Fortunately there is a smaller way
to write all that code. First you can use the .db directive to store data bytes:
PaletteData:
.db $0F,$31,$32,$33,$0F,$35,$36,$37,$0F,$39,$3A,$3B,$0F,$3D,$3E,
$0F ;background palette data
.db $0F,$1C,$15,$14,$0F,$02,$38,$3C,$0F,$1C,$15,$14,$0F,$02,$38,$3C ;sprite
palette data
Then a loop is used to copy those bytes to the palette in the PPU. The X register is used as an index
into the palette, and used to count how many times the loop has repeated. You want to copy both
palettes at once which is 32 bytes, so the loop starts at 0 and counts up to 32.
LDX #$00 ; start out at 0
LoadPalettesLoop:
LDA PaletteData, x ; load data from address (PaletteData + the value in
x)
; 1st time through loop it will load PaletteData+0
; 2nd time through loop it will load PaletteData+1
; 3rd time through loop it will load PaletteData+2
; etc
STA $2007 ; write to PPU
INX ; X = X + 1
CPX #$20 ; Compare X to hex $20, decimal 32
BNE LoadPalettesLoop ; Branch to LoadPalettesLoop if compare was Not Equal
to zero
; if compare was equal to 32, keep going down
Once that code finishes, the full color palette is ready. One byte or the whole thing can be changed
while your program is running.
Sprites
Anything that moves separately from the background will be made of sprites. A sprite is just an 8x8
pixel tile that the PPU renders anywhere on the screen. Generally objects are made from multiple
sprites next to each other. Examples would be Mario and any of the enemies like Goombas and
Bowser. The PPU has enough internal memory for 64 sprites. This memory is separate from all
other video memory and cannot be expanded.
Sprite DMA
The fastest and easiest way to transfer your sprites to the sprite memory is using DMA (direct
memory access). This just means a block of RAM is copied from CPU memory to the PPU sprite
memory. The on board RAM space from $0200-02FF is usually used for this purpose. To start the
transfer, two bytes need to be written to the PPU ports:
LDA #$00
STA $2003 ; set the low byte (00) of the RAM address
LDA #$02
STA $4014 ; set the high byte (02) of the RAM address, start the transfer
Once the second write is done the DMA transfer will start automatically. All data for the 64 sprites
will be copied. Like all graphics updates, this needs to be done at the beginning of the VBlank
period, so it will go in the NMI section of your code.
Sprite Data
Each sprite needs 4 bytes of data for its position and tile information in this order:
1 - Y Position - vertical position of the sprite on screen. $00 is the top of the screen. Anything above
$EF is off the bottom of the screen.
2 - Tile Number - this is the tile number (0 to 256) for the graphic to be taken from a Pattern Table.
3 - Attributes - this byte holds color and displaying information:
76543210
||| ||
||| ++- Color Palette of sprite. Choose which set of 4 from the 16 colors
to use
|||
||+------ Priority (0: in front of background; 1: behind background)
|+------- Flip sprite horizontally
+-------- Flip sprite vertically
4 - X Position - horizontal position on the screen. $00 is the left side, anything above $F9 is off
screen.
Those 4 bytes repeat 64 times (one set per sprite) to fill the 256 bytes of sprite memory. If you want
to edit sprite 0, you change bytes $0200-0203. Sprite 1 is $0204-0207, sprite 2 is $0208-020B, etc
Turning NMI/Sprites On
The PPU port $2001 is used again to enable sprites. Setting bit 4 to 1 will make them appear. NMI
also needs to be turned on, so the Sprite DMA will run and the sprites will be copied every frame.
This is done with the PPU port $2000. The Pattern Table 0 is also selected to choose sprites from.
Background will come from Pattern Table 1 when that is added later.
PPUCTRL ($2000)
76543210
| ||||||
| ||||++- Base nametable address
| |||| (0 = $2000; 1 = $2400; 2 = $2800; 3 = $2C00)
| |||+--- VRAM address increment per CPU read/write of PPUDATA
| ||| (0: increment by 1, going across; 1: increment by 32, going down)
| ||+---- Sprite pattern table address for 8x8 sprites (0: $0000; 1: $1000)
| |+----- Background pattern table address (0: $0000; 1: $1000)
| +------ Sprite size (0: 8x8; 1: 8x16)
|
+-------- Generate an NMI at the start of the
vertical blanking interval vblank (0: off; 1: on)
And the new code to set up the sprite data:

LDA #$80
STA $0200 ;put sprite 0 in center ($80) of screen vertically
STA $0203 ;put sprite 0 in center ($80) of screen horizontally
LDA #$00
STA $0201 ;tile number = 0
STA $0202 ;color palette = 0, no flipping
LDA #%10000000 ; enable NMI, sprites from Pattern Table 0
STA $2000
LDA #%00010000 ; no intensify (black background), enable sprites
STA $2001

Download and unzip the sprites.zip sample files. All the code above is in the sprites.asm file. Make
sure sprites.asm, mario.chr, and sprites.bat are all in the same folder as NESASM3, then double
click sprites.bat. That will run NESASM3 and should produce the sprites.nes file. Run that NES file
in FCEUXD SP to see your sprite! Tile number 0 is the back of Mario's head and hat, can you see
it? Edit sprites.asm to change the sprite position (0 to 255), or to change the color palette for the
sprite (0 to 3). You can choose the PPU viewer in FCEUXD SP to see both Pattern Tables, and both
Palettes.
multiple sprites, reading controllers, more instructions
This Week: One sprite is boring, so now we add many more! Also move that sprite around using
the controller.
Multiple Sprites
Last time there was only 1 sprite loaded so we just used a few LDA/STA pairs to load the sprite
data. This time we will have 4 sprites on screen. Doing that many load/stores just takes too much
writing and code space. Instead a loop will be used to load the data, like was used to load the palette
before. First the data bytes are set up using the .db directive:
sprites:
;vert tile attr horiz
.db $80, $32, $00, $80 ;sprite 0
.db $80, $33, $00, $88 ;sprite 1
.db $88, $34, $00, $80 ;sprite 2
.db $88, $35, $00, $88 ;sprite 3
There are 4 bytes per sprite, each on one line. The bytes are in the correct order and easily changed.
This is only the starting data, when the program is running the copy in RAM can be changed to
move the sprite around.
Next you need the loop to copy the data into RAM. This loop also works the same way as the
palette loading, with the X register as the loop counter.
LoadSprites:
LDX#$00 ; start at 0
LoadSpritesLoop:
LDA sprites, x ; load data from address (sprites + x)
STA $0200, x ; store into RAM address ($0200 + x)
INX ; X = X + 1
CPX #$10 ; Compare X to hex $10, decimal 16
BNE LoadSpritesLoop ; Branch to LoadSpritesLoop if compare was Not Equal to zero
; if compare was equal to 16, continue down
If you wanted to add more sprites, you would add lines into the sprite .db section then increase the
CPX compare value. That will run the loop more times, copying more bytes.
Once the sprites have been loaded into RAM, you can modify the data there.
Controller Ports
The controllers are accessed through memory port addresses $4016 and $4017. First you have to
write the value $01 then the value $00 to port $4016. This tells the controllers to latch the current
button positions. Then you read from $4016 for first player or $4017 for second player. The buttons
are sent one at a time, in bit 0. If bit 0 is 0, the button is not pressed. If bit 0 is 1, the button is
pressed.
Button status for each controller is returned in the following order: A, B, Select, Start, Up, Down,
Left, Right.
LDA #$01
STA $4016
LDA #$00
STA $4016 ; tell both the controllers to latch buttons
LDA $4016 ; player 1 - A

LDA $4016 ; player 1 - B
LDA $4016 ; player 1 - Select
LDA $4016 ; player 1 - Start
LDA $4016 ; player 1 - Up
LDA $4016 ; player 1 - Down
LDA $4016 ; player 1 - Left
LDA $4016 ; player 1 - Right

LDA $4017 ; player 2 - Start
LDA $4017 ; player 2 - Up
LDA $4017 ; player 2 - Down
LDA $4017 ; player 2 - Left
LDA $4017 ; player 2 - Right
AND Instruction
Button information is only sent in bit 0, so we want to erase all the other bits. This can be done
with the AND instruction. Each of the 8 bits is ANDed with the bits from another value. If the bit
from both the first AND second value is 1, then the result is 1. Otherwise the result is 0.
0 AND 0 = 0
0 AND 1 = 0
1 AND 0 = 0
1 AND 1 = 1
For a full random 8 bit value:
01011011
AND 10101101
--------------
00001001
We only want bit 0, so that bit is set and the others are cleared:
01011011 controller data

AND 00000001 AND value
--------------
00000001 only bit 0 is used, everything else erased
So to erase all the other bits when reading controllers, the AND should come after each read from
$4016 or $4017:
AND #%00000001
AND #%00000001
AND #%00000001
BEQ instruction
The BNE instruction was used earlier in loops to Branch when Not Equal to a compared value. Here
BEQ will be used without the compare instruction to Branch when EQual to zero. When a button is
not pressed, the value will be zero, so the branch is taken. That skips over all the instructions that do
something when the button is pressed:
ReadA:
AND #%00000001 ; erase everything but bit 0
BEQ ReadADone ; branch to ReadADone if button is NOT pressed (0)
; add instructions here to do something when button IS pressed (1)
ReadADone: ; handling this button is done
CLC/ADC instructions
For this demo we will use the player 1 controller to move the Mario sprite around. To do that we
need to be able to add to values. The ADC instruction stands for Add with Carry.
Before adding, you have to make sure the carry is cleared, using CLC. This sample will load the
sprite position into A, clear the carry, add one to the value, then store back into the sprite position:
LDA $0203 ; load sprite X (horizontal) position
CLC ; make sure the carry flag is clear
ADC #$01 ; A = A + 1
STA $0203 ; save sprite X (horizontal) position
SEC/SBC instructions
To move the sprite the other direction, a subtract is needed. SBC is Subtract with Carry. This time
the carry has to be set before doing the subtract:
LDA $0203 ; load sprite position
SEC ; make sure carry flag is set
SBC #$01 ; A = A - 1
STA $0203 ; save sprite position

Download and unzip the controller.zip sample files. All the code above is in the controller.asm file.
Make sure that file, mario.chr, and controller.bat is in the same folder as NESASM, then double
click on controller.bat. That will run NESASM and should produce controller.nes. Run that NES file
in FCEUXD SP to see small Mario. Press the A and B buttons on the player 1 controller to move
one sprite of Mario. The movement will be one pixel per frame, or 60 pixels per second on NTSC
machines. If Mario isn't moving, make sure your controls are set up correctly in the Config menu
under Input... If you hold both buttons together, the value will be added then subtracted so no
movement will happen.
Try editing the ADC and SBC values to make him move faster. The screen is only 256 pixels across,
so too fast and he will just jump around randomly! Also try editing the code to move all 4 sprites
together.
Finally try changing the code to use the dpad instead of the A and B buttons. Left/right should
change the X position of the sprites, and up/down should change the Y position of the sprites.
Nerdy Nights week 6: Backgrounds
This Week: Now that you have a basic understanding of the NES tile graphics, we learn how to
display one static non scrolling background.
Backgrounds
There are three components used to generate backgrounds on the NES. First is the background
color palette, used to select the colors that will be used on screen. Next is the nametable that tells
the layout of the graphics. Finally is the attribute table that assigns the colors in the palette to areas
on screen.
Background Palette
Like the sprites there are 16 colors in the background palette. Our previous apps were already
loading a background palette but it was not being used yet. You can use the PPU Viewer in
FCEUXD SP to see the color palettes.
Nametables
Like the sprites, background images are made up from 8x8 pixel tiles. The screen video resolution
is 32x30 tiles, or 256x240 pixels. PAL systems will show this full resolution but NTSC crops the
top 8 and bottom 8 rows of pixels for a final resolution of 256x224. Additionally TV's on either
system can crop another few rows on the top or bottom.
One screen full of background tiles is called a nametable, and the NES has enough internal RAM
connected to the PPU for two nametables. Only one will be used here. The nametable has one byte
(0-255) for which 8x8 pixel graphics tile to draw on screen. The nametable we will use starts at
PPU address $2000 and takes up 960 bytes (32x30). You can use the Nametable viewer in
FCEUXD SP to see all the nametables.
Attribute Tables
The attribute tables may be the most difficult thing to understand, and sets many of the graphics
limitations. Each nametable has an attribute table that sets which colors in the palette will be used
in sections of the screen. The attribute table is stored in the same internal RAM as the nametable,
and we will use the one that starts at PPU address $23C0 ($2000+960).
First the screen is divided into a 32x32 pixel grid, or 4x4 tiles. Each byte in the attribute table sets
the color group (0-3) in the background palette that will be used in that area. That 4x4 tile area is
divided again into 4 2x2 tile grids. Two bits of the attribute table byte are assigned to each 2x2
area. That is the size of one block in SMB. This limitation means that only 4 colors (one color
group) can be used in any 16x16 pixel background section. A green SMB pipe section cannot use
the color red because it already uses 4 colors.
When looking at a sample SMB screen, first the 4x4 tile grid is added and the palette is shown at
the bottom:
You can see there are 8 grid squares horizontally, so there will be 8 attribute bytes horizontally.
Then each one of those grid squares is split up into 2x2 tile sections to generate the attribute byte:
No 16x16 area can use more than 4 colors, so the question mark and the block cannot use the greens
from the palette.
Uploading the data
To set the background graphics your data has to be defined in your ROM using the .db directive,
then copied to the PPU RAM. Some graphics tools will generate this data but here it will just be
done manually. To keep it shorter only a few rows of graphics will be created. The same CHR file
from SMB will be used here too. First the nametable data is defined, with each graphics row split
into two 16 byte sections to keep lines shorter:
nametable:
.db $24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24 ;;row 1
.db $24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24 ;;all sky ($24 = sky)
.db $24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24 ;;row 2
.db $24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24 ;;all sky
.db $24,$24,$24,$24,$45,$45,$24,$24,$45,$45,$45,$45,$45,$45,$24,$24 ;;row 3
.db $24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$53,$54,$24,$24 ;;some brick tops
.db $24,$24,$24,$24,$47,$47,$24,$24,$47,$47,$47,$47,$47,$47,$24,$24 ;;row 4
.db $24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$55,$56,$24,$24 ;;brick bottoms
Then the attribute table data is defined. Each byte covers 4x4 tiles, so only 8 bytes are needed here.
Binary is used so editing the 2 bits per 2x2 tile area is easier:
attribute:
.db %00000000, %00010000, %0010000, %00010000, %00000000, %00000000, %00000000, %00110000
And finally the same color palette as SMB is used:
palette:
.db $22,$29,$1A,$0F, $22,$36,$17,$0F, $22,$30,$21,$0F, $22,$27,$17,$0F
Just like our previous palette loading, a loop is used to copy a specific number of bytes from a
memory location to the PPU. First the PPU address is set to the beginning of the nametable at
$2000. Then our 128 bytes of background data are copied.
Next the PPU address is set to the beginning of the attribute table at $23C0 and 8 bytes are copied.
LoadBackground:
LDA $2002 ; read PPU status to reset the high/low latch
LDA #$20
STA $2006 ; write the high byte of $2000 address
LDA #$00
STA $2006 ; write the low byte of $2000 address
LDX#$00 ; start out at 0

LoadBackgroundLoop:
LDA background, x ; load data from address (background + the value in x)
INX ; X = X + 1
CPX #$80 ; Compare X to hex $80, decimal 128 - copying 128 bytes
BNE LoadBackgroundLoop ; Branch to LoadBackgroundLoop if compare was Not Equal to zero
; if compare was equal to 128, keep going down

LoadAttribute:
LDA #$23
STA $2006 ; write the high byte of $23C0 address
LDA #$C0
STA $2006 ; write the low byte of $23C0 address
LDX#$00 ; start out at 0

LoadAttributeLoop:
LDA attribute, x ; load data from address (attribute + the value in x)
INX ; X = X + 1
CPX #$08 ; Compare X to hex $08, decimal 8 - copying 8 bytes
BNE LoadAttributeLoop
The final changes are to tell the PPU to use the Pattern Table 0 graphics for sprites, and Pattern
Table 1 for background:
LDA #%10010000 ;enable NMI, sprites from Pattern 0, background from Pattern 1
STA $2000
Enable the background rendering:
LDA #%00011110 ; enable sprites, enable background
STA $2001
And to tell the PPU that we are not doing any scrolling at the end of NMI:
LDA #$00
STA $2005
STA $2005
Download and unzip the background2.zip sample files. All the code above is in the background.asm
file. Make sure that file, mario.chr, and background.bat is in the same folder as NESASM, then
double click on background.bat. That will run NESASM and should produce background.nes. Run
that NES file in FCEUXD SP to see the background. Set it to PAL Emulation so you get to see the
whole screen.
Any background areas that you did not write to will still be using tile 0, which happens to be the
number 0 in the SMB graphics. Try adding more nametable and attribute data to the .db sections,
then changing the loops so they copy more bytes to the PPU RAM. You can also try changing the
starting PPU address of the nametable and attribute table writes to move the rows further down.
subroutines, game layout, starting Pong
This Week: Most of this lesson is about how to organize and structure your game. Subroutines and
game states help arrange the code for easier reading and reuse of code.
Variables
As covered in week 1, variables are data stored in RAM that you can change any time. The sprite
data in RAM is all variables. You will need more variables for keeping track of things like the score
in the game. To do that you first need to tell NESASM where in RAM to put the variable. This is
done using the .rsset and .rs directives. First .rsset is used to set the starting address of the variable.
Then .rs is used to reserve space. Usually just 1 byte is reserved, but you can have as much as you
want. Each time you do a .rs the address gets incremented so you don't need to do .rsset again.
.rsset $0000 ;put variables starting at 0
score1 .rs 1 ;put score for player 1 at $0000
score2 .rs 1 ;put score for player 2 at $0001
buttons1 .rs 1 ;put controller data for player 1 at $0002
buttons2 .rs 1 ;put controller data for player 2 at $0003
Once you set the address for the variable, you do not need to know the address anymore. You can
just reference it using the variable name you created. You can insert more variables above the
current ones and the assembler will automatically recalculate the addresses.
Constants
Constants are numbers that you do not change. They are just used to make your code easier to read.
In Pong an example of a constant would be the position of the outer walls. You will need to
compare the ball position to the walls to make the ball bounce, but the walls do not change so they
are good constants. Doing a compare to LEFTWALL is easier to read and understand than a
comparison to $F6.
To declare constants you use the = sign:
RIGHTWALL = $02 ; when ball reaches one of these, do something
TOPWALL = $20
BOTTOMWALL = $D8
LEFTWALL = $F6
The assembler will then do a find/replace when building your code.

Subroutines
As your program gets larger, you will want subroutines for organization and to reuse code. Instead
of progressing linearly down your code, a subroutine is a block of code located somewhere else that
you jump to, then return from. The subroutine can be called at any time, and used as many times as
you want. Here is what some code looks like without subroutines:
RESET:
SEI ; disable IRQs
vblankwait1: ; First wait for vblank to make sure PPU is ready

BIT $2002
BPL vblankwait1
clrmem:
LDA #$FE
STA $0200, x
INX
BNE clrmem

vblankwait2: ; Second wait for vblank, PPU is ready after this
BIT $2002
BPL vblankwait2
Notice that the vblankwait is done twice, so it is a good choice to turn into a subroutine. First the
vblankwait code is moved outside the normal linear flow:
vblankwait: ; wait for vblank
BIT $2002
BPL vblankwait
RESET:
SEI ; disable IRQs
clrmem:
LDA #$FE
STA $0200, x
INX
BNE clrmem
Then that code needs to be called, so the JSR (Jump to SubRoutine) instruction is where the
vblankwait code used to be:
RESET:
SEI ; disable IRQs
JSR vblankwait ;;jump to vblank wait #1
clrmem:
LDA #$FE
STA $0200, x
INX
BNE clrmem
JSR vblankwait ;; jump to vblank wait again
And then when the subroutine has finished, it needs to return back to the spot it was called from.
This is done with the RTS (ReTurn from Subroutine) instruction. The RTS will jump back to the
next instruction after the JSR:
vblankwait: ; wait for vblank <--------

BIT $2002 \
BPL vblankwait |
----- RTS |
/ |
| RESET: |
| SEI ; disable IRQs |
| CLD ; disable decimal mode |
| |
| JSR vblankwait ;;jump to vblank wait #1 --/
|
\--> clrmem:
LDA #$FE
STA $0200, x
INX
BNE clrmem
JSR vblankwait ;; jump to vblank wait again, returns here
Better Controller Reading

Now that you can set up subroutines, you can do much better controller reading. Previously the
controller was read as it was processed. With multiple game states, that would mean many copies of
the same controller reading code. This is replaced with one controller reading subroutine that saves
the button data into a variable. That variable can then be checked in many places without having to
read the whole controller again.
ReadController:
LDA #$01
STA $4016
LDA #$00
STA $4016
LDX#$08
ReadControllerLoop:
LDA $4016
LSR A ; bit0 -> Carry
ROL buttons ; bit0 <- Carry
DEX
BNE ReadControllerLoop
RTS
This code uses two new instructions. The first is LSR (Logical Shift Right). This takes each bit in A
and shifts them over 1 position to the right. Bit 7 is filled with a 0, and bit 0 is shifted into the Carry
flag.
bit number 7 6 5 4 3 2 1 0 carry
original data 1 0 0 1 1 0 1 1 0
\ \ \ \ \ \ \ \
\ \ \ \ \ \ \ \
shifted data 0 1 0 0 1 1 0 1 1
Each bit position is a power of 2, so LSR is the same thing as divide by 2.
The next new instruction is ROL (ROtate Left) which is the opposite of LSR. Each bit is shifted to
the left by one position. The Carry flag is put into bit 0. This is the same as a multiply by 2.
These instructions are used together in a clever way for controller reading. When each button is
read, the button data is in bit 0. Doing the LSR puts the button data into Carry. Then the ROL shifts
the previous button data over and puts Carry back to bit 0. The following diagram shows the values
of Accumulator and buttons data at each step of reading the controller:
Accumulator buttons data
bit: 7 6 5 4 3 2 1 0 Carry 7 6 5 4 3 2 1 0 Carry
read button A 0 0 0 0 0 0 0 A 0 0 0 0 0 0 0 0 0 0
LSR A 0 0 0 0 0 0 0 0 A 0 0 0 0 0 0 0 0 A
ROL buttons 0 0 0 0 0 0 0 0 A 0 0 0 0 0 0 0 A 0

read button B 0 0 0 0 0 0 0 B 0 0 0 0 0 0 0 0 A 0
LSR A 0 0 0 0 0 0 0 0 B 0 0 0 0 0 0 0 A B
ROL buttons 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 A B 0

read button SEL 0 0 0 0 0 0 0 SEL 0 0 0 0 0 0 0 0 A 0
LSR A 0 0 0 0 0 0 0 0 SEL 0 0 0 0 0 0 0 A SEL
ROL buttons 0 0 0 0 0 0 0 0 0 0 0 0 0 0 A B SEL 0
read button STA 0 0 0 0 0 0 0 STA 0 0 0 0 0 0 0 0 A 0

LSR A 0 0 0 0 0 0 0 0 STA 0 0 0 0 0 0 0 A STA
ROL buttons 0 0 0 0 0 0 0 0 0 0 0 0 0 A B SEL STA 0
The loop continues for a total of 8 times to read all buttons.
When it is done there is one button in each bit:
bit: 7 6 5 4 3 2 1 0
button: A B select start up down left right
If the bit is 1, that button is pressed.
Game Layout
The Pong game engine will use the typical simple NES game layout. First all the initialization is
done. This includes clearing out RAM, setting up the PPU, and loading in the title screen graphics.
Then it enters an infinite loop, waiting for the NMI to happen. When the NMI hits the PPU is ready
to accept all graphics updates. There is a short time to do these so code like sprite DMA is done
first. When all graphics are done the actual game engine starts. The controllers are read, then game
processing is done. The sprite position is updated in RAM, but does not get updated until the next
NMI. Once the game engine has finished it goes back to the infinite loop.
Init Code -> Infinite Loop -> NMI -> Graphics Updates -> Read Buttons -> Game Engine --\
^ |
\------------------------------------------------------------------/
Game State
The use of a "game state" variable is a common way to arrange code. The game state just specifies
what code should be run in the game engine each frame. If the game is in the title screen state, then
none of the ball movement code needs to be run. A flow chart can be created that includes what
each state should do, and the next state that should be set when it is done. For Pong there are just 3
basic states.
->Title State /--> Playing State /--> Game Over State
/ wait for start button --/ move ball / wait for start button -\
| move paddles | \
| check for collisions / |
| check for score = 15 -/ |
\ /
\-----------------------------------------------------------------------------------/
The next step is to add much more detail to each state to figure out exactly what is needed. These
layouts are done before any significant coding starts. Some of the game engine like the second
player and the score will be added later. Without the score there is no way to get to the Game Over
State yet.
Title State:
if start button pressed
turn screen off
load game screen
set paddle/ball position
go to Playing State
turn screen on
Playing State:
move ball
if ball moving right
add ball speed x to ball position x
if ball x > right wall
bounce, ball now moving left
if ball moving left

subtract ball speed x from ball position x
if ball x < left wall
bounce, ball now moving right
if ball moving up
subtract ball speed y from ball position y
if ball y < top wall
bounce, ball now moving down
if ball moving down

add ball speed y to ball position y
if ball y > bottom wall
bounce, ball now moving up
if up button pressed
if paddle top > top wall
move paddle top and bottom up
if down button pressed

if paddle bottom < bottom wall
move paddle top and bottom down

if ball x < paddle1x
if ball y > paddle y top
if ball y < paddle y bottom
bounce, ball now moving left
Game Over State:
if start button pressed
turn screen off
load title screen
go to Title State
turn screen on


Download and unzip the pong1.zip sample files. The playing game state and ball movement code is
in the pong1.asm file. Make sure that file, mario.chr, and pong1.bat is in the same folder as
NESASM3, then double click on pong1.bat. That will run NESASM3 and should produce
pong1.nes. Run that NES file in FCEUXD SP to see the ball moving!
Other code segments have been set up but not yet completed. See how many of those you can
program yourself. The main parts missing are the paddle movements and paddle/ball collisions. You
can also add the intro state and the intro screen, and the playing screen using the background
information from the previous week.
16 bit math, pointers, nested loops
This Week: The NES is an 8 bit machine, but sometimes you need more! Learn to handle 16+ bit
numbers, and use them for bigger loops.
16 Bit Math
Doing 16 bit addition and subtraction is fairly simple because of the carry flag that we had
previously been clearing. First the normal add is done using the clc/adc pair. This add is for the
lower 8 bits of the 16 bit number. For the upper 8 bits the adc instruction is used again, but without
the clc. You want to keep the carry from the first add, in case it overflowed. To only add in the carry
the second adc value is just 0.
Here are some examples in decimal. One digit column is added at a time. The carry (1) is added to
the next column as needed.
0 3
+ 0 4
0 7 (no carry needed, top digit = 0)
0 4
+ 0 8
1 2 (carry only, top digit = 1)
2 2
+ 1 9
4 1 (carry plus 2 plus 1, top digit = 4)
And the code to do it on the NES, adding 1 to a 16 bit number:

LDA lowbyte ; load low 8 bits of 16 bit value
CLC ; clear carry
ADC #$01 ; add 1
STA lowbyte ; done with low bits, save back
LDA highbyte ; load upper 8 bits
ADC #$00 ; add 0 and carry from previous add
STA highbyte ; save back
The same process of adding 0 without clearing the carry can be continued to do 24 bit, 32 bit, or
higher numbers. It is also the same process to do 16 bit subtraction:
LDA lowbyte ; load low 8 bits of 16 bit value
SEC ; set carry
SBC #$01 ; subtract 1
STA lowbyte ; done with low bits, save back
LDA highbyte ; load upper 8 bits
SBC #$00 ; subtract 0 and carry from previous sub
STA highbyte ; save back
Pointers and Indirect Indexed Mode
Previously when loading background tiles the x register was used as an 8 bit offset. Now that we
can handle 16 bit numbers a different addressing mode can be used. The 16 bit address is saved into
two 8 bit variables, which are then used as a "pointer" which points to the background data we
want. The LDA instruction then uses the "Indirect Indexed" addressing mode. This takes the 16 bit
variable inside the brackets and uses it as an address. For the address to be correct, the low byte
must be first and the high byte must come immediately after. Then the value in the Y register is
added to the address. This forms the final address to load from. Both variables must also be in the
first 256 bytes of RAM, called "Zero Page", and the X register cannot be used with this addressing
mode.
.rsset $0000 ; put pointers in zero page

pointerLo .rs 1 ; pointer variables are declared in RAM
pointerHi .rs 1 ; low byte first, high byte immediately after
LDA #$D0
STA pointerHi
LDA #$12
STA pointerLo ; pointer now says $D012
LDY #$00 ; no offset from Y

LDA [pointerLo], y ; load data from the address pointed to by the 16 bit
pointer variable plus the value in the Y register
That last line is the same as

LDA $D012, y
Because we kept Y = 0, that is the same as

LDA $D012
Copy Loops
Now using your 16 bit math the pointer address can be incremented. Instead of being limited to 256
background tiles like when using the x offset, the whole background can be copied in one loop.
First the address of the background data is put into the pointer variable. The high and low bytes of
the address are each copied individually. Then the number of tiles to copy is put into the loop
counter, which will count down to 0. Each time through the loop one byte will be copied, the 16 bit
pointer address will be incremented, and the 16 bit loop counter will be decremented. The Y offset
is always kept at 0, because the pointer always points to the correct byte. When the loop counter
reaches 0 everything is done.
LDA #LOW(background)
STA pointerLo ; put the low byte of the address of background into
pointer
LDA #HIGH(background)
STA pointerHi ; put the high byte of the address into pointer
LDA #$00
sta counterLo ; put the loop counter into 16 bit variable
LDA #$04
sta counterHi ; count = $0400 = 1KB, the whole screen at once including
attributes
LDY #$00 ; put y to 0 and don't change it

LoadBackgroundLoop:
LDA [pointerLo], y
STA $2007 ; copy one background byte
LDA pointerLo
CLC
ADC #$01
STA pointerLo
LDA pointerHi
ADC #$00
STA pointerHi ; increment the pointer to the next byte
LDA counterLo
SEC
SBC #$01
STA counterLo
LDA counterHi
SBC #$00
STA counterHi ; decrement the loop counter
LDA counterLo
CMP #$00
BNE LoadBackgroundLoop
LDA counterHi
CMP #$00
BNE LoadBackgroundLoop ; if the loop counter isn't 0000, keep copying

That is a lot of code to copy just one byte!

Nested Loops
To avoid using so much code, we can use both the X and Y registers as loop counters. By putting
one loop inside another loop we create a "nested loop". First the inside loop counts all the way up.
Then the outside loop counts up once, and the inside loop counts all the way again. Normally using
only X or Y would only give a maximum of 256 times through a loop like we have previously done.
With nested loops using both X and Y the maximum is the inside counter multiplied by the outside
counter, or 256*256 = 65536.
LDX #$00
LDY #$00
OutsideLoop:
InsideLoop:
;
; this section runs 256 x 256 times
;
INY ; inside loop counter

CPY #$00
BNE InsideLoop ; run the inside loop 256 times before continuing down
INX
CPX #$00
BNE OutsideLoop ; run the outside loop 256 times before continuing down
First the Inside Loop runs and Y will count from 0 to 256. When that finishes X will count 0 to 1,
and branch back to the beginning of the loops. Then the Inside Loop runs again, Y 0 -> 256. X now
goes 1 -> 2 and the process continues. Everything ends when both X and Y have each counted to
256.
When we are using nested loops to copy entire backgrounds we want 256 x 4 = 1KB. The Y code
from above can be unchanged, but the X code is changed to CPX #$04.
Because we are changing the Y register our previous pointer copying code also needs to be
modified. Instead of incrementing the pointer every time, the incrementing Y register is doing the
same thing. The low byte of the pointer will be kept at 0. This means your background data needs to
be aligned to where the low byte of the address is $00. However the high byte of the pointer still
needs to change. By always making the inside loop count 256 times, that will end at the same time
that the high byte needs to change. This time 16 bit math isn't needed because only the high byte is
incremented.
No loop counter is used because X and Y are used instead. If you cannot align your data so the low
byte of the address is $00, you will have to use the CopyLoop above.
LDA #$00
STA pointerLo ; put the low byte of the address of background into
pointer
LDA #HIGH(background)
STA pointerHi ; put the high byte of the address into pointer
LDX #$00 ; start at pointer + 0

LDY #$00
OutsideLoop:
InsideLoop:
LDA [pointerLo], y ; copy one background byte from address in pointer plus Y
STA $2007 ; this runs 256 * 4 times
INY ; inside loop counter

CPY #$00
BNE InsideLoop ; run the inside loop 256 times before continuing down
INC pointerHi ; low byte went 0 to 256, so high byte needs to be changed
now
INX
CPX #$04
BNE OutsideLoop ; run the outside loop 256 times before continuing down

Download and unzip the background3.zip sample files. All the code is in the background.asm file.
Make sure that file, mario.chr, and background.bat is in the same folder as NESASM, then double
click on background.bat. That will run NESASM and should produce background3.nes. Run that
NES file in FCEUXD SP to see the full background.
The new nested loop is used to copy a whole background to the screen instead of only 128 bytes.
The background is aligned using the .org directive so the low address byte is $00. The attributes are
also placed directly after the background data so it is are copied at the same time.
Your task is to separate out the code that sets the pointer variables from the code that copies the
loop. That way you can have multiple backgrounds that use different pointer loading code, but the
same copy code.
If you are using a different assembler, the Indirect Indexed mode may use () instead of []. The
LOW() and HIGH() syntax may also be different.
Nerdy Nights week 9: Numbers, Bin to Dec
This Week: NES uses binary and hex, but your gamers want to read in decimal? Here are two
solutions for displaying scores and other numbers in a readable way.
BCD Mode
The 6502 processor has a mode called BCD, or Binary Coded Decimal, where the adc/sbc
instructions properly handle decimal numbers instead of binary numbers. The NES is not a full
6502 processor and does not include this mode. Be careful when you are searching for code to not
copy any that uses that mode, or you will get incorrect results. If the code is doing a SED
instruction, it is enabling the decimal mode and you should not use it. Instead you get to do all the
decimal handling yourself!
Storing Digits
The first method uses more code, but may be easier to understand. Say your score is a 5 digit
number. You will make 5 variables, one for each digit. Those variables will only count from 0 to 9
so you need to write code to handle addition and subtraction. Super Mario uses this method. It's
lowest digit is always 0, so that isn't actually stored in a variable. Instead it is just a permanent part
of the background.
We will start with just incrementing a 3 digit number to see how its done:
IncOnes:
LDA onesDigit ; load the lowest digit of the number
CLC
ADC #$01 ; add one
STA onesDigit
CMP #$0A ; check if it overflowed, now equals 10
BNE IncDone ; if there was no overflow, all done
IncTens:
LDA #$00
STA onesDigit ; wrap digit to 0
LDA tensDigit ; load the next digit
CLC
ADC #$01 ; add one, the carry from previous digit
STA tensDigit
CMP #$0A ; check if it overflowed, now equals 10
BNE IncDone ; if there was no overflow, all done
IncHundreds:
LDA #$00
STA tensDigit ; wrap digit to 0
LDA hundredsDigit ; load the next digit
CLC
ADC #$01 ; add one, the carry from previous digit
STA hundredsDigit
IncDone:
When the subroutine starts, the ones digit is incremented. Then it is checked if it equals $0A which
is decimal 10. That number doesn't fit in just one digit, so the ones digit is set to 0 and the tens digit
is incremented. The tens digit is then checked in the same way, and the chain continues for as many
digits as you want.
The same process is used for decrementing, except you check for underflow (digit=$FF) and wrap
the digit to $09.
Adding two numbers is the same idea, except other than checking if each digit equals $0A you need
to check if the digit is $0A or above. So instead of BEQ the opcode will be BCC.
AddOnes:
LDA onesDigit ; load the lowest digit of the number

CLC
ADC onesAdd ; add new number, no carry
STA onesDigit
CMP #$0A ; check if digit went above 9. If accumulator >= $0A, carry
is set
BCC AddTens ; if carry is clear, all done with ones digit
; carry was set, so we need to handle wrapping
LDA onesDigit
SEC
SBC #$0A ; subtract off what doesnt fit in 1 digit
STA onesDigit ; then store the rest
INC tensDigit ; increment the tens digit
AddTens:
LDA tensDigit ; load the next digit
CLC
ADC tensAdd ; add new number
STA tensDigit
CMP #$0A ; check if digit went above 9
BCC AddHundreds ; no carry, digit done
LDA tensDigit
SEC
SBC #$0A ; subtract off what doesnt fit in 1 digit
STA tensDigit ; then store the rest
INC hundredsDigit ; increment the hundreds digit
AddHundreds:
LDA hundredsDigit ; load the next digit
CLC
ADC hundredsAdd ; add new number
STA hundredsDigit
AddDone:
When that code is all done, the ones/tens/hundreds digits will hold the new value. With both code
samples there is no check at the end of the hundreds digit. That means when the full number is 999
and you add one more, the result will be wrong! In your code you can either wrap around all the
digits to 0, or set all the digits to 999 again for a maximum value. Of course if your players are
hitting the max they likely want more digits!
Binary to Decimal Conversion
The second method of handling number displays uses less code, but could use much more CPU
time. The idea is to keep you numbers in plain binary form (8 or 16 bit variables) for the math, then
convert them to decimal for displaying only. An 8 bit binary value will give you 3 decimal digits,
and a 16 bit binary will give 5 decimal digits.
This first example is coded to be understandable, not fast or small. Each step compares the binary
value to a significant decimal value (100 and then 10). If the binary is larger, that value is subtracted
from the binary and the final decimal digit is incremented. So for a text example:
initial binary: 124
initial decimal: 000
1: compare to 100
2: 124 greater than 100, so subtract 100 and increment the decimal hundreds
digit
3: repeat hundreds again
current binary: 024

current decimal: 100
1: compare to 100
2: 024 less than 100, so all done with hundreds digit
current binary: 024

1: compare to 10
2: 024 greater than 10, so subtract 10 and increment the decimal tens digit
3 repeat tens again
current binary: 014

1: compare to 10
2: 014 greater than 10, so subtract 10 and increment the decimal tens digit
3 repeat tens again
current binary: 004

etc for ones digit
You can see this will transfer the binary to decimal one digit at a time.
For numbers with large digits (like 249) this will take longer than numbers with small digits (like
112). Here is the code:
HundredsLoop:
LDA binary
CMP #100 ; compare binary to 100
BCC TensLoop ; if binary < 100, all done with hundreds digit
LDA binary
SEC
SBC #100
STA binary ; subtract 100, store whats left
INC hundredsDigit ; increment the digital result
JMP HundredsLoop ; run the hundreds loop again
TensLoop:
LDA binary
CMP #10 ; compare binary to 10
BCC OnesLoop ; if binary < 10, all done with hundreds digit
LDA binary
SEC
SBC #10
STA binary ; subtract 10, store whats left
INC tensDigit ; increment the digital result
JMP TensLoop ; run the tens loop again
OnesLoop:
LDA binary
STA onesDigit ; result is already under 10, can copy directly to result
This code can be expanded to 16 bit numbers, but the compares become harder. Instead a more
complex series of loops and shifts with a table is used. This code does shifting of the binary value
into the carry bit to tell when to add numbers to the final decimal result. I did not write this code, it
came from a post by Tokumaru at http://nesdev.parodius.com/bbs/vi... There are many more
examples of different conversion styles at that forum thread.
Notice there are no branches other than the loop running 16 times (one for each binary input bit), so
the conversion always takes the same number of cycles.
tempBinary - 16 bits input binary value

decimalResult - 5 bytes for the decimal result
BinaryToDecimal:
lda #$00
sta decimalResult+0
sta decimalResult+1
sta decimalResult+2
sta decimalResult+3
sta decimalResult+4
ldx #$10
BitLoop:
asl tempBinary+0
rol tempBinary+1
ldy decimalResult+0
lda BinTable, y
rol a
sta decimalResult+0
ldy decimalResult+1
lda BinTable, y
rol a
sta decimalResult+1
ldy decimalResult+2
lda BinTable, y
rol a
sta decimalResult+2
ldy decimalResult+3
lda BinTable, y
rol a
sta decimalResult+3
rol decimalResult+4
dex
bne BitLoop
rts
BinTable:
.db $00, $01, $02, $03, $04, $80, $81, $82, $83, $84
Displaying Numbers
Once you have your numbers in decimal format you need to display them on the screen. With the
code above all the results have 00000 = $00 $00 $00 $00 $00. If your background tiles for digits
start at tile 0 then that will work fine. However if you are using ASCII you will need to add an
offset to each digit. The ASCII code for the digit 0 is $30, so you just add $30 to each digit before
writing it to the background. If your code uses the first method of compare/wrapping digits, then
you could compare to $3A and wrap to $30 to automatically handle this. You would just need to
make sure you set each digit to $30 instead of $00 when clearing the number to 00000. You have
control over where background tiles are located, so the offset for the digit tiles can be whatever you
choose.

Download and unzip the pong2.zip sample files. The playing game state and ball movement code is
in the pong2.asm file. Make sure that file, mario.chr, and pong2.bat is in the same folder as
NESASM3, then double click on pong1.bat. That will run NESASM3 and should produce
pong2.nes. Run that NES file in FCEUXD SP to see the score! Right now the score just increments
every time the ball bounces off a side wall.
Try making two scoring variables and drawing them both. You can also use the other binary to
decimal converters to add more than 1 to the score each time. In the DrawScore you can also check
the score digits and not draw any leading zeros. Instead replace them with spaces when you are
drawing to the background.
Advanced Nerdy Nights #1: CHR Bank switching
To do the advanced lessons you should have already finished Pong.
This Week: As you complete a full game you may find the NROM memory limits to be too small.
To enable more ROM on carts many forms of "bank switching" were used. This article deals with
just one type of CHR switching, used on CNROM carts. CNROM is easy to use and very cheap to
manufacture. The ReproPak, PowerPak, and PowerPak Lite all support CNROM completely so it is
easy to get your code running on real hardware. If you are using donor carts you can look up games
that use CNROM at BootGod's NES Cart Database.
CHR Bank Switching

Bank switching is exchanging one chunk of ROM for a different chunk, while keeping everything
in same address range. It is not making a copy, so it happens instantly. You can switch between
different banks whenever you want. The size and memory range of the banks depends on the
mapper. For the CNROM mapper used in this article the bank size is 8KB of CHR ROM. The
whole 8KB range of PPU memory $0000-1FFF is switched at once. This means the graphics for all
background tiles and sprite tiles will be swapped. In your game you may have some tiles duplicated
in multiple banks so they do not appear to change on screen. PRG is not bank switched, so it
remains at the NROM limit of 32KB.
Set Mapper Number

The first part of adding bank switching is changing the mapper number your .NES file uses. At the
top of your code has previously been:
.inesmap 0 ; mapper 0 = NROM, no bank swapping
The new line is:

.inesmap 3 ; mapper 3 = CNROM, 8KB CHR ROM bank swapping
This line in the header just tells the emulator to use CNROM to play your game. A list of other
iNES mapper numbers can be seen at the wiki at http://nesdevwiki.org/....
Set CHR Size

The next part is to increase the size of your CHR ROM. Change the .ineschr value from 1 to 2,
showing that there are now two 8KB banks. CNROM can handle 32KB of CHR ROM or four 8KB
banks but this example will only use two.
Add CHR Data

The third part adds the data for the next bank into your game. Just make a new .bank statement
below your current one for CHR, giving it the next sequential number. In your code when you set
which bank to switch to this is the number used. PRG bank numbers are ignored so your original
CHR bank will be #0 and the new one will be #1.
Bank Switching Code

The final part it to write your bank switching code. This subroutine will take a bank number in the A
register and switch the CHR bank to it immediately. The actual switch is done by writing the
desired bank number anywhere in the $8000-FFFF memory range. The cart hardware sees this write
and changes the CHR bank.
... your game code ...

LDA #$01 ;;put new bank to use into the A register
JSRBankswitch ;;jump to bank switching code

... your game code ...
Bankswitch:
STA $8000 ;;new bank to use
RTS
Bus Conflicts
When you start running your code on real hardware there is one catch to worry about. For basic
mappers, the PRG ROM does not care if it receives a read or a write command. It will respond to
both like a read by putting the data on the data bus. This is a problem for bank switching, where the
CPU is also trying to put data on the data bus at the same time. They electrically fit in a "bus
conflict". The CPU could win, giving you the right value. Or the ROM could win, giving you the
wrong value. This is solved by having the ROM and CPU put the same value on the data bus, so
there is no conflict. First a table of bank numbers is made, and the value from that table is written to
do the bank switch.
... code ...
LDA #$01 ;;put new bank to use into A
JSRBankswitch ;;jump to bank switching code

... code ...
Bankswitch:
TAX ;;copy A into X
STA Bankvalues, X ;;new bank to use
RTS
Bankvalues:
.db $00, $01, $02, $03 ;;bank numbers
The X register is used as an index into the Bankvalues table, so the value written by the CPU will
match the value coming from the ROM.
Download and unzip the chrbanks.zip sample files. This set is based on the previous Week 5 code.
Make sure that file, mario0.chr, mario1.chr, and chrbanks.bat is in the same folder as NESASM3,
then double click on chrbanks.bat. That will run NESASM3 and should produce chrbanks.nes. Run
that NES file in FCEUXD SP to see small Mario.
Inside the LatchController subroutine a new section is added to read the Select and Start buttons
from the controller. The Select button switches to CHR bank 0, and the Start button switches to
CHR bank 1. Graphics of CHR bank 1 have been rearranged so Mario will change to a beetle. The
tile numbers are not changed, but the graphics for those tiles are.
Open the PPU Viewer from the Tools menu in FCEUXD SP and try hitting the buttons. You can see
all the graphics changing at once when the active bank switches.
Advanced Nerdy Nights #2:
MMC1 CHR and PRG Bank switching, WRAM + Battery
This Week: The MMC1 is the first advanced mapper made by Nintendo. It is used for many games
including top titles like The Legend of Zelda. The main benefits are mirroring control, up to 256KB
of PRG ROM, 128KB of CHR RAM or ROM, and 8KB of WRAM. The WRAM can be battery
backed for saved games. This tutorial will cover all features of the MMC1 and how to use them.
You should be comfortable with the normal Nerdy Nights series before starting. Another more
simple lesson for bankswitching is Advanced Nerdy Nights #1. If you only need one or two of the
banking features then you may want to consider more simple and cheaper mappers instead such as
UNROM or CNROM.
Carts using the MMC1 will have the S*ROM board code, like SNROM and SGROM. BootGod's
NesCartDB database can be searched for which games use which boards. The ReproPak MMC1
board can also be used to build carts.
Shift Registers
The MMC1 uses a 5 bit shift register to temporarily store the banking bits. Shift registers were
covered in Week 7. When writing to the register data comes in from data bit 0 only. This is similar
to the controller reading where data outputs to data bit 0. Every time a write happens the current bits
are shifted and D0 is inserted. The first bit you write eventually becomes to lowest bank bit. On the
5th write when the shift register is full the 5 bit value gets copied to the banking register. At this
point the bank switch happens immediately without any delays. To load a bank register the LSR
instruction is used for shifting:
LDA banknumber
STA bankreg ; load bank bit 0 to shift register from data bit 0
LSR A ; shift in next data bit to position 0
STA bankreg ; load bank bit 1 from data bit 0
LSR A
STA bankreg ; bank bit 2
LSR A
STA bankreg ; bank bit 3
LSR A
STA bankreg ; bank bit 4, bank register loaded, bank switch happens here
Unlike other simple mappers like UNROM and CNROM, there are no bus conflicts. The ROM is
not enabled while you are writing so you do not have to make the data you are writing match.
Data bit 7 is also connected to the MMC1. When a write happens to any banking register with D7=1
the shift register is reset back to position 0. It will then take another 5 writes to fully load the next
value. All other bits are ignored and D0 is not loaded into the shift register. The PRG bits of the
control register are also reset to their default values as shown in the next section.
Usually you will only reset the MMC1 at the very beginning of your program:
LDA #%10000000
STA $8000
Config Register at $8000-9FFF

To load the config register, do 5 writes to the $8000-9FFF range. The config bits are:
43210
-----
CPRMM
|||||
|||++- Mirroring (0: one-screen, lower bank; 1: one-screen, upper bank;
||| 2: vertical; 3: horizontal)
||+--- PRG swap range (0: switch 16 KB bank at $C000; 1: switch 16 KB bank at
$8000;
|| only used when PRG bank mode bit below is set to
1)
|+---- PRG size (0: switch 32 KB at $8000, ignoring low bit of bank number;
| 1: switch 16 KB at address specified by location bit
above)
+----- CHR size (0: switch 8 KB at a time; 1: switch two separate 4 KB banks)
Mirroring Config
Your program can change the mirroring at any point using these bits. You do not need to wait for
vblank to change them. When using the MMC1 the .inesmir directive bit is ignored. You must set it
through your code. Mirroring set to 0 and 1 are single screen mirroring modes. Only 1KB is used
for all nametables. When scrolling the screen will wrap both vertically and horizontally. Mirroring
set to 2 is the typical vertical mirroring, and 3 is horizontal mirroring.
PRG Bank Size Config

The MMC1 swaps PRG ROM in either 16KB or 32KB chunks. By default this bit is set to 1 for
16KB banks. Clearing it to 0 enables 32KB banks. Notice these are not the same size as the 8KB
NESASM banks so the bank numbers will be different. When using 16KB banks the MMC1 banks
are twice as big, so you must divide your NESASM bank number by 2 when writing it to the bank
register. When using 32KB banks you must divide by 4. 16KB banks is most commonly used, with
the bulk of the code in the fixed bank and data/graphics/music in the swappable banks.
PRG Swap Range Config

When using 16KB banks set above, the PRG address range that gets swapped can be configured. If
32KB banks are used this bit is ignored and the entire $8000-FFFF range is swapped at once.
By default this bit is set to 1, making the $8000-BFFF range swappable while the $C000-FFFF
range is fixed to the last bank Of PRG. This matches the PRG swapping of the UNROM mapper
and is most commonly used. Clearing this bit to 0 changes this so $8000-BFFF is fixed and $C000-
FFFF is swappable.
Changing the range or bank size can be useful for swapping audio samples but you have to be
careful to put IRQ/reset/NMI vectors in all banks that are loaded into the vector area at $FFFA-
FFFF.
CHR Bank Size Config
Like the PRG the CHR bank size can be configured to either 4KB or 8KB banks. With 8KB banks
the whole $0000-1FFF range is one bank. With 4KB banks there are two banks at PPU $0000-0FFF
and $1000-1FFF. This can be used with background in one bank and sprites in another. Then, for
example, all sprites could be swapped and the background could stay.
CHR Bank 0 Register at $A000-BFFF

This is the register for CHR bank 0. To set it do 5 writes to the $A000-BFFF range. When in 4KB
CHR mode it selects a bank for PPU $0000-0FFF. The full 5 bit value is used so there are 32
possible banks. Each bank is 4KB making it 128KB CHR maximum. When in 8KB CHR mode this
register controls the full PPU $0000-1FFF. The bottom bit is ignored so there are 16 possible banks.
Each bank is now 8KB which is still 128KB max.
4KB mode 8KB mode

controls $0000-1FFF
controls $0000-0FFF
bottom bit is ignored
CHR Bank 1 Register at $C000-DFFF

This is the register for CHR bank 1. To set it do 5 writes to the $C000-DFFF range. When in 4KB
CHR mode it selects a bank for PPU $1000-1FFF. When in 8KB CHR mode it is completely
ignored.
4KB mode 8KB mode

controls $1000-1FFF register ignored
PRG Bank Register at $E000-FFFF

This is the register for PRG banking. To set it do 5 writes to the $E000-FFFF range. The bits are:
43210
-----
WPPPP
|||||
|++++- Select a PRG ROM bank (low bit ignored in 32 KB mode)
+----- WRAM chip enable (0: enabled; 1: disabled)
In 16KB PRG mode it selects a 16KB PRG bank for the current swappable address range. Only the
4 lower bits are used for 16 possible PRG banks. That is 256KB maximum. In 32KB PRG mode it
selects a 32KB bank for the $8000-FFFF range. Only bits 3-1 are used for 8 possible banks. Bit 0 is
ignored.
16KB mode 16KB mode

32KB mode
swap=0 swap=1
controls $8000-BFFF (default controls $8000-FFFF
controls $C000-FFFF
setting) low bit ignored
WRAM
Bit 5 of the PRG Bank register also controls WRAM access. Clear this bit to enable WRAM access.
Setting the bit to 1 disables the WRAM. If the WRAM is used for saved games it is usually disabled
when it is not being accessed to prevent unwanted writes from corrupting the saves when the
console is reset.
To use WRAM in your program nothing needs to be changed in the iNES header. The emulator will
assume there is WRAM based on the mapper number. Next you need to enable the WRAM in the
PRG Bank register. With that bit at 0 you can now use the WRAM. It is just plain RAM that you
can read or write at the $6000-7FFF range. Like the console RAM the contents are unknown when
the console is powered on. All RAM is cleared to unknown values when power is removed.
However the RAM is not cleared when just the reset button is pushed. Power is still going to the
cart so the RAM is still valid. This can be useful for telling if the console was just turned on or was
only reset.
To add a battery to the WRAM, set the .inesmir directive to 2 in the iNES header. Now read and
write to the WRAM normally. When power is removed the RAM contents will remain. The
emulator will create an 8KB .sav file for the WRAM, however some emulators will not do this
unless you have done some WRAM access.
Banking Routines
Keeping track of all the register addresses and bits can get confusing, so a few simple routines are
used instead. In general subroutines should be used for even simple bank switching so the mapper
can be changed more easily later. Only the most commonly used Config and PRG Bank register
routines are shown here, it is your job to write the others.
ConfigWrite: ; make sure this is in a fixed PRG bank so the RTS doesn't
get swapped away
LDA #$80
STA $8000 ; reset the shift register
LDA #%00001110 ; 8KB CHR, 16KB PRG, $8000-BFFF swappable, vertical
mirroring
STA $8000 ; first data bit
LSR A ; shift to next bit
STA $8000 ; second data bit
LSR A ; etc
STA $8000
LSR A
STA $8000
LSR A
STA $8000 ; config bits written here, takes effect immediately
RTS
PRGBankWrite: ; make sure this is in a fixed bank so it doesnt get swapped
away
LDA bankNumber ; get bank number into A
STA $E000 ; first data bit
LSR A ; shift to next bit
STA $E000
LSR A
STA $E000
LSR A
STA $E000
LSR A
STA $E000 ; bank switch happens immediately here
RTS
Download and unzip the cyoammc1.zip sample files. The CYOA code has been changed to the
MMC1 mapper. Running the cyoammc1.bat file will create cyoammc1.nes, which will run in an
emulator. This sample uses 8KB of CHR RAM so no CHR banking has been included. The changes
from UNROM PRG mapping are minimal. Some of the variables have been moved to the WRAM
area to show how to use it. An example of using WRAM to detect reset is also included in
resetCount. Try changing the battery info in the iNES header to see how it makes resetCount change
between power and resets.
Advanced Nerdy Nights #3: Horizontal background scrolling
This Week: Time to learn how to do horizontal background scrolling, like Super Mario Bros.
Hopefully it is explained with the most easy to understand code. There is no compression, no
buffers, and no metatiles, so only the ideas of scrolling are presented. Once you understand the
scrolling part you should look into those other topics to save code/data space and increase
performance if needed.
Nametable Review
Before starting the scrolling you must fully understand how nametables work. One nametable is
32x30 background tiles, which covers exactly one visible screen. Including the attribute table, each
screen needs 1KB of PPU RAM. The NES PPU has the address space for 4 nametables ($2000,
$2400, $2800, $2C00) in a 2x2 grid:
+-----------+-----------+
| | |
| | |
| $2000 | $2400 |
| | |
| | |
+-----------+-----------+
| | |
| | |
| $2800 | $2C00 |
| | |
| | |
+-----------+-----------+
However the NES only has 2KB of PPU RAM inside the console, so there are only two actual
nametables. The other two nametables are copies of those actual ones. Your mirroring settings
determine the layout of the actual nametables and which ones are copies.
Vertical mirroring means the nametables stacked vertically are the same data. 0 ($2000) is a mirror
of 2 ($2800), and 1 ($2400) is a mirror of 3 ($2C00). 0 and 1 are next to each other and have
different data. This is what we want for horizontal scrolling. When you are looking at nametable 0
and scroll to the right, nametable 1 will be in view. Typically your mirroring setting is the opposite
of the scrolling direction. To set the iNES header:
.inesmir 1 ;;VERT mirroring for HORIZ scrolling
Scroll registers
Before scrolling we will fill both nametables 0 ($2000) and 1 ($2400). The same data will be copied
into both, except the attribute table will be different. By setting the second nametable attributes to
another color palette the two screens will have a very visible difference.
FillNametables:
LDA #$20
STA $2006 ; write the high byte of $2000 address (nametable 0)
LDA #$00
STA $2006 ; write the low byte of $2000 address
LDY #$08
LDX #$00 ; fill 256 x 8 bytes = 2KB, both nametables all full
LDA #$7F
FillNametablesLoop:
STA $2007
DEX
BNE FillNametablesLoop
DEY
BNE FillNametablesLoop
FillAttrib0:
LDA #$23
STA $2006 ; write the high byte of $23C0 address (nametable 0
attributes)
LDA #$C0
LDX #$40 ; fill 64 bytes
LDA #$00 ; palette group 0
FillAttrib0Loop:
STA $2007
DEX
BNE FillAttrib0Loop
FillAttrib1:
LDA #$27
STA $2006 ; write the high byte of $27C0 address (nametable 1
attributes)
LDA #$C0
LDX #$40 ; fill 64 bytes
LDA #$FF ; palette group 3
FillAttrib1Loop:
STA $2007
DEX
BNE FillAttrib1Loop
The scroll registers are at $2005. Like some other PPU registers you need to write to it twice. The
first write is the horizontal scroll count, the second write is the vertical scroll count. The scroll sets
which pixel of the nametable for the start of the left side of the screen. Previously we have set the
scroll to 0 so the left side of the screen is aligned with the left edge of the nametable. The scroll
registers are both 8 bit registers, making the scroll range 0 to 255. The screen is 256 pixels wide so
the horizontal scroll register covers one full screen wide.
This sample code just increments the horizontal scroll register ($2005) by 1 on every frame. You
can see when the first nametable scrolls off the screen, the second one comes on screen. The
previously set colors make the split between nametables obvious. As the scroll register wraps from
255 to 0 the first nametable becomes completely visible again. You can also see the sprites are not
affected by the scroll registers. They have their own separate x and y position data.
NMI:
LDA #$00
STA $2003
LDA #$02
STA $4014 ; sprite DMA from $0200
; run other game graphics updating code here
LDA #$00
STA $2006 ; clean up PPU address registers
STA $2006
INC scroll ; add one to our scroll variable each frame
LDA scroll
STA $2005 ; write the horizontal scroll count register
LDA #$00 ; no vertical scrolling
STA $2005
;;This is the PPU clean up section, so rendering the next frame starts
properly.
LDA #%10010000 ; enable NMI, sprites from Pattern Table 0, background from
Pattern Table 1
STA $2000
LDA #%00011110 ; enable sprites, enable background, no clipping on left side
STA $2001
; run normal game engine code here
; reading from controllers, etc
RTI ; return from interrupt
The full code and compiled .NES file is available from the download link at the bottom of this
tutorial. scrolling1.asm includes everything up to this point.
Nametable Register
The problem with just the scroll register is that it isn't big enough. In the previous example the
scroll wrapped from 255 to 0, so the second nametable is never shown on the left side. Both
nametables together is 512 pixels wide but the scroll can only count 256 pixels. The solution is to
switch which nametable is on the left side of the screen at the same time the scroll register wraps to
0.
Vertical mirroring means nametables are arranged horizontally

Scrolling shows nametable 0 and 1 (blue) on the screen (red)
When the scroll register wraps, nametable 0 is displayed again
Swap which nametable is on the left when the wrap happens to display nametable 1
To set the starting nametable, change bit 0 of the PPU control register at $2000. Clearing it to 0 will
put nametables 0 and 2 on the left side of the screen with 1 and 3 to the right. Setting it to 1 will put
1 and 3 on the left, and 0 and 2 on the right.
This sample code has the same scroll incrementing, but swaps the nametables at the same time the
scroll wraps from 255 to 0. Instead of the background jumping it continuously scrolls from one
nametable to the next. When the scroll wraps again the nametables are swapped again and the
scrolling keeps going.
NMI:
INC scroll ; add one to our scroll variable each frame
NTSwapCheck:
LDA scroll ; check if the scroll just wrapped from 255 to 0
BNE NTSwapCheckDone
NTSwap:
LDA nametable ; load current nametable number (0 or 1)
EOR #$01 ; exclusive OR of bit 0 will flip that bit
STA nametable ; so if nametable was 0, now 1
; if nametable was 1, now 0
NTSwapCheckDone:
LDA #$00
STA $2003
LDA #$02
STA $4014 ; sprite DMA from $0200
; run other game graphics updating code here

LDA #$00
STA $2006
LDA scroll
STA $2005
;;This is the PPU clean up section, so rendering the next frame starts
properly.
Pattern Table 1
ORA nametable ; select correct nametable for bit 0
STA $2000

STA $2001
; run normal game engine code here

; reading from controllers, etc
tutorial. scrolling2.asm includes everything up to this point.
Drawing New Columns
For just two screens of graphics the code above is fine. Games like Super Dodgeball use this
method. Both nametables are filled and scrolled between. For games like SMB where the levels are
wider than two screens some new background data will have to be inserted. The solution is to draw
a new vertical column of tiles somewhere off the visible screen, before it is scrolled into the visible
area. As long as the new column is drawn ahead of the visible area, calculated by the current scroll
and nametable, it will appear continuous. The tricky part is figuring out which column to draw, and
where it is to be placed. If we always use the opposite nametable and the same scroll point we will
be drawing the column that is about to come on screen.
>When to Draw
We will draw a new column anytime the scroll register becomes a multiple of 8, meaning the scroll
is aligned to the tiles. Some bit masking and testing can calculate when this happens. First any part
of the scroll not 0 to 7 is thrown away. Then if the result equals 0 the scroll count is a multiple of 8.
LDA scroll
AND #%00000111 ; throw away higher bits
BEQ DrawNewColumn ; see if lower bits = 0
Where to Draw
Now that we know when to draw, we need to calculate the starting PPU address of the new column.
The scroll register counts in pixels, but we want to count in tiles for which column to draw. Each
tile is 8 pixels wide, so we divide the scroll by 8 to get the tile number. That number is the low bits
of the address.
LDA scroll
LSR A
LSR A
LSR A ; shift right 3 times = divide by 8
STA columnLow ; $00 to $1F, screen is 32 tiles wide
The high bits of the address will come from the current nametable. First the low bit is inverted, to
get the off screen nametable number. Then the number is shifted up and added to the base address.
LDA nametable
EOR #$01 ; invert low bit, A = $00 or $01
ASL A ; shift up, A = $00 or $02
ASL A ; $00 or $04
CLC
ADC #$20 ; add high byte of nametable base address ($2000)
STA columnHigh ; now address = $20 or $24 for nametable 0 or 1
Now the scroll count and nametable have been used to make the full column address to start
copying new background data. It will be at the top of the nametable that is off screen. As the scroll
and nametable are changed, that calculation will still give the correct starting address.
How to Draw
Previously when we have been copying data to the background the PPU is set to auto increment the
address by 1. That helps with the copying because a whole row of data can be copied while only
writing the PPU address once. Incrementing by 1 goes to the next horizontal tile. In this case we
want to go to the next vertical tile because we are copying a column instead of a row. We want it to
increment by 32 which will jump down instead of across. There are 32 tiles per row, so adding 32
will always go down to the next row in the same column. The PPU has an increment 32 mode, set
using bit 2 in the PPU control register at $2000. When bit 2 is set to 0 the increment mode is +1.
When bit 2 is set to 1 the increment mode is +32. By setting the increment mode to +32 and
copying 30 bytes of background tiles we can draw one column at a time.
DrawColumn:
LDA #%00000100 ; set to increment +32 mode, don't care about other bits
STA $2000

LDA columnHigh
STA $2006 ; write the high byte of column address
LDA columnLow
STA $2006 ; write the low byte of column address
LDX #$1E ; copy 30 bytes
LDY #$00
DrawColumnLoop:
LDA columnData, y
STA $2007
INY
DEX
BNE DrawColumnLoop
By using the when/where/how we can draw a new column of data off screen before it becomes
visible. The full code and compiled .NES file is available from the download link at the bottom of
this tutorial. scrolling3.asm includes everything up to this point. It will be best to watch in an
emulator where you can see everything that is off screen. First open the scrolling3.nes file in the
FCEUXDSP emulator. Then choose "Name Table Viewer..." from the "Tools" menu. Reset the
emulator and watch the new columns being drawn off the visible screen area.
Drawing Real Background Data
The last example drew new columns, but it wasn't any real data. This example adds another counter
to keep track of how far along into the level a player is. By incrementing this counter every time a
new column is drawn the correct next column is easy to find. The DrawNewColumn function has
been updated to use the counter to load real background data. It can also be used at the beginning of
the game initialization to populate the starting nametable data instead of using the fill loops.
tutorial. scrolling4.asm includes 4 screens (128 columns) of real background ripped from SMB.
Updating the Attributes

The final piece of the scrolling puzzle is the attribute table. Updating it is the same process as the
background, where the attributes are updated while they are off screen. Again the scroll and
nametable registers will be used to calculate the correct attribute bytes to update. Each attribute byte
covers a 4x4 tile area. 4 tiles wide is 32 pixels, so the attributes must be updated anytime the scroll
register is a multiple of 32. The column numbers already calculated could be used instead of the
scroll variables to do the calculations.
LDA scroll
AND #%00011111 ; check for multiple of 32
BEQ NewAttrib ; if low 5 bits = 0, time to write new attribute bytes
Only 8 attribute bytes will need to be changed each time. However they are not sequential, so the
PPU increment +1 or +32 modes will not work. The PPU address needs to be changed for every
attribute byte updated. The starting address is the base attribute table at $20C0. Like the background
address the nametable bit is shifted up and added in. Then the scroll register is divided by 32 to get
the attribute byte offset. All that is calculated together to find the PPU address of the first attribute
byte. After that 8 is added to the address for each of the next bytes.
LDA nametable
EOR #$01 ; invert low bit, A = $00 or $01
ASL A ; shift up, A = $00 or $02
ASL A ; $00 or $04
CLC
ADC #$20 ; add high byte of attribute base address ($20C0)
STA columnHigh ; now address = $20 or $24 for nametable 0 or 1
LDA scroll
LSR A
LSR A
LSR A
LSR A
LSR A
CLC
ADC #$C0
STA columnLow ; attribute base + scroll / 32
tutorial. scrolling5.asm has the same incrementing scroll, but now draws the new column and
attribute bytes. Use the Name Table Viewer again to check out the attributes being updated. You can
see the attribute update change the color of the off screen clouds before that column of tiles is
changed. The same thing is why you see graphical glitches on the sides of SMB3 while it is
scrolling. To use this in your own game you will need to expand columnNumber to a bigger value.
Once you have understood everything here, there are some more advanced concepts to check out:
Meta Tiles - This idea is to store your backgrounds as bigger blocks instead of individual tiles.
Things like the question blocks would be stored as one byte in the ROM and then decoded into the
4 tiles when it is being drawn. Mostly this saves huge amounts of data space and could make
updating attributes easier.
Buffers - A section of RAM can be reserved to act as a buffer for the data to draw to the PPU later.
Outside of vblank where the is more processing time the next graphics updates would be calculated
and stored in a buffer. Then during vblank those buffers can be dumped right to the PPU, saving
time.
Compression - Packing the background data into simple compression formats like RLE can save
even more data space. Combine that with meta tiles and buffers to have a full scrolling engine.

Download and unzip the scrolling.zip sample files. Each of them adds a small step, so go through
them one at a time. Try expanding the background data to add more columns, making the scroll
speed variable, or making the scrolling controllable.
Advanced Nerdy Nights #4: Sprite 0 hit for a status bar
This Week: After scrolling this tutorial should be pretty simple. Sprite 0 has a special PPU flag
associated with it. Here it will be used to do split screen scrolling to enable a static status bar on the
top of the screen.
Sprite 0 Hit Flag

Sprite 0 has a special flag in the PPU status register at bit 6. When a non transparent pixel of sprite
0 overlaps a non transparent pixel of the background, the flag is set. In the SMB example, sprite 0 is
placed at the bottom of the coin icon. That is one part of the status bar that does not move.
In our example we first set the scroll registers to 0 for the static status bar. The nametable is also set
to 0. That makes sure that the background and sprite 0 will overlap in the correct place.
NMI:
; all graphics updating code goes here
LDA #$00
STA $2006
LDA #$00 ; start with no scroll for status bar

STA $2005
STA $2005
Pattern Table 1
STA $2000 ; start with nametable = 0 for status bar
STA $2001
Next we make sure the sprite 0 hit flag is clear, to avoid it being tripped from the previous frame.
The flag is cleared at the end of vblank, so once it equals 0 you know the next frame has started.
WaitNotSprite0:
lda $2002
and #%01000000
bne WaitNotSprite0 ; wait until sprite 0 not hit
Now we wait until the sprite 0 is hit. How long this takes depends on how far down the screen your
sprite 0 is placed.
WaitSprite0:
lda $2002
and #%01000000
beq WaitSprite0 ; wait until sprite 0 is hit
When that loop finishes, the PPU is drawing the first pixels of sprite 0 that overlap pixels on the
background. We add a small wait loop so the rest of the status bar is drawn, and then change the
scroll registers. The rest of the screen down is drawn using those settings.
ldx #$10
WaitScanline:
dex
bne WaitScanline
LDA scroll
STA $2005
Pattern Table 1
ORA nametable ; select correct nametable for bit 0
STA $2000
So the order is:

1 - set scroll to 0 for status bar
2 - wait for sprite 0 hit = 0
3 - wait for sprite 0 hit = 1
4 - delay so scanline finishes drawing
5 - set scroll for level background
The only other change is to make sure your graphics updating code does not draw over the status
bar. The previous DrawNewColumn function handles the graphics updates so it has a few small
differences. The starting address is increased by $80 to skip the first 4 rows of background. Then
the source address is increased by $04 for the same reason.

Download and unzip the sprite0.zip sample files. sprite0.asm is the same as the previous
scrolling5.asm file plus the changes covered here. This is another good one to watch in an emulator.
Nerdy Nights Sound intro:
About the Nerdy Nights Sound series
People are starting to complete the Nerdy Nights tutorial series and we're seeing a lot of new
homebrewers in the forum. This is great! One question that a lot of people ask after finishing up
their Pong game is "how do I do sound?" Well, I hope to teach you .
The plan is to have a series of tutorial that take you from having no sound to having a working
music/sfx engine. We'll start very simple (making a beep) and work our way up. These tutorials
will assume that you have completed the original Nerdy Nights tutorials (I won't reteach binary or
6502).
And I'm going to steal bunnyboy's disclaimer and re-use it as my own:
And finally these will not be without errors to begin with and may not be the absolute best way to
do anything. People develop their own programming styles and this is just mine...
Nerdy Nights Sound: Part 1:
make a music/sfx engine
The APU
Music and sound effects on the NES are generated by the APU (Audio Processing Unit), the sound
chip inside the CPU. The CPU "talks" to the APU through a series of I/O ports, much like it does
with the PPU and joypads.
PPU: $2000-$2007
Joypads: $4016-$4017
APU: $4000-$4015, $4017
Channels
The APU has 5 channels: Square 1, Square 2, Triangle, Noise and DMC. The first four play waves
and are used in just about every game. The DMC channel plays samples (pre-recorded sounds) and
is used less often.
Square
The square channels produce square waveforms. A square wave is named for its shape.
The wave transitions instantaneously from its high point to its low point (where the lines are
vertical). This gives it a hollow sound like a woodwind or an electric guitar.
Triangle
The triangle channel produces triangle waveforms. A triangle wave is also named for its shape.
The sound of a triangle wave is smoother and less harsh than a square wave. On the NES, the
triangle channel is often used for bass lines (in low octaves) or a flute (in high octaves). It can also
be used for drums.
Noise
The noise channel has a random generator, which makes the waves it produces sound like.. noise.
This channel is generally used for percussion and explosion sounds.
DMC
The DMC channel plays samples, which are pre-recorded sounds. It is often used to play voice
recordings ("Blades of Steel") and percussion samples. Samples take up a lot of ROM space, so not
many games make use of the DMC channel.
Enabling Channels
Before you can use the channels to produce sounds, you need to enable them. Channels are toggled
on and off via port $4015:
APUFLAGS ($4015)
76543210
|||||
||||+- Square 1 (0: disable; 1: enable)
|||+-- Square 2
||+--- Triangle
|+---- Noise
+----- DMC
Here are some code examples using $4015 to enable and disable channels:

lda #%00000001
sta $4015 ;enable Square 1 channel, disable others

lda #%00010110
sta $4015 ;enable Square 2, Triangle and DMC channels. Disable
Square 1 and Noise.

lda #$00
sta $4015 ;disable all channels

lda #$0F
sta $4015 ;enable Square 1, Square 2, Triangle and Noise
channels. Disable DMC.
;this is the most common usage.

Try opening up some of your favorite games in FCEUXD SP and set a breakpoint on writes to
$4015. Take a look at what values are getting written there. If you don't know how to do this,
follow these steps:
1. Open FCEUXD SP
2. Load a ROM
3. Open up the Debugger by pressing F1 or going to Tools->Debugger
4. In the top right corner of the debugger, under "BreakPoints", click the "Add..." button
5. Type "4015" in the first box after "Address:"
6. Check the checkbox next to "Write"
7. Set "Memory" to "CPU Mem"
8. Leave "Condition" and "Name" blank and click "OK"
Now FCEUX will pause emulation and snap the debugger anytime your game makes a write
(usually via STA) to $4015. The debugger will tell you the contents of the registers at that moment,
so you can check what value will be written to $4015. Some games will write to $4015 every
frame, and some only do so once at startup. Try resetting the game if your debugger isn't snapping.
What values are being written to $4015? Can you tell what channels your game is using?
Square 1 Channel
Let's make a beep. This week we'll learn how to produce a sound on the Square 1 channel. The
Square channels are everybody's favorites because you can control the volume and tone and
perform sweeps on them. You can produce a lot of interesting effects using the Squares.
Square 1 is controlled via ports $4000-$4003. The first port, $4000, controls the duty cycle (ie,
tone) and volume for the channel. It looks like this:
SQ1_ENV ($4000)
76543210
||||||||
||||++++- Volume
|||+----- Saw Envelope Disable (0: use internal counter for
volume; 1: use Volume for volume)
||+------ Length Counter Disable (0: use Length Counter; 1:
disable Length Counter)
++------- Duty Cycle
For our purposes, we will focus on Volume and Duty Cycle. We will set Saw Envelope Disable and
Length Counter Disable to 1 and then forget about them. If we leave Saw Envelopes on, the
volume of the channel will be controlled by an internal counter. If we turn them off, WE have
control of the volume. If WE have control, we can code our own envelopes (much more versatile).
Same thing with the Length Counter. If we disable it, we have more control over note lengths. If
that didn't make sense, don't worry. It will become clearer later. For now we're just going to disable
and forget about them.
Volume controls the channel's volume. It's 4 bits long so it can have a value from 0-F. A volume of
0 silences the channel. 1 is very quiet and F is loud.
Duty Cycle controls the tone of the Square channel. It's 2 bits long, so there are four possible
values:
00 = a weak, grainy tone. Think of the engine sounds in RC Pro-Am. (12.5% Duty)
01 = a solid mid-strength tone. (25% Duty)
10 = a strong, full tone, like a clarinet or a lead guitar (50% Duty)
11 = sounds a lot like 01 (25% Duty negated)
The best way to know the difference in sound is to listen yourself. I recommend downloading
FamiTracker and playing with the different Duty settings in the Instrument Editor.
For those interested, Duty Cycle actually refers to the percentage of time that the wave is in "up"
position vs. "down" position. Don't sweat it if graphs and waves aren't your thing. Use your ears
instead.
Here's a code snippet that sets the Duty and Volume for the Square 1 channel:
lda #%10111111; Duty 10 (50%), volume F (max!)
sta $4000
$4001 controls sweeps for Square 1. We'll skip them for now.
Setting the Note

$4002 and $4003 control the period of the wave, or in other words what note you hear (A, C#, G,
etc). Periods are 11-bits long. $4002 holds the low 8-bits and $4003 holds the high 3-bits of the
period. We'll get into more detail in a future tutorial, but for now just know that changing the values
written to these ports will change the note that is played.
SQ1_LO ($4002)
76543210
||||||||
++++++++- Low 8-bits of period
SQ1_HI ($4003)
76543210
||||||||
|||||+++- High 3-bits of period
+++++---- Length Counter
The Length Counter, if enabled, controls how long the note is played. We disabled it up in the
$4000 section, so we can forget about it for now.
Here is some code that will produce an eternal beep on the Square 1 channel:
lda #%00000001

sta $4015 ;enable square 1

lda #%10111111 ;Duty 10, Volume F
sta $4000

lda #$C9 ;0C9 is a C# in NTSC mode
sta $4002
lda #$00
sta $4003
Download and unzip the square1.zip sample files. All the code above is in the square1.asm file.
Make sure square1.asm and square1.bat are all in the same folder as NESASM3, then double click
square1.bat. That will run NESASM3 and should produce the square1.nes file. Run that NES file in
FCEUXD SP to listen to your beep! Edit square1.asm to change the Volume (0 to F), or to change
the Duty Cycle for the square wave. Try changing the period to produce different notes.
Nerdy Nights Sound: Part 2: Square 2 and Triangle Basics
This week: We will learn how to makes sounds with the Square 2 and Triangle channels.
Square 2
Last time we produced a beep on the Square 1 channel by making writes to $4000-$4003. Now
we'll learn how to do it with Square 2. This is very easy because Square 1 and Square 2 are almost
identical. We control the Square 2 channel with ports $4004-$4007, and they more or less mirror
Square 1's $4000-4003.
SQ2_ENV ($4004)
76543210
||||||||
||||++++- Volume
|||+----- Saw Envelope Disable (0: use internal counter for
volume; 1: use Volume for volume)
||+------ Length Counter Disable (0: use Length Counter; 1:
disable Length Counter)
++------- Duty Cycle
SQ2_SWEEP ($4005)
Skip this for now. This port, incidentally, is where Square 2 differs from Square 1.
SQ2_LO ($4006)
76543210
||||||||
SQ2_HI ($4007)
76543210
||||||||
To produce a sound, first we enable the channel via $4015:
lda #%00000010 ;enable Square 2

sta $4015

Then we write to the Square 2 ports:
lda #%00111000 ;Duty Cycle 00, Volume 8 (half volume)

sta $4004

lda #$A9 ;$0A9 is an E in NTSC mode
sta $4006

lda #$00
sta $4007

Except for sweeps, the Square 2 channel works just like the Square 1 channel.
Triangle
The Triangle channel produces triangle waveforms which have a smooth sound to them. Think of
the flute-like melody in the Dragon Warrior overland song. That's the Triangle.
Unlike the Square channels, we have no control over the Triangle channel's volume or tone. It
makes only one type of sound and it's either on (playing) or off (silent). We manipulate the Triangle
channel via ports $4008-$400B.
TRI_CTRL ($4008)
76543210
||||||||
|+++++++- Value
+-------- Control Flag (0: use internal counters; 1: disable
internal counters)
The triangle channel has two internal counters that can be used to automatically control note
duration. We are going to disable them so that we can control note length manually. We will set the
Control Flag to 1 and forget about it.
When the internal counters are disabled, Value controls whether the channel is on or off. To silence
the channel, set Value to 0. To turn the channel on (ie, unsilence), set Value to any non-zero value.
Here are some examples:
lda #%10000000 ;silence the Triangle channel

sta $4008

lda #%10000001 ;Triangle channel on
sta $4008

sta $4008

sta $4008

Note that the last three examples are functionally the same. Any non-zero value in Value makes the
Triangle channel play.
Unused Port
$4009 is unused
Setting the Note

$400A and $400B control the period of the wave, or in other words what note you hear (A, C#, G,
etc). Like the Squares, Triangle periods are 11-bits long. $400A holds the low 8-bits and $400B
holds the high 3-bits of the period. We'll learn more about periods next week, but for now just
know that changing the values written to these ports will change the note that is played.
TRI_LO ($400A)
76543210
||||||||
TRI_HI ($400B)
76543210
||||||||
The Length Counter, if enabled, controls how long the note is played. We disabled it up in the
$4008 section, so we can forget about it for now.
Here is some code to play an eternal beep on the Triangle channel:
lda #%00000100 ;enable Triangle channel

sta $4015
lda #%10000001 ;disable counters, non-zero Value turns channel on

sta $4008

lda #$42 ;a period of $042 plays a G# in NTSC mode.
sta $400A

lda #$00
sta $400B
Multiple Beeps
We now know how to use the Square 1, Square 2 and Triangle channels to make sound. It doesn't
take too much extra work to make them all play at the same time. We just have to enable all three
channels via $4015 and then write to the ports. Here's some code that will play a C#m chord (C# E
G#) using the knowledge we have gained up to now:
lda #%00000111 ;enable Sq1, Sq2 and Tri channels

sta $4015

;Square 1
lda #%00111000 ;Duty 00, Volume 8 (half volume)
sta $4000
lda #$C9 ;$0C9 is a C# in NTSC mode
sta $4002 ;low 8 bits of period
lda #$00
sta $4003 ;high 3 bits of period

;Square 2
lda #%01110110 ;Duty 01, Volume 6
sta $4004
lda #$A9 ;$0A9 is an E in NTSC mode
sta $4006
lda #$00
sta $4007

;Triangle
sta $4008
lda #$42 ;$042 is a G# in NTSC mode
sta $400A
lda #$00
sta $400B

Download and unzip the triad.zip sample files. All the code above is in the triad.asm file. Make sure
triad.asm and triad.bat are all in the same folder as NESASM3, then double click triad.bat. That will
run NESASM3 and should produce the triad.nes file. Run that NES file in FCEUXD SP to listen to
your C#m chord! Edit triad.asm to change the Volume and Duty Cycle for the square waves. Try
changing the Periods to produce different notes.
Try to silence the various channels by either disabling them via $4015 or silencing them via
$4000/$4004/$4008. Finally try writing some code that will silence/unsilence the individual
channels based on user input, like so:
A: toggle Square 1 channel on/off B: toggle Square 2 channel on/off
Select: toggle Triangle channel on/off
Nerdy Nights Sound: Part 3: Periods and lookup tables
Last Week: Square 2 and Triangle Basics
This week: We will learn about periods and build a period lookup table that spans 8 octaves.
Periods
In the last two lessons, I've been giving you the values to plug into the 11-bit periods for the Square
and Triangle channels. I haven't been giving you an explanation of what a period is, or where I got
those numbers. So this week we're going to learn about periods.
What is a period?
A period refers to the length of a wave, or rather the time length of the repeating part of a wave.
Notice how it is repeating. It starts high and remains high for 2 time units. Then it goes low and
remains low for 2 time units. Then it repeats. When we say period, we are talking about the
horizontal time length of this repeating wave. In this case, the period is 4 time units. The longer a
period is, the lower the note will sound. Conversely, the shorter a period is, the higher the note will
sound.
On the NES, we write an 11-bit period to the APU ports. The smaller the number, the shorter the
period, the higher the note. Larger numbers = longer periods = lower notes. Look at the following
code snippets that write an 11-bit period to the Square 1 ports:
lda #$C9

sta $4002
lda #$05
sta $4003 ;period $5C9: large number = long period = low note

;----

lda #$09
sta $4002
lda #$00
sta $4003 ;period $009: small number = short period = very high
note
Periods -> Notes

So how do we know which 11-bit period values correspond to which notes? The magic forumla is:
P = C/(F*16) - 1

P = Period
C = CPU speed (in Hz)
F = Frequency of the note (also in Hz).

The value of C differs between NTSC and PAL machines, which is why a game made for NTSC
will sound funny on a PAL NES, and vice-versa.
To find the period values for notes, we will have to look up note frequencies and plug them into the
formula. Or we can cross our fingers and hope somebody has already done the work for us and put
the answers in an easy-to-read table. Lucky for us a cool fellow named Celius has done just that,
for both NTSC and PAL. Here are the charts:
http://www.freewebs.com/the_bott/NotesTableNTSC.txt
http://www.freewebs.com/the_bott/NotesTablePAL.txt
Lookup Tables
It is fairly common practice to store period values in a lookup table. A lookup table is a table of
pre-calculated data stored in ROM. Like an answer sheet. Lookup tables are used to cut down on
complicated, time-consuming calculations. Let's look at a trivial example. Let's say you want a
subroutine that takes a value in A and returns 3^A. If you took the brute-force approach, you might
write something like this:
multiplier .rs 1
; takes a value (0-5) in A and returns 3^A

three_to_the_a:
bne .not_zero
lda #$01 ;3^0 is 1
rts
.not_zero:
tay
lda #$03
.loop:
sta multiplier
dey
beq .done
clc
adc multiplier
adc multiplier
jmp .loop
.done:
rts

It works, but it's not very pretty.
Here is how we would do it with a lookup table:
;lookup table with pre-calculated answers

powers_of_3:
.byte 1, 3, 9, 27, 81, 243

three_to_the_a:
tay
lda powers_of_3, y
rts

Easier to code. Easier to read. And it runs faster too.
NESASM3 Tip#1: Local Labels

You may have noticed in the above example that I put a period in front of some labels: .done, .loop,
.not_zero. NESASM3 treats these as local labels. There are two types of labels: global and local.
A global label exists across the whole program and must be unique. A local label only exists
between two global labels. This means that we can reuse the names of local labels - they only need
to be unique within their scope. Using local labels saves you the trouble of having to create unique
names for common case labels (like looping). I tend to use local labels for all labels that occur
within subroutines. To make a label local, stick a period in front of it.

Note Lookup Table
Let's take Celius's tables and turn them into a note lookup table. Period values are 11 bits so we will
need to define our lookup table using words. Note that .word is the same as .dw. Here is a
note_table for NTSC:
;Note: octaves in music traditionally start from C, not A.

; I've adjusted my octave numbers to reflect this.
note_table:
.wor
d
$07F1, $0780, $0713 ; A1-B1 ($00-$02)
.word $06AD, $064D, $05F3, $059D, $054D, $0500, $04B8, $0475,
$0435, $03F8, $03BF, $0389 ; C2-B2 ($03-$0E)
.word $0356, $0326, $02F9, $02CE, $02A6, $027F, $025C, $023A,
$021A, $01FB, $01DF, $01C4 ; C3-B3 ($0F-$1A)
.word $01AB, $0193, $017C, $0167, $0151, $013F, $012D, $011C,
$010C, $00FD, $00EF, $00E2 ; C4-B4 ($1B-$26)
.word $00D2, $00C9, $00BD, $00B3, $00A9, $009F, $0096, $008E,
$0086, $007E, $0077, $0070 ; C5-B5 ($27-$32)
.word $006A, $0064, $005E, $0059, $0054, $004F, $004B, $0046,
$0042, $003F, $003B, $0038 ; C6-B6 ($33-$3E)
.word $0034, $0031, $002F, $002C, $0029, $0027, $0025, $0023,
$0021, $001F, $001D, $001B ; C7-B7 ($3F-$4A)
.word $001A, $0018, $0017, $0015, $0014, $0013, $0012, $0011,
$0010, $000F, $000E, $000D ; C8-B8 ($4B-$56)
.word $000C, $000C, $000B, $000A, $000A, $0009,
$0008 ; C9-F#9 ($57-$5D)
Notice that at the highest octaves, some notes have the same value (C9 and C#9 for example). This
is due to rounding. We lose precision the higher we go, and a lot of the highest notes will sound out
of tune as a result. So in songs we probably wouldn't use octaves 8 and 9. These high notes could
be utilized for sound effects though, so we'll leave them in.

Once we have a note lookup table, we use the note we want as an index into the table and pull the
period values from it, like this:

lda #$0C ;the 13th entry in the table (A2)
asl a ;multiply by 2 because we are indexing into a
table of words
tay
lda note_table, y ;read the low byte of the period
sta $4002 ;write to SQ1_LO
lda note_table+1, y ;read the high byte of the period
sta $4003 ;write to SQ1_HI
To make it easier to know which index to use for each note, we can create a list of symbols:
;Note: octaves in music traditionally start at C, not A
;Octave 1
A1 = $00 ;"1" means octave 1.
As1 = $01 ;"s" means "sharp"
Bb1 = $01 ;"b" means "flat". A# == Bb
B1 = $02
;Octave 2
C2 = $03
Cs2 = $04
Db2 = $04
D2 = $05
;...
A2 = $0C
As2 = $0D
Bb2 = $0D
B2 = $0E
;Octave 3
C3 = $0F
;... etc
Now we can use our new symbols instead of the actual index values:
lda #A2 ;A2. #A2 will evaluate to #$0C

asl a ;multiply by 2 because we are indexing into a
table of words
tay
lda note_table, y ;read the low byte of the period
sta $4002 ;write to SQ1_LO
lda note_table+1, y ;read the high byte of the period
sta $4003 ;write to SQ1_HI

And if later we want to have a series of notes, symbols are much easier to read and alter:
sound_data:
.byte C3, E3, G3, B3, C4, E4, G4, B4, C5 ; Cmaj7 (CEGB)

sound_data_no_symbols:
.byte $0F, $13, $16, $1A, $1B, $1F, $22, $26, $27 ;same as above,
but hard to read. Cmaj7 (CEGB)

Low Notes On Squares (Sweep Unit)

One last thing needs to be mentioned. It's very important. It has to do with the Square channels'
sweep units. The sweep units can silence the square channels in certain situations (Periods >=
$400, our lowest notes), even when disabled. We'll have to take a quick look at the sweep unit ports
to solve this problem.
SQ1_SWEEP ($4001), SQ2_SWEEP ($4005)
76543210
||||||||
|||||+++- Shift
||||+---- Negate
|+++----- Sweep Unit Period
+-------- Enable (1: enabled; 0: disabled)
I'm not going to go into how it works now, but the unwanted silencing of low notes can be
circumvented by setting the negate flag:
lda #$08 ;set Negate flag on the sweep unit

sta $4001 ;or $4005 for Square 2.

If you really want to know why, check the Sweep Unit section of blargg's NES APU Sound
Hardware Technical Reference.
What about PAL?
For simplicity, these tutorials are going to use NTSC numbers. Once we finish our sound engine I'll
try to whip up a tutorial about adding PAL support.

Download and unzip the periods.zip sample files. Make sure periods.asm, periods.chr, note_table.i
and periods.bat are all in the same folder as NESASM3, then double click periods.bat. That will run
NESASM3 and should produce the periods.nes file. Run that NES file in FCEUXD SP. Use the d-
pad to select and play any note from our note table on the Square 1 channel. Controls are as
follows:
Up - Play selected note

Down - Stop note
Left - Move selection down a note
Right - Move selection up a note
Homework: Edit periods.asm and add support for the Square 2 and Triangle channels. Allow the
user to select between channels and play different notes on all three of them.
Homework #2: Read Disch's document The Frame and NMIs. Pay special attention to the "Take
Full Advantage of NMI" section. We are going to use this style of NMI handler with our sound
engine. In fact, periods.asm already uses it.
Nerdy Nights Sound: Part 4: sound engine skeleton
This Week: Sound Engine Basics. We will setup the framework to get our sound engine running.
Sound Engine
Now that we know how to get notes to play we can start thinking about our sound engine. What do
we want it to be able to do? How will the main program interact with it?
It's good practice to separate the different pieces of your program. The sound engine shouldn't be
messing with main program code and vice-versa. If you mix them, your code becomes harder to
read, the danger of variable conflicts increases and you open yourself up to hard-to-find bugs. If
you keep the different pieces of your program separate, you get the opposite: your code reads well,
you avoid variable conflicts, and bugs are easier to trace. Separation also improves your ability to
reuse code. If your sound engine only accesses its own internal routines and variables, it makes it
that much easier to pull it out from one game and plug it into another.
There has to be some communication between the main program and the sound engine of course.
The main program needs to be able to tell the sound engine to do things like: "Play song 2" or "shut
up". But we don't want the main program sticking its nose in the sound engine's business. We only
want it to issue commands. The sound engine will handle the rest on its own.
To set this up, we will create a small set of subroutines that the main program can use to invoke the
sound engine and give it commands. I'll call these subroutines "entrances". We want as few
entrances into the sound engine as possible. The sound engine itself will have several internal
subroutines it can work with, but the main program will only use the entrances.
Entrances
So what will our entrance subroutines be? We need to think about what the main program would
need to tell the sound engine to do. Here is a list of entrances we might want for our sound engine:
-Initialize sound engine (sound_init)

-Load new song/sfx (sound_load)
-Play a frame of music/sfx (sound_play_frame)
-Disable sound engine (sound_disable)
The names in paranthesis are what I'm going to call the subroutines in code. I prefixed them with
sound_ for readability. You can tell at a glance that they are sound routines. Here is a rundown of
what our commands will do:
sound_init will enable channels and silence them. It will also initialize sound engine variables.
sound_load will take a song/sfx number as input. It will use that song number to index into a table
of pointers to song headers. It will read the appropriate header and set up sound engine variables.
If that didn't make sense, don't worry. We'll be covering this stuff next week.
sound_play_frame will advance the sound engine by one frame. It will run the note timers, read
from the data streams (if necessary), update sound variables and make writes to the APU ports.
This stuff will also be covered in future weeks.
sound_disable will disable channels via $4015 and set a disable flag variable.
We already know enough to knock out two of those, sound_init and sound_disable. Let's write
them now. We'll write skeleton code for the other entrance subroutines as well. A few things to
mention before we do that though:
RAM
A sound engine requires a lot of RAM. A large sound engine might even take up a full page of
RAM. For this tutorial, we'll stick all our sound engine variables on the $300 page of RAM. There
is nothing magic about this number. I chose $300 for convenience. $000 is your zero-page RAM.
$100 is your stack. If you completed the original Nerdy Nights series, $200 will be your Sprite
OAM. So $300 is next in line.
ROM
The sound engine itself won't require a lot of ROM space for code, but if you have a lot of music
your song data might take up a lot of space. For this reason, I'm going to change our header to give
us two 16k PRG-ROM banks, like this:
.inesprg 2 ;2x 16kb PRG code

Now we have twice as much ROM space, just in case we need it. BTW, this is the maximum
amount of ROM we can have without using a mapper.

Noise Channel
I purposely haven't covered the Noise channel yet. We will want to silence it in our init code
though, so I will go ahead and teach that much. Noise channel volume is controlled via port
$400C. It works the same as $4000/$4004 does for the Square channels, except there is no Duty
Cycle control:
NOISE_ENV ($400C)
76543210
||||||
||++++- Volume
|+----- Saw Envelope Disable (0: use internal counter for volume;
1: use Volume for volume)
+------ Length Counter Disable (0: use Length Counter; 1: disable
Length Counter)

Like the Squares, we will silence the Noise channel by setting both disable flags, and setting the
Volume to 0.
Skeleton Sound Engine
Let's write the entrance subroutines to our sound engine. Most of this code should be very familiar
to you if you completed the first three tutorials in this series.
.rsset $0300 ;sound engine variables will be on the $0300 page of
RAM

sound_disable_flag .rs 1 ;a flag variable that keeps track of
whether the sound engine is disabled or not.
;if set, sound_play_frame will return without doing
anything.
.bank 0
.org $8000 ;we have two 16k PRG banks now. We will stick our
sound engine in the first one, which starts at $8000.
sound_init:
lda #$0F
channels

lda #$30
sta $4000 ;set Square 1 volume to 0
sta $4004 ;set Square 2 volume to 0
sta $400C ;set Noise volume to 0
lda #$80
sta $4008 ;silence Triangle

lda #$00
sta sound_disable_flag ;clear disable flag

;later, if we have other variables we want to initialize, we will
do that here.

rts

sound_disable:
lda #$00
sta $4015 ;disable all channels
lda #$01
sta sound_disable_flag ;set disable flag
rts
sound_load:
;nothing here yet
rts

sound_play_frame:
lda sound_disable_flag
bne .done ;if disable flag is set, don't advance a frame
;nothing here yet
.done:
rts
Driving the Sound Engine

We have the framework setup for our sound engine to run. The main program now has subroutines
it can call to issue commands to the sound engine. Most of them don't do anything yet, but we can
still integrate them into the main program. First we will want to make a call to sound_init
somewhere in our reset code:
RESET:
sei
cld
ldx #$FF
txs
inx

;... clear memory, etc

jsr sound_init
;... more reset stuff

Next we need something to drive our sound engine. Music is time-based. In any piece of music,
assuming a constant tempo, each quarter note needs to last exactly as long as every other quarter
note. A whole note has to be exactly as long as four quarter notes. If our sound engine is going to
play music, it needs to be time-based as well. We have a subroutine, sound_play_frame, that will
advance our sound engine a frame at a time. Now we need to ensure it gets called repeatedly at a
regular time interval.
One way to do this is to stick it in the NMI. Recall that when enabled, the NMI will trigger at the
start of every vblank. Vblank is the only safe time to write to the PPU, so the NMI is typically full
of drawing code. We don't want to waste our precious vblank time running sound code, but what
about after we are finished drawing? If we stick our call to sound_play_frame at the end of NMI,
after the drawing code, we are set. sound_play_frame gets called once per frame, and we avoid
stepping on the PPU's toes. And since sound_play_frame doesn't write to the PPU registers, it
doesn't matter if our sound code spills out of vblank.
Let's setup the NMI to drive our sound engine:
NMI:
pha ;save registers
txa
pha
tya
pha

;do sprite DMA
;update palettes if needed
;draw stuff on the screen
;set scroll

jsr sound_play_frame ;run our sound engine after all drawing
code is done.
;this ensures our sound engine gets run once per frame.

lda #$00
sta sleeping ;did you do your homework and read
Disch's document last week?
;http://nesdevhandbook.googlepages...

pla ;restore registers
tay
pla
tax
pla
rti
.include
To further separate our sound engine from the main program, we can keep all our sound engine
code in a separate file. NESASM3 gives us a directive .include that we can use to copy a source file
into our main program. We actually used this directive last week to include the note_table.i file,
which contained our period lookup table.
Using .include to copy a source file into our code is very similar to how we use .incbin to import
a .chr file. Assuming our sound engine code is saved in a file called sound_engine.asm, we will add
the following code to our main program:

.include "sound_engine.asm"
We will continue to include note_table.i, but since it is part of our sound engine we will stick
the .include directive in the sound_engine.asm file.
It's not bad practice to use includes a lot. You can pull your joypad routines out and stick them in
their own file. You can have separate files for your gamestate code, for your PPU routines and for
just about anything else you can think of. Breaking up your code like this will make it easier to find
things as your program gets larger and more complicated. It also makes it easier to plug your old
routines into new programs.

Download and unzip the skeleton.zip sample files. Make sure skeleton.asm, sound_engine.asm,
skeleton.chr, note_table.i, sound_data.i and skeleton.bat are all in the same folder as NESASM3,
then double click skeleton.bat. That will run NESASM3 and should produce the skeleton.nes file.
Run that NES file in FCEUXD SP.
I've hardcoded sound_load and sound_play_frame to play a little melody on the Square 1 channel.
It uses a simple frame counter to control note speed. The data for the music is found in the
sound_data.i file. Use the controller to interact with the sound engine. Controls are as follows:

A: Play sound from the beginning (sound_load)
B: Initialize the sound engine (sound_init)
Start: Disable sound engine (sound_disable)
Try editing sound_engine.asm to change the data stream that sound_play_frame reads from. The
different data streams available are located in sound_data.i. Try adding your own data stream to
sound_data.i too. Use the note symbols we made last week and terminate your data stream with
$FF.
Homework: Write two new sound engine entrance subroutines for the main program to use:
1. sound_pause: pauses playback of the sound, but retains the current position in the data
stream.
2. sound_unpause: if the sound is currently paused, resumes play from the saved position.
Then modify handle_joypad to allow the user to pause/unpause the music.
Homework #2: If the ideas presented in Disch's The Frames and NMIs document are still fuzzy in
your head, read it again.
Extra Credit: See if you can understand how my drawing buffer works. Use Disch's document to
help you. I won't cover drawing buffers in these sound tutorials, for obvious reasons, but it is
definitely worth your time to learn how to use them.
Sound Data, Pointer Tables, Headers
This Week: Sound Data, Pointer Tables, Headers
Designing Sound Data

We have a skeleton sound engine in place. Time to pack it with flesh and organs. Before we can
play a song, we will have to load a song. Before we can load a song, we will need song data. So our
next step is to decide how our sound data will look. We'll need to design our data format, create
some test data and then build our engine to read and play that data.
Data Formats
So how do we go about designing a sound data format? A good place to start would be to look at
what we are aiming to play. We know that our sound engine will have two basic types of sound
data:
1. Music
2. Sound Effects (SFX)
Music plays in the background. It uses the first 4 channels, has a tempo, and usually loops over and
over again.
Sound Effects are triggered by game events (eg, ball hitting a paddle) and don't loop indefinitely.
Sound effects have the job of communicating to the player what is going on right now, so they have
priority over music. If there is music playing on the Square 2 channel, and a sound effect is also
using the Square 2 channel, the sound effect should play instead of the music.
Depending on the game, some sound effects may have higher priority than others. For example, in
a Zelda-like game the sound of the player taking damage would have priority over the sound of the
player swinging their sword. The former communicates critical information to the player while the
latter is just for effect.
Streams
As mentioned above, a sound effect will have to share a channel (or channels) with the music. This
is unavoidable because music typically uses all the channels at once, all the time. So when a sound
effect starts playing, it has to steal a channel (or more) away from the music. The music will
continue to play on the other channels, but the shared channel will go to the sound effect. This
creates an interesting problem: if we stop music on a channel to play a sound effect, how do we
know where to resume the music on that channel when the sound effect is finished?
The answer is that we don't actually stop the music on the shared channel. We still advance it frame
by frame in time with the other music channels. We just don't write its data to the APU ports when
a sound effect is playing.
To do this, we will need to keep track of multiple streams of sound data. A data stream is a
sequence of bytes stored in ROM that the sound engine will read and translate into APU writes.
Each stream corresponds to one channel. Music will have 4 data streams - one for each channel.
Sound effects will have 2 streams and the sfx themselves will choose which channel(s) they use. So
6 streams total that could potentially be running at the same time. We will number them like this:
MUSIC_SQ1 = $00 ;these are stream number constants

MUSIC_SQ2 = $01 ;stream number is used to index into stream
variables (see below)
MUSIC_TRI = $02
MUSIC_NOI = $03
SFX_1 = $04
SFX_2 = $05
Each stream will need it's own variables in RAM. An easy way to organize this is to reserve RAM
space in blocks and use the stream number as an index:
;reserve 6 bytes each, one for each stream
stream_curr_sound .rs 6 ;what song/sfx # is this stream
currently playing?
stream_channel .rs 6 ;what channel is it playing on?
stream_vol_duty .rs 6 ;volume/duty settings for this stream
stream_note_LO .rs 6 ;low 8 bits of period for the current
note playing on the stream
stream_note_HI .rs 6 ;high 3 bits of the note period
;..etc
Here we have 6 bytes reserved for each variable. Each stream gets its own byte, for example:
stream_vol_duty+0: MUSIC_SQ1's volume/duty settings
stream_vol_duty+1: MUSIC_SQ2's volume/duty
stream_vol_duty+2: MUSIC_TRI's on/off
stream_vol_duty+3: MUSIC_NOI's volume
stream_vol_duty+4: SFX_1's volume/duty
stream_vol_duty+5: SFX_2's volume/duty

In our sound_play_frame code we will loop through all of the streams using the stream number as
an index:

ldx #$00 ;start at stream 0 (MUSIC_SQ1)
.loop:
;read from data stream in ROM if necessary
;update stream variables based on what we read

lda stream_vol_duty, x ;the value in x determines which stream
we are working with
;do stuff with volume

lda stream_note_LO, x
;do stuff with note periods

;do more stuff with other variables

inx ;next stream
cpx #$06 ;loop through all six streams
bne .loop
The music streams will always be running, updating the APU ports with their data frame by frame.
When a sound effect starts playing, one or both of the sfx streams will start running. Because our
loop processes the SFX streams last, they will write to the APU last and thus overwrite the shared-
channel music streams. Our channel conflict is taken care of automatically by the order of our loop!
Music
We now have an idea of how our stream data will be stored in RAM, but there are still many
unanswered questions. How do we load a song? How do we know where to find the data streams
in ROM? How do we read from those data streams? How do we interpret what we read from those
streams?
To answer these questions, we need to make a data format. Let's start with music. What should our
music data look like? Most NES music data is divided into three types:
1. Note - what note to play: A3, G#5, C2, etc

2. Note Length - how long to play the notes: eighth note, quarter note, whole note, etc
3. Opcodes - opcodes tell the engine to perform specific tasks: loop, adjust volume, change Duty
Cycle for squares, etc
*3.5. Arguments - some opcodes will take arguments as input (e.g. how many times to loop, where
to loop to).
Ranges
We will need to design our data format to make it easy for the sound engine to differentiate between
these three types of data. We do this by specifying ranges. For example, we might say that byte
values of $00-$7F represent Notes. $80-$9F are Note Lengths, and $A0-$FF are opcodes. I just
made those numbers up. It really doesn't matter what values we use. The important thing is that we
have ranges to test against to determine whether a byte is a note, note length or opcode. In our
engine code we will have something like this:
fetch_byte:
lda [sound_pointer], y ;read a byte from the data stream
bpl .note ;if < #$80, it's a Note
cmp #$A0
bcc .note_length ;else if < #$A0, it's a Note Length
.opcode: ;else it's an opcode
;do Opcode stuff
.note_length:
;do Note Length stuff
.note:
;do Note stuff

This code reads a byte from the sound data and then tests that byte to see which range it falls into.
It jumps to a different section of code for each possible range. Almost any data format you create
will be divided into ranges like this, whether it be sound data, map data, text data, whatever.
BPL and BMI

These two branch instructions are worth learning if you don't know them already. After BEQ, BNE,
BCS and BCC they are the most common branch instructions. They are often used in range-testing.
BPL tests the Negative (N) flag and will branch if it is clear. Think of BPL as Branch if PLus. The
N flag will be clear if the last instruction executed resulted in a value less than #$80 (ie, bit7 clear).
lda #%01101011

; |
; +-------- bit7 is clear. This will clear the N flag.
bpl .somewhere ;N flag is clear, so this will branch

lda #%10010101
; |
; +-------- bit7 is set. This will set the N flag
bpl .somewhere ;N flag is set, so this will not branch.
BMI is the opposite. It tests the N flag and will branch if it is set. Think of BMI as Branch if
MInus. The N flag will be set if the last instruction executed resulted in a value greater than or
equal to #$80 (ie, bit7 set).
lda #%01101011

; |
; +-------- bit7 is clear. This will clear the N flag.
bmi .somewhere ;N flag is clear, so this will not branch

lda #%10010101
; |
; +-------- bit7 is set. This will set the N flag
bmi .somewhere ;N flag is set, so this will branch to the
label .somewhere.
In the range-testing code above, I used BPL to check if a byte fell into the Note range (00-7F). Go
back and check it.
Song Headers
Music on the NES is typically composed of four parts: a Square 1 part, a Square 2 part, a Triangle
part and a Noise part. When you want to play a song, you will have the main program issue a
command to the sound engine telling it what song you want to play. It will look something like this:
lda #$02

jsr sound_load ;load song 2

Somehow our sound_load subroutine will have to take that "2" and translate it into a whole song,
complete with Square 1, Square 2, Triangle and Noise parts. How does that little number become 4
streams of data? Well, that number is an index into a pointer table, a table of pointers to song
headers. The song headers themselves will contain pointers to the individual channels' data streams.
Pointer Tables
A pointer table is a special kind of lookup table. Only instead of holding regular old numerical data
a pointer table holds addresses. These addresses "point" to the start of data. Addresses on the NES
are 16-bit ($0000-$FFFF), so pointer tables are always tables of words. Let's look at an example:
pointer_table:
.word $8000, $ABCD, $CC10, $DF1B

Here we have a pointer table. It's four entries long. Each entry is a 16-bit address. Presumably
there is data at these four addresses that we will want to read sometime in our program. To read this
data we will need to index into the pointer table, grab the address and store it in a zero-page pointer
variable and then read using indirect mode:
.rsset $0000

ptr1 .rs 2 ;a 2-byte pointer variable.
;The first byte will hold the LO byte of an address
;The second byte will hold the HI byte of an address
.org $E000 ;somewhere in ROM
lda #$02 ;the third entry in the pointer table ($CC10)
asl a ;multiply by 2 because we are indexing into a table
of words
tay
lda pointer_table, y ;#$10 - little endian, so words are
stored LO-byte first
sta ptr1
lda pointer_table+1, y ;#$CC
sta ptr1+1

;now our pointer is setup in ptr1. It "points" to address
$CC10. Let's read data from there.

ldy #$00
lda [ptr1], y ;indirect mode. reads the byte at $CC10
sta some_variable
iny
lda [ptr1], y ;reads the byte at $CC11
sta some_other_variable
iny
;... etc

This code takes an index and uses it to read an address from our pointer_table. It stores this address
in a variable called ptr1 (LO byte first). Then it reads from this address by using indirect mode. We
specify indirect mode by putting []'s around our pointer variable. Look at this instruction:

lda [ptr1], y
It means "Find the address ptr1 is pointing to. Add Y to that address. Load the value at that address
into A".
This is very versatile because we can stick any address we want into our ptr1 variable and read from
anywhere! A pointer table is just a lookup table of places we want to read from.
Of course you usually won't know where exactly in the ROM your data will be. So instead of
declaring addresses explicitely ($8000, $ABCD, $CC10, etc), you will use labels instead:
pointer_table:
.word data1, data2, data3 ;these entries will evaluate to
the 16-bit addresses of the labels below
data1:
.byte $FF, $16, $82, $44 ;some random data

data2:
.byte $0E, $EE, $EF, $16, $23

data3:
.byte $00, $01

Song Header Pointer Table
Our songs work the same way. When the main program tells the sound engine to play a song, it will
send the song number with it. This song number is actually an index into a pointer table of song
headers:

song_headers:
.word song0_header
.word song1_header
.word song2_header
;..etc

sound_load:
asl a ;multiply by 2. we are indexing into a
table of pointers (words)
tay
lda song_headers, y ;read LO byte of a pointer from the
pointer table.
sta sound_ptr ;sound_ptr is a zero page pointer
variable
lda song_headers+1, y ;read HI byte
sta sound_ptr+1

ldy #$00
lda [sound_ptr], y
sta some_variable
iny
;...read the rest of the header data

Header Data
So what will our song header data look like? At the very least it should tell us:
How many data streams we have (songs will usually have 4, but sfx will have fewer)
Which streams those are (which stream index to use)
Which channels those streams use
Where to find those streams (ie, pointers to the beginning of each stream).
Initial values for those streams (for example, initial volume)
As we add more features to our sound engine, we may expand our headers to initialize those
features. Let's start simple. Our headers will look like this:
main header:
--------+----------------
byte # | what it tells us
--------+----------------
00 | number of streams
01+ | stream headers (one for each stream)
stream headers:
--------+----------------
--------+----------------
00 | which stream (stream number)
01 | status byte (see below)
02 | which channel
03 | initial volume (and duty for squares)
04-05 | pointer to data stream
The status byte will be a bit-flag that tells us special information about the stream. For now we
will just use bit0 to mark a stream as enabled or disabled. In the future we may use other bits to
store other information, such as stream priority.
Stream Status Byte

76543210
|
+- Enabled (0: stream disabled; 1: enabled)
Sample Header
Here is some code showing a sample header:
SQUARE_1 = $00 ;these are channel constants

SQUARE_2 = $01
TRIANGLE = $02
NOISE = $03
MUSIC_SQ1 = $00 ;these are stream # constants

MUSIC_SQ2 = $01 ;stream # is used to index into stream variables
MUSIC_TRI = $02
MUSIC_NOI = $03
SFX_1 = $04
SFX_2 = $05
song0_header:
.byte $04 ;4 streams

.byte MUSIC_SQ1 ;which stream
.byte $01 ;status byte (stream enabled)
.byte SQUARE_1 ;which channel
.byte $BC ;initial volume (C) and duty (10)
.word song0_square1 ;pointer to stream

.byte MUSIC_SQ2 ;which stream
.byte $38 ;initial volume (8) and duty (00)

.byte MUSIC_TRI ;which stream
.byte TRIANGLE ;which channel
.byte $81 ;initial volume (on)
.word song0_tri ;pointer to stream

.byte MUSIC_NOI ;which stream
.byte $00 ;disabled. We will have our load routine
skip the
; rest of the reads if the status byte disables the
stream.
; We are disabling Noise because we haven't covered it
yet.
;these are the actual data streams that are pointed to in our
stream headers.
song0_square1:
.byte A3, C4, E4, A4, C5, E5, A5 ;some notes. A minor

song0_square2:
.byte A3, A3, A3, E4, A3, A3, E4 ;some notes to play on square 2

song0_tri:
.byte A3, A3, A3, A3, A3, A3, A3 ;triangle data

Sound Engine Variables
The last thing we need before we can write our sound_load routine is some variables. As
mentioned, our sound engine will have several streams running simultaneously. Four will be used
for music (one for each tonal channel). Two will be used for sound effects. So we will declare all
variables in blocks of 6. Based on our header data, we will need the following variables:
stream_curr_sound .rs 6 ;reserve 6 bytes, one for each stream
stream_status .rs 6
stream_channel .rs 6
stream_vol_duty .rs 6
stream_ptr_LO .rs 6
stream_ptr_HI .rs 6
sound_load
Now let's write some code to read our header. Pay special attention to the X register. I recommend
tracing through the code using the sample header above. Here is our sound_load routine:
;-------------------------------------
; load_sound will prepare the sound engine to play a song or sfx.
; input:
; A: song/sfx number to play
sound_load:
sta sound_temp1 ;save song number
asl a ;multiply by 2. We are indexing into a
table of pointers (words)
tay
lda song_headers, y ;setup the pointer to our song header
sta sound_ptr
lda song_headers+1, y
sta sound_ptr+1

ldy #$00
lda [sound_ptr], y ;read the first byte: # streams
sta sound_temp2 ;store in a temp variable. We will use
this as a loop counter
iny
.loop:
lda [sound_ptr], y ;stream number
tax ;stream number acts as our variable index
iny

lda [sound_ptr], y ;status byte. 1= enable, 0=disable
sta stream_status, x
beq .next_stream ;if status byte is 0, stream disabled, so
we are done
iny

lda [sound_ptr], y ;channel number
sta stream_channel, x
iny

lda [sound_ptr], y ;initial duty and volume settings
sta stream_vol_duty, x
iny

lda [sound_ptr], y ;pointer to stream data. Little endian,
so low byte first
sta stream_ptr_LO, x
iny

lda [sound_ptr], y
sta stream_ptr_HI, x
.next_stream:
iny

lda sound_temp1 ;song number
sta stream_curr_sound, x

dec sound_temp2 ;our loop counter
bne .loop
rts

Now our sound_load routine is ready. If the main program calls it, like this:
lda #$00 ;song 0

jsr sound_load

Our sound_load routine will take the value in the A register and use it to fill our music RAM with
everything we need to get our song running!
Reading Streams
Once we have our header loaded, we are ready to rock. All of our active streams have pointers to
their data stored in their stream_ptr_LO and stream_ptr_HI variables. That's all we need to start
reading data from them.
To read data from our data stream, we will first copy the stream pointer into a zero-page pointer
variable. Then we will read a byte using indirect mode and range-test it to determine whether it is a
note, note length or opcode. If it's a note, we will read from our note_table and store the 11-bit
period in RAM. Finally, we will update our stream pointer to point to the next byte in the stream.
First we will need to declare some new variable blocks for the note periods:
stream_note_LO .rs 6 ;low 8 bits of period

stream_note_HI .rs 6 ;high 3 bits of period
Here is our se_fetch_byte routine (se_ stands for "sound engine"):
;--------------------------
; se_fetch_byte reads one byte from a sound data stream and
handles it
; input:
; X: stream number
se_fetch_byte:
lda stream_ptr_LO, x ;copy stream pointer into a zero page
pointer variable
sta sound_ptr
lda stream_ptr_HI, x
sta sound_ptr+1

ldy #$00
lda [sound_ptr], y ;read a byte using indirect mode
bpl .note ;if <#$80, we have a note
cmp #$A0 ;else if <#$A0 we have a note length
bcc .note_length
.opcode: ;else we have an opcode
;nothing here yet
jmp .update_pointer
.note_length:
;nothing here yet
jmp .update_pointer
.note:
asl ;multiply by 2 because we are index into
a table of words
sty sound_temp1 ;save our Y register because we are about
to destroy it
tay
lda note_table, y ;pull low 8-bits of period and store it
in RAM
sta stream_note_LO, x
lda note_table+1, y ;pull high 3-bits of period from our note
table
sta stream_note_HI, x
ldy sound_temp1 ;restore the Y register

;update our stream pointers to point to the next byte in the data
stream
.update_pointer:
iny ;set index to the next byte in the data
stream
tya
clc
adc stream_ptr_LO, x ;add Y to the LO pointer
bcc .end
inc stream_ptr_HI, x ;if there was a carry, add 1 to the HI
pointer.
.end:
rts
Look at the part that updates the stream pointer. After we finish all our reads, Y will hold the index
of the last byte read. To be ready for the next frame, we will want to update our pointer to point to
the next byte in the data stream. To do this, we increment Y and add it to the pointer. But we have
to be careful here. What if our current position is something like this:
stream_ptr: $C3FF
Y: 1
The next position here should be $C400. But ADC only works on the 8-bit level, so if we add 1 to
the low byte of the pointer we will get this instead:
stream_ptr: $C300
The FF in the low byte becomes 00, but the high byte remains the same. We need to increment the
high byte manually. But how do we know when to increment it and when to leave it alone? Lucky
for us, ADC sets the carry flag whenever it makes a FF->00 transition. So we can just check the
carry flag after our addition. If it is set, increment the high byte of the pointer. If it is clear, don't
increment it. That's what our code above does.
Playing Music
We've loaded our header. We've set up our stream pointers in RAM. We've written a routine that
will read bytes from the streams and turn them into notes. Now we need to update
sound_play_frame. sound_play_frame will loop through all 6 streams. It will check the status byte
to see if they are enabled. If enabled, it will advance the stream by one frame. Here's the code:
sound_play_frame:
bne .done ;if sound engine is disabled, don't advance a frame

inc sound_frame_counter
lda sound_frame_counter
cmp #$08 ;***change this compare value to make the notes play
faster or slower***
bne .done ;only take action once every 8 frames.

ldx #$00 ;our stream index. start at MUSIC_SQ1
stream
.loop:
lda stream_status, x ;check bit 0 to see if stream is enabled
and #$01
beq .next_stream ;if disabled, skip to next stream

jsr se_fetch_byte ;read from the stream and update RAM
jsr se_set_apu ;write volume/duty, sweep, and note
periods of current stream to the APU ports

.next_stream:
inx
cpx #$06 ;loop through all 6 streams.
bne .loop

lda #$00
sta sound_frame_counter ;reset frame counter so we can start
counting to 8 again.
.done:
rts
And here is se_set_apu which will write a stream's data to the APU ports:

se_set_apu:
lda stream_channel, x ;which channel does this stream write to?
asl a
asl a ;multiply by 4 so Y will index into the
right set of APU ports (see below)
tay
lda stream_vol_duty, x
sta $4000, y
sta $4002, y
lda stream_note_HI, x
sta $4003, y

lda stream_channel, x
cmp #TRIANGLE
bcs .end ;if Triangle or Noise, skip this part
lda #$08 ;else, set negate flag in sweep unit to allow low
notes on Squares
sta $4001, y
.end:
rts

Writing to the APU ports directly like this is actually bad form. We'll learn why in a later lesson.
One thing to pay attention to is how we get our APU port index. We take the channel and multiply
it by 4. Recall that we declared constants for our channels:
SQUARE_1 = $00 ;these are channel constants

SQUARE_2 = $01
TRIANGLE = $02
NOISE = $03
If our stream_channel is $00 (SQUARE_1), we multiply by 4 to get $00. y = 0

$4000, y = $4000
$4001, y = $4001
$4002, y = $4002
$4003, y = $4003
If our stream_channel is $01 (SQUARE_2), we multiply by 4 to get $04. y = 4
$4000, y = $4004
$4001, y = $4005
$4002, y = $4006
$4003, y = $4007
If our stream_channel is $02 (TRIANGLE), we multiply by 4 to get $08. y = 8
$4000, y = $4008
$4001, y = $4009 (unused)
$4002, y = $400A
$4003, y = $400B
If our stream_channel is $03 (NOISE), we multiply by 4 to get $0C. y = C
$4000, y = $400C
$4001, y = $400D
$4002, y = $400E
$4003, y = $400F
See how everything lines up nicely?

Download and unzip the headers.zip sample files. Make sure the following files are in the same
folder as NESASM3:
headers.asm
sound_engine.asm
headers.chr
note_table.i
song0.i
song1.i
song2.i
song3.i
headers.bat
Double click headers.bat. That will run NESASM3 and should produce the headers.nes file. Run
that NES file in FCEUXD SP.
Use the controller to select songs and play them. Controls are as follows:

Up: Play
Down: Stop
Right: Next Song
Left: Previous Song
Song0 is a silence song. It is not selectable. headers.asm "plays" song0 to stop the music when you
press down. See song0.i to find out how it works.
Song1 is an evil sounding series of minor thirds.
Song2 is a short sound effect on the Sq2 channel. It uses the SFX_1 stream. Try playing it over the
other songs to see how it steals the channel from the music.
Song3 is a simple descending chord progression.
Try creating your own songs and sound effects and add them into the mix. To add a new song you
will need to take the following steps:
1) create a song header and song data (use the included songs as reference). Note that data streams
are terminated with $FF
2) add your header to the song_headers pointer table at the bottom of sound_engine.asm
3) update the constant NUM_SONGS to reflect the new song number total (also at the bottom of
sound_engine.asm)
Although not necessary, I recommend keeping your song data in a separate file like I've done with
song0.i, song1.i, song2.i and song3.i. This makes it easier to find the data if you need to edit your
song later. If you do this, don't forget to .include your file.
Tempo, Note Lengths, Buffering and Rests
This Week: Tempo, Note Lengths, Buffering and Rests
Timing
Last week we put together a huge chunk of the sound engine. We finally got it to play something
that resembled a song. But we had big limitations when it came to timing and note lengths. We
were using a single frame counter to keep time across all 6 streams of the sound engine. This is a
problem because it imposes the music's speed on our sound effects. If you were to use such a
system in a real game, your sound effects would speed up or slow down whenever you change to a
faster or slower song.
We also made the mistake of advancing the sound engine when our frame counter hit its mark, but
skipping it when it didn't. What happens if a game event triggers a sound effect one or two frames
after the counter hits its mark? The sound effect won't start until the next time our counter reaches
its mark - we have to wait for it! There will be a delay. Not good.
Worst of all, our frame counter also doesn't allow for variable note lengths. Unless every song you
write is going to consist of only 32nd notes, this is a problem. It becomes apparent then that we
need a more complex timing system.
Tempo
We'll correct the first two problems by ripping out the universal counter and giving each stream it's
own private counter. We'll also change our method of counting. Our old method of counting
frames and taking action when we reach a certain number is very limited. For example, let's say
that we have a song and our frame counter is taking action every 4 frames. Maybe the song sounds
a tad faster than we want, so we slow it down by changing the speed to update once every 5 frames.
But now the song sounds too slow. The speed we really want is somewhere in between 4 and 5, but
we can't get there with our frame counting method. Instead we'll use a ticker.
Ticker
The ticker method involves taking a number (a tempo) and adding it to a total, frame by frame.
Eventually, that total will wraparound from FF->00 and when it does the carry flag will be set (a
tick). This carry flag tick will be the signal we look for to advance our stream.
For example, let's say our tempo value is $40 and our total starts at $00. After one frame we will
add our tempo to the total. $00 + $40 = $40. Now our total is $40. Another frame goes by (2). We
add our tempo to the total again. $40 + $40 = $80. Our total is $80. Another frame goes by (3).
$80 + $40 = $C0. Another frame goes by (4). $C0 + $40 = $00. Carry flag is set. TICK! A tick
tells us that it is time to advance this stream. When we finish updating, we start adding again until
we get another tick.
As you can see, a tempo value of $40 will advance our stream once every 4 frames. If you do some
math (256 / 5), you will discover that a tempo of $33 will advance the stream roughly every 5
frames. If $40 is too fast for your song and $33 is too slow, you still have the values $34-$39 to
experiment with. Much more versatile! To see why this works, let's see what happens with a tempo
value of say $36:
$00 + $36 + $36 + $36 + $36 + $36 = $0E (Tick in 5 frames)

$0E + $36 + $36 + $36 + $36 + $36 = $1C (Tick in 5 frames)
$1C + $36 + $36 + $36 + $36 = $02 (Tick in 4 frames)
$02 + $36 + $36 + $36 + $36 + $26 = $10 (Tick in 5 frames)
A tempo of $36 produces a tick every 5 frames most of the time, but sometimes it only takes 4
frames. You might think that this disparity would make our song sound uneven, but really a single
frame only lasts about 1/60 of a second. Our ears won't notice. It will sound just right to us.
Here is some code that demonstrates how to implement a ticker:
stream_tempo .rs 6 ;the value to add to our ticker total

each frame
stream_ticker_total .rs 6 ;our running ticker total.
sound_play_frame:

ldx #$00
.loop:
lda stream_status, x
and #$01
beq .endloop ;if stream disabled, skip this stream

;add the tempo to the ticker total. If there is a FF-> 0
transition, there is a tick
lda stream_ticker_total, x
clc
adc stream_tempo, x
sta stream_ticker_total, x
bcc .endloop ;carry clear = no tick. if no tick, we are done
with this stream

jsr se_fetch_byte ;else there is a tick, so do stuff
;do more stuff
.endloop:
inx
cpx #$06
bne .loop
.done:
rts

Initializing
Anytime we add a new feature to our sound engine we will want to ask ourselves the following
questions:
1) Is this a feature that needs to be initialized for each song/sfx?

2) If so, are the values we use to initialize the feature variable (ie, not necessarily the same for every
song/sfx)?
If the answer to question #1 is yes, we will have to update sound_load to initialize the feature.
If the answer to question #2 is also yes, we will have to add a field to the song header format. The
values to plug into the initialization are different for each song, so the songs' headers will need to
provide those values for us.
In the case of our new timing scheme, we have two variables that need to be initialized:
sound_ticker_total and sound_tempo. Of the two, only sound_tempo will be variable. Different
songs will have different tempos, but they won't need to have different starting sound_ticker_totals.
So we will have to add one new field to our song header format for tempo:
main header:
--------+----------------
--------+----------------
stream headers:
--------+----------------
--------+----------------
01 | status byte
02 | which channel
03 | initial volume (and duty for squares)
06 | initial tempo
Then we will need to edit sound_load to read this new byte for each stream and store it in RAM.
We'll also want to initialize stream_ticker_total to some fixed starting value, preferably a high one
so that the first tick will happen without a delay. Finally, we will have to update all of our songs to
include tempos in their headers.
Note Lengths
We still have the problem of note lengths. Songs are made up of notes of variable length: quarter
notes, eighth notes, sixteenth notes, etc. Our sound engine needs to be able to differentiate between
different note lengths. But how? We will use note length counters.
Note Length Counters

Think of the fastest note you'd ever need to play, say a 32nd note. Since that will be our fastest
note, we'll give it the smallest count possible: $01. The next fastest note is a 16th note. In music, a
16th note equals two 32nd notes. In other words, a 16th note lasts twice as long as a 32nd note. So
we will give it a count value that is twice the count value of our 32nd note: $02. The next fastest
note is an 8th note. An 8th note equals two 16th notes. It is twice as long as a 16th note. So its
count value will be twice that of the 16th note: $04. Going all the way up to a whole note, we can
produce a lookup table like this:
note_length_table:
.byte $01 ;32nd note
.byte $02 ;16th note
.byte $04 ;8th note
.byte $08 ;quarter note
.byte $10 ;half note
.byte $20 ;whole note

We'll add more entries later for things like dotted quarter notes, but for now this is sufficient to get
us started.
To play different note lengths, we will give each stream a note length counter:
stream_note_length_counter .rs 6
When a note is played, for example an 8th note, its count value will be pulled from the
note_length_table and stored in the stream's note length counter. Then every time a tick occurs we
will decrement the counter. When the note length counter reaches 0, it will signal to us that our
note has finished playing and it is time for the next note. To say it another way, a note's count value
is simply how many ticks it lasts. An eighth note is 4 ticks long. A quarter note is 8 ticks long. A
half note is 16 ticks long ($10).
Note Lengths in Data

Now we need to add note lengths to our sound data. Recall that we specified that byte values in the
range of $80-$9F were note lengths:
se_fetch_byte:
lda stream_ptr_LO, x
sta sound_ptr
sta sound_ptr+1

ldy #$00
lda [sound_ptr], y
cmp #$A0
;do Opcode stuff
.note_length:
;do note length stuff
.note:
;do note stuff

So the first byte value that we can use for note lengths is $80. We are going to be reading from a
lookup table (note_length_table above), so we should assign the bytes in the same order as the
lookup table.
-----+--------------
byte | note length
-----+--------------
$80 | 32nd note
$81 | 16th note
$82 | 8th note
$83 | quarter note
$84 | half note
$85 | whole note
Now we can use these values in our sound data to represent note lengths:
;music data for song 0, square 1 channel

song0_sq1:
.byte $82, C3 ;play a C eighth note
.byte $84, D5 ;play a D half note

Of course, memorizing which byte value corresponds to which note length is a pain. Let's create
some aliases to make it easier on us when we are creating our sound data:
;note length constants

thirtysecond = $80
sixteenth = $81
eighth = $82
quarter = $83
half = $84
whole = $85
song0_sq1:
.byte eighth, C3 ;play a C eighth note
.byte half, D5 ;play a D half note

Pulling from the table
There is a small problem here. Lookup tables index from 0. This wasn't a problem for note values
(C5, D3, G6) because our note range started from 0 ($00-$7f). But our note length data has a range
of $80-$9F. Somehow we will need to translate the note length byte that comes from the data
stream into a number we can use to index into our table. In other words, we need to figure out a
way to turn $80 into $00, $81 into $01, $82 into $02, etc. Anything come to mind?
If you thought "just subtract $80 from the note length value", give yourself a cookie. If you thought
"just chop off the 7-bit", give yourself two cookies. Both solutions work, but the second solution is
a little bit faster and only takes one instruction to perform:
se_fetch_byte:
sta sound_ptr
sta sound_ptr+1

ldy #$00
.fetch:
lda [sound_ptr], y
cmp #$A0
;do Opcode stuff
.note_length:
and #%01111111 ;chop off bit7
sty sound_temp1 ;save Y because we are about to destroy
it
tay
lda note_length_table, y ;get the note length count value
sta stream_note_length_counter, x ;stick it in our note length
counter
ldy sound_temp1 ;restore Y
iny ;set index to next byte in the stream
jmp .fetch ;fetch another byte
.note:
;do note stuff

Notice that we jump back up to .fetch after we set the note length counter. This is so that we can
read the note that will surely follow the note length in the data stream. If we simply stop after
setting the note length, we'll know how long to play, but we won't know which note to play!
Here's an updated sound_play_frame routine that implements both the ticker and the note length
counters. Notice how the note length counter is only decremented when we have a tick, and we
only advance the stream when the note length counter reaches zero:
sound_play_frame:

ldx #$00
.loop:
and #$01

clc
adc stream_tempo, x
with this stream

dec stream_note_length_counter, x ;else there is a tick.
decrement the note length counter
bne .endloop ;if counter is non-zero, our note isn't finished
playing yet

jsr se_fetch_byte ;else our note is finished. Time to read
from the data stream
;do more stuff. set volume, note, sweep, etc
.endloop:
inx
cpx #$06
bne .loop
.done:
rts
We have one last change to make. When we load a new song we will want it to start playing
immediately, so we should initialize the stream_note_length_counter in the sound_load routine to
do just that. Our sound_play_frame routine decrements the counter and takes action if the result is
zero. Therefore, to ensure that our song starts immediately, we should initialize our
stream_note_length_counter to $01:

;somewhere inside the loop of sound_load
lda #$01
sta stream_note_length_counter, x

And now our engine supports note lengths. But there is still room for improvement. What if we
want to play a series of 8th notes? Not an uncommon thing to have in music. Here is how our data
would have to look now:
sound_data:
.byte eighth, C5, eighth, E5, eighth, G5, eighth, C6, eighth, E6,
eighth, G6, eighth, C7 ;Cmajor

That's a lot of "eighth" bytes. Wouldn't it be better to just state "eighth" once, and assume that all
notes following it are eighth notes? Like this:
sound_data:
.byte eighth, C5, E5, G5, C6, E6, G6, C7 ;Cmajor

That saved us 6 bytes of ROM space. And if you consider that a game may have 20+ songs, each
with 4 streams of data, each with potentially several strings of equal-length notes, this kind of
change might save us hundreds, maybe even thousands of bytes! Let's do it.
To pull this off, we will have to store the current note length count value in RAM. Then when our
note length counter runs to 0, we will refill it with our RAM count value.
stream_note_length .rs 6 ;note length count value
;-------
se_fetch_byte:
sta sound_ptr
sta sound_ptr+1

ldy #$00
.fetch:
lda [sound_ptr], y
cmp #$A0
;do Opcode stuff
.note_length:
and #%01111111 ;chop off bit7
sty sound_temp1 ;save Y because we are about to destroy
it
tay
lda note_length_table, y ;get the note length count value
sta stream_note_length, x ;save the note length in RAM so we
can use it to refill the counter
sta stream_note_length_counter, x ;stick it in our note length
counter
ldy sound_temp1 ;restore Y
iny ;set index to next byte in the stream
jmp .fetch ;fetch another byte
.note:
;do note stuff

;--------
sound_play_frame:

ldx #$00
.loop:
and #$01

clc
adc stream_tempo, x
with this stream

playing yet
lda stream_note_length, x ;else our note is finished. reload
the note length counter

jsr se_fetch_byte ;Time to read from the data stream
;do more stuff
.endloop:
inx
cpx #$06
bne .loop
.done:
rts

Adding those 4 lines of code just saved us hundreds of bytes of ROM space. A nice tradeoff for 6
bytes of RAM. Now our data will be made up of "strings", where we have a note length followed
by a series of notes:
sound_data:
.byte eighth, C5, E5, G5, C6, E6, G6, quarter, C6 ;six 8th notes
and a quarter note

Easy to read and easy to write.
Other Note Lengths

Now that everything is setup, we can add more note lengths to our note_length_table. Dotted notes
are very common in music. Dotted notes are equal in length to the note plus the next fastest note.
For example, a dotted quarter note = a quarter note + an 8th note. A dotted 8th note = an 8th note +
a 16th note. Let's add some dotted notes to our table:
note_length_table:
.byte $01 ;32nd note
.byte $02 ;16th note
.byte $04 ;8th note
.byte $08 ;quarter note
.byte $10 ;half note
.byte $20 ;whole note
;---dotted notes
.byte $03 ;dotted 16th note
.byte $06 ;dotted 8th note
.byte $0C ;dotted quarter note
.byte $18 ;dotted half note
.byte $30 ;dotted whole note?

The actual order of our note_length_table doesn't matter. We just have to make sure our aliases are
in the same order as the table:
;note length constants (aliases)
thirtysecond = $80
sixteenth = $81
eighth = $82
quarter = $83
half = $84
whole = $85
d_sixteenth = $86
d_eighth = $87
d_quarter = $88
d_half = $89
d_whole = $8A ;don't forget we are counting in hex
Your music will determine what other entries you'll need to add to your note length table. If one of
your songs has a really really long note, like 3 whole notes tied together, add it to the table ($60)
and make an alias for it (whole_x3). If your song contains a note that is seven 8th notes long (a half
note plus a dotted quarter note tied together), add it to the table ($1C) and make an alias for it
(seven_eighths).
Buffering APU Writes

Before, we've been writing to the APU one stream at a time. If two different streams shared a
channel, they would both write to the same APU ports. If three streams were to share a channel,
which is possible if there are two different sound effects loaded into SFX_1 and SFX_2, all three
would write to the same APU ports in the same frame. This is bad practice. It can also cause some
unwanted noise on the square channels.
A better method is to buffer our writes. Instead of writing to the APU ports directly, each stream
will instead write its data to temporary ports in RAM. We'll keep our loop order, so sfx streams will
still overwrite the music streams. Then when all the streams are done, we will copy the contents of
our temporary RAM ports directly to the APU ports all at once. This ensures that the APU ports
only get written to once per frame max. To do this, we first need to reserve some RAM space for
our temporary port variables:
soft_apu_ports .rs 16
We reserved 16 bytes for our temporary ports. Each one corresponds to an APU port:
soft_apu_ports+0 -> $4000 ;Square 1 ports

soft_apu_ports+1 -> $4001
soft_apu_ports+4 -> $4004 ;Square 2 ports

soft_apu_ports+8 -> $4008 ;Triangle ports

soft_apu_ports+9 -> $4009 (unused)
soft_apu_ports+10 -> $400A
soft_apu_ports+11 -> $400B
soft_apu_ports+12 -> $400C ;Noise ports

soft_apu_ports+13 -> $400D (unused)
soft_apu_ports+14 -> $400E
soft_apu_ports+15 -> $400F
Let's implement this by working backwards. First we will edit sound_play_frame and pull our call
to se_set_apu out of the loop. We do this because we only want to write to the APU once, after all
the streams are done looping:
;--------------------------
; sound_play_frame advances the sound engine by one frame
sound_play_frame:
ldx #$00

.loop:
and #$01 ;check whether the stream is active
beq .endloop ;if the channel isn't active, skip it

clc
adc stream_tempo, x
with this stream

playing yet
lda stream_note_length, x ;else our note is finished. reload
the note length counter

jsr se_fetch_byte
;snip

.endloop:
inx
cpx #$06
bne .loop

jsr se_set_apu
.done:
rts

Next we will modify se_set_apu to copy the temporary APU ports to the real APU ports:
se_set_apu:
ldy #$0F
.loop:
cpy #$09
beq .skip ;$4009 is unused
cpy #$0D
beq .skip ;$400D is unused

lda soft_apu_ports, y
sta $4000, y
.skip:
dey
bpl .loop ;stop the loop when Y is goes from $00 -> $FF

rts

Now we have to write the subroutine that will populate the temporary APU ports with a stream's
data. This part will get more complicated as we add more features to our sound engine, but for now
it's quite simple:
se_set_temp_ports:
asl a
asl a
tay

sta soft_apu_ports, y ;vol

lda #$08
sta soft_apu_ports+1, y ;sweep

sta soft_apu_ports+2, y ;period LO

sta soft_apu_ports+3, y ;period HI
rts

We will make the call to se_set_temp_ports after our call to se_fetch_byte, where the old
se_set_apu call was before we snipped it out of the loop. Notice that we don't bother to check the
channel before writing the sweep. se_set_apu takes care of this part for us. There's no harm in
writing these values to RAM, so we'll avoid branching here to simplify the code.
Crackling Sounds
Writing to the 4th port of the Square channels ($4003/$4007) has the side effect of resetting the
sequencer. If we write here too often, we will get a nasty crackling sound out of our Squares. This
is not good.
The way our engine is setup now, we call se_set_apu once per frame. se_set_apu writes to
$4003/$4007, so these ports will get written to once per frame. This is too often. We need to find a
way to write here less often. We will do this by cutting out redundant writes. If the value we want
to write this frame is the same as the value written last frame, skip the write.
First we will need to keep track of what was last written to the ports. This will require some new
variables:
sound_sq1_old .rs 1 ;the last value written to $4003

sound_sq2_old .rs 1 ;the last value written to $4007
Whenever we write to one of these ports, we will also write the value to the corresponding
sound_port4_old variable. Saving this value will allow us to compare against it next frame. To
implement this, we will have to unroll our loop in se_set_apu:
se_set_apu:
.square1:
lda soft_apu_ports+0
sta $4000
sta $4001
sta $4002
sta $4003
sta sound_sq1_old ;save the value we just wrote to $4003
.square2:
sta $4004
sta $4005
sta $4006
sta $4007
.triangle:
sta $4008
sta $400A
sta $400B
.noise:
sta $400C
sta $400E
sta $400F
rts

Now we have a variable that will keep track of the last value written to a channel's 4th port. The
next step is to add a check before we write:
se_set_apu:
.square1:
sta $4000
sta $4001
sta $4002
cmp sound_sq1_old ;compare to last write
beq .square2 ;don't write this frame if they were
equal
sta $4003
.square2:
sta $4004
sta $4005
sta $4006
cmp sound_sq2_old
beq .triangle
sta $4007
.triangle:
sta $4008
lda soft_apu_ports+10 ;there is no $4009, so we skip it
sta $400A
sta $400B
.noise:
sta $400C
lda soft_apu_ports+14 ;there is no $400D, so we skip it
sta $400E
sta $400F
rts

Finally we have to consider initialization. The only case we really have to worry about is the first
time a song is played in the game. Consider what happens if we initialize the sound_sq1_old and
sound_sq2_old variables to $00. We are essentially saying that on startup (RESET) the last byte
written to $4003/$4007 was a $00, which isn't true of course. On startup, no write has ever been
made to these ports. If we initialize to $00, and if the first note of the first song played has a $00 for
the high 3 bits of its period, it will get skipped. That is not what we want. Instead, we should
initialize these variables to some value that will never be written to $4003/$4007, like $FF. This
ensures that the first note(s) played in the game won't be skipped.
sound_init:
lda #$0F
channels

lda #$00
sta sound_disable_flag ;clear disable flag
;later, if we have other variables we want to initialize, we will
do that here.
lda #$FF
sta sound_sq1_old
sta sound_sq2_old
se_silence:
lda #$30
sta soft_apu_ports ;set Square 1 volume to 0
sta soft_apu_ports+4 ;set Square 2 volume to 0
sta soft_apu_ports+12 ;set Noise volume to 0
lda #$80
sta soft_apu_ports+8 ;silence Triangle

rts

Rests
The final topic we will cover this lesson is rests. A rest is a period of silence in between notes.
Like notes, rests can be of variable length: quarter rest, half rest, whole rest, etc. In other words a
rest is a silent note.
So how will we implement it? We will handle rests by considering a rest to be special case note.
We will give the rest a dummy period in our note table. Then, when we fetch a byte from the data
stream and determine the byte to be a note, we will add an extra check to see if that note is a rest. If
it is, we will make sure that it shuts up the stream.
First let's add the rest to our note table. We will give it a dummy period. It doesn't really matter
what value we use. I'm going to give it a period of $0000. We will also want to add the rest to our
list of note aliases:
note_table:
.word $07F1, $0780, etc...
;....more note table values here
.word $0000 ;rest. Last entry

;Note: octaves in music traditionally start at C, not A
A1 = $00 ;the "1" means Octave 1
As1 = $01 ;the "s" means "sharp"
Bb1 = $01 ;the "b" means "flat" A# == Bb, so same value
B1 = $02
;..... other aliases here
F9 = $5c
Fs9 = $5d
Gb9 = $5d
rest = $5e
Now we can use the symbol "rest" in our music data. "rest" will evaluate to the value $5E, which
falls within our note range ($00-$7F). When our sound engine encounters a $5E in the data stream,
it will pull the period ($0000) from the note table and store it in RAM. A period of $0000 is
actually low enough to silence the square channels, but the triangle channel is still audible at this
period so we have more work to do.
Checking for a rest

When we encounter a rest, we will want to tell the sound engine to shut this stream up until the next
note. The rest functions differently from all the other notes, so we will need to make a special
check for it in our code. We will make a subroutine se_check_rest to do this for us:
se_fetch_byte:
sta sound_ptr
sta sound_ptr+1

ldy #$00
.fetch:
lda [sound_ptr], y
cmp #$A0
;do Opcode stuff
.note_length:
;do Note Length stuff
.note:
;do Note stuff
sty sound_temp1 ;save our index into the data stream
asl a
tay
lda note_table, y
lda note_table+1, y
ldy sound_temp1 ;restore data stream index
;check if it's a rest

jsr se_check_rest
.update_pointer:
iny
tya
clc
adc stream_ptr_LO, x
bcc .end
inc stream_ptr_HI, x
.end:
rts

se_check_rest will check to see if the note value is equal to $5E or not. If it is, we will need to tell
the sound engine to silence the stream. If the note isn't equal to $5E, we can go on our merry way.
How will we silence our stream then? This is actually a little complicated. Recall that the stream's
volume (stream_vol_duty) is set in the song's header. se_set_temp_ports copies the value of
stream_vol_duty to soft_apu_ports. If we have se_check_rest modify the stream_vol_duty variable
directly (set it to 0 volume), the old volume value disappears. We won't know what to restore it to
when we are done with our rest. Oh no!
What we will want to do instead is leave stream_vol_duty alone. We will copy it into
soft_apu_ports every frame as usual. Then, after the copy we will check to see if we are currently
resting. If we are, we will make another write soft_apu_ports with a value that will set the volume
to 0. Make sense?
stream_status
To do this we will need to keep track of our resting status in a variable. If our sound engine
encounters a $5E in the data stream, we'll turn our resting status on. If it's not, we'll turn our resting
status off. There are only two possibilities: on or off. Rather than declare a whole new block of
variables and waste six bytes of RAM, let's assign one of the bits in our stream_status variable to be
our rest indicator:
Stream Status Byte

76543210
||
|+- Enabled (0: stream disabled; 1: enabled)
+-- Rest (0: not resting; 1: resting)

Our new subroutine se_check_rest will be in charge of setting or clearing this bit of the status byte:
se_check_rest:
lda [sound_ptr], y ;read the note byte again
cmp #rest ;is it a rest? (==$5E)
bne .not_rest
ora #%00000010 ;if so, set the rest bit in the status byte
bne .store ;this will always branch. bne is cheaper
than a jmp.
.not_rest:
and #%11111101 ;clear the rest bit in the status byte
.store:
sta stream_status, x
rts

Then we modify se_set_temp_ports to check the rest bit and silence the stream if it is set:
se_set_temp_ports:
asl a
asl a
tay

sta soft_apu_ports, y ;vol

lda #$08



;check the rest flag. if set, overwrite volume with silence value

and #%00000010
beq .done ;if clear, no rest, so quit
cmp #TRIANGLE ;if triangle, silence with #$80
beq .tri
lda #$30 ;else, silence with #$30
bne .store ;this will always branch. bne is cheaper than a
jmp.
.tri:
lda #$80
.store:
sta soft_apu_ports, y
.done:
rts

That's it. Now our engine supports rests! They work just like notes, so their lengths are controlled
with note lengths:
song_data: ;this data has two quarter rests in it.

.byte half, C2, quarter, rest, eighth, D4, C4, quarter, B3, rest

Download and unzip the tempo.zip sample files. Make sure the following files are in the same
folder as NESASM3:
tempo.asm
sound_engine.asm
tempo.chr
note_table.i
note_length_table.i
song0.i
song1.i
song2.i
song3.i
song4.i
song5.i
tempo.bat
Double click tempo.bat. That will run NESASM3 and should produce the tempo.nes file. Run that
NES file in FCEUXD SP.

Up: Play
Down: Stop
Right: Next Song/SFX
Left: Previous Song/SFX
Song0 is a silence song. Not selectable. tempo.asm "plays" song0 to stop the music when you
press down. See song0.i to find out how it works.
Song1 is last week's evil sounding series of minor thirds, but much faster now thanks to tempo
settings.
Song2 is the same short sound effect from last week.
Song3 is a simple descending chord progression. We saved some bytes in the triangle data using
note lengths (compare to last week's file)
Song4 is a new song that showcases variable note lengths and rests.
Song5 is a short sound effect. It plays 10 notes extremely fast. Play it over songs and see how it
steals the SQ2 channel from the music.
1) create a song header and song data (use the included songs as reference). Don't forget to add
tempos for each stream in your header. Data streams are terminated with $FF.
sound_engine.asm)
Although not necessary, I recommend keeping your song data in a separate file like I've done with
song0.i, song1.i, song2.i and song3.i. This makes it easier to find the data if you need to edit your
song later. If you do this, don't forget to .include your file.
Nerdy Nights Sound: Part 7: Volume Envelopes
This Week: Volume Envelopes
Volume Envelopes
This week we will add volume envelopes to our engine. A volume envelope is a series of volume
values that are applied to a note one frame at a time. For example, if we had a volume envelope
that looked like this:
F E D C 9 5 0
Then whenever we played a note, it would have a volume of F on the first frame, a volume of E on
the second frame, then D, then C, then 9, then 5 until it is finally silenced with a volume of 0 on the
7th frame. Applying this volume envelope on our notes would give them a sharp, short staccato
feel. Conversely, if we had a volume envelope that looked like this:
1 1 2 2 3 3 4 4 7 7 8 8 A A C C D D E E F F F
Each note would start very quietly and fade in to full volume. Look at this volume envelope:
D D D C B 0 0 0 0 0 0 0 0 6 6 6 5 4 0
Here we start at a high volume (D) and let it ring for 5 frames. Then we silence the note for 8
frames. Then the note comes back at a very low volume for 5 frames. Notes using this volume
envelope would sound like they had an faint echo.
As you can see, volume envelopes are pretty cool. We can get a lot of different sounds out of them.
Let's add them in.
Channels
Volume envelopes are best suited for the square and noise channels where we have full control of
the volume. The triangle channel on the other hand doesn't allow much volume control. It only has
two settings: full blast and off. We can still apply volume envelopes in a limited way though.
Consider these two volume envelopes:
0F 0E 0D 0C 09 05 00
04 04 05 05 06 06 07 07 08 08 09 09 0A 0A 00
These two envelopes would have a vastly different sound on the square channels, but to the triangle
they look like this:
On On On On On On Off
On On On On On On On On On On On On On On Off
You don't get the subtle shifts in volume, but you do get a different length. We can use volume
envelopes that end in 00 to control when the triangle key-off occurs. Not as cool as full volume
control, but still useful.
Defining volume envelopes

First let's define some volume envelopes so we have some data to work with. We'll use some of the
examples from above:
se_ve_1:
.byte $0F, $0E, $0D, $0C, $09, $05, $00
.byte $FF
se_ve_2:
.byte $01, $01, $02, $02, $03, $03, $04, $04, $07, $07
.byte $08, $08, $0A, $0A, $0C, $0C, $0D, $0D, $0E, $0E
.byte $0F, $0F
.byte $FF
se_ve_3:
.byte $0D, $0D, $0D, $0C, $0B, $00, $00, $00, $00, $00
.byte $00, $00, $00, $00, $06, $06, $06, $05, $04, $00
.byte $FF
Notice that I terminated each envelope with $FF. We need some terminator value so the engine will
know when we've reached the end of the envelope. We could have used any value, but $FF is pretty
common.
Next we will make a pointer table that holds the addresses of our volume envelopes:
volume_envelopes:
.word se_ve_1, se_ve_2, se_ve_3

Declaring variables
In order to apply a volume envelope to a particular stream, we will need a variable that tells us
which one to use. We will also need an index variable that tells us our current position within the
volume envelope:
stream_ve .rs 6 ;current volume envelope

stream_ve_index .rs 6 ;current position within the volume
envelope
stream_ve will tell us which volume envelope to use. Code-wise, it will act as an index into our
pointer table so we know where to read from. Sound familiar? It works the same way as "song
number" did for loading a song. We aren't there yet, but here's a peek at how we will use these
variables to read from the volume envelopes. (x holds the stream number):
sty sound_temp1 ;save y because we are about to
destroy it.
lda stream_ve, x ;which volume envelope?

asl a ;multiply by 2 because we are
indexing into a table of addresses (words)
tay
lda volume_envelopes, y ;get the low byte of the address from
the pointer table
sta sound_ptr
lda volume_envelopes+1, y ;get the high byte of the address
sta sound_ptr+1

ldy stream_ve_index, x ;our current position within the
volume envelope.
lda [sound_ptr], y ;grab the value.
;check against $FF (our termination value)
;set the volume
;increment stream_ve_index
;etc

Compare this code to the beginning of the sound_load routine. Are you starting to see a pattern?
Initializing
Whenever we add a new feature, we need to consider how we should initialize it. Every stream in
our music data will potentially have a different volume envelope, so we should add a volume
envelope field to our header. Volume envelopes will deprecate our old "initial volume" field, but we
will still need to have duty cycle info, so we'll just rename that field:
main header:
--------+----------------
--------+----------------
stream headers:
--------+----------------
--------+----------------
01 | status byte
02 | which channel
03 | initial duty (for triangle, set the 7bit)
04 | volume envelope
07 | initial tempo
To read this data from the header, we will have to insert the following code into our sound_load
routine (after reading the duty):
lda [sound_ptr], y ;the stream's volume envelope

sta stream_ve, x
iny

Notes will always start from the beginning of the volume envelope, so we can just initialize
stream_ve_index to 0:
lda #$00

sta stream_ve_index, x

Now we just need to make sure to assign volume envelopes to all the streams in our song data and
we're ready to go:
song5_header:
.byte $01 ;1 stream

.byte SFX_1 ;which stream
.byte $70 ;initial duty (01). Initial volume
deprecated.
.byte $00 ;the first volume envelope (se_ve_1)
.byte $FF ;tempo..very fast tempo
Remember that you can always create descriptive aliases for your volume envelopes if you don't
want to remember which number is which:
;volume envelope aliases

ve_short_staccato = $00
ve_fade_in = $01
ve_blip_echo = $02
song5_header:
.byte $01 ;1 stream

.byte SFX_1 ;which stream
.byte $7F ;initial duty (01). Initial volume
deprecated.
.byte ve_short_staccato ;the first volume envelope (se_ve_1)
.byte $FF ;tempo..very fast tempo

Using aliases is a good idea because the assembler will give you an error if you mistype your alias.
If you mistype your number, and it is still a valid number, the assembler won't know there's a
problem and will assemble it. This kind of bug in your data can be hard to trace.
Implementing Volume Envelopes

To implement volume envelopes, we need to modify the code where we set the volume. Instead of
using a fixed value like we were doing before, we need to read from our current position in the
volume envelope and use that value instead. Our volume code is starting to get a little complicated,
so let's pull it out into its own subroutine. This will make our code easier to follow:
;----------------------------------------------------
; se_set_temp_ports will copy a stream's sound data to the
temporary apu variables
; input:
; X: stream number
se_set_temp_ports:
asl a
asl a
tay

jsr se_set_stream_volume ;let's stick all of our volume code
into a new subroutine
;less cluttered that way

lda #$08


rts
What should our new subroutine se_set_stream_volume do? First it needs to read a value from our
stream's volume envelope. Then it needs to modify the stream's volume using that value. Then we
need to update our position within the volume envelope. Finally it needs to check to see if we are
resting, and silence the stream if we are (we wrote this code last week). It looks something like this
(new code in red):
se_set_stream_volume:
sty sound_temp1 ;save our index into soft_apu_ports
(we are about to destroy y)

tay
the pointer table
sta sound_ptr ;put it into our pointer variable
sta sound_ptr+1

.read_ve:
volume envelope.
cmp #$FF
bne .set_vol ;if not FF, set the volume
dec stream_ve_index, x ;else if FF, go back one and read
again
jmp .read_ve ; FF essentially tells us to repeat
the last
; volume value for the remainder of the note
.set_vol:
sta sound_temp2 ;save our new volume value (about to
destroy A)
lda stream_vol_duty, x ;get current vol/duty settings
and #$F0 ;zero out the old volume
ora sound_temp2 ;OR our new volume in.
ldy sound_temp1 ;get our index into soft_apu_ports

sta soft_apu_ports, y ;store the volume in our temp port
inc stream_ve_index, x ;set our volume envelop index to the
next position
.rest_check:

and #%00000010
beq .tri ;else, silence with #$30
lda #$30
bne .store ;this always branches. bne is
cheaper than a jmp
.tri:
lda #$80
.store:
.done:
rts

After we read a value from our volume envelope, we AND stream_vol_duty with #$F0. This has
the nice effect of clearing the old volume while preserving our squares' duty cycle settings. But we
need to be careful here. Recall that the triangle channel's on/off status is controlled by the low 7
bits of the port:
TRI_CTRL ($4008)
76543210
||||||||
|+++++++- Value
+-------- Control Flag (0: use internal counters; 1: disable
internal counters)
If any of those Value bits are set, the triangle channel will be considered on. Consider what happens
if bit 4, 5 or 6 happen to be set. In this case, ANDing with #$F0 won't turn the triangle channel off.
If the volume we pull from the volume envelope is 0, it won't silence our triangle channel because
bit 4, 5 or 6 will still be set. If we are careful not to set these bits in our song headers, the problem
should never come up. But for completeness we should fix it:
se_set_stream_volume:
sty sound_temp1 ;save our index into soft_apu_ports
(we are about to destroy y)

tay
the pointer table
sta sound_ptr ;put it into our pointer variable
sta sound_ptr+1

.read_ve:
volume envelope.
cmp #$FF
bne .set_vol ;if not FF, set the volume
dec stream_ve_index, x ;else if FF, go back one and read
again
jmp .read_ve ; FF essentially tells us to repeat
the last
; volume value for the remainder of the note
.set_vol:
sta sound_temp2 ;save our new volume value (about to
destroy A)

cpx #TRIANGLE
bne .squares ;if not triangle channel, go ahead
lda sound_temp2
bne .squares ;else if volume not zero, go ahead
(treat same as squares)
lda #$80
bmi .store_vol ;else silence the channel with #$80
.squares:
lda stream_vol_duty, x ;get current vol/duty settings
and #$F0 ;zero out the old volume
ora sound_temp2 ;OR our new volume in.
.store_vol:
ldy sound_temp1 ;get our index into soft_apu_ports
sta soft_apu_ports, y ;store the volume in our temp port
inc stream_ve_index, x ;set our volume envelop index to the
next position
.rest_check:

and #%00000010
beq .tri ;else, silence with #$30
lda #$30
bne .store ;this always branches. bne is
cheaper than a jmp
.tri:
lda #$80
.store:
.done:
rts

New notes
The last thing we need to consider is new notes. When an old note finishes and we start playing a
new note, we will want to reset the volume envelope back to the beginning. This is as easy as
setting stream_ve_index to 0 when we read a new note:
se_fetch_byte:
;...snip... (setup pointers, read byte, test range, etc)
.note:
;do Note stuff
asl a
tay
lda note_table, y
lda note_table+1, y

lda #$00
sta stream_ve_index, x ;reset the volume envelope.

;check if it's a rest and modify the status flag appropriately
jsr se_check_rest

;...snip... (update pointer)

And now we have volume envelopes.
Download and unzip the envelopes.zip sample files. Make sure the following files are in the same
folder as NESASM3:
envelopes.asm
sound_engine.asm
envelopes.chr
note_table.i
note_length_table.i
vol_envelopes.i
song0.i
song1.i
song2.i
song3.i
song4.i
song5.i
envelopes.bat
Double click envelopes.bat. That will run NESASM3 and should produce the envelopes.nes file.
Run that NES file in FCEUXD SP.

Up: Play
Down: Stop
Right : Next Song/SFX
Left : Previous Song/SFX
Song0 is a silence song. Not selectable.

Song1 is a boss song from The Guardian Legend, almost the same as the original.
Song2 is the same short sound effect from last week.
Song3 is a song from Dragon Warrior, very close to the original.
Song4 is the same song4 as last week, but volume envelopes allow us to save some bytes by
reducing rests.
Song5 is a short sound effect, same as last week.
1) create a song header and song data (use the included songs as reference). Don't forget to select a
volume envelope for each stream in your header. Data streams are terminated with $FF.
sound_engine.asm)
Try making your own volume envelopes too. To do so you will need to modify vol_envelopes.i.
Remember that volume envelopes are terminated with $FF.
Nerdy Nights Sound: Part 8: Opcodes and Looping
This Week: Opcodes and Looping
Opcodes
So far our sound engine handles two type of data that it reads from music data streams: notes and
note lengths. This is enough to write complex music but of course we are going to want more
features. We will want control over the sound of our notes. What if we want to change duty cycles
midstream? Or volume envelopes? Or keys? What if we want to loop one part of the song four
times? Or loop the entire song continuously? What if we want to play a sound effect as part of a
song?
All of these types of features, features where you are issuing commands to the engine, are going to
be done through opcodes (also called control codes or command codes). An opcode is a value in
the data stream that tells the engine to run a specific, specialized subroutine or piece of code. Most
opcodes will have arguments sent along with them. For example, an opcode that changes a
stream's volume envelope will come with an argument that specifies which volume envelope to
change to.
We've actually been using an opcode for weeks, I just haven't mentioned it. It's the opcode that
ends a sound, and we've been encoding it in our data streams as $FF. Here is the code we've been
using:
se_fetch_byte:
;---snip--- (fetch a byte and range test)

;do Opcode stuff
cmp #$FF
bne .end
lda stream_status, x ;if $FF, end of stream, so disable it and
silence
and #%11111110
sta stream_status, x ;clear enable flag in status byte

cmp #TRIANGLE
beq .silence_tri ;triangle is silenced differently from
squares and noise
lda #$30 ;squares and noise silenced with #$30
bne .silence
.silence_tri:
lda #$80 ;triangle silenced with #$80
.silence:
sta stream_vol_duty, x ;store silence value in the stream's
volume variable.
jmp .update_pointer ;done

;---snip--- (do note lengths and notes, update the stream's
pointer)
rts

Here we check if the byte read has a value of $FF. If so we turn the stream off and silence it. That's
an opcode.
It would be pretty messy if every opcode we had was just written straight out like this. Normally
we would pull this code into its own subroutine, like this:
se_fetch_byte:
;---snip--- (fetch a byte and range test)

;do Opcode stuff
cmp #$FF ;end sound opcode
bne .end
jsr se_op_endsound ;call the endsound subroutine
iny
jmp .fetch ;grab the next byte in the stream.

;---snip--- (do note lengths and notes, update the stream's
pointer)
rts

se_op_endsound:
lda stream_status, x ;end of stream, so disable it and silence
and #%11111110

cmp #TRIANGLE
squares and noise
bne .silence
.silence_tri:
.silence:
volume variable.
rts

The .opcode branch is much shorter now. If we wanted to add more opcodes, we could just add
some more compares:
.opcode:
;do Opcode stuff
cmp #$FF ;is it the end sound opcode?
bne .not_FF
jsr se_op_endsound ;if so, call the end sound subroutine
jmp .end ;and finish
.not_FF:
cmp #$FE ;else is it the loop opcode?
bne .not_FE
jsr se_op_loop ;if so, call the loop subroutine
jmp .opcode_done
.not_FE:
cmp #$FD ;else is it the change volume envelope
opcode?
bne .not_FD
jsr se_op_change_ve ;if so, call the change volume envelope
subroutine
jmp .opcode_done
.not_FD:
.opcode_done:
iny ;update index to next byte in the data stream
jmp .fetch ;go fetch another byte

This will work, but it's ugly. The more opcodes we add to our engine, the more checks we need to
make. What if we have 20 opcodes? Do we really want to do that many compares? It's a waste of
ROM space and cycles.
Tables
Anytime you find yourself in a situation where you are doing a lot of CMPs on one value, the
answer is to use a lookup table. It will simplify everything! We've done it already with notes, note
lengths, song numbers and volume envelopes. Could you imagine trying to get a note's period
without using the lookup table? It would look like this:
Is the note an A1? If so, use this period, else

Is the note an A#1? If so, use this period, else
Is the note a B1? If so, use this period, else
Is the note a C2? If so, use this period, else
... (about 100 more checks)
Is the note an F#9? If so, use this period, else
Is the note a rest? If so, use this period
That's just crazy. It would be hundreds of lines of unreadable code and you'd run into branch-range
errors too. When we use a lookup table, the code is simplified to this:
.note:
;do Note stuff
sty sound_temp1 ;save our index
asl a
tay
lda note_table, y
lda note_table+1, y

Much cleaner. Again, I can't stress it enough: if you find yourself doing lots of CMPs on a single
value, use a table instead!

With notes and note lengths we used a straight lookup table of values. With song numbers and
volume envelopes we used a special type of lookup table called a pointer table, which stored data
addresses. For opcodes we have two choices. We can use something called a jump table or we can
use an RTS table. They are almost the same and the difference in performance between the two
methods is negligible so for most programmers it's a matter of personal preference.
I prefer RTS tables myself, but we're going to use jump tables because they are easier to explain and
understand.
Jump Tables
Ok, here's our problem: Our sound engine has opcodes. A lot of them, let's say 10 or more. Each
opcode has its own subroutine. When our sound engine reads an opcode byte from the data stream,
we want to avoid a long list of CMP and BNE instructions to select the right subroutine. How do
we do that? We use a jump table.
A jump table is similar to a pointer table: it is a table of addresses. But whereas a pointer table
holds addresses that point to the start of data, a jump table holds addresses that point to the start of
code (ie, the start of subroutines). For example, suppose we have some subroutines:
sub_a:
lda #$00
ldx #$FF
rts

sub_b:
clc
adc #$03
rts

sub_c:
sec
sbc #$03
rts

Here is how a jump table would look using these subroutines:
sub_jump_table:
.word sub_a, sub_b, sub_c

Hey, that's pretty easy. We just use the subroutine label and the assembler will translate that into the
address where the subroutine starts. Let's make a jump table for our sound opcode subroutines:
se_op_endsound:
;do stuff
rts

se_op_infinite_loop:
;do stuff
rts

se_op_change_ve:
;do stuff
rts

;etc.. more subroutines
;this is our jump table

sound_opcodes:
.word se_op_endsound
.word se_op_infinite_loop
.word se_op_change_ve
;etc, one entry per subroutine

Cool. We have a jump table now. So how do we use it?
Indirect Jumping
The 6502 let's us do some cool things. One of those things is called an indirect jump. An indirect
jump let's you stick a destination address into a zero-page pointer variable and jump there. It works
like this:
.rsset $0000

;first declare a pointer variable somewhere in the zero-page
jmp_ptr .rs 2 ;2 bytes because an address is always a word
lda #$00

sta jmp_ptr
lda #$80
sta jmp_ptr+1
jmp [jmp_ptr] ;will jump to $8000

Here we stick an address ($8000, lo byte first) into our jmp_ptr variable. Then we do an indirect
jump by using the JMP instruction followed by a pointer variable in brackets:
jmp [jmp_ptr] ;indirect jump

This instruction translates into English as "Jump to the address that is stored in jmp_ptr and
jmp_ptr+1". It's extrememly useful. We can stick any address we want in there:

lda #$00
sta jmp_ptr
lda #$C0
sta jmp_ptr+1
jmp [jmp_ptr] ;will jump to $C000

We could read an address from ROM and use that if we wanted to, for example our reset vector:

lda $FFFC
sta jmp_ptr
lda $FFFD
sta jmp_ptr+1
jmp [jmp_ptr] ;will jump to our reset routine

And we can use it in combination with our jump table:
lda sound_opcodes, y ;read low byte of address from jump table
sta jmp_ptr
lda sound_opcodes+1, y ;read high byte
sta jmp_ptr+1
jmp [jmp_ptr] ;will jump to whatever address we pulled from the
table.

Pretty powerful. We can dynamically jump to any section of code we want!

Implementation
So we know how to build a jump table and we know how to do an indirect jump. Let's tie it all
together and stick it into our sound engine. Let's start with se_fetch_byte. se_fetch_byte reads a
byte from the data stream and range-checks it to see if it is a note, note length or opcode. Recall
that notes have a byte range of $00-$7F. Note lengths have a range of $80-$9F. The opcode byte
range is $A0-$FF:
se_fetch_byte:
sta sound_ptr
sta sound_ptr+1

ldy #$00
lda [sound_ptr], y
cmp #$A0
.opcode: ;else ($A0-$FF) it's an opcode
;do Opcode stuff
.note_length:
.note:
;do note stuff

So we need to assign our opcodes to values between $A0 and $FF. Just as with notes and note
lengths, the opcode byte we read from the data stream will be used as a table index (after
subtracting $A0), so we will assign our opcodes in the same order as our table:
sound_opcodes:
.word se_op_endsound ;this should be $A0
.word se_op_infinite_loop ;this should be $A1
.word se_op_change_ve ;this should be $A2
;etc, 1 entry per subroutine
;these are aliases to use in the sound data.

endsound = $A0
loop = $A1 ;be careful of conflicts here. this might
be too generic. maybe song_loop is better
volume_envelope = $A2
Now let's alter se_fetch_byte to take care of our opcodes:
se_fetch_byte:
sta sound_ptr
sta sound_ptr+1

ldy #$00
.fetch:
lda [sound_ptr], y
cmp #$A0
;do Opcode stuff
jsr se_opcode_launcher ;launch our opcode!!!
iny ;next position in the data stream
and #%00000001
bne .fetch ;after our opcode is done, grab another
byte unless the stream is disabled
rts ; in which case we quit (explained
below)
.note_length:
.note:
;do note stuff

I added a call to a subroutine called se_opcode_launcher and a little branch. Not a big change is it?
But there's an important detail here. se_opcode_launcher will be a short, simple subroutine that will
read from the jump table and perform an indirect jump. It looks like this:
se_opcode_launcher:
sty sound_temp1 ;save y register, because we are about to
destroy it
sec
sbc #$A0 ;turn our opcode byte into a table index
by subtracting $A0
; $A0->$00, $A1->$01, $A2->$02, etc. Tables index
from $00.
asl a ;multiply by 2 because we index into a
table of addresses (words)
tay
lda sound_opcodes, y ;get low byte of subroutine address
sta jmp_ptr
lda sound_opcodes+1, y ;get high byte
sta jmp_ptr+1
ldy sound_temp1 ;restore our y register
iny ;set to next position in data stream
(assume an argument)
jmp [jmp_ptr] ;indirect jump to our opcode subroutine

Short and simple. So why did I wrap this code in its own subroutine? Why not just stick this code
as-is in the .opcode branch of se_fetch_byte? Because we need a place to return to.
The JSR and RTS instructions work as a pair. They go hand in hand. They need each other.
Without going into too much detail, this is what goes on behind the scenes:
JSR sticks a return address on the stack and jumps to a subroutine. One way to look at it is to think
of JSR as a JMP that remembers where it started from.
RTS pops the return address off the stack and jumps there.
So JSR leaves a treasure map for RTS to pick up and follow later. The key point here is that RTS
expects a return address to be waiting for it on the stack.
Now our opcode subroutines all end in an RTS instruction. Do you see the potential problem here?
We call our opcode subroutines using an indirect jump. This requires us to use a JMP instruction,
not a JSR instruction. A JMP instruction doesn't remember where it started from. No return address
is pushed onto the stack with a JMP instruction. So when we jump to our opcode subroutine and hit
the RTS instruction at the end, there is no return address waiting for us! The RTS will pull
whatever random values happen to be on the stack at the time and jump there. We'll end up
somewhere random and our program will surely crash!
To fix this, we wrap our indirect jump in a subroutine, se_opcode_launcher. We call it with a JSR
instruction, completing the JSR/RTS pair:
jsr se_opcode_launcher ;this jsr will let us remember where we

came from

This JSR instruction will stick a return address on the stack for us. Then inside se_opcode_launcher
we perform our indirect jump to our desired opcode subroutine. Now when we hit that RTS
instruction at the end of the opcode subroutine we have a return address waiting for us on the stack.
Our program returns back to where we started. We are safe.
Opcode Subroutines
With our opcode launcher written, we are all set up to make opcodes. We already have one written:
the endsound opcode. This is the opcode we will use to terminate sound effects. Sound effects
don't loop continuously like songs do, so they need to be stopped. Let's take a look again:
se_op_endsound:
lda stream_status, x ;end of stream, so disable it and silence
and #%11111110

cmp #TRIANGLE
squares and noise
bne .silence ; (this will always branch. bne is
cheaper than a jmp)
.silence_tri:
.silence:
volume variable.
rts

This opcode is special. It's the reason for the check after the call to se_opcode_launcher:
se_fetch_byte:
;---snip---
;do Opcode stuff
jsr se_opcode_launcher
iny ;next position in the data stream
and #%00000001
bne .fetch ;after our opcode is done, grab another
byte unless the stream is disabled
rts ; in which case we quit (explained
below)

;---snip---

Normally, we want se_fetch_byte to keep fetching bytes until it hits a note. Recall that with note
lengths we jumped back to .fetch after setting the new note length. This is because after setting the
length of the note, we needed to know WHAT note to play. So we fetch another byte. The same
thing is true of opcodes. If we change the volume envelope with an opcode, great! But we still
need to know what note to play next. If we use an opcode to switch our square's duty cycle, great!
But we still need to know what note to play next. If we use an opcode to loop back to the beginning
of the song, that's great! But we still need to read that first note of the song. This is why we jump
back to fetch a byte after we run an opcode.
The ONE exception to this rule is when we end a sound effect. We are terminating the sound effect
completely, so there is no next note. We don't want to fetch something that isn't there, so we need to
skip the jump. That's why we check the status byte after we run the opcode. If the stream is
disabled by the endsound opcode, we are finished. Otherwise, fetch another byte.
Looping
The next opcode in our list is the loop opcode. This is the opcode that we will stick at the end of
every song to tell the sound engine to play the song again, and again and again. It is actually quite
easy to implement. It takes a 2-byte argument, which is the address to loop back to. The
subroutine looks like this:
lda [sound_ptr], y ;read LO byte of the address argument
sta stream_ptr_LO, x ;save as our new data stream position
iny
lda [sound_ptr], y ;read HI byte of the address argument
sta stream_ptr_HI, x ;save as our new data stream position
data stream position

sta sound_ptr+1 ;update the pointer to reflect the new
position.
sta sound_ptr
ldy #$FF ;after opcodes return, we do an iny.
Since we reset
;the stream buffer position, we will want y to start
out at 0 again.
rts

The first thing to notice about this subroutine is that it reads two bytes from the data stream. This is
the address argument that gets passed along with the opcode.
To make it clear, let's look at some example sound data:
song1_square1:
.byte eighth ;set note length to eighth notes
.byte C5, E5, G5, C6, E6, G6, C5, Eb5, G5, C6, Eb6, half,
G6 ;play some notes
.byte loop ;this alias evaluates to $A1, the loop opcode
.word song1_square1 ;this evaluates to the address of the
song1_square1 label
;ie, the address we want to loop to.

After the "loop" opcode comes a word which is the address to loop back to. In this example I chose
to loop back to the beginning of the stream data.
So what does our loop opcode do? It reads the first byte of this address argument (the low byte) and
stores it in stream_ptr_LO. Then it reads the second byte of the address argument (the high byte)
and stores it in stream_ptr_HI. These are the variables that keep track of our data stream position!
The loop opcode just changes these values to some address that we specify. Not too complicated at
all. The last step is to update the actual pointer (sound_ptr) so that the next byte we read from the
data stream will be the first note we looped back to.
In the example sound data above I looped back to the beginning of the stream data, but there's
nothing stopping me from looping somewhere else:
song1_square1:
;intro, don't loop this part
.byte quarter
.byte C4, C4, C4, C4
.loop_point: ;this is where we will loop back to.
.byte eighth ;set note length to eighth notes
.byte C5, E5, G5, C6, E6, G6, C5, Eb5, G5, C6, Eb6, half, G6
.byte loop ;this alias evaluates to $A1, the loop opcode
.word .loop_point ;this evaluates to the address of
the .loop_point label
;ie, the address we want to loop to.

Technically we can also "loop" to a forward position, in which case it's actually more like a jump
than a loop. That's all a loop is really: a jump... backwards.
Changing Volume Envelopes

Let's write the opcode subroutine to change volume envelopes. This one is even easier.
It takes one argument, which will be which volume envelope to switch to:
se_op_change_ve:
lda [sound_ptr], y ;read the argument
sta stream_ve, x ;store it in our volume envelope variable
lda #$00
sta stream_ve_index, x ;reset volume envelope index to the
beginning
rts

That's it!
Changing Duty Cycles

Now let's add an opcode that will change the duty cycle for a square stream. This one also takes
one argument: which duty cycle to switch to.
se_op_duty:
lda [sound_ptr], y ;read the argument (which duty cycle to
change to)
sta stream_vol_duty, x ;store it.
rts

Done! Now we have the subroutine, but we still need to add it to our jump table:
sound_opcodes:
.word se_op_endsound ;this should be $A0
.word se_op_loop ;this should be $A1
.word se_op_change_ve ;this should be $A2
.word se_op_duty ;this should be $A3
;etc, 1 entry per subroutine

endsound = $A0
loop = $A1
duty = $A3
And it's ready to use:
song0_square1:
;intro, don't loop this part
.byte quarter
.byte C4, C4, C4, C4
.loop_point: ;this is where we will
loop back to.
.byte duty, $B0 ;change the duty cycle
.byte volume_envelope, ve_blip_echo ;change the volume envelope
.byte eighth ;set note length to eighth
notes
.byte C5, E5, G5, C6, E6, G6 ;play some notes

.byte duty, $30 ;change the duty cycle
.byte volume_envelope, ve_short_staccato ;change volume
envelope
.byte C5, Eb5, G5, C6, Eb6, half, G6 ;play some eighth notes
and a half note

.byte loop ;loop to .loop_point
.word .loop_point

Readability
sound_engine.asm is getting pretty bulky with all these subroutines. It will only get bigger as we
add more opcodes. It's nice to have all of our opcodes together in one place, but it's annoying to
have to scroll around to find them. So let's pull all of our opcodes into their own file:
sound_opcodes.asm. Then, at the bottom of sound_engine.asm, we can .include it:
.include "sound_opcodes.asm" ;our opcode subroutines, jump table and aliases
.include "note_table.i" ;period lookup table for notes
.include "note_length_table.i"
.include "vol_envelopes.i"
.include "song0.i" ;holds the data for song 0 (header and data streams)
.include "song1.i" ;holds the data for song 1
.include "song2.i"
.include "song3.i"
.include "song4.i"
.include "song5.i"
.include "song6.i" ;oooh.. new song!

I gave it the extension .asm because it contains code as well as data, and I like to be able to tell at a
glance what files have what in them. Now whenever we want to add new opcodes, or tweak old
ones, we have them nice and compact in their own file.
Updating Sound Data

Whenever we add new things to our sound engine, we have to think about how it will affect our old
sound data. This week we added opcodes, which will change our songs and sound effects
terminate. Before we were terminating them with $FF. This won't work anymore because $FF
doesn't do anything. For songs, we should terminate with "loop" followed by an address to loop to.
With sound effects we should terminate with the opcode "endsound". See the included songs and
sound effects for examples.
RTS Tables
We talked about jump tables and indirect jumping this week. Another method for doing the same
thing involves something called an RTS table and the RTS Trick. I won't cover it in these
tutorials, but if you are curious to know how this works you can read this nesdev wiki article I wrote
about the RTS Trick.

Download and unzip the opcodes.zip sample files. Make sure the following files are in the same
folder as NESASM3:
opcodes.asm
sound_engine.asm
sound_opcodes.asm
opcodes.chr
note_table.i
note_length_table.i
vol_envelopes.i
song0.i
song1.i
song2.i
song3.i
song4.i
song5.i
song6.i
opcodes.bat
Double click opcodes.bat. That will run NESASM3 and should produce the opcodes.nes file. Run

Up: Play
Down: Stop
Right: Next Song/SFX
Left: Previous Song/SFX

Song1 is a boss song from The Guardian Legend. Now it loops!
Song2 is the same short sound effect from last week. Terminated with endsound.
Song3 is a song from Dragon Warrior. Now it loops!
Song4 is the same song4 as last week, but now it loops!
Song5 is a short sound effect, terminated with the endsound opcode.
Song6 should be familiar to readers of this forum. Do you recognize it? It utilizes opcodes for
changing duty cycles and volume envelopes. Plus it loops!
Try adding your own songs and sound effects in. Try to add your own opcodes too. Here's some
ideas for opcodes:
1. Trigger a sound effect mid-song

2. Implement duty cycle envelopes (similar to volume envelopes). Then make an opcode that
allows you to change it.
3. Finite loops
Finite Loops, Key Changes, Chord Progressions
This Week: More opcodes: Finite Loops, Key Changes, Chord Progressions
Opcodes
Last week we learned how to use opcodes. Opcodes allow a song's streams to call a subroutine
mid-play. This is a very powerful tool. We learned some of the most common opcodes: infinite
loop (really a jump), change volume envelopes and change duty cycles. Today we are going to
expand on opcodes and learn some cool opcode tricks that can save us a lot (!) of bytes and time.
Finite Looping
Last week we added the infinite loop opcode, which was really just an unconditional jump back to
an earlier part of the song. Today we're going to add a finite loop opcode. A finite loop opcode tells
the sound engine to repeat a particular section of a song X times, where X is some number defined
by you. In the Battle Kid theme song I added last week there is a passage that looks like this:
.byte sixteenth

.byte A3, C4, E4, A4, A3, C4, E4, A4, A3, C4, E4, A4, A3, C4, E4,
A4
A4
A4
.byte A3, C4, E4, A4, A3, E4, E3, E2

This is really just the same 4 notes repeated over and over again. Wouldn't it be cooler if we could
do something like this instead:
.byte sixteenth

.byte A3, C4, E4, A4
.byte loop_13_times_please
.byte A3, E4, E3, E2

That saves a lot of bytes. We go from 56 bytes all the way down to around 10. The Battle Kid song
actually plays this same phrase on both square channels, so really we go from 100+ bytes down to
20 or so. That's a big deal! If we consider how common repetitions of 4 or 8 occur in music, we
can easily see that having a finite loop opcode could potential save us hundreds if not thousands of
bytes in our sound data.
Finite Looping?
So what is a finite loop really? We saw that with an infinite loop it was really more like an
unconditional jump. When the sound engine hits the infinite loop opcode, it jumps back, always, no
matter what, no questions asked. A finite loop on the other hand is a conditional jump. It checks a
counter. If the counter isn't 0 it jumps. If it is 0, it doesn't jump.
Loop Counter
First things first we need a loop counter. Each stream will have the ability to loop, so each stream
will need its own loop counter:
stream_loop1 .rs 6 ;loop counter variable (one for each stream)
We will want to initialize this to 0 in our sound_load code:
lda #$00

sta stream_loop1, x
Next we will need a way to set this counter to some value. Some games bundle this up together in
the finite loop opcode, but I prefer to make it its own opcode:
;-----------------------------------------------------------------
------
;this is our JUMP TABLE!
sound_opcodes:
.word se_op_endsound ;$A0
.word se_op_infinite_loop ;$A1
.word se_op_change_ve ;$A2
.word se_op_duty ;$A3
.word se_op_set_loop1_counter ;$A4

endsound = $A0
loop = $A1
duty = $A3
set_loop1_counter = $A4

se_op_set_loop1_counter:
lda [sound_ptr], y ;read the argument (# times to loop)
sta stream_loop1, x ;store it in the loop counter variable
rts
Now we have an easy way to set the loop counter any time we want, like this:
;somewhere in sound data:

.byte set_loop1_counter, $04 ;repeat 4 times
Looping With The Counter
Our finite loop opcode will work like the infinite loop opcode, with two changes:
1) it will decrement the loop counter

2) it will check the result and only jump on a non-zero result
Let's write it:
;-----------------------------------------------------------------
------
sound_opcodes:
.word se_op_loop1 ;$A5

endsound = $A0
loop = $A1
duty = $A3
loop1 = $A5

se_op_loop1:
dec stream_loop1, x ;decrement the counter
lda stream_loop1, x ;and check it
beq .last_iteration ;if zero, we are done looping
.loop_back:
lda [sound_ptr], y ;read ptr LO from the data stream
sta stream_ptr_LO, x ;update our data stream position
iny
lda [sound_ptr], y ;read ptr HI from the data stream
sta stream_ptr_HI, x ;update our data stream position

position.
sta sound_ptr
Since we reset
out at 0 again.
rts
.last_iteration:
iny ;skip the first byte of the address
argument
; the second byte will be skipped automatically upon
return
; (see se_fetch_byte. There is an "iny" after "jsr
se_opcode_launcher")
rts

Now we can loop. To use the Battle Kid example above, we go from this (56 bytes):
A4
A4
A4
.byte A3, C4, E4, A4, A3, E4, E3, E2

to this (13 bytes):

.byte set_loop1_counter, 13 ;repeat 13 times.
.intro_loop: ;make sure our loop point is AFTER
we set the counter!
.byte A3, C4, E4, A4 ;the phrase to repeat.
.byte loop1 ;finite loop opcode
.word .intro_loop ;address to jump back to

.byte A3, E4, E3, E2 ;the last 4 notes

Pretty nice savings. Chances are we will be using this opcode set a lot.
Bonus
We can save a few more bytes here. You may have noticed that the code in the .loop_back section
of our finite loop opcode is identical to the infinite loop code:
se_op_loop1:

;---snip---

.loop_back:
iny

position.
sta sound_ptr
Since we reset
out at 0 again.
rts
;---snip---
Compare with:
iny

position.
sta sound_ptr
Since we reset
out at 0 again.
rts
Why have identical code in two places? Let's cut out the whole .loop_back section and replace it
with a "jmp se_op_infinite_loop":
se_op_loop1:
dec stream_loop1, x ;decrement the counter
lda stream_loop1, x ;check the counter
beq .last_iteration ;if zero, we are done looping
jmp se_op_infinite_loop ;if not zero, loop back
.last_iteration:
iny ;skip the first byte of the address
argument
; the second byte will be skipped automatically upon
return
; (see se_fetch_byte after "jsr se_opcode_launcher")
rts

Multiple Finite Loops
You may have been wondering why I named the finite loop opcode "loop1". Why stick a 1 on the
end there? This is because sometimes one finite loop opcode isn't enough. Consider the following
song structure. Assume each letter represents a long series of notes:
A A A B C
A A A B C
A A A B C
A A A B C

With one finite loop opcode you could reduce it to this:
(A A A B C)x4

But if you had two finite loop opcodes available, you could nest them to reduce it even further:
(Ax3 B C)x4

If the music you write has a lot of patterns like this, it may be worth your while to have two or more
finite loop opcodes available to you so that you can nest them. To add another finite loop opcode
you need to:
1) declare another loop counter variable block in RAM (stream_loop2 .rs 6)

2) initialize the new loop counter to 0 in the sound_load routine.
3) add a new opcode for setting the new loop counter (se_op_set_loop2_counter)
4) add a new opcode to check the new counter and loop (se_op_loop2)
5) make sure to add the new opcodes to the jump table and give them an alias (set_loop2_counter,
loop2).
Each finite loop opcode you add requires 6 bytes of RAM (a limited resource!), so please consider
carefully if it is worth the tradeoff. It all depends on your music data.
Changing Keys
Another useful feature to have is the ability to change keys. Imagine you write a song and you have
it all done. Then at the last minute you decide you want it to be in another key, say a step (2 notes)
lower. Rather than rewrite the whole song by hand (it takes forever), wouldn't it be nice if there was
an opcode that you could set to automatically subtract two from every note? What if you have a
song pattern that gets played in more than one key (a rhythm track for a Blues song, for example)?
We could save lots of bytes if we can figure out a way to write the pattern once, and then loop it
while changing keys each iteration. Let's do it.
Note Offset
We will implement keys by having a note offset variable:
stream_note_offset .rs 6 ;note offset
The note offset is a value that gets added to the note value before pulling the period out of the
note_table. We will initialize stream_note_offset to 0 so that the default behavior is to add 0 to the
note (resulting in no change). However, if we set stream_note_offset to some value via an opcode,
it will change the notes. Here is an updated se_fetch_byte that demonstrates how this works:
se_fetch_byte:
;...snip...
.note:
;do Note stuff
clc
adc stream_note_offset, x ;add note offset
asl a
tay
lda note_table, y
lda note_table+1, y
;...snip...

Imagine what would happen if we have stream_note_offset set to 2. Say we read a C4 note from
the data stream:

1. A C4 note is equivalent to hex value #$1b (see aliases in note_table.i)
2. we add stream_note_offset to this value. #$1b + #$02 = #$1d.
3. hex value #$1d is equivalent to a D4 note (see note_table.i)
4. wow, we raised the note up a step!
Using the same value for stream_note_offset, if we had a string of notes like this:
C4, E4, G4, B4, C5, E5, G5, E5, B5, C6 ;Cmaj7
it would get translated to:
D4, Fs4, A4, C#5, D5, Fs5, A5, C#6, D6 ;Dmaj7
Using stream_note_offset we can easily transpose entire sections of music into other keys. As
mentioned above, we will initialize a stream's stream_note_offset to zero:
sound_load:
;---snip---
lda #$00
sta stream_note_offset, x
;---snip---

Set Note Offset
Now let's make an opcode that will set stream_note_offset to a specific value:
;-----------------------------------------------------------------
------
sound_opcodes:
.word se_op_set_note_offset ;$A6

endsound = $A0
loop = $A1
duty = $A3
loop1 = $A5
set_note_offset = $A6
se_op_set_note_offset:
lda [sound_ptr], y ;read the argument
sta stream_note_offset, x ;set the note offset.
rts
Now we can set the note offset anytime we want in the data stream:
;oops, after writing the song, I realized I wanted it to be in D

instead. No problem.
sound_data:
.byte set_note_offset, 2
.byte C2, C3, C4, C5, ;etc.. more notes in the key of C.

Adjust Note Offset
Setting the note offset to a specific value has very limited application. It's like a one-time
keychange. More often we will want to set the note offset to some relative value. For example,
instead of setting stream_note_offset to 2, we might want to set stream_note_offset to "the current
offset + 2". If we had an opcode that let us adjust stream_note_offset by a relative value, we could
use it together with loops. First let's write the opcode:
;-----------------------------------------------------------------
------
sound_opcodes:
.word se_op_adjust_note_offset ;$A7

endsound = $A0
loop = $A1
duty = $A3
loop1 = $A5
adjust_note_offset = $A7
se_op_adjust_note_offset:
lda [sound_ptr], y ;read the argument (what value to
add)
clc
adc stream_note_offset, x ;add it to the current offset
sta stream_note_offset, x ;and save.
rts

Let's look at this opcode in use. Say we have a long arpeggiated line like this:
C2, E2, G2, B2, C3, E3, G3, B3, C4, E4, G4, B4, C5, E5, G5, B5,
C6, E6, G6, B6, C7 ;Cmaj7 (21 bytes)
This passage just repeats the same 4 notes (C E G B) over 5 octaves.
.byte set_loop1_counter, 5 ;loop 5 times

.loop
.byte C2, E2, G2, B2 ;these are the 4 notes to loop
.byte adjust_note_offset, 12 ;each iteration add 12 to the
offset (ie, go up an octave)
.byte loop1
.word .loop

.byte C2 ;will be a C7. Cmaj7 (12 bytes)

The first time through the loop it will play C2, E2, G2, B2. The second time through the loop it will
play C3, E3, G3, B3. The third time through will be C4, E4, G4, B4, etc. Using our opcodes, we
reduce the size of our data from 21 bytes to 12 bytes. That's almost 50% savings.
Battle Kid
To take a better example, let's look at the bassline to the Battle Kid theme song. Last week, it
looked like this:
song6_tri:
.byte eighth
.byte A3, A3, A4, A4, A3, A3, A4, A4
.byte G3, G3, G4, G4, G3, G3, G4, G4 ;down a step (-2)
.byte F3, F3, F4, F4, F3, F3, F4, F4 ;down a step (-2)
.byte Eb3, Eb3, Eb4, Eb4, Eb3, Eb3, Eb4, Eb4 ;down a step (-2)
.byte loop
.word song6_tri
;36 bytes

We have a pattern here: X3, X3, X4, X4, X3, X3, X4, X4, where X = some note. It just so
happens that each new X is just the previous X minus 2. Using our new opcode, we can rewrite the
bassline like this:
song6_tri:
.byte eighth
.byte set_loop1_counter, 4 ;repeat 4 times
.loop:
.byte A3, A3, A4, A4, A3, A3, A4, A4 ;series of notes to
repeat
.byte adjust_note_offset, -2 ;go down a step
.byte loop1
.word .loop

.byte set_note_offset, 0 ;after 4 repeats, reset
note offset to 0.
.byte loop ;infinite loop
.word song6_tri
;21 bytes

We drop from 36 bytes to 21 bytes of ROM space. About 40% savings!

Loopy Sound Effects
We can produce some cool sound effects if we combine loops and key changes at high tempos.
Look at this one (tempo is $FF):
song7_square2:
.byte set_loop1_counter, $08 ;repeat 8 times
.loop:
.byte thirtysecond, D7, D6, G6 ;play two D notes at different
octaves and a G. Pretty random
.byte adjust_note_offset, -4 ;go down 2 steps
.byte loop1
.word .loop
.byte endsound

This sound effect plays a simple 3-note pattern in descending keys super fast. The sound data is
only 12 bytes, but it produces a pretty complex sound effect. Listen to song7 in this week's sample
files to hear it. By experimenting with loops like this we can come up with some sounds that would
be difficult to compose by hand.

Complex Chord Progressions
We made some good savings percentage-wise on the bassline to Battle Kid. But we were lucky.
The chord progression went down in consistent steps: -2, -2, -2. It was possible to loop this because
we adjust the note_offset by the same value (-2) each time. But what if we had a pattern that was
repeated in a more complicated way? We do. Let's look at the rhythm pattern for our Guardian
Legend boss song:
song1_square1:
.byte eighth
.byte A2, A2, A2, A3, A2, A3, A2, A3
.byte F3, F3, F3, F4, F3, F4, F3, F4 ;+8 (A2 + 8 = F3)
.byte A2, A2, A2, A3, A2, A3, A2, A3 ;-8
.byte F3, F3, F3, F4, F3, F4, F3, F4 ;+8
.byte E3, E3, E3, E4, E3, E4, E3, E4 ;-1
.byte E3, E3, E3, E4, E3, E4, E3, E4 ;+0
.byte Ds3, Ds3, Ds3, Ds4, Ds3, Ds4, Ds3, Ds4 ;-1
.byte D3, D3, D3, D4, D3, D4, D3, D4 ;-1
.byte C3, C3, C3, C4, C3, C4, C3, C4 ;-2
.byte B2, B2, B2, B3, B2, B3, B2, B3 ;-1
.byte As2, As2, As2, As3, As2, As3, As2, As3 ;-1
.byte A2, A2, A2, A3, A2, A3, A2, A3 ;-1
.byte Gs2, Gs2, Gs2, Gs3, Gs2, Gs3, Gs2, Gs3 ;-1
.byte G2, G2, G2, G3, G2, G3, G2, G3 ;-1
.byte loop ;+2 (loop back to
A2)
.word song1_square1

Here we have another pattern: Xi, Xi, Xi, Xi+1, Xi, Xi+1, Xi, Xi+1, where X = some note and i =
some octave. Cool. A pattern means we have an opportunity to save bytes by looping. But wait.
Unlike Battle Kid, this pattern jumps around in an inconsistent way. What should we do?
Super TGL Transposition Trick

I learned this trick from The Guardian Legend, so I call it the TGL Transposition Trick. What we
do is we loop the pattern, and then use the loop counter as an index into a lookup table. The lookup
table contains note offset values. Because the loop counter decrements, our lookup table will be
sequentially backwards.
Wait, what?
Let's looks at our example:
song1_square1:
.byte eighth
.loop:
.byte A2, A2, A2, A3, A2, A3, A2, A3
;pull a value from lookup_table and
; add it to stream_note_offset
.byte loop1 ;finite loop (14 times)
.word .loop

.word song1_square1

.lookup_table:
.byte 2, -1, -1, -1, -1, -1, -2
.byte -1, -1, 0, -1, 8, -8, 8 ;14 entries long, reverse
order

I'm going to break it down in a second here, but first let me tell you that the part highlighted in red
above will be covered by a single opcode, transpose. The transpose opcode takes a 2-byte
argument, so altogether that commented section will be replaced with 3 bytes of data. So if we
count up all of the bytes in our rhythm sound data we get 34 bytes. The original was 116 bytes. By
using the TGL Transposition Trick, we save 82 bytes. That's 70%!
song1_square1:
.byte eighth
.loop:
.byte A2, A2, A2, A3, A2, A3, A2, A3
.byte transpose ;the transpose opcode
take a 2-byte argument
.word .lookup_table ;which is the address of
the lookup table

.byte loop1 ;finite loop (14 times)
.word .loop

.word song1_square1

.lookup_table:
.byte 2, -1, -1, -1, -1, -1, -2
.byte -1, -1, 0, -1, 8, -8, 8 ;14 entries long, reverse
order
;*** altogether 34 bytes ***
The transpose opcode will set up a pointer variable to point to the lookup table. Then it will take
the loop counter, subtract 1, and use the result as an index into the table. We subtract 1 because the
tables index from zero. If we loop 14 times, our table will have 14 entries numbered 0-13. Once
the transpose opcode has its index, it will pull a value from the table. This value will be added to
stream_note_offset.
Before we write the opcode, let's trace through the data to see how it works. We'll start at the very
first byte of song1_square1:
1) set note length to eighth notes

2) set the loop counter to 14
(.loop iteration 1)
3) play a series of notes: A2, A2, A2, A3, A2, A3, A2, A3
4) transpose opcode. Setup a pointer to lookup_table. Use our loop counter, minus one, as an
index. The loop counter is 14 now, so we will pull out .lookup_table+13, which is an 8. Add 8 to
the current stream_note_offset: 0 + 8 = 8.
5) decrement the loop counter (14->13) and loop back to the .loop label
(iteration 2)
6) our new string of notes with the +8: F3, F3, F3, F4, F3, F4, F3, F4.
7) transpose opcode. Loop counter is 13. Grab .lookup_table+12, which is -8. Add -8 to
stream_note_offset: 8 + -8 = 0.
8) decrement loop counter (13->12) and loop back to .loop label
(iteration 3)
9) our new string of notes with the +0: A2, A2, A2, A3, A2, A3, A2, A3
10) transpose opcode. Loop counter is 12. Grab .lookup_table+11, which is 8. Add 8 to
stream_note_offset: 0 + 8 = 8.
(iteration 4)
12) our new string of notes with the +8: F3, F3, F3, F4, F3, F4, F3, F4.
13) transpose opcode. Loop counter is 11. Grab .lookup_table+10, which is -1. Add -1 to
stream_note_offset: 8 + -1 = 7.
(iteration 4)
15) our new string of notes with the +7: E3, E3, E3, E4, E3, E4, E3, E4.
16) transpose opcode. Loop counter is 10. Grab .lookup_table+9, which is 0. Add 0 to
stream_note_offset: 7 + 0 = 7.
etc. On the last iteration our loop counter is 1. We grab .lookup_table+0 and add it to
stream_note_offset. Then we decrement the loop counter (1->0). Our loop counter is now 0, so our
loop breaks. Pretty cool, no? Let's write it.
;-----------------------------------------------------------------
------
sound_opcodes:
.word se_op_adjust_note_offset ;$A7
.word se_op_transpose ;$A8

endsound = $A0
loop = $A1
duty = $A3
loop1 = $A5
adjust_note_offset = $A7
transpose = $A8
se_op_transpose:
lda [sound_ptr], y ;read low byte of the pointer to our
lookup table
sta sound_ptr2 ;store it in a new pointer variable
iny
lda [sound_ptr], y ;read high byte of pointer to table
sta sound_ptr2+1

sty sound_temp ;save y because we are about to
destroy it
lda stream_loop1, x ;get loop counter, put it in Y
tay ; this will be our index into the
lookup table
dey ;subtract 1 because indexes start
from 0.

lda [sound_ptr2], y ;read a value from the table.
clc
adc stream_note_offset, x ;add it to the note offset
sta stream_note_offset, x

ldy sound_temp ;restore Y
rts

There is a new pointer variable here, sound_ptr2. Actually, what I really did was rename jmp_ptr to
sound_ptr2. The new name let's me know it's for sound engine use only. Since we finish with
jmp_ptr as soon as we jump, there are no pointer conflicts here.
Conclusion
This is just an example of how clever use of opcodes and looping can save you lots of bytes. Keep
in mind that this transpose opcode is only useful if you write music that has repeating patterns in the
rhythm section. If you don't, then save yourself some bytes and cut the opcode from your sound
engine.

Download and unzip the opcodes2.zip sample files. Make sure the following files are in the same
folder as NESASM3:
opcodes2.asm
sound_engine.asm
sound_opcodes.asm
opcodes2.chr
note_table.i
note_length_table.i
vol_envelopes.i
song0.i
song1.i
song2.i
song3.i
song4.i
song5.i
song6.i
song7.i
opcodes2.bat
Double click opcodes2.bat. That will run NESASM3 and should produce the opcodes2.nes file. Run

Up: Play
Down: Stop

Song1-Song6 are the same as last week, but they take up less ROM-space now
Song7 is a new sound effect created by looping a key change at high tempo.
As usual, try adding your own songs and sound effects in using the new opcodes. Experiment.
Nerdy Nights Sound: Part 10: Simple Drums
This Week: Simple Drums
Drums
This week we are going to take a look at drums. I saved them until now because they use the Noise
channel, which operates much differently from the other 3 channels. With the Square and Triangle
channels we manually set the waveform period to choose what note to play. Our note lookup table
was just a table of periods to plug into the channel ports. The Noise channel on the other hand
produces random noise. We don't use a note table at all.
Noise Channel
The noise channel doesn't produce notes, it produces noise. We communicate with the noise
channel through 3 ports: $400C, $400E, $400F. Note that port $400D is unused. Let's take a
closer look at the Noise ports.
Volume - $400C
$400C let's you control the volume of the Noise channel. It works just like the Square channels,
except there is no Duty Cycle:
NOI_ENV ($400C)
76543210
||||||
||++++- Volume
|+----- Saw Envelope Disable (0: use internal counter for volume;
1: use Volume for volume)
+------ Length Counter Disable (0: use Length Counter; 1: disable
Length Counter)

For our purposes, we will set Saw Envelope Disable and Length Counter Disable and forget about
them. This will allow us to have full control of "note" length and volume via Volume Envelopes.
We did the same thing for the Square channels.
Random Generator - $400E

$400E let's us control the settings for the random generator. It looks like this:
NOI_RAND ($400E)
76543210
| ||||
| ++++- Sound Type
+-------- Mode
Mode sets the mode. There are two modes, Mode-0 and Mode-1. Mode-0 sounds dull and
breathy, like static sssh. Mode-1 sounds more sharp, robotic and buzzy. There's really no good way
to describe in words, so the best way to know the difference in sound is to listen to both modes.
(see below)
Each mode has 16 possible sounds, selected by Sound Type. This means that the Noise channel
really only gives us 32 possible sounds total! More complex sound effects on the Noise channel are
created by playing combinations of these 32 noises one after another. To hear what each of the 32
sounds sound like, listen to Song 8 in this week's sample program. It plays the 16 Mode-0 sounds
followed by the 16 Mode-1 sounds.
Note: In this tutorial I will assign each of the 32 Noise channel sounds a number 00-1F. The left
number (0 or 1) refers to the Mode. The right number (0-F) refers to the Sound Type. For example,
sound "04" means "Mode 0, Sound Type 4". Sound "1E" means "Mode 1, Sound Type E".
Length Counter - $400F

$400F is the Noise channel's length counter. We disabled the length counter in $400C, so we can
ignore this port completely!
Simple Noise Drums

The simplest way to make a drum sound on the Noise channel is to play a single Sound Type under
a volume envelope that decays to 0 (silence). Many games use this kind of drum exclusively. The
Guardian Legend for example only uses two drum sounds throughout the whole game: 04 and 06.
Battle Kid makes heavy use this simple drum style too. Lots of games do.
So how will we represent simple drums in the sound data? Recall that our sound engine
distinguishes between Notes, Note Lengths and Opcodes using ranges:
.fetch:
lda [sound_ptr], y
cmp #$A0
;do opcode stuff
;range A0-FF
.note_length:
;range 80-9F
.note:
;do note stuff
;range 00-7F

Although we won't use our note table for the Noise channel, we will treat drums like notes as far as
ranges are concerned. So we need our drum data values to be in the range of $00-$7F. That's easy.
We'll assign the Mode-0 sounds to $00-$0F and Mode-1 sounds to $10-$1F.
Some drum data might look like this then:
example_drum_data:
.byte eighth, $04
.byte sixteenth, $1E, $1E, $1F
.byte d_eighth, $04
.byte sixteenth, $06, $06, $08, $08
.byte eighth, $17, $07
.byte loop
.word example_drum_data
Since we are not using the note table for Noise, we will need to alter our Note code to check the
channel and branch to different code if we are processing the Noise channel (new stuff in red):
.note:
;do Note stuff
sta sound_temp2 ;save the note value
lda stream_channel, x ;what channel are we using?
cmp #NOISE ;is it the Noise channel?
bne .not_noise
jsr se_do_noise ;if so, JSR to a subroutine to handle
noise data
jmp .reset_ve ;and skip the note table when we
return
.not_noise: ;else grab a period from the
note_table
lda sound_temp2 ;restore note value
clc
adc stream_note_offset, x ;add note offset
asl a
tay
lda note_table, y
lda note_table+1, y

;check if it's a rest and modify the status flag appropriately
jsr se_check_rest
.reset_ve:
lda #$00
sta stream_ve_index, x
.update_pointer:
iny
tya
clc
adc stream_ptr_LO, x
bcc .end
inc stream_ptr_HI, x
.end:
rts

This code checks to see if the channel is the Noise channel. If so it JSRs to a special Noise
subroutine. Upon return, it jumps over the note_table code and updates the volume envelope and
stream pointer like normal.
So what does our special Noise subroutine, se_do_noise, look like? The job of se_do_noise
will be to take the note value and convert it into something we can write to $400E (NOI_RAND).
Recall that $400E expects the Mode number in bit7 and the Sound Type in bits 0-3:
NOI_RAND ($400E)
76543210
| ||||
| ++++- Sound Type
+-------- Mode
Mode-0 sounds don't need to be converted at all. We represent them with the values $00-$0F,
which correspond exactly to the values we need to write to $400E.
Mode-1 sounds need to be tweaked a bit. We represent Mode-1 sounds with the values $10-$1F, or
in binary %00010000-%00011111. Notice we identify the Mode number using bit4. Port $400E
expects the Mode number in bit7 though, not bit4, so we will need to set bit7 ourselves:
se_do_noise:
lda sound_temp2 ;restore the note value
and #%00010000 ;isolate bit4
beq .mode0 ;if it's clear, Mode-0, so no conversion
.mode1:
lda sound_temp2 ;else Mode-1, restore the note value
ora #%10000000 ;set bit 7 to set Mode-1
sta sound_temp2
.mode0:
lda sound_temp2
sta stream_note_LO, x ;temporary port that gets copied to $400E
rts
Now everything is set. Note values of $00-$0F will get written to stream_note_LO directly.
Note values of $10-$1F will get converted to $90-$9F first and then get written to
stream_note_LO. Note that we do not bother clearing bit4 for Mode-1 sounds. Bit4 has no
effect on $400E so we can just leave it as it is.
Drum Decay
The only thing left to add is a volume envelope for the simple drums to use. It should be short and
decay to zero (silence):
volume_envelopes:
.word se_ve_1
.word se_ve_2
.word se_ve_3
.word se_ve_tgl_1
.word se_ve_tgl_2
.word se_battlekid_loud
.word se_battlekid_loud_long
.word se_battlekid_soft
.word se_battlekid_soft_long
.word se_drum_decay

se_drum_decay:
.byte $0E, $09, $08, $06, $04, $03, $02, $01, $00 ;7 frames per drum. Experiment to get the
length and attack you want.
.byte $FF

ve_drum_decay = $09
You can of course make multiple volume envelopes for the drums to choose from.
Conclusion
Now we can add drum data to our songs. Here is the drum data for The Guardian Legend boss song
we've been using:
;in the song header, after the header info for Squares and
Triangle:
.byte MUSIC_NOI ;which stream
.byte $01 ;status byte: enabled
.byte NOISE ;which channel
.byte $30 ;initial volume_duty value (disable length
counter and saw envelope)
.byte ve_drum_decay ;volume envelope
.word song1_noise ;pointer to the sound data stream
.byte $53 ;tempo

song1_noise:
.byte eighth, $04 ;this song only uses drum 04
(Mode-0, Sound Type 4) for a snare
.byte sixteenth, $04, $04, $04
.byte d_eighth, $04
.byte sixteenth, $04, $04, $04, $04
.byte eighth, $04, $04
.byte loop
.word song1_noise

Try adding some drums to your own songs!

Download and unzip the drums.zip sample files. Make sure the following files are in the same
folder as NESASM3:
drums.asm
sound_engine.asm
sound_opcodes.asm
drums.chr
note_table.i
note_length_table.i
vol_envelopes.i
song0.i
song1.i
song2.i
song3.i
song4.i
song5.i
song6.i
song7.i
song8.i
drums.bat
Double click drums.bat. That will run NESASM3 and should produce the drums.nes file. Run that
NES file in FCEUXD SP.

Up: Play
Down: Stop

Song1-Song7 are the same as last week, but a few of them (1, 4 and 6) have drums now.
Song8 is a new "song" that plays each of the Noise channel's 32 sounds, in order from 00-1F. First
it plays them with a sustained volume envelope so that you can hear how they sound drawn out.
Next they are played using the 7-frame ve_drum_decay volume envelope we made so you can hear
how they sound as simple drum sounds.
Nerdy Nights: hex editing: Alter your own title screens!
Written by bunnyboy
This lesson is largely unrelated to programming, but instead focuses on ROM editing. Using the
recently released Exerion 2 ROM we will edit the title screen text to show your own message. The
instructions here will work for many games but not all. Some use text/graphics compression or do
not have all the letters available for use.
You will need:

• FCEUXDSP emulator with graphics viewer. Sorry Mac/Linux users, the best development
emulators are Windows only.
• Hex editor application
• Text editor application
• 10 minutes
Step 1: Backup!
Make a copy of the ROM. Work with this copy. In case you mess up you can go back to the
original.
Step 2: Finding the graphics
Load up the game in FCEUXDSP and go to the screen you want to edit. When you are there choose
the Debug... menu option and hit the Step Into button in the window that opens. That will tell the
emulator to pause so you can keep working without it changing.
Next choose the PPU Viewer... menu option. On one side will be the sprite graphics, and on the
other the background graphics. The text is made of background graphics tiles.
If you put your mouse over one of the letter graphics the tile number will update. Use those tile
numbers to make a chart of which number corresponds to which tile number. For Exerion 2, that
chart will start like:
A = 0A
B = 0B
C = 0C
D = 0D
E = 0E
F = 0F
G = 10
H = 11
I = 12
etc
Step 3: Finding the text

Use that chart to write out what text you are looking for. For Exerion 2 we are going to edit the
"PLAYERS" text for the 2 players option. The hex characters we will be looking for are:
TEXT P L A Y E R S
HEX 19 15 0A 22 0E 1B 1C
Now that you know what to look for, open up the ROM in your hex editor application. Do a hex
search for the string you just figured out. Hopefully it will appear just once in the ROM. In Exerion
2 the 2 PLAYERS text is around hex address 2F40.
Step 4: Replacing the text

Use your chart again to make a new string, the same length as the previous text. This is very
important. The size of the ROM must stay exactly the same size so you cannot add extra characters.
If your new string is shorter than the old one then you must add space characters to make it the
same length.
For Exerion 2 I want to add a message longer than just the PLAYERS text. Looking before the P in
players there is a space, a 2, and lots more spaces. I can safely replace some of those too, as long as
I don't add or subtract from the ROM size.
OLD TEXT 2 P L A Y E R S
OLD HEX 30 02 30 19 15 0A 22 0E 1B 1C
NEW TEXT M R M A R K S U X
NEW HEX 16 1B 16 0A 1B 14 30 1C 1E 21
Just select the old hex and delete it, then paste in the new hex. Save your ROM and load it up in the
emulator to make sure it worked. If the game doesn't load in the emulator then you likely changed
the size of the ROM.
Now that your game is done, test it on real hardware with your PowerPak or make a physical copy
with the ReproPak at http://www.retrousb.com...
AtariAge: 6502 Killer hacks
Written by djmips
As I was growing up, I kept a notebook full of cool code snippets and ideas. My notebook had been
misplaced but I ran across it recently and here is one of the pages which is from a 1987 Dr. Dobbs
article by Mark S. Ackerman. "6502 Killer Hacks".
Post your own 6502 Killer Hacks and share them with the rest of us!
I also checked into Mark S. Ackerman with our trusty tool Google and found his 'vita' -
Pretty sure it's the same guy as he worked at GCC from 1982 - 1984 and was the lead on Ms.
PacMan, Galaxian and Moon Patrol - time to update AtariAge database as these games are empty
when it comes to staff
He has a patent on the Galaxian kernel.
Well here is the killer hack. This one is to scrimp on RAM.
Incrementing only the lower 4 bits of a byte (with wrap)

...
lda word ; original byte
and #$0f ; retrieve lower nybble
tay ; index
lda word
clc ; might not be needed
adc nextinc,y ; could be ora or sbc
sta word
...
nextinc .byte 1,2,3,4,5,6,7,8

.byte 9,10,11,12,13,14,15,0
Well, funny thing is - maybe I didn't transcribe it properly back in '87 - because it doesn't seem like
it would work.
Seems like it needs an AND #$F0 after the second LDA word
So I thought I'd take a shot at a working version...

...
lda word ; original byte
and #$0f ; retrieve lower nybble
tay ; index
lda word
clc
adc nextinc,y
sta word
...
nextinc .byte 1,1,1,1,1,1,1,1

.byte 1,1,1,1,1,1,1,-15
who knows if that one works either. :-)
If someone has the original article from Feb 1987 Dr. Dobbs Journal, I'd be curious to see the code.
Also, post your own 6502 Killer Hacks and share them with the rest of us!
- David
Updated 2017: Just came across the original PDF of the article by Mark S. Ackerman and
confirmed that I did transcribe it incorrectly but my fixed version is the same as the published
version.
http://archive.6502.org/publications/dr_dobbs_journal_selected_articles/6502_hacks.pdf Edited
April 1, 2017 by djmips
MMC1 Memory Mapping: NintendoAge Programming
Resources - MMC1 Memory Mapping
Written by MRN
INTRO
I had a request for MMC1 mapper information, so I thought everyone might like to read what I have
to say.
This will help you figure out MMC1 memory mapping. It assumes that you have a working
knowledge
of NES programming and can understand my messed up little mind. Most of the information on this
mapper can be found on:
http://wiki.nesdev.com/w/index.php/MMC1
MMC1 will allow you to use up to 256 kB of PRG ROM and 128 kB of CHR ROM. I think that
you can use SUROM to expand up to 4 MB of PRG ROM, but this is not covered here.
First, you have to decide on what kind of configuration you want to use. MMC1 will support PRG
bank
switching in either 16kB mode or 32kB mode. And with the 16, you can choose whether you want
the memory at $C000 or the memory at $8000 to be switched.
Second, you have to decide on the CHR ROM switching. MMC1 supports switching of 4kB
(background or sprite tiles separately) or 8kB (switching them together).
After you decide this, you are ready to start.
INES HEADER
Notes:
-You can only have even numbers of PRG banks. NESASM uses 8kB "banks", I'm talking about
16kB banks.
i.e. 02 (32kB),04 (64kB),06 (96kB),08 (128kB),0A (160kB),0C (192kB),0E (224kB),10 (256kB)
-CHR banks are in multiples of 8kB banks. (important if you are using 4kB swapping.)
i.e. 01 (8kB),02 (16kB),03 (24kB),04 (32kB),05 (40kB), 06 (48kB), etc., 10 (128kB)
-MMC1 mapper number is "1" ...creative!
-Mirroring should match that used below in the initiation routine.
In this exercise, we will use:
.inesprg $10 ; 16x 16KB PRG code
.ineschr $10 ; 16x 8KB CHR data
.inesmap $01 ; mapper 1 = MMC1, 4KB CHR bank swapping
.inesmir 0 ; background mirroring
MAPPER CONTROL HEADER
This is one bite that has all the information for the mapper. Observe:
76543210
Bits 7,6,5 - Not sure what these do.
Bit 4 - CHR ROM bank mode - (0 means switch 8kB at a time, 1 means switch the two separate
4kB banks independently)
Bit 3 - PRG ROM bank mode - (0 means switch all 32kB at once, ignores bit 2)
(1 means switch ONLY the 16kB specified in bit 2)
Bit 2 - PRG ROM location - (0 means switch 16kB at $C000, 1 means switch 16kB at $8000)
Bits 1,0 - Mirroring - (0 means one screen, lower bank; 1 means one screen, upper bank
2 means vertical; 3 means horizontal)
Here we will use LDX #%00011000
BITCH WORK! Look above and figure out what this means.
INITIATE MAPPER
Here we load the information required by the system to run the mapper as well as the initial banks.
You have to do the 5 writes to make it work...for whatever reason.
initMMC1Mapper:
LDA #$80 ;this locks the PRG ROM at $C000-$FFFF to the last bank.
STA $8000

TXA ;uses our header to initiate the mapper
JSR setMMC1ControlMode

LDA #$02 ;sets the CHR information for the sprites
JSR setCHRPage0000

LDA #$01 ;sets the CHR information for the background
JSR setCHRPage1000

LDA #$03 ;sets the PRG information
JSR setPRGBank

RTS

setMMC1ControlMode:
STA $8000
LSR A
STA $8000
LSR A
STA $8000
LSR A
STA $8000
LSR A
STA $8000
RTS

setCHRPage0000:
STA $A000
LSR A
STA $A000
LSR A
STA $A000
LSR A
STA $A000
LSR A
STA $A000
RTS
setCHRPage1000:
STA $C000
LSR A
STA $C000
LSR A
STA $C000
LSR A
STA $C000
LSR A
STA $C000
RTS

setPRGBank:
STA $E000
LSR A
STA $E000
LSR A
STA $E000
LSR A
STA $E000
LSR A
STA $E000
RTS
Congrats....your program should work.
USING THE MAPPER
You can swap out banks whenever you want, even several times per NMI. Just load the bank
number you want to use into A and call the appropriate subroutine. Just be sure that you don't
switch away information that your program needs to run or it will die.
Weird bank numbering notes:
-CHR data is stored in 8kB for NESASM. If you want to call the first 4kB of data from the 6th 8kB
chunk,
you would use bank #$0C. Observe, call number for 4kB chunk vs. 8kB bank number:
00-0
01-0
02-1
03-1
04-2
05-2
06-3
07-3
08-4
09-4
0A-5
0B-5
0C-6
0D-6
0E-7
0F-7
10-8
11-8
12-9
13-9
14-10
15-10
16-11
17-11
18-12
19-12
1A-13
1B-13
1C-14
1D-14
1E-15
1F-15
Clear?
-PRG info is stored in 8kB chunks in NESASM, but you call and switch 16kB banks. If you want
to call bank 26, use call number #$0D.
Observe, call number vs. bank number:
0-0,1
1-2,3
2-4,5
3-6,7
4-8,9
5-10,11
6-12,13
7-14,15
8-16,17
9-18,19
A-20,21
B-22,23
C-24,25
D-26,27
E-28,29
F-30,31
Clear?
-At the end of each 16kB bank, you have to have vectors in place or it will die.
.org $FFFA ;first of the three vectors starts here
.dw NMI ;when an NMI happens (once per frame if enabled) the
;processor will jump to the label NMI:
.dw RESET ;when the processor first turns on or is reset, it will jump
;to the label RESET:
.dw 0 ;external interrupt IRQ is not used in this tutorial
-Bank numbering is successive. i.e. if you have PRG banks numbered 0-23, you would start
numbering your CHR banks at 24.
-If you have, for example, a 16kB CHR file, you only have to include the starting place and
NESASM will
split the banks properly. i.e. in 4kB mode:
.bank 32
.org $0000
.incbin "MMC1.chr" ;includes 16KB graphics file
This will include 4 - 4kB (or 2-8kB) banks in the assembly process. Be sure to account for the 2
banks
in your numbering. (see the attached ASM file.)
PRACTICAL APPLICATION
This is a little inefficient. To use all this nonsense in something real, an example would be:
LoadBackground:
;find room info
;switch to room info table bank
;load bankground pointer
;switch to bank where background information is stored
;load background
;switch back to room info table bank
;load attribute and palette pointers
;switch to attribute/palette information bank
;load attributes/palettes
;switch back to room info table bank
;load collision detection pointer
;switch to collision detection bank
RTS
WORKING EXAMPLE
Here we use the controller to switch banks for both CHR banks and the PRG bank.
A and B - swap out CHR information for the sprites
Select and Start - nothing
Up and Down - load a background located in different banks (the flashing is cause you are holding
the button
down for more than one frame. Just tap it. I was too lazy to add stuff in to fix this.)
Left and Right - swap out the CHR information for the backgrounds
Download the attached file and assemble the program. Mess around with it and try to switch out
the
various buttons and banks. Fix the flashing. Add new backgrounds and characters...etc.
The little DOS assembler file might not work, you may have to edit it to your drive.
THE END!
I'm sure I totally screwed this up, but don't worry! Someone will help me out if there are any
mistakes.
Attachment: MMC1.zip
World Building with SGP: Introductions
Written by Sole Goose Productions
Introduction to the Series

Welcome to World Building with SGP! What follows is a series of reflections that have spawned
from my own experiences attempting to learn NES programming. I am not an expert when it comes
to writing code, drawing graphics, composing sound, or designing levels, but hopefully these
thoughts will be of use to someone. I plan to add to these reflections as I attempt new aspects of
game development, and earlier lessons will be refined as my knowledge and experience grow.
These are very much an extension of what I have learned here on NA and through reading various
articles on NESDEV and elsewhere, and a cumulative bibliography will be added at some point.
Please comment, critique (constructively one hopes), and discuss things as you see fit. This initial
post will serve as a table of contents of sorts, and each chapter will have its own thread. Keep a
lookout for an alternative form of presentation in the future too. Also, please note that they are
written mainly with an eye toward creating overhead adventure games. As I attempt other game
play perspectives, thoughts on these will be incorporated.
These lessons, articles, what have you are intended for different types of people. On the one hand,
they are for programmers setting out on the long journey that is NES development. Perhaps more
importantly, though, my hope is that they reach people within our community who do not really
have an interest in programming, but who would like to be involved with game development at
one level or another. We have already witnessed what can happen when highly talented people
collaborate (several 8-Bit Christmas games, the Battle Kids, MRN’s various games, T-Gun,
Nomolos and Owlia, and many others), though most contributions have been more musically
inclined. Eventually, I hope to discuss music, but the first set of lessons will primarily deal with
art. For most programmers, well, many at least, graphics are a huge time sink. We can usually get
decent results with enough patience, but that is valuable time lost that could be spent actually
completing games. My current main character, for example, has already taken somewhere between
twenty and thirty hours to draw, and there are still several poses to go. Hopefully these
initial discussions will help a few to take up the challenge of learning how to draw within the
structures of the NES.
One last thing. About every month or so, we see someone pop into the Brewery with a suggestion
for a game, or asking how to convert “X” game to the NES. We are familiar with the results of
these, and the various arguments that are exchanged on either side. One of the most important
points that is brought up, in my own opinion anyways, is that those taking the time to learn how to
actually program already have plenty of ideas, and are not likely to take up someone else’s project.
There is certainly truth in it, and even with detailed write ups at the design level, few are going to
actually do someone else’s work for them. I would think, however, that if someone really took the
time to create a world at the artistic level, the response would be much warmer. Mind you, it would
take hundreds of hours to do so, which would also help to prove one's seriousness and commitment.
Still, if seeking collaboration, this might be a way to go. Draw some things, show us what you can
do, and then hopefully mutual collaboration can ensue. Just a thought though, and I can only speak
for myself in this regard.
World Building with SGP 1: Foundations of NES Graphics
Introduction
This first lesson briefly covers the essentials of drawing graphics for the NES. Text adventures
aside, graphics are the mode through which we enter most game worlds, and are understandably our
starting point when discussing their construction. This chapter touches on the core fundamentals of
the system, but also lightly raises many other issues that will be discussed in future chapters. It
covers a lot of ground for the simple fact that a bit of awareness now can save a tremendous amount
of time and headache later.
56 Shades of NES
Perhaps nothing defines the NES, and people’s perceptions and memories of its games, as much as
the color palette of the system. The NES PPU (Picture Processing Unit) is capable of displaying
sixty four colors, though to be fair slightly less than ten of these are black, hence the number fifty
six in our colorfully named title above. The palette is as follows, but it is extremely important to
note that the colors will appear differently on different means of display (CRT versus LCD, etc.).
(Taken from somewhere, I forget where exactly)
The numbers that are on the colors are the actual hexadecimal values that are used when inputting
them into a program in assembly language. In order to see the visual differences between the colors,
it is best to try them out, either on an emulator or actual hardware. Colors can be adjusted in terms
of intensity as well, but any changes of this nature will affect all colors that appear on a screen.
Fifty six colors, it does not seem like a lot but it would let us create characters and backgrounds on
par with early SNES games; however, now the real limitations of the NES need to be described.
Palettes
While the NES is capable of displaying fifty six more or less unique colors, only a small number of
these can be simultaneously displayed on the screen at once. There are two types of palettes that
the NES uses on screen at the same time: one for backgrounds and one for sprites. They both
partake of the same fifty six colors, but the restrictions for the two types are different. A sample set
of palettes that would be used for a screen, using the values given above, is as follows:
Background Palettes:
.db $22,$29,$1A,$0F, $22,$36,$17,$0F, $22,$30,$21,$0F, $22,$27,$17,$0F
Sprite Palettes:
.db $22,$1C,$15,$14, $22,$02,$38,$3C, $22,$1C,$15,$14, $22,$02,$38,$3C
For clarity’s sake, this could also be written as:
Background Palettes:
.db $22,$29,$1A,$0F ; Background Palette 1
.db $22,$36,$17,$0F ; Background Palette 2
Sprite Palettes:
.db $22,$1C,$15,$14 ; Sprite Palette 1
.db $22,$02,$38,$3C ; Sprite Palette 2
.db $22,$1C,$15,$14 ; Sprite Palette 3
.db $22,$02,$38,$3C ; Sprite Palette 4
First and foremost, notice that each palette set consists not of sixteen colors, but four groups of four
colors. We will come back to this in a moment, but these are the individual palettes that we will
use when drawing. Also, if you look closely you will see that $22 (sky blue) appears in each group
of four throughout either type of palette, though it plays a different role in each. In terms of the
background, $22 functions as the universal background color, which in reality only allows for
thirteen different background colors on screen at once. My own humble advice is to set the
universal color to whatever color appears most frequently, which in most cases will be black
(depending on one’s style and the subject matter of course). In regards to the four sprite palettes,
$22 functions as the transparent color, giving twelve choices total for the various sprites. Earlier
NES games, such as Super Mario Bros. and The Legend of Zelda used all three colors for actual
colors, with the result that they can often look washed-out when compared with later sprites.
There are a couple of solutions to this problem, one of which is to use one of the three colors as a
black outline (as with Kirby), in the process reducing the amount of available colors to two. This
accounts for many of the seemingly wild uses of color in NES games, some of which were
continued on more technically capable machines. Does the fighter in Final Fantasy really have red
hair? Come on. Another solution is to set game play, or at least the majority of game play, against an
all black screen, thereby creating the illusion of a black outline.
One homebrew project uses this to great effect:
This technique was heavily used in early NES games, and while it works well for side-scrolling
games, it does not work as well for overhead adventures. One of the few to attempt to do so was the
unlicensed title Spiritual Warfare. For much of the game the player may not notice what is going
on, but when walking across non-black backgrounds the absence of black in the sprite palette
becomes readily apparent, particularly in regards to the hero’s eyes:
Perhaps it is not so much the absence of black that is noticeable, but rather trying to imitate the
presence of it. Characters without black outlines can hold their own, but factors such as
backgrounds and perspective need to factor into the decision whether to eschew it or not. Eyes are
always a dead give away.
Now, as you think back to every NES game that you have ever played and try to verify if all of this
is true or not, you may recall that certain games did in fact use more colors per character, with Mega
Man being the prime example. Well, it is probably time to talk about the size of sprites versus the
size of characters, as well as how to get around some of the limitations inherent to the NES.
Screen Resolution/Pixel Depth and the Real Limits of NES Graphics

Backgrounds and sprites are composed of tiles, each of which consists of 8x8 pixels. We will first
deal with backgrounds, and then move on to discuss sprites.
Backgrounds:
Scenery and objects are generally made up of several tiles, and are often grouped in twos (i.e. 16x16
pixels, or 2x2 tiles). These groups we will refer to as metatiles, although I have also seen the word
megatile be used in its place.
See, for instance, the following from Super Mario Bros.:
Taken together these four tiles form the famous question mark block. Generally, it is a good idea to
work at the metatile level, for the unfortunate reason that further graphical limitations are imposed
by what are known as attributes. Remember the four background palette groups from above? Well,
each 16x16 pixel area (or 2x2 tile group) can only use one of these palettes. This is extremely
important, and further complicated by the fact that the first row of tiles (eight, pixel high scan lines)
that the NES outputs cannot actually be see on an NTSC television (on a European PAL television it
is visible). Compare, for instance, an NTSC versus a PAL screen of The Legend of Zelda.
They are the exact same screen, only the NTSC format does not allow us to see it in its entirety.
Why would this complicate matters? Basically, because for those of us not programming for a PAL
system it causes the groups of tiles to be off, since all background tiles are aligned to a grid. Want
to make a symmetrical room? Too bad, unless you want to waste several lines of the playing field
trying to re-align everything, which is in fact what many games do. That, or they appear to cut tiles
off in the middle, despite the fact that they are in reality present. Zelda uses good examples of both
of these methods.
First, here is a sample background that shows where the attribute groups are located on an NTSC
screen:
See how the top and bottom rows are incomplete? Since each of the squares is a metatile in size, if
you wanted to make a wall that ran along the top of the screen, it would either need to be three tiles
high (one and a half metatiles), or if it was only one metatile high whatever was beneath it would
need to partake of the same three colors. Take a look at the following close up of a poorly drawn
sample screen. The drawing is entirely determined by where the attribute tables fall:
If you notice, I have taken into account some issues and neglected others. For instance, if done
correctly, the curb need not be green, which it must in the example since the brick wall shares the
same palette as the garage door (see the top left corner of the garage). Had the brick wall been an
even number of tiles tall, instead of the five that it currently is, the curb could draw from a different
palette. Additionally, if we look at the overhang of the roof it is apparent that the roof and the road
will have to partake of the same palette. This is why it can be useful, especially when learning, to
draw things in terms of metatiles.
Understandably, attributes also determine the placement of objects. Let us say that I wanted to stack
boxes up against the building. To do this effectively (i.e. colorfully), the objects would only be able
to be drawn in places where they would not cross attribute boundaries. Attributes not only affect the
look of a game, but also determine game play itself to a great extent, since they influence where
exactly objects can be placed. This is not restricted to objects, either, as can be seen in the
placement of the status bar in most games. In order to make the status bar independent of the game
world (although it could potentially share colors with one of the game world palettes; black in this
case), it is necessary to regain an evenness to the lines created by the offset of the first row. See, for
example, how The Legend of Zelda divides the game world:
Each square section is one metatile (or 2x2 tile section). Notice how the height of the status bar is
an uneven number of tiles, but how this serves to re-align the game world.
Not to belabor the point, but attributes will be the sticking point for most people when it comes to
background graphics.
Here is one last example of a room drawn by someone who did not take into account attributes. It is
a nice symmetrical room, but can you guess what is wrong with it?
If we were to change the program’s code so that every other attribute section was different, this is
what we would see (pardon the garish color choices):
Two issues can be noticed, possibly more. One is that the walls will have to use the same
palette/colors as the floor, since both the top and bottom walls share an attribute area with the floor.
Notice, this does not affect the side walls, other than the top and bottom entrances, in the same
fashion. The other issue is that the status bar appears in the middle of an attribute area. A further
point: even though the top and bottom walls do not line up, the middle passages do (at least on the
sides). Shifting everything up or down one row will cause these to be off, hence the importance of
minding one’s attributes.
Here is the same room, but showing it in full PAL resolution. This is the “real” room, though unlike
in Zelda the extra row has not been utilized. In many cases programmers simply threw in a row of
black tiles at the top and at the bottom of the screen (as has been done here).
Attribute tables are rightly known as about the hardest things to deal with, since they must
constantly be kept in mind.
It should be mentioned that there is a way out of the whole attribute business all together. Toward
the end of the NES’s lifespan, Nintendo developed the MMC5 memory chip which allowed for each
individual 8x8 background tile to use its own palette. However, the only way to obtain these at the
moment is to rip apart one of the few games that used it and recycle its board. Not surprisingly,
these games are not that common, and they also tend to be top games, making them both rarer and
more expensive.
Just to note, attributes can be incorporated at the programming level in a variety of ways within a
game. Two of the most common methods are through the use of tables or by assigning attributes to
metatiles on an individual basis. The former method can be found in The Nerdy Nights tutorials, and
it is easier to implement. MRN uses the latter method, which has many advantages despite the
ensuing complications that come with it. This second method is also why it is encouraged to think
in terms of metatiles and not individual tiles when drawing background graphics.
Sprites
In terms of sprites, things are a little easier to visually see than with the majority of background
objects (question mark blocks aside). Link is clearly made up of four, 8x8 pixel tiles (or 2x2 sprite
tiles). We will refer to these as metasprites, or megasprites.
Each of the individual tiles can only draw from one of the four sprite palettes, which are different
than the background palettes. Unlike background tiles, however, each sprite tile can be different. If
we wanted Link to have a pretty pink shield he could. On the other hand, if we wanted to give him a
pink hat, we would be in a bit of trouble since the tips of his shoulders would also have to be pink
due to each tile only being able to use three colors. This is extremely important when designing
sprites, and must also be taken into account when considering character animations. If Link’s shield
were to move up into the tile above it, then it would automatically change colors depending on what
palette that tile uses. An example of not taking this into consideration, or simply not caring, can be
found in Metal Mech. As can be seen, when the hero runs his hands/gloves enter into a tile that uses
a different palette and change color.
Due to the size of the character and the intensity of the game it is not all that noticeable, but it is
there nonetheless.
While sprites graphics are free from some of the constraints of background graphics, they do have
their own issues. Ever notice how sprites flicker when there are too many things on the screen?
Well, there is a reason for this. The NES can only display up to eight sprites per scanline (these are
horizontal). Therefore, whenever there are more than eight sprites either one of two things will
happen: A) only eight will display, or B) the sprites can be programmed to flicker. Most games will
use the flickering method, though in others, such as Grand Master, enemies will simply disappear.
The top half of one of the cubes has vanished (though it will still deal damage!):
This is one reason why tall characters (4x2 tiles) are often found in NES games, while wide
characters are generally not. Do not forget that hero and enemy weapons also count toward the
sprite limit. If the Fencer, in the above example from Grand Master, were to swing his sword, the
top halves of an additional two enemies would no longer be visible.
We will return to factors that influence sprite design in another lesson, but there is a key reason for
raising scan line issues now. Characters with three colors, one of which is black, are not terribly
exciting. As has been seen, some designers stuck with this limitation, but many of the more
memorable characters side-stepped the issue by layering sprites over each other in order to achieve
additional detail. Perhaps the most famous of these is Mega Man. The image below shows how the
designers were able to really give life to the Blue Bomber with only a single additional sprite. Note,
the face tile does not have to align with the edges of the other sprites, but it can sit anywhere in
relation to them:
As much as we can appreciate what is going on here, it can potentially cause scan line issues, since
it adds another sprite to the player character. If we were to do all enemies in this way, only the hero
and a single enemy would be able to be on the same horizontal line without problems arising (prior
to weapons even!). Also, do not forget that the colors used in Mega Man’s face will eat up another
palette, leaving only two palettes for all other enemies, objects, status bar details, etc.
While a great majority of games in the NES-era used sprites made up of 8x8 tiles, another option
does exist. At the programming level sprites can be changed to be 8x16 in size, i.e. eight pixels wide
and sixteen pixels tall. Functionally, this is like stacking two sprites on top of each other. The main
benefit of using 8x16 sprites is that it allows one to place more objects on screen than if only using
8x8 sprites. The NES only allows up to sixty four sprites to be displayed at once. If we envision a
screen in which all objects and characters are one metatile in size, using 8x16 sprites would
essentially allow us to have double the number of entities. Pretty neat stuff, until we recall that only
eight sprites can appear on a single scan line. Other disadvantages to using 8x16 sprites is that they
must be stored in memory in the same pattern that they are used in the game. If “A” is used on top
of “B,” neither tile can be re-used elsewhere. A practical example of this limitation occurs when
animating characters. At times it is only necessary to alter a character’s bottom or top half in a pose,
yet using 8x16 sprites would prohibit us from re-using one of the halves. Instead, the same tile
would have to be entered into memory twice, decreasing the total number of unique tiles that can
appear on a screen at once. Here is a small example of what we are talking about:
In the image on the left (8x8), the top half of the character could be recycled for both walking
poses. The image on the right (8x16), though, necessitates that the top half be doubled in memory.
To grasp the full implications of the above, tile usage would need to be multiplied by the number of
poses for each character, the number of characters on a given screen, and other factors, such as if
one has objects that are only a single tile tall. In regards to this last point, if a sword sixteen pixels
long and eight pixels wide were displayed vertically it would successfully use an 8x16 sprite.
However, when displayed horizontally, the other eight pixels of two sprites would be wasted. In this
case, it is not so much that a tile would be doubled, but that it would have to be left blank
(functionally equivalent to re-using the same tile twice, but the waste is more obvious).
As usual, the use of 8x8 versus 8x16 sprites will depend on subject matter, programming
preferences, and a host of other factors. If we are primarily drawing non-animated objects (space
ships come to mind), or objects with fewer changes and variations, using 8x16 sprites could
potentially allow more activity to exist on screen than if using 8x8 sprites.
Another factor that plays into sprite design is mirroring. The NES allows sprites to be mirrored
either horizontally, vertically, or both. Mirroring both directions at once essentially allows a sprite
to be rotated. The following image is an extremely simple example of mirroring in action:
Most graphics editors allow one to easily mirror and rotate sprites in order to get an idea of what a
drawing will look like when finished, and let’s be honest, it is easier to mirror something than to
draw something from scratch. However, at the programming level it is a great way to save space in
memory! The NES can only hold 256 unique tiles of sprite data in memory at any given time (plus
another 256 tiles of background data), so mirroring becomes essential at a certain point. Mirroring
is one of main strengths that sprites have going for them, as background tiles cannot be mirrored!
One more key limitation that shapes one’s drawing practices.
World Building with SGP 2:
Introduction to NES Graphics Editors
Introduction
When programming, NES graphics data is stored and compiled from CHR file types. In terms of
viewing CHR data without possessing the uncompiled code, it can be accessed through a program’s
ROM (.nes file type) by using a special type of program designed to display this data. These
programs generally allow one to both view and edit the information that one finds, or to create
wholly new information that can be compiled back into a .nes file. In other words, there are two
movements that we are talking about: accessing and creating. Both are handy when drawing, since
the former allows one to genuinely see what people did in the past, and of course the other allows
for unique creations of your own.
There are several graphics editors available for the NES, though most were written some time ago
and are no longer updated. Still, they are extremely capable and useful. MRN gives an introduction
to using Tile Molester in Week 2 of his tutorials, and it will not be gone over again here. Eventually,
this section will introduce what looks to be a very powerful editor currently in development, though
it is still in the testing phase at the moment. In the meantime, a brief discussion of YY-CHR will
given, as it will allow one to get started in the present.
YY-CHR
Find it, download it, install it; yay! Upon opening the program you can see some rather nice
pictures, though outside of large cut-scenes they are of little use in actual game development. If you
right click the top left corner, highlight the whole picture and then use ctrl X, you will be left with a
blank canvas like what we have here:
As can be seen, the gridded area on the left consists of 256 squares. The area on the right serves as a
zoomed-in display of whatever is found within the white square on the left. Drawing and editing are
done on the right, with several standard options in terms of tools. They are no more and no less
complicated than those found in Paint.
Another useful set of tools can be found along the top bar. These are the options to flip, rotate, or
shift tiles on the canvas. They will affect anything that is highlighted. If using the left click, the
default size of the box is a 4x4 tile area. However, if you right click, anything from a single tile all
the way up to the entire canvas can be selected and altered (outlined by a blue box).
If we examine the editor, we can see that below the area on the right there are different color and
palette options. Sixteen possible palette options can be configured, though one’s choices will not be
saved for future use. Each palette choice consists of four colors, and the active palette also displays
the hexadecimal values of each color. To change what colors are in the active palette, simply click
on one of the four active colors and then click on your desired choice in the full NES spectrum
below. As stated in the previous chapter, the colors will appear differently depending on both the
display and format on which they are projected (CRT, LCD, emulator, actual hardware, etc.). Still,
one blue can be seen to be darker or lighter than another blue. When drawing in YY-CHR, any of
the blacks can be used, though when programming some should be avoided ($0D and another if I
recall).
A quick way to gain some familiarity with the program is to open up a few NES or Game Boy files
and see what is in them. It allows us to see how programmers organized their work and also the
chance to start playing around with some of the tools. If you do not have anything handy, there is
a .chr file included with The Nerdy Nights. Please note that some files will not load correctly (Zelda
for instance), and most will display garbled information at first. After opening a file, scroll down
until you find something besides static. Here is a sample screen from a popular Game Boy game:
Due to the currently selected colors, it can be difficult to tell what exactly one is looking at.
Changing only the colors gives us the following:
Now that we can see what we are looking at, we can also see that things are a bit out of sorts. Well,
kind of anyways. Things as they are difficult to see visually, though they are arranged ideally from a
(i.e. my) programming perspective. Thankfully, YY-CHR gives us the option of rearranging the
tiles. Nine times out of ten, one of the options will be satisfactory. To do this, click on the Pattern
option a bit below the display on the left. It will adjust in real time, as can be seen here:
It is now possible to see what the artist(s) envisioned for three characters and more. The pattern that
is displayed is easy enough to decipher, and it gets easier to do with practice. Pay special attention
to mirrored poses. Also, sprites are generally easier to see than background data. Different pattern
layouts and dominant colors may also be used when switching tile types.
Just to note, if using CHR-RAM in one’s program, individual tiles can be swapped out as needed,
and one is not limited to a static set 256 tiles. One could arrange all of a main character’s views and
poses in one block of memory, as in the above example, and then load them into the program as
needed. One of the benefits of the 256 tile grid, whether using CHR-RAM or CHR-ROM, is that it
allows one to see everything that might be on a screen at once, decreasing the chance that one might
conceptually plan for too much to be going on at a given time. In regards to sprites one needs to
keep in mind player character and non-player character poses, weapons, items, status bar details,
and a slew of other things. When dealing with background data, reserving space for an upper and
lower case alphabet can eat up roughly a quarter of the available tiles. Add to this status bar
elements or other stable features and the number grows to a good amount prior to drawing anything
unique to a screen. Dealing with some of these concerns, however, is a topic for a different day.
As stated above, when jumping between background and sprite tiles, color choices may be off. This
is also a common occurrence when combining assets from different games. In my own drawing
practices I often search for different examples to analyze and deconstruct prior to attempting
anything creative on my own part. Here we can see two sprites from two different games side-by-
side:
The character on the right, while mostly accurate, needs a couple of colors swapped. YY-CHR
allows us to do this relatively easily: highlight the desired group of tiles and then use control R to
bring up the replace color option.
The screen will look like this:
As a friendly warning let me just say that it can be a bit confusing to get the results that you want.
YY-CHR is a powerful tool, but it is one that really needs to be used in order to be understood. Play
around with it, analyze what others have done, draw some things and post them here or elsewhere.
Practice makes perfect as they say. Also, if you discover other useful features please share them. I
only use what I need, hence why this discussion is a bit abbreviated.
The main limitation of using a program like YY-CHR, at least in my own experience, is that it only
allows us to see four colors at once. If we are layering sprites, adding faces, or simply want to see
what two background tiles might look like next to each other, we are forced to use other programs.
If programming, one option is simply to load the tiles into a mock or dummy game. In the past I
have kept a stripped down program that only displays a single screen just for this purpose. Another
option, particularly if actively drawing with more than three or four colors, is to use Paint or a
comparable program. At the moment, this is what I use for the majority of my drawing. Simply
zoom in all the way to the pixel level and get to work. I generally draw a 16x16 box in order to not
go outside of a metatile. One should also be sure to not let any layered sprites exceed an 8x8 area
(unless you want them to of course). Another benefit of using Paint is that it allows one to stack
revisions next to each other. For any given character I can have tens or hundreds of revisions on the
same canvas, and can trace the evolution of my work. When it comes time to actually load tiles
back into YY-CHR, each pixel must be transferred over. I have heard that there is a way to copy and
paste from Paint to the graphics editor, and I used to use it, but I’m not sure what happened. Plus,
like the monks of old, there is a certain benefit to re-drawing one’s final work. I usually tell myself
that before and after the process, but forget it during the transfer. On the plus side, some mistakes
are usually caught in terms of consistency. Anyways, if anyone knows how to get the copy and
paste to work, please speak up!
Now get drawing and let us see what you can do! Further chapters will deal with various smaller
issues, and a great many of them are directly related to graphics. Until next time...
Top Status Bar and Controller Question
Written by bigjt_2
Hey Everyone,
I've been able to figure out how to get a status bar working at the bottom of the screen using sprite 0
collision (thanks to the help of albaily and thefox, of course). Now I've been experimenting with
moving the status bar to the top of the screen. I cannot get it to work without it going all buggy.
Here's my NMI routine:
NMI:
PHA ;save registers in case of interrupt
TXA
PHA
TYA
PHA
LDA #$00
STA $2003 ; set the low byte (00) of the RAM address
LDA #$02
STA $4014 ; set the high byte (02) of the RAM address, start the transfer
LDA needdraw
AND #%00000001
BEQ .skipcolumns
JSR load_offscreen_tiles
JSR load_offscreen_attributes
.skipcolumns:
LDA needdraw
AND #%00000010
BEQ .skipscore
JSR DrawScore
.skipscore:
LDA soft_2000
STA $2000
LDA soft_2001
STA $2001
.Scrolling:
LDA $2002 ;reset the latch
LDA xScroll
STA $2005
LDA #$00
STA $2005
LDA #$00
STA sleeping
.Sprite0_testone:
BIT $2002
BVS .Sprite0_testone
.Sprite0_testtwo:
BIT $2002
BVC .Sprite0_testtwo
LDA #%10010000
STA $2000
LDA $2002
LDA #$00
STA $2005
STA $2005
PLA ; restore registers

TAY
PLA
TAX
PLA
I've attached the full ROM with all the asm and info files in a zip with this post. At first I tried
placing the $2000 and $2005 changes above the sprite 0 code and had it all at the beginning of
NMI. Also, I moved my Sprite 0 coordinates (found in init_PPU.asm) to intersect the lower edge of
the status bar at the top of the screen. For whatever reason, I thought that would work. Plop. It
didn't.
Also, I've noticed I have a problem coding controller input. Pushing the "A" and "B" buttons in this
demo is supposed to increment/decrement that two-digit counter at the bottom by one. Instead, it
increment/decrements by 6-12 (it changes every time). I'm guessing it's because each time I push a
button, it picks it up over the course of many frames instead of just one, and hence reads it as
though I pushed the button many times instead of just one. Am I right? If so, anyone know any
smooth tricks around this? I ran into the same problem awhile ago when I was trying to do a pause
routine.
PS - Some of my comments may be incorrect in the ASM files (wrong decimal value, for instance)
Sorry if anyone runs into that.
Attachment: spr0_scroll.zip
Background Horizontal Scrolling Buffer
Written by bigjt_2
Hey Everyone,
I just wanted to share a horizontal background scrolling engine I've been working on. I've noticed a
lot of other noobs besides myself asking about how to do this, so wanted to post this to hopefully
help out. It's probably not the most effective way to handle background scrolling and buffering, it's
just simply how I got it to work. I'm sure there are improvements that can be made, and if anyone
has them to suggest it would be awesome.
One area where I know improvements can be made is incorporating metatiles and data
compression. The backgrounds are loaded in byte by byte and it can be a bit time-consuming
making them and then re-arranging them to work with the buffer. Also, since I don't know how to
use compression and each background takes up about 1K of memory, that only gives you about 31
backgrounds you can load in using this engine. Eh, who needs worlds 1-2 through 8-4 of Super
Mario Bros, anyway? :-)
So think of it as "beginner's" scrolling engine, one that would hopefully be easier to understand by
noobs such as myself. I was simply focused on getting something up and running, and then figuring
out the more complicated stuff later. I'm going to start studying said complicated stuff very shortly.
In the mean time, if anyone needs to look at something more advanced, you might check out
Cadaver's more sophisticated scrolling engine that handles vertical and horizontal scrolling. It's
excellent and can be found at nesdev (http://parodius.com/scroll.zip).&...; I've been looking at it a
little bit but haven't delved too deeply into the code, yet.
--Edit: Forgot to mention: hit left and right to scroll. Sorry. It'll stop you from scrolling any farther
when you've hit the left edge of first background or right edge of last background.
--Further Edit 5-19-10: I just noticed this has been posted to the Programming Resources sticky.
Just so I don't confuse anyone who might be looking, I've removed all earlier versions of this demo.
It's gone through a lot of changes and the first few versions were not at all ideal. I will keep the
most current, updated version posted at the top of the thread here to make it easier for people to
find.
Attachment: top_status_bar.zip

Nerdy Nights NES Tutorials v1

Uploaded by

Document Informationclick to expand document information

Copyright:

Available Formats

Nerdy Nights NES Tutorials v1

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Nerdy Nights NES Tutorials v1

Uploaded by

Copyright:

Available Formats

About This PDF (v1.

Advanced Tutorial Series

Audio Tutorial Series

16's place 1's place

And more examples:

Core Programming Concepts

NES System Architecture

PRG - Program memory, the code for the game

CHR - Character memory, the data for graphics

CPU - Central Processing Unit, the main processor chip

PPU - Picture Processing Unit, the graphics chip

It will find/replace to:

6502 Processor Overview

$0000-0800 - Internal RAM, 2KB chip in the NES

6502 Assembly Overview

STY $00BA ;store Y into memory location $00BA

6502 Instruction Set

Common Load/Store opcodes

LDA #$0A ; LoaD the value 0A into the accumulator A

ADC #$01 ; ADd with Carry

Common Comparison opcodes

Common Control Flow opcodes

.inesprg 1 ; 1x 16KB bank of PRG code

Completing The Program

And the new code to set up the sprite data:

Putting It All Together

.db $80, $33, $00, $88 ;sprite 1

.db $88, $34, $00, $80 ;sprite 2

.db $88, $35, $00, $88 ;sprite 3

; if compare was equal to 16, continue down

LDA $4016 ; player 1 - A

LDA $4017 ; player 2 - A

01011011 controller data

LDA $4016 ; player 1 - A

LDA $4016 ; player 1 - B

LDA $4016 ; player 1 - Select

AND #%00000001 ; erase everything but bit 0

BEQ ReadADone ; branch to ReadADone if button is NOT pressed (0)

; add instructions here to do something when button IS pressed (1)

ReadADone: ; handling this button is done

LDA $0203 ; load sprite X (horizontal) position

CLC ; make sure the carry flag is clear

STA $0203 ; save sprite X (horizontal) position

LDA $0203 ; load sprite position

SEC ; make sure carry flag is set

STA $0203 ; save sprite position

Putting It All Together

.db $24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24 ;;all sky ($24 = sky)

.db $24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24 ;;row 2

.db $24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24 ;;all sky

.db $24,$24,$24,$24,$45,$45,$24,$24,$45,$45,$45,$45,$45,$45,$24,$24 ;;row 3

.db $24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$53,$54,$24,$24 ;;some brick tops

.db $24,$24,$24,$24,$47,$47,$24,$24,$47,$47,$47,$47,$47,$47,$24,$24 ;;row 4

.db $24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$24,$55,$56,$24,$24 ;;brick bottoms

And finally the same color palette as SMB is used:

LDA $2002 ; read PPU status to reset the high/low latch

STA $2006 ; write the high byte of $2000 address

STA $2006 ; write the low byte of $2000 address

LDX#$00 ; start out at 0