ARCtangent™-A4

Programmer’s Reference

ARC™ International
European Headquarters
ARC House
Waterfront Business Park
Elstree Road
Elstree, Herts WD6 3BS UK
Tel. +44 (0) 20.8236.2800
Fax +44 (0) 20.8236.2801

North American Headquarters
2025 Gateway Place, Suite 140
San Jose, CA 95110 USA
Tel. 408.437.3400
Fax 408.437.3401
www.arc.com

Confidential and proprietary information


© 2000-2002 ARC International (unpublished). All rights reserved.
Notice
This document contains confidential and proprietary information of ARC International and is protected
by copyright, trade secret, and other state, federal, and international laws. Its receipt or possession does
not convey any rights to reproduce, disclose its contents, or manufacture, use, or sell anything it may
describe. Reproduction, disclosure, or use without specific written authorization of ARC International is
strictly forbidden.
The product described in this manual is licensed, not sold, and may be used only in accordance with the
terms of an end-user license agreement (EULA) applicable to it. Use without an EULA, in violation of
the EULA, or without paying the license fee is unlawful.
Every effort is made to make this manual as accurate as possible. However, ARC International shall have
no liability or responsibility to any person or entity with respect to any liability, loss, or damage caused
or alleged to be caused directly or indirectly by this manual, including but not limited to any interruption
of service, loss of business or anticipated profits, and all direct, indirect, and consequential damages
resulting from the use of this manual. The entire ARC International warranty and liability in respect of
use of the product are set out in the EULA.
ARC International reserves the right to change the specifications and characteristics of the product
described in this manual, from time to time, without notice to users. For current information on changes
to the product, users should read the readme file and/or release notes that are contained in the distribution
media. Use of the product is subject to the warranty provisions contained in the EULA.
Trademark acknowledgments—ARC, the ARC logo, ARCangel, ARCform, ARChitect,
ARCompact, ARCtangent, BlueForm, CASSEIA, High C/C++, High C++, iCon186, MetaDeveloper,
Precise Solution, Precise/BlazeNet, Precise/EDS, Precise/MFS, Precise/MQX, Precise/MQX Test Suites,
Precise/MQXsim, Precise/RTCS, Precise/RTCSsim, SeeCode, TotalCore, Turbo186, Turbo86,
V8 µ-RISC, V8 microRISC, and VAutomation are trademarks of ARC International. High C and
MetaWare are registered under ARC International. All other trademarks are the property of their
respective owners.

5050-001 August-2002


Contents
Chapter 1 — Preface 1
Key Features 1

Chapter 2 — Architectural Description 5


Introduction 5
Programmer’s Model 5
Core register set 6
Auxiliary register set 7
The Host 7
Extensions 9
Extension core registers 9
Extension auxiliary registers 9
Extension instruction set 10
Extension condition codes 10
System Customization 11
Memory controller 11
Load store unit 11
Interrupt unit 11
Debugging Features 12
Power Management 12

Chapter 3 — Data Organization and


Addressing 15
Introduction 15
Operand Size 15
Data Organization 16
Registers 16
Immediate data 16
Memory 16
Addressing Modes 16
Memory Addressing 19
Instruction Format 19
Register 20

Short immediate 20
Long immediate 20
Branch 20
Register Notation 20

Chapter 4 — Interrupts 23
Introduction 23
ILINK Registers 23
Interrupt Vectors 23
Interrupt Enables 24
Returning from Interrupts 25
Reset 26
Memory Error 26
Instruction Error 26
Interrupt Times 27
Alternate Interrupt Unit 27

Chapter 5 — Instruction Set Summary 29


Introduction 29
Arithmetic and Logical Operations 29
Null Instruction 30
Single Operand Instructions 30
Jump, Branch and Loop Operations 33
Zero Overhead Loop Mechanism 35
LP_COUNT must not be loaded directly from memory 37
Single instruction loops 37
Loop count register 38
Branch and jumps in loops 39
Instructions with long immediate data: correct coding 40
Instructions with long immediate data: incorrect coding 40
Valid instruction regions in loops 41
Breakpoint Instruction 43
BRK instruction in delay slot 44
Sleep Instruction 44
SLEEP instruction in delay slot 45
SLEEP instruction in delay slot of Jump 46
SLEEP instruction in single step mode 46
Software Interrupt Instruction 46
SWI instruction format 46
Load and Store Operations 46
Auxiliary Register Operations 48

Extension Instructions 49
Optional Extensions Library 49
Multiply 32 X 32 50
Barrel shift/rotate block 51
Normalize instruction 51
SWAP instruction 52
MIN/MAX instructions 53

Chapter 6 — Condition Codes 55


Introduction 55
Condition Code Register 55
Condition Code Register Notation 55
Condition Code Test 56

Chapter 7 — Register Set Details 59


Core Register Set 60
Link registers 61
Loop count register 61
Immediate data indicators 61
Extension core registers 62
Multiply result registers 62
Auxiliary Register Set 62
Status register 63
Semaphore register 63
Loop control registers 65
Identity register 65
Debug register 65
Extension auxiliary registers 67
Optional extensions auxiliary registers 67
Multiply restore register 67

Chapter 8 — Instruction Set Details 69


Introduction 69
Instruction Map 69
Addressing Modes 71
Dual operand instructions 71
Single operand instructions 72
Branch type Instructions 72
Jump Instruction 72
Load Instruction 73
Store instruction 73
Load from auxiliary register instruction 74

Store to auxiliary register instruction 74


Instruction Encoding 75
Register 76
Short immediate 77
Single operand 77
Branch 77
Instruction Set Details 79
ADC 80
ADD 81
AND 82
ASL/LSL 83
ASL multiple 84
ASR 85
ASR multiple 86
BIC 87
Bcc 88
BLcc 89
BRK 91
EXT 92
FLAG 93
Jcc 95
JLcc 97
LD 99
LPcc 101
LR 102
LSL 103
LSR 104
LSR multiple 105
MAX 106
MIN 107
MOV 108
MUL64 109
MULU64 111
NOP 113
NORM 114
OR 116
RLC 117
ROL 118
ROR 119
ROR multiple 120
RRC 121
SBC 122
SEX 123

SLEEP 124
SR 125
ST 126
SUB 128
SWAP 129
SWI 130
XOR 131

Chapter 9 — The Host 133


Halting 135
Starting 135
Pipecleaning 136
Single Stepping 137
Single cycle step 137
Single instruction step 138
SLEEP instruction in single step mode 138
BRK instruction in single step mode 138
Software Breakpoints 139
ARCtangent-A4 Core Registers 139
ARCtangent-A4 Auxiliary Registers 139
STATUS 139
SEMAPHORE 139
IDENTITY 140
DEBUG 140
ARCtangent-A4 Memory 140

Chapter 10 — Pipeline and Timings 141


Introduction 141
Stage 1. Instruction fetch 141
Stage 2. Operand fetch 141
Stage 3. ALU 142
Stage 4. Write back 142
Pipeline-Cycle Diagram 142
Arithmetic and Logic Function Timings 143
Immediate Data Timing 144
Short immediate 144
Long immediate 144
Destination immediate 145
Conditional Instruction Timing 146
Extension Instruction Timings 147
Single cycle extension instructions 147
Multi cycle extension instructions 147

Multiply timings 148


Barrel shift timings 150
Jump and Branch Timings 151
Jump instruction 151
Jump and nullify delay slot instruction 152
Jump and execute delay slot instruction 152
Jump with immediate address 153
Jump setting flags 154
Conditional jump 154
Jump and link 156
Branch 158
Conditional branch 159
Software breakpoints 160
Software breakpoint return address calculation 160
Branch and link 161
Loop Timings 162
Loop set up 162
Conditional loop 164
Loop execution 165
Single instruction loops 166
Reading loop count register 168
Writing loop count register 169
Branch and jumps in loops 171
Software breakpoints in loops 172
Instructions with long immediate data 172
Flag Instruction Timings 173
Breakpoint 173
Sleep Mode 174
Load and Store Timings 176
Load 176
Store 178
Auxiliary Register Access 179
Load from register (LR) 179
Store to register (SR) 180
Interrupt Timings 181
Interrupt on arithmetic instruction 182
Software interrupt 183
Interrupt on jump, branch or loop set up 184
Interrupt on loop execution 185
Interrupt on load 186
Interrupt on store 186
Interrupt on auxiliary register access 186

Single Instruction Step 186


Single instruction step on single word instructions 186
Single instruction step on instruction with long immediate data 187

Index 189


List of Figures
Figure 1 Data flows in the ARCtangent-A4 architecture 6
Figure 2 Example Host Memory Maps 8
Figure 3 Power Management Block Diagram 13
Figure 4 Status Register 25
Figure 5 PC Update and Loop Detection Mechanism for Loops 36
Figure 6 Valid Instruction Regions in Loops 42
Figure 7 32x32 Multiply 50
Figure 8 Barrel Shift Operations 51
Figure 9 Norm Instruction 52
Figure 10 SWAP Instruction 52
Figure 11 Status Register 55
Figure 12 Core Register Map 59
Figure 13 Auxiliary Register Set 60
Figure 14 Status Register 63
Figure 15 Semaphore Register 63
Figure 16 Loop Start Register 65
Figure 17 Loop End Register 65
Figure 18 Identity Register 65
Figure 19 Debug Register 65
Figure 20 Multiply Restore Register 67
Figure 21 Example Host Memory Maps 134
Figure 22 ARCtangent-A4 Pipeline 141
Figure 23 Pipeline-Cycle Diagram 142


List of Tables
Table 1 Core Register Map 6
Table 2 Auxiliary Register Map 7
Table 3 Data Addressing Modes 18
Table 4 Key for Addressing Modes and Conventions 18
Table 5 Interrupt Summary 24
Table 6 Arithmetic and Logical Instructions 30
Table 7 Single Operand Instructions: Move and Extend 31
Table 8 Single Operand Instructions: Rotates and Shifts 32
Table 9 Single Operand Instructions: Flags and Halts 32
Table 10 Jump, Branch and Loop Instructions 33
Table 11 Load and Store Instructions 47
Table 12 Auxiliary Register Operations 48
Table 13 Condition Codes 57
Table 14 Core Register Map 61
Table 15 Multiply Result Registers 62
Table 16 Auxiliary Register Set 63
Table 17 Basecase Instruction Map 71
Table 18 Host Accesses to the ARCtangent-A4 processor 134
Table 19 Single Step Flags in Debug Register 137


Chapter 1 — Preface
This manual is aimed at programmers of the ARCtangent™-A4 processor and
serves as a reference to the instruction set of the basecase ARCtangent-A4
(version 8) core.
Programmer’s reference information for any additional extensions or
customizations that have been implemented in the target ARCtangent-A4 system
is contained in other manuals.

Key Features
Data Paths
• 32-Bit Data Bus
• 32-Bit Load/Store Address Bus
• 32-Bit Instruction Bus
• 24-Bit Instruction Address Bus
Registers
• 32 General Purpose Core Registers
• Auxiliary Register Set
Load/Store Unit
• Delayed Load mechanism with Register Scoreboard
• Buffered Store
• Address Register Write-Back
Program Flow
• 4 Stage Pipeline
• Single Cycle Instructions
• All ALU Instructions are Conditional
• Single Cycle Immediate Data
• Jumps and Branches with Single Instruction Delay Slot
• Delay Slot Execution Modes
• Zero Overhead Loops
Interrupts and Exceptions
• Levels of Exception
• Non-Maskable Exceptions
• Maskable External Interrupts in basecase ARCtangent-A4 processor
Extensions
• 16 Extension Dual Operand Instruction Codes
• 55 Extension Single Operand Instruction Codes
• 28 Extension Core Registers
• 32 Bit addressable Auxiliary Register Set
• 16 Extension Condition Codes
• Build Configuration Registers
System Customizations
• Host Interface
• Separate Memory Controller
• Separate Load/Store Unit
• Separate Interrupt Unit
Host Interface Debug Features
• Start, stop and single step the ARCtangent-A4 processor via special registers
• Check and change the values in the register set and ARCtangent-A4 memory
• Communicate via the semaphore register and shared memory
• Perform code profiling by reading the status register
• Breakpoint Instruction
Power Management
• Sleep Mode
• Clock Gating Option


Chapter 2 — Architectural Description

Introduction
The ARCtangent-A4 is a 4-stage pipeline processor incorporating full 32-bit
instruction, data and addressing. In line with RISC (reduced instruction set
computer) based architectures, ARCtangent-A4 has an orthogonal instruction set
with all addressing modes implemented on all arithmetic and logical instructions.
The architecture is extendible in the instruction set and registers. These
extensions will be touched upon in this document but covered fully in other
documents.
This document describes the minimum basecase version of ARCtangent-A4, to
which all future designs incorporating ARCtangent-A4 must adhere.
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area, please see associated documentation.

Programmer’s Model
The programmer’s model is common to all implementations of ARCtangent-A4
processor to allow upward compatibility of code.
Logically, the ARCtangent-A4 processor is based around a 3 (or 4)-port core register
file, with many of the instructions taking two source operands and 1 destination register.
Other registers are contained in the auxiliary register set and are accessed with
the LOAD-REGISTER/STORE-REGISTER commands or other special
commands.

[Figure: block diagram with the labels 3-port core register file, memory
controller, load/store, result, source 1, source 2, pc, pc controller,
instruction decoder, ALU and auxiliary registers, connected by the d bus,
s1 bus, s2 bus and i bus.]

Figure 1 Data flows in the ARCtangent-A4 architecture

Core register set


Number Core register name Function
0-28 r0-r28 General Purpose Registers
29 ILINK1 or r29 Maskable interrupt link register
30 ILINK2 or r30 Maskable interrupt link register
31 BLINK or r31 Branch link register
32-59 r32-r59 Register space reserved for extensions
60 LP_COUNT Loop count register (24 Bits)
61 - Short immediate data indicator setting flags
62 - Long immediate data indicator
63 - Short immediate data indicator not setting flags

Table 1 Core Register Map


The core register set in the ARCtangent-A4 processor is shown in Table 1. Other
predefined registers are in the auxiliary register set and they are shown in Table
2. The general purpose registers (r0-r28) can be used for any purpose by the
programmer.
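
The register map in Table 1 can be expressed as a small lookup. The following is a minimal sketch in Python, used here only for illustration; the names `CORE_SPECIAL` and `describe_core_register` are hypothetical and are not part of any ARC tool.

```python
# Roles of the core register numbers that have a fixed meaning in Table 1.
CORE_SPECIAL = {
    29: ("ILINK1", "maskable interrupt link register"),
    30: ("ILINK2", "maskable interrupt link register"),
    31: ("BLINK", "branch link register"),
    60: ("LP_COUNT", "loop count register (24 bits)"),
    61: ("-", "short immediate data indicator setting flags"),
    62: ("-", "long immediate data indicator"),
    63: ("-", "short immediate data indicator not setting flags"),
}


def describe_core_register(number):
    """Return (name, function) for a 6-bit core register number, per Table 1."""
    if not 0 <= number <= 63:
        raise ValueError("core register numbers are in the range 0-63")
    if number in CORE_SPECIAL:
        return CORE_SPECIAL[number]
    if number <= 28:
        return (f"r{number}", "general purpose register")
    # Only 32-59 can reach this point.
    return (f"r{number}", "register space reserved for extensions")


for n in (0, 29, 31, 45, 60, 62):
    print(n, describe_core_register(n))
```

Whether a given extension register in positions 32 to 59 actually exists depends on the implemented system.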

Auxiliary register set

Number        Auxiliary register name   Description
0x0           STATUS                    Status register
0x1           SEMAPHORE                 Inter-process/Host semaphore register
0x2           LP_START                  Loop start address (24 bits)
0x3           LP_END                    Loop end address (24 bits)
0x4           IDENTITY                  ARCtangent-A4 Identification register
0x5           DEBUG                     Debug register
0x60 - 0x7F   RESERVED                  Build Configuration Registers

Table 2 Auxiliary Register Map


The auxiliary register set contains special status and control registers. Auxiliary
registers occupy a special address space that is accessed using special load and
store instructions, or other special commands. The basecase ARCtangent-A4
processor uses 6 status and control registers, and reserves the additional registers
0x60 to 0x7F, leaving the rest of the 2^32 registers for extension purposes.
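
The split of the auxiliary address space described above can be sketched as follows. This is an illustrative Python model only; `classify_aux_address` and the constant names are hypothetical, and the register numbers are those listed in Table 2.

```python
# Basecase auxiliary registers and the reserved range from Table 2.
BASECASE_AUX = {
    0x0: "STATUS",
    0x1: "SEMAPHORE",
    0x2: "LP_START",
    0x3: "LP_END",
    0x4: "IDENTITY",
    0x5: "DEBUG",
}
BCR_FIRST, BCR_LAST = 0x60, 0x7F  # Build Configuration Registers


def classify_aux_address(address):
    """Classify a 32-bit auxiliary register address."""
    if not 0 <= address < 2**32:
        raise ValueError("auxiliary register addresses are 32 bits wide")
    if address in BASECASE_AUX:
        return BASECASE_AUX[address]
    if BCR_FIRST <= address <= BCR_LAST:
        return "build configuration register (reserved)"
    return "available for extension"


# Positions left over once the 6 basecase registers and the 32 reserved
# positions are taken out of the 2^32 address space.
extension_positions = 2**32 - len(BASECASE_AUX) - (BCR_LAST - BCR_FIRST + 1)
print(classify_aux_address(0x4), classify_aux_address(0x60), extension_positions)
```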

The Host
The ARCtangent-A4 processor was developed with an integrated host interface
to support communications with a host system. The ARCtangent-A4 processor
can be started, stopped and communicated with by the host system using special
registers. Further information is contained in later sections of this manual.
Most of the techniques outlined here will be handled by the software debugging
system, and the programmer, in general, need not be concerned with these
specific details.
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area, please see associated documentation.

It is expected that the registers and the program memory of the ARCtangent-A4
processor will appear as a memory mapped section to the host. Figure 2 shows
two examples: a) a contiguous part of host memory and b) a section of memory
and a section of I/O space.

[Figure: a) Single Memory Map, with the ARCtangent-A4 core registers, auxiliary
registers and memory in one memory map; b) Memory Map with I/O Map, with the
ARCtangent-A4 core registers and auxiliary registers in the I/O map and the
ARCtangent-A4 memory in the memory map.]

Figure 2 Example Host Memory Maps

Extensions
The ARCtangent-A4 processor is designed to be extendible according to the
requirements of the system in which it is used. These extensions include more
core and auxiliary registers, new instructions, and additional condition code tests.
This section is intended to inform the programmer of the ARCtangent-A4
processor where these extensions occur and how they affect the programmer's
view of the ARCtangent-A4 processor.
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area, please see associated documentation.

Extension core registers


The core register set has a total of 64 different addressable positions. The first 29
are general purpose basecase registers, the next 3 are the link registers and the
last 4 are the loop count register and immediate data indicators. This leaves
positions 32 to 59 for extension purposes. The extension registers are referred to
as r32, r33, ..., r59. The core register map is shown in Table 1.
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area, please see associated documentation.

Extension auxiliary registers


The auxiliary registers are accessed with 32-bit addresses and are long word data
size only. Extensions to the auxiliary register set can be anywhere in this memory
address space excepting those positions defined in the basecase for auxiliary
registers. They are referred to using the load from register (LR) and store to
register (SR) instructions or special extension instructions. The reserved auxiliary
register addresses are shown in Table 2.
NOTE If an auxiliary register position that does not exist is read, then the ID register
value is returned.

The auxiliary register address region 0x60 up to 0x7F is reserved for the Build
Configuration Registers (BCR) that can be used by embedded software or host
debug software to detect the configuration of the ARCtangent-A4 hardware. The
Build Configuration Registers contain the version of each ARCtangent-A4
extension, as well as configuration information that is build specific. The
registers are available for ARCtangent-A4 basecase version 8 processor onwards
and will always remain backwardly compatible.
NOTE The Build Configuration Registers are fully described in the associated
documentation.
The implemented ARCtangent-A4 system may have extensions or
customizations in this area, please see associated documentation.

Extension instruction set


Instructions are encoded onto the instruction word using a 5 bit binary number.
This gives 32 separate instructions. The first 16 instructions are defined in the
basecase ARCtangent-A4 processor. The remaining 16 instructions are available
for extension. The basecase and extension instruction codes are given in Table
17.
Extension instructions can be used in the same way as the normal ALU
instructions, except an external ALU is used to obtain the result for write-back to
the core register set.

Extension condition codes


The condition code test on an instruction is encoded using a 5 bit binary number.
This gives 32 different possible conditions that can be tested. The first 16 codes
(00-0F) are those condition codes defined in the basecase version of
ARCtangent-A4 processor which use only the internal condition flags from the
status register (Z, N, C, V), see Table 13 Condition Codes.
The remaining 16 condition codes (10-1F) are available for extension and are
used to:
• provide additional tests on the internal condition flags or
• test extension status flags from external sources or
• test a combination of external and internal flags
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area, please see associated documentation.
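
Both the instruction code and the condition code are 5-bit numbers whose first 16 values belong to the basecase and whose remaining 16 values are available for extension. A minimal Python sketch of that split is given below; the function name `code_space` is hypothetical.

```python
def code_space(code):
    """Return which half of a 5-bit code space a value belongs to.

    Applies equally to the 5-bit instruction code and the 5-bit condition
    code: 0x00-0x0F are basecase, 0x10-0x1F are available for extension.
    """
    if not 0 <= code <= 0x1F:
        raise ValueError("the code is a 5 bit binary number (0x00-0x1F)")
    return "basecase" if code <= 0x0F else "extension"


print(code_space(0x0F), code_space(0x10))
```

What an extension code actually tests or executes is defined by the implemented system, not by this split.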

System Customization
As well as the extensions mentioned in the previous section, the ARCtangent-A4
processor can be additionally customized to match memory, cache, and interrupt
requirements. This is achieved by using a separate memory controller, load/store
unit and interrupt unit.

Memory controller
This unit is defined according to the memory system with which the
ARCtangent-A4 processor is being used. Instruction-cache, data-cache, DRAM
control, instruction versus data arbitration and other memory specific logic will
be defined in the memory controller.
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area, please see associated documentation.

Load store unit


The load store unit contains the register scoreboard for marking which registers
are waiting to be written from the result of delayed loads. The size of the
scoreboard is changed according to the number of delayed loads that the memory
controller can accommodate at any given time. The load store unit can
additionally be modified to provide result write-back and register scoreboard for
multi-cycle extension instructions.
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area, please see associated documentation.
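
The idea of a register scoreboard can be illustrated with a purely conceptual Python model. It is a sketch under stated assumptions, not a description of the hardware: the class and method names are hypothetical, and the choice to refuse a new load when the scoreboard is full is an assumption made only for the example.

```python
class RegisterScoreboard:
    """Conceptual model: tracks registers awaiting data from delayed loads."""

    def __init__(self, capacity):
        # Capacity stands for the number of delayed loads the memory
        # controller can accommodate at any given time.
        self.capacity = capacity
        self.pending = set()

    def issue_load(self, dest):
        """Mark dest as pending; return False if no entry is free (assumed)."""
        if dest in self.pending or len(self.pending) >= self.capacity:
            return False
        self.pending.add(dest)
        return True

    def complete_load(self, dest):
        """Clear the mark once the load result has been written back."""
        self.pending.discard(dest)

    def must_wait(self, *registers):
        """True if any of the given registers is still awaiting load data."""
        return any(r in self.pending for r in registers)


sb = RegisterScoreboard(capacity=1)
sb.issue_load(3)            # a load into r3 is outstanding
print(sb.must_wait(1, 3))   # an instruction reading r3 would have to wait
sb.complete_load(3)
print(sb.must_wait(1, 3))
```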

Interrupt unit
The interrupt unit contains the exception and interrupt vector positions, the logic
to tell the ARCtangent-A4 which of the 3 levels of interrupt has occurred, and the
arbitration between the interrupts and exceptions. The interrupt unit can be
modified to alter the priority of interrupts, the vector positions and the number of
interrupts.
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area, please see associated documentation.

Debugging Features
It is possible for the ARCtangent-A4 to be controlled from a host processor using
special debugging features. The host can:
• start and stop the ARCtangent-A4 processor via the status and debug registers
• single step the ARCtangent-A4 processor via the debug register
• check and change the values in the register set and ARCtangent-A4 memory
• communicate with the ARCtangent-A4 processor via the semaphore register
and shared memory
• perform code profiling by reading the status register
• enable software breakpoints by using Bcc instruction
• enable software breakpoints by using BRK instruction
With these abilities it is possible for the host to provide software breakpoints,
single stepping and program tracing of the ARCtangent-A4 processor.
It is possible for the ARCtangent-A4 processor to halt itself with the FLAG
instruction. The self halt bit (SH) in the debug register is set if the ARCtangent-
A4 processor halts itself.

Power Management
Basecase version 8 and later ARCtangent-A4 processors have special power
management features. The SLEEP instruction halts the ARCtangent-A4
processor and its pipeline until an interrupt or a restart occurs. Sleep mode
stalls the core pipeline and disables any on-chip RAM.
Optional clock gating is provided, which switches off all non-essential clocks
when the ARCtangent-A4 processor is halted or is in sleep mode. This means the
internal ARCtangent-A4 control unit is not active and major blocks are disabled.
The host interface, interrupt unit and memory interfaces are always left enabled
to allow host accesses and the "wake" feature. The following diagram shows a
summary of the clock gating and sleep circuitry.

[Figure: block diagram with the labels Normal Domain, Gated Domain, clk,
HOST I/F, Memory, Memory I/F, Control Logic, ARCtangent-A4 core and Interrupts.]

Figure 3 Power Management Block Diagram


Chapter 3 — Data Organization and
Addressing

Introduction
This chapter describes the data organization and addressing of the ARCtangent-
A4 processor.

Operand Size
The ARCtangent-A4 is a 32-bit word architecture and as such most operations
are performed on 32-bit data. However, there are a few exceptions.
The basic data types are:
• Long word (32-bit) for register-register operation, immediate data and load/store
• Word (16-bit) for load/store operations only
• Short Immediate (9-bit) for short immediate data only
• Byte (8-bit) for load/store operations only
and addressing:
• absolute (32-bit) for load/store and jumps
• relative (20-bit) for branch and loop

Data Organization
Registers
The core registers and auxiliary registers are 32 bits (long word) wide.

Immediate data
Immediate data used as an operand can be 32-bit (long immediate), or 9-bit sign
extended to 32-bit (short immediate).

Memory
The memory operations (load and store) can have data 32 bits (long word), 16
bits (word) or 8 bits (byte) wide. Byte operations use the low order 8 bits and may
extend the sign of the byte across the rest of the long word depending on the
load/store instruction. The same applies to the word operations with the word
occupying the low order 16 bits. Data memory is accessed using byte addresses,
which means long word or word accesses can be supplied with non-aligned
addresses. The following should be supported as a minimum:
• long words on long word boundaries
• words on word boundaries
• bytes on byte boundaries
There is no "unaligned access exception" available in the ARCtangent-A4
processor. The basecase ARCtangent-A4 processor is “Endian free”, in that the
endianness of the implemented ARCtangent-A4 system is dependent entirely on
the memory system.
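
Sign extension of 8-bit, 9-bit and 16-bit values to a 32-bit long word, and the minimum alignment cases listed above, can be sketched in Python as follows. The helper names `sign_extend` and `is_naturally_aligned` are hypothetical and exist only for this illustration.

```python
def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of value to a 32-bit long word."""
    value &= (1 << bits) - 1          # keep only the low-order bits
    sign_bit = 1 << (bits - 1)
    # XOR/subtract trick: propagates the sign bit, then wrap to 32 bits.
    return ((value ^ sign_bit) - sign_bit) & 0xFFFFFFFF


def is_naturally_aligned(address, size_bytes):
    """True for long words (4), words (2) or bytes (1) on their own boundaries."""
    return address % size_bytes == 0


print(hex(sign_extend(0x80, 8)))     # byte with the sign bit set
print(hex(sign_extend(0x7FFF, 16)))  # positive word is unchanged
print(hex(sign_extend(0x1FF, 9)))    # 9-bit short immediate of all ones
print(is_naturally_aligned(0x1002, 4), is_naturally_aligned(0x1002, 2))
```

Accesses that are not naturally aligned are outside the minimum set; how they behave depends on the memory system in use.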

Addressing Modes
The addressing modes that the instructions use are encoded within the register
fields of the instruction word. There are basically only 3 addressing modes:
register-register, register-immediate and immediate-immediate. However, as a
consequence of the action performed by the different instruction groups, these
can be expanded as shown in Table 3 Data Addressing Modes.

Mode                                Syntax               Operation
Register, register                  op a,b,c             a ← b op c
Register, immediate                 op a,b,imm           a ← b op imm
immediate, register                 op a,imm,c           a ← imm op c
immediate, immediate                op a,imm,imm         a ← imm op imm
test                                op 0,b,c             no result but b op c can set flags
test with immediate                 op 0,b,imm           b op imm can set flags
                                    op 0,imm,c           imm op c can set flags
single operand (register)           single_op a,b        a ← single_op b
single operand immediate            single_op a,imm      a ← single_op imm
single operand test (register)      single_op 0,b        single_op b can set flags
single operand test immediate       single_op 0,imm      single_op imm can set flags
flag with register                  flag b               Flags ← b
flag with immediate                 flag imm             Flags ← imm
load                                ld a,[b,c]           a ← data at address [b+c]
load with immediate offset          ld a,[b,imm]         a ← data at address [b + imm]
                                    ld a,[imm,c]         a ← data at address [imm + c]
load from immediate address         ld a,[imm]           a ← data at address [imm]
load from auxiliary register        lr a,[b]             a ← data in reg. at address [b]
                                    lr a,[imm]           a ← data in reg. at address [imm]
store                               st c,[b]             Data at address [b] ← c
store with immediate offset         st c,[b,shimm]       Data at address [b + shimm] ← c
store to immediate address          st c,[imm]           Data at address [imm] ← c
store 0                             st 0,[b]             Data at address [b] ← 0
store shimm with immediate offset   st shimm,[b,shimm]   Data at address [b + shimm] ← shimm (shimms MUST match)
store limm with immediate offset    st limm,[b,shimm]    Data at address [b + shimm] ← limm
store to auxiliary register         sr c,[b]             Data in reg. at address [b] ← c
                                    sr c,[imm]           Data in reg. at address [imm] ← c

Table 3 Data Addressing Modes


Key for addressing modes and conventions
←           replaced by
a           result register
b           operand register 1
c           operand register 2
C           carry flag in status register
op          instruction
single_op   single operand instr
ld          load instruction
st          store instruction
lr          load from auxiliary register instruction
sr          store to auxiliary register instruction
flag        flag instruction
imm         immediate data (long or short)
limm        long immediate (32 bit constant)
shimm       short immediate (signed 9 bit constant)
addr        absolute address
rel_addr    relative address
<cc>        optional condition code
<f>         optional set flags field
<zz>        optional size field
<di>        optional data cache bypass field
<a>         optional address write-back field
<x>         optional sign extend field
<dd>        optional delay slot execution mode
.           separator if other fields are used
Table 4 Key for Addressing Modes and Conventions

Memory Addressing
Branch and jump instructions that refer to memory (i.e. J, JL, B, BL, LP) contain
an address. This address is referred to in the form [n:2], where n is the most
significant bit of the word. It is used as a long-word offset or address, but the
numbering has retained the convention for byte addressing.
As an example, a branch to the address 4 long words forward is a branch
16 bytes ahead, but only bits 21:2 of the offset are encoded. However, the
syntax in assembly language would still be in bytes. Therefore, to branch 4 long
words forward, the syntax would be bra 16, although it is unlikely that a
programmer would specify a branch’s relative address in such a way.
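
The relationship between the byte offset written in assembly language and the 20-bit value held in bits 21:2 can be sketched in Python as below. The function names `encode_rel_addr` and `decode_rel_addr` are hypothetical. The sketch converts only between a byte offset and the field value; it makes no assumption about which address the offset is measured from.

```python
def encode_rel_addr(byte_offset):
    """Convert a signed byte offset to the 20-bit field holding bits 21:2."""
    if byte_offset % 4 != 0:
        raise ValueError("offset must be a whole number of long words")
    long_words = byte_offset >> 2     # drop bits 1:0, which are never encoded
    if not -(1 << 19) <= long_words < (1 << 19):
        raise ValueError("offset does not fit in a signed 20-bit field")
    return long_words & 0xFFFFF       # two's complement in 20 bits


def decode_rel_addr(field):
    """Convert the 20-bit field back to a signed byte offset."""
    field &= 0xFFFFF
    if field & (1 << 19):             # sign bit of the 20-bit field
        field -= 1 << 20
    return field * 4


print(encode_rel_addr(16))            # 4 long words forward
print(decode_rel_addr(encode_rel_addr(-8)))
```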
With the load and store commands (LD and ST), the address calculated by the
instruction is passed as a 32-bit word to the memory controller, and used as a
byte address.
An interrupt may be caused by the memory controller if the size of the operation
and the address are incompatible, e.g. if the memory controller cannot fetch long-
words from byte boundaries. This will be dependent on the memory controller
being used with the ARCtangent-A4 processor and is not part of the basecase
ARCtangent-A4 processor.

Instruction Format
Instructions are one long word in length and may be followed by a long-word
immediate value. There are three basic instruction layouts. The instruction
opcode is encoded in the I field. The result of the instruction is sent to the
register defined by the A field. The two source register addresses are encoded
in the B and C fields. If the result of the instruction needs to set the flags,
then the F bit is set. The condition that causes the instruction to be executed
is encoded in the condition code field Q. The reserved bits R are undefined and
should be set to 0. The L field in the branch type instruction specifies the
signed relative jump address, and the N field is used in jumps and branches to
nullify or execute the
next instruction. See also Chapter 5 — Instruction Set Summary and Chapter 8
— Instruction Set Details for further details.

Register
Bits    31:27   26:21   20:15   14:9    8   7   6:5   4:0
Field   I[4:0]  A[5:0]  B[5:0]  C[5:0]  F   R   N N   Q[4:0]

Short immediate
Bits    31:27   26:21   20:15   14:9    8:0
Field   I[4:0]  A[5:0]  B[5:0]  C[5:0]  D[8:0]

Long immediate
First long-word, the instruction
Bits    31:27   26:21   20:15   14:9    8   7   6:5   4:0
Field   I[4:0]  A[5:0]  B[5:0]  C[5:0]  F   R   N N   Q[4:0]
Second long-word, the data
Bits    31:0
Field   Limm[31:0]

Branch
Bits    31:27   26:7     6:5   4:0
Field   I[4:0]  L[21:2]  N N   Q[4:0]
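
As an illustration of the register layout, the following Python sketch packs and unpacks the fields at the bit positions shown above. It is a hedged model only: the function names are hypothetical, and no opcode values are assumed, so the field values used are arbitrary numbers rather than real instructions.

```python
# (name, width in bits, position of least significant bit) for the register layout.
REGISTER_FIELDS = (
    ("i", 5, 27),
    ("a", 6, 21),
    ("b", 6, 15),
    ("c", 6, 9),
    ("f", 1, 8),
    ("n", 2, 5),
    ("q", 5, 0),
)


def pack_register_format(**values: int) -> int:
    """Pack field values into one 32-bit long word; reserved bit R (bit 7) stays 0."""
    word = 0
    for name, width, shift in REGISTER_FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"field {name} does not fit in {width} bits")
        word |= value << shift
    return word


def unpack_register_format(word: int) -> dict:
    """Split a 32-bit long word back into its named fields."""
    return {name: (word >> shift) & ((1 << width) - 1)
            for name, width, shift in REGISTER_FIELDS}


if __name__ == "__main__":
    word = pack_register_format(i=0, a=1, b=2, c=3, f=1)
    print(hex(word), unpack_register_format(word))
```

The same table-driven approach would extend to the short immediate and branch layouts by changing the field list.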

Register Notation
The core registers are identified as follows:
rn general purpose register number n
ILINK1 maskable interrupt link register 1
ILINK2 maskable interrupt link register 2
BLINK branch link register
LP_COUNT loop count register (24 bits)
Example syntax:
AND r1,r2,r3 ;r1 ← r2 AND r3
AND ILINK2,r21,r21 ;ILINK2 ← r21

The auxiliary registers are identified as:
STATUS     status register
SEMAPHORE  semaphore register
LP_START   loop start address (24 bits)
LP_END     loop end address (24 bits)
IDENTITY   ARCtangent-A4 identification register
DEBUG      ARCtangent-A4 debug register
Example syntax:
SR r5,[SEMAPHORE] ;[SEMAPHORE] ← r5
LR r4,[LP_START] ;r4 ← [LP_START]


Chapter 4 — Interrupts

Introduction
The ARCtangent-A4 interrupt mechanism provides three levels of interrupts:
• Exceptions such as Reset, Memory Error and Invalid Instruction (high priority)
• Level 2 (mid priority) interrupts, which are maskable
• Level 1 (low priority) interrupts, which are maskable
The exception set has the highest priority, the level 2 set has middle priority
and the level 1 set has the lowest priority.
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area; please see the associated documentation.

ILINK Registers
When an interrupt occurs, the link register, where appropriate, is loaded with the
status register containing the next PC and the current flags; the PC is then loaded
with the relevant address for servicing the interrupt.
Link register ILINK2 is associated with the level 2 set of interrupts and the two
exceptions: memory error and instruction error. ILINK1 is associated with the
level 1 set of interrupts.

Interrupt Vectors
In the basecase ARCtangent-A4 processor, there are three exceptions, and each
exception has its own vector position. An alternate interrupt unit may be
implemented; see section Alternate Interrupt Unit.

The ARCtangent-A4 processor does not implement interrupt vectors as such, but
rather a table of jumps. When an interrupt occurs the ARCtangent-A4 processor
jumps to fixed addresses in memory, which contain a jump instruction to the
interrupt handling code. The start of these interrupt vectors is dependent on the
particular ARCtangent-A4 system and is often a set of contiguous jump vectors.
Example vector offsets are shown in the following table. Two long-words are
reserved for each interrupt line to allow room for a jump instruction with a long
immediate address.
Vector  Name               Link register  Byte Offset
0       reset              -              0x00
1       memory exception   ILINK2         0x08
2       instruction error  ILINK2         0x10
3–n     irq3–irqn          ILINKm         0x18 – 0x08*n

Table 5 Interrupt Summary
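
The example offsets in Table 5 follow directly from reserving two long-words (8 bytes) per vector. The short Python sketch below restates that arithmetic; the names are hypothetical, and the vector base address is taken as an assumed parameter because it depends on the particular ARCtangent-A4 system.

```python
BYTES_PER_VECTOR = 8  # two long-words: a jump instruction plus a long immediate address


def vector_byte_offset(vector: int) -> int:
    """Byte offset of an interrupt vector from the start of the vector table."""
    if vector < 0:
        raise ValueError("vector numbers start at 0")
    return vector * BYTES_PER_VECTOR


def vector_address(vector: int, table_base: int = 0) -> int:
    """Absolute byte address of a vector; table_base is system dependent (assumed 0)."""
    return table_base + vector_byte_offset(vector)


if __name__ == "__main__":
    for number in range(5):
        print(number, hex(vector_byte_offset(number)))
```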


It is possible to execute the code for servicing the last interrupt in the interrupt
vector table without using the jump mechanism. An example set of vectors
showing the last interrupt vector is shown in the following code.
;Start of exception vectors
reset: JAL res_service ;vector 0
mem_ex: JAL mem_service ;vector 1, ilink2
ins_err: JAL instr_service ;vector 2, ilink2
ivect3: JAL iservice3 ;vector 3, ilink1
ivect4: ;vector 4, interrupt, ilink1
;start of interrupt service code for
;ivect4

NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area; please see the associated documentation.

Interrupt Enables
The level 1 set and level 2 set of interrupts are maskable. The interrupt enable
bits E2 and E1 in the status register are used to enable level 2 set and level 1 set
of interrupts respectively. Interrupts are enabled or disabled with the flag
instruction.

Bits    31  30  29  28  27  26  25  24  23:0
Field   Z   N   C   V   E2  E1  H   R   PC[25:2]

Figure 4 Status Register


Example:
.equ EI,6 ; constant to enable both interrupts
.equ EI1,2 ; constant to enable level 1 interrupt only
.equ EI2,4 ; constant to enable level 2 interrupt only
.equ DI,0 ; constant to disable both interrupts

FLAG EI ; enable interrupts and clear other flags

FLAG DI ; disable interrupts and clear other flags
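
The constants in the example can be related to the status register layout with a small Python sketch. The bit positions for H, E1 and E2 follow from the constants above, and the position of Z follows from the JAL.F example in the jump syntax section (64 sets Z); the positions used here for V, C and N are an assumption, obtained by keeping the same ordering as the status register. All names are hypothetical.

```python
# Bit positions within the FLAG operand. H, E1, E2 and Z are taken from the
# examples in this manual; V, C and N are assumed to follow the same ordering.
FLAG_OPERAND_BITS = {"H": 0, "E1": 1, "E2": 2, "V": 3, "C": 4, "N": 5, "Z": 6}


def flag_operand(*names: str) -> int:
    """Build a FLAG operand with the named bits set and all others clear."""
    value = 0
    for name in names:
        value |= 1 << FLAG_OPERAND_BITS[name]
    return value


def status_word(operand: int, pc_long_word: int = 0) -> int:
    """Place the seven flag bits at status bits 31:25 and PC[25:2] at bits 23:0."""
    return ((operand & 0x7F) << 25) | (pc_long_word & 0xFFFFFF)


if __name__ == "__main__":
    print(flag_operand("E1", "E2"))   # corresponds to the EI constant
    print(hex(status_word(flag_operand("E1", "E2"))))
```

Because FLAG writes every flag bit, enabling interrupts with these constants also clears Z, N, C and V, as the comments in the example note.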

Returning from Interrupts

When the interrupt routine is entered, the interrupt enable flags are cleared for the
current level and any lower priority level interrupts. Hence, when a level 2
interrupt occurs, both the interrupt enable bits in the status register are cleared at
the same time as the PC is loaded with the address of the appropriate interrupt
routine.
Returning from an interrupt is accomplished by jumping to the contents of the
appropriate link register, using the JAL [ILINKn] instruction. With the flag bit
enabled on the jump instruction, the flags are loaded into the status register
along with the PC, thus returning the flags to their state at the point of
interrupt. This includes the interrupt enable bits E1 and E2, one or both of
which will have been cleared on entry to the interrupt routine.
There are two link registers, ILINK1 (r29) and ILINK2 (r30), for use with the
maskable interrupts, memory exception and instruction error. These link
registers correspond to levels 1 and 2 and to the interrupt enable bits E1 and
E2 for the maskable interrupts.
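The entry and return behaviour described above can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions: the function names are hypothetical, only the E1 and E2 bits and the PC field are modelled, and pipeline effects are ignored.

```python
E1_BIT = 1 << 26          # level 1 interrupt enable in the status register
E2_BIT = 1 << 27          # level 2 interrupt enable in the status register
PC_MASK = 0x00FFFFFF      # PC[25:2] occupies status bits 23:0
FLAGS_MASK = 0xFF000000   # Z N C V E2 E1 H R


def enter_interrupt(status: int, level: int, vector_long_word: int):
    """Return (link_value, new_status) for a level 1 or level 2 interrupt."""
    if level not in (1, 2):
        raise ValueError("only the maskable levels 1 and 2 are modelled")
    link_value = status  # ILINKn receives the next PC and the current flags
    # The current level and any lower priority level are disabled on entry.
    cleared = E1_BIT if level == 1 else (E1_BIT | E2_BIT)
    new_status = (status & FLAGS_MASK & ~cleared) | (vector_long_word & PC_MASK)
    return link_value, new_status


def return_from_interrupt(link_value: int) -> int:
    """Model JAL.F [ILINKn]: flags and PC are both restored from the link register."""
    return link_value


if __name__ == "__main__":
    before = E1_BIT | E2_BIT | 0x100
    link, during = enter_interrupt(before, level=2, vector_long_word=6)
    print(hex(during), hex(return_from_interrupt(link)))
```

Restoring the whole saved word on return is what re-enables the interrupt levels that were cleared on entry.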
For example, if there was no interrupt service routine for interrupt number 5, the
arrangement of the vector table would be:
ivect4: JAL iservice4 ;vector 4
ivect5: JAL.F [r29] ;vector 5 (jump to ilink1)
NOP ;instruction padding
ivect6: JAL iservice6 ;vector 6

Reset
A reset is an asynchronous, external reset signal that causes the ARCtangent-A4
processor to perform a “hard” reset. Upon reset, various internal states of the
ARCtangent-A4 processor are pre-set to their initial values. The pipeline is
flushed, interrupts are disabled; status register flags are cleared; the semaphore
register is cleared; loop count, loop start and loop end registers are cleared; the
scoreboard unit is cleared; pending load flag is cleared; and program execution
resumes at the interrupt vector base address (offset 0x00) which is the basecase
ARCtangent-A4 processor reset vector position. The core registers are not
initialized except loop count (which is cleared). A jump to the reset vector, a
“soft” reset, will not pre-set any of the internal states of the ARCtangent-A4
processor.
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area; please see the associated documentation.

Memory Error
A memory error can be caused by an instruction fetch from, a load from or a
store to an invalid part of memory. In the basecase ARCtangent-A4 processor,
this exception is non-recoverable in that the instruction that caused the error
cannot be returned to.
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area; please see the associated documentation.

Instruction Error
If an invalid instruction is fetched that the ARCtangent-A4 processor cannot
execute, then an instruction error is caused. In the basecase ARCtangent-A4
processor, this exception is non-recoverable in that the instruction that caused the
error cannot be returned to. The standard instruction field (I[4:0]) is used to
decode whether the instruction is valid. This means that a non-implemented
single-operand instruction will not generate an instruction error when executed.
The software interrupt instruction (SWI) will also generate an instruction error
exception when executed.

Interrupt Times
Interrupts are held off for one cycle when an instruction has a dependency on the
following instruction or is waiting for immediate data from memory. This occurs
during a branch or jump, or simply when an instruction uses long immediate data.
The time taken to service an interrupt is basically a jump to the appropriate
vector and then a jump to the routine pointed to by that vector. The timings of
interrupts according to the type of instruction in the pipeline are given later
in this documentation.
The time it takes to service an interrupt will also depend on the following:
• Whether a jump instruction is contained in the interrupt vector table
• Allowing stage 1 to stage 2 dependencies to complete
• Returning loads using the write-back stage
• An I-Cache miss causing the I-Cache to reload in order to service the
interrupt
• The number of registers pushed onto a software stack at the start of the
interrupt service routine
• Whether an interrupt of the same or higher level is already being serviced
• An interruption by a higher level interrupt
Alternate Interrupt Unit


It should be assumed that the ARCtangent-A4 processor adheres to the interrupt
mechanism according to this chapter. It is possible, however, that an alternate
interrupt unit may be provided on a particular system. The interrupt unit contains
the exception and interrupt vector positions, the logic to tell the ARCtangent-A4
processor which of the 3 levels of interrupt has occurred, and the arbitration
between the interrupts and exceptions.
The interrupt unit can be modified to alter the priority of interrupts, the vector
positions and the number of interrupts. The 3 levels of interrupt which are set
with the status register and the return mechanism through link registers cannot
be altered. Further masking bits and extra link registers can be provided by the
use of extensions in the auxiliary and core register set. How this would be done
is entirely system dependent.
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area; please see the associated documentation.


Chapter 5 — Instruction Set Summary

Introduction
This chapter contains an overview of the types of instructions in the
ARCtangent-A4 processor. The types of instruction are:
• Arithmetic and Logical   ADD, AND, OR, etc.
• Single Operand           FLAG, MOV, LSL, etc.
• Jump, Branch and Loop    J, B, LP, etc.
• Load and Store           LD, ST, etc.
• Control                  BRK, SLEEP, SWI, etc.

NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area; please see the associated documentation.

For the operations of the instructions the notation shown in Table 4 is used.

Arithmetic and Logical Operations


These operations are of the form a ← b op c, where the destination (a) is
replaced by the result of the operation (op) on the operand sources (b and c).
The ordering of the operands is important for some operations (e.g. SUB, BIC).
All arithmetic and logical instructions can be conditional and/or set the flags.
However, instructions using the short immediate addressing mode cannot be
conditional.
Instruction  Operation             Description
ADD          a ← b + c             add
ADC          a ← b + c + C         add with carry
SUB          a ← b - c             subtract
SBC          a ← (b - c) - C       subtract with carry
AND          a ← b and c           logical bitwise AND
OR           a ← b or c            logical bitwise OR
BIC          a ← b and not c       logical bitwise AND with invert
XOR          a ← b exclusive-or c  logical bitwise exclusive-OR

Table 6 Arithmetic and Logical Instructions
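
The operations in Table 6 can be written out as a small Python model on 32-bit values. This is an illustrative sketch only: the names are hypothetical, results are simply truncated to 32 bits, and flag generation is not modelled.

```python
MASK32 = 0xFFFFFFFF

# Each entry maps a mnemonic to a function of (b, c, carry_flag).
OPERATIONS = {
    "ADD": lambda b, c, carry: b + c,
    "ADC": lambda b, c, carry: b + c + carry,
    "SUB": lambda b, c, carry: b - c,
    "SBC": lambda b, c, carry: (b - c) - carry,
    "AND": lambda b, c, carry: b & c,
    "OR":  lambda b, c, carry: b | c,
    "BIC": lambda b, c, carry: b & ~c,
    "XOR": lambda b, c, carry: b ^ c,
}


def execute(op: str, b: int, c: int, carry: int = 0) -> int:
    """Return the 32-bit result written to the destination a."""
    return OPERATIONS[op](b & MASK32, c & MASK32, carry & 1) & MASK32


if __name__ == "__main__":
    print(hex(execute("SUB", 0, 1)))       # operand order matters for SUB
    print(hex(execute("BIC", 0xFF, 0x0F)))  # and for BIC
```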


The syntax for arithmetic and logical instructions is:
op<.cc><.f> a,b,c
Examples:
AND r1,r2,r3 ; r1 replaced by r2 AND r3
AND.NZ r1,r2,r3 ; if zero flag not set then r1 is replaced by
; r2 AND r3
AND.F r1,r2,r3 ; r1 is replaced by r2 AND r3 and appropriate
; flags set
AND.NZ.F r1,r2,r3 ; if zero flag not set then r1 is replaced by
; r2 AND r3 and the appropriate flags set

Null Instruction
Many instructions can be encoded in such a way that no operation is performed.
This is very useful if a NOP instruction is required. To encode a NOP, it is just a
matter of having short immediate data in all register fields and not setting flags.
For example, the encoding of NOP is actually equivalent to:
XOR 0x1FF,0x1FF,0x1FF

Single Operand Instructions


Some instructions require just a single operand. These include flag and rotate
instructions. These instructions are of the form a ← op b where the destination
(a) is replaced by the operation (op) on the operand source (b). Single operand
instructions can be conditional and/or set the flags. However, instructions using
the short immediate addressing mode can not be conditional.
The following table shows the move and extend functions.

Instruction  Operation             Description
MOV          a ← b                 move source to register (included for
                                   instruction set symmetry, basically it is an
                                   encoding of the AND instruction: AND a,b,b)
SEX          a ← sign-extended b   sign extend, byte or word
EXT          a ← zero-extended b   zero extend, byte or word

Table 7 Single Operand Instructions: Move and Extend

The following table shows rotates and shifts.

Instruction  Operation    Description
ASR          a ← asr b    arithmetic shift right
LSR          a ← lsr b    logical shift right
ROR          a ← ror b    rotate right
RRC          a ← rrc b    rotate right through carry
ASL          a ← asl b    arithmetic shift left (included for instruction set
                          symmetry, it is an encoding of the ADD instruction:
                          ADD a,b,b)
LSL          a ← lsl b    logical shift left (same as ASL)
RLC          a ← rlc b    rotate left through carry (included for instruction
                          set symmetry, it is basically an encoding of the ADC
                          instruction: ADC.F a,b,b)
ROL          a ← rol b    rotate left (included for instruction set symmetry,
                          it takes 2 cycles to execute: ADD.F a,b,b then
                          ADC a,a,0)
Table 8 Single Operand Instructions: Rotates and Shifts
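The encodings given in Table 8 for the left shifts and rotates can be checked with a short Python sketch. It models ADD and ADC on 32-bit values with a carry out and shows that the stated instruction sequences behave as shifts and rotates by one bit. The helper names are hypothetical, and only the carry flag is modelled.

```python
MASK32 = 0xFFFFFFFF


def add_with_carry(b: int, c: int, carry_in: int = 0):
    """Return (32-bit result, carry out) of b + c + carry_in."""
    total = (b & MASK32) + (c & MASK32) + (carry_in & 1)
    return total & MASK32, total >> 32


def asl(b: int):
    """ASL as ADD a,b,b: shift left by one, top bit goes to the carry."""
    return add_with_carry(b, b)


def rlc(b: int, carry: int):
    """RLC as ADC.F a,b,b: the old carry enters bit 0, the top bit becomes the carry."""
    return add_with_carry(b, b, carry)


def rol(b: int) -> int:
    """ROL as ADD.F a,b,b followed by ADC a,a,0."""
    shifted, carry = add_with_carry(b, b)            # ADD.F a,b,b
    result, _ = add_with_carry(shifted, 0, carry)    # ADC a,a,0
    return result


if __name__ == "__main__":
    print(hex(rol(0x80000001)))
```

The second step of ROL cannot overflow, because the shifted value always has bit 0 clear before the carry is added.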
The following table shows some special single operand instructions that affect
registers other than core registers.
Instruction  Operation     Description
FLAG         flags ← b     move low bits of source into flags in the STATUS
                           register
FLAG 1       H ← 1         Halt - set H bit in STATUS register (decoded at
                           stage 3, halts the ARCtangent-A4 processor, leaving
                           other bits unchanged)
BRK          H ← 1         Break - set H bit in STATUS register (decoded at
                           stage one, flushes the earlier instructions from the
                           pipe and halts the ARCtangent-A4 processor, other
                           bits are unchanged)
SLEEP        ZZ ← 1        Sleep - set ZZ bit in DEBUG register (decoded at
                           stage two, flushes the earlier instructions from the
                           pipe, puts the processor to sleep mode and stalls
                           the ARCtangent-A4 processor, other bits are
                           unchanged)
SWI          assert        Software Interrupt - generate an instruction error
             Instr. Error  exception (decoded at stage 2, execution jumps to
                           the interrupt vector for instruction error, ILINK2
                           is used as the link register)
Table 9 Single Operand Instructions: Flags and Halts

The syntax for single operand instructions is:
op<.cc><.f> a, b
Examples:
ROR r1,r2 ; rotate right r2 by one and put result in r1
FLAG r1 ; move low bits of r1 into flags register
ROR.NZ r1,r2 ; if zero flag not set then r1 is replaced by
; r2 rotated right by one
ROR.F r1,r2 ; r1 is replaced by r2 rotated right by one
; and appropriate flags set
ROR.NZ.F r1,r2 ; if zero flag not set then r1 is replaced by
; ROR r2 and the appropriate flags set
FLAG.NZ r1 ; if zero flag not set then update flags with
; r1

Jump, Branch and Loop Operations


Although most instructions can be conditional, additional program control is
provided with jump (J, JL), branch (B, BL) and loop (LP) instructions.
Branch, loop and jump instructions use the same condition codes as other
instructions. However, the condition code test for these jumps is carried out
one stage earlier in the pipeline than for other instructions.
This means that if an instruction setting the flags is immediately followed by a
jump, then a single cycle stall will be incurred before executing the jump
instruction (even if the jump is unconditional). In this case, performance can be
increased by inserting a useful non-flag-setting instruction between the
flag-setting instruction and the jump.
Instruction  Operation (if cc true)  Description
Jcc          pc ← addr               Jump
JLcc         blink ← pc              Jump and link
             pc ← addr               (ARCVER 0x06 and higher)
Bcc          pc ← reladdr + pc       Branch
BLcc         blink ← pc              Branch and link
             pc ← reladdr + pc
LPcc         lp_end ← addr           Set up zero-overhead loop
             lp_start ← pc
Table 10 Jump, Branch and Loop Instructions

Due to the pipeline in the ARCtangent-A4 processor, the jump instruction does
not take effect immediately, but after a one cycle delay. The execution of the
immediately following instruction after a jump, branch or loop can be controlled.
This instruction is said to be in the delay slot. The branch and link instruction
(BL) and the jump and link instruction (JL for the ARCtangent-A4 basecase
processor version 6 and higher) also save the whole of the status register to the
link register. This status register is taken either from the first instruction
following the branch (current PC) or the instruction after that (next PC)
according to the delay slot execution mode. The modes for specifying the
execution of the delay slot instruction are:
Mode  Operation                                        Link Register
ND    No Delay slot instruction (default)              Link to current PC
      Only execute next instruction when not jumping
D     Delay slot instruction                           Link to next PC
      Always execute next instruction
JD    Jump Delay slot instruction                      Link to next PC
      Only execute next instruction when jumping
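
The three delay slot execution modes can be summarised as a small decision function. The Python sketch below is illustrative only: the function names are hypothetical, addresses are counted in long-words, and the link address calculation assumes that neither the branch nor the delay slot instruction carries long immediate data.

```python
VALID_MODES = ("ND", "D", "JD")


def delay_slot_executes(mode: str, branch_taken: bool) -> bool:
    """Tell whether the instruction after a jump or branch is executed."""
    if mode == "ND":
        return not branch_taken   # only when not jumping
    if mode == "D":
        return True               # always
    if mode == "JD":
        return branch_taken       # only when jumping
    raise ValueError(f"unknown delay slot mode: {mode}")


def link_address(branch_pc: int, mode: str) -> int:
    """Long-word address saved in the link register by BL or JL."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown delay slot mode: {mode}")
    # ND links to the instruction after the branch (current PC);
    # D and JD link to the instruction after the delay slot (next PC).
    return branch_pc + (1 if mode == "ND" else 2)


if __name__ == "__main__":
    for mode in VALID_MODES:
        print(mode, delay_slot_executes(mode, True),
              delay_slot_executes(mode, False), link_address(100, mode))
```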

Branch type instructions use 20-bit relative addressing. The syntax of the branch
type instruction is:
op<cc><.dd> reladdr
Examples:
LP end_of_loop ; set up loop registers
LPNZ end_of_loop ; if not zero set up loop regs
; otherwise jump to end_of_loop
BL subroutine1 ; Branch and link to subroutine1
; saving status reg to BLINK
BL.D subroutine1 ; and always execute next instruction
BNE nother_bit ; if zero flag not set then branch
; to nother_bit
BNE.JD label ; if zero flag not set then execute
; next instruction and branch to
; label else skip next instruction
The jump instruction uses 32-bit absolute addressing. To enable the correct flag
state when returning from interrupts the jump instruction also has a flag set field.
NOTE If the jump instruction is used with long immediate data, then the delay slot
execution mechanism does not apply, but should default to .JD for JLcc.

For ease of programming, an alternative syntax is allowed when setting flags
with the jump instruction.
The syntax of the jump instruction is:
op<cc><.dd><.f> [addr]
or op<cc><.dd>.f [addr],flag_value
Examples:
JAL [r1] ; jump to address in register r1
JAL start ; jump to start
JNZ specbit ; If zero flag clear jump to specbit
JAL.ND.F [r29] ; return from interrupt and restore flags too
JAL.F end,64 ; jump always to end and set Z bit
JL subroutine ; jump and link always to subroutine
; saving status reg to BLINK
JLAL sub1 ; jump and link always to sub1
; saving status reg to BLINK

Zero Overhead Loop Mechanism

The ARCtangent-A4 processor has the ability to perform loops without any
delays being incurred by the count decrement or the end address comparison.
Zero delay loops are set up with the registers LP_START, LP_END and
LP_COUNT. LP_START and LP_END can be directly manipulated with the LR
and SR instructions and LP_COUNT can be manipulated in the same way as
registers in the core register set.
NOTE The LP_START, LP_END and LP_COUNT registers are only 24 bit registers,
with the top 8 bits reading as zeros. The maximum number of loop iterations is
16,777,216 (if LP_COUNT = 0 on entry). The special instruction LP is used to
set up the LP_START and LP_END in a single instruction.

The LP instruction is similar to the branch instruction. Loops can be
conditionally entered into. If the condition code test for the LP instruction
returns false, then a branch occurs to the address specified in the LP
instruction. If the condition code test is true, then the address of the next
instruction is loaded into the LP_START register and the LP_END register is
loaded with the address defined in the LP instruction.

NOTE The loop mechanism is always active and the registers used by the loop
mechanism are set up with the LP instruction. As LP_END is set to 0 upon
reset, it is not advisable to execute an instruction placed at the end of program
memory space (0xFFFFFC), as this will trigger the LP mechanism if no other
LP has been set up since reset. Also, if code is copied or overlaid into
memory, LP_END should be initialized to a safe value (i.e. 0) before that code
is executed, to prevent accidental LP triggering. Similar caution is
required if using any form of MMU or memory mapping.

When there is no pipeline stall, interrupt, branch or jump, the loop
mechanism comes into operation.
The operation of the loop mechanism is such that PC+1 is constantly compared
with the value LP_END. If the comparison is true, then LP_COUNT is tested. If
LP_COUNT is not equal to 1, then the PC is loaded with the contents of
LP_START, and LP_COUNT is decremented. If, however, LP_COUNT is 1,
then the PC is allowed to increment normally and LP_COUNT is decremented. This
is illustrated in Figure 5.
Is LP_END = NEXT_PC?
  No:  PC ← NEXT_PC
  Yes: decrement LP_COUNT; is LP_COUNT = 1 (value before the decrement)?
         No:  PC ← LP_START
         Yes: PC ← NEXT_PC

Figure 5 PC Update and Loop Detection Mechanism for Loops
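
The PC update rule in Figure 5 can be exercised with a short Python simulation. This is a simplified sketch under stated assumptions: addresses are in long-words, every instruction is one long-word, and stalls, interrupts, branches and the register update delays discussed later in this chapter are ignored. The function names are hypothetical.

```python
COUNT_MASK = 0xFFFFFF  # LP_COUNT is a 24-bit register


def next_pc(pc: int, lp_start: int, lp_end: int, lp_count: int):
    """Return (new_pc, new_lp_count) for one PC update."""
    candidate = pc + 1
    if candidate != lp_end:
        return candidate, lp_count
    new_count = (lp_count - 1) & COUNT_MASK
    if lp_count != 1:
        return lp_start, new_count   # go round the loop again
    return candidate, new_count      # fall out of the loop


def run_loop(lp_start: int, lp_end: int, lp_count: int, max_steps: int = 1000):
    """Trace the addresses executed from lp_start until execution reaches lp_end."""
    pc, trace = lp_start, []
    while pc != lp_end and len(trace) < max_steps:
        trace.append(pc)
        pc, lp_count = next_pc(pc, lp_start, lp_end, lp_count)
    return trace, lp_count


def iterations(lp_count: int) -> int:
    """Number of passes through the loop body for a given LP_COUNT on entry."""
    return ((lp_count - 1) & COUNT_MASK) + 1


if __name__ == "__main__":
    # A three-instruction loop at long-word addresses 10..12 with LP_COUNT = 2.
    print(run_loop(lp_start=10, lp_end=13, lp_count=2))
    print(iterations(0))
```

Testing the count before the decrement is what makes an entry value of 0 wrap through the full 24-bit range.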

The use of zero delay loops is illustrated in the following code sample:
MOV LP_COUNT,2 ; do loop 2 times (flags not set)
LP loop_end ; set up loop mechanism to work
; between loop_in and loop_end
loop_in: LR r0,[r1] ; first instruction in loop
ADD r2,r2,r0 ; sum r0 with r2
BIC r1,r1,4 ; last instruction in loop
loop_end:
ADD r19,r19,r20 ; first instruction after loop

In order that the zero delay loop mechanism works as expected, there are certain
effects that the user should be aware of.

LP_COUNT must not be loaded directly from memory


In the current microarchitecture of the ARCtangent-A4 processor there is no
shortcut path to the LP_COUNT register. This path is used by returning loads, to
boost performance.
As a consequence of not having the shortcut available, the LP_COUNT register
should not be used as the destination of a load instruction. Attempting to do so
may cause an incorrect value to be loaded into LP_COUNT.
The following is an example of code that may not function correctly:
LD LP_COUNT,[r0] ; caution!! LP_COUNT loaded from memory!

Instead, an intermediary register should be used, as follows:


LD r1,[r0] ; register loaded from memory
MOV LP_COUNT, r1 ; LP_COUNT loaded from register

This second example loads a value into a register (a process that does have a
shortcut path and which, therefore, will function correctly). The register value is
loaded into the LP_COUNT register, a process that does not require shortcutting
and which will function correctly.

Single instruction loops


Single instruction loops cannot be set up with the LP instruction. The LP
instruction can set up loops with 2 or more instructions in them. However, it is
possible to set up a single instruction loop with the use of the LR and SR
instructions. If an attempt is made to set up a single instruction loop with the
LP instruction, as in the following example, then the instruction in the loop
(OR) will be executed once and then the code following the loop (ADD) will be
executed as normal. The LP_START and LP_END registers will be updated by
the time the instruction after the attempted loop (ADD) is being fetched, which
is, however, too late for the loop mechanism.
LP loop_end ; this will execute only once
loop_in: OR r21,r22,r23 ; single instruction in loop
loop_end:
ADD r19,r19,r20 ; first instruction after loop
If the user wishes to have single instruction loops, then code like that in the
following code example can be used. Notice, there has to be a delay to allow the
loop start and loop end registers to be updated with the SR instruction. The code
basically updates the registers in the loop mechanism that would normally be
updated by the LP instruction.
MOV LP_COUNT,5 ; no. of times to do loop
MOV r0,dooploop>>2 ; convert to long-word size
ADD r1,r0,1 ; add 1 to dooploop address
SR r0,[LP_START] ; set up loop start register
SR r1,[LP_END] ; set up loop end register
NOP ; allow time to update regs
NOP ; can move useful instrs. here
dooploop: OR r21,r22,r23 ; single instruction in loop
ADD r19,r19,r20 ; first instruction after loop

Loop count register


The loop count register, unlike other core registers, has short cutting disabled
(See Chapter 10 — Pipeline and Timings). This means that there must be at least
2 instructions (actually 2 cycles) between an instruction writing LP_COUNT and
one reading LP_COUNT.
MOV LP_COUNT,r0 ; update loop count register
MOV r1,LP_COUNT ; old value of LP_COUNT
MOV r1,LP_COUNT ; old value of LP_COUNT
MOV r1,LP_COUNT ; new value of LP_COUNT
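
The visibility rule shown by the sequence above can be captured in a small Python model. It is a sketch only: the class and method names are hypothetical, one instruction is assumed to take one cycle, and the model simply makes a written value visible once at least two instructions lie between the write and the read.

```python
class LoopCountModel:
    """Models the delay before a write to LP_COUNT becomes visible to reads."""

    REQUIRED_GAP = 2  # instructions needed between the write and a read

    def __init__(self, initial: int = 0):
        self.initial = initial & 0xFFFFFF
        self.writes = []   # (instruction index, value) in program order
        self.index = 0     # index of the next instruction to execute

    def write(self, value: int) -> None:
        """Execute an instruction that writes LP_COUNT."""
        self.writes.append((self.index, value & 0xFFFFFF))
        self.index += 1

    def read(self) -> int:
        """Execute an instruction that reads LP_COUNT and return what it sees."""
        value = self.initial
        for when, written in self.writes:
            # Instructions between write and read = index - when - 1.
            if self.index - when - 1 >= self.REQUIRED_GAP:
                value = written
        self.index += 1
        return value


if __name__ == "__main__":
    model = LoopCountModel(initial=7)
    model.write(5)                                   # MOV LP_COUNT,r0
    print([model.read() for _ in range(3)])          # three MOV r1,LP_COUNT
```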

In order for the loop mechanism to work properly, the loop count register must
be set up with at least 3 instructions (actually 3 cycles) between it and the last
instruction in the loop. In the following example, the MOV instruction will
override the loop mechanism (which would decrement LP_COUNT) and the loop
will be executed one more time than expected. The MOV instruction must be
followed by a NOP for correct execution. The following code sample shows an
invalid count loop setup.
MOV LP_COUNT,r0 ; do loop r0 times (flags not set)
LP loop_end ; set up loop mechanism
loop_in: OR r21,r22,r23 ; first instruction in loop
AND 0,r21,23 ; last instruction in loop
loop_end:
ADD r19,r19,r20 ; first instruction after loop

The following code sample shows a valid count loop setup:
MOV LP_COUNT,r0 ; do loop r0 times (flags not set)
NOP ; allow time for loop count set up
LP loop_end ; set up loop mechanism
loop_in: OR r21,r22,r23 ; first instruction in loop
AND 0,r21,23 ; last instruction in loop
loop_end:
ADD r19,r19,r20 ; first instruction after loop

When reading from the loop count register (LP_COUNT), the user must be aware
that the value returned is the value of the counter that applies to the next
instruction to be executed. If the last instruction in a loop reads LP_COUNT,
then the value returned is the value after the loop mechanism has updated
it. The following code example shows reading the loop counter near the end of
a loop:
MOV r0,LP_COUNT ; loop count for this iteration
MOV r0,LP_COUNT ; loop count for next iteration
loop_end:
ADD r19,r19,r20 ; first instruction after loop

Branch and jumps in loops

Jumps or branches without linking will work correctly in any position in the loop.
There are, however, some side effects for delay slots and link registers when a
branch or jump is the last instruction in a loop:
Firstly, it is possible that the branch or jump instruction is contained in the very
last long-word position in the loop. This means that the instruction in the delay
slot (See Chapter 5 — Instruction Set Summary and Chapter 10 — Pipeline and
Timings) would be either the first instruction after the loop or the first instruction
in the loop (pointed to by loop start register) depending on the result of the loop
mechanism. The instruction in the delay slot will be that which would be
executed if the branch or jump was replaced by a NOP.
If a branch-and-link or jump-and-link instruction is used in the one before last
long-word position in a loop, then the return address stored in the link register
(BLINK) may contain the wrong value. The following instructions will store the
address of the first instruction after the loop, and therefore should not be used in
the second to last position:
BLcc.D address
BLcc.JD address
JLcc.D [Rn]
JLcc.JD [Rn]
JLcc address
If the ND delay slot execution mode is used for branch-and-link or jump-and-link
instruction in the one before last long-word position in a loop, then the return
address is stored correctly in the link register.
The loop count does not decrement if the instruction fetched was subsequently
killed as the result of a branch/jump operation. For these reasons it is
recommended that subroutine calls should not be used within the loop
mechanism.
Instructions with long immediate data: correct coding
Instructions with long immediate data will work correctly with the zero overhead
loop mechanism as long as the LP instruction is used, even if the instruction
containing the long immediate data is seen as the last instruction in the loop.
Here, we are setting up the loop with an instruction that uses long immediate
data. The loop_end label points to the first instruction after the loop.
MOV LP_COUNT,r0 ; do loop r0 times (flags not set)
LP loop_end
loop_in:...
...
XOR r1,r2,r3
OR r21,r22,2048 ; last instruction in loop
loop_end:
ADD r19,r19,r20 ; first instruction after loop

Instructions with long immediate data: incorrect coding
It is difficult, but nonetheless possible, that an instruction that uses long
immediate data could fall across the very last long-word position in the loop.
This means that the long immediate data would either be taken from the first
location after the loop or the first location in the loop (pointed to by loop
start register) depending on the result of the loop mechanism. It is unlikely
that this would occur with sensible coding, but the following example shows how
it could be done. Here, we are setting up the loop mechanism by writing the loop
registers directly. The only register write shown here is the writing of LP_END.

MOV r1,limmloop>>2 ; convert to long-word size
ADD r1,r1,1 ; add 1 to limmloop address
SR r1,[LP_END] ; set up loop end register
NOP ; allow time to update reg
...
...
NOP
limmloop: OR r21,r22,2048 ; instruction across loop end
ADD r19,r19,r20 ;
The OR instruction in the above example has a long immediate value of 2048
which crosses the loop end address. This is accomplished by getting the address
of the OR instruction and adding 1 to the address to force the LP_END register to
point to the position just after the OR instruction, but before the long immediate
value.

Valid instruction regions in loops


Figure 6 summarises the effect that the loop mechanism has on these special
cases. As an example, if an instruction that reads LP_COUNT (such as
MOV r1,LP_COUNT) is in position Insn, then the value that the instruction reads
will be the value after the loop mechanism has updated it.
For further details see Chapter 7 — Register Set Details.

The Loop | Loop Set Up (LP loop_end) | Writing LP_COUNT | Reading LP_COUNT | Delay Slots (Bcc or Jcc) | Immediates (limm instr)
Loop_start: | | | | |
Ins1 | works normally | update before loop mechanism | value before loop mechanism | works normally | works normally
Ins2 | ... | ... | ... | ... | ...
Ins3 | ... | ... | ... | ... | ...
... | ... | ... | ... | ... | ...
Insn-4 | ... | update before loop mechanism | ... | ... | ...
Insn-3 | ... | overwrite loop mechanism | ... | ... | ...
Insn-2 | works normally | update after loop mechanism | ... | works normally | ...
Insn-1 | loop end not set up in time | update after loop mechanism | value before loop mechanism | wrong return address may be stored in BLINK; LP_COUNT decrements according to delay slot mode | works normally
Insn | loop end not set up in time | update after loop mechanism | value after loop mechanism | loop_count decrements; delay slot = ins1 or outins1 | imm data = ins1 or outins1
Loop_end: | | | | |
Outins1 | | | | |
Outins2 | | | | |

Figure 6 Valid Instruction Regions in Loops

Breakpoint Instruction
The breakpoint instruction is a single operand basecase instruction that halts
the program when it is decoded at stage one of the pipeline. It is a very basic
debug instruction, which stops the ARCtangent-A4 processor from executing
any instructions beyond the breakpoint. The pipeline is also flushed when this
instruction is decoded. To restart the ARCtangent-A4 processor at the correct
instruction, the old instruction is rewritten into main memory. This is immediately
followed by an invalidate instruction cache line command (if an instruction cache
has been implemented) to ensure that the correct instruction is loaded into the
cache before being executed by the ARCtangent-A4 processor. The program
counter must also be rewritten in order to generate a new instruction fetch, which
reloads the instruction. Most of the work of inserting breakpoints and restoring
the original instructions is performed by the debugger.
The program flow is not interrupted when employing the breakpoint instruction,
and there is no need to implement a breakpoint service routine. There is also
no limit to the number of breakpoints that can be inserted into a piece of code.
NOTE The breakpoint instruction sets the BH bit (refer to section Programmer’s
Model) in the Debug register when it is decoded at stage one of the pipeline.
This allows the debugger to determine what caused the ARCtangent-A4
processor to halt. The BH bit is cleared when the Halt bit in the Status register
is cleared, e.g. by restarting or single-stepping the ARCtangent-A4 processor.

A breakpoint instruction may be inserted into any position:


MOV r0, 0x04
ADD r1, r0, r0
XOR.F 0, r1, 0x8
BRK ;<----- break here
SUB r2, r0, 0x3
ADD.NZ r1, r0, r0
JZ.D [r8]
OR r5, r4, 0x10

The above code shows assembly code with BRK instruction


Breakpoints are primarily inserted into the code by the host so control is
maintained at all times by the host. The BRK instruction may however be used in
the same way as any other ARCtangent-A4 instruction.
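The host-side sequence described in this section (save the old instruction, write BRK in its place and, on restart, restore the instruction, invalidate its cache line and rewrite the program counter) can be sketched as a small Python model. This is an illustration only: the TargetModel class, its method names and the BRK_WORD value are hypothetical and do not correspond to any real debugger interface or to the real BRK encoding.

```python
BRK_WORD = 0xDEADBEEF  # hypothetical placeholder, NOT the real BRK encoding


class TargetModel:
    """Toy model of a halted target as seen by a host debugger."""

    def __init__(self, memory):
        self.memory = dict(memory)      # long-word address -> instruction word
        self.icache = set(self.memory)  # addresses currently held in the cache
        self.pc = 0
        self.saved = {}                 # breakpoint address -> original word

    def insert_breakpoint(self, addr):
        # Remember the original instruction, then overwrite it with BRK.
        self.saved[addr] = self.memory[addr]
        self.memory[addr] = BRK_WORD
        self.icache.discard(addr)

    def remove_breakpoint(self, addr):
        # Restore the old instruction, invalidate its cache line and
        # rewrite the PC so that the instruction is fetched again.
        self.memory[addr] = self.saved.pop(addr)
        self.icache.discard(addr)
        self.pc = addr
```

Keeping the saved words in a dictionary reflects the statement that there is no limit on the number of breakpoints: each one only costs the host a saved instruction word.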
The breakpoint instruction can be placed anywhere in a program. It is decoded at
stage one of the pipeline, which consequently stalls stage one and allows the
instructions in stages two, three and four to continue, i.e. the pipeline is
flushed.

BRK instruction in delay slot


Due to stage 2 to stage 1 dependencies, the breakpoint instruction behaves
differently when it is placed in the delay slots of Branch, and Jump instructions.
In these cases, the ARCtangent-A4 processor will stall stages one and two of the
pipeline while allowing instructions in subsequent stages (three and four) to
proceed to completion.
The following example shows BRK in a delay slot of a conditional jump
instruction.
MOV r0, 0x04
ADD r1, r0, r0
XOR.F 0, r1, 0x8
SUB r2, r0, 0x3
ADD.NZ r1, r0, r0
JZ.D [r8]
BRK ;<---- caution break inserted
; into delay slot here
OR r5, r4, 0x10

The link register is not updated for a Branch and Link (BL) or Jump and Link (JL)
instruction when the BRK instruction is placed in its delay slot. When the
ARCtangent-A4 processor is restarted, the link register is updated as normal.
When a BRK instruction is detected, interrupts are treated by the ARCtangent-A4
processor in the same manner as Branch and Jump instructions. Therefore, an
interrupt that reaches stage two of the pipeline when a BRK instruction is in stage
one is kept in stage two, and the remaining stages of the pipeline are flushed. It is
also important to note that an interrupt that occurs in the same cycle as a
breakpoint is held off, as the breakpoint has the higher priority. An interrupt at
stage three is allowed to complete when a breakpoint instruction is in stage one.

Sleep Instruction
The sleep mode is entered when the ARCtangent-A4 processor encounters the
SLEEP instruction. It stays in sleep mode until an interrupt or restart occurs.
Power consumption is reduced during sleep mode since the pipeline ceases to
change state, and the RAMs are disabled. Further power reduction is achieved
when the clock gating option is used, whereby all non-essential clocks are
switched off.

The SLEEP instruction can be put anywhere in the code, as in the example
below:
SUB r2, r2, 0x1
ADD r1, r1, 0x2
SLEEP
...
The SLEEP instruction is a single operand instruction that takes no flags or
operands.
The SLEEP instruction is decoded in pipeline stage 2. If a SLEEP instruction is
detected, then the sleep mode flag (ZZ) is immediately set and pipeline stage 1
is stalled. A flushing mechanism ensures that all earlier instructions are
executed until the pipeline is empty. The SLEEP instruction itself leaves the
pipeline during the flushing.
When in sleep mode, the sleep mode flag (ZZ) is set and the pipeline is stalled,
but not halted. The host interface operates as normal, allowing access to the
DEBUG and STATUS registers, and it can halt the processor. The host cannot
clear the sleep mode flag, but it can wake the ARCtangent-A4 processor by
halting and then restarting it. The program counter (PC) points to the next
instruction in sequence after the SLEEP instruction.
The ARCtangent-A4 processor will wake from sleep mode on an interrupt or
when it is restarted. If an interrupt wakes up the ARCtangent-A4 processor, the
ZZ flag is cleared and the instruction in pipeline stage 1 is killed. The interrupt
routine is serviced and execution resumes at the instruction in sequence after the
SLEEP instruction. When the ARCtangent-A4 processor is started after having
been halted, the ZZ flag is cleared.

SLEEP instruction in delay slot


A SLEEP instruction can be put in a delay slot as in the following code example:
BAL.D after_sleep
SLEEP
...
after_sleep:
ADD r1,r1,0x2

In this example, the ARCtangent-A4 processor goes to sleep after the branch
instruction has been executed. When the ARCtangent-A4 processor is
sleeping, the PC points to the “add” instruction after the label
"after_sleep". When an interrupt occurs, the ARCtangent-A4 processor
wakes up, executes the interrupt service routine and continues with the “add”
instruction. If the delay slot is killed, as in the following code example, the
SLEEP instruction in the delay slot is never executed:

BAL.ND after_sleep
SLEEP
...
after_sleep:
ADD r1,r1,0x2

SLEEP instruction in delay slot of Jump


The SLEEP instruction is normally used in RTOS type applications by using a
J.F with SLEEP in the delay slot. This allows the interrupts to be re-enabled at
the same time as SLEEP is entered, i.e. as an atomic operation.

SLEEP instruction in single step mode


The SLEEP instruction acts as a NOP during single step mode, because every
single-step is a restart and the ARCtangent-A4 processor wakes up at the next
single-step. Consequently, the SLEEP instruction behaves exactly like a NOP
propagating through the pipeline.

Software Interrupt Instruction


The execution of an undefined extension instruction in ARCtangent-A4
processors raises an instruction error exception. The software interrupt
instruction (SWI) is a basecase instruction that also raises this exception. Once
it is executed, control flow is transferred from the user program to the system
instruction error exception handler.

SWI instruction format


The SWI instruction is a single operand instruction in the same class as the
SLEEP and BRK instructions and takes no operands or flags.

Load and Store Operations


The transfer of data to and from memory is accomplished with the load and store
commands (LD, ST). It is possible for these instructions to write the result of the
address computation back to the address source register. This is accomplished
with the address write-back suffix: .A

Loads are passed to the memory controller once the address has been calculated,
and the register which is the destination of the load is tagged to indicate that it
is waiting for a result, as loads take a minimum of one cycle to complete. If an
instruction references the tagged register before the load has completed, the
pipeline will stall until the register has been loaded with the appropriate value.
For this reason it is not recommended that loads be immediately followed by
instructions which reference the register being loaded. Delayed loads from
memory will take a variable amount of time depending upon the presence of
cache and the type of memory which is available to the memory controller.
Consequently, the number of instructions to be executed in between the load and
the instruction using the register will be application specific.
Byte and word loads can be sign extended to 32 bits, or simply loaded into the
appropriate register with unused bits set to zero. Sign extension is selected with
the sign extend suffix: .X
Stores are passed to the memory controller, which will store the data to memory
when it is possible to do so. The pipeline may be stalled if the memory controller
cannot accept any more buffered store requests. Note that if the offset is not
required during a store, the value encoded will be set to 0.
If a data-cache is available in the memory controller, the load and store
instructions can bypass the use of that cache. When the suffix .DI is used the
cache is bypassed and the data is loaded directly from or stored directly to the
memory. This is particularly useful for shared data structures in main memory,
for the use of memory-mapped I/O registers, or for bypassing the cache to stop
the cache being updated and overwriting valuable data that has already been
loaded in that cache.
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area, please see associated documentation.

Instruction Operation Description


LD a ← [b+c] load
ST [b + shimm] ← c store

Table 11 Load and Store Instructions


The syntax of the load instruction is:
op<zz><.x><.a><.di> a, [b, c]
Examples:
LD r1,[r2,r3] ; r1 replaced by data at address
; r2+r3
LD r4,[r2,10] ; r4 replaced by data at address
; r2 plus offset 10
LD.A.DI r4,[r2,10] ; r4 replaced by data at address
; r2 plus offset 10 and write-back
; result of address calculation to
; r2, bypassing data cache
LDW.X r1,[r2,r3] ; r1 replaced by sign extended word
; from address at r2+r3
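The effect of the .X (sign extend) and .A (address write-back) suffixes on a load can be illustrated with a short Python model. This is a sketch under stated assumptions: the function names, the dictionary-based register file and the memory map are hypothetical, the size argument (8, 16 or 32) stands for the byte, word and long-word forms of the load, and endianness, alignment and the .DI cache bypass are deliberately not modelled.

```python
def sign_extend(value, bits):
    """Sign extend the low `bits` bits of value to a 32-bit result."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value & 0xFFFFFFFF


def load(regs, memory, a, b, offset, size=32, x=False, writeback=False):
    """Model a load of register a from address regs[b] + offset.

    `memory` maps addresses to already-assembled values (an assumption
    of this sketch, so that byte ordering does not need to be modelled).
    """
    address = (regs[b] + offset) & 0xFFFFFFFF
    data = memory[address] & ((1 << size) - 1)   # unused bits set to zero
    if x and size < 32:
        data = sign_extend(data, size)           # .X suffix
    if writeback:
        regs[b] = address                        # .A suffix
    regs[a] = data
```

For example, a word load with both suffixes sign extends the 16-bit value and leaves the computed address in the base register.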

The syntax of the store instruction is:


op<zz><.a><.di> c,[b,offset]
Examples:
ST r1,[r2] ; data at address r2 replaced by r1
STB r1,[r2] ; bottom 8 bits of r1 put into
; address r2. Offset = 0
ST.A r1,[r2,14] ; with write-back

Auxiliary Register Operations



Access to the auxiliary register set is accomplished with the special load
register and store register instructions (LR and SR). They work in a similar way
to the normal load and store instructions, except that the access is accomplished in
a single cycle because address computation is not carried out and the
scoreboard unit is not used. The LR and SR instructions do not cause stalls in
the way that the normal load and store instructions do; they stall only in the same
cases in which arithmetic and logic instructions would cause a stall.
Access to the auxiliary registers is limited to 32 bit (long word) only and the
instructions are not conditional.
Instruction Operation Description
LR a ← aux. reg [b] load from auxiliary register
SR aux. reg.[b] ← c store to auxiliary register

Table 12 Auxiliary Register Operations


The syntax of the load from auxiliary register instruction is:
op a,[b]
Examples:
LR r1,[r2] ; r1 replaced by data from auxiliary
; register pointed to by r2
LR r4,[10] ; r4 replaced by data from auxiliary
; register number 10

The syntax of the store to auxiliary register instruction is:


op c,[b]
Examples:
SR r1,[r2] ; data in auxiliary register pointed
; to by r2, replaced by data in r1
SR r1,[14] ; data in auxiliary register number
; 14 replaced by data in r1

Extension Instructions
These operations are of the form a ← b op c (or a ← op b for single operand
instructions) where the destination (a) is replaced by the result of the operation
(op) on the operand sources (b and c). The ordering of the operands is important
for some operations (e.g. SUB, BIC). All arithmetic and logical instructions can
be conditional and/or set the flags. However, instructions using the short
immediate addressing mode cannot be conditional.
The syntax for extension instructions is:
op<.cc><.f> a,b,c
The syntax for extension single operand instructions is:
op<.cc><.f> a,b

Optional Extensions Library


The extensions library consists of a number of components that can be used to
add functionality to an ARCtangent-A4 processor. These components are
function units, which are interfaced to the ARCtangent-A4 processor through the
use of extension instructions or registers.
The library currently consists of the following components:
• 32-bit Multiplier, small (10 cycle) implementation
• 32-bit Multiplier, fast (4 cycle) implementation
• 32-bit Barrel shift/rotate block (single cycle)
• 32-bit Barrel shift/rotate block (multi cycle)
• Normalise (find-first-bit) instruction
• Swap instruction
• MIN/MAX instructions

Multiply 32 X 32
Two versions of the scoreboarded 32x32 multiplier function are available, 'fast'
and 'small', taking four and ten cycles respectively. The full 64-bit result is
available to be read from the core register set. The middle 32 bits of the 64-bit
result are also available. The multiply is scoreboarded in such a way that if a
multiply is being carried out, and if one of the result registers is required by
another ARCtangent-A4 instruction, the processor stalls until the multiply has
finished.
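[Figure: the 64-bit product of operands b and c, with MHI over the upper 32 bits,
MLO over the lower 32 bits and MMID over the middle 32 bits]

Figure 7 32x32 Multiply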


The syntax of the multiply instruction is:
op<.cc> <0>,b,c
Example:
MUL64 r1,r2 ;multiply r1 by r2
Quick exit for conditional multiplies
If an instruction condition placed on a MUL64 or MULU64 is found to be false,
the multiply is not performed, and the instruction completes on the same cycle
without affecting the values stored in the multiply result registers.
Reading and pre-loading the 32X32 multiply results
The results are accessed via the read-only extension core registers MLO, MMID
and MHI. The extension auxiliary register MULHI is used to restore the multiply
result register if the multiply has been used, for example, by an interrupt service
routine. See Multiply restore register.
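How the three result registers relate to the 64-bit product can be shown with a short Python model. It is a sketch that assumes MMID holds bits 47 to 16 of the product (the "middle 32 bits" mentioned above) and that MUL64 is the signed and MULU64 the unsigned form; the function names are hypothetical and scoreboarding and timing are not modelled.

```python
def to_signed32(value):
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def multiply_results(b, c, signed=True):
    """Return (MLO, MMID, MHI) for a 32x32 multiply of b and c."""
    if signed:
        product = to_signed32(b) * to_signed32(c)
    else:
        product = (b & 0xFFFFFFFF) * (c & 0xFFFFFFFF)
    product &= (1 << 64) - 1             # keep the 64-bit result
    mlo = product & 0xFFFFFFFF           # bits 31..0
    mmid = (product >> 16) & 0xFFFFFFFF  # bits 47..16 (assumed layout)
    mhi = (product >> 32) & 0xFFFFFFFF   # bits 63..32
    return mlo, mmid, mhi
```

The middle word is convenient for fixed-point arithmetic with 16 fractional bits, because it is the product already shifted right by 16.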

Barrel shift/rotate block


This block provides a number of instructions that will allow any operand to be
shifted left or right by up to 32 positions in one cycle, the result being available
for write-back to any core register.
Instruction | Description
ASR | arithmetic shift right, sign filled
LSR | logical shift right, zero filled
ROR | rotate right
ASL | arithmetic shift left, zero filled
[Diagrams of each operation on operand b are not reproduced]

Figure 8 Barrel Shift Operations


The syntax for the barrel shift operations is:
op<.cc><.f> a,b,c
Example:
ASR r2,r2,6 ;arithmetic shift right r2 by 6 places
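The four barrel shift operations can be modelled on 32-bit values in Python as follows. This is a sketch only: the function names are hypothetical, shift counts are assumed to lie in the range 0 to 31, and the effect on the flags is not modelled.

```python
MASK32 = 0xFFFFFFFF


def asl(b, n):
    """Arithmetic shift left, zero filled."""
    return (b << n) & MASK32


def lsr(b, n):
    """Logical shift right, zero filled."""
    return (b & MASK32) >> n


def asr(b, n):
    """Arithmetic shift right, sign filled."""
    b &= MASK32
    if b & 0x80000000:
        b -= 1 << 32          # reinterpret as a negative Python integer
    return (b >> n) & MASK32


def ror(b, n):
    """Rotate right by n positions (taken modulo 32 in this sketch)."""
    b &= MASK32
    n %= 32
    return ((b >> n) | (b << (32 - n))) & MASK32
```

The difference between ASR and LSR only shows for values with the top bit set: ASR copies the sign bit into the vacated positions, whereas LSR fills them with zeros.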

Normalize instruction
The NORM instruction gives the normalisation integer for the signed value in the
operand. The normalisation integer is the amount by which the operand should be
shifted left to normalise it as a 32-bit signed integer. Finding the normalisation
integer of a 32-bit register in software, without a NORM instruction, requires
many ARCtangent-A4 instruction cycles.

Figure 9 Norm Instruction


Uses for the NORM instruction include:
• Acceleration of single bit shift division code, by providing a fast 'early out'
option.
• Reciprocal and multiplication instead of division
• Reciprocal square root and multiplication instead of square root
The syntax for the normalize instruction is:
op<.cc><.f> a,b
Example:
NORM r1,r2 ; find normalization integer for r2
; and write into r1
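The normalisation integer can be computed in software by counting redundant sign bits, which also shows why a dedicated instruction saves many cycles. The Python sketch below assumes that a value is normalised when bit 30 differs from the sign bit, and that the count saturates at 31 for the inputs 0 and 0xFFFFFFFF, which can never be normalised; both points are assumptions of this sketch rather than statements from the hardware definition, and the function name is hypothetical.

```python
def norm(b):
    """Number of left shifts that normalises the signed 32-bit value b."""
    b &= 0xFFFFFFFF
    sign = b >> 31
    count = 0
    # Shift while bit 30 still equals the sign bit (a redundant sign bit).
    while count < 31 and ((b >> 30) & 1) == sign:
        b = (b << 1) & 0xFFFFFFFF
        count += 1
    return count
```

Shifting the operand left by the returned amount moves its most significant information bit into bit 30 without changing the sign.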

SWAP instruction
The swap instruction is a very simple extension, intended for use with the
multiply-accumulate block. It exchanges the upper and lower 16 bits of the source
value, and stores the result in a register. This is useful to prepare values for
multiplication, since the multiply-accumulate block takes its 16-bit source values
from the upper 16 bits of the 32-bit values presented.

Figure 10 SWAP Instruction


The syntax for the swap instruction is:
op<.cc><.f> a,b
Example:
SWAP r1,r2 ; swap upper and lower 16 bits of r2
; and write into r1
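The operation performed by SWAP is small enough to state as a one-line Python model; the function name is hypothetical and flag updates are not modelled.

```python
def swap(b):
    """Exchange the upper and lower 16 bits of a 32-bit value."""
    b &= 0xFFFFFFFF
    return ((b << 16) | (b >> 16)) & 0xFFFFFFFF
```

Applying the operation twice returns the original value, so the same instruction both prepares a lower half-word for the multiply-accumulate block and restores it afterwards.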


MIN/MAX instructions
These instructions are useful in applications where sorting takes place. Two
signed 32-bit words are compared, and either the larger or smaller of the two is
returned, depending on which instruction is being used.
The syntax for the min/max instructions is:
op<.cc><.f> a,b,c
Example:
MIN r1,r2,r3 ; write minimum of r2 and r3 into r1
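Because the comparison is signed, a register holding 0xFFFFFFFF (that is, -1) is smaller than one holding 1. The Python sketch below makes that interpretation explicit; the function names are hypothetical and flag updates are not modelled.

```python
def to_signed32(value):
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def min32(b, c):
    """Return the 32-bit pattern of the smaller signed operand."""
    return (b if to_signed32(b) <= to_signed32(c) else c) & 0xFFFFFFFF


def max32(b, c):
    """Return the 32-bit pattern of the larger signed operand."""
    return (b if to_signed32(b) >= to_signed32(c) else c) & 0xFFFFFFFF
```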



Chapter 6 — Condition Codes

Introduction
The ARCtangent-A4 processor has an extensive instruction set, most of which
can be carried out conditionally and/or set the flags. Instructions using
short immediate data cannot have a condition code test.
Branch, loop and jump instructions use the same condition codes as other
instructions. However, the condition code test for these jumps is carried out one
stage earlier in the pipeline than for other instructions. Therefore, a single cycle
stall will occur if a jump is immediately preceded by an instruction that sets the
flags.

Condition Code Register


The condition code register is part of the status register.
The status register (STATUS), shown in Figure 11, contains the condition codes:
zero (Z), negative (N), carry (C) and overflow (V); the interrupt mask bits
(E[2:1]); the halt bit (H); and the program counter (PC[25:2]).
31 30 29 28 27 26 25 24 23 0

Z N C V E2 E1 H R PC[25:2]

Figure 11 Status Register

Condition Code Register Notation


In the instruction set details in the next chapter the following notation is used:
Condition codes:
Z N C V
where:
Z (zero) Set if the result equals zero. Otherwise bit cleared.
N (negative) Set if the most significant bit of the result is set. Otherwise bit
cleared.
C (carry) Set if a carry is generated by an arithmetic operation. Otherwise bit
cleared.
V (overflow) Set if an overflow is generated by an arithmetic operation.
Otherwise bit cleared.
The convention used in the next chapter for the effect of an operation on the
condition codes is:
* Set according to the result of the operation
. Not affected
? Bit undefined after the operation
0 Bit cleared
1 Bit set

Condition Code Test


Table 13 Condition Codes shows condition names and the conditions they test.

Mnemonic | Condition | Test | Code
AL, RA | Always | 1 | 0x00
EQ, Z | Zero | Z | 0x01
NE, NZ | Non-Zero | /Z | 0x02
PL, P | Positive | /N | 0x03
MI, N | Negative | N | 0x04
CS, C, LO | Carry set, lower than (unsigned) | C | 0x05
CC, NC, HS | Carry clear, higher or same (unsigned) | /C | 0x06
VS, V | Over-flow set | V | 0x07
VC, NV | Over-flow clear | /V | 0x08
GT | Greater than (signed) | (N and V and /Z) or (/N and /V and /Z) | 0x09
GE | Greater than or equal to (signed) | (N and V) or (/N and /V) | 0x0A
LT | Less than (signed) | (N and /V) or (/N and V) | 0x0B
LE | Less than or equal to (signed) | Z or (N and /V) or (/N and V) | 0x0C
HI | Higher than (unsigned) | /C and /Z | 0x0D
LS | Lower than or same (unsigned) | C or Z | 0x0E
PNZ | Positive non-zero | /N and /Z | 0x0F

Table 13 Condition Codes

NOTE PNZ does not have an inverse condition.

The remaining 16 condition codes (0x10-0x1F) are available for extension and are
used to:
• provide additional tests on the internal condition flags or
• test extension status flags from external sources or
• test a combination of external and internal flags
If an extension condition code is used that is not implemented, then the condition
code test will always return false (i.e. the opposite of AL - always).
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area, please see associated documentation.
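The tests in Table 13 translate directly into boolean expressions. The following Python sketch evaluates a condition code against the four flags; the function name is hypothetical, and extension codes are treated as unimplemented so that they always test false, as described above.

```python
def condition_true(code, z, n, c, v):
    """Return True if condition `code` passes for flags Z, N, C and V."""
    tests = {
        0x00: True,                                   # AL, RA
        0x01: z,                                      # EQ, Z
        0x02: not z,                                  # NE, NZ
        0x03: not n,                                  # PL, P
        0x04: n,                                      # MI, N
        0x05: c,                                      # CS, C, LO
        0x06: not c,                                  # CC, NC, HS
        0x07: v,                                      # VS, V
        0x08: not v,                                  # VC, NV
        0x09: (n and v and not z) or (not n and not v and not z),  # GT
        0x0A: (n and v) or (not n and not v),         # GE
        0x0B: (n and not v) or (not n and v),         # LT
        0x0C: z or (n and not v) or (not n and v),    # LE
        0x0D: not c and not z,                        # HI
        0x0E: c or z,                                 # LS
        0x0F: not n and not z,                        # PNZ
    }
    # Unimplemented extension codes (0x10-0x1F) always test false.
    return bool(tests.get(code, False))
```

Note that the signed tests compare N with V: GE is true when the two flags are equal and LT when they differ.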


Chapter 7 — Register Set Details
Register | Name | Contents
r0 to r28 | | Basecase core registers
r29 | ILINK1 | level 1 interrupt link register
r30 | ILINK2 | level 2 interrupt link register
r31 | BLINK | branch link register
r32 to r59 | | Extension core registers
r60 | LP_COUNT | LP_COUNT[23:0]
r61 | | short immediate - set flags
r62 | | long immediate
r63 | | short immediate - no set flags

Figure 12 Core Register Map

Name | Number | Contents
STATUS | 0x00 | Z N C V E2 E1 H R PC[25:2]
SEMAPHORE | 0x01 | S3 S2 S1 S0
LP_START | 0x02 | LP_START[25:2]
LP_END | 0x03 | LP_END[25:2]
IDENTITY | 0x04 | MANCODE[7:0] MANVER[7:0] ARCNUM[7:0] ARCVER[7:0]
DEBUG | 0x05 | LD SH BH ZZ IS FH SS
| 0x06 to 0x5F | Extension Auxiliary Registers
Reserved | 0x60 to 0x7F | Reserved for Build Configuration Registers
| 0x80 to 0xFFFFFFFF | Extension Auxiliary Registers

Figure 13 Auxiliary Register Set


Core Register Set


The core register set in the ARCtangent-A4 processor is shown in Figure 12 and
Table 14. Other predefined registers are in the auxiliary register set and they are
shown in Figure 13 and Table 16.
Number | Core register name | Function
0-28 | r0-r28 | General Purpose Registers
29 | ILINK1 or r29 | Maskable interrupt link register
30 | ILINK2 or r30 | Maskable interrupt link register
31 | BLINK or r31 | Branch link register
32-59 | r32-r59 | Register space reserved for extensions
60 | LP_COUNT | Loop count register (24 Bits)
61 | - | Short immediate data indicator setting flags
62 | - | Long immediate data indicator
63 | - | Short immediate data indicator not setting flags

Table 14 Core Register Map
The general purpose registers (r0-r28) can be used for any purpose by the
programmer.

Link registers
The link registers (ILINK1, ILINK2, BLINK) are used to provide links back to
the position where an interrupt or branch occurred. They can also be used as
general purpose registers, but if interrupts or branch-and-link or jump-and-link
instructions are used, then these registers are reserved for that purpose.
In the basecase ARCtangent-A4 processor prior to version 7, the branch-and-link
and jump-and-link instructions write to the BLINK register in a way that bypasses
the LD scoreboard mechanism. Basecase ARCtangent-A4 processor version 7
remedies this problem by enabling additional scoreboarding on the link registers.

Loop count register
The loop count register (LP_COUNT) is used for zero delay loops. It is not
recommended that LP_COUNT be used as a general purpose register, because
LP_COUNT is decremented whenever the program counter equals the loop end
address, and because LP_COUNT does not have next cycle bypass like the other
core registers. See later in this documentation for details. Note that LP_COUNT
is only 24 bits wide.

Immediate data indicators


Register positions 63 to 61 are reserved for encoding immediate data addressing
modes onto instruction words. They are reserved for that purpose and are not
available to the programmer as general purpose registers.


Extension core registers


The register set is extendible in register positions 32-59 (r32-r59). Results of
accessing the extension register region are undefined in the basecase version of
the ARCtangent-A4 processor. If a core register that is not implemented is read,
then an unknown value is returned. No exception is generated. Writes to
non-implemented core registers are ignored.
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area, please see associated documentation.

Multiply result registers


Table 15 shows the defined extension core registers for the optional multiply.
Register Name Use
r57 MLO Multiply low 32 bits, read only
r58 MMID Multiply middle 32 bits, read only
r59 MHI Multiply high 32 bits, read only
Table 15 Multiply Result Registers

Auxiliary Register Set


The auxiliary register set contains special status and control registers. Auxiliary
registers occupy a special address space that is accessed using special load and
store instructions, or other special commands. The basecase ARCtangent-A4
processor uses 6 status and control registers, and reserves the additional registers
0x60 to 0x7F, leaving the rest of the 2^32 registers for extension purposes. If an
auxiliary register is read that is not implemented, then the contents of the
IDENTITY register are returned. No exception is generated. Writes to
non-implemented auxiliary registers are ignored.
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area, please see associated documentation.
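The behaviour of accesses to unimplemented auxiliary registers can be captured in a very small Python model. The class and method names are hypothetical, and read-only registers (for example, STATUS cannot be written with SR) are not modelled; the sketch only shows that an unimplemented read returns the IDENTITY contents and that an unimplemented write is ignored.

```python
IDENTITY = 0x04  # auxiliary register number of IDENTITY


class AuxRegisters:
    """Toy auxiliary register file; read-only registers are not modelled."""

    def __init__(self, implemented):
        # `implemented` maps register numbers to their initial contents
        # and must include the IDENTITY register.
        self.regs = dict(implemented)

    def lr(self, number):
        # Reading an unimplemented register returns the IDENTITY contents.
        return self.regs.get(number, self.regs[IDENTITY])

    def sr(self, number, value):
        # Writes to unimplemented registers are silently ignored.
        if number in self.regs:
            self.regs[number] = value & 0xFFFFFFFF
```

One practical consequence is that software cannot rely on an exception to detect a missing extension register; a read that returns the IDENTITY value is only a hint that the register may not be implemented.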

Number | Auxiliary register name | Description
0x0 | STATUS | Status register
0x1 | SEMAPHORE | Inter-process/Host semaphore register
0x2 | LP_START | Loop start address (24 bits)
0x3 | LP_END | Loop end address (24 bits)
0x4 | IDENTITY | ARCtangent-A4 Identification register
0x5 | DEBUG | Debug register
0x60 - 0x7F | RESERVED | Build Configuration Registers

Table 16 Auxiliary Register Set

Status register
The STATUS register contains the PC, the condition flags and the interrupt mask
bits. LP_START and LP_END are the other registers used by the zero delay loop
mechanism. The SEMAPHORE register is used to control inter-process
communication. IDENTITY is used by the host and ARCtangent-A4 processor to
determine the version number of the processor and other implementation
specific information. DEBUG is used by the host to test and control the
ARCtangent-A4 processor during debug situations.
31 30 29 28 27 26 25 24 23 0

Z N C V E2 E1 H R PC[25:2]

Figure 14 Status Register


The status register (STATUS), shown in Figure 14, contains the condition codes:
zero (Z), negative (N), carry (C) and overflow (V); the interrupt mask bits
(E[2:1]); the halt bit (H); and the program counter (PC[25:2]). When STATUS is
read with an LR instruction, it returns the address of the instruction following
the LR and the current condition flags. STATUS cannot be written with the SR
instruction. The FLAG and Jcc instructions are used to affect the status register.
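A value read from STATUS can be split into its fields with a few shifts and masks. The Python sketch below assumes the bit positions shown in Figure 14 (Z in bit 31 down to R in bit 24, with PC[25:2] in bits 23 to 0); the function name and the dictionary layout are hypothetical.

```python
def decode_status(word):
    """Split a 32-bit STATUS value into its fields (Figure 14 layout)."""
    return {
        "Z": (word >> 31) & 1,
        "N": (word >> 30) & 1,
        "C": (word >> 29) & 1,
        "V": (word >> 28) & 1,
        "E2": (word >> 27) & 1,
        "E1": (word >> 26) & 1,
        "H": (word >> 25) & 1,
        # Bits 23..0 hold PC[25:2]; shift left by 2 for a byte address.
        "PC": (word & 0x00FFFFFF) << 2,
    }
```

Because the register holds PC[25:2], the stored value is a long-word address and must be multiplied by four to obtain the byte address of the instruction.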

Semaphore register
31 4 3 2 1 0

Reserved S[3:0]

Figure 15 Semaphore Register


The SEMAPHORE register, Figure 15, is used to control inter-process or
ARCtangent-A4 processor to host communication. The basecase ARCtangent-A4
processor has at least 4 semaphore bits (S[3:0]). The remaining bits of the
semaphore register are reserved for future versions of the ARCtangent-A4
processor.
Each semaphore bit is independent of the others and is claimed using a set-and-
test protocol. The semaphore register can be read at any time by the host or
ARCtangent-A4 processor to see which semaphores it currently owns.
To claim a semaphore bit:
• Write ‘1’ to the semaphore bit.
• Read back the semaphore bit. Then:
  If the returned value is ‘1’ then the semaphore has been obtained.
  If the returned value is ‘0’ then the host has the bit.
To release a semaphore bit:
• Write ‘0’ to the semaphore bit.
Mutual exclusion is provided between the ARCtangent-A4 processor and the
host. In other words, if the host claims a particular semaphore bit, the
ARCtangent-A4 processor will not be able to claim that same semaphore bit until
the host has released it. Conversely, if the ARCtangent-A4 processor claims a
particular semaphore bit, the host will not be able to claim that same semaphore
bit until the ARCtangent-A4 processor has released it.
The semaphore bits are cleared to 0 after a reset, which is the state where neither
the ARCtangent-A4 processor nor the host has claimed any semaphore bits.
When claiming a semaphore bit (i.e. setting the semaphore bit to ‘1’), care
should be taken not to clear the remaining semaphore bits. This can be
accomplished by keeping a local copy of the semaphore register, or by reading it,
and ORing that value with the bit to be claimed before writing back to the
semaphore register.
Example:
.equ SEMBIT0,1 ; constant to indicate semaphore bit 0
.equ SEMBIT1,2 ; constant to indicate semaphore bit 1
.equ SEMBIT2,4 ; constant to indicate semaphore bit 2
.equ SEMBIT3,8 ; constant to indicate semaphore bit 3
LR r2,[SEMAPHORE] ; r2 <= semaphore pattern already attained
OR r2,r2,SEMBIT1 ; r2 <= semaphore pattern attained and wanted
SR r2,[SEMAPHORE] ; attempt to get the semaphore bit
LR r2,[SEMAPHORE] ; read back semaphore register
AND.F 0,r2,SEMBIT1 ; test for the semaphore bit being set
; EQ means semaphore not attained
; NE means semaphore attained

NOTE Replacing the statement OR r2,r2,SEMBIT1 with BIC r2,r2,SEMBIT1
will release the semaphore, leaving any previously attained semaphores in
their attained state.
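The set-and-test protocol and the mutual exclusion it provides can be explored with a small Python model of the register. This is a sketch under assumptions: the class, the owner names "arc" and "host" and the claim helper are hypothetical, and the model takes the statement that each side reads back the semaphores it currently owns to mean that a read returns ‘1’ only for bits owned by the reader.

```python
class SemaphoreRegister:
    """Toy model of the 4-bit SEMAPHORE register shared with the host."""

    def __init__(self):
        self.owner = [None] * 4          # None, "arc" or "host" for each bit

    def write(self, who, value):
        for bit in range(4):
            wants = (value >> bit) & 1
            if wants and self.owner[bit] is None:
                self.owner[bit] = who    # claim a free bit
            elif not wants and self.owner[bit] == who:
                self.owner[bit] = None   # release a bit that `who` owns

    def read(self, who):
        # Each side only sees a '1' for the bits that it owns.
        return sum(1 << bit for bit in range(4) if self.owner[bit] == who)


def claim(sem, who, mask):
    """Set-and-test: OR in the wanted bit, then read back to check it."""
    sem.write(who, sem.read(who) | mask)   # keep bits already owned
    return bool(sem.read(who) & mask)
```

The claim helper mirrors the assembly example: it ORs the wanted bit into the pattern already attained, writes it, and tests the value read back.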

Loop control registers


31 24 23 0

Reserved LP_START[25:2]

Figure 16 Loop Start Register


31 24 23 0

Reserved LP_END[25:2]

Figure 17 Loop End Register


The loop start (LP_START) and loop end (LP_END) registers contain the
addresses for the zero delay loop mechanism. Figure 16 and Figure 17 show the
format of these registers. The loop start and loop end registers can be set up with
the special loop instruction (LP) or can be manipulated with the auxiliary register
access instructions (LR and SR).
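
As a rough illustration of the register layout, the C sketch below packs a byte address into the 24-bit field and unpacks it again. It assumes, from the field labels in Figures 16 and 17, that register bits 23:0 hold address bits [25:2]; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a long-word-aligned byte address into the 24-bit field.
 * Assumption: register bits 23:0 hold address bits [25:2]. */
uint32_t lp_field_from_addr(uint32_t byte_addr) {
    assert((byte_addr & 0x3u) == 0);      /* bits [1:0] are not stored */
    assert(byte_addr < (1u << 26));       /* only bits [25:2] fit */
    return (byte_addr >> 2) & 0x00FFFFFFu;
}

/* Recover the byte address, ignoring the reserved bits 31:24. */
uint32_t lp_addr_from_field(uint32_t reg) {
    return (reg & 0x00FFFFFFu) << 2;
}
```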

Identity register
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

MANCODE[7:0] MANVER[7:0] ARCNUM[7:0] ARCVER[7:0]

Figure 18 Identity Register
Figure 18 shows the identity register (IDENTITY). It contains the manufacturer
code (MANCODE[7:0]), the manufacturer version number (MANVER[7:0]), the
additional identity number (ARCNUM[7:0]) and the ARCtangent-A4 basecase
processor version number (ARCVER[7:0]).
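
A host tool might split an IDENTITY value into its four fields as sketched below. The byte positions are an assumption read from the field order in Figure 18 (MANCODE in the top byte, ARCVER in the bottom byte), and the type and function names are hypothetical. The version thresholds are the ones this chapter states for the DEBUG register bits FH (5 or higher) and BH, ZZ and IS (7 or higher).

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t mancode;  /* manufacturer code */
    uint8_t manver;   /* manufacturer version number */
    uint8_t arcnum;   /* additional identity number */
    uint8_t arcver;   /* basecase processor version number */
} arc_identity;

typedef enum { DBG_FH, DBG_BH, DBG_ZZ, DBG_IS } debug_feature;

/* Split a raw IDENTITY value into its fields (assumed byte order). */
arc_identity identity_decode(uint32_t reg) {
    arc_identity id;
    id.mancode = (uint8_t)(reg >> 24);
    id.manver  = (uint8_t)(reg >> 16);
    id.arcnum  = (uint8_t)(reg >> 8);
    id.arcver  = (uint8_t)reg;
    return id;
}

/* Report whether a DEBUG register bit exists for this ARCVER. */
int debug_feature_available(uint32_t identity_reg, debug_feature f) {
    uint8_t ver = identity_decode(identity_reg).arcver;
    switch (f) {
    case DBG_FH: return ver >= 5;
    case DBG_BH:
    case DBG_ZZ:
    case DBG_IS: return ver >= 7;
    }
    assert(!"unknown debug feature");
    return 0;
}
```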

Debug register
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

LD SH BH ZZ IS Reserved FH SS

Figure 19 Debug Register


The debug register (DEBUG) contains seven bits: load pending (LD); self halt
(SH); breakpoint halt (BH); sleep mode (ZZ); single instruction step (IS); force
halt (FH) and single step (SS).

LD can be read at any time by either the host or the ARCtangent-A4 processor
and indicates that there is a delayed load waiting to complete. The host should
wait for this bit to clear before changing the state of the ARCtangent-A4
processor.
SH indicates that the ARCtangent-A4 processor has halted itself with the FLAG
instruction. This bit is cleared whenever the H bit in the STATUS register is
cleared (i.e. the ARCtangent-A4 processor is running or a single step has been
executed).
The breakpoint instruction halt (BH) bit is set when a breakpoint instruction has
been detected in the instruction stream at stage one of the pipeline. A breakpoint
halt is indicated when BH is ‘1’. This bit is cleared when the H bit in the STATUS
register is cleared, e.g. by single stepping or restarting the ARCtangent-A4
processor. BH is only available for ARCtangent-A4 basecase processor version 7
or higher.
The ZZ bit indicates that the ARCtangent-A4 processor is in "sleep" mode. The
ARCtangent-A4 processor enters sleep mode following a SLEEP instruction. ZZ
is cleared whenever the processor "wakes" from sleep mode. ZZ is only available
for ARCtangent-A4 basecase processor version 7 or higher.
The force halt bit (FH) is only available for ARCtangent-A4 basecase
processor versions (ARCVER in the IDENTITY register) of 5 or higher. FH is a
foolproof method for the host to stop the processor externally. Setting this bit
has no side effects when the ARCtangent-A4 processor is already halted. FH is
not a mirror of the STATUS register H bit: clearing FH will not start the
ARCtangent-A4 processor. FH always returns 0 when it is read.
See also Halting.
Single stepping is provided through the use of IS and SS. Single instruction step
(IS) is used in combination with SS. When IS and SS are both set by the host the
ARCtangent-A4 processor will execute one full instruction. IS is only available
for ARCtangent-A4 basecase processor version 7 or higher.
SS is a write-only bit that, when set by the host, causes the ARCtangent-A4
processor to single cycle step. The single cycle step function enables the
processor for one cycle. Note that this does not necessarily correspond to one
instruction being executed, since stall conditions may be present. In order to
execute a single instruction, the remote system must repeatedly single-step the
ARCtangent-A4 processor until the value changes in either the program counter
or the loop count register, or use SS in combination with IS.

Extension auxiliary registers


The auxiliary register set is extendible up to the full 2^32 register space. Results of
accessing the extension auxiliary register region are undefined in the basecase
version of the ARCtangent-A4 processor. If an auxiliary register that is not
implemented is read, an unknown value is returned. No exception is generated.
Writes to non-implemented auxiliary registers are ignored.
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area; please see the associated documentation.

Optional extensions auxiliary registers


The following table summarizes the auxiliary registers that are used by the
optional extensions:
Number  Name            r/w  Description
0x12    MULHI           w    High part of multiply to restore multiply state
0x7B    MULTIPLY_BUILD  r    Build configuration for: multiply
0x7C    SWAP_BUILD      r    Build configuration for: swap
0x7D    NORM_BUILD      r    Build configuration for: normalize
0x7E    MINMAX_BUILD    r    Build configuration for: min/max
0x7F    BARREL_BUILD    r    Build configuration for: barrel shift

Multiply restore register
Bits 31:0
MUL[63:32]

Figure 20 Multiply Restore Register


The extension auxiliary register MULHI is used to restore the multiply result
register if the multiply has been used, for example, by an interrupt service
routine.
NOTE No interlock is provided to stall writes when a multiply is taking place. For this
reason, the user must ensure that the multiply has completed before writing
the MULHI register. This is performed by reading one of the scoreboarded
multiplier result registers.

The lower part of the multiply result register can be restored by multiplying the
desired value by 1.

Example
To read the upper and lower parts of the multiply result:
MOV r1,mlo ;put lower result in r1
MOV r2,mhi ;put upper result in r2
To restore the multiply result:
MULU64 r1,1 ;restore lower result
MOV 0,mlo ;wait until multiply complete. N.B. causes the
;processor to stall until the multiplication has
;finished
SR r2,[mulhi] ;restore upper result
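
The restore sequence works because an unsigned multiply by 1 reproduces the saved low word and leaves the high word at zero, which the MULHI write then overwrites. The C sketch below is a software model of that arithmetic only; the struct and function names are hypothetical and no hardware timing or scoreboarding is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of the 64-bit multiply result register pair. */
typedef struct {
    uint32_t mlo;  /* lower 32 bits of the result */
    uint32_t mhi;  /* upper 32 bits of the result */
} mul_regs;

/* Model of MULU64: 32 x 32 unsigned multiply giving a 64-bit result. */
mul_regs mulu64_model(uint32_t b, uint32_t c) {
    uint64_t product = (uint64_t)b * (uint64_t)c;
    mul_regs r;
    r.mlo = (uint32_t)product;
    r.mhi = (uint32_t)(product >> 32);
    return r;
}

/* Model of the restore sequence: MULU64 saved_mlo,1 then SR to MULHI. */
mul_regs mul_restore_model(uint32_t saved_mlo, uint32_t saved_mhi) {
    mul_regs r = mulu64_model(saved_mlo, 1u);
    assert(r.mlo == saved_mlo && r.mhi == 0);  /* multiply by 1 */
    r.mhi = saved_mhi;                         /* the MULHI write */
    return r;
}
```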

Chapter 8 — Instruction Set Details

Introduction
This chapter contains the detailed information about all the instructions available
in the basecase version of the ARCtangent-A4 processor. They are arranged in
alphabetical order.

Instruction Map
There are 32 different instruction codes, only the first 16 of which are used for
the basecase ARCtangent-A4 processor according to the following table:
Basecase instruction set
Code  Instruction and/or Type       Notes
0x00  LD register + register        Delayed load (core registers only)
0x01  LD register + offset, LR      Delayed load or load from aux. register
0x02  ST register + offset, SR      Buffered store or store to aux. register
0x03  single operand instructions   FLAG, single shifts, sign extend
0x04  Bcc                           Branch conditionally
0x05  BLcc                          Branch and link conditionally
0x06  LPcc                          Loop set up or jump conditionally
0x07  Jcc, JLcc                     Jump (and link) conditionally
0x08  ADD                           Addition
0x09  ADC                           Addition with carry
0x0A  SUB                           Subtract
0x0B  SBC                           Subtract with carry
0x0C  AND                           Logical bitwise AND
0x0D  OR                            Logical bitwise OR
0x0E  BIC                           Logical bitwise AND with invert
0x0F  XOR                           Logical bitwise exclusive-OR
0x10  ASL                           Optional multiple arithmetic shift left
0x11  LSR                           Optional multiple logical shift right
0x12  ASR                           Optional multiple arithmetic shift right
0x13  ROR                           Optional multiple rotate right
0x14  MUL64                         Optional 32 X 32 signed multiply
0x15  MULU64                        Optional 32 X 32 unsigned multiply
0x1E  MAX                           Optional maximum of two signed integers
0x1F  MIN                           Optional minimum of two signed integers

NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area; please see the associated documentation.

Specially Encoded Instructions


Code       Instruction  Notes
01,[10]    LR           Load from Aux. Register
02,[10]    SR           Store to Aux. Register
03,[00]    FLAG         Set the flags
03,[3F,0]  BRK          Break Point
03,[3F,1]  SLEEP        Sleep
03,[3F,2]  SWI          Software Interrupt
03,[01]    ASR          Arithmetic Shift Right by one
03,[02]    LSR          Logical Shift Right by one
03,[03]    ROR          Rotate Right by one
03,[04]    RRC          Rotate Right through Carry by one
03,[05]    SEXB         Sign Extend byte to long word
03,[06]    SEXW         Sign Extend word to long word
03,[07]    EXTB         Zero Extend byte to long word
03,[08]    EXTW         Zero Extend word to long word
03,[09]    SWAP         Optional swap words
03,[0A]    NORM         Optional normalise Integer


Table 17 Basecase Instruction Map

Addressing Modes
The addressing modes of an instruction are encoded in the instruction. There are
basically only 3 addressing modes: register-register, register-immediate and
immediate-immediate. However, as a consequence of the action performed by the
different instruction groups, these can be expanded as shown in Table 3 Data
Addressing Modes. The operating modes use the key in Table 4.

Dual operand instructions


If we take the ADD instruction as an example:
ADD r1,r2,r3 ; register register
ADD.NZ r1,r2,r3 ; conditional
ADD.F r1,r2,r3 ; setting flags
ADD.NZ.F r1,r2,r3 ; conditional and conditionally set flags
ADD r1,r2,34 ; register immediate
ADD r1,34,r2 ; immediate register
ADD r1,255,255 ; immediate immediate (shimms MUST match)
ADD.F 0,r1,r2 ; test
ADD.F 0,r1,34 ; test with immediate
ADD.F 0,34,r1 ; test with immediate
ADD 0,0,0 ; null instruction, NOP

The complete operating modes for ADD are:
ADD with result               ADD without result
ADD<.cc><.f> a,b,c            ADD<.cc><.f> 0,b,c
ADD<.f> a,b,shimm             ADD<.f> 0,b,shimm
ADD<.f> a,shimm,c             ADD<.f> 0,shimm,c
ADD<.f> a,shimm,shimm         ADD<.f> 0,shimm,shimm
ADD<.cc><.f> a,b,limm         ADD<.cc><.f> 0,b,limm
ADD<.cc><.f> a,limm,c         ADD<.cc><.f> 0,limm,c
ADD 0,shimm,shimm ;nop

Single operand instructions


Taking ROR as a single operand example:
ROR r1,r2 ; register
ROR.NZ r1,r2 ; conditional
ROR.F r1,r2 ; setting flags
ROR.NZ.F r1,r2 ; conditional and conditionally set flags
ROR r1,22 ; immediate
ROR.F 0,r2 ; test

The complete operating modes for ROR are:
ROR with result               ROR without result
ROR<.cc><.f> a,b              ROR<.cc><.f> 0,b
ROR<.f> a,shimm               ROR<.f> 0,shimm
ROR<.cc><.f> a,limm           ROR<.cc><.f> 0,limm
ROR 0,shimm ;nop

Branch type Instructions


Branch instructions:
B pos ; relative branch to pos
BNE pos ; conditional branch
B.D pos ; branch and execute next instruction
BNE.D pos ; conditional and always execute next

The operating mode for Bcc is:


Bcc
B<cc><.dd> rel_addr

Jump Instruction
Example
JAL [r1] ; jump to address in register r1
JAL 2000 ; jump to address 2000
JNZ [r1] ; conditional jump
JNZ 2000 ; conditional jump
JAL.D [r1] ; jump and execute next instruction
JNZ.D [r1] ; conditional jump, always execute next
JNZ.JD [r1] ; conditional jump, execute next only
; if jump taken
JAL.F [r1] ; jump to address and update flags
JNZ.D.F [r1] ; conditional, execute next, update flags
JNZ.F 2000,64 ; conditional jump to 2000 and set the Z flag

The operating modes for J are:


Jcc
J<cc><.dd><.f> [b]
J<cc><.dd><.f> limm

Load Instruction
Example
LD r1,[r2,r3]      ; r1 replaced with data at r2+r3
LD r1,[r2,20]      ; r1 replaced with data at r2+20
LDB r1,[r2,r3]     ; load byte from r2+r3
LD.A r4,[r2,10]    ; r4 replaced by data at address
                   ; r2 plus offset 10 and writeback
                   ; address calculation to r2
LDW.X r1,[r2,r3]   ; r1 replaced by sign extended word
                   ; from address at r2+r3
LDW.X.A r1,[r2,r3] ; word, sign extended with writeback
                   ; from address at r2+r3
LD r1,[900]        ; load from address 900

The operating modes for LD are:


LD
LD<zz><.x><.a><.di> a,[b,c]
LD<zz><.x><.a><.di> a,[b,shimm]
LD<zz><.x><.di> a,[imm]

Store instruction
Example:
ST r1,[r2]          ; data at address r2 replaced by r1
ST r1,[r2,14]       ; store with offset
STB r1,[r2]         ; store bottom 8 bits of r1 to
                    ; address r2
ST.A r1,[r2,14]     ; with writeback r2+14 to r2
STW.A r1,[r2,2]     ; store bottom 16 bits of r1 to
                    ; address r2+2 and writeback
                    ; r2+2 to r2
ST r1,[900]         ; store r1 to address 900
STB 0,[r2]          ; store byte 0 to address r2
ST -8,[r2,-8]       ; store -8 to address r2-8
STW 80,[750]        ; store word 80 to address 750
ST 12345678,[r2,8]  ; store 12345678 to address r2+8

The operating modes for ST are:


ST
ST<zz><.a><.di> c,[b]
ST<zz><.a><.di> c,[b,shimm]
ST<zz><.di> c,[imm]
ST<zz><.a><.di> 0,[b]
ST<zz><.a><.di> shimm,[b,shimm] (shimms MUST match)
ST<zz><.di> shimm,[limm]
ST<zz><.a><.di> limm,[b,shimm]

Load from auxiliary register instruction


Example:
LR r1,[r2] ; r1 replaced with data from aux reg pointed
; to by r2
LR r1,[20] ; r1 replaced with data from aux reg 20

The operating modes for LR are:


LR
LR a,[b]
LR a,[imm]

Store to auxiliary register instruction


Example:
SR r1,[r2] ; data in aux reg pointed to by r2
           ; replaced by data in r1
SR r1,[14] ; data in aux reg 14 replaced by
           ; data in r1

The operating modes for SR are:


SR
SR c,[b]
SR c,[imm]

Instruction Encoding
The instructions are encoded according to the type of instruction.
The general encoding outlines are shown below. Some fields have additional
encoding on them and are covered in detail for each instruction.
Those instructions that test the condition codes use the encoding shown in the
following table.
Mnemonic      Condition                                 Code
AL, RA        Always                                    0x00
EQ, Z         Zero                                      0x01
NE, NZ        Non-Zero                                  0x02
PL, P         Positive                                  0x03
MI, N         Negative                                  0x04
CS, C, LO     Carry set, lower than (unsigned)          0x05
CC, NC, HS    Carry clear, higher or same (unsigned)    0x06
VS, V         Overflow set                              0x07
VC, NV        Overflow clear                            0x08
GT            Greater than (signed)                     0x09
GE            Greater than or equal to (signed)         0x0A
LT            Less than (signed)                        0x0B
LE            Less than or equal to (signed)            0x0C
HI            Higher than (unsigned)                    0x0D
LS            Lower or same (unsigned)                  0x0E
PNZ           Positive non-zero                         0x0F
Condition codes 0x10 to 0x1F are reserved for extension purposes.

NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area; please see the associated documentation.
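
For tools such as a small assembler or disassembler, the condition code table can be held as a lookup from mnemonic to 5-bit code. The C sketch below is one possible arrangement; the function name cc_code and the -1 return for unknown mnemonics are illustrative choices, while the mnemonics and codes are taken from the table.

```c
#include <assert.h>
#include <string.h>

/* One entry per mnemonic alias in the condition code table. */
static const struct { const char *name; int code; } cc_table[] = {
    {"AL", 0x00}, {"RA", 0x00},
    {"EQ", 0x01}, {"Z",  0x01},
    {"NE", 0x02}, {"NZ", 0x02},
    {"PL", 0x03}, {"P",  0x03},
    {"MI", 0x04}, {"N",  0x04},
    {"CS", 0x05}, {"C",  0x05}, {"LO", 0x05},
    {"CC", 0x06}, {"NC", 0x06}, {"HS", 0x06},
    {"VS", 0x07}, {"V",  0x07},
    {"VC", 0x08}, {"NV", 0x08},
    {"GT", 0x09}, {"GE", 0x0A}, {"LT", 0x0B}, {"LE", 0x0C},
    {"HI", 0x0D}, {"LS", 0x0E}, {"PNZ", 0x0F},
};

/* Return the condition code for a mnemonic, or -1 if it is not in the
 * basecase table (codes 0x10 to 0x1F are reserved for extensions). */
int cc_code(const char *mnemonic) {
    assert(mnemonic != NULL);
    for (size_t i = 0; i < sizeof cc_table / sizeof cc_table[0]; i++) {
        if (strcmp(cc_table[i].name, mnemonic) == 0) {
            return cc_table[i].code;
        }
    }
    return -1;
}
```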

Immediate data is indicated on the instruction according to the following table:


Short immediate setting flags 61/0x3D
Short immediate not setting flags 63/0x3F
Long immediate 62/0x3E

The immediate data indicator is encoded in the B or C register address field
according to the ordering of operands or the type of instruction.
The result of an operation is discarded if an immediate data indicator is encoded
in the destination field. If a long immediate data indicator is encoded only in the
destination field, long immediate data is NOT fetched. However, if one of the
source register fields, B or C, contains a long immediate data indicator as well as
the destination field, then long immediate data IS fetched as normal.
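
The two rules above (when a result is discarded, and when a long immediate word is fetched) can be written as small predicates over the A, B and C field values. This C sketch is only a restatement of the text; the function and constant names are hypothetical.

```c
#include <assert.h>

/* Immediate data indicators from the table above. */
enum { REG_SHIMM_F = 61, REG_LIMM = 62, REG_SHIMM = 63 };

/* True when a 6-bit register field holds an immediate data indicator. */
int is_immediate_indicator(unsigned field) {
    assert(field < 64);
    return field == REG_SHIMM_F || field == REG_LIMM || field == REG_SHIMM;
}

/* The result is discarded when the destination field is an indicator. */
int result_discarded(unsigned a) {
    return is_immediate_indicator(a);
}

/* A long immediate word is fetched only when B or C holds the long
 * immediate indicator; the destination field alone never triggers it. */
int limm_fetched(unsigned a, unsigned b, unsigned c) {
    assert(a < 64 && b < 64 && c < 64);
    return b == REG_LIMM || c == REG_LIMM;
}
```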

Register
This is the general form used for register-register and register-long-immediate
addressing.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

I[4:0] A[5:0] B[5:0] C[5:0] F R N N Q Q Q Q Q

I[4:0]  Instruction opcode
A[5:0]  Destination register address
B[5:0]  Operand 1 address
C[5:0]  Operand 2 address
F       Flags set field
N       Jump/Call nullify instruction mode
Q       Condition code test field
R       Reserved, should be set to 0

To encode AND r1,r2,r3


31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0

The flag field is clear and there is no condition test.

To encode AND r1,r2,0x13A with long immediate data.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0

The flag field is clear and there is no condition test.


The special register number 62 is used to indicate that long immediate data is
used for the second operand field. The instruction word is then followed by the
extra word [0000 0000 0000 0000 0000 0001 0011 1010] which is 0x13A.
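
The register format can be checked by packing the fields in C. The sketch below assumes the bit positions implied by the layout above (I in bits 31:27, A in 26:21, B in 20:15, C in 14:9, F in bit 8, the reserved bit 7 left at 0, N in 6:5 and Q in 4:0); the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the register-format fields into a 32-bit instruction word. */
uint32_t encode_reg_format(unsigned op, unsigned a, unsigned b, unsigned c,
                           unsigned f, unsigned n, unsigned q) {
    assert(op < 32 && a < 64 && b < 64 && c < 64);
    assert(f < 2 && n < 4 && q < 32);
    return ((uint32_t)op << 27) |   /* I[4:0] */
           ((uint32_t)a  << 21) |   /* A[5:0] */
           ((uint32_t)b  << 15) |   /* B[5:0] */
           ((uint32_t)c  << 9)  |   /* C[5:0] */
           ((uint32_t)f  << 8)  |   /* F; bit 7 is reserved and stays 0 */
           ((uint32_t)n  << 5)  |   /* N N */
           (uint32_t)q;             /* Q[4:0] */
}
```

With the AND opcode 0x0C, AND r1,r2,r3 packs to 0x60210600, and AND r1,r2,0x13A (C field 62) packs to 0x60217C00, followed by the long immediate word 0x0000013A.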

Short immediate
This is the form used for register with short immediate. Note that the short
immediate data is always sign extended to 32 bits before use.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

I[4:0] A[5:0] B[5:0] C[5:0] D[8:0]

I[4:0]  Instruction opcode
A[5:0]  Destination register address
B[5:0]  Operand 1 address
C[5:0]  Operand 2 address
D[8:0]  Short immediate data
To encode AND r1,r2,0x03A with short immediate data.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1 1 1 1 1 0 0 0 1 1 1 0 1 0

The code in field C[5:0] (instruction bits[14:9]) is 63 for short immediate data
without setting flags. If the instruction needed to set flags, then code 61 would be
used.
The result of the operation is discarded if the short immediate code is included in
the destination field A[5:0].
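
A short C sketch of the short immediate format follows, with the 9-bit sign extension made explicit. The field positions are assumed from the layout above (D[8:0] in bits 8:0) and the function names are hypothetical. Since the data is sign extended, only values from -256 to 255 fit, which is why 0x13A (314) needs the long immediate form.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the short-immediate-format fields into a 32-bit instruction word. */
uint32_t encode_shimm_format(unsigned op, unsigned a, unsigned b, unsigned c,
                             int32_t imm) {
    assert(op < 32 && a < 64 && b < 64 && c < 64);
    assert(imm >= -256 && imm <= 255);       /* must fit in signed 9 bits */
    return ((uint32_t)op << 27) |
           ((uint32_t)a  << 21) |
           ((uint32_t)b  << 15) |
           ((uint32_t)c  << 9)  |
           ((uint32_t)imm & 0x1FFu);         /* D[8:0] */
}

/* Sign extend the 9-bit D field to 32 bits, as done before use. */
int32_t sign_extend9(uint32_t d) {
    d &= 0x1FFu;
    return (d & 0x100u) ? (int32_t)d - 512 : (int32_t)d;
}
```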

Single operand
Single operand instructions use the same format as “register” and “short
immediate” encoding styles, except that the I-field contains 0x03 and the C-field
is used to encode the particular single operand instruction code.

Branch
This form is used for the branch type instructions.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

I[4:0] L[21:2] N N Q Q Q Q Q

I[4:0]   Instruction opcode
L[21:2]  Relative address
N        Jump/Call nullify instruction mode
Q        Condition code test field

The Jump/Call nullify instruction modes are shown below.


Mnemonic  Operation                                        Code
ND        No Delayed instruction slot. Default mode.       0
          Only execute next instruction when not jumping
D         Delayed instruction slot.                        1
          Always execute next instruction
JD        Jump Delayed instruction slot.                   2
          Only execute next instruction when jumping
-         Reserved                                         3
The branch target address is calculated by adding the offset within the instruction
to the address of the instruction fetched after the branch. Hence if the relative
address is 0, the target of the branch is the instruction immediately
following the branch (i.e. the instruction in the delay slot).
Care should be taken when the instruction in the delay slot is not at the immediately
following instruction address, in which case the relative address could be
encoded incorrectly. With bad coding, this can occur if the branch is the very last
instruction in a zero overhead loop or if the branch is itself executed in the
delay slot of another branch, jump or loop.
To encode BRA 8000, jumping 8000 bytes (2000 long words) forward.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0

The condition code for the "branch always" instruction is 0. The instruction
following this branch would not be executed, since 0 is contained in the call/jump
nullify field.
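
The branch encoding can be reproduced in C as sketched below. The layout is assumed from the format above: opcode in bits 31:27, the 20-bit long word offset L[21:2] in bits 26:7, N in bits 6:5 and Q in bits 4:0. The function name is hypothetical, and the byte offset is taken relative to the instruction after the branch.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a branch-format instruction from a byte offset. */
uint32_t encode_branch(unsigned op, int32_t byte_offset,
                       unsigned n, unsigned q) {
    assert(op < 32 && n < 4 && q < 32);
    assert(byte_offset % 4 == 0);            /* long word aligned */
    int32_t words = byte_offset / 4;
    assert(words >= -(1 << 19) && words < (1 << 19));  /* signed 20 bits */
    return ((uint32_t)op << 27) |
           (((uint32_t)words & 0xFFFFFu) << 7) |   /* L[21:2] */
           ((uint32_t)n << 5) |                    /* nullify mode */
           (uint32_t)q;                            /* condition code */
}
```

For BRA 8000 (opcode 0x04, 2000 long words, ND mode, condition AL) this gives 0x2003E800, which matches the bit pattern shown above.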

Instruction Set Details


The instructions are arranged in alphabetical order. The instruction name is given
at the top left and top right of the page, along with a brief instruction description
and the instruction opcode.
The following terms are used in the description of each instruction.
Operation: The operation of the instruction using the following key:
dest: The target register
operand1: The first source operand
operand2: The second source operand
flags: The flag bits in the status register
PC: The program counter
rel_addr: Relative branch address
Syntax: The syntax of the instruction and supported constructs using the key in
Table 4.
Example: A simple coding example
Description: Full description of the instruction
Status Flags: The status flags that are affected
Instruction format: Layout of the instruction encoding
Instruction fields: Description of the fields used in the instruction format

ADC
Addition with Carry
Arithmetic Operation
Operation:
dest ← operand1 + operand2 + carry
Syntax:
with result without result
ADC<.cc><.f> a,b,c ADC<.cc><.f> 0,b,c
ADC<.f> a,b,shimm ADC<.f> 0,b,shimm
ADC<.f> a,shimm,c ADC<.f> 0,shimm,c
ADC<.f> a,shimm,shimm ADC<.f> 0,shimm,shimm
ADC<.cc><.f> a,b,limm ADC<.cc><.f> 0,b,limm
ADC<.cc><.f> a,limm,c ADC<.cc><.f> 0,limm,c
Example:
ADC r1,r2,r3
Description:
Add operand1 to operand2 and carry, and place the result in the destination
register.
Status flags:
Z N C V
* * * *
Z Set if result is zero N Set if most significant bit of result is set
C Set if carry is generated V Set if an overflow is generated
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 1 0 0 1 A[5:0] B[5:0] C[5:0] F Res. Q Q Q Q Q

OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 1 0 0 1 A[5:0] B[5:0] C[5:0] D[8:0]

Instruction fields:
A[5:0] Destination register address. B[5:0] Operand 1 address
C[5:0] Operand 2 address D[8:0] Immediate data field
Q Condition code field Res Reserved. Should be set to 0.
F Set flags on result if set to 1
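
The operation and flag behaviour of ADC can be modelled in C as below. This is a sketch under one stated assumption: "overflow" is taken to mean two's-complement signed overflow, which the text does not spell out. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t result;
    int z, n, c, v;   /* flag values the instruction would produce */
} adc_out;

/* Model of dest <- operand1 + operand2 + carry with Z, N, C, V. */
adc_out adc_model(uint32_t op1, uint32_t op2, int carry_in) {
    assert(carry_in == 0 || carry_in == 1);
    uint64_t wide = (uint64_t)op1 + (uint64_t)op2 + (uint64_t)carry_in;
    adc_out o;
    o.result = (uint32_t)wide;
    o.z = (o.result == 0);
    o.n = (int)(o.result >> 31);             /* most significant bit */
    o.c = (int)((wide >> 32) & 1u);          /* carry out of bit 31 */
    /* Assumed: V is signed overflow, i.e. both operands share a sign
     * that differs from the sign of the result. */
    o.v = (int)(((~(op1 ^ op2) & (op1 ^ o.result)) >> 31) & 1u);
    return o;
}
```

Chaining ADD on the low words with ADC on the high words is the usual way to add values wider than 32 bits.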

ADD
Addition
Arithmetic Operation
Operation:
dest ← operand1 + operand2
Syntax:
with result Without result
ADD<.cc><.f> a,b,c ADD<.cc><.f> 0,b,c
ADD<.f> a,b,shimm ADD<.f> 0,b,shimm
ADD<.f> a,shimm,c ADD<.f> 0,shimm,c
ADD<.f> a,shimm,shimm ADD<.f> 0,shimm,shimm
(shimms MUST match)
ADD<.cc><.f> a,b,limm ADD<.cc><.f> 0,b,limm
ADD<.cc><.f> a,limm,c ADD<.cc><.f> 0,limm,c

Example:
ADD r1,r2,r3
Description:
Add operand1 to operand2 and place the result in the destination register.
Status flags:
Z N C V
* * * *
Z Set if result is zero N Set if most significant bit of result is set
C Set if carry is generated V Set if an overflow is generated
Instruction format:

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 1 0 0 0 A[5:0] B[5:0] C[5:0] F Res. Q Q Q Q Q

OR

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 1 0 0 0 A[5:0] B[5:0] C[5:0] D[8:0]

Instruction fields:
A[5:0] Destination register address. B[5:0] Operand 1 address
C[5:0] Operand 2 address D[8:0] Immediate data field
Q Condition code field Res Reserved. Should be set to 0.
F Set flags on result if set to 1

AND
Logical Bitwise AND
Logical Operation
Operation:
dest ← operand1 AND operand2
Syntax:
with result without result
AND<.cc><.f> a,b,c AND<.cc><.f> 0,b,c
AND<.f> a,b,shimm AND<.f> 0,b,shimm
AND<.f> a,shimm,c AND<.f> 0,shimm,c
AND<.f> a,shimm,shimm AND<.f> 0,shimm,shimm
(shimms MUST match)
AND<.cc><.f> a,b,limm AND<.cc><.f> 0,b,limm
AND<.cc><.f> a,limm,c AND<.cc><.f> 0,limm,c
Example:
AND r1,r2,r3
Description:
Logical bitwise AND of operand1 with operand2 and place the result in the
destination register.
Status flags:
Z N C V
* * . .
Z Set if result is zero N Set if most significant bit of result is set
C Unchanged V Unchanged
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 1 1 0 0 A[5:0] B[5:0] C[5:0] F Res. Q Q Q Q Q

OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 1 1 0 0 A[5:0] B[5:0] C[5:0] D[8:0]

Instruction fields:
A[5:0] Destination register address. B[5:0] Operand 1 address
C[5:0] Operand 2 address D[8:0] Immediate data field
Q Condition code field Res Reserved. Should be set to 0.
F Set flags on result if set to 1

ASL/LSL
Arithmetic Shift Left
Logical Operation
Operation:
dest ← arithmetic shift left by one of operand
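[Diagram: operand b is shifted left by one place into a; the top bit of b goes to C and 0 enters the bottom bit]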

Syntax:
with result without result
ASL<.cc><.f> a,b ASL<.cc><.f> 0,b
ASL<.f> a,shimm ASL<.f> 0,shimm
ASL<.cc><.f> a,limm ASL<.cc><.f> 0,limm
Example:
ASL r1,r2
Description:
Arithmetically shift the operand left by one place and place the result in the
destination register. When interpreting as an arithmetic shift, the overflow flag
will be set if the sign bit changes after the shift. When interpreting as a logical
shift, the overflow flag can be ignored. ASL is included for instruction set
symmetry. It is effectively the ADD instruction (ADD a,b,b etc.).
Status flags:
Z N C V
* * * *

Z Set if result is zero N Set if most significant bit of result is set
C Set if carry is generated V Set if the sign bit changes after a shift
Instruction format:

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 1 0 0 0 A[5:0] B[5:0] B[5:0] F Res. Q Q Q Q Q

OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 1 0 0 0 A[5:0] B[5:0] B[5:0] D[8:0]

Instruction fields:
A[5:0] Destination register address. B[5:0] Operand 1 address - in both fields
D[8:0] Immediate data field Q Condition code field
Res Reserved. Should be set to 0. F Set flags on result if set to 1
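
Because the single-place ASL is encoded as ADD a,b,b, its result and flags follow from adding the operand to itself. The C sketch below models that; the struct and function names are hypothetical, and the flag definitions are the ones listed under Status flags.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t result;
    int z, n, c, v;   /* flag values the instruction would produce */
} asl_out;

/* Model of ASL by one place, computed as b + b. */
asl_out asl1_model(uint32_t b) {
    asl_out o;
    o.result = b + b;                         /* wraps modulo 2^32 */
    assert(o.result == (uint32_t)(b << 1));   /* same as a left shift */
    o.z = (o.result == 0);
    o.n = (int)(o.result >> 31);
    o.c = (int)(b >> 31);                     /* bit shifted out of the top */
    o.v = (int)(((b ^ o.result) >> 31) & 1u); /* sign bit changed */
    return o;
}
```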

ASL multiple
Multiple Arithmetic Shift Left
Extension Option
Operation:
dest ← arithmetic shift left of operand1 by operand2
[Diagram: operand b is shifted left into a, with 0 entering the bottom bits]

Syntax:
with result without result
ASL<.cc><.f> a,b,c ASL<.cc><.f> 0,b,c
ASL<.cc><.f> a,b,limm ASL<.cc><.f> 0,b,limm
ASL<.f> a,b,shimm ASL<.f> 0,b,shimm
ASL<.cc><.f> a,limm,c ASL<.cc><.f> 0,limm,c
ASL<.f> a,shimm,c ASL<.f> 0,shimm,c
Example:
ASL r1,r2,r3
Description:
Arithmetically shift operand1 left by operand2 places and place the result in the
destination register.
Status flags:
Z N C V
* * . .
Z Set if result is zero N Set if most significant bit of result is set
C Unchanged V Unchanged
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 A[5:0] B[5:0] C[5:0] F R R R Q Q Q Q Q

OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

1 0 0 0 0 A[5:0] B[5:0] C[5:0] D[8:0]

Instruction fields:
A[5:0] Destination register address Q Condition code field
B[5:0] Operand 1 address R Reserved: set to 0
C[5:0] Operand 2 address F Set flags on result if 1
D[8:0] Immediate data field

ASR
Arithmetic Shift Right
Logical Operation
Operation:
dest ← arithmetic shift right by one of operand
[Diagram: operand b is shifted right by one place, with the bit shifted out going to C]

Syntax:
with result Without result
ASR<.cc><.f> a,b ASR<.cc><.f> 0,b
ASR<.f> a,shimm ASR<.f> 0,shimm
ASR<.cc><.f> a,limm ASR<.cc><.f> 0,limm
Example:
ASR r1,r2
Description:
Arithmetically shift operand right by one place and place the result in the
destination register. The sign of the operand is retained after the shift.
Status flags:
Z N C V
* * * .
Z Set if result is zero N Set if most significant bit of result is set
C Set if carry is generated V Unchanged
Instruction format:

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 0 1 1 A[5:0] B[5:0] 0 0 0 0 0 1 F Res. Q Q Q Q Q

OR

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 0 1 1 A[5:0] B[5:0] 0 0 0 0 0 1 D[8:0]

Instruction fields:
A[5:0] Destination register address. B[5:0] Operand address
D[8:0] Immediate data field Q Condition code field
Res Reserved. Should be set to 0. F Set flags on result if set to 1

ASR multiple
Arithmetic shift right
Extension Option
Operation:
dest ← arithmetic shift right of operand1 by operand2
[Diagram: operand b is shifted right]

Syntax:
with result without result
ASR<.cc><.f> a,b,c ASR<.cc><.f> 0,b,c
ASR<.cc><.f> a,b,limm ASR<.cc><.f> 0,b,limm
ASR<.f> a,b,shimm ASR<.f> 0,b,shimm
ASR<.cc><.f> a,limm,c ASR<.cc><.f> 0,limm,c
ASR<.f> a,shimm,c ASR<.f> 0,shimm,c
Example:
ASR r1,r2,r3
Description:
Arithmetically shift operand1 right by operand2 places and place the result in the
destination register. The destination is sign filled.
Status flags:
Z N C V
* * . .
Z Set if result is zero N Set if most significant bit of result is set
C Unchanged V Unchanged
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 0 A[5:0] B[5:0] C[5:0] F R R R Q Q Q Q Q

OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

1 0 0 1 0 A[5:0] B[5:0] C[5:0] D[8:0]

Instruction fields:
A[5:0] Destination register address Q Condition code field
B[5:0] Operand 1 address R Reserved: set to 0
C[5:0] Operand 2 address F Set flags on result if 1
D[8:0] Immediate data field
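
"Sign filled" means the vacated upper bits take the value of the original sign bit. The C sketch below models this on raw 32-bit patterns so that it does not depend on how a C compiler shifts negative values. The function name is hypothetical, and the shift count is restricted to 0..31 because the behaviour for larger counts is not described here.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a multiple-place arithmetic shift right with sign fill. */
uint32_t asr_model(uint32_t value, unsigned count) {
    assert(count < 32);                      /* larger counts not modelled */
    uint32_t shifted = value >> count;       /* logical shift, zero filled */
    if ((value >> 31) != 0 && count > 0) {
        shifted |= ~(0xFFFFFFFFu >> count);  /* fill vacated bits with 1s */
    }
    return shifted;
}
```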

BIC
Logical bitwise AND with invert
Logical Operation
Operation:
dest ← operand1 AND NOT operand2
Syntax:
with result without result
BIC<.cc><.f> a,b,c BIC<.cc><.f> 0,b,c
BIC<.f> a,b,shimm BIC<.f> 0,b,shimm
BIC<.f> a,shimm,c BIC<.f> 0,shimm,c
BIC<.cc><.f> a,b,limm BIC<.cc><.f> 0,b,limm
BIC<.cc><.f> a,limm,c BIC<.cc><.f> 0,limm,c
Example:
BIC r1,r2,r3
Description:
Logical bitwise AND of operand1 with the inverse of operand2 and place the
result in the destination register.
Status flags:
Z N C V
* * . .
Z Set if result is zero N Set if most significant bit of result is set
C Unchanged V Unchanged
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 1 1 1 0 A[5:0] B[5:0] C[5:0] F Res. Q Q Q Q Q

OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 1 1 1 0 A[5:0] B[5:0] C[5:0] D[8:0]

Instruction fields:
A[5:0] Destination register address. B[5:0] Operand 1 address
C[5:0] Operand 2 address D[8:0] Immediate data field
Q Condition code field Res Reserved. Should be set to 0.
F Set flags on result if set to 1

Bcc
Branch Conditionally
Branch Operation
Operation:
If condition true then PC ← PC + rel_addr
Syntax:
B<cc><.dd> rel_addr
Example:
BNE.ND new_code
Description:
If the specified condition is met then program execution is resumed at location
PC + relative displacement (rel_addr), where PC is the address of the instruction
in the delay slot. The displacement is a 20 bit signed long word offset. The
instruction following the branch is executed according to the nullify instruction
mode shown in the following table:
ND Only execute next instruction when not jumping (Default) 00
D Always execute next instruction 01
JD Only execute next instruction when jumping 10
The condition codes that can be used in the condition code field are:
AL, RA 00000 MI , N 00100 VC , NV 01000 LE 01100
EQ , Z 00001 CS , C, LO 00101 GT 01001 HI 01101
NE , NZ 00010 CC , NC, HS 00110 GE 01010 LS 01110
PL , P 00011 VS , V 00111 LT 01011 PNZ 01111

NOTE Condition codes 10000 to 11111 are reserved for extensions.

Status flags:
Not affected.
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 1 0 0 L[21:2] N N Q Q Q Q Q

Instruction fields:
L[21:2] Relative address long word displacement
N Nullify instruction mode
Q Condition code field
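
Going the other way, a disassembler or simulator can recover the branch target from an instruction word. The C sketch below assumes the branch format shown above (L[21:2] in bits 26:7) and the ordinary case where the delay slot instruction sits 4 bytes after the branch; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the 20-bit signed long word displacement as a byte offset. */
int32_t branch_offset_bytes(uint32_t insn) {
    uint32_t l = (insn >> 7) & 0xFFFFFu;
    int32_t words = (l & 0x80000u) ? (int32_t)l - (1 << 20) : (int32_t)l;
    return words * 4;
}

/* Target = address of the delay slot instruction + displacement.
 * Assumes the delay slot is at branch_addr + 4. */
uint32_t branch_target(uint32_t branch_addr, uint32_t insn) {
    assert((branch_addr & 0x3u) == 0);       /* long word aligned */
    return branch_addr + 4u + (uint32_t)branch_offset_bytes(insn);
}
```

A displacement of 0 therefore targets the instruction immediately after the branch, and a displacement of -1 long word targets the branch itself.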

BLcc
Branch and Link Conditionally
Branch Operation
Operation:
If condition true then PC ← PC + rel_addr.
Return address and flags are written to link register (BLINK)
Syntax:
BL<cc><.dd> rel_addr
Example:
BLNE.ND new_code
Description:
If the specified condition is met, then program execution is resumed at location
PC + relative displacement (rel_addr), where PC is the address of the instruction
in the delay slot. The displacement is a 20 bit signed long word offset. The
instruction following the branch is executed according to the nullify instruction
mode shown in the following table:
ND Only execute next instruction when not jumping (Default) 00
D Always execute next instruction 01
JD Only execute next instruction when jumping 10
The return address is stored in the link register BLINK. This address is the whole
of the status register and is taken either from the first instruction following the
branch (current PC) or the instruction after that (next PC) according to the delay
slot execution mode.
The flags stored are those set by the instruction immediately preceding the
branch.

Return from the subroutine is accomplished with the jump instruction Jcc.
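
The choice of return address can be pictured with the following sketch. It is an interpretation rather than a statement of the hardware: it assumes 4-byte instructions with no long immediate data, and it assumes that "current PC" is used when the delay slot is not executed on the jump (.ND) and "next PC" when it is (.D and .JD). The function name is hypothetical.

```python
def blink_return_address(branch_addr, mode="ND"):
    """Address saved in BLINK for a taken BLcc (illustrative model only)."""
    if mode == "ND":
        # Delay slot is skipped when jumping, so execution resumes there.
        return branch_addr + 4
    if mode in ("D", "JD"):
        # Delay slot has already executed, so resume after it.
        return branch_addr + 8
    raise ValueError("unknown nullify mode: " + mode)
```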
The condition codes that can be used in the condition code field are:

AL, RA 00000 MI , N 00100 VC , NV 01000 LE 01100
EQ , Z 00001 CS , C, LO 00101 GT 01001 HI 01101
NE , NZ 00010 CC , NC, HS 00110 GE 01010 LS 01110
PL , P 00011 VS , V 00111 LT 01011 PNZ 01111

NOTE Condition codes 10000 to 11111 are reserved for extensions.

Status flags:
Not affected.

Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 1 0 1 L[21:2] N N Q Q Q Q Q

Instruction fields:
L[21:2] Relative address long word displacement
N Nullify instruction mode
Q Condition code field

Breakpoint
BRK BRK
Debug Operation
BRK
Operation:
Halt and flush the ARCtangent-A4 processor
Syntax:
BRK
Example:
BRK
Description:
The breakpoint instruction can be placed anywhere in a program. The breakpoint
instruction is decoded at stage one of the pipeline which consequently stalls stage
one, and allows instructions in stages two, three and four to continue, i.e. flushing
the pipeline.
Due to stage 2 to stage 1 dependencies, the breakpoint instruction behaves
differently when it is placed in the delay slots of Branch, and Jump instructions.
In these cases, the processor will stall stages one and two of the pipeline while
allowing instructions in subsequent stages (three and four) to proceed to
completion.
Interrupts are treated in the same manner by the processor as Branch, and Jump
instructions when a BRK instruction is detected. Therefore, an interrupt that
reaches stage two of the pipeline when a BRK instruction is in stage one will
keep it in stage two, and flush the remaining stages of the pipeline. It is also
important to note that an interrupt that occurs in the same cycle as a breakpoint is
held off as the breakpoint is of a higher priority. An interrupt at stage three is
allowed to complete when a breakpoint instruction is in stage one.
Status flags:

Not affected.
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0


Zero Extend
EXT EXT
Arithmetic Operation
EXT
Operation:
dest ← operand zero extended from byte or word
Syntax:
with result without result
EXT<zz><.cc><.f> a,b EXT<zz><.cc><.f> 0,b
EXT<zz><.f> a,shimm EXT<zz><.f> 0,shimm
EXT<zz><.cc><.f> a,limm EXT<zz><.cc><.f> 0,limm
Example:
EXTW r1,r2
Description:
Zero extend the operand from a byte or word to a full long word, according to the
size field <zz>, and place the result in the destination register. Valid
values for <zz> are:
W zero extend from word
B zero extend from byte
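
A minimal model of the two size options, assuming a byte is 8 bits, a word is 16 bits and a long word is 32 bits (the function name is hypothetical):

```python
def ext(value, size):
    """Zero extend the low byte ("B") or word ("W") of value to 32 bits."""
    mask = {"B": 0xFF, "W": 0xFFFF}[size]
    result = value & mask                      # upper bits become zero
    flags = {"Z": result == 0, "N": bool(result & 0x80000000)}
    return result, flags
```

Because the upper bits are always cleared, N can never come out set in this model.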
Status flags:
Z N C V
* 0 . .
Z Set if result is zero N Set if most significant bit of result is set
C Unchanged V Unchanged
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 0 1 1 A[5:0] B[5:0] H[5:0] F Res. Q Q Q Q Q

OR

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 0 1 1 A[5:0] B[5:0] H[5:0] D[8:0]

Instruction fields:
A[5:0] Destination register address. B[5:0] Operand 1 address
H[5:0] Operand 2 address D[8:0] Immediate data field
Q Condition code field Res Reserved. Should be set to 0.
F Set flags on result if set to 1


Set Flags
FLAG FLAG
Control Operation
FLAG
Operation:
flags ← low bits of operand
Syntax:
FLAG<.cc> b
FLAG shimm
FLAG<.cc> limm
Example:
FLAG r2
Description:
Move the low bits of the operand into the flags register.
Z, N, C, V are replaced by bits [6:3] respectively. The interrupt enables are
replaced by bits 2 and 1. The H bit is the processor halt bit and should be set to
halt the ARCtangent-A4 processor.
If the H bit is set then the other flag bits are unchanged.
For proper operation, the set flags field should be set to “not set flags”, i.e. bit 8
should be clear, or r63 used for the short-immediate indicator.
Status Flags:

Z N C V E2 E1 H
* * * * * * *
Z Set according to bit 6 of operand N Set according to bit 5 of operand

C Set according to bit 4 of operand V Set according to bit 3 of operand
E2 Set according to bit 2 of operand E1 Set according to bit 1 of operand
H Set according to bit 0 of operand
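
The bit assignments above, together with the rule that the other flags are left alone when H is set, can be sketched as follows. The dictionary-based flag state and the function name are illustrative assumptions.

```python
# Operand bit position of each flag, as listed in the table above.
FLAG_BITS = {"Z": 6, "N": 5, "C": 4, "V": 3, "E2": 2, "E1": 1, "H": 0}


def apply_flag(operand, current):
    """Return the flag state after FLAG operand, given the current state."""
    new = dict(current)                  # do not modify the caller's state
    if operand & 1:                      # H set: halt, other flags unchanged
        new["H"] = 1
        return new
    for name, bit in FLAG_BITS.items():
        new[name] = (operand >> bit) & 1
    return new
```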
Instruction format:
The destination field must contain an immediate operand indicator.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 0 1 1 1 1 1 1 0 1 B[5:0] 0 0 0 0 0 0 0 Res. Q Q Q Q Q


OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 0 1 1 1 1 1 1 0 1 B[5:0] 0 0 0 0 0 0 D[8:0]

Instruction fields:
B[5:0] Operand address. D[8:0] Immediate data field
Q Condition code field R Reserved. Should be set to 0.

Jump Conditionally
Jcc Jcc
Jump Operation
Jcc
Operation:
If condition true then PC ← operand1

Syntax:
J<cc><.dd><.f> [b]
J<cc><.JD><.f> addr ; limm = addr (top 7 bits of addr will update the
flags if flag field is set)
J<cc><.JD>.f addr, flags ; limm contains both flags (define values as per
FLAG) and addr
Example:
JNZ.ND [r1]
Description:
If the specified condition is met, then program execution is resumed at location
contained in operand 1. If the flag field is set, then operand 1 replaces the whole
of the status register (except the halt bit), otherwise if the flag field is clear then
only the PC is replaced (the alternative syntax for updating flags is supplied for
ease of programming). The operand value updates the status register according
to:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Z N C V E2 E1 H R PC[25:2]
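
As an illustration of this layout, the sketch below packs and unpacks a status word, with the flags in bits 31 to 25, the reserved bit 24 left at zero, and PC[25:2] in bits 23 to 0. The helper names are hypothetical.

```python
# Bit position of each flag in the 32-bit status word shown above.
FLAG_POS = {"Z": 31, "N": 30, "C": 29, "V": 28, "E2": 27, "E1": 26, "H": 25}


def pack_status(flags, pc):
    """Combine a flag dictionary and a byte address into a status word."""
    word = (pc >> 2) & 0x00FFFFFF            # PC[25:2] occupies bits 23:0
    for name, bit in FLAG_POS.items():
        word |= (flags.get(name, 0) & 1) << bit
    return word


def unpack_status(word):
    """Split a status word into (flags, byte address)."""
    flags = {name: (word >> bit) & 1 for name, bit in FLAG_POS.items()}
    pc = (word & 0x00FFFFFF) << 2            # restore the byte address
    return flags, pc
```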

If operand 1 is an explicit address (long immediate data), then for this instruction
the nullify instruction mode is ignored. Otherwise if operand 1 is a register, the
instruction following the jump is executed according to the nullify instruction
mode:
ND Only execute next instruction when not jumping (Default) 00
D Always execute next instruction 01
JD Only execute next instruction when jumping 10
The condition codes that can be used in the condition code field are:
AL, RA 00000 MI , N 00100 VC , NV 01000 LE 01100
EQ , Z 00001 CS , C, LO 00101 GT 01001 HI 01101
NE , NZ 00010 CC , NC, HS 00110 GE 01010 LS 01110
PL , P 00011 VS , V 00111 LT 01011 PNZ 01111

NOTE Condition codes 10000 to 11111 are reserved for extensions.

Status flags:
Are changed if flag field is set.
Z N C V E2 E1 H
* * * * * * .

Z Set according to bit 31 of operand E2 Set according to bit 27 of operand


N Set according to bit 30 of operand E1 Set according to bit 26 of operand
C Set according to bit 29 of operand H Unchanged
V Set according to bit 28 of operand
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 1 1 1 0 0 0 0 0 0 B[5:0] 0 0 0 0 0 0 F R N N Q Q Q Q Q

Instruction fields:
B[5:0] Operand address. F Set flags if set to 1
Q Condition code field R Reserved. Should be set to 0.
N Nullify instruction mode

Jump and Link Conditionally


JLcc JLcc
Jump Operation
JLcc
Operation:
If condition true then PC ← operand1.
Return address and flags are written to link register (BLINK).

Syntax:
JL<cc><.dd><.f> [b]
JL<cc><.JD><.f> addr ; limm = addr (top 7 bits of addr will update the
flags if flag field is set)
JL<cc><.JD>.f addr, flags ; limm contains both flags (define values as per
FLAG) and addr
Example:
JLNZ.ND [r1]
Description:
NOTE This instruction is only available for ARCtangent-A4 Basecase processor
version 6 and higher.

If the specified condition is met, then program execution is resumed at location
contained in operand 1. If the flag field is set then operand 1 replaces the whole
of the status register (except the halt bit), otherwise if the flag field is clear, then
only the PC is replaced (the alternative syntax for updating flags is supplied for
ease of programming). The operand value updates the status register according to
the definition:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Z N C V E2 E1 H R PC[25:2]

If operand 1 is an explicit address (long immediate data), then for this instruction
the .JD nullify instruction mode must be used. If .D or .ND is used, then the link
register BLINK will contain the incorrect return address, whereupon, the
ARCtangent-A4 processor will attempt to execute the long immediate data on
return from the subroutine. When operand 1 is a register, however, the instruction
following the jump is executed according to the nullify instruction mode:
ND Only execute next instruction when not jumping (Default for [b], disallowed for addr) 00
D Always execute next instruction (Disallowed for addr) 01
JD Only execute next instruction when jumping (Default for addr) 10
The return address is stored in the link register BLINK. This address is the whole
of the status register and is taken either from the first instruction following the
jump (current PC) or the instruction after that (next PC) according to the delay
slot execution mode. The flags stored are those set by the instruction immediately
preceding the jump. Return from the subroutine is accomplished with the jump
instruction Jcc.
The condition codes that can be used in the condition code field are:
AL, RA 00000 MI , N 00100 VC , NV 01000 LE 01100
EQ , Z 00001 CS , C, LO 00101 GT 01001 HI 01101
NE , NZ 00010 CC , NC, HS 00110 GE 01010 LS 01110
PL , P 00011 VS , V 00111 LT 01011 PNZ 01111

NOTE Condition codes 10000 to 11111 are reserved for extensions.

Status flags:
Are changed if flag field is set.
Z N C V E2 E1 H
* * * * * * .

Z Set according to bit 31 of operand E2 Set according to bit 27 of operand


N Set according to bit 30 of operand E1 Set according to bit 26 of operand
C Set according to bit 29 of operand H Unchanged
V Set according to bit 28 of operand
Instruction format:
JLcc is encoded as Jcc, except bit 9 is set to ‘1’
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 1 1 1 0 0 0 0 0 0 B[5:0] 0 0 0 0 0 1 F R N N Q Q Q Q Q

Instruction fields:
B[5:0] Operand address. F Set flags if set to 1
Q Condition code field R Reserved. Should be set to 0.
N Nullify instruction mode


Delayed load from memory


LD LD
Memory Operation
LD
Operation:
dest ← contents of address [operand 1 + operand 2]
Syntax:
LD<zz><.x><.a><.di> a,[b]
LD<zz><.x><.a><.di> a,[b,c]
LD<zz><.x><.a><.di> a,[b,shimm]
LD<zz><.x><.a><.di> a,[b,limm]
LD<zz><.x><.di> a,[limm,c]
LD<zz><.x><.di> a,[shimm,shimm]
(shimms MUST match)
LD<zz><.x><.di> a,[limm]
Example:
LD r1,[r2,r3]
Description:
Add operand1 with operand2, get the data from the calculated address and place
it in the destination register. The data size of the load is set according to the size
field <zz>. The following table shows the sizes available:
- no field in syntax Long word 00
W Word 10
B Byte 01
When data is loaded, if the size is not a long word the most significant bit of the
data can be sign extended to the most significant bit of the long word, with the
.X suffix. The result of the address computation can be written back to the first
register operand in the address field. This write back occurs when the address
write back field, .A, is set. If a data-cache is available in the memory controller
the load instruction can bypass the use of that cache when the direct from
memory field, .DI, is set.
Note that the destination of a load should not be an immediate data indicator. The
operation of the load/store unit may be degraded if this occurs.
When the target of a LD.A instruction is the same register as the one used for
address write-back (.A), the returning load will overwrite the value from the
address write-back.
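
The address calculation, size field, sign extension and write-back described above can be modelled roughly as follows. This is a sketch under stated assumptions: memory is a Python bytes object, byte order is little-endian (the text here does not specify it), a word is 16 bits, and the function and parameter names are hypothetical. The cache bypass option (.DI) has no visible effect on the loaded value and is not modelled.

```python
def load(memory, base, offset, size="", sign_extend=False, write_back=False):
    """Model LD<zz><.x><.a>: return (value, written_back_address_or_None)."""
    address = (base + offset) & 0xFFFFFFFF         # operand 1 + operand 2
    nbytes = {"": 4, "W": 2, "B": 1}[size]         # size field <zz>
    # Assumption: little-endian byte order.
    raw = int.from_bytes(memory[address:address + nbytes], "little")
    if sign_extend and nbytes < 4 and raw >> (8 * nbytes - 1):
        # .X: copy the top bit of the loaded data into the upper bits.
        raw |= 0xFFFFFFFF & ~((1 << (8 * nbytes)) - 1)
    return raw, (address if write_back else None)
```

For example, a byte load of 0x80 returns 0x00000080 without .X and 0xFFFFFF80 with it.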

LD effectively uses two opcodes: one for the short immediate form and another
for the general form.
NOTE When a memory controller is employed:
Byte loads can be made from any byte alignment.
Word loads should be made only from word aligned addresses.
Long word loads should be made only from long word aligned addresses.

Status flags:
Not affected.
Instruction format:
Load using generic opcode form:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 0 0 0 A[5:0] B[5:0] C[5:0] R Di R A Z Z X

OR
Load with short immediate opcode form
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 0 0 1 A[5:0] B[5:0] Di 0 A Z Z X D[8:0]

Instruction fields:
A[5:0] Destination register address. Di Direct to memory (cache bypass) enable
B[5:0] Operand 1 address A Address write-back enable
C[5:0] Operand 2 address Z Size field
D[8:0] Immediate data offset X Sign extend field
R Reserved. Should be set to 0

Loop Set Up
LPcc LPcc
Branch Operation
LPcc
Operation:
If condition false then PC ← PC + rel_addr.
If condition true then LP_END ← PC + rel_addr and LP_START ← next PC.
Syntax:
LP<cc><.dd> rel_addr
Example:
LPNE.ND end_loop1
Description:
If the specified condition is not met, then program execution is resumed at
location PC + relative displacement (rel_addr), where PC is the address of the
instruction in the delay slot. The displacement is a 20 bit signed long word offset.
If the condition is met, then the zero overhead loop registers are set up. The
instruction following the loop set up is executed according to the nullify
instruction mode shown in the following table:
ND Only execute next instruction when not jumping (Default) 00
D Always execute next instruction 01
JD Only execute next instruction when jumping 10
The condition codes that can be used in the condition code field are:
AL, RA 00000 MI , N 00100 VC , NV 01000 LE 01100
EQ , Z 00001 CS , C, LO 00101 GT 01001 HI 01101
NE , NZ 00010 CC , NC, HS 00110 GE 01010 LS 01110
PL , P 00011 VS , V 00111 LT 01011 PNZ 01111

NOTE Condition codes 10000 to 11111 are reserved for extensions.

Status flags:

Not affected.
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 1 1 0 L[21:2] N N Q Q Q Q Q

Instruction fields:
L[21:2] Relative address long word displacement
N Nullify instruction mode
Q Condition code field


Load from auxiliary register


LR LR
Control Operation
LR
Operation:
dest ← contents of auxiliary register number [operand 1]
Syntax:
LR a,[b]
LR a,[shimm]
LR a,[limm]
Example:
LR r1,[4]
Description:
Get the data from the auxiliary register whose number is obtained from operand 1
and place it in the destination register.
Status flags:
Not affected.
Instruction format:
This is an encoding of the LD instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 0 0 1 A[5:0] B[5:0] R 1 R D[8:0]

Instruction fields:
A[5:0] Destination register address. B[5:0] Operand 1 address
D[8:0] Immediate data field R Reserved. Should be set to 0.

Logical Shift Left


LSL LSL
Logical Operation
LSL
Operation:
dest ← logical shift left by one of operand
See ASL.


Logical Shift Right


LSR LSR
Logical Operation
LSR
Operation:
dest ← logical shift right by one of operand
Syntax:
with result without result
LSR<.cc><.f> a,b LSR<.cc><.f> 0,b
LSR<.f> a,shimm LSR<.f> 0,shimm
LSR<.cc><.f> a,limm LSR<.cc><.f> 0,limm
Example:
LSR r1,r2
Description:
Logically shift operand right by one place and place the result in the destination
register.
The most significant bit of the result is replaced with 0.
Status flags:
Z N C V
* * * .
Z Set if result is zero N Set if most significant bit of result is set
C Set if carry is generated V Unchanged
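
A minimal model of the single-bit shift, assuming that the carry "generated" is the least significant bit shifted out of the operand (the function name is hypothetical):

```python
def lsr1(value):
    """Logical shift right by one place on a 32-bit value."""
    value &= 0xFFFFFFFF
    result = value >> 1                    # bit 31 of the result becomes 0
    flags = {
        "Z": result == 0,
        "N": bool(result >> 31),           # always False after this shift
        "C": bool(value & 1),              # assumption: bit shifted out
    }
    return result, flags
```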

Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 0 1 1 A[5:0] B[5:0] 0 0 0 0 1 0 F Res. Q Q Q Q Q



OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 0 1 1 A[5:0] B[5:0] 0 0 0 0 1 0 D[8:0]

Instruction fields:
A[5:0] Destination register address. B[5:0] Operand address
D[8:0] Immediate data field Q Condition code field
Res Reserved. Should be set to 0. F Set flags on result if set to 1


LSR Logical shift right LSR


Extension Option
multiple multiple
LSR multiple
Operation:
dest ← logical shift right of operand1 by operand2
Syntax:
with result without result
LSR<.cc><.f> a,b,c LSR<.cc><.f> 0,b,c
LSR<.cc><.f> a,b,limm LSR<.cc><.f> 0,b,limm
LSR<.f> a,b,shimm LSR<.f> 0,b,shimm
LSR<.cc><.f> a,limm,c LSR<.cc><.f> 0,limm,c
LSR<.f> a,shimm,c LSR<.f> 0,shimm,c
Example:
LSR r1,r2,r3
Description:
Logical shift right operand1 by operand2 places and place the result in the
destination register.
Status flags:
Z N C V
* * . .
Z Set if result is zero N Set if most significant bit of result is set

C Unchanged V Unchanged
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

1 0 0 0 1 A[5:0] B[5:0] C[5:0] F R R R Q Q Q Q Q

OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

1 0 0 0 1 A[5:0] B[5:0] C[5:0] D[8:0]

Instruction fields:
A[5:0] Destination register address Q Condition code field
B[5:0] Operand 1 address R Reserved: set to 0
C[5:0] Operand 2 address F Set flags on result if 1
D[8:0] Immediate data field


MAX Return Maximum MAX


Extension Option
MAX
Operation:
dest ← MAX (operand1, operand2)
Syntax:
with result without result
MAX<.cc><.f> a,b,c MAX<.cc><.f> 0,b,c
MAX<.f> a,b,shimm MAX<.f> 0,b,shimm
MAX<.f> a,shimm,c MAX<.f> 0,shimm,c
MAX<.cc><.f> a,b,limm MAX<.cc><.f> 0,b,limm
MAX<.cc><.f> a,limm,c MAX<.cc><.f> 0,limm,c
Example:
MAX r1,r2,r3
Description:
Return the maximum of the two operands and place the result in the destination
register. Note, both of the compared numbers are signed.
Status flags:
Z N C V
* * * *
Z Set if both source operands are equal (equivalent to a SUB instruction)
N Set as the MSB of subtraction result (equivalent to a SUB instruction)
C Set if the second source operand is selected (src2 >= src1)
V Set if the subtraction overflows (equivalent to a SUB instruction)
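
The flag rules above can be sketched as follows. The operand order of the implied subtraction is not stated here, so this model assumes operand1 minus operand2; the function names are hypothetical.

```python
def to_signed(x):
    """Interpret a 32-bit pattern as a signed integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def max_op(src1, src2):
    """Signed MAX with flags modelled on a SUB (assumed src1 - src2)."""
    a, b = to_signed(src1), to_signed(src2)
    result = (b if b >= a else a) & 0xFFFFFFFF
    diff = a - b                               # assumption: operand order
    flags = {
        "Z": a == b,
        "N": bool(diff & 0x80000000),          # MSB of the 32-bit difference
        "C": b >= a,                           # second operand selected
        "V": not -(1 << 31) <= diff < (1 << 31),
    }
    return result, flags
```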
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

1 1 1 1 0 A[5:0] B[5:0] C[5:0] F R R R Q Q Q Q Q



OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

1 1 1 1 0 A[5:0] B[5:0] C[5:0] D[8:0]

Instruction fields:
A[5:0] Destination register address Q Condition code field
B[5:0] Operand 1 address R Reserved: set to 0
C[5:0] Operand 2 address F Set flags on result if 1
D[8:0] Immediate data field


MIN Return minimum value MIN


Extension Option
MIN
Operation:
dest ← MIN (operand1, operand2)
Syntax:
with result without result
MIN<.cc><.f> a,b,c MIN<.cc><.f> 0,b,c
MIN<.f> a,b,shimm MIN<.f> 0,b,shimm
MIN<.f> a,shimm,c MIN<.f> 0,shimm,c
MIN<.cc><.f> a,b,limm MIN<.cc><.f> 0,b,limm
MIN<.cc><.f> a,limm,c MIN<.cc><.f> 0,limm,c
Example:
MIN r1,r2,r3
Description:
Return the minimum of the two operands and place the result in the destination
register. Note, both of the compared numbers are signed.
Status flags:
Z N C V
* * * *
Z Set if both source operands are equal (equivalent to a SUB instruction)
N Set as the MSB of subtraction result (equivalent to a SUB instruction)
C Set if the second source operand is selected (src2 <= src1)
V Set if the subtraction overflows (equivalent to a SUB instruction)

Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

1 1 1 1 1 A[5:0] B[5:0] C[5:0] F R R R Q Q Q Q Q

OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

1 1 1 1 1 A[5:0] B[5:0] C[5:0] D[8:0]

Instruction fields:
A[5:0] Destination register address Q Condition code field
B[5:0] Operand 1 address R Reserved: set to 0
C[5:0] Operand 2 address F Set flags on result if 1
D[8:0] Immediate data field


Move contents
MOV MOV
Arithmetic Operation
MOV
Operation:
dest ← operand
Syntax:
with result without result
MOV<.cc><.f> a,b MOV<.cc><.f> 0,b
MOV<.f> a,shimm MOV<.f> 0,shimm
MOV<.cc><.f> a,limm MOV<.cc><.f> 0,limm
Example:
MOV r1,r2
Description:
The contents of the operand are moved to the destination register.
Status flags:
Z N C V
* * . .
Z Set if result is zero N Set if most significant bit of result is set
C Unchanged V Unchanged
Instruction format:
MOV is included for instruction set symmetry. It is encoded as the AND
instruction with the operand in both source fields.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 1 1 0 0 A[5:0] B[5:0] B[5:0] F Res. Q Q Q Q Q

OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 1 1 0 0 A[5:0] B[5:0] B[5:0] D[8:0]



Instruction fields:
A[5:0] Destination register address. B[5:0] Operand 1 address
D[8:0] Immediate data field Q Condition code field
Res Reserved. Should be set to 0. F Set flags on result if set to 1


MUL64 32 x 32 Multiply MUL64


Extension Option
MUL64
Operation:
MLO ← low part of (operand 1 X operand 2)
MHI ← high part of (operand 1 X operand 2)
MMID ← middle part of (operand 1 X operand 2)
Syntax:
MUL64<.cc> <0,>b,c
MUL64 <0,>b,shimm
MUL64 <0,>shimm,c
MUL64<.cc> <0,>b,limm
MUL64<.cc> <0,>limm,c
Example:
MUL64 r2,r3
Description:
Perform a signed 32-bit by 32-bit multiply of operand1 and operand2 then place
the most significant 32 bits of the 64-bit result in register MHI, the least
significant 32 bits of the 64-bit result in register MLO, and the middle 32 bits of
the 64-bit result in register MMID.
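
The three result registers can be illustrated with the sketch below. It assumes that "middle 32 bits" means bits 47 to 16 of the 64-bit product; the function name and the signed parameter (which switches between MUL64 and MULU64 behaviour) are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def mul64(op1, op2, signed=True):
    """Return MHI, MMID and MLO for a 32 x 32 multiply (illustrative)."""
    a, b = op1 & MASK32, op2 & MASK32
    if signed:
        # Reinterpret the 32-bit patterns as two's complement values.
        a = a - (1 << 32) if a & 0x80000000 else a
        b = b - (1 << 32) if b & 0x80000000 else b
    product = (a * b) & 0xFFFFFFFFFFFFFFFF     # 64-bit result
    return {
        "MHI": product >> 32,                  # bits 63:32
        "MMID": (product >> 16) & MASK32,      # assumption: bits 47:16
        "MLO": product & MASK32,               # bits 31:0
    }
```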

Status flags:
Not affected.
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

1 0 1 0 0 1 1 1 1 1 1 B[5:0] C[5:0] 0 R R R Q Q Q Q Q

OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

1 0 1 0 0 1 1 1 1 1 1 B[5:0] C[5:0] D[8:0]

Instruction fields:
B[5:0] Operand 1 address Q Condition code field
C[5:0] Operand 2 address R Reserved: set to 0
D[8:0] Immediate data field

MULU64 32 x 32 Unsigned Multiply MULU64


Extension Option
MULU64
Operation:
MLO ← low part of (operand 1 X operand 2)
MHI ← high part of (operand 1 X operand 2)
MMID ← middle part of (operand 1 X operand 2)
Syntax:
MULU64<.cc> <0,>b,c
MULU64 <0,>b,shimm
MULU64 <0,>shimm,c
MULU64<.cc> <0,>b,limm
MULU64<.cc> <0,>limm,c
Example:
MULU64 r2,r3
Description:
Perform an unsigned 32-bit by 32-bit multiply of operand1 and operand2 then
place the most significant 32 bits of the 64-bit result in register MHI, the least
significant 32 bits of the 64-bit result in register MLO, and the middle 32 bits of
the 64-bit result in register MMID.

Status flags:
Not affected.
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

1 0 1 0 1 1 1 1 1 1 1 B[5:0] C[5:0] 0 R R R Q Q Q Q Q

OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

1 0 1 0 1 1 1 1 1 1 1 B[5:0] C[5:0] D[8:0]

Instruction fields:
B[5:0] Operand 1 address Q Condition code field
C[5:0] Operand 2 address R Reserved: set to 0
D[8:0] Immediate data field

No Operation
NOP NOP
Control Operation
NOP
Operation:
No Operation
Syntax:
NOP
Example:
NOP
Description:
No operation. The state of the processor is not changed. NOP is included for
instruction set symmetry. It is encoded as the XOR instruction:
XOR 0x1FF, 0x1FF, 0x1FF.
Status flags:
Z N C V
. . . .
Z Unchanged N Unchanged
C Unchanged V Unchanged
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


NORM Normalize Integer NORM


Extension Option
NORM
Operation:
dest ← normalize value of operand
Syntax:
with result without result
NORM<.cc><.f> a,b NORM<.cc><.f> 0,b
NORM<.f> a,shimm NORM<.f> 0,shimm
NORM<.cc><.f> a,limm NORM<.cc><.f> 0,limm
Example:
NORM r1,r2
Description:
Gives the normalization integer for the signed value in the operand. The
normalization integer is the amount by which the operand should be shifted left
to normalize it as a 32-bit signed integer. This function is sometimes referred to
as “find first bit”. Examples of returned values are shown in the table below:
Operand Value Returned Value Notes
0x00000000 0x0000001F
0x1FFFFFFF 0x00000002
0x3FFFFFFF 0x00000001
0x7FFFFFFF 0x00000000
0x80000000 0x00000000 This result is not particularly useful since the operand is the most negative value.
0xC0000000 0x00000001
0xE0000000 0x00000002
0xFFFFFFFF 0x0000001F
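
The table values are consistent with counting the leading bits that repeat the sign bit and subtracting one. The sketch below reproduces the table on that basis; it is a software model only, and the function name is hypothetical.

```python
def norm(value):
    """Return the left shift needed to normalize a 32-bit signed value."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value = ~value & 0xFFFFFFFF        # negative: count leading ones instead
    leading = 32 - value.bit_length()      # leading bits equal to the sign bit
    return leading - 1                     # the sign bit itself is not shifted out
```

For example, norm(0x1FFFFFFF) gives 2, and shifting 0x1FFFFFFF left by two places yields 0x7FFFFFFC, which has a single sign bit followed by a one.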

Status flags:
Z N C V
* * . .
Z Set if result is zero N Set if most significant bit of result is set
C Unchanged V Unchanged
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 0 1 1 A[5:0] B[5:0] 0 0 1 0 1 0 F R R R Q Q Q Q Q

OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 0 1 1 A[5:0] B[5:0] 0 0 1 0 1 0 D[8:0]

Instruction fields:
A[5:0] Destination register address Q Condition code field
B[5:0] Operand 1 address R Reserved: set to 0
D[8:0] Immediate data field F Set flags on result if 1


Logical Bitwise OR
OR OR
Logical Operation
OR
Operation:
dest ← operand1 OR operand2
Syntax:
with result without result
OR<.cc><.f> a,b,c OR<.cc><.f> 0,b,c
OR<.f> a,b,shimm OR<.f> 0,b,shimm
OR<.f> a,shimm,c OR<.f> 0,shimm,c
OR<.f> a,shimm,shimm OR<.f> 0,shimm,shimm
(shimms MUST match)
OR<.cc><.f> a,b,limm OR<.cc><.f> 0,b,limm
OR<.cc><.f> a,limm,c OR<.cc><.f> 0,limm,c
Example:
OR r1,r2,r3
Description:
Logical bitwise OR of operand1 with operand2 and place the result in the
destination register.
Status flags:
Z N C V
* * . .
Z Set if result is zero N Set if most significant bit of result is set
C Unchanged V Unchanged
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 1 1 0 1 A[5:0] B[5:0] C[5:0] F Res. Q Q Q Q Q

OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 1 1 0 1 A[5:0] B[5:0] C[5:0] D[8:0]

Instruction fields:
A[5:0] Destination register address. B[5:0] Operand 1 address
C[5:0] Operand 2 address D[8:0] Immediate data field
Q Condition code field F Set flags on result if set to 1
Res Reserved. Should be set to 0.


Rotate Left Through Carry


RLC RLC
Logical Operation
RLC
Operation:
dest ← rotate left through carry by one of operand

Syntax:
with result without result
RLC<.cc><.f> a,b RLC<.cc><.f> 0,b
RLC<.f> a,shimm RLC<.f> 0,shimm
RLC<.cc><.f> a,limm RLC<.cc><.f> 0,limm
Example:
RLC r1,r2
Description:
Rotate operand left by one place and place the result in the destination register.
The carry flag is shifted into the least significant bit of the result, and the most
significant bit of the source is placed in the carry flag. RLC is included for
instruction set symmetry. It is encoded as the ADC instruction with the operand in
both source fields.
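
Why an add-with-carry of the operand to itself behaves as a rotate through carry can be seen in this small model (the function name is hypothetical):

```python
def rlc(value, carry_in):
    """Rotate left through carry, computed as operand + operand + carry."""
    value &= 0xFFFFFFFF
    total = value + value + (1 if carry_in else 0)   # same sum as ADC a,b,b
    result = total & 0xFFFFFFFF                      # old carry lands in bit 0
    flags = {
        "Z": result == 0,
        "N": bool(result >> 31),
        "C": bool(total >> 32),                      # old bit 31 becomes carry
    }
    return result, flags
```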
Status flags:
Z N C V
* * * .

Z Set if result is zero N Set if most significant bit of result is set
C Set if carry is generated V Unchanged
Instruction format:

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 1 0 0 1 A[5:0] B[5:0] B[5:0] F Res. Q Q Q Q Q

OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 1 0 0 1 A[5:0] B[5:0] B[5:0] D[8:0]

Instruction fields:
A[5:0] Destination register address. B[5:0] Operand 1 address - in both fields
D[8:0] Immediate data field Q Condition code field
F Set flags on result if set to 1 Res Reserved. Should be set to 0.


Rotate Left
ROL ROL
Not implemented
ROL
Operation:
dest ← rotate left by one of operand

The instruction is listed for instruction set symmetry.


To carry out this operation in the basecase version of the ARCtangent-A4
processor, the following two instructions are recommended:
ADD.f a,b,b
ADC<.f> a,a,0
The flags are set by the first instruction, hence ROL cannot be used without
affecting the flags.
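
The two-instruction sequence can be checked with a short model: the ADD doubles the operand and leaves the old bit 31 in the carry, and the ADC then adds that carry into bit 0. The function name is hypothetical.

```python
def rol1(value):
    """Rotate left by one place using the ADD.f / ADC sequence above."""
    value &= 0xFFFFFFFF
    total = value + value                  # ADD.f a,b,b
    carry = total >> 32                    # carry flag = old bit 31
    a = total & 0xFFFFFFFF
    a = (a + 0 + carry) & 0xFFFFFFFF       # ADC a,a,0
    return a
```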

Rotate Right
ROR ROR
Logical Operation
ROR
Operation:
dest ← rotate right by one of operand
Syntax:
with result without result
ROR<.cc><.f> a,b ROR<.cc><.f> 0,b
ROR<.f> a,shimm ROR<.f> 0,shimm
ROR<.cc><.f> a,limm ROR<.cc><.f> 0,limm
Example:
ROR r1,r2
Description:
Rotate operand right by one place and place the result in the destination register.
The least significant bit of the source is also copied to the carry flag.
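A minimal Python model of the single-place rotate right, assuming a 32-bit register; `ror1` is a hypothetical name used only for this illustration.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit register width

def ror1(value):
    """Rotate right by one place; bit 0 wraps to bit 31 and is copied to carry."""
    value &= MASK32
    carry_out = value & 1
    result = (value >> 1) | (carry_out << 31)
    return result, carry_out
```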
Status flags:
Z N C V
* * * .
Z Set if result is zero N Set if most significant bit of result is set
C Set if carry is generated V Unchanged
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 0 1 1 A[5:0] B[5:0] 0 0 0 0 1 1 F Res. Q Q Q Q Q

OR

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 0 1 1 A[5:0] B[5:0] 0 0 0 0 1 1 D[8:0]

Instruction fields:
A[5:0] Destination register address. B[5:0] Operand address
D[8:0] Immediate data field Q Condition code field
Res Reserved. Should be set to 0. F Set flags on result if set to 1


ROR Rotate right ROR


Extension Option
multiple multiple
ROR multiple
Operation:
dest ←rotate right of operand1 by operand2

Syntax:
with result without result
ROR<.cc><.f> a,b,c ROR<.cc><.f> 0,b,c
ROR<.cc><.f> a,b,limm ROR<.cc><.f> 0,b,limm
ROR<.f> a,b,shimm ROR<.f> 0,b,shimm
ROR<.cc><.f> a,limm,c ROR<.cc><.f> 0,limm,c
ROR<.f> a,shimm,c ROR<.f> 0,shimm,c
Example:
ROR r1,r2,r3
Description:
Rotate right operand1 by operand2 places and place the result in the destination
register.
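The multiple-place rotate can be sketched in Python as below. The name `ror` is hypothetical, and the sketch assumes that only the low 5 bits of operand2 are significant; the text above does not state how larger counts are treated.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit register width

def ror(value, places):
    """Rotate a 32-bit value right by `places` bit positions."""
    value &= MASK32
    n = places & 31  # assumption: rotate count taken modulo 32
    return ((value >> n) | (value << (32 - n))) & MASK32
```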
Condition codes:
Z N C V
* * . .
Z Set if result is zero N Set if most significant bit of result is set
C Unchanged V Unchanged
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

1 0 0 1 1 A[5:0] B[5:0] C[5:0] F R R R Q Q Q Q Q

OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

1 0 0 1 1 A[5:0] B[5:0] C[5:0] D[8:0]

Instruction fields:
A[5:0] Destination register address Q Condition code field
B[5:0] Operand 1 address R Reserved: set to 0
C[5:0] Operand 2 address F Set flags on result if 1
D[8:0] Immediate data field


Rotate Right through Carry


RRC RRC
Logical Operation
RRC
Operation:
dest ← rotate right through carry by one of operand

Syntax:
with result Without result
RRC<.cc><.f> a,b RRC<.cc><.f> 0,b
RRC<.f> a,shimm RRC<.f> 0,shimm
RRC<.cc><.f> a,limm RRC<.cc><.f> 0,limm
Example:
RRC r1,r2
Description:
Rotate operand right by one place and place the result in the destination register.
The carry flag is shifted into the most significant bit of the result, and the least
significant bit of the source is placed in the carry flag.
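The data movement of RRC can be modelled in a few lines of Python. This is a sketch only; `rrc` is a hypothetical name and a 32-bit register is assumed.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit register width

def rrc(value, carry_in):
    """Rotate right by one place through the carry flag."""
    value &= MASK32
    result = (value >> 1) | ((carry_in & 1) << 31)  # old carry enters bit 31
    carry_out = value & 1                            # old bit 0 becomes carry
    return result, carry_out
```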
Status flags:
Z N C V
* * * .
Z Set if result is zero N Set if most significant bit of result is set

C Set if carry is generated V Unchanged
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 0 1 1 A[5:0] B[5:0] 0 0 0 1 0 0 F Res. Q Q Q Q Q

OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 0 1 1 A[5:0] B[5:0] 0 0 0 1 0 0 D[8:0]

Instruction fields:
A[5:0] Destination register address. B[5:0] Operand address
D[8:0] Immediate data field Q Condition code field
Res Reserved. Should be set to 0. F Set flags on result if set to 1


Subtract with Carry


SBC SBC
Arithmetic Operation
SBC
Operation:
dest ← operand1 - operand2 - /carry
Syntax:
with result without result
SBC<.cc><.f> a,b,c SBC<.cc><.f> 0,b,c
SBC<.f> a,b,shimm SBC<.f> 0,b,shimm
SBC<.f> a,shimm,c SBC<.f> 0,shimm,c
SBC<.f> a,shimm,shimm SBC<.f> 0,shimm,shimm
SBC<.cc><.f> a,b,limm SBC<.cc><.f> 0,b,limm
SBC<.cc><.f> a,limm,c SBC<.cc><.f> 0,limm,c
Example:
SBC r1,r2,r3
Description:
Subtract operand2 from operand1 with carry, and place the result in the
destination register. Operand2 is subtracted from operand1 and if carry has
previously been set, the result is decremented by one.
The carry flag is interpreted as a “borrow” for the subtract instruction.
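The borrow interpretation is what makes multi-word subtraction possible: a flag-setting subtract on the low words followed by SBC on the high words. The Python sketch below follows the Description above (the result is decremented when carry is set); `sbc` and `sub64` are hypothetical names, and only the result and borrow are modelled, not Z, N or V.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit register width

def sbc(op1, op2, carry_in=0):
    """Return (op1 - op2 - carry) as a 32-bit result plus the borrow out."""
    diff = (op1 & MASK32) - (op2 & MASK32) - (carry_in & 1)
    borrow_out = 1 if diff < 0 else 0
    return diff & MASK32, borrow_out

def sub64(a, b):
    """64-bit subtract built from a low-word subtract and a high-word SBC."""
    lo, borrow = sbc(a & MASK32, b & MASK32, 0)  # like SUB.F on the low words
    hi, _ = sbc(a >> 32, b >> 32, borrow)        # like SBC on the high words
    return (hi << 32) | lo
```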
Status flags:
Z N C V
* * * *

Z Set if result is zero N Set if most significant bit of result is set


C Set if borrow is generated V Set if an overflow is generated


Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 1 0 1 1 A[5:0] B[5:0] C[5:0] F Res. Q Q Q Q Q

OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 1 0 1 1 A[5:0] B[5:0] C[5:0] D[8:0]

Instruction fields:
A[5:0] Destination register address. B[5:0] Operand 1 address
C[5:0] Operand 2 address D[8:0] Immediate data field
Q Condition code field Res Reserved. Should be set to 0.
F Set flags on result if set to 1


Sign Extend
SEX SEX
Arithmetic Operation
SEX
Operation:
dest ← operand sign extended from byte or word
Syntax:
with result without result
SEX<zz><.cc><.f> a,b SEX<zz><.cc><.f> 0,b
SEX<zz><.f> a,shimm SEX<zz><.f> 0,shimm
SEX<zz><.cc><.f> a,limm SEX<zz><.cc><.f> 0,limm
SEX 0,shimm ;nop
Example:
SEXW r1,r2
Description:
Sign extend the operand from a byte or word to a full long word, according to
the size field <zz>, and place the result in the destination register. Valid
values for <zz> are:
W sign extend from word
B sign extend from byte
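A short Python sketch of the two extension sizes follows. It assumes, consistent with the rest of this manual, that a byte is 8 bits, a word is 16 bits and a long word is 32 bits; the function name `sex` and its string size argument are illustrative only.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit long word

def sex(value, size):
    """Sign extend from byte ('B') or word ('W') to a 32-bit long word."""
    bits = {"B": 8, "W": 16}[size]
    low_mask = (1 << bits) - 1
    value &= low_mask                  # keep only the source byte or word
    if value & (1 << (bits - 1)):      # sign bit of the source is set
        value |= MASK32 & ~low_mask    # fill the upper bits with ones
    return value
```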
Status flags:
Z N C V
* * . .
Z Set if result is zero N Set if most significant bit of result is set
C Unchanged V Unchanged

Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 0 1 1 A[5:0] B[5:0] H[5:0] F Res. Q Q Q Q Q

OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 0 1 1 A[5:0] B[5:0] H[5:0] D[8:0]

Instruction fields:
A[5:0] Destination register address. B[5:0] Operand address
H[5:0] Extend size. 05=byte, 06=word. D[8:0] Immediate data field
Q Condition code field Res Reserved. Should be set to 0.
F Set flags on result if set to 1


Enter Sleep Mode


SLEEP SLEEP
Control Operation
SLEEP
Operation:
Enter Processor Sleep Mode
Syntax:
SLEEP
Example:
SLEEP
Description:
The SLEEP instruction is encoded as a single operand instruction, but it takes no flags or operands.
The SLEEP instruction is decoded in pipeline stage 2. If a SLEEP instruction is
detected, then the sleep mode flag (ZZ) is immediately set and the pipeline stage
1 is stalled. A flushing mechanism assures that all earlier instructions are
executed until the pipeline is empty. The SLEEP instruction itself leaves the
pipeline during the flushing.
When in sleep mode, the sleep mode flag (ZZ) is set and the pipeline is stalled,
but not halted. The host interface operates as normal allowing access to the
DEBUG and the STATUS registers and it can halt the processor. The host cannot
clear the sleep mode flag, but it can wake the ARCtangent-A4 processor by
halting then restarting it. The program counter PC points to the next instruction in
sequence after the sleep instruction.
The ARCtangent-A4 processor will wake from sleep mode on an interrupt or
when it is restarted. If an interrupt wakes it, the ZZ flag is cleared and the
instruction in pipeline stage 1 is killed. The interrupt routine is serviced and
execution resumes at the instruction in sequence after the SLEEP instruction.
When it is started after having been halted, the ZZ flag is cleared.


SLEEP behaves as a NOP during single step mode.
Status flags:

Not affected.
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1


Store to auxiliary register


SR SR
Control Operation
SR
Operation:
auxiliary register number[operand 2]← operand 1
Syntax:
SR c,[b]
SR c,[limm]
SR c,[shimm]
SR limm,[shimm]
SR shimm,[limm]
SR shimm,[b]
SR limm,[b]

NOTE The operand syntax matches LR.

Example:
SR r1,[12]
Description:
Store operand 1 to the auxiliary register whose number is obtained from operand
2.
Status flags:
Not affected
Instruction format:

This is an encoding of the ST instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 0 1 0 R 1 R R R R B[5:0] C[5:0] D[8:0]

Instruction fields:
B[5:0] Operand 2 register address C[5:0] Operand 1 register address
D[8:0] Immediate data field R Reserved. Should be set to 0.


Store to memory
ST ST
Memory Operation
ST
Operation:
[operand 2 + offset]← operand 1
Syntax:
ST<zz><.a><.di> c,[b]
ST<zz><.di> c,[limm]
ST<zz><.a><.di> c,[b,shimm]
ST<zz><.di> c,[shimm,shimm] shimms MUST match
ST<zz><.a><.di> 0,[b]
ST<zz><.di> shimm,[limm] actually: shimm,[limm,shimm]
ST<zz><.a><.di> shimm,[b,shimm] shimms MUST match
ST<zz><.di> limm,[shimm,shimm] shimms MUST match
ST<zz><.a><.di> limm,[b,shimm]

NOTE The operand syntax matches LD.

Example:
ST.A r1,[r2,10]
Description:
Store operand 1 to the address calculated by adding the offset to operand 2.
NOTE If the offset is not required, the value encoded for the immediate offset will be
set to 0.

The data size of the store is set according to the size field <zz>. The following
table shows the sizes available.

<zz>                    Size        Encoding
(no field in syntax)    Long word   00
W                       Word        10
B                       Byte        01
The result of the address computation can be written back to the first register
operand in the address field. This write back occurs when the address write back
field, .A, is set.
If a data-cache is available in the memory controller the store instruction can
bypass the use of that cache when the direct to memory field, .DI, is set.


NOTE When a memory controller is employed:

Byte stores can be made to any byte alignment,
Word stores should be made only to word aligned addresses, and
Long word stores should be made only to long word aligned addresses.

Status flags:
Not affected.
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 0 1 0 Di 0 A Z Z R B[5:0] C[5:0] D[8:0]

Instruction fields:
B[5:0] Operand 2 register address C[5:0] Operand 1 register address
D[8:0] Immediate data offset Di Direct to memory (cache bypass) enable
A Address write-back enable R Reserved. Should be set to 0
Z Size field

Encoding examples:
ST r5,[r7,50] ; ST c,[b,shimm]
I[4:0] B[5:0] C[5:0] D[8:0]
2=ST 7 5 50

ST 50,[12345678] ; ST shimm,[limm]
I[4:0] B[5:0] C[5:0] D[8:0] LIMM
2=ST 62=limm 63=shimm 50 12345678-50

ST 50,[r7,50] ; ST shimm,[b,shimm]
I[4:0] B[5:0] C[5:0] D[8:0]
2=ST 7 63=shimm 50

ST r3,[12345678] ; ST c,[limm]
I[4:0] B[5:0] C[5:0] D[8:0] LIMM
2=ST 62=limm 3 0 12345678

ST 12345678,[r20,8] ; ST limm,[b,shimm]
I[4:0] B[5:0] C[5:0] D[8:0] LIMM
2=ST 20 62=limm 8 12345678

ST 50,[50,50] ; ST shimm,[shimm,shimm]
I[4:0] B[5:0] C[5:0] D[8:0]
2=ST 63=shimm 63=shimm 50
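The field packing in the encoding examples can be illustrated with a small Python helper. This is a sketch under stated assumptions: the bit positions are inferred by packing the fields of the instruction format left to right from bit 31 (I[4:0], Di, 0, A, ZZ, R, B, C, D), the D argument is treated as the raw 9-bit field value, and the name `encode_st` is hypothetical. Any long immediate word that follows the instruction is not modelled.

```python
ST_OPCODE = 0b00010  # I[4:0] = 2, as in the examples above

def encode_st(b, c, d, zz=0, a=0, di=0):
    """Pack ST fields into a 32-bit word (assumed bit positions, see lead-in)."""
    if not 0 <= d <= 0x1FF:
        raise ValueError("D[8:0] is a 9-bit field")
    return (
        (ST_OPCODE << 27)      # bits 31:27 opcode
        | ((di & 1) << 26)     # bit 26 direct to memory (cache bypass)
        | ((a & 1) << 24)      # bit 24 address write-back; bit 25 stays 0 for ST
        | ((zz & 3) << 22)     # bits 23:22 size field
        | ((b & 0x3F) << 15)   # bits 20:15 operand 2 register (address base)
        | ((c & 0x3F) << 9)    # bits 14:9 operand 1 register (data)
        | d                    # bits 8:0 immediate data offset
    )

word = encode_st(b=7, c=5, d=50)   # corresponds to ST r5,[r7,50]
```

Reading the fields back out of `word` with shifts and masks gives I=2, B=7, C=5 and D=50, matching the first encoding example.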


Subtract
SUB SUB
Arithmetic Operation
SUB
Operation:
dest ← operand1 - operand2
Syntax:
with result without result
SUB<.cc><.f> a,b,c SUB<.cc><.f> 0,b,c
SUB<.f> a,b,shimm SUB<.f> 0,b,shimm
SUB<.f> a,shimm,c SUB<.f> 0,shimm,c
SUB<.f> a,shimm,shimm SUB<.f> 0,shimm,shimm
SUB<.cc><.f> a,b,limm SUB<.cc><.f> 0,b,limm
SUB<.cc><.f> a,limm,c SUB<.cc><.f> 0,limm,c
Example:
SUB r1,r2,r3
SUB.F 0,r3,200 ; compare r3 with 200 and set flags
SUB.LT r2,r2,r2 ; same effect as MOV.LT r2,0 but no
; limm 0 data needed
Description:
Subtract operand2 from operand1 and place the result in the destination register.
The carry flag, if set by the subtract instruction, is interpreted as a “borrow”.
Status flags:
Z N C V
* * * *
Z Set if result is zero N Set if most significant bit of result is set
C Set if borrow is generated V Set if an overflow is generated


Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 1 0 1 0 A[5:0] B[5:0] C[5:0] F Res. Q Q Q Q Q

OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 1 0 1 0 A[5:0] B[5:0] C[5:0] D[8:0]

Instruction fields:
A[5:0] Destination register address. B[5:0] Operand 1 address
C[5:0] Operand 2 address D[8:0] Immediate data field
Q Condition code field Res Reserved
F Set flags on result if set to 1


SWAP Swap words SWAP


0x03 / 0x09
Extension: Swap instruction
SWAP
Operation:
dest ← word swap of operand

Syntax:
with result without result
SWAP<.cc><.f> a,b SWAP<.cc><.f> 0,b
SWAP<.f> a,shimm SWAP<.f> 0,shimm
SWAP<.cc><.f> a,limm SWAP<.cc><.f> 0,limm
Example:
SWAP r1,r2
Description:
Swap the lower 16 bits of the operand with the upper 16 bits of the operand and
place the result of that swap in the destination register.
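The word swap is easy to model in Python; `swap_words` is a hypothetical name and a 32-bit operand is assumed.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit register width

def swap_words(value):
    """Exchange the upper and lower 16 bits of a 32-bit value."""
    value &= MASK32
    return ((value << 16) | (value >> 16)) & MASK32
```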
Condition codes:
Z N C V
* * . .
Z Set if result is zero N Set if most significant bit of result is set

C Unchanged V Unchanged
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 0 1 1 A[5:0] B[5:0] 0 0 1 0 0 1 F R R R Q Q Q Q Q

OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 0 1 1 A[5:0] B[5:0] 0 0 1 0 0 1 D[8:0]

Instruction fields:
A[5:0] Destination register address Q Condition code field
B[5:0] Operand 1 address R Reserved: set to 0
D[8:0] Immediate data field F Set flags on result if 1


Software Interrupt
SWI SWI
Control Operation
SWI
Operation:
instruction_error ← '1'
Syntax:
SWI
Example:
SWI
Description:
The software interrupt (SWI) instruction can be placed anywhere in the program,
even in the delay slot of a branch instruction. The software interrupt instruction is
decoded in stage two of the pipeline and if executed, then it immediately raises
the instruction error exception. The instruction error exception will be serviced
using the normal interrupt system. ILINK2 is used as the return address in the
service routine.
Once an instruction error exception is taken, the medium and low priority
interrupts are masked off so that the ILINK2 register cannot be updated again as a
result of an interrupt, thus preserving the return address of the instruction error
exception.
NOTE Only the reset and memory error exceptions have higher priorities than the
instruction error exception.

Status flags:
Not affected.
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 1 0


Logical Bitwise Exclusive OR


XOR XOR
Logical Operation
XOR
Operation:
dest ← operand1 XOR operand2
Syntax:
with result without result
XOR<.cc><.f> a,b,c XOR<.cc><.f> 0,b,c
XOR<.f> a,b,shimm XOR<.f> 0,b,shimm
XOR<.f> a,shimm,c XOR<.f> 0,shimm,c
XOR<.cc><.f> a,b,limm XOR<.cc><.f> 0,b,limm
XOR<.cc><.f> a,limm,c XOR<.cc><.f> 0,limm,c
Example:
XOR r1,r2,r3
Description:
Logical bitwise Exclusive-OR of operand1 with operand2 and place the result in
the destination register.
Status flags:
Z N C V
* * . .
Z Set if result is zero N Set if most significant bit of result is set
C Unchanged V Unchanged
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 1 1 1 1 A[5:0] B[5:0] C[5:0] F Res. Q Q Q Q Q

OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

0 1 1 1 1 A[5:0] B[5:0] C[5:0] D[8:0]

Instruction fields:
A[5:0] Destination register address. B[5:0] Operand 1 address
C[5:0] Operand 2 address D[8:0] Immediate data field
Q Condition code field Res Reserved. Should be set to 0
F Set flags on result if set to 1



Chapter 9 — The Host
The ARCtangent-A4 processor was developed with an integrated host interface
to support communications with a host system. It can be started, stopped and
communicated with by the host system using special registers. How the various parts
of the ARCtangent-A4 processor appear to the host is host interface dependent.
An outline of processor control techniques is given in this section.
Most of the techniques outlined here will be handled by the software debugging
system, and the programmer, in general, need not be concerned with these
specific details.
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area; please see the associated documentation.

It is expected that the registers and the program memory of the ARCtangent-A4
processor will appear as a memory mapped section to the host. Figure 21 shows
two examples: a) a contiguous part of host memory and b) a section of memory
and a section of I/O space.


[Figure: a) Single Memory Map: the ARCtangent-A4 core registers, auxiliary
registers and memory all appear in the host memory map. b) Memory Map with I/O
Map: the ARCtangent-A4 core registers and auxiliary registers appear in the host
I/O map and the ARCtangent-A4 memory appears in the host memory map.]

Figure 21 Example Host Memory Maps


Once a reset has occurred, the ARCtangent-A4 processor is put into a known
state and executes the initial reset code. From this point, the host can make the
changes to the appropriate part of the processor, depending on whether the
ARCtangent-A4 processor is running or halted as shown in Table 18.

                      Running            Halted
Memory                Read/Write         Read/Write
Auxiliary Registers   Mainly No access   Read/Write
Core Registers        No access          Read/Write

Table 18 Host Accesses to the ARCtangent-A4 processor

Halting
The ARCtangent-A4 processor can halt itself with the FLAG instruction or it can
be halted by the host. The host halts the ARCtangent-A4 processor by setting the
H bit in the STATUS register, or for basecase version numbers greater than 5 by
setting the FH bit in the DEBUG register. See Figure 14 and Figure 19.
NOTE When the ARCtangent-A4 processor is running, only the H bit will
change if the host writes to the STATUS register. However, if the ARCtangent-
A4 processor has halted itself, the whole of the STATUS register will be
updated when the host writes to the STATUS register.

The consequence of this is that the host may assume that the ARCtangent-A4
processor is running because it previously read the STATUS register, but by the
time the host forces a halt, the ARCtangent-A4 processor may have halted itself.
In that case, the write of a “halt” number to the STATUS register, say
0x02000000, would overwrite any program counter information that the host
required.
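The hazard can be made concrete with a toy Python model of the STATUS write rules described in the note. This is a sketch, not a description of the hardware: `StatusModel` is a hypothetical class, the H bit value is the 0x02000000 quoted above, and `PC_MASK` is an assumed placement of the program counter field (see Figure 14 for the real layout).

```python
H_BIT = 0x02000000    # "halt" value quoted in the text
PC_MASK = 0x00FFFFFF  # assumption: program counter field in the low bits

class StatusModel:
    """Toy model of host writes to STATUS, following the NOTE above."""

    def __init__(self, pc, self_halted=False):
        self.self_halted = self_halted
        self.value = (pc & PC_MASK) | (H_BIT if self_halted else 0)

    def host_write(self, data):
        if self.self_halted:
            self.value = data & 0xFFFFFFFF                      # whole register updated
        else:
            self.value = (self.value & ~H_BIT) | (data & H_BIT)  # only H changes

running = StatusModel(pc=0x1234)
running.host_write(H_BIT)          # PC field survives
halted = StatusModel(pc=0x1234, self_halted=True)
halted.host_write(H_BIT)           # PC field is overwritten with zero
```

In the second case the host has lost the program counter, which is the situation the FH bit in the DEBUG register is intended to avoid.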
In order to force the ARCtangent-A4 processor to halt without overwriting the
program counter, Basecase versions greater than 5 have the additional FH bit in
the DEBUG register. See Figure 19. The host can test whether the ARCtangent-
A4 processor has halted by checking the state of the H bit in the STATUS
register. Additionally, the SH bit in the debug register is available to test whether
the halt was caused by the host, the ARCtangent-A4 processor, or an external
halt signal. The host should wait for the LD (load pending) bit in the DEBUG
register to clear before changing the state of the processor.

Starting
The host starts the ARCtangent-A4 processor by clearing the H bit in the
STATUS register. It is advisable that the host clears any instructions in the
pipeline before modifying any registers and re-starting the ARCtangent-A4
processor, by sending NOP instructions through, so that any pending instructions
that are about to modify any registers in the processor are allowed to complete. If
the ARCtangent-A4 processor has been running code, and is to be restarted at a
different location, then it will be necessary to put the processor into a state
similar to the post-reset condition to ensure correct operation:
• reset the three hardware loop registers to their default values
• flush the pipeline. This is known as ‘pipecleaning’
• disable interrupts, using the PC/Status register
• reset any extension logic to its default state
If the ARCtangent-A4 processor has been running and is to be restarted to
CONTINUE where it left off, then the procedure is as follows:
• host reads the PC in the STATUS Register
• host writes back to the STATUS register with the same PC value as was just
read, but clearing the H bit
• the ARCtangent-A4 processor will continue from where it left off when it
was stopped. (Note: at first glance it appears that the same instruction would
be executed twice, but in fact this has been taken care of in the hardware; the
pipeline is held stopped for the first cycle after the STATUS register has
been written and thus the execution starts up again as if there has been no
interruption).
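The CONTINUE procedure is a read-modify-write of the STATUS register. The Python sketch below shows the idea with two hypothetical callables, `read_status` and `write_status`, standing in for whatever register access the host interface provides; the H bit value is the 0x02000000 quoted earlier in this chapter.

```python
H_BIT = 0x02000000  # H bit in the STATUS register

def continue_from_halt(read_status, write_status):
    """Restart the processor where it stopped: same PC and flags, H cleared."""
    status = read_status()                  # host reads PC (and flags) from STATUS
    resumed = status & ~H_BIT & 0xFFFFFFFF  # keep everything, clear only H
    write_status(resumed)
    return resumed

# Example with a dictionary standing in for the real register access:
regs = {"STATUS": H_BIT | 0x001234}
continue_from_halt(lambda: regs["STATUS"],
                   lambda v: regs.__setitem__("STATUS", v))
```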

Pipecleaning
If the processor is halted whilst it is executing a program, it is possible that the
later stages of the pipeline may contain valid instructions. Before re-starting the
processor at a new address, these instructions must be cleared to prevent
unwanted register writes or jumps from taking place.
If the processor is to be restarted from the point at which it was stopped, then the
instructions in the pipeline are to be executed, hence pipecleaning should not be
performed.
Pipecleaning is not necessary at times when the pipeline is known to be clean -
e.g. immediately after a reset, or if the processor has been stopped by a FLAG
instruction followed by three NOPs.
Pipecleaning is achieved as follows:
1. Stop the ARCtangent-A4 processor.
2. Download a ‘NOP’ instruction into memory.
3. Invalidate the instruction cache to ensure that the NOP is loaded from memory.
4. Point the PC/Status register to the downloaded NOP.
5. Single step until the values in the program counter or loop count register
change.
6. Point the PC/Status register to the downloaded NOP.
7. Single step until the values in the program counter or loop count register
change.
8. Point the PC/Status register to the downloaded NOP.
9. Single step until the values in the program counter or loop count register
change.
Notice that the program counter is written before each single step, so all branches
and jumps, that might be in the pipeline, are overridden, ensuring that the NOP is
fetched every time.
It should be noted that the instructions in the pipeline may perform register
writes, flag setting, loop set-up, or other operations which change the processor
state. Hence, pipecleaning should be performed before any operations which set
up the processor state in preparation for the program to be executed - for example
loading registers with parameters.
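The nine pipecleaning steps reduce to a short loop. The following Python sketch assumes a hypothetical `host` object with methods `halt`, `write_memory`, `invalidate_icache`, `set_pc`, `read_pc`, `read_loop_count` and `single_step`; none of these names come from a real debugger API, and the NOP encoding is passed in rather than assumed. `FakeHost` exists only so the sketch can be run.

```python
def pipeclean(host, nop_addr, nop_word):
    """Flush the pipeline by stepping a downloaded NOP through it three times."""
    host.halt()                              # step 1
    host.write_memory(nop_addr, nop_word)    # step 2
    host.invalidate_icache()                 # step 3
    for _ in range(3):                       # steps 4-5, 6-7, 8-9
        host.set_pc(nop_addr)                # overrides any branch in the pipeline
        before = (host.read_pc(), host.read_loop_count())
        while (host.read_pc(), host.read_loop_count()) == before:
            host.single_step()

class FakeHost:
    """Stand-in target that advances the PC on every single step."""
    def __init__(self):
        self.pc, self.loop_count, self.log = 0x100, 0, []
    def halt(self): self.log.append("halt")
    def write_memory(self, addr, word): self.log.append("write")
    def invalidate_icache(self): self.log.append("invalidate")
    def set_pc(self, addr): self.pc = addr; self.log.append("set_pc")
    def read_pc(self): return self.pc
    def read_loop_count(self): return self.loop_count
    def single_step(self): self.pc += 4; self.log.append("step")

host = FakeHost()
pipeclean(host, nop_addr=0x2000, nop_word=0)  # nop_word is a placeholder value
```

Writing the program counter before each step mirrors the point made above: it ensures the NOP is fetched every time regardless of what was left in the pipeline.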

Single Stepping
The Single Step function is controlled by two bits in the DEBUG register. These
bits can be set by the debugger to enable Single Cycle Stepping or Single
Instruction Stepping. The two bits, Single Step (SS) and Instruction Step (IS), are
write-only by the host and keep their values for one cycle (see Table 19).

Field   Description                                  Access Type
SS      Single Step: Cycle Step enable               Write only from the host
IS      Instruction Step: Instruction Step enable    Write only from the host

Table 19 Single Step Flags in Debug Register
Single cycle step
The Single Cycle Step function enables the processor for one cycle only.
Normally, an instruction is completed in four cycles: fetch, register read, execute
and register writeback. In order to complete an instruction, the debugger must
repeatedly single cycle step the processor until the program counter value is
updated. Single Cycle Stepping is the only stepping function supported in the
ARCtangent-A4 basecase processor prior to version 7.
The Single Cycle Step function is enabled by setting the (SS) bit and clearing the
(IS) bit in the DEBUG register when the ARCtangent-A4 processor is halted. On
the next clock cycle the processor will be enabled for one clock cycle. The single
step lasts only for one clock cycle after which the processor is halted.

Single instruction step


The Single Instruction Step function enables the processor for completion of a
whole instruction. Single Instruction Stepping is supported on basecase
ARCtangent-A4 processor version 7 or above. The Single Instruction Step
function is enabled by setting both the (SS) and (IS) bits in the debug register
when the ARCtangent-A4 processor is halted.
On the next clock cycle the processor is kept enabled for as many cycles as
required to complete the instruction. Therefore, any stalls due to register conflicts
or delayed loads are accounted for when waiting for an instruction to complete.
All earlier instructions in the pipeline are flushed, the instruction that the
program counter is pointing to is completed, the next instruction is fetched and
the program counter is incremented.
NOTE If the stepped instruction was a Branch, Jump or Loop with a killed delay
slot, or was using Long Immediate data, then two instruction fetches are made so
that the program counter is updated appropriately.

SLEEP instruction in single step mode


The SLEEP instruction is treated as a NOP instruction when the processor is in
Single Step Mode. This is because every single step acts as a restart or a wake up
call. Consequently, the SLEEP instruction behaves exactly like a NOP
propagating through the pipeline.
BRK instruction in single step mode


The BRK instruction behaves exactly as when the processor is not in the Single
Step Mode. The BRK instruction is detected and kept in stage one until it is
removed by the host.

Software Breakpoints
As long as the host has access to the ARCtangent-A4 code memory, it can
replace any ARCtangent-A4 instruction with a branch instruction. This means
that a “software breakpoint” can be set on any instruction, as long as the target
breakpoint code is within the branch address range. Since a software breakpoint
is a branch instruction, the rules for use of Bcc apply. Care should be taken when
setting breakpoints on the last instructions in zero overhead loops and also on
instructions in delay slots of jump, branch and loop instructions. (See Pipeline
Cycle Diagrams for: Loops and Branches).
For ARCtangent-A4 basecase processor versions 7 and higher, the BRK
instruction can be used to insert a software breakpoint. BRK will halt the
ARCtangent-A4 processor and flush all previous instructions through the pipe.
The host can read the STATUS register to determine where the breakpoint
occurred.

ARCtangent-A4 Core Registers


The ARCtangent-A4 core registers are available to be read and written by the
host. These registers should be accessed by the host once the ARCtangent-A4
processor has halted.

ARCtangent-A4 Auxiliary Registers


Some auxiliary registers, unlike the core registers, may be accessed while the
ARCtangent-A4 processor is running. These dual access registers in the basecase
processor are:

STATUS
The host can read the status register when the ARCtangent-A4 processor is
running. This is useful for code profiling. See Figure 14.

SEMAPHORE
The semaphore register is used for inter-processor and host-ARCtangent-A4
communications. Protocols for using shared memory and provision of mutual
exclusion can be accomplished with this register. See Figure 15.

IDENTITY
The host can determine the version of ARCtangent-A4 processor by reading the
identity register. See Figure 18. Information on extensions added to the
ARCtangent-A4 processor can be determined through build configuration
registers. For more information on build configuration registers please refer to
the 'ARCtangent-A4 Development Kit for ARCtangent-A4 Release Notes'.

DEBUG
In order to halt ARCtangent-A4 processor, the host needs to set the FH bit of the
debug register. The host can determine how the ARCtangent-A4 processor was
halted and if there are any pending loads. See Figure 19.

ARCtangent-A4 Memory
The host can change the program memory at any time.
NOTE If program code is being altered, or transferred into ARCtangent-A4 memory
space, then the instruction cache should be invalidated.


Chapter 10 — Pipeline and Timings

Introduction
The ARCtangent-A4 processor has a four stage pipeline as shown in Figure 22.
[Figure: Stage 1 (Instruction Fetch), Stage 2 (Operand Fetch), Stage 3 (ALU) and
Stage 4 (Write Back), together with the Memory Controller, the Load/Store Unit,
the Core Registers, the Auxiliary Registers, the ALU short cut and the write
back path.]

Figure 22 ARCtangent-A4 Pipeline


An outline of each part of the pipeline is given below.

Stage 1. Instruction fetch


The instruction is fetched from memory via the memory controller depending on
the status registers and link registers.

Stage 2. Operand fetch


The operands are fetched from the core registers or from the immediate value
associated with the instruction.

Stage 3. ALU

Any arithmetic or logic functions are carried out on the operands supplied by
stage 2.

Stage 4. Write back


Results from stage 3 or data from loads are written back to the core registers.

Pipeline-Cycle Diagram
The diagram in Figure 23 is used to explain the passage of instructions through
the pipeline stages.

          t        t+1      t+2      t+3      t+4      t+5
Event 1   stage 1  stage 2  stage 3  stage 4
Event 2            stage 1  stage 2  stage 3  stage 4
Event 3                     stage 1  stage 2  stage 3  stage 4

Figure 23 Pipeline-Cycle Diagram


Time progresses from left to right and events progress down the page.
In order to read the diagram, take as an example the second cycle at time t+1.
Here, Event 1 has reached stage 2 in the pipeline and Event 2 has reached stage 1
of the pipeline.
If we have the following code:
AND r1,r2,r3
OR r5,r6,r4
BIC r8,r9,r10
SUB r14,r12,r13

We can show the events in the pipeline with the following diagram:



      t        t+1      t+2      t+3      t+4      t+5      t+6
AND   ifetch   r2,r3    AND      r1
OR             ifetch   r6,r4    OR       r5
BIC                     ifetch   r9,r10   BIC      r8
SUB                              ifetch   r12,r13  SUB      r14

At cycle t+3:
The write back stage (stage 4) is updating r1.
The ALU stage (stage 3) is performing an OR.
The operand fetch (stage 2) is fetching the operands r9 and r10 for BIC.
The instruction fetch of SUB is occurring at stage 1.
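The diagonal pattern of the pipeline-cycle diagram follows from a simple rule: with no stalls, instruction number i (counting from 0) is in stage (cycle - i + 1). The Python sketch below encodes that rule; `stage_of` and `occupancy` are hypothetical helper names, and the model ignores stalls, long immediates and branches.

```python
def stage_of(instr_index, cycle):
    """Pipeline stage (1-4) of an instruction at a given cycle, or None."""
    stage = cycle - instr_index + 1
    return stage if 1 <= stage <= 4 else None

def occupancy(names, cycle):
    """Map each occupied stage to the instruction in it at `cycle`."""
    result = {}
    for index, name in enumerate(names):
        stage = stage_of(index, cycle)
        if stage is not None:
            result[stage] = name
    return result

at_t3 = occupancy(["AND", "OR", "BIC", "SUB"], cycle=3)
```

For the four-instruction example, `at_t3` places AND in stage 4, OR in stage 3, BIC in stage 2 and SUB in stage 1, matching the description of cycle t+3.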

Arithmetic and Logic Function Timings
The stages perform the following operations during an Arithmetic or Logic
instruction.
Stage 1
Instruction fetch and start decode
Stage 2
Fetch 2 operands from registers
Stage 3
Do Arithmetic or Logic function
Stage 4
Write result to register
When arithmetic and logic functions are executed sequentially, there are
sometimes dependencies of registers between the instructions. Take the following
code:
AND r1,r2,r3
OR r5,r1,r4
BIC r8,r9,r10
SUB r14,r12,r13



The second instruction OR uses r1 as an operand. Notice that r1 is updated from
the previous instruction (AND).
We can see the effect of this dependency in the following pipeline-cycle diagram:

      t        t+1      t+2      t+3      t+4      t+5      t+6
AND   ifetch   r2,r3    AND      r1
OR             ifetch   r1,r4    OR       r5
BIC                     ifetch   r9,r10   BIC      r8
SUB                              ifetch   r12,r13  SUB      r14

Since r1 is not updated until time t+3 and OR needs r1 at t+2 it would appear that
the OR operand fetch would have to be delayed for one cycle.
However, since there is an ALU SHORT CUT (see Figure 22), r1 is ready to be
used as an operand by the OR instruction at t+2.
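The effect of the shortcut on a register dependency can be sketched as follows. This is a simplified model, not a description of the hardware: it assumes that a result is available from the ALU during the producing instruction's stage 3 when the shortcut is present, and only during its stage 4 otherwise. The function name operand_fetch_stall is hypothetical.

```python
def operand_fetch_stall(distance, shortcut=True):
    """Stall cycles for an instruction that reads a register written by the
    instruction `distance` positions earlier (1 = immediately preceding).

    Assumption of this sketch: cycles are counted from the producer's
    instruction fetch, so its stage 3 is cycle 2 and its stage 4 is cycle 3.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    available = 2 if shortcut else 3   # producer stage 3 (shortcut) or stage 4
    needed = distance + 1              # consumer operand fetch (stage 2)
    return max(0, available - needed)


# AND r1,r2,r3 followed directly by OR r5,r1,r4
print(operand_fetch_stall(1, shortcut=True))
print(operand_fetch_stall(1, shortcut=False))
```

Under these assumptions the shortcut removes the one-cycle delay that the text describes for the OR instruction.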

Immediate Data Timing


When immediate data is used, the data is available at different times depending
on the size of that data.

Short immediate
The short immediate data of an instruction is available at the operand fetch stage
and is taken from the low 9 bits of the instruction. The instruction takes the same
time to cycle through the pipeline.

Long immediate
The long immediate data is taken from the word in the instruction fetch stage
while the instruction is in the operand fetch stage. The stages that the long
immediate data would pass through, if it were an instruction, are disabled.
This means that an instruction with long immediate data takes one cycle longer and
the following instruction enters the pipeline one cycle later.
The stages perform the following operations during an Arithmetic or Logic
instruction with immediate data.



Stage 1
Instruction fetch and start decode

Stage 2
Fetch 1 operand from registers and the other from the word currently in stage 1.
Disable instruction word in stage 1.
Stage 3
Do Arithmetic or Logic function
Stage 4
Write result to register
For example:
AND r1,r2,2000
OR r5,r1,r4
SUB r14,r12,r13

cycle      t          t+1        t+2        t+3        t+4        t+5        t+6
AND        ifetch     r2         AND        r1
limm                  2000       disabled   disabled   disabled
OR                               ifetch     r1,r4      OR         r5
SUB                                         ifetch     r12,r13    SUB        r14
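The extra cycle taken by long immediate data can be illustrated with a small sketch that assigns a fetch cycle to each instruction. It assumes one word is fetched per cycle with no other stalls, and that write-back occurs three cycles after the instruction fetch; fetch_cycles is a hypothetical helper written for this illustration.

```python
def fetch_cycles(program):
    """Return the fetch cycle (relative to t) of each instruction.

    `program` is a list of (mnemonic, has_long_immediate) pairs. A long
    immediate word occupies the fetch slot after its instruction, so the
    next instruction is fetched one cycle later.
    """
    cycles = []
    slot = 0
    for _mnemonic, has_limm in program:
        cycles.append(slot)
        slot += 2 if has_limm else 1
    return cycles


program = [("AND", True), ("OR", False), ("SUB", False)]
fetched = fetch_cycles(program)
write_back = [cycle + 3 for cycle in fetched]   # stage 4 of each instruction
print(fetched, write_back)
```

For the example above this gives fetch cycles t, t+2 and t+3, with SUB writing r14 at t+6, one cycle later than it would without the long immediate word.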

Destination immediate
If the destination for the result of an instruction is marked as being immediate,
then the write-back at stage 4 is disabled.
For example:
AND 0,r2,r3
OR r5,r1,r4
BIC r8,r9,r10
SUB r14,r12,r13

cycle      t          t+1        t+2        t+3        t+4        t+5        t+6
AND        ifetch     r2, r3     AND        disabled
OR                    ifetch     r1, r4     OR         r5
BIC                              ifetch     r9, r10    BIC        r8
SUB                                         ifetch     r12,r13    SUB        r14



Conditional Instruction Timing
Condition code tests for branch, loop and jump instructions occur one stage
earlier in the pipeline than those for conditional arithmetic and logic instructions,
and are covered in section 7.6.
The condition code tests for arithmetic and logic instructions are carried out at
the beginning of stage 3. Condition codes are updated at the end of stage 3 ready
for the next instruction.
If the test returns a false value, then the following parts of the pipeline are
affected for that instruction:
• write-back to the core register set at stage 4 is disabled
• update of the flags at stage 3 is disabled
• the ALU SHORTCUT is disabled
The stages perform the following operations during a conditional Arithmetic or Logic instruction:
Stage 1
Instruction fetch and start decode
Stage 2
Fetch 2 operands from registers
Stage 3
Do Arithmetic or Logic function.
If condition true then update flags and enable ALU SHORTCUT
If condition false then do not update flags and disable ALU SHORTCUT
Stage 4
If condition true then write result to register
Take the following code, where the result of the AND is zero:
AND.F r1,r2,r3
OR.NE.F r5,r6,r4
BIC r8,r9,r10
SUB r14,r12,r13



cycle      t          t+1        t+2        t+3        t+4        t+5        t+6
AND.F      ifetch     r2, r3     flag       r1
                                 update
OR.NE.F               ifetch     r6, r4     condition  disable
                                            code test  wrt-back
BIC                              ifetch     r9, r10    BIC        r8
SUB                                         ifetch     r12,r13    SUB        r14
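The behaviour of the conditional OR in the example above can be sketched in Python. Only the Z flag and the EQ/NE conditions are modelled, and the register values are invented for the illustration; the step function is a hypothetical helper, not a model of the real ALU.

```python
OPS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
}


def step(regs, flags, op, dest, src1, src2, cond=None, set_flags=False):
    """Execute one instruction. Returns False when the condition test fails,
    in which case neither the destination register nor the flags change."""
    if cond == "NE" and flags["Z"]:
        return False
    if cond == "EQ" and not flags["Z"]:
        return False
    result = OPS[op](regs[src1], regs[src2]) & 0xFFFFFFFF
    regs[dest] = result
    if set_flags:
        flags["Z"] = result == 0
    return True


# Illustrative register contents chosen so that the AND result is zero.
regs = {"r1": 7, "r2": 0xF0, "r3": 0x0F, "r4": 1, "r5": 99, "r6": 2}
flags = {"Z": False}

and_done = step(regs, flags, "AND", "r1", "r2", "r3", set_flags=True)
or_done = step(regs, flags, "OR", "r5", "r6", "r4", cond="NE", set_flags=True)
print(and_done, or_done, regs["r5"], flags["Z"])
```

Because the AND result is zero, the NE test fails: r5 keeps its old value and the Z flag set by AND.F is left unchanged.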

Extension Instruction Timings


Single cycle extension instructions
Single cycle extension instructions follow the same characteristics as arithmetic
and logic functions, immediate data and conditional instruction timing. The
following extension options have single cycle extension instruction
characteristics:
• 32-bit Barrel shift/rotate block (single cycle)
• Normalise (find-first-bit) instruction
• Swap instruction
• MIN/MAX instructions

Multi cycle extension instructions


Multi cycle extension instructions will stall the pipeline if basecase core registers
are being written to. If extension core or auxiliary registers are defined as specific
destination registers, then the pipeline will only stall if the extension register is
being accessed and the extension instruction has built-in scoreboarding. The
following extension options have multi cycle extension instruction
characteristics:
• 32-bit Barrel shift/rotate block (multi cycle)
• 32-bit Multiplier, small (10 cycle) implementation
• 32-bit Multiplier, fast (4 cycle) implementation



Multiply timings
The stages perform the following operations for the multiply instruction:
Stage 1
Instruction fetch and start decode
Stage 2
Fetch operands.
Update multiply scoreboard unit with result registers (MLO, MMID, MHI)
marked as invalid.
Stage 3
Perform multiply in four (or ten for small implementation) cycles.
Allow shortcutting of multiply result if required.
Stage 4
No action.
On completion of multiply
Update multiply result registers and update multiply scoreboard unit marking
result registers as valid.
When the multiply result registers are waiting to be updated by a multiply and any of
those registers is an operand of an instruction at stage 2 of the pipeline,
the pipeline is halted until the multiply completes.
A special scoreboard unit is used to retain the information on which registers are
waiting to be written. The scoreboard unit is updated at stage 2 when the multiply
is executed, and updated at stage 4 when the result registers have been written to.
The pipeline is allowed to proceed if the instruction following the multiply does
not use the multiply result registers (this is checked by the instruction in stage 2).
Once an instruction does need one of those registers, the pipeline is halted and waits
for the multiply to complete.
The result will not be ready for four (or ten for the small implementation) cycles
after the multiply instruction has been issued. Note that this time is not affected
by other pipeline stalls in the system: once it is issued, the multiply will be ready
after four (or ten for the small implementation) cycles under all conditions.
When the result of the multiply is ready, the multiply result registers are updated
with no effect on the pipeline.
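The scoreboard check described above can be sketched as a small Python class. This is a minimal model under stated assumptions: the multiply is taken to be issued in the cycle in which it marks the result registers as pending, and the cycle numbering is chosen for the illustration only. MultiplyScoreboard and its methods are hypothetical names.

```python
class MultiplyScoreboard:
    """Tracks whether the multiply result registers are still pending."""

    RESULT_REGS = {"MLO", "MMID", "MHI"}

    def __init__(self):
        self.ready_cycle = None   # cycle at which the results become valid

    def issue(self, cycle, latency=4):
        """Mark the result registers as invalid until cycle + latency.
        Use latency=10 for the small multiplier implementation."""
        self.ready_cycle = cycle + latency

    def stall_cycles(self, cycle, operands):
        """Cycles an instruction in stage 2 must wait before proceeding."""
        if self.ready_cycle is None or cycle >= self.ready_cycle:
            return 0
        uses_result = self.RESULT_REGS & {name.upper() for name in operands}
        return self.ready_cycle - cycle if uses_result else 0


board = MultiplyScoreboard()
board.issue(cycle=1)                              # MUL64 marks results pending
print(board.stall_cycles(2, ["r5", "r6"]))        # no dependency
print(board.stall_cycles(2, ["mlo", "r6"]))       # waits for the multiply
```

An instruction that does not read MLO, MMID or MHI is never delayed, while one that does waits only for the remaining multiply cycles.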



In this example, the multiply takes four cycles to get to the short cut value.
main:
MUL64 0,r2,r3

AND r4,r5,r6
OR r7,r8,r9
SUB r10,r11,r12

cycle      t          t+1        t+2        t+3        t+4        t+5        t+6
MUL64      ifetch     r2,r3      MUL64      killed                           wrt-back
AND                   ifetch     r5,r6      AND        r4
OR                               ifetch     r8,r9      OR         r7
SUB                                         ifetch     r11,r12    SUB        r10
scoreboard            mark as    check      check      check      check      result
                      pending                                                now ready
result                not set    not set    not set    not set    not set    set mult.
registers                                                                    result
If the AND used one of the multiply result registers then the AND would stall. For
example, with a dependency on MLO:
main:
MUL64 0,r2,r3
AND r4,mlo,r6
OR r7,r8,r9
SUB r10,r11,r12

cycle      t          t+1        t+2        t+3        t+4        t+5        t+6
MUL64      ifetch     r2,r3      MUL64      killed                short-cut  wrt-back
AND                   ifetch     mlo,r6     stalled    stalled    stalled    AND
OR                               ifetch     stalled    stalled    stalled    r8,r9
SUB                                                                          ifetch
scoreboard            mark as    check      check      check      check      result
                      pending                                                now ready
result                not set    not set    not set    not set    not set    mult.
registers                                                                    result



Barrel shift timings
The fast barrel shift has the same timing characteristics as arithmetic and logic
functions, immediate data and conditional instruction timing. The small barrel
shift, however, will stall the pipeline until the shift is complete. The number of
cycles required to complete a barrel shift operation depends on the number of bit
shifts in that operation; this implementation processes 4 bit shifts per
cycle. Thus, the number of cycles varies from one (0 to 4 bits) to eight (29 to
32 bits).
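The cycle count for the small barrel shifter follows directly from the figure of 4 bit shifts per cycle. The sketch below is a plain arithmetic illustration; the function name small_barrel_shift_cycles is an assumption made here.

```python
def small_barrel_shift_cycles(shift):
    """Cycles spent in stage 3 by the small barrel shifter, which processes
    4 bit shifts per cycle and always takes at least one cycle."""
    if not 0 <= shift <= 32:
        raise ValueError("shift must be in the range 0 to 32")
    return max(1, -(-shift // 4))   # ceiling division, minimum of one cycle


for amount in (0, 3, 4, 5, 29, 32):
    cycles = small_barrel_shift_cycles(amount)
    print(amount, cycles, cycles - 1)   # shift, cycles, pipeline stall cycles
```

A shift of 3 therefore completes in one cycle with no stall, while a shift of 5 takes two cycles and stalls the pipeline for one.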
The stages perform the following operations during a barrel shift instruction.
Stage 1
Instruction fetch and start decode
Stage 2
Fetch 2 operands from registers
Stage 3
Do shift function, stalling pipeline if required.
Stage 4
Write result to register
The pipeline will stall depending on the size of the shift. In the following
example a shift of 3 will take one cycle and use the short-cutting mechanism to
get the result register in time:
ASL r1,r2,0x3
OR r5,r1,r4
BIC r8,r9,r10
SUB r14,r12,r13
The second instruction OR uses r1 as an operand. Notice that r1 is updated from
the barrel shift instruction (ASL).
cycle      t          t+1        t+2        t+3        t+4        t+5        t+6
ASL        ifetch     r2, 0x3    ASL        r1
OR                    ifetch     r1, r4     OR         r5
BIC                              ifetch     r9, r10    BIC        r8
SUB                                         ifetch     r12,r13    SUB        r14



In the following example a shift of 5 will take two cycles, causing a stall of one
cycle, and the short-cutting mechanism is used to get the result:
ASL r1,r2,0x5
OR r5,r1,r4
BIC r8,r9,r10
SUB r14,r12,r13
cycle      t          t+1        t+2        t+3        t+4        t+5        t+6
ASL        ifetch     r2, 0x5    ASL        stalled    r1
OR                    ifetch     r1, r4     stalled    OR         r5
BIC                              ifetch     stalled    r9, r10    BIC        r8
SUB                                                    ifetch     r12,r13    SUB

Jump and Branch Timings


Jump instruction
The jump instruction performs the following action in the pipeline:
Stage 1
Instruction fetch and start decode
Stage 2
Fetch operand.
Test condition code.
If condition true update PC with operand, and if flag bit set then update flags. If
condition false allow PC to update normally.
Execute instruction in delay slot according to the nullify instruction mode.
Stage 3
No action
Stage 4
No action



Jump and nullify delay slot instruction
If a jump is not conditional, then the jump is always taken and the instruction
immediately following the jump is executed according to the nullify instruction
mode.
A single cycle stall will occur if a jump is immediately preceded by an
instruction that sets the flags. Assuming that r7 contains the address of the code
that starts at label jaddr, take the following code:
main:
MOV r5,r7
JAL.ND [r5]
BIC r8,r9,r10
SUB r14,r12,r13
...
jaddr:
OR r1,r2,r3

The current PC is also included in the following diagram:


cycle      t          t+1        t+2        t+3        t+4        t+5        t+6
MOV        ifetch     r7         MOV        r5
JAL.ND                ifetch     update PC  no action  no action
                                 with r5
BIC        delay slot →          ifetch     killed     killed     killed
OR                                          ifetch     r2,r3      OR         r1
currentpc  main       main+1     main+2     jaddr      jaddr+1

Jump and execute delay slot instruction


When the delay slot execution flag is set then the above code would become:
main:
MOV r5,r7
JAL.D [r5]
BIC r8,r9,r10
SUB r14,r12,r13
...
jaddr:
OR r1,r2,r3

The effect of this code through the pipeline is shown in the following diagram:



cycle      t          t+1        t+2        t+3        t+4        t+5        t+6
MOV        ifetch     r7         MOV        r5
JAL.D                 ifetch     update PC  no action  no action
                                 with r5
BIC        delay slot →          ifetch     r9,r10     BIC        r8
OR                                          ifetch     r2,r3      OR         r1
currentpc  main       main+1     main+2     jaddr      jaddr+1

Jump with immediate address


When a jump occurs with a long immediate address it takes an extra cycle to
execute. The delay slot execution mechanism does not apply since the long
immediate data is contained in the delay slot. A single cycle stall will occur if a
jump is immediately preceded by an instruction that sets the flags.
main:
JAL jaddr
BIC r8,r9,r10
SUB r14,r12,r13
...
jaddr:
OR r1,r2,r3

cycle      t          t+1        t+2        t+3        t+4        t+5        t+6
JAL.ND     ifetch     update PC  no action  no action
                      with limm
limm                  limm       disabled   disabled   disabled
OR                               ifetch     r2,r3      OR         r1
currentpc  main       main+1     jaddr      jaddr+1



Jump setting flags
If the flags update field is used by the jump instruction then the flags, except the
H bit, will be updated in stage 3. A single cycle stall will occur if a jump is
immediately preceded by an instruction that sets the flags.
main:
MOV.F r5,r7
JAL.D.F [r5] ; pipeline stall due to flags set by MOV
BIC.F r8,r9,r10
SUB r14,r12,r13
...
jaddr:
OR r1,r2,r3

NOTE In this case, because the BIC instruction is in the delay slot, the flags are
changed after the jump by BIC. If the delay slot instruction were nullified then
the flags would only be changed by the jump instruction.

cycle      t          t+1        t+2        t+3        t+4        t+5        t+6
MOV.F      ifetch     r7         MOV        r5
JAL.D.F               ifetch     stall      update PC  update     no action
                                 (wait for  with r5    flags
                                 flags)
BIC.F      delay slot →                     ifetch     r9,r10     BIC        r8
OR                                                     ifetch     r2,r3      OR
currentpc  main       main+1     main+2     main+2     jaddr      jaddr+1
flags set by                     MOV        not set    JAL        BIC        not set

Conditional jump
Condition code tests for branch, loop and jump instructions happen at stage 2 in
the pipeline, rather than at stage 3 for conditional arithmetic and logic
instructions. As a result, a single cycle stall will occur if a jump is immediately
preceded by an instruction that sets the flags.
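The stall rule stated above can be expressed as a simple check on two adjacent mnemonics. The sketch below is illustrative only: the CONDITIONS set is a list of condition-code suffixes assumed for this example, and the parsing of mnemonics is a simplification, not the behaviour of any ARC assembler.

```python
# Condition-code suffixes assumed for this illustration.
CONDITIONS = {
    "AL", "RA", "EQ", "Z", "NE", "NZ", "PL", "P", "MI", "N", "CS", "C",
    "LO", "CC", "NC", "HS", "VS", "V", "VC", "NV", "GT", "GE", "LT", "LE",
    "HI", "LS", "PNZ",
}
PREFIXES = ("LP", "JL", "BL", "J", "B")   # loop, jump(-link), branch(-link)


def is_control_flow(mnemonic):
    """True for branch, jump and loop mnemonics such as BNE.D, JAL or LPZ."""
    base = mnemonic.upper().split(".")[0]
    for prefix in PREFIXES:
        if base.startswith(prefix):
            rest = base[len(prefix):]
            if rest == "" or rest in CONDITIONS:
                return True
    return False


def sets_flags(mnemonic):
    """True when the mnemonic carries the .F (flag update) suffix."""
    return "F" in mnemonic.upper().split(".")[1:]


def flag_stall(previous, current):
    """Stall cycles when `current` immediately follows `previous`."""
    return 1 if sets_flags(previous) and is_control_flow(current) else 0


print(flag_stall("AND.F", "JNE.D.F"), flag_stall("MOV", "JNE.D.F"))
```

Placing an instruction that does not set the flags between the flag-setting instruction and the jump, as in the first example that follows, avoids the stall.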
In the following example, the flags are set by two instructions and it can be seen
in the pipeline-cycle diagram where the effects of the flags occur.



main:
AND.F r1,r2,r3
MOV r5,r7
JNE.D.F [r5]

BIC r8,r9,r10
SUB r14,r12,r13
...
jaddr:
OR r1,r2,r3

cycle      t          t+1        t+2        t+3        t+4        t+5        t+6
AND.F      ifetch     r2,r3      update     r1
                                 flags
MOV                   ifetch     r7         MOV        r5
JNE.D.F                          ifetch     test flags update     no action
                                            move r5 to flags
                                            PC
BIC        delay slot →                     ifetch     r9,r10     BIC        r8
OR                                                     ifetch     r2,r3      OR
currentpc  main       main+1     main+2     main+3     jaddr      jaddr+1
flags set by                     AND        not set    JNE        not set    not set
The jump instruction tests the flags that have been updated by the AND
instruction.
NOTE If the BIC instruction was conditional then it would be executed according to
the flags set by the jump instruction.

In the following example, the flag setting instruction is immediately followed by
the jump instruction and it can be seen in the pipeline-cycle diagram where the
effects of the flags occur and where the pipeline stall occurs.
main:
AND.F r1,r2,r3
JNE.D.F [r5] ; pipeline stall due to flags set by AND
BIC r8,r9,r10
SUB r14,r12,r13
...
jaddr:
OR r1,r2,r3



cycle      t          t+1        t+2        t+3        t+4        t+5        t+6
AND.F      ifetch     r2,r3      update     r1
                                 flags
JNE.D.F               ifetch     stall      test flags update     no action
                                 (wait for  move r5 to flags
                                 flags)     PC
BIC        delay slot →                     ifetch     r9,r10     BIC        r8
OR                                                     ifetch     r2,r3      OR
currentpc  main       main+1     main+2     main+2     jaddr      jaddr+1
flags set by                     AND        not set    JNE        not set    not set

Jump and link


The jump and link instruction is very similar to the jump instruction except that
the branch link register (BLINK) is used to allow returns from subroutines.
Unlike the non linking jump, stage 3 and stage 4 are enabled to allow the link
register to be updated with the status register value. The whole of the status
register is saved and is taken either from the first instruction following the branch
(current PC) or the instruction after that (next PC) according to the delay slot
execution mode. If the destination address is an explicit address (long immediate
data) then for this instruction the .JD nullify instruction mode must be used. If .D
or .ND is used, incorrectly, then the link register BLINK will contain the wrong
return address.
The flags stored are those set by the instruction immediately preceding the jump.
A single cycle stall will occur if a jump and link is immediately preceded by an
instruction that sets the flags.
Stage 1
Instruction fetch and start decode
Stage 2
Test condition code.
If condition true update PC with operand, and if flag bit set then update flags.
If condition false allow PC to update normally.



Execute instruction in delay slot according to the nullify instruction mode.
Stage 3

If condition true then pass PC to stage 4
Stage 4
If condition true then write return address to LINK register.
main:
AND.F r1,r2,r3
MOV r5,r7
JLNE.D jaddr
BIC r8,r9,r10
SUB r14,r12,r13
...
jaddr: OR r1,r2,r3

cycle      t          t+1        t+2        t+3        t+4        t+5        t+6
AND.F      ifetch     r2,r3      update     r1
                                 flags
MOV                   ifetch     r7         MOV        r5
JLNE.D                           ifetch     test flags pass       write back
                                            update PC  next_pc    BLINK
                                                       through
BIC        delay slot →                     ifetch     r9,r10     BIC        r8
OR                                                     ifetch     r2,r3      OR
currentpc  main       main+1     main+2     main+3     jaddr      jaddr+1
next_pc    main+1     main+2     main+3     main+4     jaddr+1    jaddr+2
BLINK                            not set    not set    not set    updated    main+4
flags set by                     AND        not set    not set    not set    not set



Branch
When a branch is taken, like the jump instruction, the instruction in the delay slot
is executed according to the nullify instruction mode. The relative address is
calculated and the PC updated in stage 2. A single cycle stall will occur if a
branch is immediately preceded by an instruction that sets the flags.
Calculation of the relative address
The branch target address is calculated by adding the offset within the instruction
to the address of the branch instruction. The target address is calculated thus:
new program counter = branch PC address + 24-bit offset + 1
Hence, if the relative address is 0, the target of the branch is the
instruction in the delay slot.
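The target address formula above can be checked with a one-line function. This sketch assumes that the PC and the offset are both counted in long-words, as in the pipeline-cycle diagrams (main, main+1 and so on), and it does not model the width of the PC; branch_target is a hypothetical name.

```python
def branch_target(branch_pc, offset):
    """New program counter for a taken branch, in long-word units:
    branch PC address + offset + 1."""
    return branch_pc + offset + 1


branch_pc = 0x40
print(hex(branch_target(branch_pc, 0)))    # the delay slot instruction
print(hex(branch_target(branch_pc, -1)))   # the branch instruction itself
```

An offset of 0 therefore selects the instruction immediately after the branch, and an offset of -1 selects the branch itself.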
Stage 1
Instruction fetch and start decode
Stage 2
Test condition code. If condition true update PC with calculated address.
If condition false allow PC to update normally.
Execute instruction in delay slot according to the nullify instruction mode.
Stage 3
No action
Stage 4
No action
In this example the delay slot instruction is executed.
main:
AND.F r1,r2,r3
MOV r5,r7
BRA.D jaddr
BIC r8,r9,r10
SUB r14,r12,r13
...
jaddr:
OR r1,r2,r3



cycle      t          t+1        t+2        t+3        t+4        t+5        t+6
AND.F      ifetch     r2,r3      update     r1
                                 flags
MOV                   ifetch     r7         MOV        r5
BRA.D                            ifetch     update PC  no action  no action
                                            with
                                            rel_addr
BIC        delay slot →                     ifetch     r9,r10     BIC        r8
OR                                                     ifetch     r2,r3      OR
currentpc  main       main+1     main+2     main+3     jaddr      jaddr+1

Conditional branch
The condition codes are tested at stage 2, as in the jump instruction. A single
cycle stall will occur if a conditional branch is immediately preceded by an
instruction that sets the flags.
main:
AND.F r1,r2,r3
MOV r5,r7
BNE.D jaddr
BIC r8,r9,r10
SUB r14,r12,r13
...
jaddr:
OR r1,r2,r3



cycle      t          t+1        t+2        t+3        t+4        t+5        t+6
AND.F      ifetch     r2,r3      update     r1
                                 flags
MOV                   ifetch     r7         MOV        r5
BNE.D                            ifetch     test flags no action  no action
                                            update PC
BIC        delay slot →                     ifetch     r9,r10     BIC        r8
OR                                                     ifetch     r2,r3      OR
currentpc  main       main+1     main+2     main+3     jaddr      jaddr+1
flags set by                     AND        not set    not set    not set    not set

Software breakpoints
A software breakpoint is implemented by the use of the branch instruction, Bcc.
The action of a software breakpoint is to branch to the breakpoint code
whereupon the appropriate action will be taken according to the debugging
session, for example, write a value to a register and halt the ARCtangent-A4
processor.

Software breakpoint return address calculation


Software breakpoints can be placed anywhere in ARCtangent-A4 code, except in
executed delay slots of branches.
For example:
BNE.D address
NOP ; !break point may not be placed here
BCS.ND address
NOP ; !break point may be placed here

Once the breakpoint is hit and the breakpoint code has executed, there is a problem
in restarting the code after the breakpoint: the next instruction to have been
fetched will be the target of the branch, not the instruction that was replaced by
the breakpoint.
In this case, for debugging purposes, the breakpoint should replace the branch in
the ARCtangent-A4 code rather than the instruction in the delay slot.



Breakpoints may be set on instructions following branches, provided these are not
executed as delay slots. In other words, it is acceptable to place breakpoints in the
instruction slot following a branch, jump or loop instruction that uses the ND
delay slot canceling mode.

Branch and link


The branch and link instruction is very similar to the branch instruction except
that the branch link register (BLINK) is used to allow returns from subroutines.
Unlike the non linking branch, stage 3 and stage 4 are enabled to allow the link
register to be updated with the status register value. The whole of the status
register is saved and is taken either from the first instruction following the branch
(current PC) or the instruction after that (next PC) according to the delay slot
execution mode.
The flags stored are those set by the instruction immediately preceding the
branch. A single cycle stall will occur if a branch and link is immediately
preceded by an instruction that sets the flags.
Stage 1
Instruction fetch and start decode
Stage 2
Test condition code.
If condition true update PC with calculated address.
If condition false allow PC to update normally.
Execute instruction in delay slot according to the nullify instruction mode.
Stage 3
If condition true then pass PC to stage 4
Stage 4
If condition true then write return address to LINK register.
main:
AND.F r1,r2,r3
MOV r5,r7
BLNE.D jaddr
BIC r8,r9,r10
SUB r14,r12,r13
...
jaddr:
OR r1,r2,r3



cycle      t          t+1        t+2        t+3        t+4        t+5        t+6
AND.F      ifetch     r2,r3      update     r1
                                 flags
MOV                   ifetch     r7         MOV        r5
BLNE.D                           ifetch     test flags pass       write back
                                            update PC  next_pc    BLINK
                                                       through
BIC        delay slot →                     ifetch     r9,r10     BIC        r8
OR                                                     ifetch     r2,r3      OR
currentpc  main       main+1     main+2     main+3     jaddr      jaddr+1
next_pc    main+1     main+2     main+3     main+4     jaddr+1    jaddr+2
BLINK                            not set    not set    not set    updated    main+4
flags set by                     AND        not set    not set    not set    not set

Loop Timings
Loop set up
The loop instruction sets up the loop start (LP_START) and loop end (LP_END)
registers. LP_START register is updated with CURRENT PC and LP_END
updated with the relative address (REL_ADDR) at stage 2.
A single cycle stall will occur if a loop is immediately preceded by an instruction
that sets the flags.
Stage 1
Instruction fetch and start decode
Stage 2
Fetch address from instruction.
Test condition code.



If condition true allow PC to update normally and update LP_END with address
and update LP_START.

If condition false update PC with address.
Execute instruction in delay slot according to the nullify instruction mode.
Stage 3
No action
Stage 4
No action
main:
AND.F r1,r2,r3
MOV r5,r7
LP loop1
BIC r8,r9,r10
SUB r14,r12,r13
loop1:
OR r1,r2,r3

cycle      t          t+1        t+2        t+3        t+4        t+5        t+6
AND.F      ifetch     r2,r3      update     r1
                                 flags
MOV                   ifetch     r7         MOV        r5
LP                               ifetch     update     no action  no action
                                            loop
                                            registers
BIC        delay slot →                     ifetch     r9,r10     BIC        r8
SUB                                                    ifetch     r12,r13    SUB
currentpc  main       main+1     main+2     main+3     main+4
next_pc    main+1     main+2     main+3     main+4
LP_START                         not set    updated    main+3     main+3     main+3
LP_END                           not set    updated    loop1      loop1      loop1
flags set by                     AND        MOV        not set    not set    not set



Conditional loop
The conditional LP instruction is similar to the branch instruction. If the
condition code test for the LP instruction returns false, then a branch occurs to
the address specified in the LP instruction. If the condition code test is true, then
the address of the next instruction is loaded into the LP_START register and the
LP_END register is loaded with the address defined in the LP instruction.
The condition codes are tested in stage 2, like branch, and there is the same delay
slot nullify instruction mode. A single cycle stall will occur if a conditional loop
is immediately preceded by an instruction that sets the flags.
For example, the loop is executed:
main:
AND.F r1,r2,r3 ; clears zero flag
MOV r5,r7
LPNE.D loop1
BIC r8,r9,r10
SUB r14,r12,r13
loop1:
OR r1,r2,r3

cycle      t          t+1        t+2        t+3        t+4        t+5        t+6
AND.F      ifetch     r2,r3      update     r1
                                 flags
MOV                   ifetch     r7         MOV        r5
LPNE.D                           ifetch     test flags no action  no action
                                            update
                                            registers
BIC        delay slot →                     ifetch     r9,r10     BIC        r8
SUB                                                    ifetch     r12,r13    SUB
currentpc  main       main+1     main+2     main+3     main+4
next_pc    main+1     main+2     main+3     main+4
LP_START                         not set    updated    main+3     main+3     main+3
LP_END                           not set    updated    loop1      loop1      loop1
flags set by                     AND        MOV        not set    not set    not set



If the condition code result is false, then a jump to the relative address is taken
and the instruction in the delay slot is executed according to the nullify instruction
mode, as shown in the following:
main:
AND.F r1,r2,r3 ; sets zero flag
MOV r5,r7
LPNE.D loop1
BIC r8,r9,r10
SUB r14,r12,r13
loop1:
OR r1,r2,r3

cycle      t          t+1        t+2        t+3        t+4        t+5        t+6
AND.F      ifetch     r2,r3      update     r1
                                 flags
MOV                   ifetch     r7         MOV        r5
LPNE.D                           ifetch     test flags no action  no action
                                            update PC
BIC        delay slot →                     ifetch     r9,r10     BIC        r8
OR                                                     ifetch     r2,r3      OR
currentpc  main       main+1     main+2     loop1      loop1+1
next_pc    main+1     main+2     main+3     main+4     loop1+2
LP_START                         not set    not set    not set    not set    not set
LP_END                           not set    not set    not set    not set    not set
flags set by                     AND        MOV        not set    not set    not set

Loop execution
The operation of the loop is such that the PC+1 is constantly compared with the
value LP_END. If the comparison is true, then LP_COUNT is tested. If
LP_COUNT is not equal to 1, then the PC is loaded with the contents of
LP_START, and LP_COUNT is decremented. If, however, LP_COUNT is 1,
then the PC is allowed to increment normally and LP_COUNT is decremented.



main:
AND.F r1,r2,r3
MOV r5,r7
LP loop1
BIC r8,r9,r10
SUB r14,r12,r13
loop1:
OR r1,r2,r3

cycle      t          t+1        t+2        t+3        t+4        t+5        t+6
BIC        ifetch     r9,r10     BIC        r8
SUB                   ifetch     r12,r13    SUB        r14
BIC                              ifetch     r9,r10     BIC        r8
SUB                                         ifetch     r12,r13    SUB        r14
OR                                                     ifetch     r2,r3      OR
currentpc  main+3     main+4     main+3     main+4     loop1
PC+1       main+4     loop1      main+4     loop1      loop1+1
LP_COUNT   2          2→1        1          1→0        0
LP_START   main+3     main+3     main+3     main+3     main+3
LP_END     loop1      loop1      loop1      loop1      loop1

Single instruction loops


Single instruction loops cannot be set up with the LP instruction. The LP
instruction can set up loops with 2 or more instructions in them. However, it is
possible to set up a single instruction loop with the use of the LR and SR
instructions.
If an attempt is made to set up a single instruction loop with the LP instruction, then
the instruction in the loop (OR) will be executed once and then the code
following the loop (ADD) will be executed as normal. The LP_START and
LP_END registers will be updated by the time the instruction after the attempted
loop (ADD) is being fetched, which is, however, too late for the loop mechanism.
main:
LP loop_end ; this will execute only once
loop_in: OR r21,r22,r23 ; single instruction in loop
loop_end:
ADD r19,r19,r20 ; first instruction after loop



cycle      t          t+1        t+2        t+3        t+4        t+5        t+6
LP         ifetch     update     no action  no action
                      loop
                      registers
OR                    ifetch     r22,r23    OR         r21
ADD                              ifetch     r19,r20    ADD        r19
currentpc  main       main+1     main+2     main+3     main+4
LP_START   previous   previous   loop_in    loop_in    loop_in
LP_END     previous   previous   loop_end   loop_end   loop_end

If the user wishes to have single instruction loops, then the following code can be
used. Notice, there has to be a delay to allow the loop start and loop end registers
to be updated with the SR instruction.
MOV LP_COUNT,5 ; no. of times to do loop
MOV r0,dooploop>>2 ; convert to long-word size
ADD r1,r0,1 ; add 1 to dooploop address
main: SR r0,[LP_START] ; set up loop start register
SR r1,[LP_END] ; set up loop end register
NOP ; allow time to update regs
NOP
dooploop: OR r21,r22,r23 ; single instruction in loop
ADD r19,r19,r20 ; first instruction after loop

cycle      t          t+1        t+2        t+3        t+4        t+5        t+6
SR         ifetch     r0         update     no action
                                 loop start
SR                    ifetch     r1         update     no action
                                            loop end
NOP                              ifetch     NOP        NOP        NOP
NOP                                         ifetch     NOP        NOP        NOP
OR                                                     ifetch     r22,r23    OR
OR                                                                ifetch     r22,r23
currentpc  main       main+1     main+2     main+3     main+4     dooploop   dooploop
PC+1       main+1     main+2     main+3     main+4     dooploop+1 dooploop+1 dooploop+1
LP_START   previous   previous   previous   dooploop   dooploop   dooploop   dooploop
LP_END     previous   previous   previous   previous   dooploop+1 dooploop+1 dooploop+1



Reading loop count register
The loop count register, unlike other core registers, has short-cutting disabled.
This means that there must be at least 2 instructions (actually 2 cycles) between
an instruction writing LP_COUNT and one reading LP_COUNT.
MOV LP_COUNT,r0 ; update loop count register
MOV r1,LP_COUNT ; old value of LP_COUNT
MOV r1,LP_COUNT ; old value of LP_COUNT
MOV r1,LP_COUNT ; new value of LP_COUNT

cycle      t          t+1        t+2        t+3        t+4        t+5        t+6
MOV        ifetch     r0         MOV        loop count
MOV                   ifetch     loop count MOV        r1
MOV                              ifetch     loop count MOV        r1
MOV                                         ifetch     loop count MOV        r1
LP_COUNT   previous   previous   previous   update     new value
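The rule for reading LP_COUNT after a write can be summarised in a short sketch. It assumes single-cycle instructions with no stalls between the write and the read; lp_count_read is a hypothetical helper used only for this illustration.

```python
def lp_count_read(old, new, distance):
    """Value seen by an instruction that reads LP_COUNT `distance`
    instructions after the one that wrote it (1 = the very next one).

    With short-cutting disabled, the new value is only visible once the
    writing instruction has completed write-back, so at least two
    instructions must separate the write from the read.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    return new if distance >= 3 else old


# MOV LP_COUNT,r0 followed by three MOV r1,LP_COUNT instructions,
# with an old count of 7 and a new count of 5.
seen = [lp_count_read(old=7, new=5, distance=d) for d in (1, 2, 3)]
print(seen)  # [7, 7, 5]
```

The first two reads return the old value and the third returns the new one, matching the comments in the code example above.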
When reading from the loop count register (LP_COUNT) the user must be aware
that the value returned is that value of the counter that applies to the next
instruction to be executed. This means that if the last instruction in a loop reads
LP_COUNT then the value returned would be that value after the loop
mechanism has updated it.
...
AND.F 0,0,LP_COUNT ; loop count for this iteration
AND.F 0,0,LP_COUNT ; loop count for next iteration
loop_end:
ADD r19,r19,r20 ; first instruction after loop



cycle      t          t+1        t+2        t+3        t+4        t+5        t+6
AND.F      ifetch     loop count AND        no action
AND.F                 ifetch     loop count AND        no action
currentpc  loop_end-2 loop_end-1 loop_in
PC+1       loop_end-1 loop_end   loop_in+1
LP_END     loop_end   loop_end   loop_end
LP_COUNT   previous   previous   new value

Writing loop count register


In order for the loop mechanism to work properly, the loop count register must
be set up with at least 3 instructions (actually 3 cycles) between it and the last
instruction in the loop. In the following example, the MOV instruction will
override the loop mechanism and the loop will be executed one more time than
expected. The MOV instruction must be followed by a NOP for correct
execution.
main:
MOV LP_COUNT,r0 ; do loop r0 times (flags not set)
LPZ loop_end ; if zero flag set jump to loop_end
loop_in: OR r21,r22,r23 ; first instruction in loop
AND 0,r21,23 ; last instruction in loop
loop_end:
ADD r19,r19,r20 ; first instruction after loop



cycle      t          t+1        t+2        t+3        t+4        t+5        t+6
MOV        ifetch     r0         MOV        loop count
LPZ                   ifetch     update     no action  no action
                                 loop
                                 registers
OR                               ifetch     r22,r23    OR         r21
AND                                         ifetch     r21,23     AND        killed
OR                                                     ifetch     r22,r23    OR
currentpc  main       main+1     main+2     main+3     loop_in    loop_end-1 loop_in
PC+1       main+1     main+2     main+3     loop_end   loop_end-1 loop_end   loop_end-1
LP_START   previous   previous   previous   loop_in    loop_in    loop_in    loop_in
LP_END     previous   previous   previous   loop_end   loop_end   loop_end   loop_end
LP_COUNT   previous   previous   previous   override   r0         r0         r0-1

The loop count register is set up correctly in the following:


MOV LP_COUNT,r0 ; do loop r0 times (flags not set)
NOP ; allow time for loop count set up
LPZ loop_end ; if zero flag set jump to loop_end
loop_in: OR r21,r22,r23 ; first instruction in loop
AND 0,r21,23 ; last instruction in loop
loop_end:
ADD r19,r19,r20 ; first instruction after loop



          t           t+1         t+2         t+3         t+4         t+5         t+6
MOV       ifetch      r0          MOV         loop count
NOP                   ifetch      NOP         NOP         NOP
LPZ                               ifetch      update loop no action   no action
                                              registers
OR                                            ifetch      r22,r23     OR          r21
AND                                                       ifetch      r21,23      AND
OR                                                                    ifetch      r19,r20

currentpc main        main+1      main+2      main+3      main+4      loop_in     loop_end-1
PC+1      main+1      main+2      main+3      main+4      loop_end    loop_end-1  loop_end
LP_START  previous    previous    previous    previous    loop_in     loop_in     loop_in
LP_END    previous    previous    previous    previous    loop_end    loop_end    loop_end
LP_COUNT  previous    previous    previous    update      r0          r0-1        r0-1
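
The spacing rule illustrated by the two examples above can be checked mechanically. The following Python sketch is illustrative only and is not part of the ARCtangent-A4 tools: the function name lp_count_gap, the constant MIN_GAP and the list-of-strings program representation are assumptions made for this example, and only MOV writes to LP_COUNT are recognised.

```python
MIN_GAP = 3  # instructions (cycles) needed between the LP_COUNT write and the last loop instruction


def lp_count_gap(program, last_in_loop):
    """Count the instructions between the LP_COUNT write and the last loop instruction.

    `program` is a list of instruction strings and `last_in_loop` is the index
    of the final instruction inside the loop body. Returns None when no
    MOV to LP_COUNT precedes that instruction.
    """
    writes = [
        index
        for index, text in enumerate(program[:last_in_loop])
        if text.split()[0] == "MOV" and text.split()[1].startswith("LP_COUNT")
    ]
    if not writes:
        return None
    return last_in_loop - writes[-1] - 1


# First example above: only LPZ and OR separate the MOV from the AND.
unsafe = ["MOV LP_COUNT,r0", "LPZ loop_end", "OR r21,r22,r23", "AND 0,r21,23"]
# Second example above: the NOP adds the missing cycle.
safe = ["MOV LP_COUNT,r0", "NOP", "LPZ loop_end", "OR r21,r22,r23", "AND 0,r21,23"]

print(lp_count_gap(unsafe, 3) >= MIN_GAP)  # False
print(lp_count_gap(safe, 4) >= MIN_GAP)    # True
```

The check counts instructions rather than cycles, so it is conservative only when every intervening instruction takes a single cycle.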

Branch and jumps in loops


Jumps or branches without linking will work correctly in any position in the loop.
There are, however, some side effects when a branch or jump is the last
instruction in a loop:
Firstly, it is possible that the branch or jump instruction is contained in the very
last long-word position in the loop. This means that the instruction in the delay
slot would be either the first instruction after the loop or the first instruction in
the loop (pointed to by loop start register) depending on the result of the loop
mechanism. The instruction in the delay slot will be that which would be
executed if the branch or jump was replaced by a NOP.
If a branch-and-link or jump-and-link instruction is used in the one before last
long-word position in a loop then the return address stored in the link register
(BLINK) may contain the wrong value. The following instructions will store the
address of the first instruction after the loop, and therefore should not be used in
the second to last position:
BLcc.D address
BLcc.JD address
JLcc.D [Rn]
JLcc.JD [Rn]
JLcc address
If the ND delay slot execution mode is used for a branch-and-link or
jump-and-link instruction in the second-to-last long-word position in a loop,
then the return address is stored correctly in the link register. The loop count
does not decrement if the instruction fetched was subsequently killed as the
result of a branch/jump operation. For these reasons, it is recommended that
subroutine calls not be used within the loop mechanism.
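
The list of call forms above can be expressed as a small predicate. This Python sketch is only an illustration of the rule as stated in this section; the function name blink_at_risk and its parameters are hypothetical, and the caller is assumed to know the instruction kind, the delay slot mode and whether the target is a long immediate address.

```python
def blink_at_risk(kind, delay_mode, limm_target=False):
    """Return True if this call form is listed above as storing the wrong BLINK value.

    kind        -- "BL" (branch-and-link) or "JL" (jump-and-link)
    delay_mode  -- "ND", "D" or "JD"
    limm_target -- True for the "JLcc address" form (target given as an address)

    The result applies only when the instruction sits in the second-to-last
    long-word position of a loop.
    """
    if kind not in ("BL", "JL"):
        raise ValueError("kind must be 'BL' or 'JL'")
    if delay_mode not in ("ND", "D", "JD"):
        raise ValueError("delay_mode must be 'ND', 'D' or 'JD'")
    if delay_mode in ("D", "JD"):
        return True                      # BLcc.D, BLcc.JD, JLcc.D, JLcc.JD
    return kind == "JL" and limm_target  # JLcc address


print(blink_at_risk("BL", "D"))                     # True
print(blink_at_risk("JL", "ND", limm_target=True))  # True
print(blink_at_risk("BL", "ND"))                    # False
```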

Software breakpoints in loops


A software breakpoint is implemented by the use of the branch instruction, Bcc.
The action of a software breakpoint is to branch to the breakpoint code
whereupon the appropriate action will be taken according to the debugging
session, for example, write a value to a register and halt the ARCtangent-A4
processor. The loop count does not decrement if the instruction fetched was
subsequently killed as the result of a branch/jump operation. Therefore, since the
software breakpoint is BRA.ND by default, the loop counter will not
decrement on exit from the loop. On return to the loop, the second fetch of the last
instruction in the loop will cause the loop counter to decrement as normal.

Instructions with long immediate data


It is difficult, but nonetheless possible, for an instruction that uses long
immediate data to be contained in the very last long-word position in the loop.
This means that the long immediate data would be taken either from the first
location after the loop or from the first location in the loop (pointed to by the
loop start register), depending on the result of the loop mechanism. It is unlikely
that this would occur with sensible coding, but the following example shows how
it could be done.
MOV r1,limmloop>>2 ; convert to long-word size
ADD r1,r1,1 ; add 1 to limmloop address
SR r1,[LP_END] ; set up loop end register
NOP ; allow time to update reg
NOP
limmloop: OR r21,r22,2048 ; instruction across loop end
ADD r19,r19,r20 ;
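
The address arithmetic in the example above can be followed with concrete numbers. The Python sketch below mirrors the MOV and ADD instructions only; the function name and the byte address 0x100 chosen for limmloop are assumptions made for illustration.

```python
def lp_end_after_first_word(instruction_byte_address):
    """Mirror 'MOV r1,limmloop>>2' followed by 'ADD r1,r1,1'.

    Converts a byte address to a long-word address and adds one, giving the
    long-word address immediately after the instruction word.
    """
    return (instruction_byte_address >> 2) + 1


limmloop = 0x100                      # hypothetical byte address of the OR instruction
lp_end = lp_end_after_first_word(limmloop)
instruction_word = limmloop >> 2      # long-word address holding the OR instruction
limm_word = instruction_word + 1      # long-word address holding the value 2048

# LP_END coincides with the long immediate word, so the loop ends between
# the instruction and its long immediate data.
print(hex(lp_end), hex(limm_word))    # 0x41 0x41
```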

Flag Instruction Timings
The flag instruction has very similar timing to arithmetic and logic instructions.
However, since the flag instruction can halt the ARCtangent-A4 processor, the
pipeline is halted with the following instruction in stage 3. If only the H bit is set
then the other flags are unchanged. See the example below:
main:
FLAG 1 ; halt the ARCtangent-A4
OR r21,r22,r23 ;
AND r1,r2,r3 ;
XOR r5,r6,r7 ;

                                              halted
          t           t+1         t+2         t+3         t+4         t+5         t+6
FLAG      ifetch      1           FLAG        no action   no action   no action
OR                    ifetch      r22,r23     OR          OR          OR
AND                               ifetch      r2,r3       r2,r3       r2,r3
XOR                                           ifetch      ifetch      ifetch

currentpc main        main+1      main+2      main+3      main+3      main+3
next_pc   main+1      main+2      main+3      main+4      main+4      main+4
FLAGS     previous    previous    previous    previous    previous    previous
H         0           0           0           1           1           1

Breakpoint
The breakpoint instruction is decoded in stage one of the ARCtangent-A4
pipeline, and the remaining stages are allowed to complete, effectively flushing
the pipeline.
The BRK instruction stops any further instructions entering the pipeline. To
resume execution, the host will read the program counter (frozen at t+1, below),
rewrite the current (BRK) memory location with the required instruction, invalidate
the cache (if implemented) and then restart at that memory location.

Stage 1
Decode the BRK instruction. Set the BH bit.
Stage 2
No action
Stage 3
Update the H bit, pipeline halted
Stage 4
No action.
main:
ADD r0,r1,r2 ;
BRK ;
SUB r3,r4,r5 ;

          t           t+1         t+2         t+3         t+4         t+5         t+6
ADD       ifetch      r1,r2       ADD         r0
BRK                   ifetch      no action   BRK         halted      halted
                                  Stalled     no action   no action   no action
                                              Stalled     no action   no action
SUB       Not fetched

currentpc main        main+1      main+1      main+1      main+1      main+1
next_pc   main+1      main+2      main+2      main+2      main+2      main+2
FLAGS     previous    previous    previous    previous    previous    previous
H         0           0           0           0           1           1
BH        0           0           1           1           1           1
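
The host-side resume sequence described at the start of this section (read the PC, rewrite the BRK location, invalidate the cache, restart) can be sketched as follows. This Python fragment is a hedged illustration only: FakeTarget is an in-memory stand-in invented for this example, not a real host interface, and the instruction words are placeholder values rather than real encodings.

```python
class FakeTarget:
    """In-memory stand-in for a halted target; not a real debugger interface."""

    def __init__(self, memory, pc):
        self.memory = dict(memory)   # long-word address -> instruction word
        self.pc = pc                 # program counter frozen at the BRK location
        self.running = False
        self.cache_valid = True

    def read_pc(self):
        return self.pc

    def write_memory(self, address, word):
        self.memory[address] = word

    def invalidate_cache(self):
        self.cache_valid = False

    def restart_at(self, address):
        self.pc = address
        self.running = True


def resume_from_brk(target, required_instruction, has_cache=True):
    """Follow the four host steps listed in the text, in order."""
    brk_address = target.read_pc()                           # 1. read the frozen PC
    target.write_memory(brk_address, required_instruction)   # 2. rewrite the BRK location
    if has_cache:
        target.invalidate_cache()                            # 3. invalidate the cache, if any
    target.restart_at(brk_address)                           # 4. restart at that location
    return brk_address


BRK_WORD = 0xB0B0B0B0   # placeholder value, not the real BRK encoding
ADD_WORD = 0xA0A0A0A0   # placeholder for the instruction that BRK replaced
target = FakeTarget({0x41: BRK_WORD}, pc=0x41)
resumed_at = resume_from_brk(target, ADD_WORD)
```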

Sleep Mode
The SLEEP instruction is decoded at stage 2 of the ARCtangent-A4 pipeline.
When SLEEP reaches stage 2 the earlier instructions and the SLEEP instruction
itself are flushed from the pipe and the processor is then put into sleep mode.
The instruction following the SLEEP enters stage 1 and stays there, until the
ARCtangent-A4 processor is "woken up" from sleep mode.

Stage 1
Instruction fetch and start decode.
Stage 2
Full decode of sleep instruction. Flush pipeline. Update ZZ bit.
Stage 3
No action.
Stage 4
No action
main:
ADD r0,r1,r2 ;
SLEEP ;
SUB r3,r4,r5 ;

          t           t+1         t+2         t+3         t+4         t+5         t+6
ADD       ifetch      r1,r2       ADD         r0
SLEEP                 ifetch      SLEEP       no action   no action   no action
SUB                               ifetch      no action   no action   no action
…                                             no action   no action   no action

currentpc main        main+1      main+2      main+2      main+2      main+2
next_pc   main+1      main+2      main+3      main+3      main+3      main+3
FLAGS     previous    previous    previous    previous    previous    previous
H         0           0           0           0           0           0
ZZ        0           0           0           1           1           1

On interrupt wake-up, the interrupt mechanism comes into play. The instruction
following SLEEP is replaced by a call to the interrupt service routine. The
address of the instruction is copied into the appropriate ILINK register. See
interrupt timings for further details.
On host wake-up, the processor is simply restarted by re-writing the PC with the
address of the instruction following the SLEEP with the H bit cleared.
On single-instruction-step, the ARCtangent-A4 processor "wakes" from sleep
mode. See single instruction step timings for further details.

Load and Store Timings
Loads and stores use the ALU in stage 3 to calculate the address with which the
access is to occur.

Load
The stages perform the following in a load instruction:
Stage 1
Instruction fetch and start decode
Stage 2
Fetch operands.
Update scoreboard unit with destination address marked as invalid.
Stage 3
Add operands to form address.
Request load from memory controller.
Stage 4
If address write-back enabled then write-back address calculation to first operand
register.
If address write-back disabled then allow pipeline to continue because data is
unlikely to be ready.
also
Stage 4
Re-enabled when data ready from memory controller, pipeline held for one
cycle.
Update scoreboard unit marking register as valid.
When a register is waiting to be updated by a previous load and that register is
one of the operands or results of an instruction in the pipeline at stage 2 then the
pipeline is halted until that register is updated.
A scoreboard unit is used to retain the information on which registers are waiting
to be written. The scoreboard unit is updated at stage 2 when the destination
register address is known, and updated at stage 4 when the register has been
written to.
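
The scoreboard behaviour described above can be modelled in a few lines. The following Python sketch is a simplified illustration, not a description of the hardware implementation: the class and method names are assumptions, and registers are represented by their names as strings.

```python
class Scoreboard:
    """Tracks which registers are waiting to be written by a delayed load."""

    def __init__(self):
        self.pending = set()

    def issue_load(self, dest):
        """Stage 2 of a load: mark the destination register as invalid."""
        self.pending.add(dest)

    def complete_load(self, dest):
        """Stage 4 write-back of returning load data: mark the register as valid."""
        self.pending.discard(dest)

    def must_stall(self, operands, result=None):
        """Stage 2 check: stall if any operand or the result register is pending."""
        registers = set(operands)
        if result is not None:
            registers.add(result)
        return bool(registers & self.pending)


board = Scoreboard()
board.issue_load("r1")                                    # LD  r1,[r2,r3]
print(board.must_stall(["r5", "r6"], result="r4"))        # AND r4,r5,r6 -> False
print(board.must_stall(["r1", "r6"], result="r4"))        # AND r4,r1,r6 -> True
board.complete_load("r1")
print(board.must_stall(["r1", "r6"], result="r4"))        # False once r1 is valid
```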

The load is sometimes called a delayed load because the data from the load is not
guaranteed to be returned by the time the load instruction has reached stage 4 in
the pipeline. The pipeline is allowed to proceed if the instruction following the
load does not use the destination register of the load (this is checked by the
instruction in stage 2). Once an instruction does need that register then the
pipeline is halted and waits for the load to complete.
NOTE When the target of a LD.A instruction is the same register as the one used for
address write-back (.A), the returning load will overwrite the value from the
address write-back.

When the data for the delayed load is ready, the pipeline is stalled because the
load uses the write-back in stage 4 to update the register. In this example, the
load is delayed by two cycles. The OR instruction is stalled in stage 3 and the
SUB stalled in stage 2 until the register write-back is complete.
main:
LD r1,[r2,r3]
AND r4,r5,r6
OR r7,r8,r9
SUB r10,r11,r12

          t           t+1         t+2         t+3         t+4         t+5         t+6
LD        ifetch      r2,r3       calc addr   killed                  wrt-back
AND                   ifetch      r5,r6       AND         r4
OR                                ifetch      r8,r9       OR          stalled     r7
SUB                                           ifetch      r11,r12     stalled     SUB

score                 mark as     check       check       check       r1 now
board                 pending                                         ready

If the AND used a register that was dependent on the result of the load then the
AND would stall.
For example, with a dependency on R1:
main:
LD r1,[r2,r3]
AND r4,r1,r6
OR r7,r8,r9
SUB r10,r11,r12

          t           t+1         t+2         t+3         t+4         t+5         t+6
LD        ifetch      r2,r3       calc addr   killed                  wrt-back
AND                   ifetch      r1,r6       stalled     stalled     alu op      wrt-back
OR                                ifetch      stalled     stalled     r8,r9
SUB                                                                   ifetch      r11,r12

score                 mark as     check,      -           -           r1 now
board                 pending     causes stall                        ready

Store
The store instruction takes a single cycle to complete. The data to be stored is
ready at stage 2 and the address to which the store is to occur is ready at stage 3.
Stage 1
Instruction fetch and start decode.
Stage 2
Fetch 2 address operands and data operand.
Latch data operand for memory controller.
Stage 3
Add address operand to form address.
Request store to memory controller.
Stage 4
No action
main:
ST r1,[r2,333]
AND r4,r5,r6
OR r7,r8,r9
SUB r10,r11,r12

          t           t+1         t+2         t+3         t+4         t+5         t+6
ST        ifetch      r2,r3,      calc addr   killed
                      shimm
AND                   ifetch      r5,r6       AND         r4
OR                                ifetch      r8,r9       OR          r7
SUB                                           ifetch      r11,r12     SUB         r10

Auxiliary Register Access


Accesses to the auxiliary registers work in a similar way to the normal load and
store instructions, except that the access is accomplished in a single cycle because
address computation is not carried out and the scoreboard unit is not used. The
LR and SR instructions do not cause stalls in the way that the normal load and
store instructions do; they stall only in the same cases in which arithmetic and
logic instructions would cause a stall.

Load from register (LR)


The stages perform the following in a LR instruction:
Stage 1
Instruction fetch and start decode.
Stage 2
Fetch address from operand 1.
Stage 3
Perform load from auxiliary register at address
Stage 4
Write-back the result of the load to the destination register.
main:
LR r1,[r2]
AND r4,r5,r6
OR r7,r8,r9
SUB r10,r11,r12

          t           t+1         t+2         t+3         t+4         t+5
LR        ifetch      r2          LR          wb to r1
AND                   ifetch      r5,r6       AND         r4
OR                                ifetch      r8,r9       OR          r7
SUB                                           ifetch      r11,r12     SUB

Store to register (SR)


The stages perform the following in a SR instruction:
Stage 1
Instruction fetch and start decode.
Stage 2
Fetch address from operand 1 and data from operand 2.
Stage 3
Perform store of data to auxiliary register at address extracted from operand 1.
Stage 4
No action.
main:
SR r1,[r2]
AND r4,r5,r6
OR r7,r8,r9
SUB r10,r11,r12

          t           t+1         t+2         t+3         t+4         t+5
SR        ifetch      r2, r1      SR          killed
AND                   ifetch      r5,r6       AND         r4
OR                                ifetch      r8,r9       OR          r7
SUB                                           ifetch      r11,r12     SUB

Interrupt Timings

Interrupts occur in a similar way to the branch and link instruction. However, the
value that is latched into the link register is the CURRENT PC rather than
NEXT_PC.
When an interrupt occurs, the instruction in instruction fetch at stage 1 is
replaced by a call to the interrupt service routine.
NOTE Interrupts are not allowed to interrupt anything in a delay slot or a fetch of long
immediate data.

Stage 1
Current instruction in ifetch is replaced by a branch-like instruction.
The CURRENT PC is not updated to NEXT_PC.
Stage 2
CURRENT PC is routed to the data for next stage.
CURRENT PC is updated to the interrupt vector.
Stage 3
The data from stage 2 is passed to stage 4
Stage 4
The data is written to the ILINK register (the PC from stage 1)
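
The difference between an interrupt and a branch-and-link, as stated at the start of this section, is which program counter value is latched into the link register. The Python sketch below models only that one difference; the function name and the numeric long-word addresses are hypothetical.

```python
def link_value(current_pc, next_pc, is_interrupt):
    """Return the value latched into the link register.

    An interrupt latches CURRENT PC, because the instruction in stage 1 was
    replaced and must be fetched again on return. A branch-and-link latches
    NEXT_PC, as described in the text.
    """
    return current_pc if is_interrupt else next_pc


main = 0x100                                  # hypothetical long-word address of 'main'
print(link_value(main + 2, main + 3, True) == main + 2)    # True: ILINK gets CURRENT PC
print(link_value(main + 2, main + 3, False) == main + 3)   # True: link gets NEXT_PC
```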

Interrupt on arithmetic instruction
main:
AND.F r1,r2,r3
MOV.F r5,r7
BIC r8,r9,r10 ;<---- level 2 Interrupt to ivect7
SUB r14,r12,r13
...
ivect6:
JAL service6
ivect7:
OR r15,r16,r17


          t            t+1          t+2          t+3          t+4          t+5          t+6
AND.F     ifetch       r2,r3        update flags r1
MOV.F                  ifetch       r7           update flags r5
interrupt                           interrupt →  update PC    pass old PC  write back
                                                              through      ILINK2
BIC                                 replaced
BIC again                           delay slot → ifetch       killed       killed
OR                                                            ifetch       r16,r17      OR

currentpc main         main+1       main+2       main+2       ivect7       ivect7+1
next_pc   main+1       main+2       main+3       ivect7       ivect7+1     ivect7+2
ILINK2                              not set      not set      not set      main+2

Software interrupt
The software interrupt instruction is decoded in stage two of the pipeline and, if
executed, immediately raises the instruction error exception. In this example,
program execution resumes at the instruction error vector, which contains a jump
to the instruction error service routine.
main:
AND.F r1,r2,r3
SWI
BIC r8,r9,r10 ;<---- instruction error exception
SUB r14,r12,r13
...
ins_err:
JAL instr_serv
...
instr_serv:
OR r15,r16,r17


          t            t+1          t+2          t+3          t+4          t+5          t+6
AND.F     ifetch       r2,r3        update flags r1
SWI                    ifetch       SWI          killed       killed
interrupt                           interrupt →  update PC    pass old PC  write back
                                                              through      ILINK2
BIC                                 replaced
BIC again                           delay slot → ifetch       killed       killed
JAL                                                           ifetch       JAL
limm                                                                       limm         disabled
OR                                                                                      ifetch

currentpc main         main+1       main+2       main+2       ins_err      ins_err+1    instr_serv
next_pc   main+1       main+2       main+3       ins_err      ins_err+1    ins_err+2    instr_serv+1
ILINK2                              not set      not set      not set      main+2       main+2

Interrupt on jump, branch or loop set up
Because interrupts are locked out during delay slot execution and during a jump,
branch, loop set-up or long immediate data fetch, the instruction will either be
killed because it is in stage 1 or allowed to continue because there is a delay slot
in use.
main:
MOV r5,r7 ;r7 = jaddr
JAL.D [r5] ;<---- level 2 Interrupt to ivect7
SUB r14,r12,r13
...
jaddr:
ADD r20,r21,r22
...
ivect6:
JAL service6
ivect7:
OR r15,r16,r17

                       (↓           ↓)           ↓
          t            t+1          t+2          t+3          t+4          t+5          t+6
MOV       ifetch       r7           MOV          r5
JAL.D                  ifetch       update PC    no action    no action
SUB                    delay slot → ifetch       r12,r13      SUB          r14
interrupt                                        interrupt →  update PC    pass old PC  write back
                                                                           through      ILINK2
ADD                                              replaced
ADD again                                        delay slot → ifetch       killed       killed
OR                                                                         ifetch       r16,r17

currentpc main         main+1       main+2       jaddr        jaddr        ivect7
next_pc   main+1       main+2       main+3       jaddr+1      ivect7       ivect7+1
ILINK2                              not set      not set      not set      not set      jaddr

Interrupt on loop execution
At the end of a loop the NEXT_PC is compared with the LP_END value. If this
comparison is true and LP_COUNT is not 1, then CURRENT PC becomes
LP_START. When an interrupt occurs during this comparison-update stage the
link register (ILINK2) becomes CURRENT PC. In order to stop LP_COUNT
from double decrementing, the loop count decrement mechanism must be
disabled during interrupt for 2 cycles.
main:
AND.F r1,r2,r3
MOV r5,r7
LP loop1
BIC r8,r9,r10
SUB r14,r12,r13;<------ level 2 interrupt to ivect7
loop1:
OR r1,r2,r3
...
ivect6:
JAL service6
ivect7:
ADD r15,r16,r17


          t            t+1          t+2          t+3          t+4          t+5          t+6
BIC       ifetch       r9,r10       BIC          r8
interrupt              interrupt →  update PC    pass old PC  write back
                                                 through      ILINK2
SUB                    replaced
SUB again              delay slot → ifetch       killed       killed       killed
ADD                                              ifetch       r16,r17      ADD          r15
                                                              ifetch

currentpc main+3       main+4       main+4       ivect7
next_pc   main+4       loop1        ivect7       ivect7+1
LP_COUNT  2            disabled     disabled     2            2
LP_START  main+3       main+3       main+3       main+3       main+3
LP_END    loop1        loop1        loop1        loop1        loop1
ILINK2                                                        main+4
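
The end-of-loop comparison described at the start of this section can be modelled as below. This Python sketch ignores interrupts and pipeline timing; the function names and numeric long-word addresses are assumptions, and the decrement of LP_COUNT on every end-of-loop match (including the final one) is an assumption of this model rather than something stated in this section. Counter wrap-around is not modelled.

```python
def loop_step(next_pc, lp_start, lp_end, lp_count):
    """Return (pc_to_fetch, new_lp_count) after the loop-end comparison."""
    if next_pc != lp_end:
        return next_pc, lp_count            # not at the end of the loop
    # NEXT_PC equals LP_END: loop back unless this was the final pass.
    target = lp_start if lp_count != 1 else next_pc
    return target, lp_count - 1             # decrement assumed on every match


def run_loop(lp_start, lp_end, lp_count):
    """Return the sequence of addresses fetched from inside the loop."""
    pc, fetched = lp_start, []
    while pc != lp_end:
        fetched.append(pc)
        pc, lp_count = loop_step(pc + 1, lp_start, lp_end, lp_count)
    return fetched


# Two-instruction loop body at long-word addresses 3 and 4, as in the
# example above (BIC at main+3, SUB at main+4, loop1 at main+5), LP_COUNT = 2.
print(run_loop(lp_start=3, lp_end=5, lp_count=2))   # [3, 4, 3, 4]
```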

Interrupt on load
The load instruction is treated in the same way as an arithmetic or logic
instruction. However, when the result of a delayed load is ready to be written
back to the core register set, the load mechanism will stall whatever is about to
use the write-back at stage 4. See Load and Store Timings. If the interrupt is
about to write to the link register, then it too will be stalled by the write-back
from a delayed load.

Interrupt on store
The store instruction is treated in the same way as an arithmetic or logic
instruction.

Interrupt on auxiliary register access


The auxiliary register access instructions LR and SR will be interrupted in the
same way as arithmetic or logic instructions.

Single Instruction Step


Single cycle step simply enables the pipeline for one clock. Single instruction
step, however, enables the ARCtangent-A4 pipeline until the instruction that was
being fetched in stage 1 completes.

Single instruction step on single word instructions


The instruction is enabled through the 4 stages of the ARCtangent-A4 pipeline.
The following instruction is fetched but held in stage 1.
main:
AND r1,r2,r3
OR r5,r6,r4
BIC r8,r9,r10
SUB r14,r12,r13

          i-step                                       i-step
          ↓                                            ↓
          t        t+1      t+2      t+3      t+4      t+5      t+6      t+7      t+8      t+9
AND       ifetch   r2, r3   AND      r1       halted
OR                 ifetch   ifetch   ifetch   ifetch   ifetch   r6, r4   OR       r5       halted
BIC                                                             ifetch   ifetch   ifetch   ifetch

H         0        0        0        0        1        0        0        0        0        1

Single instruction step on instruction with long immediate data
The instruction is enabled through the 4 stages of the ARCtangent-A4 pipeline.
The following immediate data is fetched and allowed through the pipeline. The
following instruction is fetched but held in stage 1.
main:
AND r1,r2,0x102000
OR r5,r6,r4
BIC r8,r9,r10
SUB r14,r12,r13

          i-step                                       i-step
          ↓                                            ↓
          t        t+1      t+2      t+3      t+4      t+5      t+6      t+7      t+8      t+9
AND       ifetch   r2, r3   AND      r1       halted
0x102000           102000   disabled disabled disabled
OR                          ifetch   ifetch   ifetch   ifetch   r6, r4   OR       r5       halted
BIC                                                             ifetch   ifetch   ifetch   ifetch

H         0        0        0        0        1        0        0        0        0        1

Index
A
A field, 19
ADC, 80
ADD, 81
addressing mode, 16, 71
alternate interrupt unit, 11, 27
alternate load store unit, 11
AND, 82
ARC basecase version number, 65
arithmetic operations, 29, 49
ASL multiple, 84
ASL/LSL, 83
ASR, 85
ASR multiple, 86
auxiliary register set, 5, 139
auxiliary registers, 48

B
B field, 19
barrel shift instructions, 51, 147, 150
Bcc, 88
BH bit, 66
BIC, 87
BLcc, 89
branch address calculation, 158
branch and jump in loops, 39, 171
branch type instruction, 72, 77
branches, 33
breakpoint instruction, 43, 66, 139, 173
BRK, 91
byte, 16

C
C field, 19
code profiling, 139
condition code field, 19
condition code register, 55
condition codes, 75

D
data organisation, 15
data-cache, 11, 47, 99, 126
debug register, 65
delay slot, 34
delayed load, 11, 47, 177
direct memory mode, 47, 99, 100, 127
dual access registers, 139
dual operand instruction, 71

E
encoding immediate data, 61
encoding instructions, 75
endianness, 16
EXT, 92
extensions, 9
    auxiliary registers, 9
    condition codes, 10
    core register, 9
    instruction set, 10
extensions, 5
extensions library, 49

F
F bit, 19
FH bit, 135
FLAG, 93
flag instruction, 30
force halt, 65

H
H bit, 135
halting ARC, 12, 93, 135, 154, 173
host interface, 133

I
I field, 19
identity register, 65, 140
immediate data indicator, 76
instruction encoding, 75
instruction error, 26
instruction format, 19
instruction layout, 19
instruction map, 10
instruction set summary, 29
instruction-cache, 11
interrupt unit, 11, 27
interrupt vectors, 24
interrupts, 181
IS bit, 66, 137

J
Jcc, 95
JLcc, 97
jump instruction, 72
jumps, 33

L
L field, 19
LD, 99
LD bit, 66
link register, 23, 61
load alignment, 16
load and store, 46
load instruction, 73, 74
load pending, 65, 135
load register, 48
load store unit, 11
logical operations, 29, 49
long immediate, 16
long immediate data and loops, 40, 172
long word, 16
loop construct, 35
loop count register, 38, 61, 168
loop end register, 63
loop start register, 63
loops, 33
LP instruction, 35
LP_COUNT, 35
LP_END, 35, 65
LP_START, 35, 65
LPcc, 101
LR, 102
LSL, 103
LSR, 104
LSR multiple, 105

M
manufacturer code, 65
manufacturer version number, 65
MAX, 106
memory alignment, 16
memory controller, 11, 19, 47, 141
memory endianness, 16
memory error, 26
MIN, 107
MIN/MAX instructions, 53, 147
MOV, 108
MUL64, 109
multi cycle extension instructions, 147
multiply instruction, 50, 148
multiply scoreboard unit, 148
MULU64, 111

N
N field, 19
NOP, 113
NORM, 114
normalize instruction, 51, 147
null instruction, 30

O
operand size, 15
OR, 116
orthogonal, 5

P
pipecleaning, 136
pipeline, 47, 141
pipeline cycle diagram, 142
pipeline stall, 47, 148, 177
power management features, 12
program counter, 55, 63

Q
Q field, 19

R
register extensions, 62, 67
register set, 5
reset, 26
RISC, 5
RLC, 117
ROL, 118
ROR, 119
ROR multiple, 120
rotate instructions, 30
RRC, 121

S
SBC, 122
scoreboard unit, 11, 47, 176
self halt, 65
semaphore register, 63, 139
SEX, 123
SH bit, 66
short immediate, 16
short immediate addressing, 29, 49, 77
single cycle extension instructions, 147
single instruction loops, 37, 166
single instruction step, 66, 138, 186
single operand instructions, 30, 72
single step, 65, 137
SLEEP, 124
sleep instruction, 44, 66, 174
software breakpoints, 12, 139, 160, 172
software interrupt, 46, 130
SR, 125
SS bit, 66
ST, 126
starting ARC, 135
status register, 55, 63
store alignment, 16
store instruction, 73, 74
store register, 48
SUB, 128
SWAP, 129
swap instruction, 52, 147
SWI, 46, 130

T
timings
    arithmetic and logic functions, 143
    barrel shift, 150
    branch, 158
    branch and link, 156, 161
    conditional branch, 159
    conditional jump, 154
    conditional loop instruction, 164
    interrupt, 181
    jump and execute delay slot, 152
    jump and nullify delay slot, 152
    jump with immediate address, 153
    load, 176
    loop execution, 165
    loop instruction, 162
    multiply, 148
    store, 178
    with immediate data, 144
transfer of data, 46

W
word, 16

X
XOR, 130, 131

Z
zero delay loops, 35, 61
ZZ bit, 66
