ARC4. Programmers Reference
Programmer’s Reference
ARCtangent™-A4 Programmer’s Reference
ARC™ International
European Headquarters North American Headquarters
ARC House 2025 Gateway Place, Suite 140
Waterfront Business Park San Jose, CA 95110 USA
Elstree Road Tel. 408.437.3400
Elstree, Herts WD6 3BS UK Fax 408.437.3401
Tel. +44 (0) 20.8236.2800
Fax +44 (0) 20.8236.2801
www.arc.com
5050-001 August-2002
Short immediate
Long immediate
Branch
Register Notation
Chapter 4 — Interrupts
Introduction
ILINK Registers
Interrupt Vectors
Interrupt Enables
Returning from Interrupts
Reset
Memory Error
Instruction Error
Interrupt Times
Alternate Interrupt Unit
Extension Instructions
Optional Extensions Library
Multiply 32 X 32
Barrel shift/rotate block
Normalize instruction
SWAP instruction
MIN/MAX instructions
SLEEP
SR
ST
SUB
SWAP
SWI
XOR
Index
Key Features
Data Paths
• 32-Bit Data Bus
• 32-Bit Load/Store Address Bus
• 32-Bit Instruction Bus
• 24-Bit Instruction Address Bus
Registers
• 32 General Purpose Core Registers
• Auxiliary Register Set
Load/Store Unit
• Delayed Load mechanism with Register Scoreboard
• Buffered Store
• Address Register Write-Back
Program Flow
• 4 Stage Pipeline
• Single Cycle Instructions
• Jumps and Branches with Single Instruction Delay Slot
• Delay Slot Execution Modes
• Zero Overhead Loops
Interrupts and Exceptions
• Levels of Exception
• Non-Maskable Exceptions
• Maskable External Interrupts in basecase ARCtangent-A4 processor
Extensions
• 16 Extension Dual Operand Instruction Codes
• 55 Extension Single Operand Instruction Codes
• 28 Extension Core Registers
• 32 Bit addressable Auxiliary Register Set
• 16 Extension Condition Codes
• Build Configuration Registers
System Customizations
• Host Interface
• Separate Memory Controller
• Separate Load/Store Unit
• Separate Interrupt Unit
Host Interface Debug Features
• Start, stop and single step the ARCtangent-A4 processor via special registers
• Check and change the values in the register set and ARCtangent-A4 memory
• Communicate via the semaphore register and shared memory
• Perform code profiling by reading the status register
• Breakpoint Instruction
Power Management
• Sleep Mode
• Clock Gating Option
Introduction
The ARCtangent-A4 is a 4-stage pipeline processor incorporating full 32-bit
instruction, data and addressing. In line with RISC (reduced instruction set
computer) based architectures, ARCtangent-A4 has an orthogonal instruction set
with all addressing modes implemented on all arithmetic and logical instructions.
The architecture is extendible in the instruction set and registers. These
extensions will be touched upon in this document but covered fully in other
documents.
This document describes the minimum basecase version of ARCtangent-A4, to
which all future designs incorporating ARCtangent-A4 must adhere.
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area, please see associated documentation.
Programmer’s Model
The programmer’s model is common to all implementations of ARCtangent-A4
processor to allow upward compatibility of code.
Logically, the ARCtangent-A4 processor is based around a 3 (or 4)-port core
register file, with many of the instructions taking two source operands and one
destination register. Other registers are contained in the auxiliary register set and
are accessed with the load-register (LR) and store-register (SR) instructions or
other special commands.
[Figure: block diagram of the ARCtangent-A4 processor. Labels: memory
controller, pc, pc controller, i bus, instruction decoder, core registers (source 1,
source 2), auxiliary registers, ALU]
2. The general purpose registers (r0-r28) can be used for any purpose by the
programmer.
Number        Auxiliary register name   Description
0x0           STATUS                    Status register
0x1           SEMAPHORE                 Inter-process/Host semaphore register
0x2           LP_START                  Loop start address (24 bits)
0x3           LP_END                    Loop end address (24 bits)
0x4           IDENTITY                  ARCtangent-A4 Identification register
0x5           DEBUG                     Debug register
0x60 - 0x7F   RESERVED                  Build Configuration Registers
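As an illustration only, the register numbers in the table above can be captured in a small lookup. This is a sketch in Python rather than ARCtangent-A4 code; the helper name describe_aux and the returned labels "BUILD_CONFIG" and "UNKNOWN_OR_EXTENSION" are hypothetical and are not part of the architecture.

```python
# Auxiliary register numbers as listed in the table above.
AUX_REGISTERS = {
    0x0: "STATUS",
    0x1: "SEMAPHORE",
    0x2: "LP_START",
    0x3: "LP_END",
    0x4: "IDENTITY",
    0x5: "DEBUG",
}

# Range reserved for the Build Configuration Registers.
BCR_FIRST, BCR_LAST = 0x60, 0x7F


def describe_aux(number):
    """Return a readable label for an auxiliary register number.

    Hypothetical helper: the two fallback labels are illustrative only.
    """
    if number in AUX_REGISTERS:
        return AUX_REGISTERS[number]
    if BCR_FIRST <= number <= BCR_LAST:
        return "BUILD_CONFIG"
    return "UNKNOWN_OR_EXTENSION"
```

A debugger script could use such a lookup to label raw LR/SR traffic; any register outside the table depends on the implemented system.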
The Host
The ARCtangent-A4 processor was developed with an integrated host interface
to support communications with a host system. The ARCtangent-A4 processor
can be started, stopped and communicated with by the host system using special
registers. Further information is contained in later sections of this manual.
Most of the techniques outlined here will be handled by the software debugging
system, and the programmer, in general, need not be concerned with these
specific details.
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area, please see associated documentation.
Memory Map
Extensions
The ARCtangent-A4 processor is designed to be extendible according to the
requirements of the system in which it is used. These extensions include more
core and auxiliary registers, new instructions, and additional condition code tests.
This section is intended to inform the programmer of the ARCtangent-A4
processor where these extensions occur and how they affect the programmer's
view of the ARCtangent-A4 processor.
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area, please see associated documentation.
The auxiliary register address region 0x60 up to 0x7F is reserved for the Build
Configuration Registers (BCR) that can be used by embedded software or host
debug software to detect the configuration of the ARCtangent-A4 hardware. The
Build Configuration Registers contain the version of each ARCtangent-A4
extension, as well as configuration information that is build specific. The
System Customization
As well as the extensions mentioned in the previous section, the ARCtangent-A4
processor can be additionally customized to match memory, cache, and interrupt
requirements. This is achieved by using a separate memory controller, load/store
unit and interrupt unit.
Memory controller
This unit is defined according to the memory system with which the
ARCtangent-A4 processor is being used. Instruction-cache, data-cache, DRAM
control, instruction versus data arbitration and other memory specific logic will
be defined in the memory controller.
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area, please see associated documentation.
Interrupt unit
The interrupt unit contains the exception and interrupt vector positions, the logic
to tell the ARCtangent-A4 which of the 3 levels of interrupt has occurred, and the
arbitration between the interrupts and exceptions. The interrupt unit can be
modified to alter the priority of interrupts, the vector positions and the number of
interrupts.
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area, please see associated documentation.
Debugging Features
It is possible for the ARCtangent-A4 to be controlled from a host processor using
special debugging features. The host can:
• start and stop the ARCtangent-A4 processor via the status and debug register
Power Management
Basecase ARCtangent-A4 processors of version 8 and above have special power
management features. The SLEEP instruction halts the ARCtangent-A4
processor and its pipeline until an interrupt or a restart occurs. Sleep mode
stalls the core pipeline and disables any on-chip RAM.
Optional clock gating is provided which will switch off all non-essential clocks
when the ARCtangent-A4 processor is halted or is in sleep mode. This means the
internal ARCtangent-A4 control unit is not active and major blocks are disabled.
The host interface, interrupt unit and memory interfaces are always left enabled
to allow host accesses and the "wake" feature. The following diagram shows a
summary of the clock gating and sleep circuitry.
[Figure: summary of the clock gating and sleep circuitry. Labels: clk, Normal
Domain, Gated Domain, HOST I/F, Memory, Memory I/F, ARCtangent-A4 core,
Control Logic, Interrupts]
Introduction
This chapter describes the data organization and addressing of the
ARCtangent-A4 processor.
Operand Size
The ARCtangent-A4 is a 32-bit word architecture and as such most operations
are with 32-bit data. However, there are a few exceptions.
The basic data types are:
• Long word (32-bit) for register-register operation, immediate
data and load/store
• Word (16-bit) for load/store operations only
• Short Immediate (9-bit) for short immediate data only
• Byte (8 bit) for load/store operations only
and addressing:
• absolute (32-bit) for load/store and jumps
• relative (20-bit) for branch and loop
Data Organization
Registers
The core registers and auxiliary registers are 32-bit (long word) wide.
Immediate data
The immediate data as an operand can be 32-bit (long immediate), or 9-bit
sign-extended (short immediate).
Memory
The memory operations (load and store) can have data 32 bits (long word), 16
bits (word) or 8 bits (byte) wide. Byte operations use the low order 8 bits and may
extend the sign of the byte across the rest of the long word depending on the
load/store instruction. The same applies to the word operations, with the word
occupying the low order 16 bits. Data memory is accessed using byte addresses,
which means long word or word accesses can be supplied with non-aligned
addresses. The following should be supported as a minimum:
• long words on long word boundaries
• words on word boundaries
• bytes on byte boundaries
There is no "unaligned access exception" available in the ARCtangent-A4
processor. The basecase ARCtangent-A4 processor is “Endian free”, in that the
endianness of the implemented ARCtangent-A4 system is dependent entirely on
the memory system.
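The effect of byte and word loads on a 32-bit register, and the minimum alignment cases listed above, can be modelled with a few lines of Python. This is an illustrative sketch, not a description of the hardware: the function names are hypothetical, and whether a particular load sign-extends depends on the load instruction used.

```python
def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of `value` to a Python integer."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def load_result(raw, size_bits, signed):
    """Model the 32-bit register value after an 8- or 16-bit load.

    The data occupies the low-order bits; the upper bits are either
    zero or a copy of the sign bit.
    """
    low = raw & ((1 << size_bits) - 1)
    if signed:
        return sign_extend(low, size_bits) & 0xFFFFFFFF
    return low


def is_naturally_aligned(address, size_bytes):
    """True for long words on long word boundaries, words on word
    boundaries and bytes on byte boundaries."""
    return address % size_bytes == 0
```

For instance, the byte 0x80 becomes 0xFFFFFF80 when sign-extended and stays 0x00000080 otherwise.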
Addressing Modes
The addressing modes that the instructions use are encoded within the register
fields of the instruction word. There are basically only 3 addressing modes:
register-register, register-immediate and immediate-immediate. However, as a
consequence of the action performed by the different instruction groups, these
can be expanded as shown in Table 3 Data Addressing Modes.
                               op 0,imm,c          imm op c can set flags
single operand (register)      single_op a,b       a ← single_op b
single operand immediate       single_op a,imm     a ← single_op imm
single operand test (register) single_op 0,b       single_op b can set flags
single operand test immediate  single_op 0,imm     single_op imm can set flags
flag with register             flag b              Flags ← b
flag with immediate            flag imm            Flags ← imm
load                           ld a,[b,c]          a ← data at address [b+c]
load with immediate offset     ld a,[b,imm]        a ← data at address [b + imm]
                               ld a,[imm,c]        a ← data at address [imm + c]
load from immediate address    ld a,[imm]          a ← data at address [imm]
load from auxiliary register   lr a,[b]            a ← data in reg. at address [b]
                               lr a,[imm]          a ← data in reg. at address [imm]
store                          st c,[b]            Data at address [b] ← c
store with immediate offset    st c,[b,shimm]      Data at address [b + shimm] ← c
store to immediate address     st c,[imm]          Data at address [imm] ← c
store 0                        st 0,[b]            Data at address [b] ← 0
                               sr c,[imm]          c
Memory Addressing
Branch and jump instructions that refer to memory (i.e. J, JL, B, BL, LP) contain
an address. This address is referred to in the form [n:2], where n is the most
significant bit of the word. It is used as a long-word offset or address, but the
numbering has retained the convention for byte addressing.
As an example, to refer to the address 4 long words forward in a branch, the
syntax in assembly language would still be in bytes. Therefore, to branch 4 long
words forward, the syntax would be bra 16, although it is unlikely that a
programmer would specify a branch’s relative address in such a way.
With the load and store commands (LD and ST), the address calculated by the
instruction is passed as a 32-bit word to the memory controller, and used as a
byte address.
An interrupt may be caused by the memory controller if the size of the operation
and the address are incompatible, e.g. if the memory controller cannot fetch long-
words from byte boundaries. This will be dependent on the memory controller
being used with the ARCtangent-A4 processor and is not part of the basecase
ARCtangent-A4 processor.
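The relationship between the byte offset written in assembly and the long-word offset held in the [n:2] form can be sketched as follows. This is an illustrative Python model with hypothetical function names; it only converts between the two units and says nothing about which PC value the offset is added to.

```python
LONG_WORD_BYTES = 4  # one long word is 32 bits


def byte_offset_to_long_words(byte_offset):
    """Convert a byte offset, as written in assembly, to long words."""
    if byte_offset % LONG_WORD_BYTES != 0:
        raise ValueError("branch targets must lie on long-word boundaries")
    return byte_offset // LONG_WORD_BYTES


def long_words_to_byte_offset(long_words):
    """Convert a long-word offset back to the byte offset used in syntax."""
    return long_words * LONG_WORD_BYTES
```

With these helpers, branching 4 long words forward corresponds to a byte offset of 16, matching the example in the text.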
Instruction Format
Instructions are one long word in length and may have a long word immediate
value following. There are three basic instruction layouts. The instruction is
encoded on the I field. The result of the instruction is sent to the register defined
by the A field. The two register source addresses are encoded on the B and C
fields. If the result of the instruction needs to set the flags then the F bit is set.
The condition that causes the instruction to be executed is encoded on the
condition code field Q. The reserved bits R are undefined and should be set to 0.
The L field in the branch type instruction specifies the signed relative jump
address and the N field is used in jumps and branches to nullify or execute the
next instruction. See also Chapter 5 — Instruction Set Summary and Chapter 8
— Instruction Set Details for further details.
Register
31 27 26 21 20 15 14 9 8 7 6 5 4 3 2 1 0
Short immediate
31 27 26 21 20 15 14 9 8 0
Long immediate
Limm[31:0]
Branch
31     27 26                          7 6 5 4       0
I[4:0]    L[21:2]                       N N Q Q Q Q Q
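The branch layout can be unpacked with shifts and masks. The sketch below is Python, for illustration only, and rests on two assumptions: the bit positions are inferred from the field widths shown (I in bits 31-27, L in bits 26-7, N in bits 6-5, Q in bits 4-0), and L is treated as a signed long-word offset, as the text describes it as a signed relative address. The function name and the opcode value used in the usage note are hypothetical.

```python
def decode_branch(word):
    """Split a 32-bit branch-type instruction word into its fields."""
    i_field = (word >> 27) & 0x1F     # I[4:0], instruction code
    l_field = (word >> 7) & 0xFFFFF   # L[21:2], 20-bit relative address
    n_field = (word >> 5) & 0x3       # N N, delay slot (nullify) mode
    q_field = word & 0x1F             # Q, condition code

    # Sign-extend the 20-bit field, then scale long words to bytes.
    if l_field & 0x80000:
        l_field -= 0x100000
    return {"I": i_field, "rel_bytes": l_field * 4, "N": n_field, "Q": q_field}
```

For example, a word built as (0x04 << 27) | (4 << 7) decodes to a relative offset of 16 bytes, that is, 4 long words forward.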
Register Notation
The core registers are identified as follows:
rn general purpose register number n
ILINK1 maskable interrupt link register 1
ILINK2 maskable interrupt link register 2
BLINK branch link register
LP_COUNT loop count register (24 bits)
Example syntax:
AND r1,r2,r3 ;r1 ← r2 AND r3
AND ILINK2,r21,r21 ;ILINK2 ← r21
Example syntax:
SR r5,[SEMAPHORE] ;[SEMAPHORE] ← r5
LR r4,[LP_START] ;r4 ← [LP_START]
Introduction
The ARCtangent-A4 interrupt mechanism provides 3 levels of interrupts:
• Exceptions like Reset, Memory Error and Invalid Instruction (high priority)
• level 1 (low priority) interrupts which are maskable
• level 2 (mid priority) interrupts which are maskable.
The exception set has the highest priority, the level 2 set has middle priority and
the level 1 set has the lowest priority.
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area, please see associated documentation.
ILINK Registers
When an interrupt occurs, the link register, where appropriate, is loaded with the
status register containing the next PC and the current flags; the PC is then loaded
with the relevant address for servicing the interrupt.
Link register ILINK2 is associated with the level 2 set of interrupts and the two
exceptions: memory error and instruction error. ILINK1 is associated with the
level 1 set of interrupts.
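The priority order and the link register associations described so far can be summarized in a short Python sketch. It is illustrative only: the class names "exception", "level2" and "level1" and the helper highest_priority are hypothetical labels for this example, and real arbitration is performed by the interrupt unit.

```python
# Highest priority first, following the introduction to this chapter.
PRIORITY = ["exception", "level2", "level1"]

# Link register used on entry; reset does not use a link register.
LINK_REGISTER = {
    "reset": None,
    "memory error": "ILINK2",
    "instruction error": "ILINK2",
    "level2": "ILINK2",
    "level1": "ILINK1",
}


def highest_priority(pending):
    """Return the class that would be serviced first, or None."""
    for cls in PRIORITY:
        if cls in pending:
            return cls
    return None
```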
Interrupt Vectors
In the basecase ARCtangent-A4 processor, there are three exceptions and each
exception has its own vector position. An alternate interrupt unit may be
implemented; see section Alternate Interrupt Unit.
The ARCtangent-A4 processor does not implement interrupt vectors as such, but
rather a table of jumps. When an interrupt occurs the ARCtangent-A4 processor
jumps to fixed addresses in memory, which contain a jump instruction to the
interrupt handling code. The start of these interrupt vectors is dependent on the
particular ARCtangent-A4 system and is often a set of contiguous jump vectors.
Example vector offsets are shown in the following table. Two long-words are
reserved for each interrupt line to allow room for a jump instruction with a long
immediate address.
Vector   Name                Link register   Byte Offset
0        reset               -               0x00
1        memory exception    ILINK2          0x08
2        instruction error   ILINK2          0x10
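Because two long-words are reserved per interrupt line, the example byte offsets follow directly from the vector number. The Python sketch below is illustrative; the function name is hypothetical, the offsets are relative to a system-dependent vector base, and applying the same spacing to vectors beyond the three shown is an assumption.

```python
LONG_WORD_BYTES = 4
LONG_WORDS_PER_VECTOR = 2  # room for a jump plus a long immediate address


def vector_byte_offset(vector):
    """Byte offset of a vector entry from the start of the vector table."""
    return vector * LONG_WORDS_PER_VECTOR * LONG_WORD_BYTES
```

Vectors 0, 1 and 2 give 0x00, 0x08 and 0x10, as in the table.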
Interrupt Enables
The level 1 set and level 2 set of interrupts are maskable. The interrupt enable
bits E2 and E1 in the status register are used to enable level 2 set and level 1 set
of interrupts respectively. Interrupts are enabled or disabled with the flag
instruction.
31 30 29 28 27 26 25 24 23                        0
Z  N  C  V  E2 E1 H  R  PC[25:2]
When the interrupt routine is entered, the interrupt enable flags are cleared for the
current level and any lower priority level interrupts. Hence, when a level 2
interrupt occurs, both the interrupt enable bits in the status register are cleared at
the same time as the PC is loaded with the address of the appropriate interrupt
routine.
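The status register layout and the clearing of enable bits on interrupt entry can be modelled as below. This is a Python sketch for illustration, with hypothetical function names; the bit positions are read off the layout above (Z in bit 31 down to PC[25:2] in bits 23-0), and only the maskable levels 1 and 2 are modelled.

```python
Z_BIT, N_BIT, C_BIT, V_BIT = 31, 30, 29, 28
E2_BIT, E1_BIT, H_BIT = 27, 26, 25
PC_MASK = 0x00FFFFFF  # PC[25:2] occupies the low 24 bits


def decode_status(word):
    """Return the named fields of a 32-bit status register value."""
    bit = lambda n: (word >> n) & 1
    return {
        "Z": bit(Z_BIT), "N": bit(N_BIT), "C": bit(C_BIT), "V": bit(V_BIT),
        "E2": bit(E2_BIT), "E1": bit(E1_BIT), "H": bit(H_BIT),
        "pc_bytes": (word & PC_MASK) << 2,  # long-word PC as a byte address
    }


def enter_interrupt(status, level):
    """Clear the enable bit for `level` and for any lower level."""
    if level == 2:
        return status & ~((1 << E2_BIT) | (1 << E1_BIT)) & 0xFFFFFFFF
    if level == 1:
        return status & ~(1 << E1_BIT) & 0xFFFFFFFF
    raise ValueError("only maskable levels 1 and 2 are modelled")
```

Entering a level 1 routine leaves E2 set, so a level 2 interrupt can still be taken; entering a level 2 routine clears both bits.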
Returning from an interrupt is accomplished by jumping to the contents of the
appropriate link register, using the JAL [ILINKn] instruction. With the flag bit
enabled on the jump instruction, the flags are loaded into the status register along
with the PC, thus returning the flags to their state at point of interrupt. This
includes the interrupt enable bits E1 and E2, one or both of which will have been
cleared on entry to the interrupt routine.
There are 2 link registers ILINK1 (r29) and ILINK2 (r30) for use with the
maskable interrupts, memory exception and instruction error. These link registers
correspond to levels 1 and 2 and the interrupt enable bits E1 and E2 for the
maskable interrupts.
For example, if there was no interrupt service routine for interrupt number 5, the
arrangement of the vector table would be:
ivect4: JAL iservice4 ;vector 4
ivect5: JAL.F [r29] ;vector 5 (jump to ilink1)
NOP ;instruction padding
ivect6: JAL iservice6 ;vector 6
Reset
A reset is an asynchronous, external reset signal that causes the ARCtangent-A4
processor to perform a “hard” reset. Upon reset, various internal states of the
ARCtangent-A4 processor are pre-set to their initial values. The pipeline is
flushed, interrupts are disabled; status register flags are cleared; the semaphore
register is cleared; loop count, loop start and loop end registers are cleared; the
scoreboard unit is cleared; pending load flag is cleared; and program execution
resumes at the interrupt vector base address (offset 0x00) which is the basecase
ARCtangent-A4 processor reset vector position. The core registers are not
initialized except loop count (which is cleared). A jump to the reset vector, a
“soft” reset, will not pre-set any of the internal states of the ARCtangent-A4
processor.
Memory Error
A memory error can be caused by an instruction fetch from, a load from or a
store to an invalid part of memory. In the basecase ARCtangent-A4 processor,
this exception is non-recoverable in that the instruction that caused the error
cannot be returned to.
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area, please see associated documentation.
Instruction Error
If an invalid instruction is fetched that the ARCtangent-A4 processor cannot
execute, then an instruction error is caused. In the basecase ARCtangent-A4
processor, this exception is non-recoverable in that the instruction that caused the
error cannot be returned to. The standard instruction field (I[4:0]) is used to
decode whether the instruction is valid. This means that a non-implemented
single-operand instruction will not generate an instruction error when executed.
The software interrupt instruction (SWI) will also generate an instruction error
exception when executed.
Interrupt Times
Interrupts are held off for one cycle when an instruction has a dependency on the
following instruction or is waiting for immediate data from memory. This occurs
during a branch, jump or simply when an instruction uses long immediate data.
The time taken to service an interrupt is basically a jump to the appropriate
vector and then a jump to the routine pointed to by that vector. The timings of
interrupts according to the type of instruction in the pipeline are given later in this
documentation.
The time it takes to service an interrupt will also depend on the following:
• Whether a jump instruction is contained in the interrupt vector table
• Allowing stage 1 to stage 2 dependencies to complete
• Returning loads using write-back stage
• An I-Cache miss causing the I-Cache to reload in order to service the
interrupt
• The number of register push items onto a software stack at the start of the
interrupt service routine
• Whether an interrupt of the same or higher level is already being serviced
• An interruption by higher level interrupt
use of extensions in the auxiliary and core register set. How this would be done is
entirely system dependent.
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area, please see associated documentation.
Introduction
This chapter contains an overview of the types of instructions in the
ARCtangent-A4 processor. The types of instruction in the ARCtangent-A4
processor are:
• Arithmetic and Logical    ADD, AND, OR…etc.
• Single Operand            FLAG, MOV, LSL…etc.
• Jump, Branch and Loop     J, B, LP…etc.
• Load and Store            LD, ST…etc.
• Control                   BRK, SLEEP, SWI…etc.
For the operations of the instructions the notation shown in Table 4 is used.
; r2 AND r3
Null Instruction
Many instructions can be encoded in such a way that no operation is performed.
This is very useful if a NOP instruction is required. To encode a NOP, it is just a
matter of having short immediate data in all register fields and not setting flags.
For example, the encoding of NOP is actually equivalent to:
XOR 0x1FF,0x1FF,0x1FF
[Figure: single-bit shift and rotate operations, including ASR (arithmetic shift
right) and ROR (rotate right). Labels: source b, destination a, carry C, flags]
provided with jump (J, JL), branch (B, BL) and loop (LP) instructions.
Branch, loop and jump instructions use the same condition codes as other
instructions. However, the condition code test for these jumps is carried out one
stage earlier in the pipeline than for other instructions.
This means that if an instruction setting the flags is immediately followed by a
jump, then a single cycle stall will be incurred before executing the jump
instruction (even if the jump is unconditional). In this case, performance can be
increased by inserting a useful instruction that does not set the flags between the
flag setting instruction and the jump.
Instruction   Operation (if cc true)   Description
Jcc           pc ← addr                Jump
JLcc          blink ← pc               Jump and link
              pc ← addr                (ARCVER 0x06 and higher)
Bcc           pc ← reladdr + pc        Branch
BLcc          blink ← pc               Branch and link
              pc ← reladdr + pc
LPcc          lp_end ← addr            Set up zero-overhead loop
              lp_start ← pc
Table 10 Jump, Branch and Loop Instructions
Due to the pipeline in the ARCtangent-A4 processor, the jump instruction does
not take effect immediately, but after a one cycle delay. The execution of the
immediately following instruction after a jump, branch or loop can be controlled.
This instruction is said to be in the delay slot. The branch and link instruction
(BL) and the jump and link instruction (JL for the ARCtangent-A4 basecase
processor version 6 and higher) also save the whole of the status register to the
link register. This status register is taken either from the first instruction
following the branch (current PC) or the instruction after that (next PC)
according to the delay slot execution mode. The modes for specifying the
execution of the delay slot instruction are:
Mode   Operation                                        Link Register
ND     No Delay slot instruction (default)              Link to current PC
       Only execute next instruction when not jumping
D      Delay slot instruction                           Link to next PC
       Always execute next instruction
Branch type instructions use 20-bit relative addressing. The syntax of the branch
type instruction is:
op<cc><.dd> reladdr
Examples:
LP end_of_loop ; set up loop registers
LPNZ end_of_loop ; if not zero set up loop regs
; otherwise jump to end_of_loop
BL subroutine1 ; Branch and link to subroutine1
; saving status reg to BLINK
BL.D subroutine1 ; and always execute next instruction
BNE nother_bit ; if zero flag not set then branch
; to nother_bit
BNE.JD label ; if zero flag not set then execute
; next instruction and branch to
; label else skip next instruction
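Whether the instruction in the delay slot is executed can be expressed as a small decision function. The Python sketch below is illustrative and the function name is hypothetical. The ND and D behaviour follows the mode descriptions; the JD behaviour is taken from the comment on the BNE.JD example above (execute the next instruction only when the branch is taken).

```python
def delay_slot_executes(mode, branch_taken):
    """Return True if the instruction after a branch or jump is executed."""
    if mode == "ND":
        # Only execute the next instruction when not jumping.
        return not branch_taken
    if mode == "D":
        # Always execute the next instruction.
        return True
    if mode == "JD":
        # Execute the next instruction only when jumping.
        return branch_taken
    raise ValueError("unknown delay slot mode: " + repr(mode))
```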
The jump instruction uses 32-bit absolute addressing. To enable the correct flag
state when returning from interrupts the jump instruction also has a flag set field.
NOTE If the jump instruction is used with long immediate data, then the delay slot
execution mechanism does not apply, but should default to .JD for JLcc.
Zero Overhead Loop Mechanism
The ARCtangent-A4 processor has the ability to perform loops without any
delays being incurred by the count decrement or the end address comparison.
Zero delay loops are set up with the registers LP_START, LP_END and
LP_COUNT. LP_START and LP_END can be directly manipulated with the LR
and SR instructions and LP_COUNT can be manipulated in the same way as
registers in the core register set.
NOTE The LP_START, LP_END and LP_COUNT registers are only 24 bit registers,
with the top 8 bits reading as zeros. The maximum number of loop iterations is
16,777,216 (if LP_COUNT = 0 on entry). The special instruction LP is used to
set up the LP_START and LP_END in a single instruction.
NOTE The loop mechanism is always active and the registers used by the loop
mechanism are set up with the LP instruction. As LP_END is set to 0 upon
reset, it is not advisable to execute an instruction placed at the end of program
memory space (0xFFFFFC), as this will trigger the LP mechanism if no other
LP has been set up since reset. Also, if code is copied or overlaid into memory,
take care that LP_END is initialized to a safe value (i.e. 0) before the code is
executed, to prevent accidental LP triggering. Similar caution is required if
using any form of MMU or memory mapping.
The loop mechanism comes into operation when there is no pipeline stall,
interrupt, branch or jump.
The operation of the loop mechanism is such that PC+1 is constantly compared
with the value LP_END. If the comparison is true, then LP_COUNT is tested. If
LP_COUNT is not equal to 1, then the PC is loaded with the contents of
LP_START, and LP_COUNT is decremented. If, however, LP_COUNT is 1,
then the PC is allowed to increment normally and LP_COUNT is decremented.
This is illustrated in Figure 5.
[Figure 5: loop mechanism flowchart]
is LP_END = NEXT_PC?
    No:  PC ← NEXT_PC
    Yes: decr LP_COUNT
         is LP_COUNT = 1? (tested on the value before the decrement)
             No:  PC ← LP_START
             Yes: PC ← NEXT_PC
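The comparison and decrement described above can be stepped through in a small model. The Python sketch below is an idealized illustration, not a cycle-accurate description: addresses are long-word indices, pipeline effects (stalls, setup delays, the restrictions discussed later in this chapter) are ignored, and the function names are hypothetical.

```python
COUNT_MASK = 0xFFFFFF  # LP_COUNT is a 24-bit register


def run_loop(lp_start, lp_end, lp_count):
    """Step the PC from lp_start until the loop exits.

    Returns (number of passes through the loop body, final LP_COUNT).
    lp_end is the address of the first instruction after the loop.
    """
    pc = lp_start
    passes = 0
    while True:
        if pc == lp_start:
            passes += 1
        next_pc = pc + 1
        if next_pc == lp_end:
            loop_again = lp_count != 1           # test before decrement
            lp_count = (lp_count - 1) & COUNT_MASK
            if not loop_again:
                return passes, lp_count          # PC continues at next_pc
            pc = lp_start
        else:
            pc = next_pc


def expected_iterations(lp_count):
    """Closed form: a count of 0 wraps round through the full 24 bits."""
    return lp_count if lp_count != 0 else 1 << 24
```

With a three-instruction body and LP_COUNT = 2, the model makes two passes and leaves LP_COUNT at 0; a count of 0 on entry corresponds to the 16,777,216 iterations quoted in the NOTE above.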
The use of zero delay loops is illustrated in the following code sample:
MOV LP_COUNT,2 ; do loop 2 times (flags not set)
LP loop_end ; set up loop mechanism to work
; between loop_in and loop_end
loop_in: LR r0,[r1] ; first instruction in loop
ADD r2,r2,r0 ; sum r0 with r2
BIC r1,r1,4 ; last instruction in loop
loop_end:
ADD r19,r19,r20 ; first instruction after loop
In order that the zero delay loop mechanism works as expected, there are certain
effects that the user should be aware of.
should not be used as the destination of a load instruction. Attempting to do so
may cause an incorrect value to be loaded into LP_COUNT.
The following is an example of code that may not function correctly:
LD LP_COUNT,[r0] ; caution!! LP_COUNT loaded from memory!
This second example loads a value into a register (a process that does have a
shortcut path and which, therefore, will function correctly). The register value is
loaded into the LP_COUNT register, a process that does not require shortcutting
and which will function correctly.
the time the instruction after the attempted loop (ADD) is being fetched, which
is, however, too late for the loop mechanism.
LP loop_end ; this will execute only once
loop_in: OR r21,r22,r23 ; single instruction in loop
loop_end:
ADD r19,r19,r20 ; first instruction after loop
If the user wishes to have single instruction loops, then code like that in the
following code example can be used. Notice, there has to be a delay to allow the
loop start and loop end registers to be updated with the SR instruction. The code
basically updates the registers in the loop mechanism that would normally be
updated by the LP instruction.
MOV LP_COUNT,5 ; no. of times to do loop
MOV r0,dooploop>>2 ; convert to long-word size
ADD r1,r0,1 ; add 1 to dooploop address
SR r0,[LP_START] ; set up loop start register
SR r1,[LP_END] ; set up loop end register
NOP ; allow time to update regs
NOP ; can move useful instrs. here
dooploop:OR r21,r22,r23 ; single instruction in loop
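The register values written by the SR instructions in the sample above can be worked out by hand: the byte address of the loop instruction is shifted right by 2 to give a long-word address, and the loop end is one long word later. The Python sketch below shows that arithmetic; the function name and the example address 0x100 are hypothetical.

```python
def single_instruction_loop_registers(byte_address):
    """Return (LP_START, LP_END) for a loop of one long-word instruction."""
    lp_start = byte_address >> 2   # convert to long-word size
    lp_end = lp_start + 1          # first long word after the loop
    return lp_start, lp_end
```

For a loop instruction at byte address 0x100, this gives LP_START = 0x40 and LP_END = 0x41.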
In order for the loop mechanism to work properly, the loop count register must
be set up with at least 3 instructions (actually 3 cycles) between it and the last
instruction in the loop. In the following example, the MOV instruction will
override the loop mechanism (which would decrement LP_COUNT) and the loop
will be executed one more time than expected. The MOV instruction must be
followed by a NOP for correct execution. The following code sample shows an
invalid count loop setup.
MOV LP_COUNT,r0 ; do loop r0 times (flags not set)
LP loop_end ; set up loop mechanism
loop_in: OR r21,r22,r23 ; first instruction in loop
AND 0,r21,23 ; last instruction in loop
loop_end:
ADD r19,r19,r20 ; first instruction after loop
When reading from the loop count register (LP_COUNT) the user must be aware
that the value returned is that value of the counter that applies to the next
instruction to be executed. If the last instruction in a loop reads LP_COUNT,
then the value returned would be that value after the loop mechanism has updated
it. The following code example shows the loop counter being read near the loop
mechanism:
MOV r0,LP_COUNT ; loop count for this iteration
MOV r0,LP_COUNT ; loop count for next iteration
loop_end:
ADD r19,r19,r20 ; first instruction after loop
Branch and jumps in loops
Jumps or branches without linking will work correctly in any position in the loop.
There are, however, some side effects for delay slots and link registers when a
branch or jump is the last instruction in a loop:
Firstly, it is possible that the branch or jump instruction is contained in the very
last long-word position in the loop. This means that the instruction in the delay
slot (See Chapter 5 — Instruction Set Summary and Chapter 10 — Pipeline and
Timings) would be either the first instruction after the loop or the first instruction
in the loop (pointed to by loop start register) depending on the result of the loop
mechanism. The instruction in the delay slot will be that which would be
executed if the branch or jump was replaced by a NOP.
If a branch-and-link or jump-and-link instruction is used in the one before last
long-word position in a loop, then the return address stored in the link register
(BLINK) may contain the wrong value. The following instructions will store the
address of the first instruction after the loop, and therefore should not be used in
the second to last position:
BLcc.D address
BLcc.JD address
JLcc.D [Rn]
JLcc.JD [Rn]
JLcc address
If the ND delay slot execution mode is used for branch-and-link or jump-and-link
instruction in the one before last long-word position in a loop, then the return
address is stored correctly in the link register.
The loop count does not decrement if the instruction fetched was subsequently
killed as the result of a branch/jump operation. For these reasons it is
recommended that subroutine calls should not be used within the loop
mechanism.
Instructions with long immediate data: correct coding
Instructions with long immediate data will work correctly with the zero overhead
loop mechanism as long as the LP instruction is used, even if the instruction
containing the long immediate data is seen as the last instruction in the loop.
Here, we are setting up the loop with an instruction that uses long immediate
data. The loop_end label points to the first instruction after the loop.
LP loop_end
loop_in:...
...
XOR r1,r2,r3
OR r21,r22,2048 ; last instruction in loop
loop_end:
ADD r19,r19,r20 ; first instruction after loop
insn (like MOV r1,LP_COUNT), then the value that the instruction reads will
be that value after the loop mechanism updated it.
For further details see Chapter 7 — Register Set Details.
[Table: instructions in the last two positions of a zero overhead loop; column headings not recoverable]
... normally | loop mechanism
Insn-1: loop end not set up in time | update after loop mechanism | value before loop mechanism | wrong return address may be stored in BLINK | works normally; LP_COUNT decrements according to delay slot mode
Insn: loop end not set up in time | update after loop mechanism | value after loop mechanism | loop_count decrements; delay slot = ins1 or outins1 | imm data = ins1 or outins1
Loop_end:
Outins1
Outins2
Breakpoint Instruction
The breakpoint instruction is a single operand basecase instruction that halts the
program code when it is decoded at stage one of the pipeline. This is a very basic
debug instruction, which stops the ARCtangent-A4 processor from performing
any instructions beyond the breakpoint. The pipeline is also flushed upon decode
of this instruction. To restart the ARCtangent-A4 processor at the correct
instruction the old instruction is rewritten into main memory. It is immediately
followed by an invalidate instruction cache line command (if an instruction cache
has been implemented) to ensure that the correct instruction is loaded into the
cache before being executed by the ARCtangent-A4 processor. The program
counter must also be rewritten in order to generate a new instruction fetch, which
reloads the instruction. Most of the work of inserting breakpoint instructions and
restoring the original instructions is performed by the debugger.
The program flow is not interrupted when employing the breakpoint instruction,
and there is no need for implementing a breakpoint service routine. There is also
no limit to the number of breakpoints that can be inserted into a piece of code.
NOTE The breakpoint instruction sets the BH bit (refer to section Programmer’s
Model) in the Debug register when it is decoded at stage one of the pipeline.
This allows the debugger to determine what caused the ARCtangent-A4
processor to halt. The BH bit is cleared when the Halt bit in the Status register
is cleared, e.g. by restarting or single–stepping the ARCtangent-A4 processor.
The breakpoint instruction is decoded at stage one of the pipeline, which
consequently stalls stage one and allows instructions in stages two, three and
four to continue, i.e. flushing the pipeline.
The link register is not updated for a Branch and Link, BL, (or Jump and Link, JL)
instruction when the BRK instruction is placed in its delay slot. When the
ARCtangent-A4 processor is started, the link register will update as normal.
Interrupts are treated in the same manner by the ARCtangent-A4 processor as
Branch, and Jump instructions when a BRK instruction is detected. Therefore, an
interrupt that reaches stage two of the pipeline when a BRK instruction is in stage
one will keep it in stage two, and flush the remaining stages of the pipeline. It is
also important to note that an interrupt that occurs in the same cycle as a
breakpoint is held off as the breakpoint is of a higher priority. An interrupt at
stage three is allowed to complete when a breakpoint instruction is in stage one.
Sleep Instruction
The sleep mode is entered when the ARCtangent-A4 processor encounters the
SLEEP instruction. It stays in sleep mode until an interrupt or restart occurs.
Power consumption is reduced during sleep mode since the pipeline ceases to
change state, and the RAMs are disabled. Further power reduction is achieved
when the clock gating option is used, whereby all non-essential clocks are switched
off.
The SLEEP instruction can be put anywhere in the code, as in the example
below:
SUB r2, r2, 0x1
ADD r1, r1, 0x2
SLEEP
...
The SLEEP instruction is a single operand instruction without flags or operands.
The SLEEP instruction is decoded in pipeline stage 2. If a SLEEP instruction is
detected, then the sleep mode flag (ZZ) is immediately set and the pipeline stage
1 is stalled. A flushing mechanism assures that all earlier instructions are
executed until the pipeline is empty. The SLEEP instruction itself leaves the
pipeline during the flushing. When in sleep mode, the sleep mode flag (ZZ) is set
and the pipeline is stalled, but not halted. The host interface operates as normal,
allowing access to the DEBUG and the STATUS registers, and it can halt the
processor. The host cannot clear the sleep mode flag, but it can wake the
ARCtangent-A4 processor by halting and then restarting it.
The program counter PC points to the next instruction in sequence after the sleep
instruction.
The ARCtangent-A4 processor will wake from sleep mode on an interrupt or
when the ARCtangent-A4 processor is restarted. If an interrupt wakes up the
ARCtangent-A4 processor, the ZZ flag is cleared and the instruction in pipeline
stage 1 is killed. The interrupt routine is serviced and execution resumes at the
instruction in sequence after the SLEEP instruction. When the ARCtangent-A4
processor is started after having been halted, the ZZ flag is cleared.
In this example, the ARCtangent-A4 processor goes to sleep after the branch
instruction has been executed. When the ARCtangent-A4 processor is
sleeping, the PC points to the “add” instruction after the label
"after_sleep". When an interrupt occurs, the ARCtangent-A4 processor
wakes up, executes the interrupt service routine and continues with the “add”
instruction. If the delay slot is killed, as in the following code example, the
SLEEP instruction in the delay slot is never executed:
BAL.ND after_sleep
SLEEP
...
after_sleep:
ADD r1,r1,0x2
Loads are passed to the memory controller once the address has been calculated,
and the register which is the destination of the load is tagged to indicate that it is
waiting for a result, as loads take a minimum of one cycle to complete. If an
instruction references the tagged register before the load has completed, the
pipeline will stall until the register has been loaded with the appropriate value.
For this reason it is not recommended that loads be immediately followed by
instructions which reference the register being loaded. Delayed loads from
memory will take a variable amount of time depending upon the presence of
cache and the type of memory which is available to the memory controller.
Consequently, the number of instructions to be executed in between the load and
the instruction using the register will be application specific.
Byte and word loads can be sign extended to 32-bits, or simply loaded into the
appropriate register with unused bits set to zero. This is accomplished with the
sign extend suffix: .X
Stores are passed to the memory controller, which will store the data to memory
when it is possible to do so. The pipeline may be stalled if the memory controller
cannot accept any more buffered store requests. Note that if the offset is not
required during a store, the value encoded will be set to 0.
If a data-cache is available in the memory controller, the load and store
instructions can bypass the use of that cache. When the suffix .DI is used the
cache is bypassed and the data is loaded directly from or stored directly to the
memory. This is particularly useful for shared data structures in main memory,
for the use of memory-mapped I/O registers, or for bypassing the cache to stop
the cache being updated and overwriting valuable data that has already been
loaded in that cache.
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area, please see associated documentation.
Access to the auxiliary register set is accomplished with the special load
register and store register instructions (LR and SR). They work in a similar way
to the normal load and store instructions, except that the access is accomplished in
a single cycle because address computation is not carried out and the
scoreboard unit is not used. The LR and SR instructions do not cause stalls in the
way that normal load and store instructions do; they stall only in the same cases
that arithmetic and logic instructions would.
Access to the auxiliary registers is limited to 32 bit (long word) only and the
instructions are not conditional.
Instruction Operation Description
LR a ← aux. reg [b] load from auxiliary register
SR aux. reg.[b] ← c store to auxiliary register
Extension Instructions
These operations are of the form a ← b op c (or a ← op b for single operand
instructions) where the destination (a) is replaced by the result of the operation
(op) on the operand sources (b and c). The ordering of the operands is important
for some operations (e.g. SUB, BIC). All arithmetic and logical instructions can
be conditional and/or set the flags. However, instructions using the short
immediate addressing mode cannot be conditional.
The syntax for extension instructions is:
op<.cc><.f> a,b,c
The syntax for extension single operand instructions is:
op<.cc><.f> a,b
Multiply 32 X 32
Two versions of the scoreboarded 32x32 multiplier function are available, 'fast'
and 'small', taking four and ten cycles respectively. The full 64-bit result is
available to be read from the core register set. The middle 32 bits of the 64-bit
result are also available. The multiply is scoreboarded in such a way that if a
multiply is being carried out, and if one of the result registers is required by
another ARCtangent-A4 instruction, the processor stalls until the multiply has
finished.
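The way the 64-bit product is exposed can be sketched in Python as below. This is only an illustrative model, not the hardware behaviour: it assumes an unsigned multiply, takes the names MHI, MLO and MMID from the result registers mentioned in this manual, and assumes that "the middle 32 bits" means bits 47 to 16 of the product. The function name mulu64 is chosen here for illustration.

```python
MASK32 = 0xFFFFFFFF

def mulu64(b, c):
    """Model an unsigned 32x32 multiply and split the 64-bit product."""
    product = (b & MASK32) * (c & MASK32)
    mlo = product & MASK32            # lower 32 bits of the result
    mhi = (product >> 32) & MASK32    # upper 32 bits of the result
    mmid = (product >> 16) & MASK32   # assumption: middle 32 bits are bits 47..16
    return mhi, mmid, mlo

mhi, mmid, mlo = mulu64(0x00012345, 0x00010000)
print(hex(mhi), hex(mmid), hex(mlo))  # prints: 0x1 0x12345 0x23450000
```

The scoreboard stall described above is not modelled; the sketch only shows which part of the product each result register would hold under the stated assumptions.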
[Figure: operands b and c are multiplied; the 64-bit result is read through MHI and MLO, and its middle 32 bits through MMID]
result register if the multiply has been used, for example, by an interrupt service
routine. See Multiply restore register.
[Figure: barrel shift/rotate operations on operand b with result a, including ROR (rotate right)]
Normalize instruction
The NORM instruction gives the normalisation integer for the signed value in the
operand. The normalisation integer is the amount by which the operand should be
shifted left to normalise it as a 32-bit signed integer. Finding the normalisation
integer of a 32-bit register in software, without a NORM instruction,
requires many ARCtangent-A4 instruction cycles.
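To illustrate why a software equivalent is costly, here is a minimal Python sketch of one plausible reading of the normalisation integer: the number of bits below the sign bit that merely repeat the sign. The function name norm32 is hypothetical, and the results this sketch gives for 0 and for -1 (both 31) follow from that reading rather than from this manual.

```python
def norm32(value):
    """Count how far a 32-bit signed value can be shifted left without
    changing its sign bit (assumed meaning of the normalisation integer)."""
    value &= 0xFFFFFFFF
    sign = value >> 31
    count = 0
    # Walk from bit 30 downward; each bit equal to the sign bit is redundant.
    for bit in range(30, -1, -1):
        if (value >> bit) & 1 != sign:
            break
        count += 1
    return count

print(norm32(0x1FFFFFFF))  # prints: 2
```

The loop makes up to 31 iterations, which is the kind of cost a single-instruction NORM avoids.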
op<.cc><.f> a,b
Example:
SWAP instruction
The swap instruction is a very simple extension, intended for use with the
multiply-accumulate block. It exchanges the upper and lower 16 bits of the source
value, and stores the result in a register. This is useful to prepare values for
multiplication, since the multiply-accumulate block takes its 16-bit source values
from the upper 16 bits of the 32-bit values presented.
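The exchange of the two halves can be sketched in Python as follows. The function name swap16 is chosen here for illustration only; it models just the data movement described above.

```python
def swap16(value):
    """Exchange the upper and lower 16 bits of a 32-bit value."""
    value &= 0xFFFFFFFF
    return ((value << 16) | (value >> 16)) & 0xFFFFFFFF

print(hex(swap16(0x12345678)))  # prints: 0x56781234
```

A 16-bit value held in the lower half of a register ends up in the upper half, which is where the multiply-accumulate block is said to take its source values from.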
MIN/MAX instructions
These instructions are useful in applications where sorting takes place. Two
signed 32-bit words are compared, and either the larger or smaller of the two is
returned, depending on which instruction is being used.
The syntax for the min/max instructions is:
op<.cc><.f> a,b,c
Example:
MIN r1,r2,r3 ; write minimum of r2 and r3 into r1
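Because the comparison is signed, a register holding 0xFFFFFFFF counts as -1 and is smaller than 1. The Python sketch below models that; the helper names to_signed32, min32 and max32 are illustrative and not part of the instruction set, and flag setting is not modelled.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(word):
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    word &= MASK32
    return word - 0x100000000 if word & 0x80000000 else word

def min32(b, c):
    """Return the smaller of two 32-bit words under a signed comparison."""
    return (b & MASK32) if to_signed32(b) <= to_signed32(c) else (c & MASK32)

def max32(b, c):
    """Return the larger of two 32-bit words under a signed comparison."""
    return (b & MASK32) if to_signed32(b) >= to_signed32(c) else (c & MASK32)

print(hex(min32(0xFFFFFFFF, 1)), hex(max32(0xFFFFFFFF, 1)))  # prints: 0xffffffff 0x1
```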
Introduction
The ARCtangent-A4 processor has an extensive instruction set most of which
can be carried out conditionally and/or set the flags. Those instructions using
short immediate data can not have a condition code test.
Branch, loop and jump instructions use the same condition codes as other instructions.
However, the condition code test for these jumps is carried out one stage earlier
in the pipeline than other instructions. Therefore, a single cycle stall will occur if
a jump is immediately preceded by an instruction that sets the flags.
Z N C V E2 E1 H R PC[25:2]
Mnemonic       Condition                                Test                                      Code
EQ , Z         Zero                                     Z                                         0x01
NE , NZ        Non-Zero                                 /Z                                        0x02
PL , P         Positive                                 /N                                        0x03
MI , N         Negative                                 N                                         0x04
CS , C, LO     Carry set, lower than (unsigned)         C                                         0x05
CC , NC, HS    Carry clear, higher or same (unsigned)   /C                                        0x06
VS , V         Over-flow set                            V                                         0x07
VC , NV        Over-flow clear                          /V                                        0x08
GT             Greater than (signed)                    (N and V and /Z) or (/N and /V and /Z)    0x09
GE             Greater than or equal to (signed)        (N and V) or (/N and /V)                  0x0A
The remaining 16 condition codes (0x10-0x1F) are available for extension and are
used to:
• provide additional tests on the internal condition flags or
• test extension status flags from external sources or
• test a combination of external and internal flags
Condition Codes
If an extension condition code is used that is not implemented, then the condition
code test will always return false (i.e. the opposite of AL - always).
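The condition tests listed in this chapter can be modelled with a small Python sketch. It covers only AL (0x00) and the codes whose flag tests appear in the table above (0x01 to 0x0A); codes 0x0B to 0x0F are deliberately left out because their tests are not shown here. The function name condition_true and the extensions argument are hypothetical devices for illustration.

```python
# Flag tests for AL and for the codes listed in the table (0x01..0x0A).
BASECASE_TESTS = {
    0x00: lambda z, n, c, v: True,                      # AL: always
    0x01: lambda z, n, c, v: z,                         # EQ
    0x02: lambda z, n, c, v: not z,                     # NE
    0x03: lambda z, n, c, v: not n,                     # PL
    0x04: lambda z, n, c, v: n,                         # MI
    0x05: lambda z, n, c, v: c,                         # CS
    0x06: lambda z, n, c, v: not c,                     # CC
    0x07: lambda z, n, c, v: v,                         # VS
    0x08: lambda z, n, c, v: not v,                     # VC
    0x09: lambda z, n, c, v: (n and v and not z) or (not n and not v and not z),  # GT
    0x0A: lambda z, n, c, v: (n and v) or (not n and not v),                      # GE
}

def condition_true(code, z, n, c, v, extensions=None):
    """Evaluate a condition code against the Z, N, C and V flags."""
    if 0x10 <= code <= 0x1F:
        # Extension codes: an unimplemented code always tests false.
        handler = (extensions or {}).get(code)
        return bool(handler(z, n, c, v)) if handler else False
    # Raises KeyError for 0x0B..0x0F, which this sketch does not cover.
    return bool(BASECASE_TESTS[code](z, n, c, v))

print(condition_true(0x09, z=False, n=True, c=False, v=True))   # prints: True
print(condition_true(0x12, z=True, n=False, c=False, v=False))  # prints: False
```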
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area, please see associated documentation.
[Figure: core register map, r0 to r59, with LP_COUNT at r60 holding LP_COUNT[23:0]]
[Figure: auxiliary register map, with addresses up to 0x5F, reserved addresses up to 0x7F, and the remaining space up to 0xFFFFFFFF]
Link registers
The link registers (ILINK1, ILINK2, BLINK) are used to provide links back to
the position where an interrupt or branch occurred. They can also be used as
general purpose registers, but if interrupts or branch-and-link or jump-and-link
are used, then these are reserved for that purpose.
In the basecase ARCtangent-A4 processor prior to version 7, the branch-and-link
and jump-and-link instructions write to the BLink register in a way that bypasses
the LD scoreboard mechanism. Basecase ARCtangent-A4 processor version 7
Loop count register
The loop count register (LP_COUNT) is used for zero delay loops. Because
LP_COUNT is decremented if the program counter equals the loop end address
and also LP_COUNT does not have next cycle bypass like the other core
registers, it is not recommended that LP_COUNT be used as a general purpose
register, see later in this documentation for details. Note that LP_COUNT is only
24 bits wide.
registers occupy a special address space that is accessed using special load and
store instructions, or other special commands. The basecase ARCtangent-A4
processor uses 6 status and control registers, and reserves the additional registers
0x60 to 0x7F, leaving the rest of the 2^32 registers for extension purposes. If an
auxiliary register is read that is not implemented, then the contents of the IDENTITY
register are returned. No exception is generated. Writes to unimplemented
auxiliary registers are ignored.
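The read and write behaviour for unimplemented auxiliary registers can be sketched as a small Python model. The class name AuxRegisterFile, the register addresses and the identity value used below are all hypothetical and chosen only for illustration; the model shows just the two rules stated above.

```python
class AuxRegisterFile:
    """Toy model of auxiliary register access through LR and SR."""

    def __init__(self, identity, implemented):
        self.identity = identity & 0xFFFFFFFF
        # Only implemented registers have storage.
        self.regs = {addr: 0 for addr in implemented}

    def lr(self, addr):
        """Read: an unimplemented address returns the IDENTITY contents."""
        return self.regs.get(addr, self.identity)

    def sr(self, addr, value):
        """Write: a write to an unimplemented address is ignored."""
        if addr in self.regs:
            self.regs[addr] = value & 0xFFFFFFFF

aux = AuxRegisterFile(identity=0x00000107, implemented=[0x100])  # hypothetical values
aux.sr(0x100, 0xCAFE)
aux.sr(0x200, 0xBEEF)                        # ignored, 0x200 is not implemented
print(hex(aux.lr(0x100)), hex(aux.lr(0x200)))  # prints: 0xcafe 0x107
```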
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area, please see associated documentation.
Status register
The STATUS register contains the PC, the condition flags and interrupt mask
bits. LP_START and LP_END are the other registers used by the zero delay loop
mechanism. The SEMAPHORE register is used to control inter-process
communication. IDENTITY is used by the host and ARCtangent-A4 processor to
determine the version number of the processor and other implementation
specific information. DEBUG is used by the host to test and control the
ARCtangent-A4 processor during debug situations.
31 30 29 28 27 26 25 24 23 0
Z N C V E2 E1 H R PC[25:2]
the LR and the current condition flags. STATUS cannot be written with the SR
instruction. The FLAG and Jcc instructions are used to affect the status register.
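As an illustration of the STATUS layout shown above (Z, N, C, V, E2, E1, H and R in bits 31 to 24, PC[25:2] in bits 23 to 0), the following Python sketch splits a 32-bit STATUS value into its fields. The function name decode_status is hypothetical, and converting PC[25:2] to a byte address by shifting left two places is an inference from the field name.

```python
def decode_status(word):
    """Split a 32-bit STATUS value into flag bits and the program counter."""
    word &= 0xFFFFFFFF
    return {
        "Z": (word >> 31) & 1,
        "N": (word >> 30) & 1,
        "C": (word >> 29) & 1,
        "V": (word >> 28) & 1,
        "E2": (word >> 27) & 1,
        "E1": (word >> 26) & 1,
        "H": (word >> 25) & 1,
        "R": (word >> 24) & 1,
        # Bits 23..0 hold PC[25:2]; shift to recover a byte address.
        "PC": (word & 0x00FFFFFF) << 2,
    }

fields = decode_status(0x80000040)
print(fields["Z"], hex(fields["PC"]))  # prints: 1 0x100
```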
Semaphore register
31 4 3 2 1 0
Reserved S[3:0]
the ARCtangent-A4 processor nor the host have claimed any semaphore bits.
When claiming a semaphore bit (i.e. setting the semaphore bit to a ‘1’), care
should be taken not to clear the remaining semaphore bits. This could be
Reserved LPSTART[25:2]
Reserved LPEND[25:2]
Identity register
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
additional identity number (ARCNUM[7:0]) and the ARCtangent-A4 basecase
processor version number (ARCVER[7:0]).
Debug register
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
LD SH BH ZZ IS Reserved FH SS
LD can be read at any time by either the host or the ARCtangent-A4 processor
and indicates that there is a delayed load waiting to complete. The host should
wait for this bit to clear before changing the state of the ARCtangent-A4
processor.
SH indicates that the ARCtangent-A4 processor has halted itself with the FLAG
instruction. This bit is cleared whenever the H bit in the STATUS register is
cleared (i.e. the ARCtangent-A4 processor is running or a single step has been
executed).
Breakpoint Instruction Halt (BH) bit is set when a breakpoint instruction has
been detected in the instruction stream at stage one of the pipeline. A breakpoint
halt is set when BH is ‘1’. This bit is cleared when the H bit in the status register
is cleared, e.g. single stepping or restarting the ARCtangent-A4 processor. BH is
only available for ARCtangent-A4 basecase processor version 7 or higher.
ZZ bit indicates that the ARCtangent-A4 processor is in "sleep" mode. The
ARCtangent-A4 processor enters sleep mode following a SLEEP instruction. ZZ
is cleared whenever the processor "wakes" from sleep mode. ZZ is only available
for ARCtangent-A4 basecase processor version 7 or higher.
The force halt bit (FH) is only available for the ARCtangent-A4 basecase
processor version (ARCVER in IDENTITY register) of 5 or higher. FH is a
foolproof method of stopping the processor externally by the host. The host
setting this bit does not have any side effects when the ARCtangent-A4 processor
is halted already. FH is not a mirror of the STATUS register H bit: clearing FH
will not start the ARCtangent-A4 processor. FH always returns 0 when it is read.
See also Halting.
Single stepping is provided through the use of IS and SS. Single instruction step
(IS) is used in combination with SS. When IS and SS are both set by the host the
ARCtangent-A4 processor will execute one full instruction. IS is only available
for ARCtangent-A4 basecase processor version 7 or higher.
SS is a write only bit that when set by the host will cause the ARCtangent-A4
processor to single cycle step. The single cycle step function enables the
processor for one cycle. It should be noted that this does not necessarily
correspond to one instruction being executed, since stall conditions may be
present. In order to execute a single instruction, the remote system must
repeatedly single-step the ARCtangent-A4 processor until the values change in
either the program counter or loop count register, or use SS in combination with
IS.
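The repeated single cycle stepping described above could be driven from a host along the lines of this Python sketch. Everything about the host side is hypothetical: read_pc, read_lp_count and cycle_step stand for whatever the host interface provides to read the program counter, read the loop count register and set the SS bit, and max_cycles is an arbitrary safety limit added here.

```python
def step_one_instruction(read_pc, read_lp_count, cycle_step, max_cycles=1000):
    """Single cycle step until the program counter or loop count changes.

    Returns the number of single cycle steps that were issued.
    """
    start = (read_pc(), read_lp_count())
    for cycles in range(1, max_cycles + 1):
        cycle_step()  # hypothetical: host sets SS for one cycle
        if (read_pc(), read_lp_count()) != start:
            return cycles
    raise TimeoutError("no change in PC or LP_COUNT after max_cycles steps")
```

On processor versions that provide the IS bit, setting IS together with SS removes the need for this polling loop.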
MUL[63:32]
The lower part of the multiply result register can be restored by multiplying the
desired value by 1.
Example
To read the upper and lower parts of the multiply results
MOV r1,mlo ;put lower result in r1
MOV r2,mhi ;put upper result in r2
To restore the multiply results
MULU64 r1,1 ;restore lower result
MOV 0,mlo ;wait until multiply complete. N.B. causes
;processor to stall until multiplication is
;finished
SR r2,[mulhi] ;restore upper result
Introduction
This chapter contains the detailed information about all the instructions available
in the basecase version of the ARCtangent-A4 processor. They are arranged in
alphabetical order.
Instruction Map
There are 32 different instruction codes, only the first 16 of which are used for
the basecase ARCtangent-A4 processor according to the following table:
Basecase instruction set
Code Instruction and/or Type Notes
0x00 LD register + register Delayed load (core registers only)
0x01 LD register + offset, LR Delayed load or load from aux. register
0x02 ST register + offset, SR Buffered store or store to aux. register
0x03 single operand instructions FLAG, single shifts, sign extend
0x04 Bcc Branch conditionally
0x05 BLcc Branch and link conditionally
0x06 LPcc Loop set up or jump conditionally
0x07 Jcc, JLcc Jump (and link) conditionally
0x08 ADD Addition
0x09 ADC Addition with carry
0x0A SUB Subtract
0x0B SBC Subtract with carry
0x0C AND Logical bitwise AND
0x0D OR Logical bitwise OR
Addressing Modes
The addressing modes are encoded in the instruction. There are
basically only 3 addressing modes: register-register, register-immediate and
immediate-immediate. However, as a consequence of the action performed by the
different instruction groups, these can be expanded as shown in Table 3 Data
Addressing Modes. The operating modes use the key in Table 4.
ADD<.f> a,shimm,shimm ADD<.f> 0,shimm,shimm
ADD<.cc><.f> a,b,limm ADD<.cc><.f> 0,b,limm
ADD<.cc><.f> a,limm,c ADD<.cc><.f> 0,limm,c
ADD 0,shimm,shimm ;nop
Jump Instruction
Example
Load Instruction
Example
LD r1,[r2,r3] ; r1 replaced with data at r2+r3
LD r1,[r2,20] ; r1 replaced with data at r2+20
LDB r1,[r2,r3] ; load byte from r2+r3
LD.A r4,[r2,10] ; r4 replaced by data at address
; r2 plus offset 10 and writeback
; address calculation to r2
LDW.X r1,[r2,r3] ; r1 replaced by sign extended word
; from address at r2+r3
LDW.X.A r1,[r2,r3] ; word, sign extended with writeback
; from address at r2+r3
LD r1,[900] ; load from address 900
Store instruction
Example:
ST r1,[r2] ; data at address r2 replaced by r1
STW.A r1,[r2,2] ; store bottom 16 bits of r1 to
; address r2+2 and writeback
; r2+2 to r2
ST r1,[900] ; store r1 to address 900
STB 0,[r2] ; store byte 0 to address r2
ST -8,[r2,-8] ; store -8 to address r2-8
STW 80,[750] ; store word 80 to address 750
ST 12345678,[r2,8] ; store 12345678 to address r2+8
Example:
SR r1,[r2] ; data in aux reg pointed to by r2
; replaced by data in r1
Instruction Encoding
The instructions are encoded according to the type of instruction.
The general encoding outlines are shown below. Some fields have additional
encoding on them and are covered in detail for each instruction.
Those instructions that test the condition codes use the encoding shown in the
following table.
Mnemonic Condition Code
AL, RA Always 0x00
EQ , Z Zero 0x01
NE , NZ Non-Zero 0x02
PL , P Positive 0x03
MI , N Negative 0x04
CS , C, LO Carry set, lower than (unsigned) 0x05
CC , NC, HS Carry clear, higher or same (unsigned) 0x06
VS , V Over-flow set 0x07
VC , NV Over-flow clear 0x08
GT Greater than (signed) 0x09
GE Greater than or equal to (signed) 0x0A
LT Less than (signed) 0x0B
LE Less than or equal to (signed) 0x0C
HI Higher than (unsigned) 0x0D
NOTE The implemented ARCtangent-A4 system may have extensions or
customizations in this area, please see associated documentation.
Register
This is the general form used for register-register and register-long-immediate
addressing.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
Instruction Set Details
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
Short immediate
This is the form used for register with short immediate. Note that the short
immediate data is always sign extended to 32 bits before use.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1 1 1 1 1 0 0 0 1 1 1 0 1 0
The code in field C[5:0] (instruction bits[14:9]) is 63 for short immediate data
without setting flags. If the instruction needed to set flags, then code 61 would be
used.
The result of the operation is discarded if the short immediate code is included in
the destination field A[5:0].
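The short immediate form can be illustrated by packing the fields into a 32-bit word. The Python sketch below assumes the field order I[4:0], A[5:0], B[5:0], C[5:0], D[8:0] from most to least significant bit; the text confirms only that C occupies bits [14:9], so the positions of A and B are an inference. The function names encode_fields and sign_extend9 are hypothetical.

```python
def encode_fields(i, a, b, c, d=0):
    """Pack I, A, B, C and D into a 32-bit instruction word (assumed layout)."""
    return (((i & 0x1F) << 27) | ((a & 0x3F) << 21) | ((b & 0x3F) << 15)
            | ((c & 0x3F) << 9) | (d & 0x1FF))

def sign_extend9(d):
    """Sign extend the 9-bit short immediate field to a Python integer."""
    d &= 0x1FF
    return d - 0x200 if d & 0x100 else d

SHIMM_NO_FLAGS = 63   # C field code for short immediate data without setting flags
SHIMM_SET_FLAGS = 61  # C field code when the flags are to be set

# Opcode 0x0C (AND), destination r1, operand r2, short immediate 0x3A.
word = encode_fields(0x0C, 1, 2, SHIMM_NO_FLAGS, 0x3A)
print(hex(word))  # prints: 0x60217e3a
```

The resulting value matches the bit pattern shown in the short immediate example above, which supports the assumed field positions without proving them for every instruction.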
Single operand
Single operand instructions use the same format as “register” and “short
immediate” encoding styles, except that the I-field contains 0x03 and the C-field
is used to encode the particular single operand instruction code.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
I[4:0] L[21:0] N N Q Q Q Q Q
0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0
ADC
Operation:
dest ← operand1 + operand2 + carry
Syntax:
with result without result
ADC<.cc><.f> a,b,c ADC<.cc><.f> 0,b,c
ADC<.f> a,b,shimm ADC<.f> 0,b,shimm
ADC<.f> a,shimm,c ADC<.f> 0,shimm,c
ADC<.f> a,shimm,shimm ADC<.f> 0,shimm,shimm
ADC<.cc><.f> a,b,limm ADC<.cc><.f> 0,b,limm
ADC<.cc><.f> a,limm,c ADC<.cc><.f> 0,limm,c
Example:
ADC r1,r2,r3
Description:
Add operand1 to operand2 and carry, and place the result in the destination
register.
Status flags:
Z N C V
* * * *
Z Set if result is zero N Set if most significant bit of result is set
C Set if carry is generated V Set if an overflow is generated
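The operation and the four flags can be modelled in Python as below. This is a sketch rather than a description of the hardware: it assumes that "overflow" means two's complement signed overflow, and the function name adc is simply borrowed from the mnemonic.

```python
MASK32 = 0xFFFFFFFF

def adc(operand1, operand2, carry_in):
    """Return (result, flags) for a 32-bit add with carry."""
    a = operand1 & MASK32
    b = operand2 & MASK32
    total = a + b + (carry_in & 1)
    result = total & MASK32
    flags = {
        "Z": result == 0,
        "N": (result >> 31) == 1,
        "C": total > MASK32,
        # Signed overflow: operands share a sign that the result does not.
        "V": ((~(a ^ b)) & (a ^ result) & 0x80000000) != 0,
    }
    return result, flags

result, flags = adc(0x7FFFFFFF, 1, 0)
print(hex(result), flags["N"], flags["V"])  # prints: 0x80000000 True True
```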
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Instruction fields:
A[5:0] Destination register address. B[5:0] Operand 1 address
C[5:0] Operand 2 address D[8:0] Immediate data field
Q Condition code field Res Reserved. Should be set to 0.
F Set flags on result if set to 1
Addition
ADD ADD
Arithmetic Operation
ADD
Operation:
dest ← operand1 + operand2
Syntax:
with result Without result
ADD<.cc><.f> a,b,c ADD<.cc><.f> 0,b,c
ADD<.f> a,b,shimm ADD<.f> 0,b,shimm
ADD<.f> a,shimm,c ADD<.f> 0,shimm,c
ADD<.f> a,shimm,shimm ADD<.f> 0,shimm,shimm
(shimms MUST match)
ADD<.cc><.f> a,b,limm ADD<.cc><.f> 0,b,limm
ADD<.cc><.f> a,limm,c ADD<.cc><.f> 0,limm,c
Example:
ADD r1,r2,r3
Description:
Add operand1 to operand2 and place the result in the destination register.
Status flags:
Z N C V
* * * *
Z Set if result is zero N Set if most significant bit of result is set
C Set if carry is generated V Set if an overflow is generated
Instruction format:
OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Instruction fields:
A[5:0] Destination register address. B[5:0] Operand 1 address
C[5:0] Operand 2 address D[8:0] Immediate data field
Q Condition code field Res Reserved. Should be set to 0.
F Set flags on result if set to 1
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Instruction fields:
A[5:0] Destination register address. B[5:0] Operand 1 address
C[5:0] Operand 2 address D[8:0] Immediate data field
Q Condition code field Res Reserved. Should be set to 0.
F Set flags on result if set to 1
Syntax:
with result without result
ASL<.cc><.f> a,b ASL<.cc><.f> 0,b
ASL<.f> a,shimm ASL<.f> 0,shimm
ASL<.cc><.f> a,limm ASL<.cc><.f> 0,limm
Example:
ASL r1,r2
Description:
Arithmetically shift operand left by one place and place the result in the
destination register. When interpreting as an arithmetic shift, the overflow flag
will be set if the sign bit changes after the shift. When interpreting as a logical
shift, the overflow flag can be ignored. ASL is included for instruction set
symmetry. It is effectively the ADD instruction (ADD a,b,b etc.).
Status flags:
Z N C V
* * * *
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Instruction fields:
A[5:0] Destination register address. B[5:0] Operand 1 address - in both fields
D[8:0] Immediate data field Q Condition code field
Res Reserved. Should be set to 0. F Set flags on result if set to 1
Syntax:
with result without result
ASL<.cc><.f> a,b,c ASL<.cc><.f> 0,b,c
ASL<.cc><.f> a,b,limm ASL<.cc><.f> 0,b,limm
ASL<.f> a,b,shimm ASL<.f> 0,b,shimm
ASL<.cc><.f> a,limm,c ASL<.cc><.f> 0,limm,c
ASL<.f> a,shimm,c ASL<.f> 0,shimm,c
Example:
ASL r1,r2,r3
Description:
Arithmetically shift operand1 left by operand2 places and place the result in the
destination register.
Status flags:
Z N C V
* * . .
Z Set if result is zero N Set if most significant bit of result is set
C Unchanged V Unchanged
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Instruction fields:
A[5:0] Destination register address Q Condition code field
B[5:0] Operand 1 address R Reserved: set to 0
C[5:0] Operand 2 address F Set flags on result if 1
D[8:0] Immediate data field
Syntax:
with result Without result
ASR<.cc><.f> a,b ASR<.cc><.f> 0,b
ASR<.f> a,shimm ASR<.f> 0,shimm
ASR<.cc><.f> a,limm ASR<.cc><.f> 0,limm
Example:
ASR r1,r2
Description:
Arithmetically shift operand right by one place and place the result in the
destination register. The sign of the operand is retained after the shift.
Status flags:
Z N C V
* * * .
Z Set if result is zero N Set if most significant bit of result is set
C Set if carry is generated V Unchanged
Instruction format:
OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Instruction fields:
A[5:0] Destination register address. B[5:0] Operand address
D[8:0] Immediate data field Q Condition code field
Res Reserved. Should be set to 0. F Set flags on result if set to 1
Syntax:
with result without result
ASR<.cc><.f> a,b,c ASR<.cc><.f> 0,b,c
ASR<.cc><.f> a,b,limm ASR<.cc><.f> 0,b,limm
ASR<.f> a,b,shimm ASR<.f> 0,b,shimm
ASR<.cc><.f> a,limm,c ASR<.cc><.f> 0,limm,c
ASR<.f> a,shimm,c ASR<.f> 0,shimm,c
Example:
ASR r1,r2,r3
Description:
Arithmetically shift operand1 right by operand2 places and place the result in the
destination register. The destination is sign filled.
Status flags:
Z N C V
* * . .
Z Set if result is zero N Set if most significant bit of result is set
C Unchanged V Unchanged
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Instruction fields:
A[5:0] Destination register address Q Condition code field
B[5:0] Operand 1 address R Reserved: set to 0
C[5:0] Operand 2 address F Set flags on result if 1
D[8:0] Immediate data field
0 1 1 1 0 A[5:0] B[5:0] C[5:0] D[8:0]
Instruction fields:
A[5:0] Destination register address. B[5:0] Operand 1 address
C[5:0] Operand 2 address D[8:0] Immediate data field
Q Condition code field Res Reserved. Should be set to 0.
F Set flags on result if set to 1
Branch Conditionally
Bcc Bcc
Branch Operation
Bcc
Operation:
If condition true then PC ← PC + rel_addr
Syntax:
B<cc><.dd> rel_addr
Example:
BNE.ND new_code
Description:
If the specified condition is met then program execution is resumed at location
PC + relative displacement (rel_addr), where PC is the address of the instruction
in the delay slot. The displacement is a 20 bit signed long word offset. The
instruction following the branch is executed according to the nullify instruction
mode shown in the following table:
ND Only execute next instruction when not jumping (Default) 00
D Always execute next instruction 01
JD Only execute next instruction when jumping 10
The condition codes that can be used in the condition code field are:
AL, RA 00000 MI , N 00100 VC , NV 01000 LE 01100
EQ , Z 00001 CS , C, LO 00101 GT 01001 HI 01101
NE , NZ 00010 CC , NC, HS 00110 GE 01010 LS 01110
PL , P 00011 VS , V 00111 LT 01011 PNZ 01111
Status flags:
Not affected.
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 L[21:2] N N Q Q Q Q Q
Instruction fields:
L[21:2] Relative address long word displacement
N Nullify instruction mode
Q Condition code field
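The format above (opcode 00100, a 20-bit displacement L[21:2], two nullify bits and a 5-bit condition code) can be exercised with a short Python sketch. It assumes 4-byte instructions, so that the delay slot sits at the branch address plus 4, and it treats the displacement as a count of long words relative to that address. The function names encode_bcc and branch_target are hypothetical.

```python
ND, D, JD = 0b00, 0b01, 0b10   # nullify instruction modes
NE = 0b00010                   # condition code for NE / NZ

def encode_bcc(disp_words, nullify, cond):
    """Pack a Bcc instruction: I=0x04, L[21:2], N, Q."""
    return ((0x04 << 27) | ((disp_words & 0xFFFFF) << 7)
            | ((nullify & 0x3) << 5) | (cond & 0x1F))

def branch_target(branch_addr, word):
    """Compute the taken-branch address from the encoded displacement."""
    disp = (word >> 7) & 0xFFFFF
    if disp & 0x80000:            # sign extend the 20-bit field
        disp -= 0x100000
    delay_slot_addr = branch_addr + 4   # assumption: 4-byte instructions
    return delay_slot_addr + (disp << 2)

word = encode_bcc(3, ND, NE)      # BNE.ND three long words past the delay slot
print(hex(word), hex(branch_target(0x100, word)))  # prints: 0x20000182 0x110
```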
AL, RA 00000 MI , N 00100 VC , NV 01000 LE 01100
EQ , Z 00001 CS , C, LO 00101 GT 01001 HI 01101
NE , NZ 00010 CC , NC, HS 00110 GE 01010 LS 01110
PL , P 00011 VS , V 00111 LT 01011 PNZ 01111
Status flags:
Not affected.
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 1 L[21:2] N N Q Q Q Q Q
Instruction fields:
L[21:2] Relative address long word displacement
N Nullify instruction mode
Q Condition code field
Breakpoint
BRK BRK
Debug Operation
BRK
Operation:
Halt and flush the ARCtangent-A4 processor
Syntax:
BRK
Example:
BRK
Description:
The breakpoint instruction can be placed anywhere in a program. The breakpoint
instruction is decoded at stage one of the pipeline which consequently stalls stage
one, and allows instructions in stages two, three and four to continue, i.e. flushing
the pipeline.
Due to stage 2 to stage 1 dependencies, the breakpoint instruction behaves
differently when it is placed in the delay slots of Branch, and Jump instructions.
In these cases, the processor will stall stages one and two of the pipeline while
allowing instructions in subsequent stages (three and four) to proceed to
completion.
Interrupts are treated in the same manner by the processor as Branch and Jump
instructions when a BRK instruction is detected. Therefore, an interrupt that
reaches stage two of the pipeline when a BRK instruction is in stage one will
keep it in stage two, and flush the remaining stages of the pipeline. It is also
important to note that an interrupt that occurs in the same cycle as a breakpoint is
Not affected.
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
Zero Extend
EXT EXT
Arithmetic Operation
EXT
Operation:
dest ← operand zero extended from byte or word
Syntax:
with result without result
EXT<zz><.cc><.f> a,b EXT<zz><.cc><.f> 0,b
EXT<zz><.f> a,shimm EXT<zz><.f> 0,shimm
EXT<zz><.cc><.f> a,limm EXT<zz><.cc><.f> 0,limm
Example:
EXTW r1,r2
Description:
Zero extend operand to most significant bit in long word from byte or word
according to size field <zz> and place the result in the destination register. Valid
values for <zz> are:
W zero extend from word
B zero extend from byte
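As a minimal sketch of the operation, assuming a byte is 8 bits and a word is 16 bits, zero extension amounts to masking off the upper bits of the operand. The function name is hypothetical.

```python
def ext(value, size):
    """Zero extend the low byte ("B") or word ("W") of a 32-bit value."""
    masks = {"B": 0xFF, "W": 0xFFFF}
    return value & masks[size]
```

For example, `ext(0x12345678, "W")` keeps only the low 16 bits of the operand.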
Status flags:
Z N C V
* 0 . .
Z Set if result is zero N Set if most significant bit of result is set
C Unchanged V Unchanged
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Instruction fields:
A[5:0] Destination register address. B[5:0] Operand 1 address
H[5:0] Operand 2 address D[8:0] Immediate data field
Q Condition code field Res Reserved. Should be set to 0.
F Set flags on result if set to 1
Set Flags
FLAG FLAG
Control Operation
FLAG
Operation:
flags ← low bits of operand
Syntax:
FLAG<.cc> b
FLAG shimm
FLAG<.cc> limm
Example:
FLAG r2
Description:
Move the low bits of the operand into the flags register.
Z, N, C, V are replaced by bits [6:3] respectively. The interrupt enables are
replaced by bits 2 and 1. The H bit is the processor halt bit and should be set to
halt the ARCtangent-A4 processor.
If the H bit is set then the other flag bits are unchanged.
For proper operation, the set flags field should be set to “not set flags”, i.e. bit 8
should be clear, or r63 used for the short-immediate indicator.
Status Flags:
Z Set according to bit 6 of operand N Set according to bit 5 of operand
C Set according to bit 4 of operand V Set according to bit 3 of operand
E2 Set according to bit 2 of operand E1 Set according to bit 1 of operand
H Set according to bit 0 of operand
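The bit assignments described for FLAG can be sketched as a helper that builds the operand value. The helper and its parameter names are hypothetical; only the bit positions come from the description above.

```python
def flag_operand(z=False, n=False, c=False, v=False, e2=False, e1=False, h=False):
    """Build a FLAG operand: Z, N, C, V in bits 6:3, E2 and E1 in bits 2:1, H in bit 0."""
    return (
        (int(z) << 6) | (int(n) << 5) | (int(c) << 4) | (int(v) << 3)
        | (int(e2) << 2) | (int(e1) << 1) | int(h)
    )
```

With this helper, `flag_operand(h=True)` yields the value 1, which corresponds to a FLAG that halts the processor.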
Instruction format:
The destination field must contain an immediate operand indicator.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 1 1 0 1 B[5:0] 0 0 0 0 0 0 0 Res. Q Q Q Q Q
OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 1 1 0 1 B[5:0] 0 0 0 0 0 0 D[8:0]
Instruction fields:
B[5:0] Operand address. D[8:0] Immediate data field
Q Condition code field R Reserved. Should be set to 0.
Jump Conditionally
Jcc Jcc
Jump Operation
Jcc
Operation:
If condition true then PC ← operand1
Syntax:
J<cc><.dd><.f> [b]
J<cc><.JD><.f> addr ; limm = addr (top 7 bits of addr will update the
flags if flag field is set)
J<cc><.JD>.f addr, flags ; limm contains both flags (define values as per
FLAG) and addr
Example:
JNZ.ND [r1]
Description:
If the specified condition is met, then program execution is resumed at location
contained in operand 1. If the flag field is set, then operand 1 replaces the whole
of the status register (except the halt bit), otherwise if the flag field is clear then
only the PC is replaced (the alternative syntax for updating flags is supplied for
ease of programming). The operand value updates the status register according
to:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Z N C V E2 E1 H R PC[25:2]
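Reading the layout above as Z in bit 31 down to H in bit 25, a reserved bit 24, and PC[25:2] in bits 23:0, a status word could be unpacked as in the following sketch. The function name and the returned dictionary are illustrative assumptions.

```python
STATUS_BITS = {"Z": 31, "N": 30, "C": 29, "V": 28, "E2": 27, "E1": 26, "H": 25}


def decode_status(word):
    """Split a 32-bit status word into its flags and a byte address."""
    fields = {name: bool((word >> bit) & 1) for name, bit in STATUS_BITS.items()}
    # Bits 23:0 hold PC[25:2], so shift left by 2 to recover the byte address.
    fields["PC"] = (word & 0x00FFFFFF) << 2
    return fields
```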
mode:
ND    Only execute next instruction when not jumping (Default)    00
D     Always execute next instruction                             01
JD    Only execute next instruction when jumping                  10
The condition codes that can be used in the condition code field are:
AL, RA    00000    MI, N       00100    VC, NV    01000    LE     01100
EQ, Z     00001    CS, C, LO   00101    GT        01001    HI     01101
NE, NZ    00010    CC, NC, HS  00110    GE        01010    LS     01110
PL, P     00011    VS, V       00111    LT        01011    PNZ    01111
Status flags:
Are changed if flag field is set.
Z N C V E2 E1 H
* * * * * * .
0 0 1 1 1 0 0 0 0 0 0 B[5:0] 0 0 0 0 0 0 F R N N Q Q Q Q Q
Instruction fields:
B[5:0] Operand address. F Set flags if set to 1
Q Condition code field R Reserved. Should be set to 0.
N Nullify instruction mode
Syntax:
JL<cc><.dd><.f> [b]
JL<cc><.JD><.f> addr ; limm = addr (top 7 bits of addr will update the
flags if flag field is set)
JL<cc><.JD>.f addr, flags ; limm contains both flags (define values as per
FLAG) and addr
Example:
JLNZ.ND [r1]
Description:
NOTE This instruction is only available for ARCtangent-A4 Basecase processor
version 6 and higher.
Z N C V E2 E1 H R PC[25:2]
If operand 1 is an explicit address (long immediate data), then for this instruction
the .JD nullify instruction mode must be used. If .D or .ND is used, then the link
register BLINK will contain the incorrect return address, whereupon, the
ARCtangent-A4 processor will attempt to execute the long immediate data on
return from the subroutine. When operand 1 is a register, however, the instruction
following the jump is executed according to the nullify instruction mode:
ND    Only execute next instruction when not jumping (Default for [b], disallowed for addr)    00
Status flags:
Are changed if flag field is set.
Z N C V E2 E1 H
* * * * * * .
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 0 0 B[5:0] 0 0 0 0 0 1 F R N N Q Q Q Q Q
Instruction fields:
B[5:0] Operand address. F Set flags if set to 1
Q Condition code field R Reserved. Should be set to 0.
N Nullify instruction mode
memory field, .DI, is set.
Note that the destination of a load should not be an immediate data indicator. The
operation of the load/store unit may be degraded if this occurs.
When the target of a LD.A instruction is the same register as the one used for
address write-back (.A), the returning load will overwrite the value from the
address write-back.
LD effectively uses two instruction positions: one opcode for the short immediate
form and another opcode for the general form.
NOTE When a memory controller is employed:
Byte loads can be made from any byte alignment,
word loads should be made from word-aligned addresses, and
long loads should be made only from long-aligned addresses.
Status flags:
Not affected.
Instruction format
Load using generic opcode form:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
OR
Load with short immediate opcode form
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Instruction fields:
A[5:0] Destination register address. Di Direct to memory (cache bypass) enable
B[5:0] Operand 1 address A Address write-back enable
C[5:0] Operand 2 address Z Size field
D[8:0] Immediate data offset X Sign extend field
R Reserved. Should be set to 0
Loop Set Up
LPcc LPcc
Branch Operation
LPcc
Operation:
If condition false then PC ← PC + rel_addr.
If condition true then LP_END ← PC + rel_addr and LP_START ← next PC.
Syntax:
LP<cc><.dd> rel_addr
Example:
LPNE.ND end_loop1
Description:
If the specified condition is not met, then program execution is resumed at
location PC + relative displacement (rel_addr), where PC is the address of the
instruction in the delay slot. The displacement is a 20 bit signed long word offset.
If the condition is met, then the zero overhead loop registers are set up. The
instruction following the loop set up is executed according to the nullify
instruction mode shown in the following table:
ND    Only execute next instruction when not jumping (Default)    00
D     Always execute next instruction                             01
JD    Only execute next instruction when jumping                  10
The condition codes that can be used in the condition code field are:
AL, RA    00000    MI, N       00100    VC, NV    01000    LE     01100
EQ, Z     00001    CS, C, LO   00101    GT        01001    HI     01101
NE, NZ    00010    CC, NC, HS  00110    GE        01010    LS     01110
PL, P     00011    VS, V       00111    LT        01011    PNZ    01111
Status flags:
Not affected.
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 0 L[21:2] N N Q Q Q Q Q
Instruction fields:
L[21:2] Relative address long word displacement
N Nullify instruction mode
Q Condition code field
Instruction fields:
A[5:0] Destination register address. B[5:0] Operand 1 address
D[8:0] Immediate data field R Reserved. Should be set to 0.
See ASL.
Syntax:
with result without result
LSR<.cc><.f> a,b LSR<.cc><.f> 0,b
LSR<.f> a,shimm LSR<.f> 0,shimm
LSR<.cc><.f> a,limm LSR<.cc><.f> 0,limm
Example:
LSR r1,r2
Description:
Logically shift operand right by one place and place the result in the destination
register.
The most significant bit of the result is replaced with 0.
Status flags:
Z N C V
* * * .
Z Set if result is zero N Set if most significant bit of result is set
C Set if carry is generated V Unchanged
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Instruction fields:
A[5:0] Destination register address. B[5:0] Operand address
D[8:0] Immediate data field Q Condition code field
Res Reserved. Should be set to 0. F Set flags on result if set to 1
Syntax:
with result without result
LSR<.cc><.f> a,b,c LSR<.cc><.f> 0,b,c
LSR<.cc><.f> a,b,limm LSR<.cc><.f> 0,b,limm
LSR<.f> a,b,shimm LSR<.f> 0,b,shimm
LSR<.cc><.f> a,limm,c LSR<.cc><.f> 0,limm,c
LSR<.f> a,shimm,c LSR<.f> 0,shimm,c
Example:
LSR r1,r2,r3
Description:
Logically shift operand1 right by operand2 places and place the result in the
destination register.
Status flags:
Z N C V
* * . .
Z Set if result is zero N Set if most significant bit of result is set
1 0 0 0 1 A[5:0] B[5:0] C[5:0] F R R R Q Q Q Q Q
OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Instruction fields:
A[5:0] Destination register address Q Condition code field
B[5:0] Operand 1 address R Reserved: set to 0
C[5:0] Operand 2 address F Set flags on result if 1
D[8:0] Immediate data field
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Instruction fields:
A[5:0] Destination register address Q Condition code field
B[5:0] Operand 1 address R Reserved: set to 0
C[5:0] Operand 2 address F Set flags on result if 1
D[8:0] Immediate data field
OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Instruction fields:
A[5:0] Destination register address Q Condition code field
B[5:0] Operand 1 address R Reserved: set to 0
C[5:0] Operand 2 address F Set flags on result if 1
D[8:0] Immediate data field
Move contents
MOV MOV
Arithmetic Operation
MOV
Operation:
dest ← operand
Syntax:
with result without result
MOV<.cc><.f> a,b MOV<.cc><.f> 0,b
MOV<.f> a,shimm MOV<.f> 0,shimm
MOV<.cc><.f> a,limm MOV<.cc><.f> 0,limm
Example:
MOV r1,r2
Description:
The contents of the operand are moved to the destination register.
Status flags:
Z N C V
* * . .
Z Set if result is zero N Set if most significant bit of result is set
C Unchanged V Unchanged
Instruction format:
MOV is included for instruction set symmetry. It is basically the AND
instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Instruction fields:
A[5:0] Destination register address. B[5:0] Operand 1 address
D[8:0] Immediate data field Q Condition code field
Res Reserved. Should be set to 0. F Set flags on result if set to 1
MHI MLO
MMID
Syntax:
MUL64<.cc> <0,>b,c
MUL64 <0,>b,shimm
MUL64 <0,>shimm,c
MUL64<.cc> <0,>b,limm
MUL64<.cc> <0,>limm,c
Example:
MUL64 r2,r3
Description:
Perform a signed 32-bit by 32-bit multiply of operand1 and operand2 then place
the most significant 32 bits of the 64-bit result in register MHI, the least
Status flags:
Not affected.
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 1 1 1 1 1 B[5:0] C[5:0] 0 R R R Q Q Q Q Q
OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Instruction fields:
B[5:0] Operand 1 address Q Condition code field
C[5:0] Operand 2 address R Reserved: set to 0
D[8:0] Immediate data field
MHI MLO
MMID
Syntax:
MULU64<.cc> <0,>b,c
MULU64 <0,>b,shimm
MULU64 <0,>shimm,c
MULU64<.cc> <0,>b,limm
MULU64<.cc> <0,>limm,c
Example:
MULU64 r2,r3
Description:
Perform an unsigned 32-bit by 32-bit multiply of operand1 and operand2 then
place the most significant 32 bits of the 64-bit result in register MHI, the least
Status flags:
Not affected.
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 1 1 1 1 1 1 1 B[5:0] C[5:0] 0 R R R Q Q Q Q Q
OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Instruction fields:
B[5:0] Operand 1 address Q Condition code field
C[5:0] Operand 2 address R Reserved: set to 0
D[8:0] Immediate data field
No Operation
NOP NOP
Control Operation
NOP
Operation:
No Operation
Syntax:
NOP
Example:
NOP
Description:
No operation. The state of the processor is not changed. NOP is included for
instruction set symmetry. It is basically the XOR instruction:
XOR 0x1FF, 0x1FF, 0x1FF.
Status flags:
Z N C V
. . . .
Z Unchanged N Unchanged
C Unchanged V Unchanged
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Syntax:
with result without result
NORM<.cc><.f> a,b NORM<.cc><.f> 0,b
NORM<.f> a,shimm NORM<.f> 0,shimm
NORM<.cc><.f> a,limm NORM<.cc><.f> 0,limm
Example:
NORM r1,r2
Description:
Gives the normalisation integer for the signed value in the operand. The
normalisation integer is the amount by which the operand should be shifted left
to normalise it as a 32-bit signed integer. This function is sometimes referred to
as “find first bit”. Examples of returned values are shown in the table below:
Operand Value    Returned Value    Notes
0x00000000       0x0000001F
0x1FFFFFFF       0x00000002
0x3FFFFFFF       0x00000001
0x7FFFFFFF       0x00000000
0x80000000       0x00000000        This result is not particularly useful since the
Status flags:
Z N C V
* * . .
Z Set if result is zero N Set if most significant bit of result is set
C Unchanged V Unchanged
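The normalisation integer can be modelled as the number of redundant sign bits in the operand. The sketch below is one way to compute it that reproduces the values in the table above; it is not a description of the hardware, and the function name is hypothetical.

```python
def norm(value):
    """Return the left shift needed to normalise a 32-bit signed value."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        # For negative values, count leading ones by inverting first.
        value = ~value & 0xFFFFFFFF
    # Leading zeros of the (possibly inverted) value, minus the sign bit itself.
    return 32 - value.bit_length() - 1
```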
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 A[5:0] B[5:0] 0 0 1 0 1 0 F R R R Q Q Q Q Q
OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Instruction fields:
A[5:0] Destination register address Q Condition code field
B[5:0] Operand 1 address R Reserved: set to 0
D[8:0] Immediate data field F Set flags on result if 1
Logical Bitwise OR
OR OR
Logical Operation
OR
Operation:
dest ← operand1 OR operand2
Syntax:
with result without result
OR<.cc><.f> a,b,c OR<.cc><.f> 0,b,c
OR<.f> a,b,shimm OR<.f> 0,b,shimm
OR<.f> a,shimm,c OR<.f> 0,shimm,c
OR<.f> a,shimm,shimm OR<.f> 0,shimm,shimm
(shimms MUST match)
OR<.cc><.f> a,b,limm OR<.cc><.f> 0,b,limm
OR<.cc><.f> a,limm,c OR<.cc><.f> 0,limm,c
Example:
OR r1,r2,r3
Description:
Logical bitwise OR of operand1 with operand2 and place the result in the
destination register.
Status flags:
Z N C V
* * . .
Z Set if result is zero N Set if most significant bit of result is set
C Unchanged V Unchanged
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Instruction fields:
A[5:0] Destination register address. B[5:0] Operand 1 address
C[5:0] Operand 2 address D[8:0] Immediate data field
Q Condition code field F Set flags on result if set to 1
Res Reserved. Should be set to 0.
Syntax:
with result without result
RLC<.cc><.f> a,b RLC<.cc><.f> 0,b
RLC<.f> a,shimm RLC<.f> 0,shimm
RLC<.cc><.f> a,limm RLC<.cc><.f> 0,limm
Example:
RLC r1,r2
Description:
Rotate operand left by one place and place the result in the destination register.
The carry flag is shifted into the least significant bit of the result, and the most
significant bit of the source is placed in the carry flag. RLC is included for
instruction set symmetry. It is basically the ADC instruction.
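The relationship to ADC can be checked with a small model: rotating left through carry gives the same result as adding the operand to itself with carry in. Both functions below are illustrative sketches with hypothetical names, modelling only the result and the carry flag.

```python
MASK32 = 0xFFFFFFFF


def rlc(value, carry_in):
    """Rotate left through carry by one place; returns (result, carry_out)."""
    value &= MASK32
    result = ((value << 1) | int(carry_in)) & MASK32
    carry_out = bool(value >> 31)          # old most significant bit
    return result, carry_out


def adc(a, b, carry_in):
    """32-bit add with carry; returns (result, carry_out)."""
    total = (a & MASK32) + (b & MASK32) + int(carry_in)
    return total & MASK32, total > MASK32
```

For any 32-bit `x` and carry `c`, `rlc(x, c)` and `adc(x, x, c)` return the same pair in this model.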
Status flags:
Z N C V
* * * .
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Instruction fields:
A[5:0] Destination register address. B[5:0] Operand 1 address - in both fields
D[8:0] Immediate data field Q Condition code field
F Set flags on result if set to 1 Res Reserved. Should be set to 0.
Rotate Left
ROL ROL
Not implemented
ROL
Operation:
dest ← rotate left by one of operand
Rotate Right
ROR ROR
Logical Operation
ROR
Operation:
dest ← rotate right by one of operand
Syntax:
with result without result
ROR<.cc><.f> a,b ROR<.cc><.f> 0,b
ROR<.f> a,shimm ROR<.f> 0,shimm
ROR<.cc><.f> a,limm ROR<.cc><.f> 0,limm
Example:
ROR r1,r2
Description:
Rotate operand right by one place and place the result in the destination register.
The least significant bit of the source is also copied to carry flag.
Status flags:
Z N C V
* * * .
Z Set if result is zero N Set if most significant bit of result is set
C Set if carry is generated V Unchanged
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Instruction fields:
A[5:0] Destination register address. B[5:0] Operand address
D[8:0] Immediate data field Q Condition code field
Res Reserved. Should be set to 0. F Set flags on result if set to 1
Syntax:
with result without result
ROR<.cc><.f> a,b,c ROR<.cc><.f> 0,b,c
ROR<.cc><.f> a,b,limm ROR<.cc><.f> 0,b,limm
ROR<.f> a,b,shimm ROR<.f> 0,b,shimm
ROR<.cc><.f> a,limm,c ROR<.cc><.f> 0,limm,c
ROR<.f> a,shimm,c ROR<.f> 0,shimm,c
Example:
ROR r1,r2,r3
Description:
Rotate right operand1 by operand2 places and place the result in the destination
register.
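A multi-place rotate right can be sketched as follows. The sketch assumes that only the rotate count modulo 32 matters, which the text above does not state; the function name is hypothetical.

```python
def ror(value, places):
    """Rotate a 32-bit value right by the given number of places."""
    value &= 0xFFFFFFFF
    places %= 32  # assumption: counts wrap at the register width
    return ((value >> places) | (value << (32 - places))) & 0xFFFFFFFF
```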
Status flags:
Z N C V
* * . .
Z Set if result is zero N Set if most significant bit of result is set
C Unchanged V Unchanged
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Instruction fields:
A[5:0] Destination register address Q Condition code field
B[5:0] Operand 1 address R Reserved: set to 0
C[5:0] Operand 2 address F Set flags on result if 1
D[8:0] Immediate data field
Syntax:
with result without result
RRC<.cc><.f> a,b RRC<.cc><.f> 0,b
RRC<.f> a,shimm RRC<.f> 0,shimm
RRC<.cc><.f> a,limm RRC<.cc><.f> 0,limm
Example:
RRC r1,r2
Description:
Rotate operand right by one place and place the result in the destination register.
The carry flag is shifted into the most significant bit of the result, and the least
significant bit of the source is placed in the carry flag.
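The rotate through carry described above can be modelled in a few lines. This is a sketch of the result and carry behaviour only, with a hypothetical function name.

```python
def rrc(value, carry_in):
    """Rotate right through carry by one place; returns (result, carry_out)."""
    value &= 0xFFFFFFFF
    result = (value >> 1) | (int(carry_in) << 31)  # carry enters at bit 31
    carry_out = bool(value & 1)                    # old least significant bit
    return result, carry_out
```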
Status flags:
Z N C V
* * * .
Z Set if result is zero N Set if most significant bit of result is set
0 0 0 1 1 A[5:0] B[5:0] 0 0 0 1 0 0 F Res. Q Q Q Q Q
OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Instruction fields:
A[5:0] Destination register address. B[5:0] Operand address
D[8:0] Immediate data field Q Condition code field
Res Reserved. Should be set to 0. F Set flags on result if set to 1
OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Instruction fields:
A[5:0] Destination register address. B[5:0] Operand 1 address
C[5:0] Operand 2 address D[8:0] Immediate data field
Q Condition code field R Reserved. Should be set to 0.
F Set flags on result if set to 1
Sign Extend
SEX SEX
Arithmetic Operation
SEX
Operation:
dest ← operand sign extended from byte or word
Syntax:
with result without result
SEX<zz><.cc><.f> a,b SEX<zz><.cc><.f> 0,b
SEX<zz><.f> a,shimm SEX<zz><.f> 0,shimm
SEX<zz><.cc><.f> a,limm SEX<zz><.cc><.f> 0,limm
SEX 0,shimm ;nop
Example:
SEXW r1,r2
Description:
Sign extend operand to most significant bit in long word from byte or word
according to size field <zz> and place the result in the destination register. Valid
values for <zz> are:
W sign extend from word
B sign extend from byte
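As a minimal sketch, assuming a byte is 8 bits and a word is 16 bits, sign extension copies the top bit of the byte or word into all higher bits of the 32-bit result. The function name is hypothetical.

```python
def sex(value, size):
    """Sign extend the low byte ("B") or word ("W") of a value to 32 bits."""
    bits = {"B": 8, "W": 16}[size]
    low_mask = (1 << bits) - 1
    value &= low_mask
    if value >> (bits - 1):
        # Sign bit set: fill every bit above the byte or word with ones.
        value |= 0xFFFFFFFF & ~low_mask
    return value
```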
Status flags:
Z N C V
* * . .
Z Set if result is zero N Set if most significant bit of result is set
C Unchanged V Unchanged
OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Instruction fields:
A[5:0] Destination register address. B[5:0] Operand address
H[5:0] Extend size. 05=byte, 06=word. D[8:0] Immediate data field
Q Condition code field R Reserved. Should be set to 0.
F Set flags on result if set to 1
Not affected.
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1
Example:
SR r1,[12]
Description:
Store operand 1 to the auxiliary register whose number is obtained from operand
2.
Status flags:
Not affected.
Instruction format:
Instruction fields:
B[5:0] Operand 2 register address C[5:0] Operand 1 register address
D[8:0] Immediate data field R Reserved. Should be set to 0.
Store to memory
ST ST
Memory Operation
ST
Operation:
[operand 2 + offset] ← operand 1
Syntax:
ST<zz><.a><.di> c,[b]
ST<zz><.di> c,[limm]
ST<zz><.a><.di> c,[b,shimm]
ST<zz><.di> c,[shimm,shimm] shimms MUST match
ST<zz><.a><.di> 0,[b]
ST<zz><.di> shimm,[limm] actually: shimm,[limm,shimm]
ST<zz><.a><.di> shimm,[b,shimm] shimms MUST match
ST<zz><.di> limm,[shimm,shimm] shimms MUST match
ST<zz><.a><.di> limm,[b,shimm]
Example:
ST.A r1,[r2,10]
Description:
Store operand 1 to the address calculated by adding the offset to operand 2.
NOTE If the offset is not required, the value encoded for the immediate offset will be
set to 0.
The data size of the store is set according to the size field <zz>. The following
B Byte 01
The result of the address computation can be written back to the first register
operand in the address field. This write back occurs when the address write back
field, .A, is set.
If a data-cache is available in the memory controller the store instruction can
bypass the use of that cache when the direct to memory field, .DI, is set.
Status flags:
Not affected.
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Instruction fields:
B[5:0] Operand 2 register address C[5:0] Operand 1 register address
D[8:0] Immediate data offset Di Direct to memory (cache bypass) enable
A Address write-back enable R Reserved. Should be set to 0
Z Size field
Encoding examples:
ST r5,[r7,50] ; ST c,[b,shimm]
I[4:0] B[5:0] C[5:0] D[8:0]
2=ST 7 5 50
ST 50,[12345678] ; ST shimm,[limm]
I[4:0] B[5:0] C[5:0] D[8:0] LIMM
2=ST 62=limm 63=shimm 50 12345678-50
ST 50,[r7,50] ; ST shimm,[b,shimm]
I[4:0] B[5:0] C[5:0] D[8:0]
2=ST 7 63=shimm 50
ST r3,[12345678] ; ST c,[limm]
I[4:0] B[5:0] C[5:0] D[8:0] LIMM
2=ST 62=limm 3 0 12345678
ST 12345678,[r20,8] ; ST limm,[b,shimm]
I[4:0] B[5:0] C[5:0] D[8:0] LIMM
2=ST 20 62=limm 8 12345678
ST 50,[50,50] ; ST shimm,[shimm,shimm]
I[4:0] B[5:0] C[5:0] D[8:0]
2=ST 63=shimm 63=shimm 50
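The field choices in these encoding examples can be sketched as a helper that picks the B, C, D and LIMM values for a store. It is an illustration only: the function name and the tuple-based operand notation are hypothetical, and it takes 62 as the long immediate indicator and 63 as the short immediate indicator, as in the first examples above. It does not produce the full 32-bit instruction word.

```python
LIMM, SHIMM = 62, 63   # immediate indicators placed in the B and C fields
ST_OPCODE = 2          # "2=ST" in the tables above


def st_fields(src, base, offset=0):
    """Field values for ST src,[base,offset].

    src and base are ("reg", n), ("shimm", v) or ("limm", v).
    """
    src_kind, src_val = src
    base_kind, base_val = base
    if src_kind == "limm" and base_kind == "limm":
        raise ValueError("only one long immediate is available")
    fields = {"I": ST_OPCODE, "D": offset}
    shimms = [val for kind, val in (src, base) if kind == "shimm"]
    if shimms:
        # Every short immediate shares the single D field.
        if len(set(shimms)) != 1:
            raise ValueError("shimms MUST match")
        if base_kind == "limm":
            # ST shimm,[limm] is really ST shimm,[limm,shimm], so the
            # long immediate is reduced by the short immediate.
            base_val -= shimms[0]
        elif offset != shimms[0]:
            raise ValueError("the offset shares the D field with the shimm")
        fields["D"] = shimms[0]
    indicator = {"limm": LIMM, "shimm": SHIMM}
    fields["B"] = base_val if base_kind == "reg" else indicator[base_kind]
    fields["C"] = src_val if src_kind == "reg" else indicator[src_kind]
    if base_kind == "limm":
        fields["LIMM"] = base_val
    elif src_kind == "limm":
        fields["LIMM"] = src_val
    return fields
```

For instance, `st_fields(("reg", 5), ("reg", 7), 50)` corresponds to `ST r5,[r7,50]`.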
Subtract
SUB SUB
Arithmetic Operation
SUB
Operation:
dest ← operand1 - operand2
Syntax:
with result without result
SUB<.cc><.f> a,b,c SUB<.cc><.f> 0,b,c
SUB<.f> a,b,shimm SUB<.f> 0,b,shimm
SUB<.f> a,shimm,c SUB<.f> 0,shimm,c
SUB<.f> a,shimm,shimm SUB<.f> 0,shimm,shimm
SUB<.cc><.f> a,b,limm SUB<.cc><.f> 0,b,limm
SUB<.cc><.f> a,limm,c SUB<.cc><.f> 0,limm,c
Example:
SUB r1,r2,r3
SUB.F 0,r3,200 ; compare r3 with 200 and set flags
SUB.LT r2,r2,r2 ; same effect as MOV.LT r2,0 but no
; limm 0 data needed
Description:
Subtract operand2 from operand1 and place the result in the destination register.
If the carry flag is set by the subtract instruction, it is interpreted as a “borrow”.
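The borrow interpretation can be illustrated with a small model of the result and the Z, N and C flags. It assumes that a borrow occurs when operand2 is greater than operand1 as unsigned 32-bit values; the V flag is left out, and the function name is hypothetical.

```python
def sub(a, b):
    """32-bit subtract; returns (result, flags) with C acting as borrow."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    result = (a - b) & 0xFFFFFFFF
    flags = {
        "Z": result == 0,
        "N": bool(result >> 31),
        "C": a < b,  # assumption: unsigned borrow
    }
    return result, flags
```

Under this model, the comparison `SUB.F 0,r3,200` with r3 holding 3 would leave N and C set and Z clear.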
Status flags:
Z N C V
* * * *
Z Set if result is zero N Set if most significant bit of result is set
OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Instruction fields:
A[5:0] Destination register address. B[5:0] Operand 1 address
C[5:0] Operand 2 address D[8:0] Immediate data field
Q Condition code field Res Reserved
F Set flags on result if set to 1
Syntax:
with result without result
SWAP<.cc><.f> a,b SWAP<.cc><.f> 0,b
SWAP<.f> a,shimm SWAP<.f> 0,shimm
SWAP<.cc><.f> a,limm SWAP<.cc><.f> 0,limm
Example:
SWAP r1,r2
Description:
Swap the lower 16 bits of the operand with the upper 16 bits of the operand and
place the result of that swap in the destination register.
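The half-word exchange can be sketched in one expression; the function name is hypothetical.

```python
def swap(value):
    """Exchange the upper and lower 16 bits of a 32-bit value."""
    value &= 0xFFFFFFFF
    return ((value << 16) | (value >> 16)) & 0xFFFFFFFF
```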
Status flags:
Z N C V
* * . .
Z Set if result is zero N Set if most significant bit of result is set
0 0 0 1 1 A[5:0] B[5:0] 0 0 1 0 0 1 F R R R Q Q Q Q Q
OR
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Instruction fields:
A[5:0] Destination register address Q Condition code field
B[5:0] Operand 1 address R Reserved: set to 0
D[8:0] Immediate data field F Set flags on result if 1
Software Interrupt
SWI SWI
Control Operation
SWI
Operation:
instruction_error ← '1'
Syntax:
SWI
Example:
SWI
Description:
The software interrupt (SWI) instruction can be placed anywhere in the program,
even in the delay slot of a branch instruction. The software interrupt instruction is
decoded in stage two of the pipeline and if executed, then it immediately raises
the instruction error exception. The instruction error exception will be serviced
using the normal interrupt system. ILINK2 is used as the return address in the
service routine.
Once an instruction error exception is taken, then the medium and low priority
interrupts are masked off so that the ILINK2 register cannot be updated again as a
result of an interrupt thus preserving the return address of the instruction error
exception.
NOTE Only the reset and memory error exceptions have higher priorities than the
instruction error exception.
Status flags:
Not affected.
Instruction format:
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 1 0
Instruction fields:
A[5:0] Destination register address. B[5:0] Operand 1 address
C[5:0] Operand 2 address D[8:0] Immediate data field
Q Condition code field Res Reserved. Should be set to 0
F Set flags on result if set to 1
It is expected that the registers and the program memory of the ARCtangent-A4
processor will appear as a memory mapped section to the host. For example,
Figure 21 shows two examples: a) a contiguous part of host memory and b) a
section of memory and a section of I/O space.
Memory Map
Halting
The ARCtangent-A4 processor can halt itself with the FLAG instruction or it can
be halted by the host. The host halts the ARCtangent-A4 processor by setting the
H bit in the STATUS register, or for basecase version numbers greater than 5 by
setting the FH bit in the DEBUG register. See Figure 14 and Figure 19.
NOTE When the ARCtangent-A4 processor is running, only the H bit will change
if the host writes to the STATUS register. However, if the ARCtangent-A4
processor has halted itself, the whole of the STATUS register will be
updated when the host writes to the STATUS register.
Consequently, the host may assume that the ARCtangent-A4 processor is running
because it has previously read the STATUS register, yet by the time the host
forces a halt, the ARCtangent-A4 processor may have halted itself. In that case,
writing a “halt” value to the STATUS register, say 0x02000000, would overwrite
any program counter information that the host required.
In order to force the ARCtangent-A4 processor to halt without overwriting the
program counter, Basecase versions greater than 5 have the additional FH bit in
the DEBUG register. See Figure 19. The host can test whether the ARCtangent-
A4 processor has halted by checking the state of the H bit in the STATUS
register. Additionally, the SH bit in the debug register is available to test whether
the halt was caused by the host, the ARCtangent-A4 processor, or an external
halt signal. The host should wait for the LD (load pending) bit in the DEBUG
register to clear before changing the state of the processor.
Starting
The host starts the ARCtangent-A4 processor by clearing the H bit in the
STATUS register. Before modifying any registers and re-starting the
ARCtangent-A4 processor, it is advisable that the host clears the pipeline by
sending NOP instructions through, so that any pending instructions that are
about to modify registers in the processor are allowed to complete. If
the ARCtangent-A4 processor has been running code, and is to be restarted at a
different location, then it will be necessary to put the processor into a state
similar to the post-reset condition to ensure correct operation.
• reset the three hardware loop registers to their default values
Pipecleaning
If the processor is halted whilst it is executing a program, it is possible that the
later stages of the pipeline may contain valid instructions. Before re-starting the
processor at a new address, these instructions must be cleared to prevent
unwanted register writes or jumps from taking place.
If the processor is to be restarted from the point at which it was stopped, then the
instructions in the pipeline are to be executed, hence pipecleaning should not be
performed.
Pipecleaning is not necessary at times when the pipeline is known to be clean -
e.g. immediately after a reset, or if the processor has been stopped by a FLAG
instruction followed by three NOPs.
5. Single step until the values in the program counter or loop count register
change.
6. Point the PC/Status register to the downloaded NOP
7. Single step until the values in the program counter or loop count register
change.
8. Point the PC/Status register to the downloaded NOP
9. Single step until the values in the program counter or loop count register
change.
Notice that the program counter is written before each single step, so all branches
and jumps, that might be in the pipeline, are overridden, ensuring that the NOP is
fetched every time.
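The repeated "point the PC at the NOP, then single step" pattern in the steps above can be sketched as a host-side loop. Everything in this sketch is hypothetical: `FakeTarget` is a stand-in used only to make the example runnable and is not a real host interface, the three rounds are inferred from the visible steps, and the step limit is an added safety measure.

```python
class FakeTarget:
    """Stand-in for a halted processor; not a real host interface."""

    def __init__(self):
        self.pc = 0x100
        self.loop_count = 0
        self.pc_writes = []
        self._steps = 0

    def write_pc(self, addr):
        self.pc = addr
        self.pc_writes.append(addr)

    def read_state(self):
        return self.pc, self.loop_count

    def single_step(self):
        # Pretend the program counter advances on every second step.
        self._steps += 1
        if self._steps % 2 == 0:
            self.pc += 4


def pipeclean(target, nop_addr, rounds=3, max_steps=16):
    """Repeatedly point the PC at a NOP and step until PC or loop count changes."""
    for _ in range(rounds):
        # Writing the PC first overrides any branch or jump still in the pipeline.
        target.write_pc(nop_addr)
        before = target.read_state()
        for _ in range(max_steps):
            target.single_step()
            if target.read_state() != before:
                break
        else:
            raise TimeoutError("program counter and loop count never changed")
```

A real host would replace `FakeTarget` with its own register access routines.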
It should be noted that the instructions in the pipeline may perform register
writes, flag setting, loop set-up, or other operations which change the processor
state. Hence, pipecleaning should be performed before any operations which set
up the processor state in preparation for the program to be executed - for example
loading registers with parameters.
Single Stepping
The Single Step function is controlled by two bits in the DEBUG register. These
bits can be set by the debugger to enable the Single Cycle Stepping or Single
Instruction Stepping. The two bits, Single Step (SS) and Instruction Step (IS), are
write-only by the host and keep their values for one cycle (see Table 19).
Field    Description                                   Access Type
SS       Single Step: Cycle Step enable                Write only from the host
IS       Instruction Step: Instruction Step enable     Write only from the host
Table 19 Single Step Flags in Debug Register
Single cycle step
The Single Cycle Step function enables the processor for one cycle only.
Normally, an instruction is completed in four cycles: fetch, register read, execute
and register writeback. In order to complete an instruction, the debugger must
repeatedly single cycle step the processor until the program counter value is
Then two instruction fetches are made so that the program counter would be
updated appropriately.
The BRK instruction behaves exactly as when the processor is not in the Single
Step Mode. The BRK instruction is detected and kept in stage one forever until
removed by the host.
Software Breakpoints
As long as the host has access to the ARCtangent-A4 code memory, it can
replace any ARCtangent-A4 instruction with a branch instruction. This means
that a “software breakpoint” can be set on any instruction, as long as the target
breakpoint code is within the branch address range. Since a software breakpoint
is a branch instruction, the rules for use of Bcc apply. Care should be taken when
setting breakpoints on the last instructions in zero overhead loops and also on
instructions in delay slots of jump, branch and loop instructions. (See Pipeline
Cycle Diagrams for: Loops and Branches).
For ARCtangent-A4 basecase processor versions 7 and higher, the BRK
instruction can be used to insert a software breakpoint. BRK will halt the
ARCtangent-A4 processor and flush all previous instructions through the pipe.
The host can read the STATUS register to determine where the breakpoint
occurred.
STATUS
The host can read the status register when the ARCtangent-A4 processor is
running. This is useful for code profiling. See Figure 14.
SEMAPHORE
The semaphore register is used for inter-processor and host-ARCtangent-A4
communications. Protocols for using shared memory and provision of mutual
exclusion can be accomplished with this register. See Figure 15.
IDENTITY
The host can determine the version of ARCtangent-A4 processor by reading the
identity register. See Figure 18. Information on extensions added to the
ARCtangent-A4 processor can be determined through build configuration
registers. For more information on build configuration registers please refer to
the 'ARCtangent-A4 Development Kit for ARCtangent-A4 Release Notes'.
DEBUG
In order to halt the ARCtangent-A4 processor, the host needs to set the FH bit of
the debug register. The host can determine how the ARCtangent-A4 processor was
halted and whether there are any pending loads. See Figure 19.
ARCtangent-A4 Memory
The host can change the program memory at any time.
NOTE If program code is being altered, or transferred into ARCtangent-A4 memory
space, then the instruction cache should be invalidated.
Introduction
The ARCtangent-A4 processor has a four stage pipeline as shown in Figure 23.
[Figure: ARCtangent-A4 pipeline. Recoverable labels: load data, load/store
address, store data, load/store unit, latch, core registers, auxiliary registers,
write back.]
Pipeline-Cycle Diagram
In the explanation of the passage of instructions through the pipeline stages the
diagram in Figure 23 is used.
          t        t+1      t+2      t+3      t+4      t+5
Event 1   stage 1  stage 2  stage 3  stage 4
Event 2            stage 1  stage 2  stage 3  stage 4
Event 3                     stage 1  stage 2  stage 3  stage 4
We can show the events in the pipeline with the following diagram:
BIC ifetch r9, r10 BIC r8
SUB ifetch r12,r13 SUB r14
At cycle t+3:
The write back stage (stage 4) is updating r1.
The ALU stage (stage 3) is performing an OR.
The operand fetch stage (stage 2) is fetching the operands r9 and r10 for BIC.
The instruction fetch of SUB is occurring at stage 1.
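The snapshot at cycle t+3 can be reproduced with a small model. This is an illustrative sketch only: it assumes one instruction issues per cycle with no stalls, that the first instruction is the AND that writes r1, and the stage names and the function stage_occupancy are invented for this example.

```python
STAGES = ["ifetch", "operand fetch", "ALU", "write back"]


def stage_occupancy(instructions, cycle):
    """Return {stage name: instruction} at the given cycle (t is cycle 0).

    Assumes one instruction enters stage 1 per cycle and nothing stalls.
    """
    occupancy = {}
    for issue_cycle, name in enumerate(instructions):
        stage = cycle - issue_cycle          # 0 is stage 1, 3 is stage 4
        if 0 <= stage < len(STAGES):
            occupancy[STAGES[stage]] = name
    return occupancy


at_t3 = stage_occupancy(["AND", "OR", "BIC", "SUB"], cycle=3)
```

Here at_t3 places AND in write back, OR in the ALU, BIC in operand fetch and SUB in instruction fetch, matching the description above.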
Short immediate
The short immediate data of an instruction is available at the operand fetch stage
and is taken from the low 9 bits of the instruction. The instruction takes the same
time to cycle through the pipeline.
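As a rough illustration, the short immediate field can be extracted from an instruction word as follows. The function name is hypothetical, and treating bit 8 as a sign bit is an assumption made for this sketch; the text above states only that the data comes from the low 9 bits.

```python
def short_immediate(instruction_word):
    """Take the low 9 bits of a 32-bit instruction word as signed data."""
    raw = instruction_word & 0x1FF           # low 9 bits of the instruction
    # Assumption: bit 8 is a sign bit, giving a range of -256..255.
    return raw - 0x200 if raw & 0x100 else raw


small = short_immediate(0xABCDE005)
minus_one = short_immediate(0x000001FF)
```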
Long immediate
The long immediate data is taken from the word in the instruction fetch stage
while the instruction is in the operand fetch stage. The stages that the long
immediate data would pass through, if it were an instruction, are disabled.
This means that a long immediate instruction takes one cycle longer and the next
instruction is a cycle later.
The stages perform the following operations during an Arithmetic or Logic
instruction with immediate data.
Stage 2
Fetch 1 operand from registers and the other from the value currently in stage 1
Disable instruction word in stage 1.
Stage 3
Do Arithmetic or Logic function
Stage 4
Write result to register
For example:
AND r1,r2,2000
OR r5,r1,r4
SUB r14,r12,r13
Destination immediate
If the destination for the result of an instruction is marked as being immediate,
then the write-back at stage 4 is disabled.
For example:
AND 0,r2,r3
OR r5,r1,r4
BIC r8,r9,r10
SUB r14,r12,r13
OR.NE ifetch r6, r4 condition Disable
.F code test wrt-back
AND r4,r5,r6
OR r7,r8,r9
SUB r10,r11,r12
ASL r1,r2,0x5
OR r5,r1,r4
BIC r8,r9,r10
SUB r14,r12,r13
       t        t+1      t+2      t+3      t+4      t+5      t+6
ASL    ifetch   r2, 0x5  ASL      stalled  r1
OR              ifetch   r1, r4   stalled  OR       r5
BIC                      ifetch   stalled  r9, r10  BIC      r8
SUB                                        ifetch   r12,r13  SUB
The effect of this code through the pipeline is shown in the following diagram:
with r5
OR ifetch r2,r3 OR r1
NOTE In this case, because the BIC instruction is in the delay slot, BIC changes
the flags after the jump. If the delay slot instruction were nullified, the flags
would be changed only by the jump instruction.
Conditional jump
Condition code tests for branch, loop and jump instructions happen at stage 2 in
the pipeline, rather than at stage 3 for conditional arithmetic and logic
instructions. As a result, a single cycle stall will occur if a jump is immediately
preceded by an instruction that sets the flags.
In the following example, the flags are set by two instructions and it can be seen
in the pipeline-cycle diagram where the effects of the flags occur.
BIC r8,r9,r10
SUB r14,r12,r13
...
jaddr:
OR r1,r2,r3
If condition true then pass PC to stage 4
Stage 4
If condition true then write return address to LINK register.
main:
AND.F r1,r2,r3
MOV r5,r7
JLNE.D jaddr
BIC r8,r9,r10
SUB r14,r12,r13
...
jaddr: OR r1,r2,r3
MOV ifetch r7 MOV r5
BRA.D ifetch update no no
PC with action action
rel_addr
BIC delay → ifetch r9,r10 BIC r8
slot
OR ifetch r2,r3 OR
Conditional branch
The condition codes are tested at stage 2, as in the jump instruction. A single
cycle stall will occur if a conditional branch is immediately preceded by an
instruction that sets the flags.
main:
AND.F r1,r2,r3
MOV r5,r7
BNE.D jaddr
BIC r8,r9,r10
SUB r14,r12,r13
...
jaddr:
OR r1,r2,r3
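The stall rule for branches, jumps and loops can be checked mechanically. The sketch below is a simplification: it looks only at mnemonics, treats the .F suffix as "sets the flags", and uses a hand-picked set of stage 2 condition testers (STAGE2_TESTERS) taken from the examples in this chapter. Both names are invented for this illustration.

```python
# Mnemonics from this chapter's examples whose condition is tested at stage 2.
STAGE2_TESTERS = {"BNE", "JLNE", "LP", "LPNE"}


def flag_stalls(program):
    """Count single cycle stalls caused by a flag setter directly before
    a branch, jump or loop instruction."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_sets_flags = "F" in prev.upper().split(".")[1:]
        cur_tests_flags = cur.upper().split(".")[0] in STAGE2_TESTERS
        if prev_sets_flags and cur_tests_flags:
            stalls += 1
    return stalls


# The example above: MOV separates AND.F from BNE.D, so there is no stall.
no_stall = flag_stalls(["AND.F", "MOV", "BNE.D", "BIC", "SUB"])
# Reordered so that AND.F immediately precedes the branch: one stall.
one_stall = flag_stalls(["MOV", "AND.F", "BNE.D", "BIC", "SUB"])
```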
Software breakpoints
A software breakpoint is implemented by the use of the branch instruction, Bcc.
The action of a software breakpoint is to branch to the breakpoint code
whereupon the appropriate action will be taken according to the debugging
session, for example, write a value to a register and halt the ARCtangent-A4
processor.
Once the breakpoint is hit and the breakpoint code has executed, the problem is
how to restart the code after the breakpoint. The next instruction to have been
fetched will be the target of the branch, not the instruction that was replaced by
the breakpoint.
In this case, for debugging purposes, the breakpoint should replace the branch in
the ARCtangent-A4 code rather than the instruction in the delay slot.
delay slot canceling mode.
Loop Timings
Loop set up
The loop instruction sets up the loop start (LP_START) and loop end (LP_END)
registers. The LP_START register is updated with CURRENT PC and LP_END is
updated with the relative address (REL_ADDR) at stage 2.
A single cycle stall will occur if a loop is immediately preceded by an instruction
that sets the flags.
Stage 1
Instruction fetch and start decode
Stage 2
Fetch address from instruction.
Test condition code.
If condition false update PC with address.
Execute instruction in delay slot according to the nullify instruction mode.
Stage 3
No action
Stage 4
No action
main:
AND.F r1,r2,r3
MOV r5,r7
LP loop1
BIC r8,r9,r10
SUB r14,r12,r13
loop1:
OR r1,r2,r3
main:
AND.F r1,r2,r3 ; sets zero flag
MOV r5,r7
LPNE.D loop1
BIC r8,r9,r10
SUB r14,r12,r13
loop1:
OR r1,r2,r3
Loop execution
The loop operates by constantly comparing PC+1 with the value in LP_END. If
they are equal, then LP_COUNT is tested. If
LP_COUNT is not equal to 1, then the PC is loaded with the contents of
LP_START, and LP_COUNT is decremented. If, however, LP_COUNT is 1,
then the PC is allowed to increment normally and LP_COUNT is decremented.
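The comparison and decrement rule can be modelled in a few lines. This is a behavioural sketch, not a description of the hardware: addresses are plain integers in long-word units, pipeline timing is ignored, and run_loop is a name invented for this example.

```python
def run_loop(lp_start, lp_end, lp_count, pc, max_cycles=1000):
    """Return (list of executed PCs, final LP_COUNT) once the PC reaches LP_END."""
    executed = []
    for _ in range(max_cycles):
        if pc == lp_end:                 # first instruction after the loop
            return executed, lp_count
        executed.append(pc)
        if pc + 1 == lp_end:             # PC+1 compared with LP_END
            if lp_count != 1:
                pc = lp_start            # go round the loop again
            else:
                pc = pc + 1              # fall out of the loop
            lp_count -= 1                # decremented in both cases
        else:
            pc += 1
    raise RuntimeError("loop did not finish within max_cycles")


# Two-instruction loop body at addresses 10 and 11, executed three times.
body_trace, count_left = run_loop(lp_start=10, lp_end=12, lp_count=3, pc=10)
# Single instruction loop executed five times.
single_trace, _ = run_loop(lp_start=10, lp_end=11, lp_count=5, pc=10)
```

With LP_COUNT set to N the body runs N times, and LP_COUNT reads 0 on exit in this model.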
registers
If the user wishes to have single instruction loops, then the following code can be
used. Notice, there has to be a delay to allow the loop start and loop end registers
to be updated with the SR instruction.
MOV LP_COUNT,5 ; no. of times to do loop
MOV r0,dooploop>>2 ; convert to long-word size
ADD r1,r0,1 ; add 1 to dooploop address
main: SR r0,[LP_START] ; set up loop start register
SR r1,[LP_END] ; set up loop end register
NOP ; allow time to update regs
NOP
dooploop: OR r21,r22,r23 ; single instruction in loop
ADD r19,r19,r20 ; first instruction after loop
AND.F ifetch loop AND no
count action
NOP ifetch NOP NOP NOP
LPZ ifetch update no no
loop
registers action action
OR ifetch r22,r23 OR r21
AND ifetch r21,23 AND
OR ifetch r19,r20
However, since the flag instruction can halt the ARCtangent-A4 processor, the
pipeline is halted with the following instruction in stage 3. If only the H bit is set
then the other flags are unchanged. See example below:
main:
FLAG 1 ; halt the ARCtangent-A4
OR r21,r22,r23 ;
AND r1,r2,r3 ;
XOR r5,r6,r7 ;
halted
t t+1 t+2 t+3 t+4 t+5 t+6
FLAG ifetch 1 FLAG no no no
action action action
OR ifetch r22,r23 OR OR OR
AND ifetch r2,r3 r2,r3 r2,r3
XOR ifetch ifetch ifetch
H 0 0 0 1 1 1
Breakpoint
The breakpoint instruction is decoded in stage one of the ARCtangent-A4
pipeline, and the remaining stages are allowed to complete, effectively flushing
the pipeline.
The BRK instruction stops any further instructions entering the pipeline. To
resume execution the host will read the program counter (frozen at t+1, below),
re-write current (BRK) memory location with the required instruction, invalidate
the cache (if implemented) and then restart at that memory location.
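The resume sequence described above can be sketched from the host's point of view. Everything here is hypothetical scaffolding: BrkTarget and its methods stand in for the real host interface, and instructions are represented as text rather than encoded words.

```python
class BrkTarget:
    """Hypothetical model of a halted target as seen by the host."""

    def __init__(self, memory, pc):
        self.memory = dict(memory)
        self.pc = pc                     # frozen at the BRK location
        self.cache_valid = True
        self.running = False

    def read_pc(self):
        return self.pc

    def write_memory(self, addr, word):
        self.memory[addr] = word

    def invalidate_icache(self):
        self.cache_valid = False

    def restart_at(self, addr):
        self.pc = addr
        self.running = True


def resume_from_brk(target, original_instruction):
    """Read the PC, restore the instruction, invalidate the cache, restart."""
    addr = target.read_pc()
    target.write_memory(addr, original_instruction)
    target.invalidate_icache()
    target.restart_at(addr)
    return addr


target = BrkTarget(memory={0x20: "BRK"}, pc=0x20)
resumed_at = resume_from_brk(target, "ADD r0,r1,r2")
```

The cache is invalidated after the memory write and before the restart, so the restored instruction is the one that is fetched.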
Sleep Mode
The SLEEP instruction is decoded at stage 2 of the ARCtangent-A4 pipeline.
When SLEEP reaches stage 2 the earlier instructions and the SLEEP instruction
itself are flushed from the pipe and the processor is then put into sleep mode.
The instruction following the SLEEP enters stage 1 and stays there, until the
ARCtangent-A4 processor is "woken up" from sleep mode.
Stage 2
Full decode of sleep instruction. Flush pipeline. Update ZZ bit
Stage 3
No action.
Stage 4
No action
main:
ADD r0,r1,r2 ;
SLEEP ;
SUB r3,r4,r5 ;
Load
The stages perform the following in a load instruction:
Stage 1
Instruction fetch and start decode
Stage 2
Fetch operands.
Update scoreboard unit with destination address marked as invalid.
Stage 3
Add operands to form address.
Request load from memory controller.
Stage 4
If address write-back enabled then write-back address calculation to first operand
register.
If address write-back disabled then allow pipeline to continue because data is
unlikely to be ready.
also
Stage 4
Re-enabled when data ready from memory controller, pipeline held for one
cycle.
Update scoreboard unit marking register as valid.
When a register is waiting to be updated by a previous load and that register is
one of the operands or results of an instruction in the pipeline at stage 2 then the
pipeline is halted until that register is updated.
A scoreboard unit is used to retain the information on which registers are waiting
to be written. The scoreboard unit is updated at stage 2 when the destination
register address is known, and updated at stage 4 when the register has been
written to.
load does not use the destination register of the load (this is checked by the
instruction in stage 2). Once an instruction does need that register then the
pipeline is halted and waits for the load to complete.
NOTE When the target of a LD.A instruction is the same register as the one used for
address write-back (.A), the returning load will overwrite the value from the
address write-back.
When the data for the delayed load is ready, the pipeline is stalled because the
load uses the write-back in stage 4 to update the register. In this example, the
load is delayed by two cycles. The OR instruction is stalled in stage 3 and the
SUB stalled in stage 2 until the register write-back is complete.
main:
LD r1,[r2,r3]
AND r4,r5,r6
OR r7,r8,r9
SUB r10,r11,r12
If the AND used a register that was dependent on the result of the load then the
AND would stall.
For example, with a dependency on R1:
main:
LD r1,[r2,r3]
AND r4,r1,r6
OR r7,r8,r9
SUB r10,r11,r12
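The difference between the two examples can be expressed with a minimal scoreboard model. The class and method names are invented for this sketch, and cycle-level timing is left out; it records only which registers are awaiting a load and whether an instruction at stage 2 has to wait.

```python
class Scoreboard:
    """Tracks registers that are waiting to be written by a delayed load."""

    def __init__(self):
        self.pending = set()

    def load_issued(self, dest):
        # Stage 2 of the load: destination marked as invalid.
        self.pending.add(dest)

    def load_returned(self, dest):
        # Stage 4 write-back: register marked as valid again.
        self.pending.discard(dest)

    def must_stall(self, dest, sources):
        # An instruction at stage 2 stalls if any operand or its result
        # register is still waiting for a load.
        return bool(self.pending & ({dest} | set(sources)))


sb = Scoreboard()
sb.load_issued("r1")                                  # LD  r1,[r2,r3]
independent = sb.must_stall("r4", ["r5", "r6"])       # AND r4,r5,r6
dependent = sb.must_stall("r4", ["r1", "r6"])         # AND r4,r1,r6
sb.load_returned("r1")
after_return = sb.must_stall("r4", ["r1", "r6"])
```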
Store
The store instruction takes a single cycle to complete. The data to be stored is
ready at stage 2 and the address to which the store is to occur is ready at stage 3.
Stage 1
Instruction fetch and start decode.
Stage 2
Fetch 2 address operands and data operand.
Latch data operand for memory controller.
Stage 3
Add address operands to form address.
Request store to memory controller.
Stage 4
No action
main:
ST r1,[r2,333]
AND r4,r5,r6
OR r7,r8,r9
SUB r10,r11,r12
shimm
AND ifetch r5,r6 AND r4
OR ifetch r8,r9 OR r7
SUB ifetch r11,r12 SUB r10
Interrupts occur in a similar way to the branch and link instruction. However, the
value that is latched into the link register is the CURRENT PC rather than
NEXT_PC.
When an interrupt occurs, the instruction in instruction fetch at stage 1 is
replaced by a call to the interrupt service routine.
NOTE Interrupts are not allowed to interrupt anything in a delay slot or a fetch of long
immediate data.
Stage 1
Current instruction in ifetch is replaced by a branch like instruction.
The CURRENT PC is not updated to NEXT_PC.
Stage 2
CURRENT PC is routed to the data for next stage.
CURRENT PC is updated to the interrupt vector.
Stage 3
The data from stage 2 is passed to stage 4
Stage 4
The data is written to the ILINK register (the PC from stage 1)
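The net effect of these stages can be summarised in a small model. It is a sketch only: interrupt_entry is a hypothetical name, addresses are plain integers, and the individual pipeline stages are collapsed into a single step.

```python
def interrupt_entry(current_pc, vector, in_delay_slot=False, fetching_limm=False):
    """Return (new PC, ILINK value), or None if the interrupt must wait."""
    # Interrupts may not interrupt a delay slot or a long immediate fetch.
    if in_delay_slot or fetching_limm:
        return None
    ilink = current_pc       # CURRENT PC is latched, not NEXT_PC
    return vector, ilink


taken = interrupt_entry(current_pc=0x20, vector=0x07)
deferred = interrupt_entry(current_pc=0x20, vector=0x07, in_delay_slot=True)
```

Because ILINK holds CURRENT PC, the instruction that was replaced in stage 1 is fetched again when the service routine returns.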
↓
t t+1 t+2 t+3 t+4 t+5 t+6
AND.F ifetch r2,r3 update r1
flags
MOV.F ifetch r7 update r5
flags
interrupt interrupt update pass old write
→ PC PC back
through ILINK2
BIC replaced
BIC delay slot → ifetch killed killed
again
OR ifetch r16,r17 OR
executed, then it immediately raises the instruction error exception. In this
example, program execution resumes at the instruction error vector, which
contains a jump to the instruction error service routine.
main:
AND.F r1,r2,r3
SWI
BIC r8,r9,r10 ;<---- instruction error exception
SUB r14,r12,r13
...
ins_err:
JAL instr_serv
...
instr_serv:
OR r15,r16,r17
↓
t t+1 t+2 t+3 t+4 t+5 t+6
AND.F ifetch r2,r3 update r1
flags
SWI ifetch SWI killed killed
interrupt interrupt update pass old write
→ PC PC back
through ILINK2
BIC replaced
BIC delay → ifetch killed killed
again slot
JAL ifetch JAL
limm limm disabled
OR ifetch
↓)
(↓ ↓
t t+1 t+2 t+3 t+4 t+5 t+6
MOV ifetch r7 MOV r5
JAL.D ifetch update no action no action
PC
SUB delay → ifetch r12,r13 SUB r14
slot
interrupt interrupt update pass old write
→ PC PC back
through ILINK2
ADD replaced
ADD delay → ifetch killed killed
again slot
OR ifetch r16,r17
this comparison is true then, if LP_COUNT is not 1, then CURRENT PC
becomes LP_START. When an interrupt occurs during this comparison-update
stage the link register (ILINK2) becomes CURRENT PC. In order to stop
LP_COUNT from double decrementing, the loop count decrement mechanism
must be disabled during interrupt for 2 cycles.
main:
AND.F r1,r2,r3
MOV r5,r7
LP loop1
BIC r8,r9,r10
SUB r14,r12,r13;<------ level 2 interrupt to ivect7
loop1:
OR r1,r2,r3
...
ivect6:
JAL service6
ivect7:
ADD r15,r16,r17
↓
t t+1 t+2 t+3 t+4 t+5 t+6
BIC ifetch r9,r10 BIC r8
interrupt interrupt update pass old write
→ PC PC back
through ILINK2
SUB replaced
SUB delay → ifetch killed killed killed
again slot
ADD ifetch r16,r17 ADD r15
ifetch
Interrupt on store
The store instruction is treated in the same way as an arithmetic or logic
instruction.
AND ifetch r2, r3 AND r1 halted
OR ifetch ifetch ifetch ifetch ifetch r6, r4 OR r5 halted
BIC ifetch ifetch ifetch ifetch
H 0 0 0 0 1 0 0 0 0 1
i-step i-step
↓ ↓
t t+1 t+2 t+3 t+4 t+5 t+6 t+7 t+8 t+9
AND ifetch r2, r3 AND r1 halted
0x102 10200 disabled disabled disabled
00 0
OR ifetch ifetch ifetch ifetch r6, r4 OR r5 halted
BIC ifetch ifetch ifetch ifetch
H 0 0 0 0 1 0 0 0 0 1
B
B field, 19
barrel shift instructions, 51, 147, 150
Bcc, 88
BH bit, 66
BIC, 87
BLcc, 89
branch address calculation, 158
branch and jump in loops, 39, 171
branch type instruction, 72, 77
branches, 33
breakpoint instruction, 43, 66, 139, 173
BRK, 91
byte, 16
C
C field, 19
code profiling, 139
condition code field, 19
D
data organisation, 15
data-cache, 11, 47, 99, 126
debug register, 65
delay slot, 34
delayed load, 11, 47, 177
direct memory mode, 47, 99, 100, 127
dual access registers, 139
dual operand instruction, 71
E
encoding immediate data, 61
encoding instructions, 75
endianness, 16
EXT, 92
extensions, 9
auxiliary registers, 9
condition codes, 10
core register, 9
instruction set, 10
extensions, 5
extensions library, 49
F
F bit, 19
FH bit, 135
FLAG, 93
flag instruction, 30
force halt, 65
H
H bit, 135
halting ARC, 12, 93, 135, 154, 173
host interface, 133
I
I field, 19
identity register, 65, 140
immediate data indicator, 76
instruction layout, 19
instruction map, 10
instruction set summary, 29
instruction-cache, 11
interrupt unit, 11, 27
interrupt vectors, 24
interrupts, 181
IS bit, 66, 137
J
Jcc, 95
JLcc, 97
jump instruction, 72
jumps, 33
L
L field, 19
LD, 99
LD bit, 66
link register, 23, 61
load alignment, 16
load and store, 46
load instruction, 73, 74
load pending, 65, 135
load register, 48
load store unit, 11
logical operations, 29, 49
long immediate, 16
long immediate data and loops, 40, 172
long word, 16
loop construct, 35
loop count register, 38, 61, 168
loop end register, 63
loop start register, 63
loops, 33
LP instruction, 35
LP_COUNT, 35
LP_END, 35, 65
LP_START, 35, 65
LPcc, 101
LR, 102
LSL, 103
LSR, 104
M
manufacturer code, 65
manufacturer version number, 65
MAX, 106
memory alignment, 16
memory controller, 11, 19, 47, 141
memory endianness, 16
memory error, 26
MIN, 107
MIN/MAX instructions, 53, 147
MOV, 108
MUL64, 109
multi cycle extension instructions, 147
multiply instruction, 50, 148
multiply scoreboard unit, 148
MULU64, 111
N
N field, 19
NOP, 113
NORM, 114
normalize instruction, 51, 147
null instruction, 30
O
operand size, 15
OR, 116
orthogonal, 5
P
pipecleaning, 136
pipeline, 47, 141
pipeline cycle diagram, 142
pipeline stall, 47, 148, 177
power management features, 12
program counter, 55, 63
Q
Q field, 19
R
register set, 5
reset, 26
RISC, 5
RLC, 117
ROL, 118
ROR, 119
ROR multiple, 120
rotate instructions, 30
RRC, 121
S
SBC, 122
scoreboard unit, 11, 47, 176
self halt, 65
semaphore register, 63, 139
SEX, 123
SH bit, 66
short immediate, 16
short immediate addressing, 29, 49, 77
single cycle extension instructions, 147
single instruction loops, 37, 166
single instruction step, 66, 138, 186
single operand instructions, 30, 72
single step, 65, 137
SLEEP, 124
sleep instruction, 44, 66, 174
software breakpoints, 12, 139, 160, 172
software interrupt, 46, 130
SR, 125
SS bit, 66
ST, 126
starting ARC, 135
status register, 55, 63
store alignment, 16
store instruction, 73, 74
store register, 48
SUB, 128
SWAP, 129
swap instruction, 52, 147
SWI, 46, 130
W
word, 16
X
XOR, 130, 131
Z
zero delay loops, 35, 61
ZZ bit, 66