Basic Avr Arithmetic
Basic Avr Arithmetic
7
MULTIPLICATION & DIVISION
by RetroDan@GMail.Com
CONTENTS:
If your AVR chip supports the multiplication command (MUL) then multiplying two eight-bit
numbers is quite simple. MUL will work on all 32 registers R0 to R31 and leave the low-byte of the
result in R0 and the high-byte in R1. The registers for multiplicand and multiplier remain
unchanged. The routine takes about three cycles.
If our AVR does not support the hardware MUL command we will have to compute multiplications
manually. If we need to multiply by a power of two such as 2,4,8 etc. The result can be achived by
shifting bits to the left. Each shift to the left is a multiplication by two.
The Logical Shift Left (LSL) command is used on the lower byte because it will shift the contents
one bit to the left, a zero is shifted into the lowest bit and the highest bit is shifted into the carry
flag.
10101010
Carry [1] <-- 01010100 <--0 (LSL)
We use the Rotate Left though Carry (ROL) command on the high byte because it will also shift
contents one bit to the left, but it will shift the contents of the Carry Flag into the lowest bit.
00000000
Carry [0] <-- 00000001 <--[1] Carry (ROL)
Every time we shift the multiplicand to the left we are multiplying it by two. So to multiply by eight
we simply shift the multiplicand to the left three times. The routine takes about ten cycles.
00101010 = 42 multiplicand
x00001010 = 10 multiplier
--------
00000000
00101010
00000000
00101010
00000000
00000000
00000000
00000000
----------------
0000000110100100 = 420 result
================
The routine below mimics the hardware multiply (MUL) by leaving the multiplicand and multiplier
untouched, and the result appears in the register pair R1 and R0. It shifts the bits of the multiplier
into the carry bit and uses the contents of the carry to add the multiplicand if it is a one or skip it if
the carry is a zero. The routine takes about sixty cycles.
Multiplying two 16-bit numbers can result in a four-byte result. We will use the hardware multiply
(MUL) command to create all four cross products and add them to the 32-bit result. The MUL
command leaves it results each time in R1:R0 which we then add into our result. The routine takes
about twenty cycles.
MUL16x16:
CLR ZERO ;Set R2 to zero
MUL AH,BH ;Multiply high bytes AHxBH
MOVW ANS4:ANS3,R1:R0 ;Move two-byte result into answer
Multiplying Two-Byte numbers together can leave a result that is four-bytes wide (32-bits). With
this routine we add the multiplicand to the high-bytes of our result for each one that appears in our
16-bit multiplier, then shift the result into the lower bytes of our result sixteen times, once for each
bit of our multiplier. The routine takes about 180 cycles.
MUL16x16:
CLR ANS3 ;Set high bytes of result to zero
CLR ANS4 ;
LDI C,16 ;Bit Counter
LOOP: LSR BH ;Shift Multiplier to the right
ROR BL ;Shift lowest bit into Carry Flag
BRCC SKIP ;If carry is zero skip addition
ADD ANS3,AL ;Add Multiplicand into high bytes
ADC ANS4,AH ;of the Result
SKIP: ROR ANS4 ;Rotate high bytes of result into
ROR ANS3 ;the lower bytes
ROR ANS2 ;
ROR ANS1 ;
DEC C ;Check if all 16 bits handled
BRNE LOOP ;If not then loop back
6. MULTIPLYING TWO 32-BIT NUMBERS MANUALLY
The follwing routine will multiply two 32-bit numbers with a 64-Bit (8 Byte) result. The routine
takes about 500 clock cycles.
Since there is no hardware divide command, we will have to do it manually. If we need to divide by
a power of two such as 2,4,8 etc. The result can be achieved by shifting bits to the right. Each shift
to the right is a division by two.
The Logical Shift Right (LSR) command is used on the higher byte because it will shift the contents
one bit to the right, a zero is shifted into the highest bit and the lowest bit is shifted into the carry
flag.
01010101
0--> 00101010 -->[1] Carry
We use the Rotate Right though Carry (ROR) command on the low byte because it will also shift
contents one bit to the right, but it will shift the contents of the Carry Flag into the highest bit.
00000000
Carry [1] --> 10000000 -->[0] Carry (ROL)
Every time we shift the multiplicand to the right we are dividing it by two. So to divide by eight we
simply shift the multiplicand to the right three times. The routine takes about ten cycles.
Just as multiplication can be achieved with shifting and addition. Dividing can be accomplished by
shifting and subtraction. The routine below tries to repeatedly subtract the divisor. If the result is
negative it reverses the process and shifts the dividend to the left to try again. The routine takes
about 90 cycles.
The previous routine can be expanded to handle two-byte numbers in the range of zero to 65,535.
The routine takes about 230 cycles.
The previous routine can be futher expanded to handle 32-bit number divided by a 16-bits number,
This means numbers in the range of zero to 4,294,967,295 divided by numbers in the range zero to
65,535. The routine takes about 700 cycles.
DIV3216:
CLR ZERO
MOVW ANS2:ANS1,A2:A1 ;Copy dividend into answer
MOVW ANS4:ANS3,A4:A3 ;
LDI C,33 ;Load bit counter
SUB REM1,REM1 ;Clear Remainder and Carry
CLR REM2 ;
CLR REM3 ;
CLR REM4 ;
LOOP: ROL ANS1 ;Shift the answer to the left
ROL ANS2 ;
ROL ANS3 ;
ROL ANS4 ;
DEC C ;Decrement Counter
BREQ DONE ;Exit if 32 bits done
ROL REM1 ;Shift remainder to the left
ROL REM2 ;
ROL REM3 ;
ROL REM4 ;
SUB REM1,BL ;Try to subtract divisor from remainder
SBC REM2,BH ;
SBC REM3,ZERO ;
SBC REM4,ZERO ;
BRCC SKIP ;If the result was negative then
ADD REM1,BL ;reverse the subtraction to try again
ADC REM2,BH ;
ADC REM3,ZERO ;
ADC REM4,ZERO ;
CLC ;Clear Carry Flag so zero shifted into A
RJMP LOOP ;Loop Back
SKIP: SEC ;Set Carry Flag to be shifted into A
RJMP LOOP
DONE:
11. SQUARE-ROOT OF A SINGLE-BYTE NUMBER
To compute the square of an eight-byte number we can take advantage of the fact that the sum of
the odd numbers create square numbers:
1 = 1^2 = 1
1+3 = 2^2 = 4
1+3+5 = 3^2 = 9
1+3+5+7 = 4^2 = 16
1+3+5+7+9 = 5^2 = 25
If we study the above table we notice that the square-root of the number equals the number of odd
numbers we have summed. We simply create a routine that keeps track of the total of odd numbers
that have been subtracted. The routine takes fifteen to one hunded cycles.
SQRT:
LOOP:
SUB A,B ;Subtract B from Square
BRCS DONE ;If bigger than sqaure we are done
INC ANS ;Increment the answer
SUBI B,-2 ;Increment B by two
RJMP LOOP
12. SQUARE-ROOT OF A 16-BIT NUMBER
We could expand the routine above to handle the square-root of a sixteen-bit number but the
number of clock cycles can be as high as 3750. The routine below processes two bits at a time
starting with the highest bits and also provides the remainder. It uses about 160 clock cycles.
LDI AL,LOW($FFFF)
LDI AH,HIGH($FFFF)
SQRT16:
PUSH AL ;Save Square for later restore
PUSH AH ;
CLR REML ;Initialize Remainder to zero
CLR REMH ;
CLR ANSL ;Initialize Root to zero
CLR ANSH ;
LDI C,8 ;Set Loop Counter to eight
LOOP:
LSL ANSL ;Multiply Root by two
ROL ANSH ;
LSL AL ;Shift two high-bits of Square
ROL AH ;into Remainder
ROL REML ;
ROL REMH ;
LSL AL ;Shift second high bit of Sqaure
ROL AH ;into Remainder
ROL REML ;
ROL REMH ;
CP ANSL,REML ;Compare Root to Remainder
CPC ANSH,REMH ;
BRCC SKIP ;If Remainder less or equal than Root
INC ANSL ;Increment Root
SUB REML,ANSL ;Subtract Root from Remainder
SBC REMH,ANSH ;
INC ANSL ;Increment Root
SKIP:
DEC C ;Decrement Loop Counter
BRNE LOOP ;Check if all bits processed
LSR ANSH ;Divide Root by two
ROR ANSL ;
POP AH ;Restore Original Square
POP AL
RJMP LOOP
13. SQUARE-ROOT OF A 32-BIT NUMBER
We can expand the previous routine to handle 32-bit numbers. It takes between 500 to 580 clock
cycles to complete.
Since there are only seven possible cube roots of a single-byte number (0 to 6) this routine simply
cycles through them until the correct root is found. The routine takes from 20 to 90 clock cycles.
The following routine finds the cube root by using two successive approximations, one high and
one low and waits until they merge. The routine uses 150 to 250 clock cycles.
;-------------------------------------------------;
; Calculates Cube of B, Results in CUB3:CUB2:CUB1 ;
;-------------------------------------------------;
CUBE:
MUL B,B ;Calc B*B
MOVW TMP2:TMP1,R1:R0 ;
MUL TMP1,B ;Calc B*B*B
MOVW CUB2:CUB1,R1:R0 ;
MUL TMP2,B ;
ADD CUB2,R0 ;
CLR CUB3 ;
ADC CUB3,R1 ;
RET