Arbitrary Rotation of Raster Images
with SIMD Machine Architectures
H.R. Arabnia and M.A. Oliver*
Abstract
An algorithm for the rotation of a raster image by an
arbitrary angle is described. The image data structure
is closely related to runlength code. The algorithm has
been designed to exploit SIMD parallel architectures.
It has been implemented on an ICL DAP on which
non-trivial images can be rotated in times very close to
real time.
*Computing Laboratory
The University of Kent
Canterbury, Kent CT2 7NF
United Kingdom

North-Holland
Computer Graphics Forum 6 (1987) 3-12

1. Introduction
In this paper we are concerned with the problem of how to exploit SIMD machine architectures for the rotation of a digitised raster image by an arbitrary angle.

SIMD array processor architectures
A typical SIMD (Single Instruction on Multiple Data stream) array processor¹ consists of an array of elementary processors (PEs) where each processor has a private memory. A single instruction broadcast from a Master Control Unit (MCU) is obeyed simultaneously by all PEs. Masking schemes are used to control the status of each PE during the execution of an instruction (from the MCU). In this way each PE may be either enabled or disabled for an instruction; only enabled PEs perform computation. The PE array can commonly be organised either as a square array or as a linear array with nearest neighbour connections in both cases.

Stripcode
The data structure used to represent the image in the machine is the "stripcode" which we introduced in a previous paper, where we showed that it has some nice properties for SIMD array processor architectures.² Stripcode is essentially runlength code which exploits the horizontal coherence between adjacent pixels on a scan. In this way a reduction in the number of data objects needed for the representation of the image, compared to the number of pixels in a frame buffer, is achieved. The coordinate system has its origin in the lower left corner of the image space with the x-axis to the right and the y-axis up. Consider a runlength coded image and specify a background colour: each strip of colour, other than the background colour, is specified by the position coordinates of its origin (ordinal number of its first pixel, scanline number), length (number of pixels), and colour; the background colour is not explicitly coded.

Notation
The image space is a square raster of 2^N × 2^N pixels. Parallel assignment of vectors is denoted by the symbol ← whereas := denotes assignment of scalar quantities. The context makes clear the variable type (vector or scalar). The operators which follow are used for bit manipulation:

&     bitwise 'and'
|     bitwise 'or'
<<    left shift
>>    right shift

where the shift operators shift their left-hand operand by the number of bit positions given by the right-hand operand. In expressions where a logical vector is used to index the components of a vector on the left-hand side of the assignment operator ← the assignments will take place only for the components which correspond to true values in the logical vector. Thus, only those components of x which correspond to "true" values of mask are set equal to the corresponding components of X:

x(mask) ← X

the other components of x remain unchanged. In expressions where a scalar quantity is used in parallel calculations, the scalar quantity will be expanded to vector mode: all elements of the vector are set equal to the scalar quantity. The four vectors x, y, l, c hold the x-coordinates, y-coordinates, lengths and colours of the strips. The program fragments are in a Pascal-like pseudo code with the comments between braces.
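As an illustration of the stripcode representation and of masked parallel assignment, the following is a minimal serial sketch in Python. The function names `encode_stripcode` and `masked_assign`, the default background value 0 and the tiny test image are assumptions made for this example only; on an SIMD machine the masked assignment is a single broadcast instruction rather than a loop.

```python
from typing import List, Tuple

Strip = Tuple[int, int, int, int]  # (x of first pixel, scanline y, length, colour)


def encode_stripcode(image: List[List[int]], background: int = 0) -> List[Strip]:
    """Run-length encode a raster; image[0] is taken to be the bottom scanline."""
    strips: List[Strip] = []
    for y, row in enumerate(image):
        x = 0
        while x < len(row):
            colour = row[x]
            start = x
            # Extend the run while the colour stays the same.
            while x < len(row) and row[x] == colour:
                x += 1
            # The background colour is not explicitly coded.
            if colour != background:
                strips.append((start, y, x - start, colour))
    return strips


def masked_assign(target: list, mask: List[bool], source) -> None:
    """Emulate x(mask) <- X; a scalar source is expanded to vector mode."""
    if not isinstance(source, list):
        source = [source] * len(target)
    for i, enabled in enumerate(mask):
        if enabled:  # only "enabled" components are overwritten
            target[i] = source[i]


image = [
    [0, 2, 2, 0, 4, 4, 4, 0],  # scanline y = 0
    [1, 1, 0, 0, 0, 0, 5, 5],  # scanline y = 1
]
strips = encode_stripcode(image)
x = [s[0] for s in strips]
y = [s[1] for s in strips]
masked_assign(x, [yi == 1 for yi in y], 9)  # overwrite x only on scanline 1
```

Here eight pixels per scanline reduce to two strips per scanline, which is the reduction in data objects that the stripcode is designed to give.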
2. Image Rotation
2.1. The Basic Algorithm
The manipulation rotates the image by θ degrees about
a specified point. The 90 and 180 degree rotation
operations are treated as special cases; see the remarks
in the conclusion.
The algorithm has six steps and can be regarded as a
series of image mappings.
(i) The strips are rotated by θ degrees which gives the rotated image exactly, but not digitised. The problem now is to represent these rotated strips by a set of new horizontal strips in the raster, i.e. to digitise them.
(ii) The rotated strips are clipped to the image space. The rectangular parts of the rotated strips which lie completely outside the image space are clipped, as shown in figure 1.
(iii) The clipped rotated strips are mapped to an auxiliary space. Each strip is represented by a rectangular region (figure 3) the construction of which is described in detail below. The horizontal boundaries of these rectangular regions lie on scanline boundaries.
(iv) The rectangular regions are horizontally divided into strips which coincide with scanline boundaries (figure 5). These strips are mapped onto the image raster in a way which generates the required digitised representation of the original rotated strip (figure 6 shows this before the final horizontal digitisation).
(v) At this stage the code is not correctly ordered and has to be sorted.
(vi) Finally the stripcode has to be compacted in the sense that adjacent strips with the same colour on a scanline are represented by one longer strip.

Figure 1. Rotation and clip of image strips: (a) unclipped, (b) clipped
In order to represent the rotated strips by a new set of
horizontal strips, the rotated strips are mapped onto
rectangular regions in an auxiliary space as shown in
figure 3 (step (iii)). The reason for this mapping is that
these rectangular regions can be divided horizontally
into strips using the SIMD parallelism very efficiently
(see the detailed description of step (iv)). These new
horizontal strips can be mapped easily onto the image
raster in a way which generates the required digitised
representation of the original rotated strips (figure 6).
Now for a more detailed description of the complex
steps in the algorithm.
Step (iii)
Find the intersection point of the lower side of the
rotated strip with the x-axis; if no intersection occurs
then extend the side down until it does. An example is
shown in figure 2. These intersection points, one for
each rotated strip, will be kept in the vector intx.
Find the intersection points of the ends of the rotated
strip with the scanlines. Each end intersects one scanline at most. If no intersection occurs then extend the
ends down until they do. This is done in four statements:
Figure 2.
x2' ← x2 - (truncate(y1) - y1) * tan(θ)
y1' ← truncate(y1)
x1' ← x1 - (truncate(y2) - y2) * tan(θ)
y2' ← truncate(y2)

where (x1, y2) and (x2, y1) are the coordinates of the top left and top right corners of the rotated strip. The intersection points (x1', y2') and (x2', y1') are shown in figure 3.
Now map the rotated strip into a rectangular region (unrotated) in an auxiliary space. This region has the following specification:

coordinates of lower left corner:   (intx, y1'),
horizontal width:                   sec(θ),
vertical height:                    y2' - y1' + 1,
colour:                             same colour as strip.

Figure 3.
In figure 3 both spaces and their coordinate systems are
superposed; the rotated strip is shown mapped into the
rectangular region. The order of these rectangular
regions in memory is discussed in §2.2.
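The construction in step (iii) can be written out for a single strip. The sketch below is a serial Python transcription of the four assignments and of the region specification given above, not the authors' DAP code. The names `Region`, `end_intersections` and `make_region` are hypothetical, the angle is taken in radians, and `intx` is assumed to have been computed already.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    x: float       # intx, x-coordinate of the lower left corner
    y: int         # y1', scanline of the lower left corner
    width: float   # sec(theta)
    height: int    # y2' - y1' + 1
    colour: int


def end_intersections(x1, y2, x2, y1, theta):
    """Extend each end of the rotated strip down to the scanline below it.

    (x1, y2) and (x2, y1) are the two corner points named in the text;
    returns (x1', y2', x2', y1').
    """
    t = math.tan(theta)
    y1p = math.trunc(y1)
    y2p = math.trunc(y2)
    x2p = x2 - (y1p - y1) * t
    x1p = x1 - (y2p - y2) * t
    return x1p, y2p, x2p, y1p


def make_region(intx, y1p, y2p, theta, colour):
    """Build the unrotated rectangular region in the auxiliary space."""
    return Region(intx, y1p, 1.0 / math.cos(theta), y2p - y1p + 1, colour)


theta = math.pi / 4
x1p, y2p, x2p, y1p = end_intersections(3.0, 5.5, 6.0, 2.25, theta)
region = make_region(1.0, y1p, y2p, theta, colour=7)
```

In the parallel algorithm every quantity here is a vector with one element per strip, so the same arithmetic is applied to all strips in a single pass.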
Figure 4. (Diagram of the auxiliary vectors, temporary vectors and result vectors.)
Step (iv)
The rectangular regions are horizontally divided
into strips. To do this three separate sets of vectors are
used: auxiliary vectors which hold the information
needed to generate the strips, temporary vectors used as
working area, and result vectors which hold the
generated strips. The vectors used in each set are
shown diagrammatically in figure 4.
For rectangular regions with a height of one unit the strips are generated directly from the information in the auxiliary vectors: put the results in the result vectors c, x, y, l. In this case most of the elements of these vectors are unused.

mask ← height = 1
{ elements of mask are set to true or false }
if any mask then
{ check if mask has any true values }
begin
    c(mask) ← colour
    x(mask) ← x1' + cot(θ) * y2'
    y(mask) ← y1'
    l(mask) ← x2' + cot(θ) * y1' - x
end

For the geometric significance of this code refer to figure 5.

For rectangular regions with a height of two units or more, first the lowest strips are generated. Again these are going to be generated in the result vectors directly. The new strips fill the unused elements in these vectors.

mask ← not mask
c(mask) ← colour
x(mask) ← intx
y(mask) ← y1'
l(mask) ← x2' + cot(θ) * y1' - intx

For the geometric significance of this code refer to figure 5.

Generate the highest strip of each region which has a height of two units or more. These strips are generated in the temporary vectors. Elements of these vectors which correspond to rectangles of height one unit are unused: close these gaps and append the strips just generated to the strips in the corresponding result vectors. The code that follows does the job:

Tc ← colour
Tx ← x1' + cot(θ) * y2'
Ty ← y2'
Tl ← intx + sec(θ) - Tx
discard(not mask, Tc, Tx, Ty, Tl)
append strips in Tc, Tx, Ty, Tl to strips in c, x, y, l;

See figure 5 for the geometric significance of this code. The procedure "discard" removes the excess data from the vectors by closing the gaps in memory, where the gaps are identified by the true elements in the logical vector in its first argument: the rest of the arguments are the vectors with gaps (all in the same positions).

The final stage is to generate the strips representing the middle sections of the rectangular regions. To do this make a record in m of the number of strips already generated. Generate the first (lowest) strips of the middle sections by using the information in auxiliary vectors: put the result in temporary vectors. Close the gaps and append these strips to the strips in result vectors. Update height: discard its elements that are zero or less: shift its data m places to the right. The undefined elements of height are set to zero.

height ← height - 2
excessdata ← height ≤ 0
Tc ← colour
Tx ← intx
Ty ← y1' + 1
l(unused elements) ← sec(θ)
{ all strips have equal lengths }
discard(excessdata, Tc, Tx, Ty, height)
shiftright(height, m)
append strips in Tc, Tx, Ty to strips in c, x, y;

The function "shiftright" moves the elements of the first argument (vector) a number of places given in the second argument (scalar) to the right; the undefined elements will be set to zero. The first element of a vector is on the left and the last on the right.
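The two helper operations described above, "discard" and "shiftright", can be modelled on ordinary Python lists. This is a serial sketch under one simplification: the lists shrink when gaps are closed, whereas on the array processor the vectors keep a fixed length (one element per PE) and the freed elements at the end become unused. Returning new lists instead of updating in place is also a choice made for this example.

```python
def discard(gaps, *vectors):
    """Close the gaps: drop every element whose entry in `gaps` is true.

    All vectors have their gaps in the same positions, so the same
    logical vector is applied to each of them.
    """
    return [[v for v, g in zip(vec, gaps) if not g] for vec in vectors]


def shiftright(vector, places):
    """Move the elements `places` positions to the right.

    The vector keeps its length; vacated elements on the left are set
    to zero and elements pushed past the right-hand end are lost.
    """
    if places <= 0:
        return list(vector)
    n = len(vector)
    return ([0] * min(places, n) + list(vector))[:n]


# Two of four temporary strips are excess and are removed together.
Tc, Tx = discard([False, True, False, True], [1, 2, 3, 4], [10.0, 11.0, 12.0, 13.0])
shifted = shiftright([5, 6, 7, 0], 1)
```

Closing gaps in all vectors with one logical vector keeps the colour, coordinate and length of each strip aligned in the same element position.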
The rest of the strips (if any) for each rectangular region have equal parameters to the first (lowest) strip of the middle section of that region, except for the y-coordinates. Thus, copy the strips from the result vectors onto the temporary vectors: increment the y-coordinates of the copied strips by the appropriate value. Close the gaps in temporary vectors where the gaps are the excess strips generated. Append the strips in temporary vectors to the strips in result vectors. The above process is repeated (i.e. doubling up) until the strips are all generated.

height ← height - 1
excessdata ← height < 0
n := 1
while not excessdata do
begin
    Tc ← c
    Tx ← x
    Ty ← y + 2^(n-1)
    n := n + 1
    Theight ← height / 2    { integer division }
    height ← height - Theight - 1
    swap elements of height and Theight wherever height < Theight;
    discard(excessdata, Tc, Tx, Ty, Theight)
    append data in Tc, Tx, Ty, Theight to data in c, x, y, height;
    excessdata ← height < 0
end

The nth pass through the body of the while-loop generates 2^(n-1) new strips for each region, so for a 2^N × 2^N image the loop is executed a maximum of N times. All strips will be generated when the while-loop terminates. Strips will be in c, x, y, l (x and l are reals). In general, the strips will be scattered through the elements of the vectors (unordered). Figure 5 shows the generated strips.

Map the strips onto the image space and truncate the fractional parts after mapping:

x' ← truncate(x + l - y * cot(θ))
x ← truncate(x - y * cot(θ))
l ← x' - x

where the vector x' holds the final x-coordinates of the strips. Both x and l are now integers. The mapping is shown in figure 6 before horizontal digitisation.
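The doubling-up idea can be seen in isolation for a single region. The sketch below is a simplified serial illustration in Python, not a transcription of the while-loop above: it tracks only the y-coordinates of the middle strips of one region, and the name `double_up` and its return convention are assumptions of this example.

```python
def double_up(y0, count):
    """Generate the scanlines y0 .. y0 + count - 1 by repeated doubling.

    Returns the list of y-coordinates and the number of passes used.
    """
    ys = [y0]
    passes = 0
    while len(ys) < count:
        offset = 2 ** passes                   # 2^(n-1) on the nth pass
        copies = [y + offset for y in ys]      # copy and raise every strip so far
        ys.extend(copies[: count - len(ys)])   # discard the excess copies
        passes += 1
    return ys, passes


ys, passes = double_up(10, 5)
```

Because the number of strips held for a region can double on every pass, a region of height h needs about log2(h) passes, which is where the bound of N passes for a 2^N × 2^N image comes from.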
Figure 5.

Figure 6. Mapping the strips in auxiliary space to image raster; before horizontal digitisation

Step (v)
The strips are not correctly ordered and have to be sorted. To do this, combine the three vectors x, y, l in a way which expedites the computation (assuming image space is a square raster of 2^N × 2^N pixels, i.e. IMAGESIZE = 2^N):

codedform ← (((y << N) | x) << (N + 1)) | l

Sort the codedform into increasing numerical order and at the same time shuffle the colours, c, so that each strip has its colour in the corresponding element of c. A modified procedure based on Batcher's Bitonic Sort Algorithm³ does this efficiently. Lastly uncode to get x, y, l:

x ← (codedform >> (N + 1)) & (IMAGESIZE - 1)
y ← codedform >> (2 * N + 1)
l ← codedform & (2 * IMAGESIZE - 1)

Step (vi)
Strips of the same colour which are adjacent on a scanline can be replaced by one longer strip of the same colour. This can be achieved by the code which follows:

1   x' ← l + x
    { final x-coordinates }
2   mask ← shiftleft(x, 1) = x' and
           shiftleft(y, 1) = y and
           shiftleft(c, 1) = c
    { the above sets the elements of mask to true wherever the condition is satisfied }
3   mask ← not mask
4   set the element of mask representing the last strip to true;
5   discard(not mask, c, x', y)
6   mask ← shiftright(mask, 1)
7   mask[1] := true
8   discard(not mask, x)
9   l ← x' - x

The line numbers are used in the example which follows.

Example
The above routine is illustrated by the example shown in figure 7. In figure 7 the integers inside the strips represent the colours. The data in the vectors after execution of each statement of the code is given below (statement numbers are on the left-hand side).
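The packing of step (v) and the compaction of step (vi) can also be tried out serially. The following Python sketch is an illustration under stated assumptions rather than the DAP routine: the names `pack`, `unpack` and `compact` are hypothetical, and list comprehensions stand in for the shift and discard operations. The comments give the statement numbers of the pseudo code that each line corresponds to.

```python
def pack(x, y, l, n_bits):
    """Combine y, x and l into one sortable integer (y is most significant)."""
    return (((y << n_bits) | x) << (n_bits + 1)) | l


def unpack(code, n_bits):
    """Recover (x, y, l) from the coded form for a 2^N x 2^N image."""
    size = 1 << n_bits
    x = (code >> (n_bits + 1)) & (size - 1)
    y = code >> (2 * n_bits + 1)
    l = code & (2 * size - 1)
    return x, y, l


def compact(c, x, y, l):
    """Merge same-colour strips that touch on a scanline; input must be sorted."""
    n = len(x)
    xe = [li + xi for li, xi in zip(l, x)]                       # statement 1
    # Statements 2-4: keep[i] is false when strip i is continued by strip i + 1.
    keep = [not (x[i + 1] == xe[i] and y[i + 1] == y[i] and c[i + 1] == c[i])
            for i in range(n - 1)] + [True]
    c2 = [c[i] for i in range(n) if keep[i]]                     # statement 5
    xe2 = [xe[i] for i in range(n) if keep[i]]
    y2 = [y[i] for i in range(n) if keep[i]]
    # Statements 6-7: a strip loses its start if its predecessor runs into it.
    keep_start = [True] + keep[:-1]
    x2 = [x[i] for i in range(n) if keep_start[i]]               # statement 8
    l2 = [e - s for e, s in zip(xe2, x2)]                        # statement 9
    return c2, x2, y2, l2


# The uncompacted strips of figure 7.
c = [2, 4, 1, 1, 5, 1]
x = [10, 20, 10, 20, 40, 60]
y = [4, 4, 5, 5, 5, 5]
l = [10, 20, 10, 20, 5, 10]
result = compact(c, x, y, l)
codes = sorted(pack(xi, yi, li, 9) for xi, yi, li in zip(x, y, l))
```

Placing y in the most significant bits means that a plain numerical sort orders the strips by scanline first and by x-coordinate within a scanline, which is the order the compaction relies on.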
Figure 7. Uncompacted strips

     c    =   2   4   1   1   5   1
     x    =  10  20  10  20  40  60
     y    =   4   4   5   5   5   5
     l    =  10  20  10  20   5  10
1    x'   =  20  40  20  40  45  70
2    mask =   f   f   t   f   f   ?
3    mask =   t   t   f   t   t   ?
4    mask =   t   t   f   t   t   t
5    c    =   2   4   1   5   1   ?
     x'   =  20  40  40  45  70   ?
     y    =   4   4   5   5   5   ?
6    mask =   ?   t   t   f   t   t
7    mask =   t   t   t   f   t   t
8    x    =  10  20  10  40  60   ?
9    l    =  10  20  30   5  10   ?

The vectors c, x, y, l now hold the compacted strips.

     c    =   2   4   1   5   1
     x    =  10  20  10  40  60
     y    =   4   4   5   5   5
     l    =  10  20  30   5  10

The result is shown diagrammatically in figure 8.

Figure 8. Compacted strips of those shown in Figure 7

2.2. Generalisation for complex images
The rotation algorithm just described works only if the number of strips generated at step (iv) is less than the number of processor elements (PEs) in the machine. This section describes the modification needed if more strips are to be generated at step (iv). By assumption, the final number of strips is not more than the number of PEs.

Apply steps (i), (ii) and (iii) of the algorithm. The rectangular regions which are obtained from the strips will appear something like those shown in figure 9.

Each PE holds at most the information for one rectangle in its local memory (it is only the intermediate code which overflows). The parameters of the rectangular regions are held in a set of vectors in the following order:

rectangles = (a_1, a_2, a_3, ...)

together with:

height = (h_1, h_2, h_3, ...)

Apply the steps that follow:

(a) First obtain:
    sumleft = (h_1, h_1 + h_2, h_1 + h_2 + h_3, ...)
    the details of the algorithm to do this will depend in part on the hardware facilities available; for example see the DAP Subroutine Library documentation.⁴
(b) Generate the strips for the first j - 1 rectangular regions (step (iv)), where j is the index of the first element of sumleft that is greater than the number of PEs.
(c) Map the strips onto the image space (last part of step (iv)) then sort (step (v)) and compact (step (vi)): the result is in one layer.
(d) Subtract the sum of h_k for k = 1 to j - 1 from sumleft and discard the first j - 1 rectangular regions.
    rectangles = (a_j, a_(j+1), ...)
    sumleft = (h_j, h_j + h_(j+1), ...)
    If this is the first pass through (a), (b), (c), (d) then go to (b).
(e) Sort the generated strips together with the strips already found. The strips in each layer are already in correct order. So they only need to be merged together. The merge algorithm used is the last part of "Batcher's Bitonic Sort" algorithm.³
(f) Compact the strips, step (vi). Go to (b) if there are more unprocessed rectangular regions.

Figure 9. Rectangular regions in the auxiliary space
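The way steps (a), (b) and (d) divide the rectangular regions into batches can be sketched serially in Python. The function name `plan_batches` is hypothetical, and for simplicity the running sums are recomputed for the remaining regions on each pass instead of being updated by subtraction as in step (d); the resulting batches are the same.

```python
from itertools import accumulate


def plan_batches(heights, num_pes):
    """Split the regions into consecutive batches that each fit in one layer.

    heights[i] is the number of strips region i will generate; each batch is
    returned as a half-open index range (start, stop).
    """
    batches = []
    start = 0
    while start < len(heights):
        sumleft = list(accumulate(heights[start:]))   # step (a): running sums
        # Step (b): position of the first running sum that exceeds the PE count.
        j = next((i for i, s in enumerate(sumleft) if s > num_pes), len(sumleft))
        if j == 0:
            raise ValueError("a single region generates more strips than there are PEs")
        batches.append((start, start + j))
        start += j                                    # step (d): drop processed regions
    return batches


batches = plan_batches([3, 4, 2, 5, 1], num_pes=8)
```

Each batch is then expanded, mapped, sorted and compacted as one layer before being merged with the strips already found.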
3. ICL DAP Implementation
In the DAP⁵ (a 64 × 64 array processor; thus 4096 PEs)
the data for each strip is stored in the memory of a PE
as four integers: x-coordinate, y-coordinate, horizontal
length (each of two bytes) and colour (one byte). If
there are more than 4096 strips a new layer of stripcode
is inserted into the memory of the PEs. The algorithm
has been implemented to work on an image held in one
layer of strips, although internally more than one layer
may be generated (step (iv)). It is assumed that after
compaction, step (vi), no more than one strip is held in
the memory of any PE.
two steps about one third of the time is taken by step
(iv) (strip formation) and two thirds by step (v) (sorting).
Variables are declared as 8, 16, or 32 bit entities in
DAP-Fortran. The DAP is a bit-serial machine and
can be programmed in the DAP assembly language,
APAL, with objects of any number of bits. For an
image of 512 × 512 pixels the 16 and 32 bit entities can
be reduced to 9 and 28 bits respectively. It is estimated
that this alone would lead to an increase in the speed
of execution by about one third.
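The figures of 9 and 28 bits appear to follow from the coded form of step (v): with N = 9 a coordinate needs N bits, a length needs N + 1 bits, and the coded form needs N + N + (N + 1) = 28 bits. The short Python check below makes this arithmetic explicit; the function name `field_widths` is hypothetical and the link to step (v) is an inference from the text, not something the paper states.

```python
def field_widths(image_size):
    """Bit widths for an image_size x image_size raster, image_size = 2^N."""
    n = image_size.bit_length() - 1
    if 1 << n != image_size:
        raise ValueError("image_size must be a power of two")
    coord_bits = n           # x or y in 0 .. 2^N - 1
    length_bits = n + 1      # lengths in 0 .. 2^N
    coded_bits = 2 * coord_bits + length_bits
    return coord_bits, length_bits, coded_bits


widths = field_widths(512)
```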
4. Conclusion
In this paper we have presented an algorithm for the
arbitrary rotation of digitised images using SIMD
machine architectures. This algorithm together with the
algorithms described in a previous paper² form the
basis of a system for processing digitised images using
SIMD machine architectures.
The routine is written in the high level language DAP-Fortran⁶ which comes with the DAP. We have not
tried to refine the performance of the code. The timings in Table 1 are for a 512 × 512 pixel image. However, it should be noted that the algorithm is insensitive
to the image size.
I
I
Rotation about the point (100, 100)
-
-
1500
1500
2190
3000
3000
The 90 (or 270) degree rotation operation is also
treated as a special case. The algorithm to do this is
similar to the one presented in this paper but much
simpler. There is no need to map the transformed
strips to an auxiliary space, and the routine which horizontally divides the transformed strips (step (iv)) can
be greatly simplified.
angle
1000
The 180 degree rotation operation is very simple and is
treated as a special case. It involves reversing the order
of the strips in the stripcode vector. The 180 degree
rotation operation takes about 6ms on the DAP for
images made up of one layer of strips (i.e. up to 4096
strips).
839
834
1435
2170
2999
2996
38
60
20
10
4
5
10
-
time in ms
(binary)
44
90
41
45
45
74
92
(colour)
49
83
103
I
2
2
Table 1
The first and the second columns in that table contain
the number of strips before and after rotation respectively. The last column, layers, shows the number of
memory layers that were required to generate the strips
before compaction (parallel operations can be performed on one layer at a time). The algorithm used is
described in §2.1 for the case where only one layer was
required and §2.2 otherwise.
The time taken for steps (i), (ii), (iii), and (vi) of the
algorithm is independent of the image and takes about
19ms. The time taken for steps (iv) and (v) depends
on the character of the image and typically for these
When a digitised image is rotated by an angle other
than multiples of 90 degrees, the resultant image gets
degraded. Thus for a series of rotation operations, each
operation should be performed on the original image
(source) in order to avoid cumulative degradation.
The algorithm is efficient in the sense that at every
stage the SIMD parallelism is exploited. The problem
of how to compare the performance of an SIMD
machine with that of a serial machine is fraught with
difficulties and we shall not attempt this here. Elsewhere we have published⁷ another algorithm for the
arbitrary rotation of a digitised image on a network of
transputers. A comparison of these algorithms is rather
pointless since the underlying hardware architectures
are so fundamentally different.
Acknowledgements
We wish to express our gratitude to the DAP Support
Unit at Queen Mary College who provided us with the
opportunity to use the DAP. We are deeply indebted
to John Quinn of the DAP Support Unit who has been
so generous with his knowledge of the DAP machine
and its software; also for his comments on a draft of
this paper.
References
1. Thurber, K.J., Large Scale Computer Architecture: Parallel and Associative Processors, Hayden Book Co., N.J. (1976).
2. Arabnia, H.R. and Oliver, M.A., "Fast Operations on Raster Images with SIMD Machine Architectures," Computer Graphics Forum 5(1), pp. 179-188 (1986).
3. Knuth, D.E., in The Art of Computer Programming, Vol. 3, Addison-Wesley (1973).
4. DAP Subroutine Library, DAPS, Computing Centre, Queen Mary College, London.
5. Reddaway, S.F., "The DAP Approach," Infotech International, pp. 311-329, Infotech State of the Art Report: Super Computers (1979).
6. Gostick, R.W., "Software and algorithms for the distributed array processor," ICL Technical Journal 1, pp. 116-135 (1979).
7. Arabnia, H.R. and Oliver, M.A., "A Transputer Network for the Arbitrary Rotation of Digitised Images," The Computer Journal (1987).