Arbitrary rotation of raster images with SIMD machine architectures

1987, Computer Graphics Forum

H.R. Arabnia and M.A. Oliver*

*Computing Laboratory, The University of Kent, Canterbury, Kent CT2 7NF, United Kingdom

North-Holland, Computer Graphics Forum 6 (1987) 3-12

Abstract

An algorithm for the rotation of a raster image by an arbitrary angle is described. The image data structure is closely related to runlength code. The algorithm has been designed to exploit SIMD parallel architectures. It has been implemented on an ICL DAP, on which non-trivial images can be rotated in times very close to real time.

1. Introduction

In this paper we are concerned with the problem of how to exploit SIMD machine architectures for the rotation of a digitised raster image by an arbitrary angle.

SIMD array processor architectures

A typical SIMD (Single Instruction on Multiple Data stream) array processor [1] consists of an array of elementary processors (PEs) where each processor has a private memory. A single instruction broadcast from a Master Control Unit (MCU) is obeyed simultaneously by all PEs. Masking schemes are used to control the status of each PE during the execution of an instruction (from the MCU). In this way each PE may be either enabled or disabled for an instruction; only enabled PEs perform computation. The PE array can commonly be organised either as a square array or as a linear array, with nearest neighbour connections in both cases.

Stripcode

The data structure used to represent the image in the machine is the "stripcode" which we introduced in a previous paper [2], where we showed that it has some nice properties for SIMD array processor architectures. Stripcode is essentially runlength code which exploits the horizontal coherence between adjacent pixels on a scanline. In this way a reduction in the number of data objects needed for the representation of the image, compared to the number of pixels in a frame buffer, is achieved. The coordinate system has its origin in the lower left corner of the image space with the x-axis to the right and the y-axis up. Consider a runlength coded image and specify a background colour: each strip of colour, other than the background colour, is specified by the position coordinates of its origin (ordinal number of its first pixel, scanline number), length (number of pixels), and colour; the background colour is not explicitly coded.

Notation

The image space is a square raster of 2^N × 2^N pixels. Parallel assignment of vectors is denoted by the symbol ← whereas := denotes assignment of scalar quantities. The context makes clear the variable type (vector or scalar).

The operators which follow are used for bit manipulation:

   &     bitwise 'and'
   |     bitwise 'or'
   <<    left shift
   >>    right shift

where the shift operators shift their left-hand operand by the number of bit positions given by the right-hand operand.

In expressions where a logical vector is used to index the components of a vector on the left-hand side of the assignment operator ←, the assignments will take place only for the components which correspond to true values in the logical vector. Thus, in

   x(mask) ← X

only those components of x which correspond to "true" values of mask are set equal to the corresponding components of X; the other components of x remain unchanged.

In expressions where a scalar quantity is used in parallel calculations, the scalar quantity will be expanded to vector mode: all elements of the vector are set equal to the scalar quantity.

The four vectors x, y, l, c hold the x-coordinates, y-coordinates, lengths and colours of the strips. The program fragments are in a Pascal-like pseudo code with the comments between braces.

2. Image Rotation

2.1. The Basic Algorithm

The manipulation rotates the image by θ degrees about a specified point. The 90 and 180 degree rotation operations are treated as special cases; see the remarks in the conclusion. The algorithm has six steps and can be regarded as a series of image mappings.

(i) The strips are rotated by θ degrees, which gives the rotated image exactly, but not digitised. The problem now is to represent these rotated strips by a set of new horizontal strips in the raster, i.e. to digitise them.

(ii) The rotated strips are clipped to the image space. The rectangular parts of the rotated strips which lie completely outside the image space are clipped, as shown in figure 1.

(iii) The clipped rotated strips are mapped to an auxiliary space. Each strip is represented by a rectangular region (figure 3), the construction of which is described in detail below. The horizontal boundaries of these rectangular regions lie on scanline boundaries.

(iv) The rectangular regions are horizontally divided into strips which coincide with scanline boundaries (figure 5). These strips are mapped onto the image raster in a way which generates the required digitised representation of the original rotated strip (figure 6 shows this before the final horizontal digitisation).

(v) At this stage the code is not correctly ordered and has to be sorted.

(vi) Finally the stripcode has to be compacted in the sense that adjacent strips with the same colour on a scanline are represented by one longer strip.

Figure 1. Rotation and clip of image strips: (a) unclipped, (b) clipped

In order to represent the rotated strips by a new set of horizontal strips, the rotated strips are mapped onto rectangular regions in an auxiliary space as shown in figure 3 (step (iii)). The reason for this mapping is that these rectangular regions can be divided horizontally into strips using the SIMD parallelism very efficiently (see the detailed description of step (iv)).
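Step (i) is stated without formulas. As a rough illustration only, the sketch below rotates the four corners of a single strip about a point with the standard 2-D rotation. It assumes that a strip (x, y, l) covers the unit-high rectangle from (x, y) to (x + l, y + 1) and that positive angles are counter-clockwise; both conventions, and the function names `rotate_point` and `rotate_strip`, are assumptions made for this example and are not taken from the DAP implementation.

```python
import math


def rotate_point(x, y, cx, cy, theta_deg):
    """Rotate (x, y) about (cx, cy) by theta_deg degrees.

    Assumption: positive angles are counter-clockwise.
    """
    t = math.radians(theta_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(t) - dy * math.sin(t),
            cy + dx * math.sin(t) + dy * math.cos(t))


def rotate_strip(x, y, length, cx, cy, theta_deg):
    """Return the four rotated corners of one strip (not yet digitised).

    Assumption: the strip occupies the rectangle [x, x + length] x [y, y + 1].
    Corner order: lower left, lower right, upper right, upper left.
    """
    corners = [(x, y), (x + length, y), (x + length, y + 1), (x, y + 1)]
    return [rotate_point(px, py, cx, cy, theta_deg) for px, py in corners]


if __name__ == "__main__":
    # A strip of length 20 starting at (10, 4), rotated 30 degrees about (100, 100).
    for corner in rotate_strip(10, 4, 20, 100, 100, 30):
        print(corner)
```

On a SIMD machine the same arithmetic would be applied to every strip at once, with one strip per PE; the loop-free per-strip form above is what makes that possible.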
The horizontal strips produced in step (iv) can be mapped easily onto the image raster in a way which generates the required digitised representation of the original rotated strips (figure 6).

Now for a more detailed description of the complex steps in the algorithm.

Step (iii)

Find the intersection point of the lower side of the rotated strip with the x-axis; if no intersection occurs then extend the side down until it does. An example is shown in figure 2. These intersection points, one for each rotated strip, will be kept in the vector intx.

Figure 2.

Find the intersection points of the ends of the rotated strip with the scanlines. Each end intersects one scanline at most. If no intersection occurs then extend the ends down until they do. This is done in four statements:

   x2' ← x2 - (truncate(y1) - y1) * tan(θ)
   y1' ← truncate(y1)
   x1' ← x1 - (truncate(y2) - y2) * tan(θ)
   y2' ← truncate(y2)

where (x1, y2) and (x2, y1) are the coordinates of the top left and top right corners of the rotated strip. The intersection points (x1', y2') and (x2', y1') are shown in figure 3.

Now map the rotated strip into a rectangular region (unrotated) in an auxiliary space. This region has the following specification:

   coordinates of lower left corner:   (intx, y1')
   horizontal width:                   sec(θ)
   vertical height:                    y2' - y1' + 1
   colour:                             same colour as strip

Figure 3.

In figure 3 both spaces and their coordinate systems are superposed; the rotated strip is shown mapped into the rectangular region. The order of these rectangular regions in memory is discussed in §2.2.

Figure 4. (panels: auxiliary vectors, temporary vectors, result vectors)
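The four statements of step (iii) and the rectangle specification can be transcribed almost directly. The sketch below does this for a single strip in Python. It keeps the signs exactly as printed above, takes intx as an input because no formula for it is given, and uses `math.floor` for "truncate" on the assumption that clipped coordinates are non-negative. The function names and the dictionary layout are hypothetical.

```python
import math


def snap_ends_to_scanlines(x1, y2, x2, y1, theta_deg):
    """Slide the two top corners along the strip ends onto scanline boundaries.

    (x1, y2) is the top left corner and (x2, y1) the top right corner of the
    rotated strip, as in the text. Returns (x1', y2', x2', y1').
    """
    t = math.tan(math.radians(theta_deg))
    y1p = math.floor(y1)          # y1' <- truncate(y1)
    x2p = x2 - (y1p - y1) * t     # x2' <- x2 - (truncate(y1) - y1) * tan(theta)
    y2p = math.floor(y2)          # y2' <- truncate(y2)
    x1p = x1 - (y2p - y2) * t     # x1' <- x1 - (truncate(y2) - y2) * tan(theta)
    return x1p, y2p, x2p, y1p


def auxiliary_rectangle(intx, y1p, y2p, theta_deg, colour):
    """Build the unrotated rectangular region for one strip."""
    return {
        "x": intx,                                        # lower left corner (intx, y1')
        "y": y1p,
        "width": 1.0 / math.cos(math.radians(theta_deg)),  # sec(theta)
        "height": y2p - y1p + 1,
        "colour": colour,
    }


if __name__ == "__main__":
    x1p, y2p, x2p, y1p = snap_ends_to_scanlines(2.0, 7.5, 6.0, 3.5, 45.0)
    print(x1p, y2p, x2p, y1p)
    print(auxiliary_rectangle(1.25, y1p, y2p, 45.0, colour=3))
```

In the vector form used in the text each of these scalars is one element of a vector, so the same statements run for all strips simultaneously.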
Step (iv)

The rectangular regions are horizontally divided into strips. To do this three separate sets of vectors are used: auxiliary vectors which hold the information needed to generate the strips, temporary vectors used as working area, and result vectors which hold the generated strips. The vectors used in each set are shown diagrammatically in figure 4.

For rectangular regions with a height of one unit the strips are generated directly from the information in the auxiliary vectors: put the results in the result vectors c, x, y, l. In this case most of the elements of these vectors are unused.

   mask ← height = 1                 { elements of mask are set to true or false }
   if any mask then                  { check if mask has any true values }
   begin
      c(mask) ← colour
      x(mask) ← x1' + cot(θ) * y2'
      y(mask) ← y1'
      l(mask) ← x2' + cot(θ) * y1' - x
   end

For the geometric significance of this code refer to figure 5.

For rectangular regions with a height of two units or more, first the lowest strips are generated. Again these are generated in the result vectors directly. The new strips fill the unused elements in these vectors.

   mask ← not mask
   c(mask) ← colour
   x(mask) ← intx
   y(mask) ← y1'
   l(mask) ← x2' + cot(θ) * y1' - intx

For the geometric significance of this code refer to figure 5.

Generate the highest strip of each region which has a height of two units or more. These strips are generated in the temporary vectors. Elements of these vectors which correspond to rectangles of height one unit are unused: close these gaps and append the strips just generated to the strips in the corresponding result vectors. The code that follows does the job:

   Tc ← colour
   Tx ← x1' + cot(θ) * y2'
   Ty ← y2'
   Tl ← intx + sec(θ) - Tx
   discard(not mask, Tc, Tx, Ty, Tl)
   append strips in Tc, Tx, Ty, Tl to strips in c, x, y, l;

See figure 5 for the geometric significance of this code. The procedure "discard" removes the excess data from the vectors by closing the gaps in memory, where the gaps are identified by the true elements in the logical vector in its first argument; the rest of the arguments are the vectors with gaps (all in the same positions).

The final stage is to generate the strips representing the middle sections of the rectangular regions. To do this make a record in m of the number of strips already generated. Generate the first (lowest) strips of the middle sections by using the information in auxiliary vectors: put the result in temporary vectors. Close the gaps and append these strips to the strips in result vectors. Update height: discard its elements that are zero or less, then shift its data m places to the right. The undefined elements of height are set to zero.

   height ← height - 2
   excessdata ← height ≤ 0
   Tc ← colour
   Tx ← intx
   Ty ← y1' + 1
   l(unused elements) ← sec(θ)       { all strips have equal lengths }
   discard(excessdata, Tc, Tx, Ty, height)
   height ← shiftright(height, m)
   append strips in Tc, Tx, Ty to strips in c, x, y;

The function "shiftright" moves the elements of the first argument (vector) a number of places given in the second argument (scalar) to the right; the undefined elements will be set to zero. The first element of a vector is on the left and the last on the right.

The rest of the strips (if any) for each rectangular region have equal parameters to the first (lowest) strip of the middle section of that region, except for the y-coordinates. Thus, copy the strips from the result vectors onto the temporary vectors and increment the y-coordinates of the copied strips by the appropriate value. Close the gaps in temporary vectors, where the gaps are the excess strips generated. Append the strips in temporary vectors to the strips in result vectors. The above process is repeated (i.e. doubling up) until the strips are all generated.
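The doubling idea described above can be sketched serially in Python. The sketch tracks only the y-coordinates, since the other parameters of the copied strips are unchanged. On pass n every strip that still has rows to produce is copied 2^(n-1) scanlines up, and its outstanding count is split between the original and the copy by integer halving, with the original keeping the larger share. The function name `expand_by_doubling` and the list-based bookkeeping are assumptions made for illustration; a SIMD machine would process all list elements in one step instead of looping.

```python
def expand_by_doubling(first_y, counts):
    """Generate counts[i] consecutive scanlines starting at first_y[i].

    first_y: y-coordinate of the first (lowest) middle strip of each region.
    counts:  number of middle strips wanted for each region (each >= 1).
    Returns the y-coordinates of all generated strips, unordered.
    """
    y = list(first_y)                       # result vector, one entry per strip
    remaining = [k - 1 for k in counts]     # rows still to derive from each strip
    n = 1
    while any(r > 0 for r in remaining):
        step = 2 ** (n - 1)
        new_y, new_remaining = [], []
        for i in range(len(y)):             # conceptually: all PEs at once
            if remaining[i] <= 0:           # "excess data": nothing left to copy
                continue
            t = remaining[i] // 2           # share handed to the copy
            h = remaining[i] - t - 1        # share kept by the original
            if h < t:                       # original keeps the larger share
                h, t = t, h
            remaining[i] = h
            new_y.append(y[i] + step)
            new_remaining.append(t)
        y += new_y                          # append the copies to the result
        remaining += new_remaining
        n += 1
    return y


if __name__ == "__main__":
    # Two regions: one needs a single middle strip, the other needs five.
    print(sorted(expand_by_doubling([0, 10], [1, 5])))
```

Because the number of strips per region can at most double on each pass, a region with k middle strips needs about log2(k) passes.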
The loop that performs this doubling is as follows:

   height ← height - 1
   excessdata ← height ≤ 0
   n := 1
   while not excessdata do           { i.e. while some strip still has rows to generate }
   begin
      Tc ← c
      Tx ← x
      Ty ← y + 2^(n-1)
      n := n + 1
      Theight ← height / 2           { integer division }
      height ← height - Theight - 1
      swap elements of height and Theight wherever height < Theight;
      discard(excessdata, Tc, Tx, Ty, Theight)
      append data in Tc, Tx, Ty, Theight to data in c, x, y, height;
      excessdata ← height ≤ 0
   end

The nth pass through the body of the while-loop generates 2^(n-1) new strips for each region, so for a 2^N × 2^N image the loop is executed at most N times. All strips will be generated when the while-loop terminates. Strips will be in c, x, y, l (x and l are reals). In general, the strips will be scattered through the elements of the vectors (unordered). Figure 5 shows the generated strips.

Figure 5.

Map the strips onto the image space and truncate the fractional parts after mapping:

   x' ← truncate(x + l - y * cot(θ))
   x  ← truncate(x - y * cot(θ))
   l  ← x' - x

where the vector x' holds the final x-coordinates of the strips. Both x and l are now integers. The mapping is shown in figure 6 before horizontal digitisation.

Figure 6. Mapping the strips in auxiliary space to image raster; before horizontal digitisation

Step (v)

The strips are not correctly ordered and have to be sorted. To do this, combine the three vectors x, y, l in a way which expedites the computation (assuming image space is a square raster of 2^N × 2^N pixels, i.e. IMAGESIZE = 2^N):

   codedform ← (((y << N) | x) << (N + 1)) | l

Sort the codedform into increasing numerical order and at the same time shuffle the colours, c, so that each strip has its colour in the corresponding element of c. A modified procedure based on Batcher's Bitonic Sort Algorithm [3] does this efficiently. Lastly decode to get x, y, l:

   l ← codedform & (2 * IMAGESIZE - 1)
   x ← (codedform >> (N + 1)) & (IMAGESIZE - 1)
   y ← codedform >> (2 * N + 1)

Step (vi)

Strips of the same colour which are adjacent on a scanline can be replaced by one longer strip of the same colour. This can be achieved by the code which follows:

   1   x' ← l + x                    { final x-coordinates }
   2   mask ← shiftleft(x, 1) = x' and shiftleft(y, 1) = y and shiftleft(c, 1) = c
          { the above sets the elements of mask to true wherever the condition is satisfied }
   3   mask ← not mask
   4   set the element of mask representing the last strip to true;
   5   discard(not mask, c, x', y)
   6   mask ← shiftright(mask, 1)
   7   mask[1] := true
   8   discard(not mask, x)
   9   l ← x' - x

The line numbers are used in the example which follows.

Example

The above routine is illustrated by the example shown in figure 7. In figure 7 the integers inside the strips represent the colours. The data in the vectors after execution of each statement of the code is given below (statement numbers are on the left-hand side).

Figure 7. Uncompacted strips

       c    =   2   4   1   1   5   1
       x    =  10  20  10  20  40  60
       y    =   4   4   5   5   5   5
       l    =  10  20  10  20   5  10

   1   x'   =  20  40  20  40  45  70
   2   mask =   f   f   t   f   f   ?
   3   mask =   t   t   f   t   t   ?
   4   mask =   t   t   f   t   t   t
   5   c    =   2   4   1   5   1
       x'   =  20  40  40  45  70
       y    =   4   4   5   5   5
   6   mask =   ?   t   t   f   t   t
   7   mask =   t   t   t   f   t   t
   8   x    =  10  20  10  40  60
   9   l    =  10  20  30   5  10

The vectors c, x, y, l now hold the compacted strips.

       c    =   2   4   1   5   1
       x    =  10  20  10  40  60
       y    =   4   4   5   5   5
       l    =  10  20  30   5  10

The result is shown diagrammatically in figure 8.

Figure 8. Compacted strips of those shown in Figure 7

2.2. Generalisation for complex images

The rotation algorithm just described works only if the number of strips generated at step (iv) is less than the number of processor elements (PEs) in the machine. This section describes the modification needed if more strips are to be generated at step (iv). By assumption, the final number of strips is not more than the number of PEs.

Apply steps (i), (ii) and (iii) of the algorithm. The rectangular regions which are obtained from the strips will appear something like those shown in figure 9. Each PE holds at most the information for one rectangle in its local memory (it is only the intermediate code which overflows). The parameters of the rectangular regions are held in a set of vectors in the following order:

   rectangles = (a_1, a_2, a_3, ...)

together with:

   height = (h_1, h_2, h_3, ...)

Figure 9. Rectangular regions in the auxiliary space

Apply the steps that follow:

(a) First obtain:

   sumleft = (h_1, h_1 + h_2, h_1 + h_2 + h_3, ...)

the details of the algorithm to do this will depend in part on the hardware facilities available; for example see the DAP Subroutine Library documentation [4].

(b) Generate the strips for the first j - 1 rectangular regions (step (iv)), where j is the index of the first element of sumleft that is greater than the number of PEs.

(c) Map the strips onto the image space (last part of step (iv)) then sort (step (v)) and compact (step (vi)): the result is in one layer.

(d) Subtract the sum of h_k for k = 1, ..., j - 1 from sumleft and discard the first j - 1 rectangular regions.

   rectangles = (a_j, a_(j+1), ...)
   sumleft = (h_j, h_j + h_(j+1), ...)

If this is the first pass through (a), (b), (c), (d) then go to (b).

(e) Sort the generated strips together with the strips already found. The strips in each layer are already in correct order, so they only need to be merged together.
The merge algorithm used is the last part of Batcher's Bitonic Sort algorithm [3].

(f) Compact the strips, step (vi). Go to (b) if there are more unprocessed rectangular regions.

3. ICL DAP Implementation

In the DAP [5] (a 64 × 64 array processor; thus 4096 PEs) the data for each strip is stored in the memory of a PE as four integers: x-coordinate, y-coordinate, horizontal length (each of two bytes) and colour (one byte). If there are more than 4096 strips a new layer of stripcode is inserted into the memory of the PEs. The algorithm has been implemented to work on an image held in one layer of strips, although internally more than one layer may be generated (step (iv)). It is assumed that after compaction, step (vi), no more than one strip is held in the memory of any PE.

The routine is written in the high level language DAP-Fortran [6] which comes with the DAP. We have not tried to refine the performance of the code. The timings in Table 1 are for a 512 × 512 pixel image. However, it should be noted that the algorithm is insensitive to the image size.

Table 1. Rotation about the point (100, 100)

   strips before rotation:   1000  1500  1500  2190  3000  3000
   strips after rotation:     839   834  1435  2170  2999  2996
   angle:                     38  60  20  10  4  5  10
   time in ms (binary):       44  90  41  45  45  74  92
   time in ms (colour):       49  83  103
   layers:                    1  2  2

(Entries are listed column by column in the order in which they survive; the row alignment of the angle, timing and layer columns is not recoverable.)

The first and the second columns in that table contain the number of strips before and after rotation respectively. The last column, layers, shows the number of memory layers that were required to generate the strips before compaction (parallel operations can be performed on one layer at a time). The algorithm used is described in §2.1 for the case where only one layer was required and §2.2 otherwise.

The time taken for steps (i), (ii), (iii), and (vi) of the algorithm is independent of the image and is about 19 ms. The time taken for steps (iv) and (v) depends on the character of the image, and typically for these two steps about one third of the time is taken by step (iv) (strip formation) and two thirds by step (v) (sorting).

Variables are declared as 8, 16, or 32 bit entities in DAP-Fortran. The DAP is a bit-serial machine and can be programmed in the DAP assembly language, APAL, with objects of any number of bits. For an image of 512 × 512 pixels the 16 and 32 bit entities can be reduced to 9 and 28 bits respectively. It is estimated that this alone would lead to an increase in the speed of execution by about one third.

4. Conclusion

In this paper we have presented an algorithm for the arbitrary rotation of digitised images using SIMD machine architectures. This algorithm together with the algorithms described in a previous paper [2] form the basis of a system for processing digitised images using SIMD machine architectures.

The 180 degree rotation operation is very simple and is treated as a special case. It involves reversing the order of the strips in the stripcode vector. The 180 degree rotation operation takes about 6 ms on the DAP for images made up of one layer of strips (i.e. up to 4096 strips).

The 90 (or 270) degree rotation operation is also treated as a special case. The algorithm to do this is similar to the one presented in this paper but much simpler. There is no need to map the transformed strips to an auxiliary space, and the routine which horizontally divides the transformed strips (step (iv)) can be greatly simplified.

When a digitised image is rotated by an angle other than a multiple of 90 degrees, the resultant image is degraded. Thus for a series of rotation operations, each operation should be performed on the original image (source) in order to avoid cumulative degradation.

The algorithm is efficient in the sense that at every stage the SIMD parallelism is exploited. The problem of how to compare the performance of an SIMD machine with that of a serial machine is fraught with difficulties and we shall not attempt this here.
Elsewhere we have published [7] another algorithm for the arbitrary rotation of a digitised image on a network of transputers. A comparison of these algorithms is rather pointless since the underlying hardware architectures are so fundamentally different.

Acknowledgements

We wish to express our gratitude to the DAP Support Unit at Queen Mary College who provided us with the opportunity to use the DAP. We are deeply indebted to John Quinn of the DAP Support Unit who has been so generous with his knowledge of the DAP machine and its software; also for his comments on a draft of this paper.

References

1. Thurber, K.J., Large Scale Computer Architecture: Parallel and Associative Processors, Hayden Book Co., N.J. (1976).
2. Arabnia, H.R. and Oliver, M.A., "Fast Operations on Raster Images with SIMD Machine Architectures," Computer Graphics Forum 5(1), pp. 179-188 (1986).
3. Knuth, D.E., in The Art of Computer Programming, Vol. 3, Addison-Wesley (1973).
4. DAP Subroutine Library, DAPS, Computing Centre, Queen Mary College, London.
5. Reddaway, S.F., "The DAP Approach," Infotech International, pp. 311-329, Infotech State of the Art Report: Super Computers (1979).
6. Gostick, R.W., "Software and algorithms for the distributed array processor," ICL Technical Journal 1, pp. 116-135 (1979).
7. Arabnia, H.R. and Oliver, M.A., "A Transputer Network for the Arbitrary Rotation of Digitised Images," The Computer Journal (1987).