AU766816B2 - Apparatus for stack capable of multiple push or pop operations in one clock cycle - Google Patents
Apparatus for stack capable of multiple push or pop operations in one clock cycle Download PDFInfo
- Publication number
- AU766816B2 AU766816B2 AU31345/01A AU3134501A AU766816B2 AU 766816 B2 AU766816 B2 AU 766816B2 AU 31345/01 A AU31345/01 A AU 31345/01A AU 3134501 A AU3134501 A AU 3134501A AU 766816 B2 AU766816 B2 AU 766816B2
- Authority
- AU
- Australia
- Prior art keywords
- stack
- stack apparatus
- cells
- compositing
- clock cycle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Links
Landscapes
- Executing Machine-Instructions (AREA)
Description
S&FRef: 541412
AUSTRALIA
PATENTS ACT 1990 COMPLETE SPECIFICATION FOR A STANDARD PATENT
ORIGINAL
0 000 0 0 0.::0 0 0 00 Name and Address of Applicant: Actual Inventor(s): Address for Service: Invention Title: Canon Kabushiki Kaisha 30-2, Shimomaruko 3-chome, Ohta-ku Tokyo 146 Japan Yu-Ling Chen Spruson Ferguson St Martins Tower,Level 31 Market Street Sydney NSW 2000 Apparatus for Stack Capable of Multiple Push or Pop Operations in One Clock Cycle ASSOCIATED PROVISIONAL APPLICATION DETAILS [33] Country [31] Applic. No(s) AU PQ6554 [32] Application Date 29 Mar 2000 The following statement is a full description of this invention, including the best method of performing it known to me/us:- 5815c -1- APPARATUS FOR STACK CAPABLE OF MULTIPLE PUSH OR POP OPERATIONS IN ONE CLOCK CYCLE Field of the Invention The present invention relates to optimisation of stack operations and has particular application to graphics compositing stacks.
Background In graphics rendering, graphical objects are often composited together, sometimes to obtain some special visual effect. For a binary compositing operation there are two operands, named source and destination. The source operand is either provided 10 with the compositing command, or obtained from a previously computed intermediate result, which is sometimes retained in a stack. The destination operand derives typically from the top of stack. Intermediate compositing results are then pushed back onto the compositing stack for further compositing. When the source operand is obtained from the stack, such processing requires two pops from, and one push to, the compositing stack.
15 Conventionally implemented stacks can only finish a single stack operation in each clock cycle. As the intermediate result is needed for the next compositing operation, the compositing pipeline has to stall until the intermediate result is computed. Such a "situation slows down data throughput in image rendering. It is desirable to ameliorate delays in such operations.
Summary of the Invention It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
According to a first aspect of the invention, there is provided stack apparatus comprising: a first plurality of stack cells arranged for moving data content between said cells; 541412.doc -2at least one selector device coupled to a second plurality said cells to provide access to data content within each of said second plurality of said cells; and means for manipulating said at least one selector device to select at least one of said second plurality of said cells as an output of said stack apparatus.
Brief Description of the Drawings At least one preferred embodiment of the present invention will now be described with reference to the appended drawings and tables, in which: Fig. 1 is a flow chart depicting a conventional stack process; Fig. 2 is a flowchart depicting a stack process according to the present 10 disclosure; Figs. 3A to 3D illustrate stack operations performed by conventional o arrangements and desired to be replicated by more efficient processes; Fig. 4 shows a multiple access stack structure formed using shift registers; Fig. 5 is an arrangement similar to Fig. 4 but formed using memory; S 15 Table 1 shows the stack change and actions for Fig 4; and •moo Table 2 shows the stack change and actions for Fig Detailed Description .e Binary raster operations require two operands source and destination. When performed in a stack, such operations are controlled by a 2-bit STACK_OP field. As seen in Fig. 3A, in a normal operation, the STACK_OP field is set to STACK_STD_OP (00).
The source operand 301 comes along with the raster operation code 302 and the destination operand 303 is popped 304a from the top of the stack 304. When a raster operation 306 between the operands 301 and 303 is complete, the result 305 is pushed onto the stack 304. If the stack 304 is initially empty, opaque white is traditionally substituted for the destination operand 303.
541412.doc -3- If the destination operand needs to be kept for a later operation, the STACK_OP field is set to STACKNO_POPDEST as seen in Fig 3B. In such a case, the destination operand 307 is read 308 from the top level of the stack 304, but is not popped, and the result 309 is pushed onto the stack.
To perform a graphics operation on the top two levels of the stack, rather than using data that comes with the raster operation code, the STACK_OP field is set to STACK_POP_SRC (10) as seen in Fig 3C. In this case, the source operand 310 is popped from the stack 311 first, and then the destination operand 312 is popped from the stack 313. The result 313 is pushed onto the stack 311 after the operation is complete. If the stack 311 is empty, opaque white can again be substituted.
If the STACK_OP field is set to STACK_KEEP_SRC (11) as shown in Fig. 3D, the source operand is popped 315 from the top of the stack, and then the destination operand is also popped 316. The source operand and the result are pushed (318 and 317 respectively) onto the stack after the operation 319 is complete, leaving the stack in the S• 15Is configuration shown at 319. If the stack is empty, again opaque white is substituted.
S: i In the above operations, only the intermediate result is written back to the stack.
•o•0• The final compositing result is output to the downstream device without writing back to the stack. In Figs. 3A to 3D, the pseudo code represented next to each STACK_OP setting shows the required steps. The last step in each pseudocode, push(result), is omitted if the compositing command is the last for each pixel.
Fig. 1 shows, in flow diagram form, the conventional implementation of the above stack operations. A command is received in step 101 and decoded at step 201. In each case of Figs. 3A to 3D, an operand is popped from the stack, this being shown at step 103. Step 104 determines from the decoded operation whether another operand is required from the stack. If so, a feedback check loop 105 requires multiple clock cycles 541412.doc -4to complete a compositing command using a conventionally implemented stack. If not (106), the composite step 107 is performed. The destination of the result or operands is determined in step 108 where the compositing result is output at 109 (eg. last composite in stack) or steps 110, 111 and 112 to reconfigure the stack for the next composite operation.
Fig. 2 shows a modified stack operation flow according to the present disclosure, which analyses the stack depth changes after the compositing command and updates the stack after compositing accordingly. In Fig. 2, a command is received at step 201 and decoded at step 202 as in the conventional approach. At step 203, the composite 10 operation to be performed is analysed and the required stack depth determined so that operands can be simultaneously extracted from the stack where required. Compositing is °then performed at step 204. Step 205 then determines what is required to update the stack. For a final composite, no update is required and path 206 directly provides the output compositing result 209. Otherwise, path 207 causes step 208 to update the stack, simultaneously where required and also provide the output result 209.
A first implementation of the process of Fig. 2 uses shift registers and is shown in Fig. 4. In Fig. 4, a stack 400 is shown formed ofn m-bit shift registers 401 having two top elements 402 and 404 readable at any time via two multiplexers 414 and 416. The registers 401 are also are arranged for shifting either left or right. The multiplexers 414 and 416 are each controlled by the STACK_OP signal that can acquire one of two values:
STACKKEEPSRC
STACKPOPSRC
When STACK_OP is set to STACK_KEEP_SRC, the source operand 406 is obtained from a compositing command 410, compose_d, and the destination operand 408 is obtained from the top element 402.
541412.doc When STACK_OP is set to STACK_POP_SRC, the source operand 406 is obtained from the top element 402 and the destination operand is popped from the second element 404. The top of the stack 400 is indexed as 0. The stack 400 has a bottom pointer 418 which points to 0 when the stack 400 is empty and incremented when new data is pushed to stack 400. The bottom pointer 400 points to n when the stack 400 is full.
Table 1 included with the drawings of this specification, lists the stack depth change after the completion of a compositing command and the actions required for update. In normal operation, actions are to be carried out in the same clock cycle. The 10 registers 401 perform a right shift (as per Fig. 4) when the stack depth decreases and left shift when the stack depth increases. When the stack depth has no change, only the top element or the top second element is updated. In the case where the stack depth change is more than one, the update can be done in two clock cycles with one shift in each clock cycle. At the end of each compositing command, the top of stack is written with the compositing result, if it is not the last compositing command. On the last compositing command of each pixel, the result will be output to the downstream device instead of pushed back to the stack.
The stack 400 is reset with a WHITE signal 420 and maintained WHITE in the empty entries 401 by shifting WHITE data into the stack 400, (stack(n-1)<=WHITE as shown in Table 1) when the number of data in the stack decreases. If the stack overflows or underflows, exceptions are raised. In these cases, compositing can still continue. An example of stack overflow is when STACK_OP is set to STACK_NO_POPDEST (as shown in Fig. 3B) while the stack is already full. In this case, the actions will become 1. stack(0)<=result 2. stack(bottom-l:1)<=stack(bottom-2:0).
541412.doc -6- This means the element in the bottom of the stack will be discarded. When the stack is empty, opaque white data will be used. The stack should be reset to a default colour (opaque white in this example) and maintained at the default colour by shifting the default colour into the stack when the stack depth decreases.
A further implementation is performed using memory as shown in Fig. 5. In this embodiment, a stack 500 is formed of n words of m-bits of 3 port memory 520, having two read ports 501 and 502 and one write port 503. Ports 501 and 502 are for reading from the memory 520, while port 503 is for writing to the memory 520. Three pointers, top 504, plusl 505 and minusl 506 represent the top element, a vacant location next to the top element and the top second element of the stack, respectively, and access an address port 507 of the memory 520. The pointers 504-506 are incremented or decremented as the stack depth changes. Entry 0 and 1 in the memory 520 are reserved for the default colour, so the default colour can be output when the stack 500 is empty.
As a result, the 3 pointers (minusl, top and plusl) are reset to 0, 1 and 2 respectively.
is The first data pushed into the stack 500 is therefore written to the entry indexed as 2.
•:i Data is extracted from the stack 500 using a pair of multiplexers 508 and 504 via the read ports 501 and 502 in a manner similar to the stack 400 of Fig. 4, and for which corresponding operation applies.
Table 2 lists the stack depth change after the completion of compositing command and the actions required for update. Normal stack actions are carried out in the same clock cycle. On the case of updating two elements, the update can be performed in two clock cycles with one write in each clock cycle. At the end of each compositing command, the top of stack will be written with the compositing result, if it is not the last compositing command. On the last compositing command of each pixel, the result will be output to the downstream device instead of pushed back to the stack.
541412.doc -7- Industrial Applicability It is apparent from the above that which is applicable to the hardware stack implementations where operations require the utilization of plural stack entries.
The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiment(s) being illustrative and not restrictive.
For example, whilst described with reference to computer graphics operations involving two operands, the stack implementations described can be used or modified for use in any environment where a stack is desired. Further, by increasing the number of 10 multiplexers (414,416; 508,504) arrangements may be formed which can be provided for •o• the popping from the stack of more than two stack values. Specifically, where two or more multiplexers are in use, any combination of the multiplexers may be used to provide plural outputs. Such may be appropriate where complex operations are required involving the multiple use of one or more operands.
15 In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including" and not "consisting only ooo* of'. Variations of the word comprising, such as "comprise" and "comprises" have corresponding meanings.
541412.doc
Claims (3)
- 3. Stack apparatus according to claim 1 or 2, wherein said at least one selector device comprises at least first and second multiplexers each coupled to respective ones of said second plurality of said cells, wherein a control input of each of said multiplexers comprises said means for manipulating.
- 4. Stack apparatus according to claim 1, 2 or 3, further comprising means for setting each of said cells to a predetermined value. Stack apparatus according to claim 4, wherein said means for setting simultaneously sets each of said cells to said predetermined value.
- 541412.doc 6. Stack apparatus according to any one of claims 1 to 5, wherein said cells comprise a plurality of shift registers. 7. Stack apparatus according to any one of claims 1 to 5, wherein said cells comprise locations within a memory device. 8. Stack apparatus according to any one of the preceding claims wherein said second plurality of cells is accessible during a single clock cycle. 9. Stack apparatus substantially as described herein with reference to Fig. 4 of the drawings. Stack apparatus substantially as described herein with reference to Fig. 5 of the drawings. 06 0 S. 11. A computer graphics compositing system comprising stack apparatus according to any one of the preceding claims. DATED this twenty-seventh Day of March, 2001 Canon Kabushiki Kaisha Patent Attorneys for the Applicant/Nominated Person SPRUSON FERGUSON 541412.doc
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU31345/01A AU766816B2 (en) | 2000-03-29 | 2001-03-27 | Apparatus for stack capable of multiple push or pop operations in one clock cycle |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AUPQ6554 | 2000-03-29 | ||
AUPQ6554A AUPQ655400A0 (en) | 2000-03-29 | 2000-03-29 | Apparatus for stack capable of multiple push or pop operations in on clock cycle |
AU31345/01A AU766816B2 (en) | 2000-03-29 | 2001-03-27 | Apparatus for stack capable of multiple push or pop operations in one clock cycle |
Publications (2)
Publication Number | Publication Date |
---|---|
AU3134501A AU3134501A (en) | 2001-10-04 |
AU766816B2 true AU766816B2 (en) | 2003-10-23 |
Family
ID=25621747
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
AU31345/01A Ceased AU766816B2 (en) | 2000-03-29 | 2001-03-27 | Apparatus for stack capable of multiple push or pop operations in one clock cycle |
Country Status (1)
Country | Link |
---|---|
AU (1) | AU766816B2 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2231181A (en) * | 1989-04-07 | 1990-11-07 | Intel Corp | Stack method and circuitry |
WO1997035257A1 (en) * | 1996-03-18 | 1997-09-25 | Advanced Micro Devices, Inc. | A data cache configured to store stack data in a stack data storage |
US6349383B1 (en) * | 1998-09-10 | 2002-02-19 | Ip-First, L.L.C. | System for combining adjacent push/pop stack program instructions into single double push/pop stack microinstuction for execution |
-
2001
- 2001-03-27 AU AU31345/01A patent/AU766816B2/en not_active Ceased
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2231181A (en) * | 1989-04-07 | 1990-11-07 | Intel Corp | Stack method and circuitry |
WO1997035257A1 (en) * | 1996-03-18 | 1997-09-25 | Advanced Micro Devices, Inc. | A data cache configured to store stack data in a stack data storage |
US6349383B1 (en) * | 1998-09-10 | 2002-02-19 | Ip-First, L.L.C. | System for combining adjacent push/pop stack program instructions into single double push/pop stack microinstuction for execution |
Also Published As
Publication number | Publication date |
---|---|
AU3134501A (en) | 2001-10-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20240221290A1 (en) | Processing primitives which have unresolved fragments in a graphics processing system | |
US5778250A (en) | Method and apparatus for dynamically adjusting the number of stages of a multiple stage pipeline | |
DE69233412T2 (en) | Device and computer program product for executing branch instructions | |
US5896542A (en) | System and method for assigning tags to control instruction processing in a superscalar processor | |
US5799163A (en) | Opportunistic operand forwarding to minimize register file read ports | |
US5669012A (en) | Data processor and control circuit for inserting/extracting data to/from an optional byte position of a register | |
JPS6162980A (en) | Picture memory peripheral lsi | |
EP2309382A1 (en) | System with wide operand architecture and method | |
US7370180B2 (en) | Bit field extraction with sign or zero extend | |
US7512290B2 (en) | Image processing apparatus with SIMD-type microprocessor to perform labeling | |
US6542989B2 (en) | Single instruction having op code and stack control field | |
AU766816B2 (en) | Apparatus for stack capable of multiple push or pop operations in one clock cycle | |
US20090154834A1 (en) | Rendering system and data processing method for the same | |
US4812970A (en) | Microprogram control system | |
US7370184B2 (en) | Shifter for alignment with bit formatter gating bits from shifted operand, shifted carry operand and most significant bit | |
EP0487819A2 (en) | Video random access memory with fast, alligned clear and copy | |
US4980853A (en) | Bit blitter with narrow shift register | |
TW388818B (en) | Method and system for single cycle direct execution of floating-point status and control register instructions | |
KR910008947A (en) | Digital filter for digital image data filter | |
US6983297B2 (en) | Shifting an operand left or right while minimizing the number of multiplexor stages | |
US6378057B1 (en) | Data processing apparatus | |
KR20020067003A (en) | System for processing graphic patterns | |
US5912677A (en) | Method for forming a sum in a signal processing system | |
JP4256263B2 (en) | Texture rising method and apparatus | |
JPH0391829A (en) | Bit data transfer circuit |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
DA3 | Amendments made section 104 |
Free format text: THE NATURE OF THE AMENDMENT IS: SUBSTITUTE PATENT REQUEST REGARDING ASSOCIATED DETAILS |
|
FGA | Letters patent sealed or granted (standard patent) |