CN102819819B - Method for implementing fast vertex reading in a GPU - Google Patents
Method for implementing fast vertex reading in a GPU
- Publication number
- CN102819819B (application number CN201210287997.4A)
- Authority
- CN
- China
- Prior art keywords
- vertex
- command
- fifo
- primitive
- control module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Image Generation (AREA)
Abstract
The invention discloses a method for quickly reading primitive vertices in a GPU design. It comprises the steps of storing vertex data sequentially, configuring the vertex start address, parsing drawing commands, primitive control, reading primitive vertex data, and flushing redundant data. The method makes full use of memory bandwidth, relieves pressure on the bus, and improves the vertex throughput of the GPU chip.
Description
Technical field
The present invention relates mainly to the field of GPU design, and in particular to drawing command parsing and primitive vertex fetching in a GPU.
Background technology
The organization and reading of vertex data is a major issue in a fixed-pipeline GPU implementation, and its quality directly affects drawing efficiency. The traditional approach specifies the primitive type, the number of components, the number of vertices, and the start address and stride of each component in the command words, so each drawing command often needs several (7 or more) command words to describe it. This has two drawbacks: (1) the large number of command words puts heavy pressure on the PCI bus: the PCI bus must transfer commands continuously during drawing and, because of its frequency limit, the command transfer rate often cannot keep up with the drawing speed; (2) because each component must be fetched by sending a request to DDR according to the start address and stride given in the command words, bursts are small and the DDR read latency is correspondingly large; reading the data of one vertex often requires several read requests, so the DDR bandwidth cannot be fully used.
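Drawback (2) can be illustrated with a small sketch of the traditional per-component addressing. This is not taken from the patent: the `ComponentArray` type, the 4 KiB spacing between component arrays, and the 16-byte DDR beat are assumptions chosen only to show why one vertex can cost one read request per component.

```python
from dataclasses import dataclass


@dataclass
class ComponentArray:
    """One vertex component as a traditional command word would describe it."""
    name: str
    start_address: int  # byte address of element 0 (hypothetical value)
    stride: int         # bytes between consecutive elements


def vertex_read_addresses(components, vertex_index):
    """Return the byte address of every component of one vertex."""
    return [c.start_address + vertex_index * c.stride for c in components]


# Hypothetical layout: eight separate arrays of 32-bit floats, 4 KiB apart.
COMPONENTS = [ComponentArray(n, 0x1000 * i, 4) for i, n in enumerate("XYZWRGBA")]

addresses = vertex_read_addresses(COMPONENTS, vertex_index=5)

# Count how many distinct 16-byte beats (assumed beat size) the vertex touches.
# Every component falls in a different beat, so each needs its own small read.
requests_per_vertex = len({a // 16 for a in addresses})
print(requests_per_vertex)
```

Under these assumptions every component of a vertex sits in a different region of memory, so no single burst can return a whole vertex.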
Summary of the invention
The problem to be solved by the present invention is as follows: in view of the shortcomings of the prior art, the invention provides a structure for fast vertex fetching in a GPU. By organizing vertex data sequentially and reading with a larger burst length, the structure makes full use of memory bandwidth and greatly improves vertex-fetch efficiency; it also reduces the number of command words.
The method of the present invention requires vertex data to be stored in a fixed order. If the current primitive is a line segment, the data of each vertex must be stored contiguously in the order X, Y, Z, W, R, G, B, A (all 32-bit single-precision floating-point values, corresponding respectively to the vertex's horizontal coordinate, vertical coordinate, depth coordinate, homogeneous coordinate, and the red, green, blue, and transparency color components). If the current primitive is a triangle, the data of each vertex must be stored in the order X, Y, Z, W, R, G, B, A, S, T, 0, 0, where X through T are all 32-bit single-precision floating-point values (horizontal coordinate, vertical coordinate, depth coordinate, homogeneous coordinate, red, green, blue, transparency, texture horizontal coordinate, texture vertical coordinate) and the two trailing zeros pad the data to 128-bit alignment, which suits the wide data bus of DDR. At the same time, the CPU configures the start storage address of the vertex data in the primitive control module over the PCI bus (this can be done by configuring the corresponding register of the primitive management module; the primitive control module then fetches vertex data contiguously from DDR beginning at this address).
The CPU then sends commands to the command parsing module over the PCI bus. The command parsing module reads command words through an asynchronous FIFO. If the current command is a valid command (a primitive drawing command or a flush-FIFO command), it decodes each field of the command word to obtain the primitive type and vertex count and passes this information to the primitive control module. If drawing parameters need to be modified during drawing (for example, a texture address switch or a transformation matrix switch), or once the drawing commands of the current frame have all been sent, software must send a flush-FIFO command.
After receiving the start address configured by software, the primitive control module passes it to the vertex data read module. Because all vertex data are stored sequentially, read requests can be issued to the DDR controller with a larger BurstLength (burst count, meaning that the data of several vertices can be returned back to back). As long as the vertex read FIFO is not full, read requests continue to be issued at incrementing addresses, and the returned vertex data are passed to the primitive control module, which assembles them into the corresponding primitive data as the command requires and sends them to the drawing module. If the command currently received is a flush-FIFO command, the primitive control module flushes the FIFO holding the DDR data, ensuring that the next command sent does not fetch wrong vertices.
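The two fixed vertex formats can be sketched as packed records. The field order and the two zero padding words come from the description above; the little-endian byte order, the function names, and the address helper are assumptions made for illustration only.

```python
import struct

LINE_FIELDS = ("X", "Y", "Z", "W", "R", "G", "B", "A")
TRIANGLE_FIELDS = LINE_FIELDS + ("S", "T")


def pack_line_vertex(x, y, z, w, r, g, b, a):
    """Pack one line-segment vertex: 8 single-precision floats, 32 bytes."""
    return struct.pack("<8f", x, y, z, w, r, g, b, a)


def pack_triangle_vertex(x, y, z, w, r, g, b, a, s, t):
    """Pack one triangle vertex: 10 floats plus two zero words, 48 bytes."""
    return struct.pack("<12f", x, y, z, w, r, g, b, a, s, t, 0.0, 0.0)


def vertex_address(start_address, vertex_index, vertex_size):
    """Byte address of a vertex when vertices are stored back to back."""
    return start_address + vertex_index * vertex_size


line = pack_line_vertex(0.0, 1.0, 0.5, 1.0, 1.0, 0.0, 0.0, 1.0)
tri = pack_triangle_vertex(0.0, 1.0, 0.5, 1.0, 1.0, 0.0, 0.0, 1.0, 0.25, 0.75)

# Both record sizes are whole multiples of 128 bits (16 bytes).
print(len(line), len(tri), len(line) % 16, len(tri) % 16)
```

Without the two padding words a triangle vertex would occupy 40 bytes, which is not a multiple of 16 bytes; with them, every vertex starts on a 128-bit boundary whenever the start address does.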
The advantages of the present invention are: 1. Full use of memory bandwidth: the proposed fast vertex-fetch structure can issue memory read requests with a larger BurstLength, making full use of memory bandwidth. 2. Fewer command words: because vertex data are stored in order, information such as each component's start address and stride, which typical graphics command words carry, can be omitted, so the number of command words per command can be reduced from 7 to 8 down to 2.
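A back-of-the-envelope sketch of both advantages follows. The command-word counts (7 and 2) come from the text; the 256-byte burst size, the worst case of one read request per component in the traditional scheme, and the workload figures are assumptions, so the numbers indicate scale only and are not measured results.

```python
import math


def command_words(num_draw_commands, words_per_command):
    """Total command words sent over the bus for a batch of draw commands."""
    return num_draw_commands * words_per_command


def read_requests_sequential(num_vertices, vertex_bytes, burst_bytes):
    """Read requests needed when vertices are contiguous and read in bursts."""
    return math.ceil(num_vertices * vertex_bytes / burst_bytes)


def read_requests_per_component(num_vertices, num_components):
    """Assumed worst case for the traditional scheme: one request per component."""
    return num_vertices * num_components


# Hypothetical workload: 1000 draw commands, 300 triangle vertices (48 bytes each).
saved_words = command_words(1000, 7) - command_words(1000, 2)
burst_requests = read_requests_sequential(300, 48, burst_bytes=256)
scattered_requests = read_requests_per_component(300, 10)
print(saved_words, burst_requests, scattered_requests)
```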
Brief description of the drawings
Fig. 1 shows the structure implementing fast vertex fetching in a GPU according to the present invention.
Embodiment
The present invention is described in further detail below with reference to the drawing and a specific embodiment.
Fig. 1 shows the structure implementing fast vertex reading in a GPU. The CPU configures the start storage address of the primitive vertex data over the PCI bus (all primitive data are stored in a fixed format: the line-segment vertex format is X, Y, Z, W, R, G, B, A, and the triangle vertex format is X, Y, Z, W, R, G, B, A, S, T, 0, 0). The CPU then sends commands to the command parsing module over the PCI bus. The command parsing module obtains command data by reading the asynchronous FIFO and, if the command is valid, decodes each field of the command word and passes the result to the primitive control module. Through the vertex data read module, the primitive control module sends requests to the DDR controller with a larger BurstLength, starting from the configured start address; the data returned by DDR are written into the FIFO, and read requests continue to be issued as long as the FIFO is not full. The primitive control module reads the returned data from the FIFO, organizes them according to the format given by the command word, and sends them to the drawing pipeline. If the command currently received is a flush-FIFO command, the primitive control module sends a flush signal to the FIFO to empty it, ensuring that the next draw does not read wrong vertices.
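The data flow of this embodiment can be modelled with a toy software simulation. It is a sketch under stated assumptions, not the hardware design: the class and method names, the FIFO depth of 64 words, the burst of 16 words, and the use of word indices instead of byte addresses are all hypothetical, and the model assumes memory always holds enough data for the requested vertices.

```python
from collections import deque

WORDS_PER_VERTEX = {"line": 8, "triangle": 12}  # 32-bit words per vertex format
VERTICES_PER_PRIMITIVE = {"line": 2, "triangle": 3}


class VertexFetcher:
    """Toy model of the vertex data read module plus primitive control module."""

    def __init__(self, memory, fifo_depth=64, burst_words=16):
        self.memory = memory            # list of words standing in for DDR
        self.fifo = deque()
        self.fifo_depth = fifo_depth    # hypothetical FIFO capacity in words
        self.burst_words = burst_words  # hypothetical burst size in words
        self.next_word = 0

    def configure(self, start_word):
        """The CPU writes the start address of the vertex data."""
        self.next_word = start_word

    def prefetch(self):
        """Issue sequential burst reads while the FIFO has room for one burst."""
        while (len(self.fifo) + self.burst_words <= self.fifo_depth
               and self.next_word < len(self.memory)):
            burst = self.memory[self.next_word:self.next_word + self.burst_words]
            self.fifo.extend(burst)
            self.next_word += len(burst)

    def draw(self, primitive, vertex_count):
        """Pop vertex_count vertices from the FIFO and group them into primitives."""
        size = WORDS_PER_VERTEX[primitive]
        vertices = []
        for _ in range(vertex_count):
            self.prefetch()
            vertices.append(tuple(self.fifo.popleft() for _ in range(size)))
        n = VERTICES_PER_PRIMITIVE[primitive]
        return [vertices[i:i + n] for i in range(0, len(vertices) - n + 1, n)]

    def flush(self):
        """Flush-FIFO command: discard prefetched words no draw has consumed."""
        self.fifo.clear()


memory = list(range(200))              # word i simply holds the value i
fetcher = VertexFetcher(memory)
fetcher.configure(0)
triangles = fetcher.draw("triangle", 3)
stale_words = len(fetcher.fifo)        # prefetched beyond the drawn triangle
fetcher.flush()
fetcher.configure(100)
lines = fetcher.draw("line", 2)
print(stale_words, lines[0][0][0])
```

In this model the prefetcher runs ahead of the draw command, so words beyond the last drawn vertex are already in the FIFO when the draw finishes. If the flush were skipped before switching to the data at word 100, the line draw would consume those leftover words instead, which is the wrong-vertex case the flush command guards against.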
Claims (1)
- 1. A method for implementing fast vertex reading in a GPU, comprising the following steps: Step 1: vertex data are stored in order; for a line segment, each vertex must be stored in the order vertex horizontal coordinate X, vertex vertical coordinate Y, vertex depth coordinate Z, vertex homogeneous coordinate W, color component red R, color component green G, color component blue B, color transparency A; for a triangle, each vertex must be stored in the order vertex horizontal coordinate X, vertex vertical coordinate Y, vertex depth coordinate Z, vertex homogeneous coordinate W, color component red R, color component green G, color component blue B, color transparency A, texture horizontal coordinate S, texture vertical coordinate T, 0, 0; at the same time, the CPU configures the start storage address of the vertex data in the primitive control module over the PCI bus; Step 2: the CPU sends commands to the command parsing module over the PCI bus; the command parsing module reads command words through an asynchronous FIFO; if the current command is a valid command, i.e. a drawing command or a flush-FIFO command, each field of the command word is decoded to obtain the primitive type and vertex count, which are sent to the primitive control module; if drawing parameters need to be modified during drawing, or the drawing commands of the current frame have all been sent, the command parsing module needs to send a flush-FIFO command; Step 3: according to the vertex start address configured over the PCI bus in Step 1, the primitive control module sends the start address to the vertex data read module; because all vertex data are stored sequentially, read requests can be sent to the DDR controller with a larger BurstLength, i.e. burst count; as long as the vertex read FIFO is not full, read requests continue to be sent at incrementing addresses, and the vertex data obtained are sent to the primitive control module; the primitive control module organizes these data into the corresponding primitive data as the command requires and sends them to the drawing module; if the command currently received is a flush-FIFO command, the primitive control module flushes the FIFO holding the DDR data according to this command, ensuring that the next command sent does not fetch wrong vertices.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210287997.4A CN102819819B (en) | 2012-08-14 | 2012-08-14 | Method for implementing fast vertex reading in a GPU |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210287997.4A CN102819819B (en) | 2012-08-14 | 2012-08-14 | Method for implementing fast vertex reading in a GPU |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102819819A CN102819819A (en) | 2012-12-12 |
CN102819819B true CN102819819B (en) | 2015-09-16 |
Family
ID=47303926
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210287997.4A Active CN102819819B (en) | Method for implementing fast vertex reading in a GPU | 2012-08-14 | 2012-08-14 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102819819B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103559078B (en) * | 2013-11-08 | 2017-04-26 | 华为技术有限公司 | GPU (Graphics Processing Unit) virtualization realization method as well as vertex data caching method and related device |
CN108520489B (en) * | 2018-04-12 | 2022-12-06 | 长沙景美集成电路设计有限公司 | Device and method for realizing command analysis and vertex acquisition parallel in GPU |
CN111915475B (en) * | 2020-07-10 | 2024-04-05 | 长沙景嘉微电子股份有限公司 | Processing method of drawing command, GPU, host, terminal and medium |
CN112581350B (en) * | 2020-12-05 | 2024-08-23 | 西安翔腾微电子科技有限公司 | Drawing command synchronization method based on continuous primitives |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6018353A (en) * | 1995-08-04 | 2000-01-25 | Sun Microsystems, Inc. | Three-dimensional graphics accelerator with an improved vertex buffer for more efficient vertex processing |
CN1702692A (en) * | 2004-05-03 | 2005-11-30 | 微软公司 | System and method for providing an enhanced graphics pipeline |
CN102096897A (en) * | 2011-03-17 | 2011-06-15 | 长沙景嘉微电子有限公司 | Realization of tile cache strategy in graphics processing unit (GPU) based on tile based rendering |
Also Published As
Publication number | Publication date |
---|---|
CN102819819A (en) | 2012-12-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105224482B (en) | A kind of FPGA accelerator cards high-speed memory system | |
CN100498806C (en) | Device and method for outputting signal of emulation infrared detector | |
CN101901200B (en) | Method for realizing double advanced high-performance bus (AHB) Master interface-based on-chip direct memory access (DMA) controller | |
CN109271335A (en) | A kind of FPGA implementation method of multi-channel data source DDR caching | |
CN102819819B (en) | Method for implementing fast vertex reading in a GPU | |
CN104284079B (en) | Space remote sensing digital image recognition device | |
CN101882302B (en) | Motion blur image restoration system based on multi-core | |
CN106951388A (en) | A kind of DMA data transfer method and system based on PCIe | |
CN102314400B (en) | Method and device for dispersing converged DMA (Direct Memory Access) | |
CN108279927A (en) | The multichannel command control method and system, controller of adjustable instruction priority | |
CN108632624A (en) | Image processing method, device, terminal device and readable storage medium storing program for executing | |
CN104317770A (en) | Data storage structure and data access method for multiple core processing system | |
CN105208275A (en) | System supporting real-time processing inside streaming data piece and design method | |
CN105786741B (en) | SOC high-speed low-power-consumption bus and conversion method | |
CN104021099B (en) | A kind of method and dma controller of control data transmission | |
CN201927324U (en) | Color liquid crystal screen display control device based on SPI (single program initiation) serial or parallel interface | |
CN104952088A (en) | Method for compressing and decompressing display data | |
CN102135946A (en) | Data processing method and device | |
CN106062814B (en) | Improved banked memory access efficiency by a graphics processor | |
CN102388359A (en) | Method and device for remaining signal sequence | |
CN101216931A (en) | 3D graphical display superposition device based on OpenGL | |
CN105389282A (en) | Communication method of processor and ARINC429 bus | |
WO2022089504A1 (en) | Data processing method and related apparatus | |
CN102750244B (en) | Transmitting device and transmitting method of graded buffer direct memory access (DMA) | |
CN107608927A (en) | A kind of design method for supporting Full Featured lpc bus host port |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C53 | Correction of patent for invention or patent application | ||
CB03 | Change of inventor or designer information | Inventor after: Rao Xianhong; Inventor before: Jiao Yong |
COR | Change of bibliographic data | Free format text: CORRECT: INVENTOR; FROM: JIAO YONG TO: RAO XIANHONG |
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |