CN102819819B - Method for fast vertex reading in a GPU - Google Patents

Method for fast vertex reading in a GPU

Info

Publication number
CN102819819B
Authority
CN
China
Prior art keywords
vertex
order
fifo
primitive
control module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210287997.4A
Other languages
Chinese (zh)
Other versions
CN102819819A (en)
Inventor
饶先宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CHANGSHA JINGJIA MICROELECTRONIC Co Ltd
Original Assignee
CHANGSHA JINGJIA MICROELECTRONIC Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CHANGSHA JINGJIA MICROELECTRONIC Co Ltd
Priority to CN201210287997.4A
Publication of CN102819819A
Application granted
Publication of CN102819819B
Legal status: Active
Anticipated expiration

Landscapes

  • Image Generation (AREA)

Abstract

The invention discloses a method for quickly reading primitive vertices in GPU design. It comprises the steps of storing vertex data sequentially, configuring the vertex start address, parsing draw commands, primitive control, reading primitive vertex data, and flushing redundant data. The method makes full use of memory bandwidth, relieves bus pressure, and improves the vertex throughput of the GPU chip.

Description

Method for fast vertex reading in a GPU
Technical field
The present invention relates to the field of GPU design, and in particular to draw-command parsing and primitive vertex fetching in a GPU.
Background art
The organization and reading of vertex data is a major issue in fixed-pipeline GPU implementations; its quality directly affects drawing efficiency. The conventional approach specifies the primitive type, component count, vertex count, and the start address and stride of each component in command words, so each draw command needs multiple (7 or more) command words. This has two drawbacks: (1) The many command words put heavy pressure on the PCI bus. The PCI bus must transmit commands throughout drawing, and because of its frequency limit, the command transfer rate often cannot keep up with the drawing speed. (2) Because each component is fetched by sending a request to DDR based on the start address and stride given in the command words, bursts are small and DDR read latency is correspondingly large. Reading the data of one vertex often takes several read requests, so DDR bandwidth cannot be fully used.
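The second drawback can be illustrated with a rough request count. The following is a minimal sketch, not the patent's own analysis: it assumes the worst case of one small read request per component per vertex for the conventional scheme, and a hypothetical 64-byte burst for the sequential scheme. The function names and the burst size are illustrative assumptions.

```python
import math

FLOAT_BYTES = 4  # each vertex component is a 32-bit float


def strided_requests(num_vertices, num_components):
    """Conventional scheme (assumed worst case): every component has its own
    start address and stride, so each component of each vertex is fetched
    with a separate small read request."""
    return num_vertices * num_components


def sequential_requests(num_vertices, floats_per_vertex, burst_bytes):
    """Sequential scheme: vertices are contiguous in memory, so the whole
    range is covered by back-to-back bursts of burst_bytes each."""
    total_bytes = num_vertices * floats_per_vertex * FLOAT_BYTES
    return math.ceil(total_bytes / burst_bytes)


# 100 triangle vertices: 10 real components, stored as 12 floats with padding.
print(strided_requests(100, 10))         # requests under the assumed worst case
print(sequential_requests(100, 12, 64))  # requests with 64-byte bursts
```

Under these assumptions the sequential layout needs far fewer requests for the same vertices, even though it transfers two padding floats per triangle vertex.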
Summary of the invention
The problem to be solved by the present invention is as follows: to address the shortcomings of the prior art, the invention provides a structure for fast vertex fetching in a GPU. By organizing vertex data sequentially and reading with a larger burst length, the structure makes full use of memory bandwidth and greatly improves vertex-fetch efficiency; it also reduces the number of command words.
The method requires vertex data to be stored in a fixed order. If the current primitive is a line segment, the data of each vertex must be stored contiguously in the order X, Y, Z, W, R, G, B, A (all 32-bit single-precision floating-point values, corresponding respectively to the vertex's horizontal coordinate, vertical coordinate, depth coordinate, homogeneous coordinate, the red, green, and blue color components, and transparency). If the current primitive is a triangle, each vertex must be stored in the order X, Y, Z, W, R, G, B, A, S, T, 0, 0 (all 32-bit single-precision floating-point values, where S and T are the horizontal and vertical texture coordinates; the two zeros pad the data to 128-bit alignment, which suits the wide data path of DDR). At the same time, the CPU configures the start storage address of the vertex data to the primitive control module over the PCI bus (this can be done by writing the corresponding register of the primitive management module; the primitive control module then fetches vertex data continuously from DDR starting at this address). The CPU then sends commands over the PCI bus to the command parsing module, which reads command words through an asynchronous FIFO. If the current command is valid (a primitive draw command or a flush-FIFO command), each component of the command word is decoded to obtain the primitive type and vertex count, and this information is sent to the primitive control module. If rendering parameters need to be changed during drawing (for example, switching the texture address or the transformation matrix), or all draw commands of the current frame have been sent, software must send a flush-FIFO command. After receiving the start address configured by software, the primitive control module sends it to the vertex data read module. Because all vertex data is stored sequentially, read requests can be sent to the DDR controller with a larger BurstLength (burst length, meaning the data of multiple vertices can be returned consecutively). As long as the vertex read FIFO is not full, read requests continue to be issued at incrementing addresses, and the returned vertex data is passed to the primitive control module, which organizes it into the corresponding primitive data as the command requires and sends it to the drawing module. If the current command is a flush-FIFO command, the primitive control module flushes the FIFO that holds the DDR data, ensuring that the next command does not fetch wrong vertices.
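The fixed vertex layouts can be written down directly. The sketch below packs one line-segment vertex and one triangle vertex with Python's `struct` module; the little-endian byte order and the helper names are assumptions for illustration, since the patent only specifies the component order, the 32-bit float type, and the two padding zeros.

```python
import struct

# Byte order is an assumption; the patent only fixes the component order.
LINE_FORMAT = "<8f"       # X, Y, Z, W, R, G, B, A
TRIANGLE_FORMAT = "<12f"  # X, Y, Z, W, R, G, B, A, S, T, 0, 0


def pack_line_vertex(x, y, z, w, r, g, b, a):
    """Pack one line-segment vertex: 8 floats = 32 bytes."""
    return struct.pack(LINE_FORMAT, x, y, z, w, r, g, b, a)


def pack_triangle_vertex(x, y, z, w, r, g, b, a, s, t):
    """Pack one triangle vertex. Two zero floats pad the 40 bytes of real
    data to 48 bytes, a multiple of 16 bytes (128 bits)."""
    return struct.pack(TRIANGLE_FORMAT, x, y, z, w, r, g, b, a, s, t, 0.0, 0.0)


def vertex_address(start_address, index, vertex_bytes):
    """With sequential storage, a vertex address follows from the configured
    start address alone; no per-component address or stride is needed."""
    return start_address + index * vertex_bytes


tri = pack_triangle_vertex(0.0, 1.0, 0.5, 1.0, 1.0, 0.0, 0.0, 1.0, 0.25, 0.75)
print(len(tri), len(tri) % 16)
```

Because every vertex of a given primitive type has the same size, the read module only has to increment the address, which is what allows long bursts.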
The advantages of the invention are: 1. Full use of memory bandwidth: the proposed fast vertex fetch structure can issue memory read requests with a larger BurstLength. 2. Fewer command words: because vertex data is stored sequentially, the per-component start address, stride, and similar information found in typical graphics command words can be omitted, so the number of command words per command drops from 7-8 to 2.
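To make the two-word command concrete, here is a purely hypothetical encoding. The patent states only that a command carries the primitive type and vertex count and that the valid commands are draw and flush-FIFO; the opcode values, bit positions, and all names below are assumptions made for illustration.

```python
from enum import IntEnum


class Opcode(IntEnum):       # hypothetical opcode values
    DRAW = 1
    FLUSH_FIFO = 2


class Primitive(IntEnum):    # hypothetical primitive codes
    LINE = 0
    TRIANGLE = 1


def encode_draw(primitive, vertex_count):
    """Word 0: opcode in bits 31..28, primitive type in bits 3..0.
    Word 1: vertex count. (Assumed layout, not from the patent.)"""
    return [(int(Opcode.DRAW) << 28) | int(primitive), vertex_count]


def encode_flush():
    return [int(Opcode.FLUSH_FIFO) << 28, 0]


def decode(words):
    """Return (opcode, primitive, vertex_count), or None if the command
    is not a valid draw or flush-FIFO command."""
    if len(words) != 2:
        return None
    raw_op = (words[0] >> 28) & 0xF
    if raw_op not in (Opcode.DRAW, Opcode.FLUSH_FIFO):
        return None
    op = Opcode(raw_op)
    if op is Opcode.FLUSH_FIFO:
        return (op, None, 0)
    raw_prim = words[0] & 0xF
    if raw_prim not in (Primitive.LINE, Primitive.TRIANGLE):
        return None
    return (op, Primitive(raw_prim), words[1])


print(decode(encode_draw(Primitive.TRIANGLE, 300)))
```

No start address or stride appears in either word, which is the point of the second advantage: that information is implied by the fixed layout and the start address configured once through a register.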
Brief description of the drawings
Fig. 1 shows the structure for fast vertex fetching in a GPU according to the present invention.
Embodiment
The present invention is described in further detail below with reference to the drawing and a specific embodiment.
Fig. 1 shows the structure for fast vertex reading in a GPU. The CPU configures the start storage address of the primitive vertex data over the PCI bus (all primitive data is stored in a fixed format: the line-segment vertex format is X, Y, Z, W, R, G, B, A, and the triangle vertex format is X, Y, Z, W, R, G, B, A, S, T, 0, 0). The CPU then sends commands over the PCI bus to the command parsing module, which obtains command data by reading the asynchronous FIFO; if a command is valid, each component of the command word is decoded and passed to the primitive control module. Through the vertex data read module, the primitive control module sends requests to the DDR controller with a larger BurstLength, starting from the configured start address. Data returned from DDR is written into a FIFO, and read requests continue to be sent as long as the FIFO is not full. The primitive control module reads the returned data from the FIFO, organizes it according to the format given in the command word, and sends it to the drawing pipeline. If the current command is a flush-FIFO command, the primitive control module sends a flush signal to the FIFO to empty it, ensuring that the next draw does not read wrong vertices.
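The data flow of this embodiment can be sketched as a small software model. This is a simplified illustration under stated assumptions, not the hardware design: DDR is modeled as a Python list of vertices, addresses are list indices, a burst is issued only when the FIFO has room for a whole burst, and the start address is reconfigured by software after a flush. The class and method names are hypothetical.

```python
from collections import deque


class VertexFetcher:
    """Toy model of the primitive control module plus vertex read FIFO."""

    def __init__(self, memory, fifo_depth, burst_length):
        self.memory = memory            # list of vertices standing in for DDR
        self.fifo = deque()             # vertex read FIFO
        self.fifo_depth = fifo_depth
        self.burst_length = burst_length
        self.next_address = 0

    def configure(self, start_address):
        """CPU writes the vertex start address (a list index in this model)."""
        self.next_address = start_address

    def prefetch(self):
        """Issue burst reads at incrementing addresses while the FIFO has
        room for a full burst (a simplification of 'FIFO not full')."""
        while (len(self.fifo) + self.burst_length <= self.fifo_depth
               and self.next_address < len(self.memory)):
            end = self.next_address + self.burst_length
            burst = self.memory[self.next_address:end]
            self.fifo.extend(burst)
            self.next_address += len(burst)

    def draw(self, vertices_per_primitive, vertex_count):
        """Pop vertex_count vertices and group them into primitives."""
        primitives, current = [], []
        for _ in range(vertex_count):
            self.prefetch()
            if not self.fifo:
                break                   # memory exhausted
            current.append(self.fifo.popleft())
            if len(current) == vertices_per_primitive:
                primitives.append(tuple(current))
                current = []
        return primitives

    def flush(self):
        """Flush-FIFO command: discard vertices that were prefetched but
        do not belong to the next draw."""
        self.fifo.clear()


memory = list(range(24))                # vertex i is represented by the number i
fetcher = VertexFetcher(memory, fifo_depth=8, burst_length=4)
fetcher.configure(0)
print(fetcher.draw(3, 6))               # two triangles
fetcher.flush()                         # drop stale prefetched vertices
fetcher.configure(12)
print(fetcher.draw(2, 4))               # two line segments from the new address
```

Without the `flush()` call, the second draw would consume vertices left in the FIFO by the earlier prefetch, which is the "wrong vertex" case the flush-FIFO command is meant to prevent.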

Claims (1)

  1. A method for fast vertex reading in a GPU, comprising the following steps:
    Step 1: vertex data is stored sequentially. For a line segment, each vertex is stored in the order vertex horizontal coordinate X, vertex vertical coordinate Y, vertex depth coordinate Z, vertex homogeneous coordinate W, red color component R, green color component G, blue color component B, color transparency A. For a triangle, each vertex is stored in the order vertex horizontal coordinate X, vertex vertical coordinate Y, vertex depth coordinate Z, vertex homogeneous coordinate W, red color component R, green color component G, blue color component B, color transparency A, texture horizontal coordinate S, texture vertical coordinate T, 0, 0. At the same time, the CPU configures the start storage address of the vertex data to the primitive control module over the PCI bus;
    Step 2: the CPU sends commands over the PCI bus to the command parsing module, which reads command words through an asynchronous FIFO. If the current command is valid, that is, a draw command or a flush-FIFO command, each component of the command word is decoded to obtain the primitive type and vertex count, which are sent to the primitive control module. If rendering parameters need to be changed during drawing, or all draw commands of the current frame have been sent, the command parsing module needs to send a flush-FIFO command;
    Step 3: based on the vertex start address configured over the PCI bus in step 1, the primitive control module sends the start address to the vertex data read module. Because all vertex data is stored sequentially, read requests can be sent to the DDR controller with a larger BurstLength, that is, burst length. As long as the vertex read FIFO is not full, read requests continue to be issued at incrementing addresses, and the returned vertex data is passed to the primitive control module, which organizes it into the corresponding primitive data as the command requires and sends it to the drawing module. If the current command is a flush-FIFO command, the primitive control module flushes the FIFO that holds the DDR data, ensuring that the next command does not fetch wrong vertices.
CN201210287997.4A 2012-08-14 2012-08-14 Method for fast vertex reading in a GPU Active CN102819819B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210287997.4A CN102819819B (en) 2012-08-14 2012-08-14 Method for fast vertex reading in a GPU

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210287997.4A CN102819819B (en) 2012-08-14 2012-08-14 Method for fast vertex reading in a GPU

Publications (2)

Publication Number Publication Date
CN102819819A (en) 2012-12-12
CN102819819B (en) 2015-09-16

Family

ID=47303926

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210287997.4A Active CN102819819B (en) 2012-08-14 2012-08-14 Method for fast vertex reading in a GPU

Country Status (1)

Country Link
CN (1) CN102819819B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559078B (en) * 2013-11-08 2017-04-26 华为技术有限公司 GPU (Graphics Processing Unit) virtualization realization method as well as vertex data caching method and related device
CN108520489B (en) * 2018-04-12 2022-12-06 长沙景美集成电路设计有限公司 Device and method for realizing command analysis and vertex acquisition parallel in GPU
CN111915475B (en) * 2020-07-10 2024-04-05 长沙景嘉微电子股份有限公司 Processing method of drawing command, GPU, host, terminal and medium
CN112581350B (en) * 2020-12-05 2024-08-23 西安翔腾微电子科技有限公司 Drawing command synchronization method based on continuous primitives

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6018353A (en) * 1995-08-04 2000-01-25 Sun Microsystems, Inc. Three-dimensional graphics accelerator with an improved vertex buffer for more efficient vertex processing
CN1702692A (en) * 2004-05-03 2005-11-30 微软公司 System and method for providing an enhanced graphics pipeline
CN102096897A (en) * 2011-03-17 2011-06-15 长沙景嘉微电子有限公司 Realization of tile cache strategy in graphics processing unit (GPU) based on tile based rendering

Also Published As

Publication number Publication date
CN102819819A (en) 2012-12-12

Similar Documents

Publication Publication Date Title
CN105224482B (en) A kind of FPGA accelerator cards high-speed memory system
CN100498806C (en) Device and method for outputting signal of emulation infrared detector
CN101901200B (en) Method for realizing double advanced high-performance bus (AHB) Master interface-based on-chip direct memory access (DMA) controller
CN109271335A (en) A kind of FPGA implementation method of multi-channel data source DDR caching
CN102819819B (en) Method for fast vertex reading in a GPU
CN104284079B (en) Space remote sensing digital image recognition device
CN101882302B (en) Motion blur image restoration system based on multi-core
CN106951388A (en) A kind of DMA data transfer method and system based on PCIe
CN102314400B (en) Method and device for dispersing converged DMA (Direct Memory Access)
CN108279927A (en) The multichannel command control method and system, controller of adjustable instruction priority
CN108632624A (en) Image processing method, device, terminal device and readable storage medium storing program for executing
CN104317770A (en) Data storage structure and data access method for multiple core processing system
CN105208275A (en) System supporting real-time processing inside streaming data piece and design method
CN105786741B (en) SOC high-speed low-power-consumption bus and conversion method
CN104021099B (en) A kind of method and dma controller of control data transmission
CN201927324U (en) Color liquid crystal screen display control device based on SPI (single program initiation) serial or parallel interface
CN104952088A (en) Method for compressing and decompressing display data
CN102135946A (en) Data processing method and device
CN106062814B (en) Improved banked memory access efficiency by a graphics processor
CN102388359A (en) Method and device for remaining signal sequence
CN101216931A (en) 3D graphical display superposition device based on OpenGL
CN105389282A (en) Communication method of processor and ARINC429 bus
WO2022089504A1 (en) Data processing method and related apparatus
CN102750244B (en) Transmitting device and transmitting method of graded buffer direct memory access (DMA)
CN107608927A (en) A kind of design method for supporting Full Featured lpc bus host port

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C53 Correction of patent for invention or patent application
CB03 Change of inventor or designer information

Inventor after: Rao Xianhong

Inventor before: Jiao Yong

COR Change of bibliographic data

Free format text: CORRECT: INVENTOR; FROM: JIAO YONG TO: RAO XIANHONG

C14 Grant of patent or utility model
GR01 Patent grant