CN102819819B

CN102819819B - Method for implementing fast vertex fetching in a GPU

Info

Publication number: CN102819819B
Application number: CN201210287997.4A
Authority: CN
Inventors: 饶先宏
Original assignee: CHANGSHA JINGJIA MICROELECTRONIC Co Ltd
Current assignee: CHANGSHA JINGJIA MICROELECTRONIC Co Ltd
Priority date: 2012-08-14
Filing date: 2012-08-14
Publication date: 2015-09-16
Anticipated expiration: 2032-08-14
Also published as: CN102819819A

Abstract

The invention discloses a method for quickly reading primitive vertices in a GPU design. It comprises the steps of storing vertex data sequentially, configuring the vertex start address, parsing drawing commands, primitive control, reading primitive vertex data, and flushing redundant data. The method makes full use of memory bandwidth, relieves bus pressure, and improves the vertex throughput of the GPU chip.

Description

Method for implementing fast vertex fetching in a GPU

Technical field

The present invention relates mainly to the field of GPU design, and in particular to drawing command parsing and primitive vertex fetching in a GPU.

Background art

The organization and reading of vertex data is an important issue in a fixed-function-pipeline GPU implementation, and its quality directly affects drawing efficiency. The traditional approach specifies the primitive type, the number of components, the number of vertices, and the start address and stride of each component in command words, so a single drawing command often needs several (7 or more) command words. This approach has two drawbacks: (1) because there are many command words, the PCI bus is put under heavy pressure: it must transmit commands continuously during drawing, and because of its frequency limit the command transfer rate often cannot keep up with the drawing speed; (2) because each component must be fetched by sending a request to DDR according to the start address and stride given in the command words, bursts are small and the DDR read latency is correspondingly large; reading the data of one vertex often takes several read requests, so the DDR bandwidth cannot be fully used.
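Drawback (2) can be illustrated with a rough request count. The sketch below is only a simplified model: it assumes one read request per component per vertex in the traditional scheme, and the 64-byte burst size and the function names are illustrative assumptions, not values taken from the patent.

```python
import math

BYTES_PER_COMPONENT = 4  # one 32-bit single-precision float


def requests_per_component_arrays(num_vertices, num_components):
    """Traditional scheme (simplified assumption): every component has its own
    start address and stride, so each component of each vertex is fetched
    with a separate small read request."""
    return num_vertices * num_components


def requests_sequential(num_vertices, num_components, burst_bytes):
    """Sequential scheme: all vertex data is contiguous, so it can be read
    with long bursts of burst_bytes each (burst_bytes is an assumed value)."""
    total_bytes = num_vertices * num_components * BYTES_PER_COMPONENT
    return math.ceil(total_bytes / burst_bytes)


# Three line-segment vertices with 8 components each.
scattered = requests_per_component_arrays(3, 8)
contiguous = requests_sequential(3, 8, burst_bytes=64)
```

Under these assumptions the scattered layout needs 24 requests for 96 bytes, while the contiguous layout needs 2.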

Summary of the invention

The problem to be solved by the present invention is as follows: in view of the shortcomings of the prior art, the invention provides an implementation structure for fast vertex fetching in a GPU. By organizing vertex data sequentially and reading it with a larger burst length, the structure makes full use of memory bandwidth and greatly improves vertex fetching efficiency; the method also reduces the number of command words.

The method of the present invention requires vertex data to be stored in a fixed order. If the current primitive is a line segment, the data of each vertex of the line segment must be stored contiguously in the order X, Y, Z, W, R, G, B, A (each a 32-bit single-precision floating-point value, corresponding respectively to the vertex's horizontal coordinate, vertical coordinate, depth coordinate, homogeneous coordinate coefficient, and the red, green, blue, and transparency color components). If the current primitive is a triangle, the data of each vertex of the triangle must be stored in the order X, Y, Z, W, R, G, B, A, S, T, 0, 0 (the first ten are 32-bit single-precision floating-point values, corresponding respectively to the vertex's horizontal coordinate, vertical coordinate, depth coordinate, homogeneous coordinate coefficient, the red, green, blue, and transparency color components, the texture horizontal coordinate, and the texture vertical coordinate; the two trailing zeros pad the data to 128-bit alignment, which suits the wide data path of DDR).

At the same time, the CPU configures the start storage address of the vertex data in the primitive control module over the PCI bus (this can be done by writing the corresponding register of the primitive control module, which then fetches vertex data continuously from DDR beginning at this address). The CPU then sends commands over the PCI bus to the command parsing module, which reads command words through an asynchronous FIFO. If the current command is a valid command (a primitive drawing command or a flush-FIFO command), each field of the command word is decoded to obtain the primitive type and the vertex count, and this information is sent to the primitive control module. If rendering parameters need to be changed during drawing (for example, switching the texture address or the transformation matrix), or the drawing commands of the current frame have all been sent, software must send a flush-FIFO command.

After receiving the software-configured start address, the primitive control module sends it to the vertex data read module. Because all vertex data is stored sequentially, read requests can be sent to the DDR controller with a larger BurstLength (the number of bursts, that is, the data of several vertices can be returned back to back). As long as the vertex read FIFO is not full, read requests continue to be issued at increasing addresses, and the vertex data obtained is passed to the primitive control module. The primitive control module organizes this data into the corresponding primitive data as the command requires and sends it to the drawing module. If the command currently received is a flush-FIFO command, the primitive control module flushes the FIFO that holds the data fetched from DDR, ensuring that the next command sent does not fetch wrong vertices.
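The fixed vertex layout described above can be sketched in software as follows. This is a minimal illustration, not the hardware implementation: the little-endian byte order, the function names, and the example start address are assumptions; only the component order, the 32-bit float width, and the two zero pads come from the description.

```python
import struct

# Component order from the description; byte order ("<") is an assumption.
LINE_FORMAT = "<8f"       # X, Y, Z, W, R, G, B, A
TRIANGLE_FORMAT = "<12f"  # X, Y, Z, W, R, G, B, A, S, T, 0, 0


def pack_line_vertex(x, y, z, w, r, g, b, a):
    """Pack one line-segment vertex as 8 contiguous 32-bit floats."""
    return struct.pack(LINE_FORMAT, x, y, z, w, r, g, b, a)


def pack_triangle_vertex(x, y, z, w, r, g, b, a, s, t):
    """Pack one triangle vertex as 10 floats plus two zero pads,
    so that each vertex occupies a multiple of 128 bits."""
    return struct.pack(TRIANGLE_FORMAT, x, y, z, w, r, g, b, a, s, t, 0.0, 0.0)


def vertex_address(start_address, index, vertex_bytes):
    """Byte address of vertex number index when vertices are stored
    back to back from start_address (hypothetical helper)."""
    return start_address + index * vertex_bytes


line = pack_line_vertex(0.0, 1.0, 0.5, 1.0, 1.0, 0.0, 0.0, 1.0)
tri = pack_triangle_vertex(0.0, 1.0, 0.5, 1.0, 1.0, 0.0, 0.0, 1.0, 0.25, 0.75)
```

A line-segment vertex occupies 32 bytes (256 bits) and a padded triangle vertex occupies 48 bytes (384 bits), so both are whole multiples of 128 bits, and the address of any vertex follows from the start address alone, with no per-component start address or stride.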

The advantages of the present invention are: 1. Full use of memory bandwidth: the proposed fast vertex fetching structure can send memory read requests with a larger BurstLength, making full use of memory bandwidth. 2. Fewer command words: because vertex data is stored sequentially, information such as each component's start address and stride found in typical graphics command words can be omitted, and the number of command words per command can be reduced from 7 to 8 down to 2.
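The second advantage can be expressed as a simple ratio. The helper below is a hypothetical illustration; it uses only the command word counts stated above (7 to 8 before, 2 after).

```python
def command_word_savings(old_words, new_words=2):
    """Fraction of command words per command that no longer
    has to be transferred over the bus."""
    return (old_words - new_words) / old_words


savings_from_7 = command_word_savings(7)  # 5/7 of the words are saved
savings_from_8 = command_word_savings(8)  # 3/4 of the words are saved
```

In other words, roughly 71% to 75% of the command words for each command are removed from the bus traffic.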

Brief description of the drawings

Fig. 1 shows the implementation structure for fast vertex fetching in a GPU according to the present invention.

Detailed description of the embodiments

The present invention is described in further detail below with reference to the drawing and a specific embodiment.

Fig. 1 shows an implementation structure for fast vertex fetching in a GPU. The CPU configures the start storage address of the primitive vertex data over the PCI bus (all primitive data is stored in a fixed format: the line-segment vertex format is X, Y, Z, W, R, G, B, A, and the triangle vertex format is X, Y, Z, W, R, G, B, A, S, T, 0, 0). The CPU then sends commands over the PCI bus to the command parsing module, which obtains the command data by reading the asynchronous FIFO; if a command is valid, each field of the command word is decoded and passed to the primitive control module. The primitive control module, through the vertex data read module, sends requests to the DDR controller with a larger BurstLength starting from the configured start address; the data returned by DDR is written into a FIFO, and read requests continue to be issued as long as the FIFO is not full. The primitive control module reads the returned data from the FIFO, organizes it according to the format given in the command word, and sends it to the drawing pipeline. If the command currently obtained is a flush-FIFO command, the primitive control module sends a flush signal to the FIFO and the data in the FIFO is cleared, ensuring that the next draw does not read wrong vertices.
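The interaction between prefetching and the flush-FIFO command in this embodiment can be sketched with a small software model. This is a toy illustration under stated assumptions, not the hardware design: the class name, the FIFO depth, the burst size, and the use of a Python list as a stand-in for DDR are all hypothetical.

```python
from collections import deque


class VertexFetcher:
    """Toy model of the primitive control module and its vertex FIFO."""

    def __init__(self, memory, vertex_words, fifo_depth=4, burst_vertices=4):
        self.memory = memory                  # list of words standing in for DDR
        self.vertex_words = vertex_words      # 8 for line segments, 12 for triangles
        self.fifo_depth = fifo_depth          # assumed FIFO capacity, in vertices
        self.burst_vertices = burst_vertices  # assumed vertices returned per burst
        self.fifo = deque()
        self.next_address = 0

    def configure(self, start_address):
        """CPU writes the vertex start address (a word index in this model)."""
        self.next_address = start_address

    def prefetch(self):
        """Issue one burst read if the FIFO has room; addresses only increase."""
        room = self.fifo_depth - len(self.fifo)
        for _ in range(min(room, self.burst_vertices)):
            end = self.next_address + self.vertex_words
            if end > len(self.memory):
                break
            self.fifo.append(tuple(self.memory[self.next_address:end]))
            self.next_address = end

    def draw(self, num_vertices):
        """Drawing command: pop num_vertices vertices, refilling as needed."""
        out = []
        while len(out) < num_vertices:
            if not self.fifo:
                self.prefetch()
                if not self.fifo:
                    raise RuntimeError("no more vertex data")
            out.append(self.fifo.popleft())
        return out

    def flush(self):
        """Flush-FIFO command: discard vertices that were prefetched but not used."""
        self.fifo.clear()


# Six line-segment vertices; word i of the fake DDR simply holds the value i.
ddr = list(range(6 * 8))
fetcher = VertexFetcher(ddr, vertex_words=8)
fetcher.configure(0)
first = fetcher.draw(2)            # the burst also prefetched vertices 2 and 3
stale = len(fetcher.fifo)          # prefetched vertices still waiting in the FIFO
fetcher.flush()                    # drop them before switching vertex data
fetcher.configure(40)              # new start address: vertex 5
second = fetcher.draw(1)
```

In this model the first draw leaves two prefetched vertices in the FIFO. Without the flush, the draw after reconfiguration would return one of those stale vertices instead of the vertex at the new start address, which is the situation the flush-FIFO command guards against.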

Claims

1. A method for implementing fast vertex fetching in a GPU, comprising the following steps:

Step 1: vertex data is stored sequentially. For a line segment, each vertex must be stored in the order vertex horizontal coordinate X, vertex vertical coordinate Y, vertex depth coordinate Z, vertex homogeneous coordinate coefficient W, red color component R, green color component G, blue color component B, color transparency A. For a triangle, each vertex must be stored in the order vertex horizontal coordinate X, vertex vertical coordinate Y, vertex depth coordinate Z, vertex homogeneous coordinate coefficient W, red color component R, green color component G, blue color component B, color transparency A, texture horizontal coordinate S, texture vertical coordinate T, 0, 0. At the same time, the CPU configures the start storage address of the vertex data in the primitive control module over the PCI bus;

Step 2: the CPU sends commands over the PCI bus to the command parsing module, and the command parsing module reads command words through an asynchronous FIFO. If the current command is a valid command, that is, a drawing command or a flush-FIFO command, each field of the command word is decoded to obtain the primitive type and the vertex count, which are sent to the primitive control module. If rendering parameters need to be changed during drawing, or the drawing commands of the current frame have all been sent, the command parsing module must send a flush-FIFO command;

Step 3: according to the vertex start address configured over the PCI bus in step 1, the primitive control module sends the start address to the vertex data read module. Because all vertex data is stored sequentially, read requests can be sent to the DDR controller with a larger BurstLength, that is, number of bursts. As long as the vertex read FIFO is not full, read requests continue to be issued at increasing addresses, and the vertex data obtained is passed to the primitive control module. The primitive control module organizes this data into the corresponding primitive data as the command requires and sends it to the drawing module. If the command currently received is a flush-FIFO command, the primitive control module flushes the FIFO that holds the data fetched from DDR, ensuring that the next command sent does not fetch wrong vertices.