Disclosure of Invention
The purpose of the present disclosure is to provide an image rendering method, a graphics processor, a graphics processing system, a device and equipment, which aim to avoid page fault errors and reduce waste of video memory space.
According to one aspect of the present disclosure, there is provided an image rendering method, applied to a GPU, the method including:
receiving a drawing command submitted by a CPU (central processing unit), and acquiring state information corresponding to the drawing command;
in a case where the state information indicates that texture sampling does not need to be performed, performing geometric processing and attribute interpolation calculation on an object to be rendered to determine a plurality of attribute information of each fragment, storing the plurality of attribute information of each fragment into a video memory, and sending feedback information to the CPU, so that the CPU loads corresponding texture data into the video memory according to texture index information included in the attribute information; wherein the plurality of attribute information includes the texture index information;
in a case where the state information indicates that texture sampling needs to be performed, reading the texture data and the plurality of attribute information of each fragment from the video memory, and inputting the read texture data and the plurality of attribute information of each fragment into a fragment shader for processing.
In one possible implementation of the present disclosure, the state information includes a sampling identifier, where the sampling identifier is used to indicate whether texture sampling needs to be performed; after the state information corresponding to the drawing command is acquired, the method further comprises the following steps:
determining, according to the sampling identifier in the state information, whether texture sampling needs to be performed.
In one possible implementation of the present disclosure, before sending the feedback information to the CPU, the method further includes:
writing the storage address, in the video memory, of the plurality of attribute information of each fragment into the state information;
and storing the state information, into which the storage address has been written, into the video memory according to a target address carried by the drawing command.
In one possible implementation of the present disclosure, reading texture data and a plurality of attribute information of each fragment from a video memory includes:
Reading texture data from the video memory according to the texture video memory address in the state information; wherein, the texture video memory address is written into the state information by the CPU according to the address of the texture data in the video memory;
And reading a plurality of attribute information of each fragment from the video memory according to the attribute information storage address in the state information.
In one possible implementation manner of the present disclosure, the number of state information corresponding to a drawing command is multiple, each state information corresponds to one object to be rendered, each state information includes a sampling identifier, and each sampling identifier included in each state information is used for characterizing whether texture sampling needs to be performed on the corresponding object to be rendered;
After the state information corresponding to the drawing command is acquired, the method further comprises the following steps:
determining, according to the sampling identifier in each state information, whether texture sampling needs to be performed on the corresponding object to be rendered;
In a case where the state information indicates that texture sampling does not need to be performed, performing geometric processing and attribute interpolation computation on an object to be rendered to determine a plurality of attribute information of each fragment, comprising:
in the case where the drawing command corresponds to state information indicating that texture sampling is not required to be performed, geometric processing and attribute interpolation computation are performed for an object to be rendered corresponding to the state information to determine a plurality of attribute information for each fragment.
In one possible implementation of the present disclosure, in a case where the state information indicates that texture sampling needs to be performed, reading texture data and a plurality of attribute information of each fragment from a video memory includes:
in a case where the drawing command corresponds to state information indicating that texture sampling needs to be performed, reading from the video memory, for the object to be rendered corresponding to that state information, the texture data corresponding to the object to be rendered and the plurality of attribute information of each fragment.
In one possible implementation of the present disclosure, in a case where the state information indicates that texture sampling does not need to be performed, performing geometric processing and attribute interpolation calculation on an object to be rendered to determine a plurality of attribute information of each fragment, and storing the plurality of attribute information of each fragment into a video memory, includes:
performing geometric processing and attribute interpolation calculation on the object to be rendered to determine a plurality of attribute information of each fragment in the case where the state information indicates that texture sampling does not need to be performed;
performing depth detection on the plurality of fragments so as to retain the fragments that are not occluded;
and storing the attribute information of the non-occluded fragments into the video memory.
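By way of non-limiting illustration, the depth detection step may be sketched in Python as follows. The `Fragment` record, the `keep_unoccluded` function, and the convention that a smaller depth value is closer to the camera are assumptions made for this sketch and are not prescribed by the present disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    x: int                 # pixel column covered by the fragment
    y: int                 # pixel row covered by the fragment
    depth: float           # assumed convention: smaller value is closer to the camera
    texture_index: int     # texture index information carried by the fragment


def keep_unoccluded(fragments):
    """Keep, for each pixel position, only the fragment nearest to the camera."""
    nearest = {}
    for frag in fragments:
        key = (frag.x, frag.y)
        if key not in nearest or frag.depth < nearest[key].depth:
            nearest[key] = frag
    return list(nearest.values())


fragments = [
    Fragment(0, 0, 0.8, texture_index=3),
    Fragment(0, 0, 0.2, texture_index=7),   # occludes the fragment above
    Fragment(1, 0, 0.5, texture_index=3),
]
visible = keep_unoccluded(fragments)
```

In this sketch only the attribute information of the fragments in `visible` would be stored into the video memory, so texture data referenced solely by occluded fragments need not be loaded.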
According to another aspect of the present disclosure, there is provided a graphics processor including:
the command receiving module is used for receiving drawing commands submitted by the CPU;
The state information acquisition module is used for acquiring state information corresponding to the drawing command;
The geometric processing module is used for executing geometric processing on the object to be rendered under the condition that the state information indicates that texture sampling is not required to be executed;
The interpolation calculation module is used for executing attribute interpolation calculation on the object to be rendered to determine a plurality of attribute information of each fragment under the condition that the state information indicates that texture sampling is not required to be executed, storing the attribute information of each fragment into a video memory, and sending feedback information to the CPU so that the CPU loads corresponding texture data into the video memory according to the texture index information; wherein the plurality of attribute information includes texture index information;
and the texture sampler is used for reading the texture data and the attribute information of each fragment from the video memory and inputting the read texture data and the attribute information of each fragment into the fragment shader for processing under the condition that the state information indicates that the texture sampling is required to be performed.
In one possible implementation of the present disclosure, the state information includes a sampling identifier, where the sampling identifier is used to characterize whether texture sampling needs to be performed, and the texture sampler is further used to determine whether texture sampling needs to be performed according to the sampling identifier in the state information.
In one possible implementation of the present disclosure, before sending the feedback information to the CPU, the interpolation calculation module is further configured to write the storage address, in the video memory, of the plurality of attribute information of each fragment into the state information, and to store the state information, into which the storage address has been written, into the video memory according to the target address carried by the drawing command.
In one possible implementation of the present disclosure, when the texture sampler reads texture data and a plurality of attribute information of each fragment from the video memory, the texture sampler is specifically configured to:
Reading texture data from the video memory according to the texture video memory address in the state information; wherein, the texture video memory address is written into the state information by the CPU according to the address of the texture data in the video memory;
And reading a plurality of attribute information of each fragment from the video memory according to the attribute information storage address in the state information.
In one possible implementation manner of the present disclosure, the number of state information corresponding to a drawing command is multiple, each state information corresponds to one object to be rendered, each state information includes a sampling identifier, and each sampling identifier included in each state information is used for characterizing whether texture sampling needs to be performed on the corresponding object to be rendered;
the texture sampler is further configured to determine, according to the sampling identifier in each state information, whether texture sampling needs to be performed on the corresponding object to be rendered;
the geometric processing module, when performing geometric processing on the object to be rendered, is specifically configured to: in a case where the drawing command corresponds to state information indicating that texture sampling does not need to be performed, perform geometric processing on the object to be rendered corresponding to that state information;
the interpolation calculation module, when performing attribute interpolation calculation on the object to be rendered, is specifically configured to: in a case where the drawing command corresponds to state information indicating that texture sampling does not need to be performed, perform attribute interpolation calculation on the object to be rendered corresponding to that state information.
In one possible implementation manner of the present disclosure, when the texture sampler reads texture data and a plurality of attribute information of each fragment from the video memory, the texture sampler is specifically configured to, when the drawing command corresponds to state information indicating that texture sampling needs to be performed, read, for an object to be rendered corresponding to the state information, the texture data corresponding to the object to be rendered and the plurality of attribute information of each fragment from the video memory.
In one possible implementation of the present disclosure, the graphics processor further includes:
a depth detection module, configured to perform depth detection on the plurality of fragments so as to retain the fragments that are not occluded;
the interpolation calculation module is specifically configured to store the attribute information of the non-occluded fragment into the video memory when storing the attribute information of each fragment into the video memory.
According to another aspect of the present disclosure, there is also provided a graphics processing system including the graphics processor of any of the above embodiments.
According to another aspect of the present disclosure, there is also provided an electronic device including the graphics processing system described above. In some use scenarios, the electronic device takes the product form of a graphics card; in other use scenarios, the electronic device takes the product form of a CPU motherboard.
According to another aspect of the present disclosure, there is also provided an electronic apparatus including the electronic device described above. In some use scenarios, the electronic apparatus takes the product form of a portable electronic apparatus, such as a smartphone, a tablet computer, or a VR device; in other use scenarios, the electronic apparatus takes the product form of a personal computer, a game console, or the like.
According to another aspect of the present disclosure, there is also provided an image rendering method including:
submitting a drawing command to the GPU, wherein the drawing command corresponds to first state information indicating that texture sampling does not need to be performed, so that the GPU performs geometric processing and attribute interpolation calculation on an object to be rendered to determine a plurality of attribute information of each fragment, and stores the plurality of attribute information of each fragment into a video memory; wherein the plurality of attribute information includes texture index information;
Responding to feedback information of the GPU, and loading corresponding texture data to a video memory according to the texture index information;
And submitting a drawing command again to the GPU, wherein the drawing command corresponds to second state information for representing that texture sampling needs to be executed, so that the GPU reads texture data and a plurality of attribute information of each fragment from a video memory, and inputs the read texture data and the plurality of attribute information of each fragment into a fragment shader for processing.
In one possible implementation of the present disclosure, before submitting the drawing command to the GPU, the method further includes:
configuring the first state information, wherein the first state information contains a first sampling identifier, and the first sampling identifier is used to indicate that texture sampling does not need to be performed.
In one possible implementation of the present disclosure, before resubmitting the drawing command to the GPU, the method further includes:
replacing a first sampling identifier in the first state information with a second sampling identifier to obtain second state information, wherein the second sampling identifier is used for indicating that texture sampling needs to be performed; or configuring second state information, wherein the second state information comprises a second sampling identifier, and the second sampling identifier is used for indicating that texture sampling needs to be performed.
In one possible implementation of the present disclosure, responding to feedback information of a GPU, loading corresponding texture data to a video memory according to texture index information, includes:
Responding to the feedback information of the GPU, reading a plurality of attribute information of each fragment from the video memory according to the attribute information storage address recorded in the first state information, and loading corresponding texture data to the video memory according to the texture index information in the attribute information.
In one possible implementation of the present disclosure, before resubmitting the drawing command to the GPU, the method further includes:
And writing the address of the texture data in the video memory into the second state information.
In one possible implementation of the present disclosure, each drawing command submitted to the GPU corresponds to a plurality of state information, where the plurality of state information includes first state information and second state information, the first state information includes a first sampling identifier indicating that texture sampling does not need to be performed, the second state information includes a second sampling identifier indicating that texture sampling needs to be performed, and each state information corresponds to a respective object to be rendered;
Before each submission of a drawing command to the GPU, the method further comprises:
replacing a first sampling identifier in the first state information corresponding to the previous drawing command with a second sampling identifier to obtain second state information;
configuring first state information for a new object to be rendered, wherein a sampling identifier in the first state information is a first sampling identifier;
and taking the obtained second state information and the configured first state information as a plurality of state information corresponding to the current drawing command.
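By way of non-limiting illustration, the way the state information is carried over from one drawing command to the next may be sketched in Python as follows. The `StateInfo` record, the `next_state_infos` function, and the numeric encodings of the two sampling identifiers are hypothetical names chosen for this sketch. The sketch also assumes that state information already marked with the second sampling identifier is not carried forward, and that the very first drawing command carries first state information only.

```python
from dataclasses import dataclass, replace

NO_SAMPLING = 1   # hypothetical encoding of the first sampling identifier
SAMPLING = 2      # hypothetical encoding of the second sampling identifier


@dataclass(frozen=True)
class StateInfo:
    object_id: str      # the object to be rendered that this state information describes
    sampling_id: int    # first or second sampling identifier


def next_state_infos(previous, new_object_id):
    """Build the state information list for the current drawing command."""
    # Objects that went through the first stage last time now need texture sampling.
    promoted = [replace(s, sampling_id=SAMPLING)
                for s in previous if s.sampling_id == NO_SAMPLING]
    # The new object starts in the first stage.
    fresh = StateInfo(new_object_id, NO_SAMPLING)
    return promoted + [fresh]


draw1 = next_state_infos([], "A")
draw2 = next_state_infos(draw1, "B")
draw3 = next_state_infos(draw2, "C")
```

Under these assumptions, the second drawing command performs texture sampling for object A while performing geometric processing and attribute interpolation for object B, so the two stages of consecutive objects overlap within one drawing command.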
Detailed Description
Before describing embodiments of the present disclosure, it should be noted that: some embodiments of the disclosure are described as process flows, in which the various operational steps of the flows may be numbered sequentially, but may be performed in parallel, concurrently, or simultaneously.
The terms "first," "second," and the like may be used in embodiments of the present disclosure to describe various features, but these features should not be limited by these terms. These terms are only used to distinguish one feature from another.
The term "and/or" may be used in embodiments of the present disclosure to include any and all combinations of one or more of the associated features listed.
It will be understood that when two elements are described in a connected or communicating relationship, unless a direct connection or direct communication between the two elements is explicitly stated, connection or communication between the two elements may be understood as direct connection or communication, as well as indirect connection or communication via intermediate elements.
In order to make the technical solutions and advantages of the embodiments of the present disclosure more apparent, exemplary embodiments of the present disclosure are described in detail below in conjunction with the accompanying drawings. The described embodiments are only some embodiments of the present disclosure, rather than an exhaustive list of all embodiments. It should be noted that, provided there is no conflict, the embodiments of the present disclosure and the features of the embodiments may be combined with each other.
In the related art, in order to load the texture data actually required by the GPU into the video memory as far as possible, an application program running on the CPU predicts the camera position and, according to the predicted camera position, loads into the video memory the texture data corresponding to objects that may need to be rendered. However, if the camera position predicted by the application program is inaccurate, then, on the one hand, the GPU may sample texture data that has not been loaded into the video memory when performing texture sampling, thereby generating a page fault. On the other hand, during rendering the GPU performs coordinate conversion on the vertices of the object to be rendered, and the application program cannot know the result of that coordinate conversion; the application program therefore has to load into the video memory the texture data that the GPU might use, which wastes a certain amount of video memory space.
The purpose of the present disclosure is to provide an image rendering method, a graphics processor, a graphics processing system, a device and equipment, which avoid page fault and reduce waste of video memory space.
Referring to fig. 1, fig. 1 is a flowchart of an image rendering method according to an embodiment of the disclosure, where the method is applied to a GPU. As shown in fig. 1, the method comprises the steps of:
S110: receiving a drawing command submitted by the CPU, and acquiring state information corresponding to the drawing command.
The drawing command may specifically be a draw call (drawcall) command. In some embodiments, the state information is configured by the CPU, and the CPU may send the state information, or the address of the state information, to the GPU when submitting the draw call command. The GPU may obtain the state information directly from the draw call command, or may read the state information from the video memory according to the address carried by the draw call command.
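By way of non-limiting illustration, the two ways of acquiring the state information may be sketched in Python as follows. The `DrawCommand` record, its field names, and the modelling of the video memory as a dictionary keyed by address are assumptions made for this sketch only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DrawCommand:
    inline_state: Optional[dict] = None    # state information carried by the command itself
    state_address: Optional[int] = None    # or the address of the state information in video memory


def acquire_state_info(cmd, video_memory):
    """Return the state information for a drawing command (hypothetical helper)."""
    if cmd.inline_state is not None:
        return cmd.inline_state
    if cmd.state_address is not None:
        return video_memory[cmd.state_address]
    raise ValueError("drawing command carries neither state information nor its address")


video_memory = {0x1000: {"sampling_id": 1}}
by_value = acquire_state_info(DrawCommand(inline_state={"sampling_id": 2}), video_memory)
by_address = acquire_state_info(DrawCommand(state_address=0x1000), video_memory)
```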
S120: in a case where the state information indicates that texture sampling does not need to be performed, performing geometric processing and attribute interpolation calculation on an object to be rendered to determine a plurality of attribute information of each fragment, storing the plurality of attribute information of each fragment into the video memory, and sending feedback information to the CPU, so that the CPU loads corresponding texture data into the video memory according to texture index information included in the attribute information; wherein the plurality of attribute information includes the texture index information.
In some embodiments, the GPU may perform geometric processing on the object to be rendered through programmable shaders and/or custom hardware, where the geometric processing includes but is not limited to vertex coordinate conversion, clipping, backface culling, and primitive assembly. The GPU may perform attribute interpolation calculation on each fragment in the primitives through custom hardware (e.g., an interpolation calculation module) to determine a plurality of attribute information of each fragment.
For each predefined attribute, the interpolation calculation module calculates the corresponding attribute information of each fragment by interpolating the attribute information of the primitive vertices. For example, if the predefined attributes include a depth value, texture index information, and a normal, the interpolation calculation module interpolates a depth value for each fragment in the primitive from the depth values of the primitive vertices, interpolates texture index information for each fragment in the primitive from the texture index information of the primitive vertices, and interpolates a normal for each fragment in the primitive from the normals of the primitive vertices.
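By way of non-limiting illustration, the per-fragment interpolation may be sketched in Python as follows. The sketch assumes a triangular primitive, screen-space barycentric weights without perspective correction, and texture index information in the form of a texture coordinate; the present disclosure does not prescribe any of these, and the function names are hypothetical.

```python
def barycentric_weights(p, a, b, c):
    """Barycentric weights of point p with respect to triangle (a, b, c) in screen space."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    area = (bx - ax) * (cy - ay) - (cx - ax) * (by - ay)
    if area == 0:
        raise ValueError("degenerate triangle")
    w_a = ((bx - px) * (cy - py) - (cx - px) * (by - py)) / area
    w_b = ((cx - px) * (ay - py) - (ax - px) * (cy - py)) / area
    return w_a, w_b, 1.0 - w_a - w_b


def interpolate(weights, values):
    """Blend per-vertex values (scalars or equal-length tuples) using the weights."""
    if isinstance(values[0], (int, float)):
        return sum(w * v for w, v in zip(weights, values))
    return tuple(sum(w * v[i] for w, v in zip(weights, values))
                 for i in range(len(values[0])))


# One fragment at pixel (1, 1) inside a triangle with three vertices.
weights = barycentric_weights((1, 1), (0, 0), (4, 0), (0, 4))
depth = interpolate(weights, (0.2, 0.6, 1.0))                           # per-vertex depth values
uv = interpolate(weights, ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)))         # per-vertex texture coordinates
```

The same `interpolate` call would be repeated for each predefined attribute, for example once more for the per-vertex normals.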
In some embodiments, after the GPU stores the plurality of attribute information of each fragment into the video memory, the GPU may send the feedback information to the CPU by generating an interrupt, so that the CPU can load the corresponding texture data into the video memory according to the texture index information in the plurality of attribute information of each fragment. The texture index information may be a texture virtual address, a block number of a texture block (tile), or a texture coordinate; the present disclosure does not limit the specific type of the texture index information.
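By way of non-limiting illustration, the CPU-side loading may be sketched in Python as follows, assuming that the texture index information is the block number of a texture block. The function names, the `texture_store` dictionary standing in for the texture source on the CPU side, and the modelling of the video memory as a dictionary keyed by block number are assumptions made for this sketch.

```python
def blocks_to_load(fragment_attributes, resident_blocks):
    """Block numbers referenced by the fragments that are not yet in video memory."""
    needed = {attrs["texture_index"] for attrs in fragment_attributes}
    return sorted(needed - set(resident_blocks))


def load_texture_blocks(fragment_attributes, texture_store, video_memory):
    """Copy only the missing, referenced texture blocks into video memory."""
    loaded = blocks_to_load(fragment_attributes, video_memory.keys())
    for block in loaded:
        video_memory[block] = texture_store[block]
    return loaded


texture_store = {block: f"texels-{block}" for block in range(10)}
video_memory = {5: "texels-5"}                      # block 5 is already resident
fragment_attributes = [
    {"texture_index": 5},
    {"texture_index": 9},
    {"texture_index": 9},
    {"texture_index": 2},
]
loaded = load_texture_blocks(fragment_attributes, texture_store, video_memory)
```

In this sketch, seven of the ten blocks in `texture_store` are never copied, which corresponds to the reduction in wasted video memory space described in the present disclosure.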
In the method, the GPU, through geometric processing, clips away the portions of the object to be rendered that do not need to be displayed and culls the hidden surfaces of the object to be rendered, so that the primitives obtained after geometric processing are more likely to be primitives that need to be displayed. By performing attribute interpolation calculation on each fragment in these primitives, the GPU obtains the plurality of attribute information of the fragments that are more likely to be displayed. The GPU then sends feedback information to the CPU, so that the CPU can load the corresponding texture data into the video memory according to the texture index information in the plurality of attribute information of each fragment. In this way, the CPU loads into the video memory the texture data corresponding to the fragments that are more likely to be displayed, which reduces the waste of video memory space. In addition, because the CPU loads the texture data according to the fragment attribute information determined by the GPU rather than according to its own prediction of the camera position, the GPU can be prevented from sampling texture data that has not been loaded into the video memory when performing texture sampling; that is, page faults are avoided.
S130: in a case where the state information indicates that texture sampling needs to be performed, reading the texture data and the plurality of attribute information of each fragment from the video memory, and inputting the read texture data and the plurality of attribute information of each fragment into the fragment shader for processing.
In some embodiments, in the case where the state information indicates that texture sampling needs to be performed, the GPU skips (i.e., bypasses) geometric processing and attribute interpolation calculation, reads the texture data and the plurality of attribute information of each fragment directly from the video memory, and inputs the read texture data and the plurality of attribute information of each fragment into the fragment shader, so that the fragment shader determines the color information of each fragment according to the texture data and the plurality of attribute information of each fragment.
In the present disclosure, rendering is divided into two stages. In the first stage, the state information corresponding to the drawing command indicates that texture sampling does not need to be performed, and the GPU performs geometric processing and attribute interpolation calculation on the object to be rendered to determine a plurality of attribute information of each fragment, thereby obtaining the plurality of attribute information of the fragments that are more likely to be displayed. The GPU then sends feedback information to the CPU, so that the CPU can load the corresponding texture data into the video memory according to the texture index information in the plurality of attribute information of each fragment. In this way, the waste of video memory space can be reduced and page faults can be avoided. In the second stage, the state information corresponding to the drawing command indicates that texture sampling needs to be performed. Because the GPU has already interpolated the plurality of attributes of each fragment in the first stage, in the second stage the GPU only needs to read the texture data and the plurality of attribute information corresponding to each fragment from the video memory and send them to the fragment shader for processing, without performing geometric processing and attribute interpolation calculation again, so that the flow of the GPU in the second stage is simplified.
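By way of non-limiting illustration, the two stages may be simulated end to end in Python as follows. The functions `gpu_draw` and `cpu_render`, the numeric encodings of the sampling identifiers, the fixed stand-in result of the first stage, and the toy fragment shader are all assumptions made for this sketch, not a description of an actual GPU interface.

```python
NO_SAMPLING, SAMPLING = 1, 2   # hypothetical encodings of the two sampling identifiers


def gpu_draw(state, video_memory, log):
    """Hypothetical GPU entry point: the sampling identifier selects the stage."""
    if state["sampling_id"] == NO_SAMPLING:
        log.append("geometry+interpolation")
        # Stand-in for real geometric processing and attribute interpolation:
        # two fragments survive, each carrying texture index information.
        video_memory["attributes"] = [{"texture_index": 4}, {"texture_index": 6}]
        return "feedback"
    log.append("fragment_shading")
    textures = video_memory["textures"]
    # Toy fragment shader: the "color" is the texture entry the fragment refers to.
    return [textures[a["texture_index"]] for a in video_memory["attributes"]]


def cpu_render(texture_store):
    """Hypothetical CPU-side driver for the two stages."""
    video_memory, log = {}, []
    state = {"sampling_id": NO_SAMPLING}
    if gpu_draw(state, video_memory, log) != "feedback":        # first stage
        raise RuntimeError("first stage did not report completion")
    needed = {a["texture_index"] for a in video_memory["attributes"]}
    video_memory["textures"] = {i: texture_store[i] for i in needed}
    state["sampling_id"] = SAMPLING
    colors = gpu_draw(state, video_memory, log)                 # second stage
    return colors, video_memory, log


colors, video_memory, log = cpu_render({4: "grass", 6: "stone", 8: "water"})
```

In this sketch texture 8 is never loaded, and the second call to `gpu_draw` performs no geometric processing or interpolation.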
In some embodiments, the state information includes a sampling identifier that indicates whether texture sampling needs to be performed. After acquiring the state information corresponding to the drawing command, the GPU determines whether texture sampling needs to be performed according to the sampling identifier in the state information. If texture sampling does not need to be performed, the GPU performs step S120 above; if texture sampling needs to be performed, the GPU performs step S130 above.
In a specific implementation, the CPU may configure the state information in advance and store it in the video memory. The state information includes a first sampling identifier or a second sampling identifier, where the first sampling identifier indicates that texture sampling does not need to be performed, and the second sampling identifier indicates that texture sampling needs to be performed. The drawing command submitted by the CPU to the GPU carries the storage address of the state information in the video memory. After receiving the drawing command, the GPU can read the state information from the video memory according to the storage address carried by the drawing command, and determine whether texture sampling needs to be performed according to the sampling identifier in the state information.
In some embodiments, after the GPU has calculated the attribute information of each fragment by interpolation and stored it into the video memory, the GPU may write the storage address of the attribute information in the video memory into the state information, and store the state information, into which the storage address has been written, into the video memory according to the target address carried by the drawing command. Thus, after receiving the feedback information, the CPU can read the new state information from the video memory according to the target address, look up the storage address of the attribute information in the new state information, and then read the plurality of attribute information of each fragment from the video memory according to that storage address. The CPU may then load the corresponding texture data into the video memory according to the texture index information among the plurality of attribute information.
In some embodiments, when the GPU reads texture data and the plurality of attribute information of each fragment from the video memory, the GPU may specifically read the texture data from the video memory according to the texture video memory address in the state information; and reading a plurality of attribute information of each fragment from the video memory according to the attribute information storage address in the state information. Wherein, the texture video memory address is written into the state information by the CPU according to the address of the texture data in the video memory.
When the CPU receives the feedback information from the GPU, it reads the new state information from the video memory according to the target address, and reads the plurality of attribute information of each fragment from the video memory according to the video memory address of the attribute information recorded in the new state information. The CPU then loads the corresponding texture data into the video memory according to the texture index information in the attribute information of each fragment. The CPU takes the address of the texture data in the video memory as the texture video memory address and writes this address into the state information; alternatively, the CPU configures new state information and writes the texture video memory address into the new state information. The CPU then submits a drawing command to the GPU again, where the state information corresponding to this drawing command includes the texture video memory address and the second sampling identifier.
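By way of non-limiting illustration, the CPU-side handling of the feedback information may be sketched in Python as follows. The function `on_gpu_feedback`, the dictionary layout of the state information, and the modelling of the video memory as a dictionary keyed by address are assumptions made for this sketch. The sketch takes the first of the two alternatives, in which the existing state information is updated rather than newly configured.

```python
NO_SAMPLING, SAMPLING = 1, 2   # hypothetical encodings of the two sampling identifiers


def on_gpu_feedback(video_memory, target_address, texture_store, texture_address):
    """Hypothetical CPU handler run when the GPU reports that the first stage is done."""
    state = video_memory[target_address]                      # new state information
    attributes = video_memory[state["attribute_address"]]     # per-fragment attribute information
    needed = sorted({a["texture_index"] for a in attributes})
    # Load the referenced texture data and remember where it was placed.
    video_memory[texture_address] = {i: texture_store[i] for i in needed}
    # Record the texture video memory address and switch to the second sampling identifier.
    video_memory[target_address] = dict(
        state, sampling_id=SAMPLING, texture_address=texture_address)
    return target_address   # address to carry in the resubmitted drawing command


video_memory = {
    0x100: {"sampling_id": NO_SAMPLING, "attribute_address": 0x200, "texture_address": None},
    0x200: [{"texture_index": 2}, {"texture_index": 2}, {"texture_index": 5}],
}
texture_store = {2: "brick", 5: "moss", 7: "sand"}
state_address = on_gpu_feedback(video_memory, 0x100, texture_store, texture_address=0x300)
```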
After receiving the drawing command, the GPU determines, according to the second sampling identifier contained in the state information, that texture sampling needs to be performed. In this case, the GPU reads the texture data from the video memory according to the texture video memory address in the state information, and reads the plurality of attribute information of each fragment from the video memory according to the video memory address of the attribute information recorded in the state information. The GPU submits the read texture data and attribute information to the fragment shader, so that the fragment shader determines the color information of each fragment according to the texture data and the attribute information.
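By way of non-limiting illustration, the second-stage reads may be sketched in Python as follows. The function names, the dictionary layout of the state information, and the toy `fragment_shader` that simply returns the texture entry named by the texture index are assumptions made for this sketch.

```python
SAMPLING = 2   # hypothetical encoding of the second sampling identifier


def fragment_shader(texture_data, attrs):
    """Toy shader: look up the texture entry named by the fragment's texture index."""
    return texture_data[attrs["texture_index"]]


def second_stage(state, video_memory):
    """Hypothetical second stage: no geometric processing or interpolation is performed."""
    if state["sampling_id"] != SAMPLING:
        raise ValueError("state information does not request texture sampling")
    texture_data = video_memory[state["texture_address"]]     # texture video memory address
    attributes = video_memory[state["attribute_address"]]     # attribute information address
    return [fragment_shader(texture_data, a) for a in attributes]


video_memory = {
    0x200: [{"texture_index": 2}, {"texture_index": 5}],
    0x300: {2: "brick", 5: "moss"},
}
state = {"sampling_id": SAMPLING, "texture_address": 0x300, "attribute_address": 0x200}
colors = second_stage(state, video_memory)
```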
Referring to fig. 2, fig. 2 is a flowchart of the first stage of an image rendering method according to an embodiment of the disclosure. As shown in fig. 2, the flow of the first stage is as follows:
S201: an application program running in the CPU allocates a virtual address of the video memory for the texture data, and temporarily does not allocate a physical address of the video memory, namely the application program does not load the texture data to the video memory; the application program configures the state information and writes the state information into the video memory.
The state information of the first stage includes at least the following information:
1) The address of the texture data in the video memory. This address is a virtual address, because the texture data has not been loaded into the video memory in the first stage.
2) A sampling identifier, which is either a first sampling identifier or a second sampling identifier and indicates whether texture sampling is to be performed, so that the GPU can determine whether to read texture data from the video memory. If the sampling identifier is the first sampling identifier, the GPU does not read texture data from the video memory, i.e., does not perform texture sampling. If the sampling identifier is the second sampling identifier, the GPU reads texture data from the video memory, i.e., performs texture sampling. In the first stage, the sampling identifier in the state information is the first sampling identifier, which indicates that texture sampling is not performed.
3) The video memory address of the attribute information. This field is empty in the first stage because, when the application program configures the state information, the GPU has not yet performed the attribute interpolation calculation; that is, no video memory address of the attribute information is recorded in the state information yet.
S202: when the application program submits a drawing command to the GPU, the address of the state information in the video memory is sent to the GPU.
S203: after the GPU receives the drawing command, the GPU reads the state information from the video memory according to the address of the state information in the video memory, and determines, according to the first sampling identification contained in the state information, that texture sampling is not required at this stage.
S204: the GPU determines a plurality of attribute information of each fragment by performing geometric processing and attribute interpolation calculation on an object to be rendered, and stores the plurality of attribute information of each fragment into the video memory.
S205: the GPU writes the address of the attribute information in the video memory (namely the video memory address of the attribute information) into the state information so as to update the state information, and writes the updated state information into the video memory according to the target address provided by the application program. The target address may or may not be equal to the state information address carried by the drawing command. If the two addresses are equal, the updated state information is stored in the video memory at the location of the original state information. If the two addresses are not equal, the updated state information is stored at the new location corresponding to the target address rather than at the location of the original state information.
S206: the GPU sends feedback information to the CPU through an interrupt mechanism.
S207: the application program running in the CPU reads the updated state information from the video memory according to the target address.
S208: the application program reads the attribute information of each fragment from the video memory according to the video memory address of the attribute information recorded in the updated state information, and loads corresponding texture data to the video memory according to the texture index information in the attribute information.
S209: the application program records the address of the texture data in the video memory (namely the video memory physical address of the texture data) into the state information, replaces the first sampling identification in the state information with the second sampling identification, and stores the latest state information into the video memory.
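The first-stage flow (S201 to S209) can be sketched as a small simulation. This is only an illustration under stated assumptions, not the disclosed implementation: the video memory is modeled as a Python dictionary, and the names `StateInfo`, `gpu_first_stage`, `cpu_after_feedback`, the flag values, and the address strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

NO_SAMPLING = 1   # hypothetical encoding of the first sampling identifier
DO_SAMPLING = 2   # hypothetical encoding of the second sampling identifier

@dataclass
class StateInfo:
    texture_addr: str                 # a virtual address during the first stage
    sample_flag: int
    attr_addr: Optional[str] = None   # empty until the GPU fills it in (S205)

def gpu_first_stage(vram, state_addr, target_addr, fragments):
    """S203 to S206: read state, store attributes, write back updated state."""
    state = vram[state_addr]
    assert state.sample_flag == NO_SAMPLING
    attr_addr = "attr:0"              # hypothetical location chosen by the GPU
    vram[attr_addr] = fragments       # S204: per-fragment attribute information
    vram[target_addr] = StateInfo(state.texture_addr, state.sample_flag, attr_addr)
    return "feedback"                 # S206: stands in for the interrupt

def cpu_after_feedback(vram, target_addr, texture_store):
    """S207 to S209: load only the textures that the fragments reference."""
    state = vram[target_addr]
    needed = {frag["texture_index"] for frag in vram[state.attr_addr]}
    tex_addr = "tex:0"                # hypothetical physical address
    vram[tex_addr] = {i: texture_store[i] for i in needed}
    state.texture_addr = tex_addr
    state.sample_flag = DO_SAMPLING
    vram[target_addr] = state
    return state

# S201/S202: the application configures the state and submits the draw command.
vram = {"state:0": StateInfo(texture_addr="virtual:0x1000", sample_flag=NO_SAMPLING)}
fragments = [{"texture_index": 3, "depth": 0.2}, {"texture_index": 7, "depth": 0.5}]
gpu_first_stage(vram, "state:0", "state:0", fragments)
final_state = cpu_after_feedback(vram, "state:0", {i: f"texels-{i}" for i in range(10)})
```

In this sketch only textures 3 and 7 end up in the simulated video memory, although the hypothetical texture store holds ten; this mirrors the stated goal of loading only the texture data that the fragments actually index.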
Referring to fig. 3, fig. 3 is a flow chart of an image rendering method in a second stage according to an embodiment of the disclosure. As shown in fig. 3, the flow of the second stage is as follows:
S301: when the application program submits a drawing command to the GPU, the address of the latest state information in the video memory is sent to the GPU.
S302: after the GPU receives the drawing command, the GPU reads the state information from the video memory according to the address of the state information in the video memory, and determines, according to the second sampling identification contained in the state information, that texture sampling is required at this stage.
S303: the GPU skips the steps of geometric processing, attribute interpolation calculation and the like, and directly calls a texture sampler. The texture sampler reads texture data from the video memory according to the video memory address of the texture data recorded in the state information. Alternatively, the texture sampler reads the attribute information of each fragment from the video memory according to the attribute information storage address recorded in the state information, and reads the texture data from the video memory according to the texture index information in the attribute information.
S304: the texture sampler submits the read texture data and the attribute information of each fragment to a fragment shader running in the GPU, so that the fragment shader determines the color information of each fragment according to the texture data and the attribute information of each fragment.
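The second-stage flow (S301 to S304) can be sketched in the same spirit. The dictionary-based video memory, the field names, the flag value, and the placeholder `fragment_shader` are assumptions made for illustration only; a real fragment shader would compute a color rather than return a tuple.

```python
DO_SAMPLING = 2  # hypothetical encoding of the second sampling identifier

def fragment_shader(texel, attrs):
    """Placeholder shading: pair the sampled texel with the fragment depth."""
    return (texel, attrs["depth"])

def gpu_second_stage(vram, state_addr):
    """S302 to S304: no geometry or interpolation work, only sample and shade."""
    state = vram[state_addr]
    if state["sample_flag"] != DO_SAMPLING:
        raise ValueError("state information does not request texture sampling")
    textures = vram[state["texture_addr"]]     # S303: read texture data
    fragments = vram[state["attr_addr"]]       # S303: read stored attributes
    return [fragment_shader(textures[f["texture_index"]], f) for f in fragments]

vram = {
    "state:0": {"sample_flag": DO_SAMPLING, "texture_addr": "tex:0", "attr_addr": "attr:0"},
    "tex:0": {3: "red", 7: "blue"},
    "attr:0": [{"texture_index": 3, "depth": 0.2}, {"texture_index": 7, "depth": 0.5}],
}
colors = gpu_second_stage(vram, "state:0")
```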
In some embodiments, when performing attribute interpolation calculation for each primitive, the GPU first interpolates the texture index information of each fragment in the primitive according to the texture index information of the primitive vertices. After the GPU calculates the texture index information of each fragment, the GPU stores the texture index information of each fragment to a first address in the video memory, records the first address in the state information, and then sends feedback information to the CPU. After sending the feedback information, the GPU performs attribute interpolation calculation on the other attributes of each fragment (such as the depth value, normal, and tangent), and stores the other attributes obtained by interpolation to a second address in the video memory. After receiving the feedback information, the CPU reads the texture index information of each fragment from the video memory according to the first address recorded in the state information, and loads the corresponding texture data into the video memory according to the texture index information. After the CPU loads the texture data, the CPU submits the drawing command to the GPU again. In this embodiment, the GPU interpolates the remaining attribute information while the CPU loads the texture data, which helps improve image rendering efficiency.
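The overlap described in this embodiment can be illustrated with two concurrent tasks. The sketch below is a toy model under explicit assumptions: two Python threads stand in for the CPU and the GPU, the texture index is taken to be constant across the primitive, and all function names and dictionary keys are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def interpolate_texture_indices(vertex_indices, fragment_count):
    # Assumption for this toy model: the index is constant across the primitive.
    return [vertex_indices[0]] * fragment_count

def interpolate_other_attributes(fragment_count):
    # Stand-in for depth, normal, and tangent interpolation.
    return [{"depth": i / fragment_count} for i in range(fragment_count)]

def cpu_load_textures(texture_indices, texture_store):
    # Load each referenced texture once, regardless of how many fragments use it.
    return {i: texture_store[i] for i in set(texture_indices)}

vram = {}
# The texture indices are stored first, at the "first address".
vram["first_address"] = interpolate_texture_indices([5, 5, 5], fragment_count=4)

with ThreadPoolExecutor(max_workers=2) as pool:
    # Feedback is sent as soon as the texture indices are stored, so both
    # tasks below can proceed at the same time.
    cpu_task = pool.submit(cpu_load_textures, vram["first_address"], {5: "texels-5"})
    gpu_task = pool.submit(interpolate_other_attributes, 4)
    vram["textures"] = cpu_task.result()
    vram["second_address"] = gpu_task.result()
```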
In some embodiments, the drawing command includes a plurality of pieces of state information. Each piece of state information corresponds to one object to be rendered and contains a sampling identifier that indicates whether texture sampling needs to be performed on the corresponding object to be rendered. After the GPU receives the drawing command submitted by the CPU, the GPU may further determine, according to the sampling identifier in each piece of state information, whether texture sampling needs to be performed on the corresponding object to be rendered.
The object to be rendered may be a model or a primitive to be rendered. A model to be rendered is a model that needs to be displayed under the current view angle. Taking a game rendering scene as an example, if a game protagonist, an opponent character, a referee character and an arena need to be rendered under the current view angle, then the game protagonist, the opponent character, the referee character and the arena are the models to be rendered.
For example, in some cases, each frame of image includes a plurality of models, the plurality of models are rendered sequentially according to a preset rendering order, and each drawing command corresponds to the second-stage state information of the previously rendered model and the first-stage state information of the currently rendered model. For ease of understanding, assume that the current frame includes three models A, B, and C. The first drawing command submitted by the CPU corresponds to the first-stage state information of model A; after receiving the first drawing command, the GPU determines, according to the corresponding state information, that texture sampling does not need to be performed on model A. The second drawing command submitted by the CPU corresponds to the second-stage state information of model A and the first-stage state information of model B; after receiving the second drawing command, the GPU determines, according to the corresponding state information, that texture sampling needs to be performed on model A and does not need to be performed on model B. The third drawing command submitted by the CPU corresponds to the second-stage state information of model B and the first-stage state information of model C; after receiving the third drawing command, the GPU determines, according to the corresponding state information, that texture sampling needs to be performed on model B and does not need to be performed on model C. The fourth drawing command submitted by the CPU corresponds to the second-stage state information of model C; after receiving the fourth drawing command, the GPU determines, according to the corresponding state information, that texture sampling needs to be performed on model C.
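The staggered schedule in the A, B, C example can be generated mechanically. The following sketch is illustrative only; the function name `build_draw_commands` and the stage labels are hypothetical, and each drawing command is represented simply as a list of (model, stage) pairs.

```python
def build_draw_commands(models):
    """Return, per draw command, the (model, stage) pairs it corresponds to."""
    commands = []
    for i in range(len(models) + 1):
        states = []
        if i > 0:
            states.append((models[i - 1], "second stage"))  # sample previous model
        if i < len(models):
            states.append((models[i], "first stage"))       # prepare current model
        commands.append(states)
    return commands

schedule = build_draw_commands(["A", "B", "C"])
for number, states in enumerate(schedule, start=1):
    print(number, states)
```

For N models this yields N + 1 drawing commands, matching the four commands described for the three models above.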
Alternatively, in other cases, one drawing command is used to render a plurality of primitives, each of which corresponds to its own state information. Each piece of state information contains a sampling identifier that indicates whether texture sampling needs to be performed on the corresponding primitive.
When the drawing command corresponds to a plurality of pieces of state information and the GPU executes the above step S120, then specifically, for each piece of state information that indicates that texture sampling does not need to be performed, the GPU may perform geometric processing and attribute interpolation calculation on the object to be rendered corresponding to that state information, so as to determine a plurality of attribute information of each fragment.
Likewise, when the drawing command corresponds to a plurality of pieces of state information and the GPU executes the above step S130, then for each piece of state information that indicates that texture sampling needs to be performed, the GPU reads from the video memory the texture data corresponding to the object to be rendered and the plurality of attribute information of each fragment corresponding to that object.
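The per-object decision for a drawing command with several pieces of state information amounts to a dispatch on the sampling identifier. The sketch below assumes hypothetical flag encodings and dictionary fields; it only sorts objects into those handled as in step S120 and those handled as in step S130.

```python
NO_SAMPLING, DO_SAMPLING = 1, 2  # hypothetical flag encodings

def dispatch(draw_command):
    """Split the objects of one draw command by the flag in their state information."""
    to_interpolate, to_sample = [], []
    for state in draw_command:
        if state["sample_flag"] == NO_SAMPLING:
            to_interpolate.append(state["object"])   # handled as in step S120
        elif state["sample_flag"] == DO_SAMPLING:
            to_sample.append(state["object"])        # handled as in step S130
        else:
            raise ValueError(f"unknown sampling identifier: {state['sample_flag']}")
    return to_interpolate, to_sample

command = [
    {"object": "model A", "sample_flag": DO_SAMPLING},
    {"object": "model B", "sample_flag": NO_SAMPLING},
]
first_stage_objects, second_stage_objects = dispatch(command)
```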
In some embodiments, the GPU may specifically perform the following sub-steps when performing step S120:
S120-1: in the case where the state information indicates that texture sampling does not need to be performed, geometric processing and attribute interpolation computation are performed on the object to be rendered to determine a plurality of attribute information for each fragment.
S120-2: performing depth detection on the plurality of fragments to retain the fragments that are not occluded.
S120-3: storing the attribute information of the non-occluded fragments into the video memory.
In sub-step S120-2, depth detection may be performed on the plurality of fragments based on the Early-Z depth test method, so as to filter out the occluded fragments and retain only the non-occluded fragments. In this embodiment, depth detection is performed on the fragments before the CPU loads the texture data, so the occluded fragments are filtered out and the CPU does not load the texture data corresponding to the occluded fragments into the video memory, which further reduces the waste of video memory space.
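The effect of sub-steps S120-2 and S120-3 on texture loading can be shown with a simplified depth comparison. This is not a model of hardware Early-Z; it assumes opaque fragments, assumes that a smaller depth value means closer to the viewer, and uses hypothetical field names.

```python
def keep_unoccluded(fragments):
    """Keep, for every pixel, only the fragment with the smallest depth value."""
    nearest = {}
    for frag in fragments:
        pixel = frag["pixel"]
        if pixel not in nearest or frag["depth"] < nearest[pixel]["depth"]:
            nearest[pixel] = frag
    return list(nearest.values())

fragments = [
    {"pixel": (0, 0), "depth": 0.8, "texture_index": 1},  # hidden behind the next one
    {"pixel": (0, 0), "depth": 0.3, "texture_index": 2},
    {"pixel": (1, 0), "depth": 0.5, "texture_index": 3},
]
visible = keep_unoccluded(fragments)
# Only textures referenced by visible fragments would be loaded by the CPU.
textures_to_load = sorted({f["texture_index"] for f in visible})
```

Here texture 1 is never requested, because the only fragment that references it is occluded.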
Referring to fig. 4, fig. 4 is a schematic structural diagram of a graphics processor according to an embodiment of the present disclosure. The graphics processor shown in fig. 4 and the image rendering method shown in fig. 1 are based on the same inventive concept, and in order to avoid repetition, the graphics processor is briefly described below. For a particular implementation of the graphics processor, reference may be made to a corresponding particular implementation of the image rendering method. As shown in fig. 4, the graphic processor includes:
The command receiving module 410 is configured to receive a drawing command submitted by the CPU.
The status information obtaining module 420 is configured to obtain status information corresponding to the drawing command.
The geometry processing module 430 is configured to perform geometry processing on the object to be rendered in a case where the state information indicates that texture sampling is not required to be performed.
The interpolation calculation module 440 is configured to, in a case where the state information indicates that texture sampling is not required to be performed, perform attribute interpolation calculation on the object to be rendered to determine a plurality of attribute information of each fragment, store the plurality of attribute information of each fragment into the video memory, and send feedback information to the CPU, so that the CPU loads corresponding texture data into the video memory according to the texture index information; wherein the plurality of attribute information includes texture index information.
The texture sampler 450 is configured to, in a case where the state information indicates that texture sampling needs to be performed, read texture data and the plurality of attribute information of each fragment from the video memory, and input the read texture data and the plurality of attribute information of each fragment into the fragment shader for processing.
Optionally, the state information includes a sampling identifier, where the sampling identifier is used to characterize whether texture sampling needs to be performed, and the texture sampler is further used to determine whether texture sampling needs to be performed according to the sampling identifier in the state information.
Optionally, before sending the feedback information to the CPU, the interpolation calculation module is further configured to write the storage address, in the video memory, of the plurality of attribute information of each fragment into the state information, and to store the state information containing the storage address into the video memory according to the target address carried by the drawing command.
Optionally, when reading texture data and the plurality of attribute information of each fragment from the video memory, the texture sampler is specifically configured to: read texture data from the video memory according to the texture video memory address in the state information, wherein the texture video memory address is written into the state information by the CPU according to the address of the texture data in the video memory; and read the plurality of attribute information of each fragment from the video memory according to the attribute information storage address in the state information.
Optionally, the state information corresponding to the drawing command is multiple, each state information corresponds to one object to be rendered respectively, each state information contains a sampling identifier, and the sampling identifier contained in each state information is used for representing whether texture sampling needs to be performed on the corresponding object to be rendered.
The texture sampler is further configured to determine, according to the sampling identifier in each state information, whether texture sampling needs to be performed on the corresponding object to be rendered.
The geometry processing module is specifically configured to execute geometry processing for an object to be rendered corresponding to state information when the state information indicating that texture sampling is not required to be executed is corresponding to the drawing command when executing geometry processing for the object to be rendered.
The interpolation calculation module is specifically configured to perform attribute interpolation calculation for an object to be rendered corresponding to state information when the drawing command corresponds to the state information indicating that texture sampling is not required to be performed when the attribute interpolation calculation is performed for the object to be rendered.
Optionally, when the texture sampler reads the texture data and the plurality of attribute information of each fragment from the video memory, the texture sampler is specifically configured to, when the drawing command corresponds to the state information indicating that the texture sampling needs to be performed, read, for an object to be rendered corresponding to the state information, the texture data corresponding to the object to be rendered and the plurality of attribute information of each fragment from the video memory.
The graphics processor further includes a depth detection module, configured to perform depth detection on the plurality of fragments so as to retain the fragments that are not occluded.
The interpolation calculation module is specifically configured to store the attribute information of the non-occluded fragment into the video memory when storing the attribute information of each fragment into the video memory.
Based on the same inventive concept, the embodiments of the present disclosure also provide a graphics processing system, which may be a single die, an SoC (System on Chip) with multiple interconnected dies, or another organizational form.
The architecture and the working principle of the graphics processing system provided in the present disclosure are described below by taking one die as an example.
In one embodiment shown in FIG. 5, a single die graphics processing system includes multiple GPU cores (i.e., the graphics processor described in any of the embodiments above).
Each GPU core is used for processing drawing commands and executing the image rendering pipeline (Pipeline) according to the drawing commands, and can also be used for executing other computing commands; the multiple GPU cores as a whole complete drawing or other computing tasks. Each GPU core further includes: a computing unit, used for executing instructions compiled from shaders, which is a programmable module composed of a large number of ALUs; a cache (Cache), used for caching GPU core data to reduce accesses to memory; a rasterization module, which is a fixed stage of the 3D rendering pipeline; a tiling (Tiling) module, used for dividing a frame into tiles in TBR and TBDR GPU architectures; a geometry processing module, which is a fixed stage of the 3D rendering pipeline, used for performing coordinate conversion on vertex data and for culling primitives that are outside the viewing range or that are back-facing and not displayed; a texture sampler, used for reading texture data from the video memory and sending the read texture data to the fragment shader for processing; a post-processing module, used for performing operations such as scaling, cropping and rotating on the drawn image; and a microcore (Microcore), used for scheduling among the pipeline hardware modules on a GPU core, or for task scheduling across multiple GPU cores.
As shown in fig. 5, the graphics processing system may further include:
The network on chip is used for data exchange among all IP cores on the graphics processing system;
A general-purpose DMA (Direct Memory Access) controller, used for moving data between the host side and the memory of the graphics processing system (e.g., graphics card memory), for example moving the vertex data of a 3D drawing from the host side to the memory of the graphics processing system via DMA;
The PCIe controller is used for realizing PCIe protocol through the interface communicated with the host, so that the graphics processing system is connected to the host through the PCIe interface, and programs such as a graphics API, a driver of a display card and the like are run on the host;
The application processor is used for scheduling the tasks of each module on the graphics processing system; for example, after rendering a frame of image, the GPU notifies the application processor, and the application processor then starts the display controller to display the image drawn by the GPU on a screen;
The memory controller is used for connecting the system memory and storing data on the SoC;
A display controller for controlling the frame buffer in the system memory to be output to the display through a display interface (HDMI, DP, etc.);
A video decoder, used for decoding encoded video on the host hard disk into pictures that can be displayed;
A video encoder, used for encoding the original video stream on the host hard disk into a specified format and returning it to the host.
Based on the graphics processing system shown in fig. 5, in one embodiment, the GPU receives a drawing command submitted by the CPU via the PCIe interface, and the GPU obtains state information corresponding to the drawing command from the video memory. In the case that the state information indicates that texture sampling is not required, the geometry processing module performs geometry processing on the object to be rendered, including vertex coordinate conversion, model clipping, back-face culling, primitive assembly, and the like. Then, for each primitive, the interpolation calculation module in the rasterization module performs attribute interpolation calculation on each fragment in the primitive, where the attributes of each fragment include a depth value, texture index information, a normal, a tangent, and the like. The interpolation calculation module stores the attribute information of each fragment into the video memory, and records the video memory address of the attribute information in the state information. The GPU gives feedback to the CPU through an interrupt mechanism, so that the CPU obtains the video memory address of the attribute information of each fragment from the state information, reads the texture index information from the video memory according to that address, and loads the texture data to be sampled into the video memory according to the texture index information. The CPU also writes the texture video memory address of the texture data into the state information, and then the CPU sends the drawing command to the GPU again.
The GPU receives a drawing command submitted by the CPU based on the PCIe interface, and acquires state information corresponding to the drawing command from the video memory. In the case that the state information indicates that texture sampling is required, the geometry processing module does not perform geometry processing operation, the interpolation calculation module does not perform attribute information interpolation calculation operation, texture data is directly read from a video memory by a texture sampler according to a texture video memory address in the state information, and the texture sampler also reads a plurality of attribute information of each fragment from the video memory according to an attribute information storage address in the state information. The texture sampler sends the read texture data and attribute information to a fragment shader for processing, thereby determining the color of each fragment.
It should be noted that, the implementation of the graphics processing system provided in the embodiments of the present disclosure is described above by taking the specific structure shown in fig. 5 as an example. In practical applications, the specific implementation of the graphics processor that may be included in the graphics processing system may refer to any of the foregoing embodiments, which are not described herein again.
The embodiment of the disclosure also provides an electronic device, which comprises the graphics processing system. In some use cases, the product form of the electronic device is embodied as a graphics card; in other use scenarios, the product form of the electronic device is embodied as a CPU motherboard.
The embodiment of the disclosure also provides electronic equipment, which comprises the electronic device described above. In some use scenarios, the product form of the electronic equipment is portable electronic equipment, such as a smart phone, a tablet computer, or a VR device; in other use scenarios, the product form of the electronic equipment is a personal computer, game console, workstation, server, or the like.
Referring to fig. 6, fig. 6 is a schematic diagram of an image rendering method provided by an embodiment of the present disclosure, which may be performed by a CPU or an application running on the CPU. The image rendering method shown in fig. 6 is based on the same inventive concept as the image rendering method shown in fig. 1, and the image rendering method shown in fig. 6 is briefly described below in order to avoid repetition. For a specific implementation of the image rendering method, reference may be made to a corresponding specific implementation of the image rendering method in fig. 1. As shown in fig. 6, the image rendering method includes:
S610: submitting a drawing command to the GPU, wherein the drawing command corresponds to first state information indicating that texture sampling is not required to be performed, so that the GPU performs geometric processing and attribute interpolation calculation on an object to be rendered to determine a plurality of attribute information of each fragment, and stores the plurality of attribute information of each fragment into the video memory; wherein the plurality of attribute information includes texture index information.
S620: in response to the feedback information of the GPU, loading corresponding texture data into the video memory according to the texture index information.
S630: submitting a drawing command to the GPU again, wherein the drawing command corresponds to second state information indicating that texture sampling needs to be performed, so that the GPU reads the texture data and the plurality of attribute information of each fragment from the video memory, and inputs the read texture data and the plurality of attribute information of each fragment into a fragment shader for processing.
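From the CPU side, steps S610 to S630 form a submit, load, resubmit sequence. The sketch below uses a hypothetical `FakeGpu` class as a stand-in for the real GPU so that the sequence can be run end to end; the class, its methods, the state field `sampling_required`, and the texture names are all assumptions for illustration.

```python
class FakeGpu:
    """Hypothetical stand-in for the GPU; records what each submission did."""
    def __init__(self):
        self.vram = {}
        self.log = []

    def submit(self, state):
        if not state["sampling_required"]:
            # First submission: store per-fragment attributes and give feedback.
            self.vram["attrs"] = [{"texture_index": 4}, {"texture_index": 4}]
            self.log.append("attributes stored")
            return "feedback"
        # Second submission: sample the textures that the CPU has loaded.
        sampled = [self.vram["textures"][a["texture_index"]] for a in self.vram["attrs"]]
        self.log.append("sampled " + ",".join(sampled))
        return "done"

def render(gpu, texture_store):
    state = {"sampling_required": False}           # first state information
    if gpu.submit(state) == "feedback":            # S610
        indices = {a["texture_index"] for a in gpu.vram["attrs"]}
        gpu.vram["textures"] = {i: texture_store[i] for i in indices}   # S620
    state["sampling_required"] = True              # second state information
    return gpu.submit(state)                       # S630

gpu = FakeGpu()
result = render(gpu, {4: "grass", 9: "unused"})
```

Texture 9 is present in the hypothetical texture store but is never copied into the simulated video memory, since no fragment indexes it.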
Optionally, before submitting the drawing command to the GPU, the method further comprises: the first state information is configured, the first state information containing a first sample identification, the first sample identification being used to indicate that texture sampling need not be performed.
Optionally, before resubmitting the drawing command to the GPU, the method further comprises: replacing a first sampling identifier in the first state information with a second sampling identifier to obtain second state information, wherein the second sampling identifier is used for indicating that texture sampling needs to be performed; or configuring second state information, wherein the second state information comprises a second sampling identifier, and the second sampling identifier is used for indicating that texture sampling needs to be performed.
Optionally, when executing step S620, specifically, in response to the feedback information of the GPU, the plurality of attribute information of each fragment may be read from the video memory according to the attribute information storage address recorded in the first state information, and corresponding texture data may be loaded into the video memory according to the texture index information in the plurality of attribute information.
Optionally, before resubmitting the drawing command to the GPU, the method further comprises: and writing the address of the texture data in the video memory into the second state information.
Optionally, each drawing command submitted to the GPU corresponds to a plurality of state information, where the plurality of state information includes first state information and second state information, the first state information includes a first sample identifier for indicating that texture sampling is not required, the second state information includes a second sample identifier for indicating that texture sampling is required, and each state information corresponds to an object to be rendered.
Before each submission of a drawing command to the GPU, the method further comprises: replacing a first sampling identifier in the first state information corresponding to the previous drawing command with a second sampling identifier to obtain second state information; configuring first state information for a new object to be rendered, wherein a sampling identifier in the first state information is a first sampling identifier; and taking the obtained second state information and the configured first state information as a plurality of state information corresponding to the current drawing command.
While the preferred embodiments of the present disclosure have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the disclosure.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present disclosure without departing from the spirit or scope of the disclosure. Thus, the present disclosure is intended to include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.