Disclosure of Invention
The invention provides a method and a device for realizing real-time global illumination on a mobile terminal, which address the problem of how to achieve dynamic global illumination with high performance and low power consumption by making full use of the hardware structure of a mobile platform.
According to one aspect of the invention, a method for realizing real-time global illumination of a mobile terminal is provided, which comprises the following steps:
rendering a shadow map of the picture;
generating, on the block buffer area, a geometric buffer of the Lumen scene according to the shadow map, and calculating direct illumination in the surface buffer;
obtaining the accumulated result of multi-frame light bounces from the voxel illumination of the previous frame to form the final illumination, which serves as the indirect illumination of the current frame;
and directly adding the diffuse indirect illumination calculated in the overall Lumen process and the rough specular indirect illumination calculated in the reflection step to the direct illumination result calculated in the rendering process to form a global illumination result.
Rendering the shadow map of the picture comprises:
first rendering the depth of the whole scene, and then determining, by comparing depths, whether the currently rendered pixel is occluded, so as to obtain the shadow map.
The method further comprises the steps of:
performing noise reduction processing on the shadow map, which specifically comprises:
re-encoding the data format on the basis of a spatiotemporal variance-guided filtering (SVGF) algorithm, so that 4 light sources are denoised simultaneously in one noise reduction calculation;
dividing the 4 textures used for transferring data into two groups, each group creating textures in the RGBA32UINT and RGBA8 data formats;
storing, in each single channel of each texture, the information used for the noise reduction calculation of one light source, namely the blur radius, transmission distance, ray miss count and sample count, which are sampled in the noise reduction calculation.
The method further comprises the steps of:
passing data back and forth among a plurality of calculation steps by using the 4 intermediate textures, wherein the output data of each step is the input data of the next step, and the 4 intermediate textures are alternated back and forth to obtain the noise reduction result.
Generating the geometric buffer of the Lumen scene on the block buffer area according to the shadow map, and calculating direct illumination in the surface buffer, comprises:
first rendering the geometric buffer (G-Buffer) through a basic rendering path on the block chip buffer area, in which all scene-related basic geometric information in the current screen is drawn so that the subsequent rendering process can sample it.
The method further comprises the steps of:
copying selected necessary data of the basic rendering path from the block chip buffer area to the system buffer area, wherein the data comprise inherent color, normal, metalness, roughness, reflectivity and depth information.
The method further comprises the steps of:
completing the stages from the basic rendering stage to the illumination stage entirely on the block chip buffer area, so that the whole shading process occurs in one rendering path;
calculating both the final gather step and the reflection step after the block chip buffer area, that is, after the whole direct illumination has been calculated;
and finally combining the direct illumination obtained by shading with the indirect illumination as the rendering result of the global illumination.
According to another aspect of the present invention, there is provided a mobile terminal real-time global illumination implementation apparatus, including:
a shadow rendering unit for rendering a shadow map of the picture;
a direct illumination calculation unit for generating, on the block buffer area, a geometric buffer of the Lumen scene according to the shadow map, and calculating direct illumination in the surface buffer;
an indirect illumination calculation unit for obtaining the accumulated result of multi-frame light bounces from the voxel illumination of the previous frame to form the final illumination, which serves as the indirect illumination of the current frame;
a global illumination acquisition unit for directly adding the diffuse indirect illumination calculated in the overall Lumen process and the rough specular indirect illumination calculated in the reflection step to the direct illumination result calculated in the rendering process to form a global illumination result.
The apparatus further comprises:
a noise reduction unit for performing noise reduction processing on the shadow map, wherein: the data format is re-encoded on the basis of a spatiotemporal variance-guided filtering (SVGF) algorithm, so that 4 light sources are denoised simultaneously in one noise reduction calculation; the 4 textures used for transferring data are divided into two groups, each group creating textures in the RGBA32UINT and RGBA8 data formats; each single channel of each texture stores the information used for the noise reduction calculation of one light source, namely the blur radius, transmission distance, ray miss count and sample count, which are sampled in the noise reduction calculation; and the 4 intermediate textures are used to pass data back and forth among a plurality of calculation steps, the output data of each step being the input data of the next step, so that the 4 intermediate textures are alternated back and forth to obtain the noise reduction result.
The apparatus further comprises:
a partitioned chip buffer area, on which the whole rendering process from the basic rendering stage to the illumination stage is completed in one rendering path;
wherein the direct illumination calculation unit is configured to first render the geometric buffer (G-Buffer) through a basic rendering path on the block chip buffer area, drawing all scene-related basic geometric information in the current screen so that the subsequent rendering process can sample it, and to copy selected necessary data of the basic rendering path from the block chip buffer area to the system buffer area, wherein the data comprise inherent color, normal, metalness, roughness, reflectivity and depth information.
By adopting the technical scheme of the invention, a real-time global illumination implementation scheme for a mobile terminal is provided. On the block buffer area, the geometric buffer (G-Buffer) of the Lumen scene (LumenScene) is generated according to the shadow map, and direct illumination is calculated in the surface buffer (Surface Cache). The accumulated result of multi-frame light bounces is obtained from the voxel illumination (Voxel Lighting) of the previous frame to form the final illumination (Final Lighting), which serves as the indirect illumination of the current frame. The diffuse indirect illumination calculated in the overall Lumen process and the rough specular indirect illumination calculated in the reflection step are then added directly to the direct illumination result calculated in the rendering process to form the global illumination result.
In this scheme, the segmented chip buffer area is fully utilized for rendering and the composition order of the Lumen indirect illumination is changed, so that the main rendering flow takes place in the segmented chip buffer area as far as possible. Higher performance is thereby obtained, and an efficient global illumination scheme for the mobile terminal is realized.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
Global illumination (Global Illumination, GI) refers to a rendering technique that considers both direct illumination from a light source in a scene and indirect illumination after reflection by other objects in the scene.
Lumen is a real-time global illumination and reflection system that can provide full-dynamic indirect illumination and reflection effects for scenes. The Lumen system greatly improves global illumination because it more accurately simulates the interaction of light sources with objects, and the transfer of light sources and reflections between objects.
The partitioned chip buffer (tile memory, also referred to herein as the block or segmented chip buffer) is an on-chip cache located close to the GPU cores. Because of this proximity, accesses to it are very efficient and consume little power, so high-performance rendering on mobile devices requires making full use of tile memory. Because tile memory is expensive, it is designed to be small, large enough to hold only one tile of approximately 32x32 screen pixels. The GPU of a mobile device therefore works by dividing the whole screen into a plurality of tiles and rendering the tiles one by one. For the same reason, tile memory cannot be randomly sampled as a complete texture.
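As a rough illustration of the tile division described above, the following Python sketch computes how many tiles are needed to cover a screen and which tile a given pixel falls in. The 32x32 tile size is the approximate figure given above; the function names and the 1920x1080 resolution are assumptions chosen for the example, and actual tile sizes vary between GPUs.

```python
import math


def tile_grid(width, height, tile_size=32):
    """Return (columns, rows, total) of the tiles needed to cover the screen."""
    cols = math.ceil(width / tile_size)
    rows = math.ceil(height / tile_size)
    return cols, rows, cols * rows


def tile_of_pixel(x, y, tile_size=32):
    """Return the (column, row) index of the tile that contains pixel (x, y)."""
    return x // tile_size, y // tile_size


# Hypothetical 1920x1080 screen: the last row of tiles is only partly covered.
cols, rows, total = tile_grid(1920, 1080)
print(cols, rows, total)
```

Because only one tile is resident in tile memory at a time, a shader working on one tile has no access to pixels that belong to another tile, which is the restriction on random sampling noted above.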
The segmented chip buffer is a block of cache located on the chip of the mobile device, and memory access within it is very efficient. It also has a limitation of its own: it cannot be randomly sampled as a texture. In order to make full use of this hardware feature of the mobile platform, the scheme of the application works around the random texture sampling problem, executes the whole rendering process entirely on the segmented chip buffer, and only then composites the result with the indirect illumination result of Lumen. The scheme of the application can therefore achieve higher execution efficiency on the mobile platform.
Fig. 1 is a flowchart of a method for implementing real-time global illumination of a mobile terminal in an embodiment of the invention. As shown in fig. 1, the mobile terminal real-time global illumination implementation method comprises the following steps:
Step 101, rendering a shadow map of the picture.
In the embodiment of the invention, shadows are calculated through ray tracing. The depth of the whole scene is rendered first, and whether the currently rendered pixel is occluded is then determined by comparing depths, so as to obtain the shadow map. The quality of the pixels obtained in the ray tracing step depends on the number of ray samples emitted. To keep run-time performance high, usually only one ray is emitted per pixel, so the number of samples is insufficient and noise may appear; noise reduction is therefore performed. After a high-quality denoised image is obtained through an existing noise reduction algorithm, the shading process is carried out entirely on the block chip buffer area.
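The depth comparison mentioned above can be sketched as follows. This is a minimal CPU-side illustration in Python, not the shader code of the embodiment: the nested-list depth map layout, the function names and the bias value are assumptions made for the example.

```python
def is_shadowed(stored_depth, fragment_depth, bias=0.005):
    """A fragment is occluded if something nearer to the light was recorded.

    The small bias is an assumed value used to avoid self-shadowing caused
    by limited depth precision.
    """
    return fragment_depth - bias > stored_depth


def build_shadow_mask(depth_map, fragment_depths, bias=0.005):
    """Compare every fragment depth with the stored scene depth.

    Returns 1.0 where the pixel is occluded and 0.0 where it is lit.
    """
    return [
        [1.0 if is_shadowed(stored, frag, bias) else 0.0
         for stored, frag in zip(stored_row, frag_row)]
        for stored_row, frag_row in zip(depth_map, fragment_depths)
    ]


# Hypothetical 1x2 example: the first fragment lies behind the stored depth.
mask = build_shadow_mask([[0.5, 0.5]], [[0.8, 0.3]])
print(mask)
```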
Step 102, generating, on the block buffer area, the geometric buffer (G-Buffer) of the Lumen scene according to the shadow map, and calculating direct illumination in the surface buffer (Surface Cache).
In the embodiment of the present invention, G-Buffer stands for Geometry Buffer. Unlike ordinary rendering, which writes only colors into a texture, the G-Buffer is a set of buffers, that is, textures, containing geometric and material information such as color, normal, reflectivity, metalness and roughness.
In the embodiment of the invention, the geometric buffer (G-Buffer) is rendered on the block chip buffer area through the base rendering path (BasePass). The BasePass is the most basic rendering step: it draws all scene-related basic geometric information in the current screen so that the subsequent rendering process can sample it for its calculations.
After the base rendering path (BasePass) is finished, some necessary G-Buffer data are resolved. Resolving here means copying the data produced in the BasePass from the block chip buffer area to the system buffer area, which corresponds to the expansion of the geometric buffer. The geometric data include inherent color, normal, metalness, roughness, reflectivity and depth information. The inherent color, metalness, roughness and reflectivity information is used for shading materials, the normal and reflectivity information takes part in the illumination calculation, and the depth information is used to reconstruct positions in world space. The subsequent final gather step (Final Gather) needs to randomly sample the G-Buffer, but the block chip buffer area cannot be randomly sampled. The necessary data are therefore resolved to textures in the system buffer area, so that the final gather step can sample those textures to obtain the corresponding information.
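To illustrate how resolved depth information can be used to reconstruct a position in world space, the following Python sketch rebuilds a view-space position from a pixel's normalized device coordinates and its depth, and then transforms it into world space. It assumes a linear view-space depth, a symmetric perspective projection and a camera looking down the negative Z axis; the function names and the 3x4 camera-to-world matrix layout are hypothetical and are not taken from the embodiment.

```python
import math


def reconstruct_view_position(ndc_x, ndc_y, linear_depth, fov_y_radians, aspect):
    """Rebuild a view-space position from NDC coordinates in [-1, 1] and depth.

    Assumes linear_depth is the distance along the view axis.
    """
    t = math.tan(fov_y_radians / 2.0)
    return (ndc_x * t * aspect * linear_depth,
            ndc_y * t * linear_depth,
            -linear_depth)


def view_to_world(position, camera_to_world):
    """Apply a 3x4 row-major camera-to-world matrix (rotation | translation)."""
    x, y, z = position
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in camera_to_world)


# Hypothetical camera at (1, 2, 3) with no rotation.
camera_to_world = [(1.0, 0.0, 0.0, 1.0),
                   (0.0, 1.0, 0.0, 2.0),
                   (0.0, 0.0, 1.0, 3.0)]
view_pos = reconstruct_view_position(0.0, 0.0, 5.0, math.pi / 2, 16 / 9)
print(view_to_world(view_pos, camera_to_world))
```

Storing only depth and rebuilding the position in this way avoids resolving a full position texture to the system buffer area.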
Step 103, obtaining the accumulated result of multi-frame light bounces from the voxel illumination (Voxel Lighting) of the previous frame to form the final illumination (Final Lighting), which serves as the indirect illumination of the current frame.
In the embodiment of the present invention, the stages from the base rendering path (BasePass) to the illumination stage (Lighting) are completed entirely on the block chip buffer area, and the whole shading process (Shading) occurs in one rendering path (RenderPass). Additional calculation logic therefore cannot be inserted into the RenderPass, so both the final gather step (Final Gather) and the reflection step (Reflection) of the Lumen flow are calculated after the segmented chip buffer, that is, after the whole direct illumination. Finally, the direct illumination obtained by shading and the indirect illumination are combined as the rendering result of the global illumination.
The size of the voxels can be divided according to actual needs. Each voxel contains material information of the current voxel including, but not limited to, diffuse reflectance (Albedo), roughness (Roughness), metallic, transparency (Alpha), normal, etc.
Step 104, directly adding the calculated diffuse reflection indirect illumination in the whole Lumen process and the calculated indirect illumination of the rough mirror surface in the reflection step with the calculated direct illumination result in the rendering process to form a global illumination result.
In the embodiment of the invention, the rendering pipeline is kept on the block chip buffer area as far as possible so as to obtain higher performance. The composition order can be adjusted in this way because direct illumination and indirect illumination each calculate occlusion correctly: occlusion for direct illumination comes from shadow maps or ray-traced shadows, and occlusion for indirect illumination comes from the distance-field-traced shadows calculated in the Lumen scene illumination, so the two can be correctly added together. If occlusion were not calculated correctly, the result would be too bright. Finally, the result is combined with the indirect illumination of Lumen to obtain the final image.
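The additive composition of step 104 can be sketched per pixel as follows. This is a minimal Python illustration that assumes linear RGB values with occlusion already applied to each term; the function name and the sample values are hypothetical.

```python
def compose_global_illumination(direct, diffuse_indirect, specular_indirect):
    """Add the three RGB contributions channel by channel.

    No extra weighting is applied here because, as described above, each
    term is assumed to carry its own correctly computed occlusion.
    """
    return tuple(d + di + sp
                 for d, di, sp in zip(direct, diffuse_indirect, specular_indirect))


# Hypothetical pixel: direct light, diffuse indirect light, rough specular.
pixel = compose_global_illumination((0.5, 0.25, 0.125),
                                    (0.25, 0.25, 0.25),
                                    (0.125, 0.0, 0.0))
print(pixel)
```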
Fig. 2 is a schematic diagram of the rendering pipeline of one frame of image, in which the Lumen scene part represents the update flow of the entire Lumen scene. First, the geometric buffer (G-Buffer) of the Lumen scene (LumenScene), that is, the surface buffer (Surface Cache), is generated from the mesh card information (MeshCard). Direct illumination is then calculated in the Surface Cache, and the accumulated result of multi-frame light bounces is obtained from the voxel illumination (Voxel Lighting) of the previous frame to form the final illumination (Final Lighting), which serves as the indirect illumination of the current frame. The final illumination of the current frame in turn contributes to the voxel illumination used by the next frame as the multi-frame bounce result. This is the whole Lumen scene illumination (Lumen Scene Lighting) flow.
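The frame-to-frame feedback described above, in which the final illumination of one frame feeds the voxel illumination of the next, can be illustrated with a toy scalar model. The Python sketch below is not the Lumen implementation: it assumes a single surface whose indirect illumination is simply its albedo multiplied by the voxel illumination of the previous frame, whereas a real system gathers that illumination by tracing rays. All names are hypothetical.

```python
def update_scene_lighting(direct, albedo, prev_voxel_lighting):
    """One frame of the feedback loop for a single surface.

    The indirect term reuses the lighting stored in the previous frame,
    so each extra frame adds one more bounce to the accumulated result.
    """
    indirect = albedo * prev_voxel_lighting
    return direct + indirect  # final lighting, stored for the next frame


def simulate(direct, albedo, frames):
    """Run the loop for several frames, starting from unlit voxels."""
    voxel_lighting = 0.0
    history = []
    for _ in range(frames):
        voxel_lighting = update_scene_lighting(direct, albedo, voxel_lighting)
        history.append(voxel_lighting)
    return history


# Each frame adds one bounce: 1.0, then 1.0 + 0.5, then 1.0 + 0.5 + 0.25.
print(simulate(direct=1.0, albedo=0.5, frames=3))
```

In this toy model the sequence approaches direct / (1 - albedo), which shows why multiple bounces can be accumulated across frames without tracing more than one bounce in any single frame.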
The lower box of fig. 2 represents the scene rendering of the mobile-side rendering pipeline. The rendering that occurs in the partitioned chip buffer runs from the base rendering path (BasePass) stage to the illumination (Lighting) stage, and the whole shading process (Shading) occurs in one rendering path (RenderPass). Additional calculation logic therefore cannot be inserted into the RenderPass, so the final gather step (Final Gather) and the reflection step (Reflection) of the Lumen flow are calculated after the partitioned chip buffer, that is, after the whole direct illumination. Finally, the direct illumination obtained by shading and the indirect illumination are combined as the rendering result of the global illumination.
Fig. 3 is a data flow diagram of the whole rendering pipeline. A shadow map is rendered first, with shadows calculated by ray tracing: the depth of the whole scene is rendered, and whether the currently rendered pixel is occluded is then determined by comparing depths, so as to obtain the shadow map. Because the quality of the pixels obtained in the ray tracing step depends on the number of ray samples emitted, noise appears when the number of samples is insufficient, and noise reduction is therefore performed. After a high-quality denoised image is obtained through an existing noise reduction algorithm, the shading process is carried out entirely on the block chip buffer area.
The noise reduction algorithm in the embodiment of the application is SVGF (Spatiotemporal Variance-Guided Filtering). On the basis of this algorithm, the data format is re-encoded so that 4 light sources can be denoised simultaneously in one noise reduction calculation, which improves noise reduction efficiency. The 4 textures used to transfer data are divided into two groups, each group creating textures in the RGBA32UINT and RGBA8 data formats, as shown in FIG. 4. Each single channel of each texture stores the information used for the noise reduction calculation of one light source, namely the blur radius (WorldBluringRadius), transmission distance (TransmissionDistance), ray miss count (MissCount) and sample count (SampleCount), which are sampled in the noise reduction calculation.
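One possible way of placing the data of 4 light sources into the channels of an RGBA32UINT texel and an RGBA8 texel is sketched below in Python. The exact bit layout is defined by FIG. 4 and is not reproduced here, so the layout used in this sketch is purely an assumption: the blur radius and the transmission distance are quantized to 16 bits each and share one 32-bit channel, and the miss count and sample count are clamped to 4 bits each and share one 8-bit channel. The function names and the `max_value` range are also hypothetical.

```python
def pack_light_u32(blur_radius, transmission_distance, max_value=64.0):
    """Pack two floats into one 32-bit channel (assumed 16 + 16 bit layout)."""
    def quantize(value):
        value = min(max(value, 0.0), max_value)
        return round(value / max_value * 0xFFFF)
    return (quantize(blur_radius) << 16) | quantize(transmission_distance)


def unpack_light_u32(word, max_value=64.0):
    """Inverse of pack_light_u32, up to quantization error."""
    radius = ((word >> 16) & 0xFFFF) / 0xFFFF * max_value
    distance = (word & 0xFFFF) / 0xFFFF * max_value
    return radius, distance


def pack_light_u8(miss_count, sample_count):
    """Pack two small counters into one 8-bit channel (assumed 4 + 4 bits)."""
    return (min(miss_count, 15) << 4) | min(sample_count, 15)


def unpack_light_u8(byte):
    return byte >> 4, byte & 0xF


def pack_texels(lights):
    """Place 4 lights into the R, G, B and A channels of the two texels."""
    assert len(lights) == 4, "one light source per channel"
    rgba32uint = tuple(pack_light_u32(l["blur_radius"], l["transmission_distance"])
                       for l in lights)
    rgba8 = tuple(pack_light_u8(l["miss_count"], l["sample_count"]) for l in lights)
    return rgba32uint, rgba8
```

With a layout of this kind, one texel fetch from each texture gives the denoising pass the state of all 4 light sources at once, which is what allows them to be processed in a single calculation.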
With the information re-encoded in this way, a single shadow noise reduction step can process the shadows of 4 light sources simultaneously, which greatly reduces the total cost of the calculation steps for multi-light shadow noise reduction. The 4 intermediate textures are used to pass data back and forth between the calculation steps: the output data of each step is the input data of the next step, so the steps alternate back and forth between the 4 intermediate textures, as shown in fig. 5.
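The alternation between the two texture groups is commonly called ping-ponging and can be sketched as follows. In this Python illustration each group is modeled as a single list of floats rather than a pair of RGBA32UINT and RGBA8 textures, and the 3-tap box blur merely stands in for one denoising step; both simplifications, as well as the function names, are assumptions made for the example.

```python
def box_blur(src, dst):
    """One illustrative filter pass: 3-tap average with clamped edges."""
    n = len(src)
    for i in range(n):
        left = src[max(i - 1, 0)]
        right = src[min(i + 1, n - 1)]
        dst[i] = (left + src[i] + right) / 3.0


def run_ping_pong(initial, passes):
    """Run the passes in order, swapping the read and write groups each time."""
    groups = [list(initial), [0.0] * len(initial)]
    read = 0
    for render_pass in passes:
        write = 1 - read
        render_pass(groups[read], groups[write])  # output of this step ...
        read = write                              # ... is input of the next
    return groups[read]


# Two filter passes reuse the same two groups instead of allocating new ones.
print(run_ping_pong([0.0, 3.0, 0.0], [box_blur, box_blur]))
```

Reusing two groups in this way keeps the number of intermediate textures fixed regardless of how many filter passes are run.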
Fig. 6 is a schematic diagram of the shadow rendering process. The geometric buffer (G-Buffer) is first rendered by the base rendering path (BasePass) on the block chip buffer area. The BasePass is the most basic rendering step: it draws all scene-related basic geometric information in the current screen for use in the calculations of the subsequent rendering flow.
After the base rendering path (BasePass) is finished, some necessary G-Buffer data are resolved, which corresponds to the expanded geometric buffer in fig. 3. The geometric data comprise inherent color, normal, metalness, roughness, reflectivity and depth information. The inherent color, metalness, roughness and reflectivity information is used for shading materials, the normal and reflectivity information takes part in the illumination calculation, and the depth information is used to reconstruct positions in world space. Since the subsequent Lumen final gather step (Final Gather) requires random sampling of the G-Buffer, and the segmented chip buffer cannot be randomly sampled, the necessary data are resolved into the system cache.
In the embodiment of the invention, in order to improve performance on the mobile terminal, the segmented chip buffer is fully utilized for rendering, and the composition order of the Lumen indirect illumination is adjusted so that the composition step comes last. The main rendering flow can thus take place in the segmented chip buffer as far as possible, which yields higher performance. The composition step itself is performed in system memory rather than on the segmented chip buffer, because it makes heavy use of random texture sampling and is implemented with compute shaders.
In order to realize the above flow, the technical scheme of the invention also provides a mobile terminal real-time global illumination realizing device, which comprises:
a shadow rendering unit for rendering a shadow map of the picture;
a direct illumination calculation unit for generating, on the block buffer area, a geometric buffer of the Lumen scene according to the shadow map, and calculating direct illumination in the surface buffer;
an indirect illumination calculation unit for obtaining the accumulated result of multi-frame light bounces from the voxel illumination of the previous frame to form the final illumination, which serves as the indirect illumination of the current frame;
a global illumination acquisition unit for directly adding the diffuse indirect illumination calculated in the overall Lumen process and the rough specular indirect illumination calculated in the reflection step to the direct illumination result calculated in the rendering process to form a global illumination result.
The apparatus further comprises:
a noise reduction unit for performing noise reduction processing on the shadow map, wherein: the data format is re-encoded on the basis of a spatiotemporal variance-guided filtering (SVGF) algorithm, so that 4 light sources are denoised simultaneously in one noise reduction calculation; the 4 textures used for transferring data are divided into two groups, each group creating textures in the RGBA32UINT and RGBA8 data formats; each single channel of each texture stores the information used for the noise reduction calculation of one light source, namely the blur radius, transmission distance, ray miss count and sample count, which are sampled in the noise reduction calculation; and the 4 intermediate textures are used to pass data back and forth among a plurality of calculation steps, the output data of each step being the input data of the next step, so that the 4 intermediate textures are alternated back and forth to obtain the noise reduction result.
The apparatus further comprises:
a partitioned chip buffer area, on which the whole rendering process from the basic rendering stage to the illumination stage is completed in one rendering path;
wherein the direct illumination calculation unit is configured to first render the geometric buffer (G-Buffer) through a basic rendering path on the block chip buffer area, drawing all scene-related basic geometric information in the current screen so that the subsequent rendering process can sample it, and to copy selected necessary data of the basic rendering path from the block chip buffer area to the system buffer area, wherein the data comprise inherent color, normal, metalness, roughness, reflectivity and depth information.
In summary, the technical scheme of the invention provides a real-time global illumination implementation scheme for a mobile terminal. On the block buffer area, a geometric buffer of the Lumen scene is generated according to the shadow map, and direct illumination is calculated in the surface buffer. The accumulated result of multi-frame light bounces is obtained from the voxel illumination of the previous frame to form the final illumination, which serves as the indirect illumination of the current frame. The diffuse indirect illumination calculated in the overall Lumen process and the rough specular indirect illumination calculated in the reflection step are added directly to the direct illumination result calculated in the rendering process to form the global illumination result. In this scheme, the segmented chip buffer area is fully utilized for rendering and the Lumen composition step is placed last, so that the main rendering flow takes place in the segmented chip buffer area as far as possible. Higher performance is thereby obtained, and an efficient global illumination scheme for the mobile terminal is realized.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.