Single-Image Specular Highlight Removal on the GPU

COSTIN-ANTON BOIANGIU, RAZVAN MADALIN OITA, MIHAI ZAHARESCU
Computer Science Department
"Politehnica" University of Bucharest
Splaiul Independentei 313, Sector 6, Bucharest, 060042
ROMANIA
costin.boiangiu@cs.pub.ro, razvan.oita@siggraph.org, mihai.zaharescu@cs.pub.ro

Abstract: Highlights on textured surfaces are linear combinations of diffuse and specular reflection components. It is sometimes necessary to separate these lighting components, or to remove the specular component entirely, especially as a preprocessing step for computer vision. Many methods have been proposed for separating the reflection components. The method presented in this article improves on an existing algorithm by porting it to the GPU in order to optimize its speed, using new features found in DirectX 11. New test results are also offered.

Key-Words: Specular Removal, GPU porting, DirectX11, computer vision

1 Introduction

Separation of diffuse and specular lighting components is an important subject in the field of computer vision, since many algorithms in the field assume perfectly diffuse surfaces. For example, a stereo matching algorithm expects to find similar changes in the positions of the detected regions of interest; a specular reflection, however, moves over the surface instead of staying attached to the same part of the object. Segmentation algorithms are strongly influenced by sharp specular highlights: a specularly lit object will manifest as two very different regions. Object tracking can also be disturbed by the sudden appearance of a strong specular highlight on an object that appeared constantly lit for a number of frames. 3D object reconstruction from three images of the same scene lit from different angles is easy to achieve by calculating the normals from the diffuse light intensities, but specular highlights again disturb these simple lighting calculations. In short, any image-based classification is influenced by variations in lighting [7].

In the real world almost all dielectrics exhibit specular highlights (except those with extremely rough surfaces or a low Fresnel reflectance at normal incidence, like chalk), while metals exhibit only specular reflection and no diffuse contribution. It is therefore important to develop algorithms focused on removing specular highlights from textured surfaces. The algorithm presented in this article improves on an existing algorithm by Tan and Ikeuchi [1] by adapting their method to run on a modern GPU. This is done using the newest features of the DirectX 11 API, which allow random read and write access to buffers containing various types of data.

2 Related Work

In computer graphics, analytical models are used to approximate the intensity distribution of lighting components. Diffuse reflections approximately follow Lambert's law [2]: their intensity is determined by the angle between the surface normal at a point and the vector toward the light source. Specular reflections can be modeled with a micro-facet model such as the Torrance-Sparrow model [3], which treats a surface as a collection of tiny perfect mirrors whose normals follow a normal distribution function. There are many existing algorithms based on reflectance models that separate reflection components, but most of them require additional images besides the original one (e.g., [6]).
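For reference, the two models can be written compactly as follows; this is a standard-form sketch consistent with [2], [3] and the dichromatic model of [4], not the exact notation of those papers ($\mathbf{n}$ is the surface normal, $\mathbf{l}$ the light direction, $w_d$ and $w_s$ the geometry-dependent weights):

$$I_d \;\propto\; k_d \,\max(0,\ \mathbf{n}\cdot\mathbf{l}) \qquad\text{(Lambert's cosine law)}$$

$$\mathbf{I}(x) \;=\; w_d(x)\,\boldsymbol{\Lambda}(x) \;+\; w_s(x)\,\boldsymbol{\Gamma} \qquad\text{(dichromatic reflection model)}$$

Here $\boldsymbol{\Lambda}(x)$ is the diffuse chromaticity, which depends on the surface color, and $\boldsymbol{\Gamma}$ is the illumination chromaticity, which is constant over the image once the illumination color is normalized.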
Most methods need additional images because the general case of retrieving two intrinsic images from a single one is ill-posed: we have two unknowns and a single equation. The ability to reconstruct those two images is more or less the result of an estimation process, but the helping hand comes from the fact that a color image contains not a single channel but three different color channels. The physical properties of the specular light differ from those of the diffuse light. Two examples are polarization and color: the specularly reflected light is strongly polarized, which means it can be filtered out with a polarizing filter; this technique is used to test the effectiveness of software-based solutions. Most important for us, the specular light bounces off the surface as soon as it touches it, so it does not diffuse through the object and is not affected by the object's color. Thus, the specular reflection has the color of the light source, while the diffuse light, which penetrated the object more deeply and was scattered by it, carries a mixture of the light source color and the object's color.

In their paper [1], Tan and Ikeuchi introduced an algorithm based on Shafer's dichromatic reflection model (described in [4]). They used a single input image and normalized the illumination color using its chromaticity, obtaining an image with a pure white specular component. By shifting the intensity and maximum chromaticities of pixels non-linearly while retaining their hue, a specular-free image was generated. This specular-free image has diffuse geometry identical to the normalized input image, but its surface colors are different. To verify whether a pixel is diffuse, they used intensity logarithmic differentiation on both the normalized image and the specular-free image. This diffuse-pixel check is used as the termination condition inside an iterative algorithm that removes specular components step by step until no specular reflections remain in the image. These operations are local, sampling at most two neighboring pixels.

Most of the presented methods cannot be used in real time. There are, however, implementations that process an image in less than a second, e.g., [5]. The algorithm we chose to implement also has the drawback of being too slow.

3 GPU Processing Overview

In the past years, processing on the GPU has become widely adopted. The GPU has a very different architecture from the CPU, one that is well suited to processing images. Image processing usually means applying the same operations over the entire surface of the image: on every pixel, or on multiple local regions, overlapped or tiled. Modern GPUs have multiple processing units (SIMD cores), the equivalent of a CPU core. Each processing unit contains multiple processing elements (shader cores), and each shader core can operate on vectors of multiple elements. Quantitatively, the total number of theoretical operations per cycle is on the order of thousands, whereas on a modern PC CPU it is around four. Another advantage is memory speed. Even though accessing GPU RAM takes orders of magnitude longer than accessing the GPU cache (as on the CPU), the GPU takes advantage of its ability to run millions of threads seemingly in parallel: when a group of threads is waiting for a response from the slower memory, a new group of threads is started with very little overhead.
This is like processing a 36x36 tile: while waiting for the data to arrive, a new tile is started on the same processors and the old threads are put on hold. The problem is that mapping the desired problem onto the card's architecture is seldom straightforward. From the thousands of threads that could potentially run in parallel, we may end up with hundreds or tens. Even so, this still gives a two-orders-of-magnitude speedup over the CPU for simple image processing. For complex problems the speedup can become sub-unitary, as resources are wasted and memory transfer is slow. This paper tries to map a problem onto a GPU architecture in order to obtain a real speedup by massively parallelizing its operations.

4 The GPU Algorithm

The algorithm we decided to port to the GPU solves a specularity removal problem, because this is a widely used (and sometimes necessary) preprocessing phase for many computer vision algorithms. We chose an algorithm that works on single images, so that it can be integrated easily into existing applications; that is robust, so it does not produce unrecognizable results when the input data does not meet the requirements; and, most importantly, that can be split into multiple individual tasks, so that it can benefit from the highly parallel structure of the GPU. We went with Tan and Ikeuchi's [1] proposed solution. The authors started from the reflection model containing both the specular and diffuse spectral functions and energy diffusion, and rewrote it in order to find distinguishable characteristics between the two illumination types. The entire technique can be split into two processing approaches:

- A single-pixel processing step that plots the colors in a different coordinate system in order to estimate the amount of diffuse and specular components. This phase is perfect for parallelization.
- A phase that produces the final specular-free result by comparing the amounts of specular reflection of neighboring pixels. This phase is not as easy to port to the GPU, because neighboring elements need to be accessed simultaneously, and because the processing is done on just one of the neighboring pixels, without knowing in advance which one needs to have its specular value lowered.

Another limiting factor is that the algorithm is iterative, which imposes a strict parallelization limit, but we hope that by speeding up each iteration enough, the entire algorithm gains enough speed. Details regarding the intricacies of the original algorithm can be found in the original paper [1]; some important points are emphasized in the following paragraphs. Advanced applications of GPGPU-based parallelization are described in [5, 6].

All the steps of the algorithm are implemented using pixel shaders (Shader Model 5.0) together with the newest compute features of DirectX 11. Image data is held in a structured buffer. DirectX 11 allows the creation of buffers that can be bound as output to the pixel shader or compute shader stages of the pipeline. This means that a pixel shader can write to a buffer, addressing its contents exactly as one would use an array in C++. A structured buffer contains elements of custom-defined structures; in our case each element contains the color data and a flag indicating the type of pixel (specular, diffuse, noise, etc.). To obtain conflict-free random access, the project uses the new DirectX 11 Unordered Access View (UAV). The notion of chromaticity is important: it is essentially a normalized RGB value.
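Concretely, the buffer element and the chromaticity helpers used in the following passes could look like this in HLSL (a minimal sketch; the structure layout, register slots, and names are our assumptions, not code from the paper):

    // Element of the image structured buffer: color plus a per-pixel
    // flag (diffuse, specular-horizontal, specular-vertical, noise, ...).
    struct PixelData
    {
        float3 color; // normalized RGB color of the pixel
        uint   flag;  // pixel type used by the iterative passes
    };

    // Bound as an Unordered Access View, so any shader thread can
    // read and write arbitrary elements, like a C++ array.
    RWStructuredBuffer<PixelData> gImage : register(u1);

    // Chromaticity: the RGB value normalized by the sum of its components.
    float3 Chromaticity(float3 c)
    {
        return c / max(c.r + c.g + c.b, 1e-6f);
    }

    // Maximum chromaticity: the largest of the three normalized components.
    float MaxChromaticity(float3 c)
    {
        return max(c.r, max(c.g, c.b)) / max(c.r + c.g + c.b, 1e-6f);
    }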
Maximum chromaticity is the normalized maximum color component (R, G, or B). A new space is introduced, called the maximum chromaticity intensity space, where the maximum chromaticity is plotted on the horizontal axis and the maximum intensity of the color components on the vertical axis. In this space, specular pixels' chromaticities are lower than diffuse pixels' chromaticities. Thus, converting a specular pixel into a diffuse pixel amounts to shifting its chromaticity with respect to a certain larger diffuse chromaticity. This is called the specular-to-diffuse mechanism, and it is a one-pixel operation used both to generate the specular-free image and to reduce the specularity of the input image.

The basic idea of the method is illustrated in figure 1. Given a normalized input image, a specular-free image is first generated as mentioned above. Based on these two images, a diffuse verification process is run; it verifies whether the input image contains only diffuse pixels. If it does, the process terminates. Otherwise, specular reduction is applied, decreasing the intensity of the specular pixels until they become diffuse pixels. Specularity reduction and diffuse verification are repeated iteratively until there is no more specularity in the input image.

Fig. 1 Processing flow. Image from the original paper.

4.1 Specular-Free Image

The specular-free image is generated on the GPU using a pixel shader. The operations used by the specular-to-diffuse mechanism map perfectly to the HLSL functions used to manipulate pixel data. The image is rendered to a texture that is then bound to the following passes as a Shader Resource View.

4.2 Pixel Flag Initialization

This pass implements the diffuse verification process. It runs a pixel shader that samples the normalized input image by loading pixel data from the image buffer UAV and the specular-free image render-target texture. Two neighbors are also sampled from each image: the vertical neighbor below and the horizontal neighbor to the right of the current pixel. To determine the type of pixel and initialize the flag accordingly, intensity logarithmic differentiation is used. A pixel can represent a specular highlight running in the horizontal or vertical direction, or it can be a diffuse pixel. The flags are set for the current pixel by writing to the structured buffer UAV.

This shader also counts the number of specular pixels. This number is used as the termination condition for the loop that runs the specular reduction iterations. It is implemented with a single-element buffer bound as a UAV to the pixel shader output: whenever a pixel is flagged as specular, an atomic increment is performed on this element. The buffer is then copied to a second buffer created as a staging resource, which can be memory-mapped and read by the CPU code, as sketched below.
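A sketch of this counting pattern, reusing the names from the earlier sketch (the flag test stands in for the intensity logarithmic differentiation check; all names are hypothetical):

    // Single-element buffer holding the number of pixels still
    // flagged as specular in the current iteration.
    RWStructuredBuffer<uint> gSpecularCount : register(u2);

    static const uint FLAG_DIFFUSE = 0;

    void CountIfSpecular(uint pixelIndex)
    {
        if (gImage[pixelIndex].flag != FLAG_DIFFUSE)
        {
            uint previous;
            // Atomic add is safe even though many pixel shader
            // threads hit the same element concurrently.
            InterlockedAdd(gSpecularCount[0], 1, previous);
        }
    }

On the CPU side the count is copied into a staging buffer (D3D11_USAGE_STAGING) and mapped for reading; the render loop stops issuing reduction passes once the count reaches zero.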
4.3 Specular Reduction

This pass runs two pixel shaders. Since a given pixel can apply the specular-to-diffuse mechanism either on itself or on one of its neighbors, and pixel shader code runs in parallel on many threads, we first need to determine which pixels should execute code. The first shader does this: it checks whether a pixel is "specular" and then, using data from its neighbors, determines which of the pixels needs to be shifted with respect to the other (based on their maximum chromaticities). The pixel is then flagged as modifying either itself or one of its neighbors. The second shader reads these flags and runs the specular-to-diffuse code only on the pixels that are flagged for execution.

If everything were run inside a single shader, a pixel thread would not know whether it had already been modified by a neighbor (because all threads run in parallel), so read-after-write hazards could appear. This generates undefined behavior; the reduction process usually still converges to a mostly correct result, because the specular-to-diffuse mechanism generates spatially coherent results, but if the algorithm is run in its entirety during a single frame, the visual result varies slightly from frame to frame. This is why two passes are necessary. The rendering loop was written such that one iteration of specular reduction is executed per frame, which allows one to view the reduction over multiple frames and clearly see how the specular highlights are removed. A rough sketch of the two passes is given below.
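The two passes could be sketched roughly as follows, again reusing the names from the earlier sketches (the flag encoding, the pairing logic, and the SpecularToDiffuse placeholder are hypothetical simplifications of the operations in [1], not the paper's actual code):

    // Flag values for the reduction decision (hypothetical encoding).
    static const uint FLAG_APPLY_SELF  = 1;
    static const uint FLAG_APPLY_RIGHT = 2;

    // Placeholder for the one-pixel specular-to-diffuse shift of [1]:
    // moves the color toward the given diffuse maximum chromaticity.
    float3 SpecularToDiffuse(float3 c, float diffuseMaxChroma)
    {
        // ... chromaticity shift from the original paper goes here ...
        return c;
    }

    // Pass 1: decide which pixel of a specular pair must be shifted,
    // based on the maximum chromaticities of the pixel and a neighbor.
    void MarkPass(uint idx, uint neighborIdx)
    {
        if (gImage[idx].flag == FLAG_DIFFUSE)
            return;
        float cSelf     = MaxChromaticity(gImage[idx].color);
        float cNeighbor = MaxChromaticity(gImage[neighborIdx].color);
        // The pixel with the smaller maximum chromaticity is the more
        // specular one and is shifted toward the other.
        gImage[idx].flag = (cSelf < cNeighbor) ? FLAG_APPLY_SELF
                                               : FLAG_APPLY_RIGHT;
    }

    // Pass 2: only flagged pixels run the shift, so no pixel is read
    // and written inside the same dispatch (avoiding the hazards above).
    // The FLAG_APPLY_RIGHT case would be handled symmetrically by the
    // neighbor reading the flag; it is omitted here for brevity.
    void ApplyPass(uint idx, uint neighborIdx)
    {
        if (gImage[idx].flag == FLAG_APPLY_SELF)
        {
            float target = MaxChromaticity(gImage[neighborIdx].color);
            gImage[idx].color = SpecularToDiffuse(gImage[idx].color, target);
        }
    }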
5 Results

The GPU algorithm, with all iterations executed in a single frame, runs in about 250 milliseconds on a Radeon 7950 with 1792 stream processors running at 900 MHz for the fish image in figure 3 (with a resolution of 640x480 pixels); the same processing takes 50 seconds on one i5 CPU core running at 1.9 GHz. The speedup is very large (200:1); however, we want to point out that the CPU code was written only to validate the GPU results, without optimization in mind. Optimized CPU code might decrease the time by an order of magnitude, but would not make it real-time.

The results are presented below and include:

- Dinosaur image (figure 2): a comparison with the original algorithm. Strong JPEG compression artifacts are visible in our result because we started from the compressed image offered in the original PDF paper; they do not appear when images with normal compression ratios are given as input. The tiles observed in our result are most likely caused by using the highly compressed, small-resolution image copied from the authors' paper as input; the following tests do not show these artifacts. Despite this, the specular removal gives similar results.
- Fish image (figure 3): showing the input, the specular-free image constructed during the single-pixel processing phase (which generates a recolored specular-free result), and the final output generated during the iterative phase, which diminishes the specular light at every iteration.
- Various computer-generated and real images (figures 4-6): testing the robustness and the ways in which the algorithm fails. The last test contains numerous grey regions, which the original authors mentioned are not suitable for this algorithm.

Fig. 2 Comparison to the authors' results.

Fig. 3 Fish image: normalized input image; specular-free image; final output image with no specular highlights.

Fig. 4 Results for a computer-generated scene. The black regions are caused by clipping in the original image.

Fig. 5 Results for a real scene. Problems can be seen in the bottom left corner, where an object with small chrominance is visible.

Fig. 6 Results for a real scene containing non-colored objects.

6 Conclusions

In this paper we offer the results of porting an existing specular removal algorithm to the GPU in order to reduce the computation time. Our implementation of the original algorithm works on most images, but it seems to be affected by strong compression noise in small images. One category of images on which it fails is the one containing non-colored objects. The original authors mentioned this issue at the beginning of their paper, and the cause is obvious: the specularity of each pixel is deduced from its color, so if the object has no color, it is assumed to be highly specular. We mentioned at the beginning that we wanted a robust algorithm that does not fail when offered improper input data. Even though grey objects become darker, for most computer vision algorithms this is no worse a problem than the specular light itself; the rest of the objects remain unchanged, so the algorithm can safely be used in the preprocessing phase. Some quality loss is visible in the output because of the minor simplifications made in order to split the algorithm, but the overall gain in speed, from the order of tens of seconds (on our CPU implementation) to less than a second, is important. Finally, this paper shows that there could be a large number of algorithms that were overlooked in the past because of long running times but that could benefit from today's technology and be incorporated into existing applications.

7 Future Work

The specular reduction process can be implemented as a single pass by using a compute shader. Immediately after marking the pixels that change with specific flags, a barrier is placed for synchronization; after that, only the flagged pixels execute code. It is also recommended to split the image into tiles, each tile representing a thread group. The tiles need to overlap each other with 1-pixel borders to ensure that no tears are visible. Even higher speeds can be achieved by utilizing multiple devices, each working on a different image.

REFERENCES

[1] R. T. Tan, K. Ikeuchi, "Separating Reflection Components of Textured Surfaces Using a Single Image", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 2, pp. 178-193, Feb. 2005.
[2] J. H. Lambert, "Photometria Sive de Mensura de Gratibus Luminis, Colorum et Umbrae". Augsburg, Germany: Eberhard Klett, 1760.
[3] K. E. Torrance, E. M. Sparrow, "Theory for Off-Specular Reflection from Roughened Surfaces", J. Opt. Soc. Am., vol. 57, pp. 1105-1114, 1967.
[4] S. Shafer, "Using Color to Separate Reflection Components", Color Research and Applications, vol. 10, pp. 210-218, 1985.
[5] Q. Yang, S. Wang, N. Ahuja, "Real-Time Specular Highlight Removal Using Bilateral Filtering", Proceedings of the 11th European Conference on Computer Vision: Part IV (ECCV'10), Springer-Verlag, Berlin, Heidelberg, pp. 87-100, 2010.
[6] A. Artusi, F. Banterle, D. Chetverikov, "A Survey of Specularity Removal Methods", Computer Graphics Forum, vol. 30, no. 8, pp. 2208-2230, 2011.
[7] S. N. Tica, C. A. Boiangiu, A. Tigora, "Automatic Coin Classification", International Journal of Computers, vol. 8, pp. 82-89, 2014.