
CN109615591A - A 3D block matching noise reduction method based on GPU parallel acceleration - Google Patents


Info

Publication number
CN109615591A
CN109615591A CN201811426126.XA CN201811426126A CN109615591A CN 109615591 A CN109615591 A CN 109615591A CN 201811426126 A CN201811426126 A CN 201811426126A CN 109615591 A CN109615591 A CN 109615591A
Authority
CN
China
Prior art keywords
dimensional
block
image
noise reduction
thread
Legal status
Pending
Application number
CN201811426126.XA
Other languages
Chinese (zh)
Inventor
韩玉
李磊
闫镔
荣利会
陈健
席晓琦
梁宁宁
孙艳敏
王敬雨
Current Assignee
Dongguan Letter Of Fusion Innovation Research Institute
PLA Information Engineering University
Original Assignee
Dongguan Letter Of Fusion Innovation Research Institute
PLA Information Engineering University
Application filed by Dongguan Letter Of Fusion Innovation Research Institute and PLA Information Engineering University
Priority to CN201811426126.XA
Publication of CN109615591A
Legal status: Pending (current)

Classifications

    • G: Physics
    • G06: Computing; calculating or counting
    • G06T: Image data processing or generation, in general
    • G06T5/00: Image enhancement or restoration
    • G06T5/70: Denoising; smoothing
    • G: Physics
    • G06: Computing; calculating or counting
    • G06T: Image data processing or generation, in general
    • G06T5/00: Image enhancement or restoration
    • G06T5/10: Image enhancement or restoration using non-spatial domain filtering
    • G: Physics
    • G06: Computing; calculating or counting
    • G06T: Image data processing or generation, in general
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20048: Transform domain processing
    • G06T2207/20052: Discrete cosine transform [DCT]

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to the technical field of image processing and discloses a three-dimensional block-matching noise reduction method based on GPU parallel acceleration. The method comprises: performing boundary symmetric-extension preprocessing on the image to be processed; sending the preprocessed image data to the global memory of the GPU; creating a multithreaded grid and accelerating the matching and grouping of similar image blocks using coalesced global-memory access and repeated reuse of shared memory; obtaining first-step denoising estimate data of the three-dimensional similarity matrices with a parallel-accelerated hard-threshold collaborative filtering kernel function; using the first-step estimate as reference, obtaining second-step denoising estimate data with a parallel-accelerated Wiener collaborative filtering kernel function; and removing the extended boundary pixels from the second-step estimate. The invention increases data access speed and reduces the latency of repeated accesses, effectively removes noise from the image, and facilitates real-time noise reduction of large images.

Description

Three-dimensional block-matching noise reduction method based on GPU parallel acceleration
Technical field
The present invention relates to the technical field of image processing, and specifically discloses a three-dimensional block-matching noise reduction method based on GPU parallel acceleration.
Background art
During acquisition and transmission, digital images are affected by the imaging device and the external environment and usually contain a large amount of noise, which degrades image quality. This is especially true in medical applications: CT images acquired at low dose (low tube voltage and current) often contain substantial noise, which degrades image quality and interferes with the physician's clinical diagnosis.
The common three-dimensional block-matching (BM3D) algorithm combines local, non-local, multiscale, sparse and adaptive filtering characteristics and is regarded as the best image denoising algorithm currently available. However, because it relies on collaborative filtering of similar image blocks, the algorithm is highly complex and computationally intensive: processing large CT images takes a long time, and the resulting low efficiency cannot meet practical requirements.
A method that solves the above problems is therefore needed.
Summary of the invention
To overcome the shortcomings and defects of the prior art, the object of the present invention is to provide a three-dimensional block-matching noise reduction method based on GPU parallel acceleration.
To achieve this object, the present invention adopts the following scheme.
A three-dimensional block-matching noise reduction method based on GPU parallel acceleration, comprising:
performing boundary symmetric-extension preprocessing on the image to be processed on the CPU side;
sending the preprocessed image data from the CPU host to the global memory of the GPU;
creating a multithreaded grid, and performing parallel-accelerated matching and grouping of similar image blocks using coalesced global-memory access and an acceleration strategy that reuses shared memory many times;
obtaining first-step denoising estimate data of the three-dimensional similarity matrices using a parallel acceleration strategy for the hard-threshold collaborative filtering kernel function;
using the first-step denoising estimate data as reference, obtaining second-step denoising estimate data using a parallel acceleration strategy for the Wiener collaborative filtering kernel function;
sending the second-step denoising estimate data from the GPU to the CPU host, and removing the extended boundary pixels to obtain the denoised image.
Further, creating the multithreaded grid comprises:
allocating threads such that the block-matching process of each reference block of the image is one thread block, and the similarity-matching process between each candidate image block in the search window and the reference block is one thread;
selecting reference image blocks successively along the row and column directions with a step of a certain number of pixels, determining the size of the thread grid from the number of reference image blocks in the image, and determining the size of the thread block from the number of image blocks in the search window of a reference image block.
Further, performing parallel-accelerated matching and grouping of similar image blocks using coalesced global-memory access and the shared-memory reuse strategy comprises:
having all threads in the same warp access consecutive units of global memory in the same instruction, so as to obtain a coalesced access pattern;
dividing the search window into 4 partitions of size 32*32 and, within each partition, computing similarity with threads arranged as block (16,16) to obtain the similar blocks that satisfy $d(P_k, Q) = \lVert P_k - Q \rVert_2 / N^2 < \tau_{\text{threshold}}$, where $P_k$ is the reference block and $Q$ a candidate block of size N*N, the distance d between image blocks is defined as the norm of the difference of corresponding elements of the two blocks divided by the block size, and $\tau_{\text{threshold}}$ is a suitably chosen distance threshold;
copying the pixel data in the search window into shared memory block by block in a loop, and setting threadIdx.x < 16 and threadIdx.y < 16;
finding the specified number of most similar image blocks using a parallel minimum-reduction strategy.
Further, finding the specified number of most similar image blocks using the parallel minimum-reduction strategy comprises:
gathering the similar image blocks of the reference image block into a three-dimensional matrix in ascending order of similarity distance;
letting n threads correspond respectively to the n distance values D[n] obtained from the similarity computation;
comparing the value of the i-th thread with the distance value of the (i+n/2)-th thread, placing the smaller value in the left half and the larger value in the right half, so that the left interval is D[0] to D[n/2-1] and the right interval is D[n/2] to D[n-1];
after the parallel comparison, halving the number of comparing threads and repeating the comparison on the distance values in the left half, until, after repeated halving, the left interval is reduced to D[0];
taking D[0] as the minimum of the distance values, advancing the start of the accessed distance array by one position, and repeating the above steps to find the next minimum, until the specified number of image blocks with the smallest distances have been found.
Further, obtaining the first-step denoising estimate data of the three-dimensional similarity matrix using the parallel acceleration strategy for the hard-threshold collaborative filtering kernel function comprises:
accelerating by instruction-mix optimization, integrating the 3D forward transform of the three-dimensional similarity matrix, hard-threshold filtering, the 3D inverse transform and the weighted estimation into one hard-threshold collaborative filtering kernel function; the 3D forward transform consists of a 2D biorthogonal spline wavelet forward transform followed by a 1D Walsh-Hadamard transform, and the 3D inverse transform consists of a 1D Walsh-Hadamard transform followed by a 2D biorthogonal spline wavelet inverse transform;
selecting a certain number of similar image blocks according to the size of the reference image block, keeping the thread grid unchanged and setting the thread block size;
copying the data of the selected similar image blocks from global memory into shared memory using coalesced global-memory access, to form the three-dimensional similarity matrix;
applying the 2D biorthogonal spline wavelet forward transform and the 1D Walsh-Hadamard transform to the three-dimensional matrix, performing hard-threshold filtering in the transform domain, and then applying the 1D Walsh-Hadamard transform and the 2D biorthogonal spline wavelet inverse transform to obtain the first-step denoising estimate data of the image blocks;
computing a weighted average of the gray values in the hard-threshold-filtered image blocks, assigning the weighted average to the individual pixels of the image block, and introducing Kaiser window coefficients into the weighted average to obtain the first-step denoised image values;
meanwhile, storing the four filter coefficient arrays lpd, hpd, lpr and hpr of the 2D biorthogonal spline wavelet transform applied to the three-dimensional similarity matrix in constant memory, and defining private variables in each thread, held in registers, to store intermediate results.
Further, using the first-step denoising estimate data as reference, obtaining the second-step denoising estimate data using the parallel acceleration strategy for the Wiener collaborative filtering kernel function comprises:
accelerating by instruction-mix optimization, integrating the 3D forward transform of the three-dimensional similarity matrix, Wiener filtering, the 3D inverse transform and the weighted estimation into one Wiener collaborative filtering kernel function; the 3D forward transform consists of a 2D discrete cosine forward transform followed by a 1D Walsh-Hadamard transform, and the 3D inverse transform consists of a 1D Walsh-Hadamard transform followed by a 2D discrete cosine inverse transform;
selecting a certain number of similar image blocks according to the size of the reference image block, keeping the thread grid unchanged and setting the thread block size;
copying the data of the selected similar image blocks from global memory into shared memory using coalesced global-memory access, to form the three-dimensional similarity matrix;
using the first-step denoising estimate data as reference, applying the 2D discrete cosine forward transform and the 1D Walsh-Hadamard transform to the initial three-dimensional similarity matrix, performing Wiener filtering, and applying the 1D Walsh-Hadamard transform and the 2D discrete cosine inverse transform to the Wiener-filtered three-dimensional similarity matrix to obtain the second-step denoising estimate data;
computing a weighted average of the gray values in the Wiener-filtered image blocks, assigning the weighted average to the individual pixels of the image block, and introducing Kaiser window coefficients into the weighted average to obtain the second-step denoised image values;
meanwhile, storing the 2D discrete cosine transform coefficients in private variables defined in each thread, held in registers.
Further, in the boundary symmetric-extension preprocessing, the boundary column pixels on the left and right sides are first symmetrically extended, and then the boundary row pixels on the top and bottom sides are symmetrically extended; the pixel width of the border extension is determined by the radius of the search window.
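The column-then-row extension can be sketched as follows. This is an illustrative Python version, not the patent's CPU code; it assumes "symmetric" means the edge pixel itself is repeated (as in NumPy's mode="symmetric"), which the text does not specify, and the function names are hypothetical.

```python
def symmetric_extend(img, r):
    """Mirror-extend a 2-D image (list of lists) by r pixels on every side.

    Assumption: the edge pixel is repeated in the mirror (NumPy "symmetric").
    """
    h, w = len(img), len(img[0])
    if not 0 <= r <= min(h, w):
        raise ValueError("extension width must be between 0 and the image size")
    # Step 1: extend the left and right boundary columns of every row.
    wide = [row[:r][::-1] + list(row) + row[w - r:][::-1] for row in img]
    # Step 2: extend the top and bottom boundary rows of the widened image.
    top = [list(row) for row in wide[:r][::-1]]
    bottom = [list(row) for row in wide[h - r:][::-1]]
    return top + wide + bottom


def crop_extension(img, r):
    """Remove the r-pixel border added by symmetric_extend (final step)."""
    return [row[r:len(row) - r] for row in img[r:len(img) - r]]
```

In the embodiment r equals the search-window radius (generally 16 pixels), and the crop is applied to the second-step estimate after it returns to the host.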
Beneficial effects of the present invention: a three-dimensional block-matching noise reduction method based on GPU parallel acceleration is provided. Through coalesced access, the data required by each GPU thread block is read from global memory into shared memory in a single pass, and shared memory is reused, which increases data access speed and reduces repeated-access latency, substantially improving the overall performance and computational efficiency of the algorithm. Combined with the hard-threshold and Wiener collaborative filtering kernel functions, the method effectively removes noise from the image and facilitates real-time noise reduction of large images.
Description of the drawings
Fig. 1 is a flow diagram of an embodiment of the present invention.
Fig. 2 is a schematic diagram of the boundary symmetric-extension preprocessing of an image in an embodiment of the present invention.
Fig. 3 is a schematic diagram of the thread grid allocation in an embodiment of the present invention.
Fig. 4 is a schematic diagram of the shared-memory usage mechanism in an embodiment of the present invention.
Fig. 5 is a schematic diagram of the parallel minimum-reduction sort in an embodiment of the present invention.
Fig. 6 is the original CT image of the head phantom used in the present invention.
Fig. 7 is the CT image of the head phantom after denoising by the present invention.
Fig. 8 is the original CT image of the body phantom used in the present invention.
Fig. 9 is the CT image of the body phantom after denoising by the present invention.
Detailed description of the embodiments
To help those skilled in the art understand the invention, it is further described below with reference to embodiments and drawings; the content of the embodiments does not limit the invention.
A three-dimensional block-matching noise reduction method based on GPU parallel acceleration, as shown in Fig. 1, comprises the following.
To prevent CT image boundary pixels from being left without noise reduction because the search would go out of bounds, the image to be processed undergoes boundary symmetric-extension preprocessing. As shown in Fig. 2, before the extension, parameters such as the noise variance, the number of similar blocks and the search window radius must be set; the pixel width of the border extension is determined by the search window radius and is generally 16 pixels.
After the boundary symmetric-extension preprocessing is complete, the extended image data is sent from the host to the global memory of the GPU.
Threads must be allocated when creating the multithreaded grid. As shown in Fig. 3, since the similar-block matching processes of different reference blocks are independent of one another, the block-matching process of each reference block of the image is assigned to one thread block, and the similarity-matching process between each candidate image block in the search window and the reference block is assigned to one thread. Reference image blocks are then selected along the row and column directions of the image with a step of 3 pixels. If the number of reference image blocks in the image is M*N, the thread grid size is (M, N); since the 32*32 candidate image blocks in a reference block's search window are each matched for similarity, the thread block size is fixed at (32, 32).
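The launch configuration described above can be computed as in this Python sketch. It is illustrative only: the function name is hypothetical, the 8*8 reference block size is taken from the hard-threshold step of this embodiment, and the rule that a reference block must lie fully inside the extended image is an assumption, since the text does not say how the last positions are handled.

```python
MAX_THREADS_PER_BLOCK = 1024  # per-block thread limit cited in the text


def launch_config(height, width, block_size=8, step=3, window=32):
    """Grid and block sizes for the block-matching kernel (illustrative).

    One thread block per reference block, one thread per candidate block.
    Assumes reference blocks start every `step` pixels and must lie fully
    inside the (extended) image.
    """
    m = (height - block_size) // step + 1  # reference blocks along a column
    n = (width - block_size) // step + 1   # reference blocks along a row
    if window * window > MAX_THREADS_PER_BLOCK:
        raise ValueError("search window needs more threads than one block allows")
    return (m, n), (window, window)
```

Under these assumptions, a 2048*2048 image extended by 16 pixels per side (2080*2080) gives a 691*691 grid of 1024-thread blocks.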
Next, the matching and grouping of similar image blocks is accelerated in parallel using coalesced global-memory access and repeated reuse of shared memory. As shown in Fig. 4, the best access pattern is obtained when all threads of the same warp access consecutive units of global memory in the same instruction. Since one warp has 32 threads, the search window is divided into 4 partitions of size 32*32, which satisfies the requirement for coalesced global-memory access and brings the global-memory bandwidth as close as possible to its peak. Within each partition, threads arranged as block (16, 16) compute similarities in parallel to obtain similar blocks.
When computing similarity, a detection window of size N*N slides pixel by pixel within a search window of radius R centered on the reference image block $P_k$. The similarity between each candidate block $Q$ selected by the detection window and the reference block $P_k$ is computed, and the similar blocks are gathered to form the three-dimensional similar-block group of $P_k$. The similarity criterion is:
$$d(P_k, Q) = \frac{\lVert P_k - Q \rVert_2}{N^2} < \tau_{\text{threshold}}$$
where the distance d between image blocks is defined as the norm of the difference of corresponding elements of the two blocks divided by the block size, and $\tau_{\text{threshold}}$ is a suitably chosen distance threshold. When $d < \tau_{\text{threshold}}$, the two image blocks are considered similar; otherwise they are not.
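The matching criterion can be illustrated with the sequential Python sketch below, in which each iteration of the two candidate loops corresponds to one GPU thread. The helper names are hypothetical. The distance follows the wording above (norm of the difference divided by the block size); classic BM3D uses the squared norm, so that variant is offered through an optional flag rather than asserted as the patent's choice.

```python
import math


def block_distance(p, q, squared=False):
    """d(P, Q) = ||P - Q||_2 / N^2 for two N*N blocks given as lists of rows."""
    n = len(p)
    ss = sum((a - b) ** 2 for rp, rq in zip(p, q) for a, b in zip(rp, rq))
    return (ss if squared else math.sqrt(ss)) / (n * n)


def match_blocks(img, top, left, n, radius, tau):
    """Return (distance, (row, col)) for every n*n candidate inside the search
    window of the reference block at (top, left) with distance below tau,
    sorted by increasing distance."""
    h, w = len(img), len(img[0])
    ref = [row[left:left + n] for row in img[top:top + n]]
    hits = []
    for r in range(max(0, top - radius), min(h - n, top + radius) + 1):
        for c in range(max(0, left - radius), min(w - n, left + radius) + 1):
            cand = [row[c:c + n] for row in img[r:r + n]]
            d = block_distance(ref, cand)
            if d < tau:  # similar if d < tau_threshold
                hits.append((d, (r, c)))
    hits.sort()
    return hits
```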
After the parallel similarity matching is complete, the block data is copied to shared memory. Each global-memory access has a latency of up to 400 to 600 clock cycles, and pixel data is accessed repeatedly during block matching; the pixel data in the search window is therefore copied into shared memory, which acts as a cache, to minimize the number of global-memory accesses. GPU hardware limits a thread block to at most 1024 threads, so if the pixel data of the 32*32 detection blocks in the search window were copied from global memory to shared memory in one pass, the number of pixels would exceed the maximum number of threads. To solve this, the pixel data in the search window is copied into shared memory in parallel, block by block, in a loop. In addition, to avoid overlapping partitions caused by repeated matching of image blocks, threadIdx.x < 16 and threadIdx.y < 16 are set after the block data has been copied to shared memory.
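One plausible reading of the block-by-block loop copy is sketched below as a Python simulation, assuming a 64*64 search-window tile (four 32*32 partitions) loaded by a 16*16 arrangement of threads. The function name, tile size and thread arrangement are assumptions for illustration, not the patent's kernel. Each thread loops over several elements so that a small thread block can fill a tile larger than itself, and in every pass threads with consecutive x indices read consecutive addresses of one image row, which is the pattern the hardware can coalesce.

```python
def cooperative_tile_load(image, pitch, origin, tile_h, tile_w, block=(16, 16)):
    """Simulate one thread block copying a tile of a row-major image from
    'global memory' (a flat list) into a 2-D 'shared memory' array.

    Thread (tx, ty) copies every element (y, x) with y = ty + k*by and
    x = tx + j*bx.
    """
    bx, by = block
    oy, ox = origin
    shared = [[None] * tile_w for _ in range(tile_h)]
    loads_per_thread = {}
    for ty in range(by):        # on the GPU these two loops are the threads
        for tx in range(bx):    # of one block, running in parallel
            count = 0
            for y in range(ty, tile_h, by):
                for x in range(tx, tile_w, bx):
                    shared[y][x] = image[(oy + y) * pitch + (ox + x)]
                    count += 1
            loads_per_thread[(tx, ty)] = count
    return shared, loads_per_thread
```

With a 64*64 tile and 16*16 threads every thread performs 16 loads and each pixel is read from global memory exactly once; on the GPU a barrier such as __syncthreads() would follow before the tile is used.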
The specified number of most similar image blocks is found using a parallel minimum-reduction strategy. As shown in Fig. 5, a certain number of the most similar image blocks are selected for subsequent processing according to the actual situation, i.e. the similar blocks with the smallest distances must be chosen. The similar image blocks of the reference block are sorted by similarity distance in ascending order and gathered into a three-dimensional matrix. Let n threads correspond to the n distance values D[n] obtained from the similarity computation. The value of the i-th thread is compared with the distance value of the (i+n/2)-th thread, the smaller value is placed in the left half and the larger in the right half, so the left interval is D[0] to D[n/2-1] and the right interval is D[n/2] to D[n-1]. After these n/2 parallel comparisons, the minimum is guaranteed to lie in the left interval. The number of comparing threads is then halved to n/4 and the same comparison is applied to the left interval, shrinking the interval containing the minimum to D[0] to D[n/4-1]. This is repeated until the left interval is reduced to D[0], which is then the minimum of all distance values. After each parallel reduction finds a minimum, the start of the distance array D is advanced by one position and the array length becomes n-1; repeating the reduction in a loop finds the specified number of blocks with the smallest distances. Because each reduction round halves the number of comparing threads, the sorting time is greatly reduced.
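The reduction-based selection can be sketched in Python as follows; each pass of the inner loop stands for one round of parallel threads, and the function names are hypothetical. Two details are assumptions filled in for illustration: comparisons are done by swapping (so no distance is lost and later passes can find the next minimum), and odd-length ranges leave their middle element unpaired. On the GPU each round would need a barrier before the next one.

```python
def reduce_min_to_front(d, start):
    """In-place tree reduction over d[start:]; afterwards d[start] is the minimum.

    Each round, 'thread' i compares d[start+i] with d[start+i+half] and swaps
    so the smaller value is on the left; the active range then shrinks to its
    left part.
    """
    n = len(d) - start
    while n > 1:
        half = (n + 1) // 2          # size of the left part
        for i in range(n // 2):      # one GPU thread per i
            a, b = start + i, start + i + half
            if d[b] < d[a]:
                d[a], d[b] = d[b], d[a]
        n = half


def select_k_smallest(distances, k):
    """Pick the k smallest entries (e.g. (distance, block_id) pairs) by repeated
    reduction, advancing the start of the array by one after each minimum."""
    d = list(distances)
    for start in range(min(k, len(d))):
        reduce_min_to_front(d, start)
    return d[:k]
```

The result comes out in ascending order of distance, matching the order in which the blocks are gathered into the three-dimensional matrix.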
Floating-point instructions, load instructions and branch instructions all consume instruction-processing bandwidth, and the instruction-processing bandwidth of each SM is limited. Instruction-mix optimization is therefore used: the 3D forward transform of the three-dimensional similarity matrix (2D biorthogonal spline wavelet forward transform + 1D Walsh-Hadamard transform), hard-threshold filtering, the 3D inverse transform (1D Walsh-Hadamard transform + 2D biorthogonal spline wavelet inverse transform) and the weighted estimation are integrated into a single hard-threshold collaborative filtering kernel function. This reduces repeated loop loading and unnecessary copy instructions for intermediate variables, saving unnecessary time.
For thread allocation, the reference image block size is 8*8 and hard-threshold collaborative filtering uses 16 similar image blocks, so the thread block size is set to (64, 16) while the thread grid remains unchanged. The selected most similar image block data is likewise copied from global memory to shared memory using coalesced access to form the three-dimensional similarity matrix.
The three-dimensional matrix $\mathbf{Z}_k$ of similar blocks of reference block $P_k$ undergoes a 2D biorthogonal spline wavelet (BIOR) forward transform within each block and a 1D Walsh-Hadamard transform across blocks; hard-threshold filtering is performed in the transform domain, and after filtering the 1D Walsh-Hadamard transform and the 2D BIOR inverse transform give the first-step denoising estimates of all image blocks in the group:
$$\hat{\mathbf{Y}}_k^{\text{ht}} = \mathcal{T}_{3D}^{-1}\Big(\gamma\big(\mathcal{T}_{3D}(\mathbf{Z}_k)\big)\Big)$$
where $\mathcal{T}_{3D}$ denotes the 2D BIOR transform of each image block in the group followed by the 1D Walsh-Hadamard transform along the group (third) dimension, $\mathcal{T}_{3D}^{-1}$ denotes the 1D Walsh-Hadamard transform along the group dimension of the hard-thresholded matrix followed by the 2D BIOR inverse transform of each image block, and $\gamma$ is the hard-threshold operation.
After the 3D forward transform, noise tends to concentrate in the small transform-domain coefficients, while true detail concentrates in the large coefficients. Hard-threshold filtering therefore sets transform-domain coefficients below the threshold to 0 and leaves the others unchanged:
$$\gamma(x) = \begin{cases} x, & |x| \ge \lambda_{3D}\,\sigma \\ 0, & |x| < \lambda_{3D}\,\sigma \end{cases}$$
where x is a transform coefficient of the three-dimensional matrix after the 3D forward transform, $\lambda_{3D}$ is the shrinkage parameter of the hard-threshold filter, and $\sigma$ is the estimated noise standard deviation.
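The inter-block part of the 3D transform and the hard threshold $\gamma$ can be sketched in Python for one "tube" of co-located coefficients across a group (one value per block, group size a power of two such as 16). This is a simplified illustration with hypothetical names: the 2D BIOR transform inside each block is omitted, the orthonormal scaling is an assumption, and the $\lambda_{3D}$ and $\sigma$ values in the usage are purely illustrative.

```python
import math


def fwht(values):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two).
    With the 1/sqrt(n) scaling the transform is its own inverse."""
    a = [float(v) for v in values]
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in a]


def hard_threshold(coeffs, lam, sigma):
    """gamma(x): keep x if |x| >= lam*sigma, else 0. Also returns N_har, the
    number of non-zero coefficients kept (used for the group weight)."""
    t = lam * sigma
    kept = [x if abs(x) >= t else 0.0 for x in coeffs]
    return kept, sum(1 for x in kept if x != 0.0)


def filter_tube(values, lam, sigma):
    """Forward 1-D WHT across the group, hard threshold, inverse 1-D WHT."""
    kept, n_har = hard_threshold(fwht(values), lam, sigma)
    return fwht(kept), n_har
```

For example, filter_tube([1.1, 0.9, 1.0, 1.0], 2.7, 0.5) keeps only the DC coefficient (N_har = 1) and returns four values of 1.0, i.e. the small inter-block fluctuation is treated as noise.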
After hard-threshold collaborative filtering, every pixel of every image block has an estimate. A given pixel i, however, may belong to several image blocks and therefore have several estimates; the estimates from these overlapping blocks are weighted and averaged to obtain the basic denoising estimate of pixel i. The basic weight $w_k^{\text{ht}}$ of group k is determined by the number $N_{\text{har}}^{k}$ of non-zero transform-domain coefficients in the hard-thresholded three-dimensional matrix of that group.
To further reduce edge effects, Kaiser window coefficients $W_{\text{kaiser}}$ are included in the weighted aggregation, so the basic denoising estimate of any pixel i in the image is:
$$\hat{y}^{\text{basic}}(i) = \frac{\sum_{j} w_{k(j)}^{\text{ht}}\, W_{\text{kaiser}}(i - x_j)\, \hat{y}_j(i)}{\sum_{j} w_{k(j)}^{\text{ht}}\, W_{\text{kaiser}}(i - x_j)}$$
where the sums run over all estimated blocks j that contain pixel i, $\hat{y}_j(i)$ is the estimate of pixel i from block j, $x_j$ is the position of block j, and $k(j)$ is the group to which block j belongs.
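The Kaiser-weighted aggregation can be sketched in Python as follows; all names are hypothetical. Two details are assumptions taken from the standard BM3D formulation rather than from this text: the group weight $w^{\text{ht}} = 1/(\sigma^2 N_{\text{har}})$ (or 1 when no coefficient survives) and the Kaiser parameter $\beta = 2.0$ used in the usage. On a GPU, blocks from different groups can write the same pixel, so the accumulation would need atomic additions or a separate pass; the text does not say how this is handled.

```python
import math


def bessel_i0(x, terms=40):
    """Modified Bessel function of the first kind, order 0 (power series)."""
    total = term = 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def kaiser_2d(n, beta):
    """n*n Kaiser window as the outer product of two 1-D Kaiser windows."""
    if n < 2:
        raise ValueError("window length must be at least 2")
    w1 = [bessel_i0(beta * math.sqrt(max(0.0, 1.0 - (2.0 * i / (n - 1) - 1.0) ** 2)))
          / bessel_i0(beta) for i in range(n)]
    return [[a * b for b in w1] for a in w1]


def ht_group_weight(n_har, sigma):
    """Group weight from the count of non-zero coefficients (assumed BM3D form)."""
    return 1.0 / (sigma * sigma * n_har) if n_har >= 1 else 1.0


def aggregate(height, width, estimates, window):
    """Kaiser-weighted average of overlapping block estimates.

    estimates: iterable of (top, left, block, weight) with n*n blocks.
    Pixels covered by no block are returned as None.
    """
    num = [[0.0] * width for _ in range(height)]
    den = [[0.0] * width for _ in range(height)]
    for top, left, block, weight in estimates:
        for y, row in enumerate(block):
            for x, value in enumerate(row):
                c = weight * window[y][x]          # w_k * W_kaiser
                num[top + y][left + x] += c * value
                den[top + y][left + x] += c
    return [[num[y][x] / den[y][x] if den[y][x] > 0 else None
             for x in range(width)] for y in range(height)]
```

For instance, two equally weighted 2*2 estimates of 1.0 and 3.0 that overlap in one column average to 2.0 in the shared pixels and keep their own values elsewhere.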
Meanwhile, when the 2D biorthogonal spline wavelet transform is applied to the image blocks of the three-dimensional similarity matrix, the four wavelet filter coefficient arrays lpd, hpd, lpr and hpr are constants that every thread accesses frequently and repeatedly, so they are stored in constant memory, which is cached, to speed up access and save computation time. Registers and shared memory are located on the GPU chip and are the two fastest memories to access. Because the 2D biorthogonal spline wavelet transform involves iterative computation and needs variables to hold intermediate results, private variables are defined in each thread so that the intermediate results are held in registers, improving the efficiency of the iterative computation.
Using the first-step denoised image as reference, Wiener-filtering denoising estimation is performed on the original noisy image. First, instruction-mix optimization is again used: the 3D forward transform of the three-dimensional similarity matrix (2D discrete cosine forward transform + 1D Walsh-Hadamard transform), Wiener filtering, the 3D inverse transform (1D Walsh-Hadamard transform + 2D discrete cosine inverse transform) and the weighted estimation are integrated into a single Wiener collaborative filtering kernel function.
For thread allocation, Wiener filtering uses 32 similar image blocks, so the thread block size is set to (64, 32) while the thread grid remains unchanged. The data is likewise copied to shared memory with coalesced access to form a new three-dimensional similarity matrix of the basic estimate; at the same time, the three-dimensional similar-block matrix of the original image is obtained from the original noisy image. Both three-dimensional matrices undergo the 2D discrete cosine forward transform (DCT) and the 1D Walsh-Hadamard transform, and the empirical Wiener coefficients are then computed from the 3D forward-transformed matrix of the basic estimate image.
The empirical Wiener coefficients are then used to Wiener-filter the three-dimensional matrix of the original noisy image; after filtering, the inverse transform gives the denoising estimates of all image blocks in the group.
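The overlapping pixel estimates are weighted and averaged on the same principle as in the first step, except that the basic weight $w_k^{\text{wie}}$ of group k is determined by the empirical Wiener coefficients. The second-step denoising estimate of any pixel i in the image is then:
$$\hat{y}^{\text{final}}(i) = \frac{\sum_{j} w_{k(j)}^{\text{wie}}\, W_{\text{kaiser}}(i - x_j)\, \hat{y}_j^{\text{wie}}(i)}{\sum_{j} w_{k(j)}^{\text{wie}}\, W_{\text{kaiser}}(i - x_j)}$$
where the sums run over all Wiener-filtered blocks j that contain pixel i, $\hat{y}_j^{\text{wie}}(i)$ is the estimate of pixel i from block j, $x_j$ is the position of block j, and $k(j)$ is the group to which block j belongs.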
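The Wiener shrinkage of one group can be sketched in Python as below, assuming the 3D forward transform (2D DCT plus 1D Walsh-Hadamard) has already been applied to both groups and the coefficients are flattened into lists. The name is hypothetical, and both formulas are the standard BM3D empirical Wiener form, assumed here because the text only says that the coefficients and the weight are derived from the basic estimate: $W = |B|^2 / (|B|^2 + \sigma^2)$ and $w^{\text{wie}} = 1/(\sigma^2 \lVert W \rVert_2^2)$.

```python
def wiener_shrink(noisy_coeffs, basic_coeffs, sigma):
    """Empirical Wiener filtering of one group in the 3-D transform domain.

    noisy_coeffs: transform coefficients of the group from the noisy image.
    basic_coeffs: same transform of the co-located group from the basic
    (first-step) estimate. Returns the filtered coefficients and group weight.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    s2 = sigma * sigma
    w = [b * b / (b * b + s2) for b in basic_coeffs]       # empirical Wiener coefficients
    filtered = [wi * z for wi, z in zip(w, noisy_coeffs)]   # shrink the noisy spectrum
    energy = sum(wi * wi for wi in w)
    weight = 1.0 / (s2 * energy) if energy > 0 else 1.0
    return filtered, weight
```

Coefficients that are large in the basic estimate pass almost unchanged, while those near zero are suppressed even if the noisy group has energy there.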
Meanwhile, because the transform coefficients of the 2D DCT of an image block involve costly trigonometric computations, they are precomputed and held in registers in private variables defined by each thread. Finally, the second-step denoising estimate data is sent from the GPU to the CPU host, and the extended boundary pixels are removed to obtain the denoised image.
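The benefit of precomputing the DCT coefficients is that the per-block transform then needs only multiply-adds. The Python sketch below shows only the arithmetic, not the register layout: it builds the 8*8 orthonormal DCT-II basis once (the 8*8 block size comes from this embodiment; the orthonormal scaling and the function names are assumptions) and applies it as $C X C^{T}$ and its inverse $C^{T} Y C$.

```python
import math


def dct_matrix(n=8):
    """Orthonormal DCT-II basis C[k][x], computed once (all cos() calls here)."""
    c = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        c.append([scale * math.cos(math.pi * (2 * x + 1) * k / (2 * n)) for x in range(n)])
    return c


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(r) for r in zip(*a)]


def dct2(block, c):
    """2-D DCT of an n*n block: C * X * C^T (only multiply-adds)."""
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs, c):
    """Inverse 2-D DCT: C^T * Y * C, valid because C is orthonormal."""
    return matmul(matmul(transpose(c), coeffs), c)
```

For a constant 8*8 block of ones, dct2 puts all energy in the DC coefficient (value 8.0) and idct2 restores the block.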
A kind of three-dimensional Block- matching noise-reduction method accelerated parallel based on GPU provided by the invention, it is contemplated that the figure of cross-thread As there is repetitions of a large amount of data access, and each of global storage is accessed when having up to 400~600 in block detection The delay in clock period once reads shared memory from global storage using by data required for each GPU thread block, And the strategy of shared memory is reused, to save repeated accesses delay, promote computational efficiency.And data are stored from the overall situation Device is all made of the mode for merging access when reading shared memory, can improve the access speed of data, reach complete as far as possible The peak value of office's bandwidth of memory, can will carry out GPU acceleration based on three-dimensional block matching algorithm, significant increase algorithm overall performance, The processing time is saved, calculating speed improves nearly 80 times than conventional serial algorithm;Hard -threshold is cooperated to cooperate with filter kernel simultaneously Function and wiener cooperate with filter kernel function, have not only effectively removed the noise in image, also help and large scale CT is schemed The real-time noise-reducing of picture is handled.
More specifically, it includes noise original CT image that Fig. 6, which is the head mould that the present invention uses, scanned position is the oral cavity of human body Tooth, Fig. 7 are using the head mould CT image after the method for the present invention denoising;Fig. 8 is that the present invention is to include much noise using body mould Original CT image, Fig. 9 be using the method for the present invention denoising after body mould CT image.Pass through 6,7 and Fig. 8 of comparison diagram, 9 experiment As a result, from can be seen that the method for the present invention has effectively removed the noise in original image, image entirety clarity on improvement of visual effect Preferably, edge detail information is also kept good.From processing speed, original serial algorithm process breadth is 2048*2048 size CT image need 1.5 hours, and the method for the present invention processing identical image time-consuming then only need 69 seconds, speed improve by Nearly 80 times.
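The "nearly 80 times" figure follows directly from the two timings reported above:

$$\frac{1.5\ \text{h} \times 3600\ \text{s/h}}{69\ \text{s}} = \frac{5400\ \text{s}}{69\ \text{s}} \approx 78.3$$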
The above is only a preferred embodiment of the present invention. Those of ordinary skill in the art may, following the idea of the present invention, make changes to the specific implementation and the scope of application, and the content of this specification should not be construed as limiting the present invention.

Claims (7)

1. A three-dimensional block-matching noise reduction method based on GPU parallel acceleration, characterized by comprising: performing boundary symmetric extension preprocessing, on the CPU side, on the image to be processed; sending the preprocessed image data from the CPU host to the global memory of the GPU; creating a thread grid and performing parallel accelerated processing of the matching and grouping of similar image blocks, using coalesced global-memory access and an acceleration strategy in which shared memory is reused over multiple iterations; obtaining first-step noise reduction estimation data for the three-dimensional similarity matrix using a parallel acceleration strategy for a hard-threshold collaborative filtering kernel function; taking the first-step noise reduction estimation data as a reference, obtaining second-step noise reduction estimation data using a parallel acceleration strategy for a Wiener collaborative filtering kernel function; and sending the second-step noise reduction estimation data from the GPU to the CPU host and removing the extended boundary pixels to obtain the denoised image.
2. The three-dimensional block-matching noise reduction method based on GPU parallel acceleration according to claim 1, characterized in that creating the thread grid comprises: allocating threads such that the image-block matching process of each reference block of the image forms one thread block, and the similarity matching between each candidate image block in the search window and the reference block forms one thread; and selecting reference image blocks by advancing a fixed number of pixels at a time along the row and column directions, determining the size of the thread grid from the number of reference image blocks in the image, and determining the size of the thread block from the number of image blocks in the search window of a reference image block.
3. The three-dimensional block-matching noise reduction method based on GPU parallel acceleration according to claim 1, characterized in that performing parallel accelerated processing of the matching and grouping of similar image blocks using coalesced global-memory access and the acceleration strategy of reusing shared memory over multiple iterations comprises: having all threads in the same warp execute the same instruction to access contiguous locations in global memory, so as to obtain coalesced access; dividing the search window into 4 tiles of size 32×32 and, in each tile, performing the similarity computation with threads of block(16,16) to obtain the similar blocks, a candidate block being taken as similar when its distance d to the reference block does not exceed τ_threod, where d is the distance between image blocks, defined as the norm of the difference between corresponding elements of the two image blocks divided by the size of the image block, and τ_threod is a suitably chosen distance threshold; loading the pixel data of the search window into shared memory tile by tile in a loop, with threadIdx.x < 16 and threadIdx.y < 16; and finding a specified number of the most similar image blocks using a parallel minimum-reduction strategy.
4. The three-dimensional block-matching noise reduction method based on GPU parallel acceleration according to claim 3, characterized in that finding the specified number of most similar image blocks using the parallel minimum-reduction strategy comprises: sorting the similar image blocks of the reference image block in ascending order of similarity distance and stacking them into a three-dimensional matrix; launching n threads corresponding respectively to the n distance values D[n] obtained from the similarity computation; comparing the distance value of the i-th thread with that of the (i+n/2)-th thread and placing the smaller value in the left part and the larger value in the right part, so that the left part spans D[0] to D[n/2] and the right part spans D[n/2] to D[n]; after the threads complete this parallel comparison, halving the number of comparing threads and repeating the comparison on the distance values of the left part, until after repeated halving the left part shrinks to D[0]; and taking D[0] as the minimum of the distance values, moving the starting access position of D[n] back by one element and repeating the above steps to find the next minimum, until the specified number of similar image blocks with the smallest distances has been found.
5. The three-dimensional block-matching noise reduction method based on GPU parallel acceleration according to claim 1, characterized in that obtaining the first-step noise reduction estimation data of the three-dimensional similarity matrix using the parallel acceleration strategy for the hard-threshold collaborative filtering kernel function comprises: accelerating through instruction-mix optimization by integrating the three-dimensional forward transform, hard-threshold filtering, three-dimensional inverse transform and weighted estimation of the three-dimensional similarity matrix into the hard-threshold collaborative filtering kernel function, wherein the three-dimensional forward transform consists of a two-dimensional biorthogonal spline wavelet forward transform followed by a one-dimensional Walsh-Hadamard transform, and the three-dimensional inverse transform consists of a one-dimensional Walsh-Hadamard transform followed by a two-dimensional biorthogonal spline wavelet inverse transform; selecting a certain number of similar image blocks according to the size of the reference image block, keeping the thread grid unchanged and setting the size of the thread block; transferring the selected similar image block data from global memory to shared memory using coalesced global-memory access to form the three-dimensional similarity matrix; applying the two-dimensional biorthogonal spline wavelet forward transform and the one-dimensional Walsh-Hadamard transform to the three-dimensional matrix, performing hard-threshold filtering in the transform domain, and then applying the one-dimensional Walsh-Hadamard transform and the two-dimensional biorthogonal spline wavelet inverse transform to obtain the first-step noise reduction estimation data of the image block; computing a weighted average of the gray values in the hard-threshold-filtered image blocks, assigning the weighted average to the individual pixels of the image block, and introducing Kaiser window coefficients into the weighted average for weighting optimization to obtain the first-step denoised image values; and, at the same time, storing the four filter coefficient sets lpd, hpd, lpr and hpr of the two-dimensional biorthogonal spline wavelet transform of the three-dimensional similarity matrix in constant memory, and using registers to define private variables in each thread that store intermediate results.
6. The three-dimensional block-matching noise reduction method based on GPU parallel acceleration according to claim 1, characterized in that taking the first-step noise reduction estimation data as a reference and obtaining the second-step noise reduction estimation data using the parallel acceleration strategy for the Wiener collaborative filtering kernel function comprises: accelerating through instruction-mix optimization by integrating the three-dimensional forward transform, Wiener filtering, three-dimensional inverse transform and weighted estimation of the three-dimensional similarity matrix into the Wiener collaborative filtering kernel function, wherein the three-dimensional forward transform consists of a two-dimensional forward discrete cosine transform followed by a one-dimensional Walsh-Hadamard transform, and the three-dimensional inverse transform consists of a one-dimensional Walsh-Hadamard transform followed by a two-dimensional inverse discrete cosine transform; selecting a certain number of similar image blocks according to the size of the reference image block, keeping the thread grid unchanged and setting the size of the thread block; transferring the selected similar image block data from global memory to shared memory using coalesced global-memory access to form the three-dimensional similarity matrix; taking the first-step noise reduction estimation data as a reference, applying the two-dimensional forward discrete cosine transform and the one-dimensional Walsh-Hadamard transform to the original three-dimensional similarity matrix, performing Wiener filtering, and applying the one-dimensional Walsh-Hadamard transform and the two-dimensional inverse discrete cosine transform to the Wiener-filtered three-dimensional similarity matrix to obtain the second-step noise reduction estimation data; computing a weighted average of the gray values in the Wiener-filtered image blocks, assigning the weighted average to the individual pixels of the image block, and introducing Kaiser window coefficients into the weighted average for weighting optimization to obtain the second-step denoised image values; and, at the same time, using registers to store the two-dimensional discrete cosine transform coefficients in private variables defined in each thread.
7. The three-dimensional block-matching noise reduction method based on GPU parallel acceleration according to claim 1, characterized in that, during the boundary symmetric extension preprocessing, the boundary column pixels on the left and right sides are first symmetrically extended, and then the boundary row pixels on the top and bottom sides are symmetrically extended, the width in pixels of the boundary extension being determined by the radius of the search window.
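The grouping stage described in claims 3, 4 and 7 can be checked on the CPU against a serial reference. The pure-Python sketch below is an illustration under stated assumptions, not the patented GPU kernels: all helper names are hypothetical, the distance d is assumed to be the squared L2 norm of the difference divided by the pixel count (the usual BM3D choice; the claim only says "norm of the difference divided by the size"), and "symmetric" extension is assumed to repeat the edge pixel. In `k_smallest_by_halving`, each iteration of the innermost loop corresponds to one comparing thread in claim 4.

```python
# Serial sketch of three claimed steps (hypothetical helper names):
#   claim 7: symmetric boundary extension, columns first, then rows
#   claim 3: block distance d with threshold tau (d <= tau => similar)
#   claim 4: halving minimum reduction to pick the k most similar blocks
from typing import List, Tuple

Image = List[List[float]]


def symmetric_extend(img: Image, r: int) -> Image:
    """Mirror r pixels across every border, repeating the edge pixel.
    Columns are extended first, then rows. Requires r <= height and width."""
    h, w = len(img), len(img[0])
    if not 0 <= r <= min(h, w):
        raise ValueError("extension width must not exceed the image size")
    cols = [row[:r][::-1] + row + row[w - r:][::-1] for row in img]
    rows = cols[:r][::-1] + cols + cols[h - r:][::-1]
    return [list(row) for row in rows]  # copy so mirrored rows are independent


def patch(img: Image, y: int, x: int, n: int) -> List[float]:
    """Flatten the n x n block whose top-left corner is (y, x)."""
    return [img[y + i][x + j] for i in range(n) for j in range(n)]


def block_distance(a: List[float], b: List[float]) -> float:
    """Squared L2 norm of the difference divided by the pixel count
    (assumed form of d)."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def k_smallest_by_halving(dist: List[float], k: int) -> List[int]:
    """Indices of the k smallest distances, smallest first.

    One round: compare D[start+i] with D[start+i+half], keep the smaller on
    the left, halve the active range, repeat until the minimum sits at
    D[start]; then advance start by one element (claim 4)."""
    d = [(v, i) for i, v in enumerate(dist)]  # remember each block's index
    chosen: List[int] = []
    for start in range(min(k, len(d))):
        m = len(d) - start                    # active range D[start:start+m]
        while m > 1:
            half = (m + 1) // 2               # ceiling, so odd m still works
            for i in range(m - half):         # one GPU thread per comparison
                a, b = start + i, start + i + half
                if d[b] < d[a]:
                    d[a], d[b] = d[b], d[a]   # smaller value moves left
            m = half
        chosen.append(d[start][1])
    return chosen


def group_similar(img: Image, ref_y: int, ref_x: int, n: int, radius: int,
                  tau: float, k: int) -> List[Tuple[int, int]]:
    """Top-left corners of up to k blocks in the search window with d <= tau,
    most similar first. The reference block has d = 0, so it passes any
    tau >= 0."""
    ref = patch(img, ref_y, ref_x, n)
    coords: List[Tuple[int, int]] = []
    dists: List[float] = []
    for y in range(ref_y - radius, ref_y + radius + 1):
        for x in range(ref_x - radius, ref_x + radius + 1):
            if 0 <= y <= len(img) - n and 0 <= x <= len(img[0]) - n:
                dv = block_distance(ref, patch(img, y, x, n))
                if dv <= tau:
                    coords.append((y, x))
                    dists.append(dv)
    return [coords[i] for i in k_smallest_by_halving(dists, k)]


if __name__ == "__main__":
    padded = symmetric_extend([[1, 2, 3], [4, 5, 6]], 1)
    print(padded)  # [[1, 1, 2, 3, 3], [1, 1, 2, 3, 3], [4, 4, 5, 6, 6], [4, 4, 5, 6, 6]]
    print(k_smallest_by_halving([5.0, 1.0, 4.0, 2.0, 3.0, 0.5, 6.0], 3))  # [5, 1, 3]
```

Because the comparison swaps values instead of discarding the larger one, no distance is lost between rounds, which is what lets claim 4 simply advance the start position to find the next minimum. On a GPU, each halving pass would need a barrier such as `__syncthreads()` before the thread count is halved; the serial loops give the same ordering implicitly.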
CN201811426126.XA 2018-11-27 2018-11-27 A 3D block matching noise reduction method based on GPU parallel acceleration Pending CN109615591A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811426126.XA CN109615591A (en) 2018-11-27 2018-11-27 A 3D block matching noise reduction method based on GPU parallel acceleration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811426126.XA CN109615591A (en) 2018-11-27 2018-11-27 A 3D block matching noise reduction method based on GPU parallel acceleration

Publications (1)

Publication Number Publication Date
CN109615591A (en) 2019-04-12

Family

ID=66005289

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811426126.XA Pending CN109615591A (en) 2018-11-27 2018-11-27 A 3D block matching noise reduction method based on GPU parallel acceleration

Country Status (1)

Country Link
CN (1) CN109615591A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110115806A1 (en) * 2009-11-19 2011-05-19 Rogers Douglas H High-compression texture mapping
CN102547289A (en) * 2012-01-17 2012-07-04 西安电子科技大学 Fast motion estimation method realized based on GPU (Graphics Processing Unit) parallel
CN107369169A (en) * 2017-06-08 2017-11-21 温州大学 A GPU Accelerated Approximate Most Similar Image Patch Matching Method Based on Orientation Alignment and Matching Pass

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
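Yuan Longjie (袁龙杰): "Research on GPU-based parallel algorithms for 3D block-matching denoising" (基于GPU的三维块匹配去噪并行算法研究), NSTL National Science and Technology Library (NSTL国家科技图书文献中心) *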

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111179201A (en) * 2019-12-31 2020-05-19 广州市百果园信息技术有限公司 Video denoising method and electronic equipment
CN111179201B (en) * 2019-12-31 2023-04-11 广州市百果园信息技术有限公司 Video denoising method and electronic equipment
CN117074443A (en) * 2023-10-17 2023-11-17 广东天信电力工程检测有限公司 X-ray nondestructive testing robot for power transmission line

Similar Documents

Publication Publication Date Title
Gómez-Ríos et al. Towards highly accurate coral texture images classification using deep convolutional neural networks and data augmentation
Darbon et al. Fast nonlocal filtering applied to electron cryomicroscopy
CN107292851B (en) A BM3D Image Noise Reduction Method Based on Pseudo-3D Transformation
CN108399611B (en) Multi-focus image fusion method based on gradient regularization
Liu et al. True wide convolutional neural network for image denoising
CN114463176B (en) Image super-resolution reconstruction method based on improved ESRGAN
CN110866870B (en) A super-resolution processing method for medical images with arbitrary multiple magnification
Li et al. High throughput hardware architecture for accurate semi-global matching
Feng et al. FasTFit: A fast T-spline fitting algorithm
CN111105452A (en) High-low resolution fusion stereo matching method based on binocular vision
CN109615591A (en) A 3D block matching noise reduction method based on GPU parallel acceleration
Rosin Training cellular automata for image processing
Dudhane et al. Dynamic pre-training: Towards efficient and scalable all-in-one image restoration
CN112102217B (en) Method and system for quickly fusing visible light image and infrared image
KR102064581B1 (en) Apparatus and Method for Interpolating Image Autoregressive
CN114742873B (en) A three-dimensional reconstruction method, device and medium based on adaptive network
Peng et al. Low-light image enhancement based on FPGA
Prasad et al. Image Denoising using CNN in Deep Learning
CN1212586C (en) Motion Estimation Method for Medical Sequence Images Based on Generalized Fuzzy Gradient Vector Flow Field
Meligy Modified fast gray level grouping approach for enhancing image contrast
CN117576180B (en) Multi-view depth estimation method and application based on adaptive multi-scale window
Wang et al. Accelerating block-matching and 3d filtering-based image denoising algorithm on fpgas
Chen et al. Infrared and visible image fusion with deep wavelet-dense network
Zhang et al. Non-blind image deconvolution using deep dual-pathway rectifier neural network
Wang et al. Augmenting C. elegans microscopic dataset for accelerated pattern recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190412