CN109307865A

CN109307865A - A CUDA-based millimeter-wave radar RMA imaging method

Info

Publication number: CN109307865A
Application number: CN201811127360.2A
Authority: CN
Inventors: 邓彬; 杨啸宇; 高敬坤; 王宏强; 秦玉亮
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2018-09-27
Filing date: 2018-09-27
Publication date: 2019-02-05

Abstract

The invention belongs to the technical field of radar imaging, in particular to a CUDA-based millimeter-wave radar RMA imaging method. The method comprises the steps of: (S2) respectively performing two-dimensional azimuth inverse fast Fourier transform on the horizontal and height dimensions of the three-dimensional scattering data s of the target, and performing parallel translation processing to obtain data s ₁ ; (S3) transforming the data s ₁ in space The non-numerical positions outside the support area of the spectral domain range are set to zero to obtain the data s ₂ ; (S4) the data s ₂ is interpolated in the distance to the wavenumber domain to obtain the data s ₃ ; (S5) the data s ₃ is subjected to three-dimensional inverse fast Fourier Leaf transformation, taking the modulo value of the transformation result, and then taking the maximum scattering value for each dimension of the three-dimensional data for imaging. The invention realizes the innovation of the algorithm acceleration of the array scanning system, solves the disadvantage of slow processing speed under the new system, transfers the dependence of the existing algorithm on the CPU, and realizes the repeated calling and transplantation of the algorithm function.

Description

A CUDA-based millimeter-wave radar RMA imaging method

技术领域technical field

本发明属于雷达成像技术领域，具体涉及一种基于CUDA(统一计算设备架构，缩写：CUDA，是显卡厂商NVIDIA推出的运算平台)的毫米波雷达RMA(距离徙动算法，缩写：RMA)成像方法。The invention belongs to the technical field of radar imaging, and in particular relates to a millimeter-wave radar RMA (Range Migration Algorithm, abbreviation: RMA) imaging method based on CUDA (Unified Computing Device Architecture, abbreviation: CUDA, which is a computing platform launched by graphics card manufacturer NVIDIA). .

背景技术Background technique

目前在世界范围研发的GPU(图形处理器)雷达成像主要集中在SAR/ISAR，GPU高性能计算是一种近年迅速发展的并行计算技术，因其能为大数据量处理提供快速的并行计算框架，已经成为具有非常高的数据并行计算密度；支持多线程；具有多个计算核心的通用数学协处理器,它不仅具有非常大的计算强度,而且GPU内存吞吐量相对于CPU更大。另外，国内外相关的GPU雷达成像研究包括，马德里技术大学利用NVIDIA CUDA技术的GPU实时图像处理算法，用于THz波段3-D穿衣探测雷达，对应于50x90cm2的扫描区域，其允许对由6000个像素组成的图像以大于8fps的速率进行图像刷新。考虑了传统的时域逆投影和基尔霍夫偏移算法对隐藏目标成像的性能加速。印度班加罗尔电子和雷达发展机构实现了传统的时域逆投影和基尔霍夫偏移算法对隐匿目标成像的性能加速，由于在这些算法中图像强度是逐像素计算的，因此，GPU可以加速其他雷达成像算法。另外，国防科技大学熊超在数据量较大时将频谱分析和谱峰搜索移植到GPU内进行运算，有效缩短了系统的整体响应时间。At present, the GPU (graphics processing unit) radar imaging developed in the world mainly focuses on SAR/ISAR. GPU high-performance computing is a parallel computing technology that has developed rapidly in recent years, because it can provide a fast parallel computing framework for large data processing. , has become a very high data-parallel computing density; supports multi-threading; a general-purpose mathematical coprocessor with multiple computing cores, it not only has a very large computing intensity, but also GPU memory throughput is larger than CPU. In addition, related GPU radar imaging research at home and abroad includes the GPU real-time image processing algorithm using NVIDIA CUDA technology at the Technical University of Madrid for 3-D clothing detection radar in the THz band, corresponding to a scanning area of 50x90cm2, which allows the detection of 6000 The image composed of pixels is refreshed at a rate greater than 8fps. The performance acceleration of traditional time-domain backprojection and Kirchhoff migration algorithms for hidden target imaging is considered. The Bangalore Electronics and Radar Development Agency in Bangalore, India has realized the performance acceleration of traditional time-domain back-projection and Kirchhoff migration algorithms for imaging hidden targets. Since the image intensity is calculated pixel by pixel in these algorithms, the GPU can accelerate other Radar Imaging Algorithms. In addition, when the amount of data is large, Xiong Chao from the National University of Defense Technology transplanted the spectrum analysis and spectrum peak search to the GPU for calculation, which effectively shortened the overall response time of the system.

现有的与雷达GPU成像相关专利技术主要集中在各个专用领域的子系统实现，专利申请CN201711246749.4公开了一种基于GPU的快速频域后向投影三维成像方法，运用三维频域后向投影算法，利用图形处理器GPU采用一种单程序多数据的指令模式来对其丰富的硬件资源进行调度的能力，再通过三维傅里叶逆变换得到三维SAR图像，从而大幅度提升了后向投影算法的运行效率。专利申请CN201710485297.9公开了一种基于GPU实现的合成孔径雷达图像目标识别方法，专利申请CN107064930A公开了基于GPU的雷达前视成像方法。从CPU端获取观测向量和测量矩阵，在GPU端，采用CUDA架构并行实现从观测向量中恢复场景目标后向散射系数向量，完成雷达前视成像。专利申请CN201711019081.X公开了基于嵌入式GPU的单像素激光雷达成像装置及成像方法，主要描述基于嵌入式GPU的单像素激光雷达成像装置，信息传入嵌入式GPU底座中，根据输入的信息进行均匀光滑曲面重建，并将重建好的三维光滑曲面模型进行动态成像显示。Existing patent technologies related to radar GPU imaging mainly focus on the realization of subsystems in various special fields. Patent application CN201711246749.4 discloses a GPU-based fast frequency domain backward projection 3D imaging method, which uses 3D frequency domain backward projection. The algorithm uses the graphics processor GPU to use a single-program multi-data instruction mode to schedule its rich hardware resources, and then obtains the three-dimensional SAR image through the three-dimensional inverse Fourier transform, thus greatly improving the backward projection. The efficiency of the algorithm. Patent application CN201710485297.9 discloses a GPU-based synthetic aperture radar image target recognition method, and patent application CN107064930A discloses a GPU-based radar forward-looking imaging method. The observation vector and measurement matrix are obtained from the CPU side. On the GPU side, the CUDA architecture is used to restore the backscattering coefficient vector of the scene target from the observation vector in parallel to complete the radar forward-looking imaging. Patent application CN201711019081.X discloses a single-pixel lidar imaging device and imaging method based on an embedded GPU, mainly describing a single-pixel lidar imaging device based on an embedded GPU. The uniform smooth surface is reconstructed, and the reconstructed three-dimensional smooth surface model is displayed dynamically.

现有的雷达GPU成像算法移植性不强，程序模块只能适用于特定算法，且对算法在GPU上的覆盖性不足，只在特定步骤进行加速，比如国防科学技术大学熊超只进行了FFT算法、矩阵转置和谱峰搜索算法的CUDA并行化，并没有将信号处理部分完全移植到GPU内实现。现有针对高频段雷达近场阵列成像，波长的减小会导致阵元间距变小使其数目急剧增加，导致数据量巨大。回波的大数据量对数据采集系统和存储设备的要求更高，采集信号时间也会更长，原始成像方法效率低。The existing radar GPU imaging algorithm is not very portable, and the program module can only be applied to specific algorithms, and the coverage of the algorithm on the GPU is insufficient, so it is only accelerated in specific steps. For example, Xiong Chao of the National Defense Science and Technology University only performed FFT. The CUDA parallelization of the algorithm, matrix transpose and spectral peak search algorithm does not fully port the signal processing part to the GPU implementation. In the existing near-field array imaging for high-frequency radar, the decrease in wavelength will lead to a decrease in the spacing of array elements and a sharp increase in the number of array elements, resulting in a huge amount of data. The large amount of echo data has higher requirements on the data acquisition system and storage device, and the acquisition time of the signal will be longer, and the efficiency of the original imaging method is low.

发明内容SUMMARY OF THE INVENTION

为解决上述技术问题，本发明基于英伟达CUDA计算平台，将GPU融入到雷达近场阵列成像中，从而加快大数据量的处理，具有比较稳定、效率更高的优点。具体技术方案如下：In order to solve the above technical problems, the present invention is based on the NVIDIA CUDA computing platform, and integrates the GPU into the radar near-field array imaging, thereby accelerating the processing of large amounts of data, and has the advantages of relatively stability and higher efficiency. The specific technical solutions are as follows:

一种基于CUDA的毫米波雷达RMA成像方法，包括以下步骤：A CUDA-based millimeter-wave radar RMA imaging method, comprising the following steps:

(S1)接收雷达回波数据，并进行预处理，所述预处理包括对回波数据进行距离向校正，再利用CUDA平台库函数进行杂波滤除，并生成目标三维散射数据s；(S1) receiving radar echo data, and performing preprocessing, the preprocessing includes performing range correction on the echo data, and then using the CUDA platform library function to perform clutter filtering, and generate target three-dimensional scattering data s;

(S2)对目标三维散射数据s水平、高度维分别进行二维方位快速傅里叶逆变换，并做并行平移处理，得到数据s₁；(S2) respectively perform two-dimensional azimuth inverse fast Fourier transform on the horizontal and height dimensions of the three-dimensional scattering data s of the target, and perform parallel translation processing to obtain data s ₁ ;

(S3)对数据s₁在空间谱域范围支撑区外出现NaN值的位置置零，得到数据s₂；NaN为英文Not a Number的缩写，在IEEE浮点数算术标准(IEEE 754)中定义，表示一些特殊数值(无穷与非数值(NaN))，支撑区表示谱域的形状。(S3) Zeroing the position where the NaN value appears in the data s ₁ outside the support area of the spatial spectral domain range to obtain the data s ₂ ; NaN is the abbreviation of Not a Number in English, and is defined in the IEEE floating point arithmetic standard (IEEE 754), Represents some special value (infinity and not-a-number (NaN)), and the support area represents the shape of the spectral domain.

(S4)对数据s₂进行距离向波数域插值，得到数据s₃；(S4) Interpolate the distance to the wavenumber domain for the data s ₂ to obtain the data s ₃ ;

(S5)数据s₃进行三维逆快速傅里叶变换，并对变换结果取模值，再对三维数据各个维度取最大散射值成像。(S5) Perform a _three -dimensional inverse fast Fourier transform on the data s3, take a modulo value for the transform result, and then take a maximum scattering value for each dimension of the three-dimensional data for imaging.

优选地，所述步骤(S2)中的二维方位快速傅里叶变换，由ifftshift核函数、cufft库函数、fftshift核函数组成，分别实现逆向FFT移动、快速傅里叶变换、将零频分量移到频谱中心。Preferably, the two-dimensional orientation fast Fourier transform in the step (S2) is composed of the ifftshift kernel function, the cufft library function, and the fftshift kernel function, which respectively realize the inverse FFT movement, the fast Fourier transform, and the zero-frequency component. Move to the center of the spectrum.

优选地，所述空间谱域范围支撑区外出现NaN值的位置置零的具体过程由ndgrid核函数、getP核函数和padarray核函数实现，所述ndgrid核函数实现N维空间中的矩形网格，所述getP核函数由电磁平面波离散关系并行得到距离向频率，所述padarray核函数实现原始数据外的部分置零。Preferably, the specific process of zeroing the position of the NaN value outside the support area of the spatial spectrum range is realized by the ndgrid kernel function, the getP kernel function and the padarray kernel function, and the ndgrid kernel function realizes the rectangular grid in the N-dimensional space. , the getP kernel function obtains the range frequency in parallel from the electromagnetic plane wave discrete relationship, and the padarray kernel function realizes that the part outside the original data is set to zero.

优选地，所述对数据s₂进行距离向波数域插值由interp1核函数实现。Preferably, the distance-to-wavenumber domain interpolation for the data s ₂ is implemented by the interp1 kernel function.

优选地，所述三维逆快速傅里叶变换由ifftshift核函数、cufft核函数和fftshift核函数实现。Preferably, the three-dimensional inverse fast Fourier transform is implemented by the ifftshift kernel function, the cufft kernel function and the fftshift kernel function.

为了更好的理解本发明，下面结合CUDA平台库和MATLAB软件等现有技术，对本发明作详细说明。In order to better understand the present invention, the present invention will be described in detail below in combination with the prior art such as CUDA platform library and MATLAB software.

本发明方法对原始数据进行预处理、二维方位快速傅里叶变换、空间谱域范围外数据置零、距离向波数域插值和三维逆快速傅里叶变换等步骤的加速处理，这种经典算法处理步骤可以使得RMA能够实现目标的三维成像结果，该方法是通过CUDA平台语言构造核函数，利用MATLAB生成mex文件，创建新的函数对数据进行处理。原始三维数据进入CUDA平台后会转换为一维数据，改造后的RMA还能通过cufft，三维数据分别需要进行后两维的傅里叶变换，只需将数据重新排序，需要变换的维度移到第一个尺度，把第一维度和第二维度相交换；调整变换步进值和指针，即更改函数cufftPlanMany中batch的大小、傅里叶变换的尺度、batches的距离，这些共同改变指针handle，当数据经过一系列步骤处理完成后，再传入MATLAB或者利用OPENCV进行成像，如图2所示为雷达近场成像处理组成结构图。The method of the invention performs accelerated processing on the original data, such as preprocessing, two-dimensional azimuth fast Fourier transform, zeroing of data outside the spatial spectrum domain, range-to-wavenumber domain interpolation, and three-dimensional inverse fast Fourier transform. The algorithm processing step can enable RMA to achieve the 3D imaging result of the target. The method is to construct a kernel function through the CUDA platform language, use MATLAB to generate a mex file, and create a new function to process the data. After entering the CUDA platform, the original three-dimensional data will be converted into one-dimensional data, and the transformed RMA can also pass through cufft. The three-dimensional data needs to undergo the Fourier transform of the last two dimensions. It only needs to reorder the data, and the dimension that needs to be transformed is moved to The first scale, swap the first dimension and the second dimension; adjust the transformation step value and pointer, that is, change the batch size, the scale of the Fourier transform, and the distance of the batches in the function cufftPlanMany, which together change the pointer handle, After the data is processed through a series of steps, it is then transferred to MATLAB or OPENCV is used for imaging. Figure 2 shows the structure diagram of radar near-field imaging processing.

利用MATLAB生成mex文件，创建新的函数对数据进行处理中涉及到“.cpp”数据输入输出文件和“.h”。“.cpp”数据输入输出文件由提取原始数据实虚部、引用“.cu”文件主函数(“.cu”文件是CUDA平台特有文件，主要用来执行device端代码，在这里host端与device端代码都写在“.cu”文件的主函数中)、输出处理后的数据、数据导出部分组成。“.cu”文件主函数的实现过程为：Using MATLAB to generate mex files and creating new functions to process data involves ".cpp" data input and output files and ".h". The ".cpp" data input and output files are extracted from the real and imaginary parts of the original data, and the main function of the ".cu" file is referenced (the ".cu" file is a file unique to the CUDA platform, which is mainly used to execute the device-side code, where the host and device The terminal code is written in the main function of the ".cu" file), the output processed data, and the data export part. The implementation process of the main function of the ".cu" file is:

“.cu”文件主函数运行在CUDA平台，需要包含.h头文件。需要并行的地方主要是采用for循环实现，把顺序依次的迭代换成同时进行，且索引间需要没有联系。Device端核函数中的操作可以使用host端的操作，并可以根据实际需要对函数改变，比如host端的memset改成cudaMemset。但device端的操作不能在host进行。CUDA中取最大值和最小值的函数：fmaxf、fminf。首先三维数据会在CUDA平台host端按行转换为一维数据，再利用cudaMalloc函数给device端变量分配相应内存，然后利用cudaMemcpy函数把数据从host端传入device端，在编写__global__核函数或利用已有库函数对device端数据进行处理，处理完成后再利用cudaMemcpy函数把数据从device端传入host端进行输出。因device端的数据是不能打印出来的，必须传到host端才行，且单个数据不用cudaMemcpy函数进行传入传出，直接调用即可。并且对变量分配内存后，变量初始值一般为0。但对变量进行自身叠加时，变量初始值会被赋值一个10⁸数量级的初始值，所以如果需要利用变量初始值时，最好在host端或device端分别利用memset或cudaMemset对变量赋初值。每种型号的NVIDIA GPU都有额定的每个Block的线程数，在设置blocksize的时候，dim3blockSize(i,j,r)中的i,j,r是dim3在x,y,z方向的维度gridDim.x、gridDim.y、gridDim.z的值，它们的积要小于等于额定的每个Block的线程数，其中j,r可省略为1。一般用dim3定义gridSize的类型，确定gridSize在dim3的x,y,z方向的维度blockDim.x,blockDim.y,blockDim.z，x,y,z分别为数据在三个维度需要计算的大小，也就是CPU是for循环的迭代次数。如果只是在一个x,y,z方向进行，比如实现点乘功能的核函数，只需使blockSize为1，gridSize指定为一个方向的维度，即改核函数的尺度。在Debug的时候，由于一些隐藏的错误编译MATLAB时不会详细指出，可以在VS编译“.cu”文件，一般只会出现“错误LNK2019:无法解析的外部符号main，该符号在函数__tmainCRTStartup中被引用”和“错误LNK1120:1个无法解析的外部命令”两个错误就表示“.cu”文件没问题，但有时即使显示“MEX已成功完成”，也会曝出“MATLAB has encounteredan internal problem and needs to close.”(比如报错原因的一种是host端使用了device端变量)，这时就需要把“.cu”文件改成能单独编译的文件，再在VS上调试，确认无误后再改回原来格式。利用MATLAB生成mex文件时，遇到“LIBCMT.lib(excptptr.obj):fatalerror LNK1112:模块计算机类型’X86’与目标计算机类型’x64’冲突”的错误，这个错误是表明lib库冲突了，解决办法是修改mex命令：-lcudart LINKFLAGS＝"$LINKFLAGS/NODEFAULTLIB:LIBCMT.LIB"-lcufft。忽略LIBCMT.lib库，继续编译即可。CPU有更大的面积用于高速缓存和分支预测。而GPU是一堆运算器搭一个控制器，程序相同，只是输入输出数据不同。由于GPU解决的问题是大规模的并行运算。运算单元(sp，stream processor，流处理器，即ALU单元)不需要像CPU那么复杂，而个数又是几千个。所以利用GPU进行简单运算时，可以把它理解为像verilog一样并行计算，比如进行互换的简单操作时，不必像使用CPU一样分配一个缓存进行交换，直接互换数据即可。如果在核函数里调用核函数，一开始会报错“error:calling a__global__function("")from a__global__function("")is onlyallowed on the compute_35architecture or above”，这时只需单击项目来打开项目属性，更改的第一个设置通过选择“是(-rdc＝true)”告诉CUDA生成可重定位的设备代码，接着必须通过将代码生成更改为“compute_35，sm_35”来告诉CUDA使用计算3.5，然后，通知运行时库使用多线程库。若处于调试模式，则选择“多线程调试(/MTd)”或者处于发布模式”多线程(/MT)“，最后在链接器属性中，添加一个额外的依赖项为“cudadevrt.lib”。The main function of the ".cu" file runs on the CUDA platform and needs to include the .h header file. Where parallelism is required, for loops are used to replace sequential iterations with simultaneous ones, and there should be no connection between indexes. The operations in the kernel function on the Device side can use the operations on the host side, and the functions can be changed according to actual needs, such as changing the memset on the host side to cudaMemset. However, operations on the device side cannot be performed on the host. Functions for taking the maximum and minimum values in CUDA: fmaxf, fminf. First, the three-dimensional data will be converted into one-dimensional data by line on the host side of the CUDA platform, and then use the cudaMalloc function to allocate the corresponding memory to the device side variables, and then use the cudaMemcpy function to transfer the data from the host side to the device side, and write the __global__ kernel function. Or use the existing library functions to process the data on the device side, and then use the cudaMemcpy function to transfer the data from the device side to the host side for output after the processing is completed. Because the data on the device side cannot be printed, it must be transmitted to the host side, and a single data does not need to be passed in and out of the cudaMemcpy function, and can be called directly. And after allocating memory to the variable, the initial value of the variable is generally 0. However, when the variable is superimposed ^on itself, the initial value of the variable will be assigned an initial value of the order of 108. Therefore, if the initial value of the variable needs to be used, it is best to use memset or cudaMemset on the host side or the device side respectively. Assign the initial value to the variable. Each type of NVIDIA GPU has a rated number of threads per block. When setting blocksize, i, j, r in dim3blockSize(i, j, r) is the dimension gridDim of dim3 in the x, y, and z directions The value of .x, gridDim.y, and gridDim.z, their product must be less than or equal to the rated number of threads per block, where j, r can be omitted as 1. Generally use dim3 to define the type of gridSize, Determine the dimensions of gridSize in the x, y, z directions of dim3 blockDim.x, blockDim.y, blockDim.z, x, y, z are the sizes of the data that need to be calculated in the three dimensions, that is, the CPU is the iteration of the for loop frequency. If it is only performed in one x, y, z direction, such as the kernel function that implements the dot product function, just set the blockSize to 1, and specify the gridSize as the dimension of one direction, that is, change the scale of the kernel function. When Debugging, due to some hidden errors when compiling MATLAB, it will not be pointed out in detail. You can compile the ".cu" file in VS, and generally only "Error LNK2019: Unresolved external symbol main, which is in the function __tmainCRTStartup, will appear. referenced" and "error LNK1120: 1 unresolved external command" means that the ".cu" file is fine, but sometimes even if it says "MEX has completed successfully", it will reveal "MATLAB has encounteredan internal problem" and needs to close." (for example, one of the reasons for the error is that the host side uses the device side variable), then you need to change the ".cu" file to a file that can be compiled separately, and then debug it on VS, after confirming that it is correct Change back to the original format. When using MATLAB to generate a mex file, I encountered the error "LIBCMT.lib(excptptr.obj): fatalerror LNK1112: The module computer type 'X86' conflicts with the target computer type 'x64'". This error indicates that the lib library conflicts, solve The solution is to modify the mex command: -lcudart LINKFLAGS="$LINKFLAGS/NODEFAULTLIB:LIBCMT.LIB"-lcufft. Ignore the LIBCMT.lib library and continue compiling. The CPU has a larger area for cache and branch prediction. The GPU is a bunch of computing units with a controller, the program is the same, but the input and output data are different. Because the problem that GPU solves is massively parallel computing. The arithmetic unit (sp, stream processor, stream processor, that is, the ALU unit) does not need to be as complicated as the CPU, and the number is several thousand. Therefore, when using the GPU for simple operations, it can be understood as parallel computing like verilog. For example, when performing simple operations of exchange, it is not necessary to allocate a cache for exchange like using a CPU, and the data can be exchanged directly. If the kernel function is called in the kernel function, an error will be reported at the beginning "error: calling a__global__function("")from a__global__function("")is only allowed on the compute_35architecture or above", then just click the project to open the project properties, change The first setting of CUDA tells CUDA to generate relocatable device code by choosing "yes (-rdc=true)", then it has to tell CUDA to use compute 3.5 by changing the code generation to "compute_35,sm_35", then, the notification runs The time library uses a multithreaded library. If in debug mode, select "Multi-threaded debugging (/MTd)" or in release mode "Multi-threaded (/MT)", and finally in the linker properties, add an additional dependency as "cudadevrt.lib".

“.h头文件”只需声明“.cu”文件中输入输出变量，可以采用函数原型extern"C"来指出要使用的约定。The ".h header file" only needs to declare the input and output variables in the ".cu" file, and the function prototype extern "C" can be used to indicate the convention to be used.

采用本发明获得的有益效果：1、本发明实现了阵列扫描体制算法加速的创新，解决了这种新体制下处理速度缓慢的弊病；2、本发明能够通过GPU浮点计算功能方面的优势，转移现有算法对CPU的依赖；3、本发明由于利用MATLAB生成了mex文件，从而实现了算法函数的多次重复调用和移植。The beneficial effects obtained by adopting the invention: 1. The invention realizes the innovation of algorithm acceleration of the array scanning system, and solves the disadvantage of slow processing speed under this new system; 2. The invention can take advantage of the GPU floating-point calculation function, The dependence of the existing algorithm on the CPU is transferred; 3. The present invention generates the mex file by using MATLAB, thereby realizing the repeated invocation and transplantation of the algorithm function for many times.

附图说明Description of drawings

图1为本发明方法流程图；Fig. 1 is the flow chart of the method of the present invention;

图2为雷达近场成像处理组成框图；Figure 2 is a block diagram of radar near-field imaging processing;

图3为实施例中采用本发明方法对半身人体模型成像结果图，其中图(a)为半身人体模型原图，图(b)为半身人体模型成像结果的正面图，图(c)为半身人体模型成像结果的侧视图。3 is a diagram showing the imaging result of the half-body mannequin by the method of the present invention in the embodiment, wherein the figure (a) is the original image of the half-body mannequin, the figure (b) is the front view of the imaging result of the half-body mannequin, and the figure (c) is the half-body mannequin. Side view of mannequin imaging results.

具体实施方式Detailed ways

下面，结合附图和实施例对本发明作进一步说明。Hereinafter, the present invention will be further described with reference to the accompanying drawings and embodiments.

如图1所示为本发明流程图，一种基于CUDA的毫米波雷达RMA成像方法，包括以下步骤：Figure 1 is a flowchart of the present invention, a CUDA-based millimeter-wave radar RMA imaging method, comprising the following steps:

(S1)接收雷达回波数据，并进行预处理，所述预处理包括对回波数据进行距离向校正，再利用CUDA平台库cufft函数cufftExecC2C中的逆变换参数CUFFT_INVERSE把距离向实际目标范围外的数据置零，去掉杂波影响，再使用库cufft函数cufftExecC2C中的正变换参数CUFFT_FORWARD得到目标三维散射数据s(x,y,f)，其中x,y,f分别为水平、高度、频率维。实例中数据预处理，由一个linspace核函数，一个Equal核函数，和一个multwith核函数，分别用于并行生成线性间距数组，用于去掉数组最后一个数值，用于数据与修正项并行相乘。linspace、Equal两个核函数用于生成实际距离序列，实际距离序列用来限定回波在实际距离范围内的数据，该限定过程在对距离向进行快速傅里叶逆变换和正变换之前，multwith核函数执行后的数据再进行傅里叶变换。(S1) Receive radar echo data, and perform preprocessing. The preprocessing includes performing range correction on the echo data, and then using the inverse transformation parameter CUFFT_INVERSE in the cufft function cufftExecC2C of the CUDA platform library to convert the range to the range beyond the actual target range. Set the data to zero to remove the influence of clutter, and then use the forward transformation parameter CUFFT_FORWARD in the library cufft function cufftExecC2C to obtain the target three-dimensional scattering data s(x, y, f), where x, y, f are the horizontal, height, and frequency dimensions, respectively. The data preprocessing in the example consists of a linspace kernel function, an Equal kernel function, and a multwith kernel function, which are used to generate linearly spaced arrays in parallel, to remove the last value of the array, and to multiply the data and the correction item in parallel. The two kernel functions of linspace and Equal are used to generate the actual distance sequence. The actual distance sequence is used to limit the data of the echo within the actual distance range. Before performing inverse fast Fourier transform and forward transformation on the distance direction, the multwith kernel is used for this limiting process. The data after the function is executed is then Fourier transformed.

(S2)对目标三维散射数据s水平、高度维分别进行二维方位快速傅里叶逆变换，并做并行平移处理，得到数据s₁；具体为利用平台库cufft分别对经过上一步处理的散射数据s(x,y,f)的水平、高度维进行二维方位快速傅里叶逆变换，且对数据编写实现matlab中fftshift功能的核函数分别做水平、高度维的并行平移，得到散射数据水平、高度维的数据s₁(X,Y,f)。(S2) Perform two-dimensional azimuth inverse fast Fourier transform on the horizontal and height dimensions of the three-dimensional scattering data s of the target respectively, and perform parallel translation processing to obtain data s ₁ ; specifically, use the platform library cufft to respectively transform the scattering data processed in the previous step by using the platform library cufft The horizontal and height dimensions of the data s(x, y, f) are subjected to two-dimensional azimuth inverse fast Fourier transform, and the kernel function that implements the fftshift function in matlab is written to perform parallel translation of the horizontal and height dimensions, respectively, to obtain scattering data. Horizontal and height dimension data s ₁ (X,Y,f).

零中频信号s(X,Y)计算公式如下：The formula for calculating the zero-IF signal s(X,Y) is as follows:

其中f(x,y)为目标散射特性函数，D为目标区域，Z₀表示目标位置点在z＝Z₀平面，k为波数，k＝2π/λ，λ为信号波长，k_x,k_y,k_z依次为波数k沿x,y,z方向的波数分量，FT2表示二维傅里叶变换，IFT2表示二维傅里叶逆变换。where f(x,y) is the target scattering characteristic function, D is the target area, Z ₀ indicates that the target position is on the z=Z ₀ plane, k is the wavenumber, k=2π/λ, λ is the signal wavelength, k _x ,k _y , k _z are the wavenumber components of the wavenumber k along the x, y, and z directions in turn, FT2 represents the two-dimensional Fourier transform, and IFT2 represents the two-dimensional inverse Fourier transform.

(S3)对数据s₁在空间谱域范围支撑区外出现NaN值的位置置零，得到数据s₂(X,Y,f)；(S3) zeroing out the position where the NaN value appears in the data s ₁ outside the support area of the spatial spectrum domain, to obtain the data s ₂ (X, Y, f);

首先利用向下取整得到输出(频率维，比如步进值已经经过10倍缩小，频点10倍扩大)坐标F_m(F_m表示步进值缩小十倍后的输入频率F的第m个值)在经过上述操作后的输入(频率维)坐标f_m的相对位置，即d_m＝(int)F_m-f_m，再根据相对位置在相邻输入数据(经过上述步骤后的散射数据)s₂(X,Y,f)的距离权重取值，即i＝blockIdx.x*blockDim.x+threadIdx.x；j＝blockIdx.y*blockDim.y+threadIdx.y；r＝blockIdx.z*blockDim.z+threadIdx.z；设idx1，idx0，idx22表示坐标索引号，因为三维数据在CUDA里会变成一维数据，用坐标索引号指定数据里的索引值。First, use downward rounding to get the output (frequency dimension, for example, the step value has been reduced by 10 times, and the frequency point has been expanded by 10 times) coordinate F _m (F _m represents the mth of the input frequency F after the step value is reduced by ten times value) in the relative position of the input (frequency dimension) coordinate f _m after the above operations, that is, d _m =(int)F _m -f _m , and then according to the relative position in the adjacent input data (scattering data after the above steps) )s ₂ (X, Y, f) distance weight value, i.e. i=blockIdx.x*blockDim.x+threadIdx.x; j=blockIdx.y*blockDim.y+threadIdx.y; r=blockIdx.z *blockDim.z+threadIdx.z; set idx1, idx0, idx22 to represent the coordinate index number, because the three-dimensional data will become one-dimensional data in CUDA, and the coordinate index number is used to specify the index value in the data.

idx22＝sy·(sz·i+r)+j,idx22=sy·(sz·i+r)+j,

idx0＝sx·(sy·r+j)+d_m,idx0=sx·(sy·r+j)+d _m ,

idx1＝sx·(sy·r+j)+d_m+1；idx1=sx·(sy·r+j)+ _dm +1;

则，s₃(X,Y,f)_idx22＝wd·s₂(X,Y,f)_idx0+wu·s₂(X,Y,f)_idx1，其中sx,sy,sz分别为数据的三维尺度，blockDim、blockIdx为线程块操作x、y、z方向维大小、坐标，threadIdx为线程块中线程操作x、y、z方向坐标，wd,wu为距离权重，利用核函数同时进行运算，可以校正距离弯曲。Then, s ₃ (X,Y,f) _idx22 =wd·s ₂ (X,Y,f) _idx0 +wu·s ₂ (X,Y,f) _idx1 , where sx,sy,sz are the three-dimensional data Scale, blockDim and blockIdx are the dimensions and coordinates of the thread block operation in the x, y, and z directions, threadIdx is the x, y, and z direction coordinates of the thread operation in the thread block, and wd and wu are the distance weights. Using the kernel function to perform operations at the same time, you can Corrects for distance curvature.

(S5)分别对经过上述处理后的数据s₃(X,Y,f)分别进行高度、水平、频率三个维度的cufft，执行相应维度时需要把其数据移到第一个维度进行计算，比如高度维在第一维度，水平维在第二维度，执行水平维cufft时需要把水平维和第一维度高度相交换，再设置指针和步进值得到散射数据s₂(x,y,t)。对散射数据s₂(x,y,t)取绝对值，再对三维数据各个维度取最大散射值成像。因matlab中成像的函数imagesc只接受输入值为实数，而变换结果s₂(x,y,t)是复数，故取模值。因为用imagesc成的像是二维数据的二维像，故取三维数据s₂(x,y,t)各个维度最大散射值分别成像。(S5) respectively perform cufft on the data s ₃ (X, Y, f) after the above processing in the three dimensions of height, level and frequency, and when executing the corresponding dimension, the data needs to be moved to the first dimension for calculation, For example, the height dimension is in the first dimension, and the horizontal dimension is in the second dimension. When performing horizontal cufft, you need to exchange the height of the horizontal dimension and the first dimension, and then set the pointer and step value to get the scattering data s ₂ (x,y,t) . Take the absolute value of the scattering data s ₂ (x, y, t), and then take the maximum scattering value for each dimension of the three-dimensional data for imaging. Because the imaging function imagesc in matlab only accepts the input value as a real number, and the transformation result s ₂ (x, y, t) is a complex number, the modulo value is taken. Because the image formed by imagesc is a two-dimensional image of two-dimensional data, the maximum scattering value of each dimension of the three-dimensional data s ₂ (x, y, t) is taken for imaging separately.

如图3所示，为实施例中采用本发明方法对半身人体模型成像结果图(坐标表示位置，单位：米)，原始数据为频率、高度和水平三维扫描的复散射数据，接着对数据进行RMA处理，该处理通过在CUDA平台构建核函数，对数据进行并行处理，达到GPU加速成像算法的功能，5次平均成像原始时间在 CPU E5-2683上MATLAB2017b需要17.839秒，利用TITAN XP GPU需要6.134秒，速度提升2.9倍，本发明方法通过上述实验，验证了实测数据，效果理想。As shown in FIG. 3 , it is the image of the imaging result of the half-body mannequin using the method of the present invention in the embodiment (coordinates indicate the position, unit: m), the original data is the complex scattering data of the frequency, height and horizontal three-dimensional scanning, and then the data is processed RMA processing, this processing achieves the function of GPU-accelerated imaging algorithm by building a kernel function on the CUDA platform and processing the data in parallel. The average imaging original time of 5 times is MATLAB2017b on CPU E5-2683 takes 17.839 seconds, and TITAN XP GPU takes 6.134 seconds, and the speed is increased by 2.9 times. The method of the present invention has verified the measured data through the above experiments, and the effect is ideal.

Claims

1. a CUDA-based millimeter wave radar RMA imaging method, is characterized in that comprising the following steps:

(S1) receiving radar echo data, and performing preprocessing, the preprocessing includes performing range correction on the echo data, and then using the CUDA platform library function to perform clutter filtering, and generate target three-dimensional scattering data s;

(S2) respectively perform two-dimensional azimuth inverse fast Fourier transform on the horizontal and height dimensions of the three-dimensional scattering data s of the target, and perform parallel translation processing to obtain data s ₁ ;

(S3) zeroing the position where the data s ₁ appears non-numerical outside the support region of the spatial spectral domain, to obtain the data s ₂ ;

(S4) Interpolate the distance to the wavenumber domain for the data s ₂ to obtain the data s ₃ ;

(S5) Perform a _three -dimensional inverse fast Fourier transform on the data s3, take a modulo value for the transform result, and then take a maximum scattering value for each dimension of the three-dimensional data for imaging.

2. a kind of CUDA-based millimeter-wave radar RMA imaging method as claimed in claim 1, is characterized in that: the two-dimensional azimuth fast Fourier transform in described step (S2), by ifftshift kernel function, cufft library function , fftshift kernel function, respectively realize inverse FFT shift, fast Fourier transform, and move the zero-frequency component to the center of the spectrum.

3. a kind of CUDA-based millimeter wave radar RMA imaging method as claimed in claim 1, is characterized in that: the concrete process that the position of non-numerical value appears outside described space spectral domain support area is zeroed by ndgrid kernel function, getP The kernel function and the padarray kernel function are realized, the ndgrid kernel function realizes a rectangular grid in the N-dimensional space, the getP kernel function obtains the range frequency in parallel by the electromagnetic plane wave discrete relationship, and the padarray kernel function realizes the non-numerical value outside the original data. position is set to zero.

4 . The CUDA-based millimeter-wave radar RMA imaging method according to claim 1 , wherein the interpolation of the data s ₂ in the range to the wavenumber domain is realized by the interp1 kernel function. 5 .

5 . The CUDA-based millimeter-wave radar RMA imaging method according to claim 1 , wherein the three-dimensional inverse fast Fourier transform is realized by ifftshift kernel function, cufft kernel function and fftshift kernel function. 6 .