CN118069969A - GPU-based method and device for fast calculation of Green's function in layered media
- Publication number
- CN118069969A (application CN202410503575.9A)
- Authority
- CN
- China
- Prior art keywords
- integral
- matrix
- calculation
- gpu
- points
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/15—Correlation function computation including computation of convolution operations
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Computational Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Complex Calculations (AREA)
Abstract
Description
Technical Field
The present application relates to the technical field of computational electromagnetics, and in particular to a GPU-based method and device for fast calculation of Green's functions in layered media.
Background
The integral equation method for planar layered media is one of the most successful models in computational electromagnetics and is widely used to analyze microstrip structures, RF circuits, and chips. When the method of moments is used to solve for the electromagnetic response of a target in layered media, the main difficulty is the fast calculation of the Green's function. Unlike in free space, the Green's function in layered media cannot be written in closed form; the spectral-domain Green's function must be converted into the spatial-domain Green's function by evaluating a Sommerfeld integral (SI). This integral is hard to evaluate because the Bessel function in the integrand oscillates strongly and decays slowly, and the integral kernel itself contains singularities. The efficiency of this evaluation directly determines the matrix fill time of the method-of-moments equation. Efficient evaluation of the Sommerfeld integral is therefore the key to accelerating electromagnetic simulation of planar layered media.
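For orientation, the Sommerfeld integral of order ν is commonly written in the literature in the form below. The symbols are assumptions taken from common textbook usage, not from the source, which does not reproduce its notation at this point; the 1/(2π) normalization in particular differs between authors.

```latex
% A common textbook form of the Sommerfeld integral (notation assumed):
%   \tilde{G}  spectral-domain Green's function
%   k_\rho     transverse wavenumber,  \rho  lateral source-field distance
%   z, z'      vertical coordinates of the field and source points
%   J_\nu      Bessel function of the first kind of order \nu
G(\rho; z, z') \;=\; \frac{1}{2\pi} \int_{0}^{\infty}
    \tilde{G}(k_\rho; z, z')\, J_\nu(k_\rho \rho)\, k_\rho^{\nu+1}\, \mathrm{d}k_\rho
```

The factor J_ν(k_ρ ρ) is the source of the oscillation and slow decay described above: for large arguments it decays only like the inverse square root of k_ρ ρ.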
The wide range of applications of the Sommerfeld integral continues to attract researchers' attention, and many methods for accelerating its calculation now exist. They generally fall into two categories: closed-form approximation methods and numerical integration methods.
Closed-form approximation methods evaluate the Sommerfeld integral as follows:
(1) fit the spectral-domain Green's function;
(2) transform the fitted function to the spatial domain by means of an integral identity, obtaining a superposition of several spherical and/or cylindrical waves.
The most representative closed-form approximation methods are the discrete complex image method (DCIM) and the rational function fitting method. Although closed-form approximations avoid evaluating an infinite oscillatory integral and greatly reduce computation time, their accuracy cannot be controlled, and it is difficult to locate the surface-wave poles accurately in layered media.
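The integral identity used in step (2) is typically the Sommerfeld identity, which maps one exponential term in the spectral domain to one spherical wave in the spatial domain. The form below is the standard one under an e^{jωt} time convention; that the cited methods rely on exactly this identity is an assumption, since the source does not name it.

```latex
% Sommerfeld identity (standard form, e^{j\omega t} convention assumed):
\frac{e^{-jkr}}{r} \;=\; \int_{0}^{\infty}
    \frac{e^{-j k_z |z|}}{j k_z}\, J_0(k_\rho \rho)\, k_\rho \, \mathrm{d}k_\rho ,
\qquad
k_z = \sqrt{k^2 - k_\rho^2}, \quad \operatorname{Im} k_z \le 0, \quad
r = \sqrt{\rho^2 + z^2}
```

If the spectral-domain function is fitted by a short sum of exponentials in k_z, each term transforms in closed form to a spherical wave, which is why the fit in step (1) removes the need for numerical integration.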
Numerical integration methods evaluate the Sommerfeld integral as follows:
(1) determine the integration path;
(2) integrate numerically along that path.
Numerical integration methods mainly include the steepest descent path (SDP) method, the fast Hankel transform method, and a family of techniques for handling the slow decay of the integral kernel. The SDP method, for example, deals mainly with the exponential term of the kernel: the integration path is chosen according to the saddle point so that the exponential function decays rapidly away from it. Its drawback is that the kernel of a Sommerfeld integral may contain several different exponential terms, and the integration path must pass through every saddle point. Because the number of saddle points grows rapidly with the number of layers, the method is not suitable for general multilayer structures.
Summary of the Invention
The present application aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, the first objective of the present application is to propose a GPU-based method for fast calculation of Green's functions in layered media. The method addresses the technical problem that existing methods are difficult to apply to multilayer structures, computes the Sommerfeld integrals for multiple parameters in a single parallel pass, and can greatly improve the efficiency of calculating the Sommerfeld integrals in the layered-media Green's function.
The second objective of the present application is to propose a GPU-based device for fast calculation of Green's functions in layered media.
To achieve the above objectives, an embodiment of the first aspect of the present application proposes a GPU-based method for fast calculation of Green's functions in layered media, comprising: initializing the three-dimensional grid of the GPU and the number of threads per thread block; filling matrices with the SI computation tasks for the multiple parameter points and multiple spatial points held on the initialized GPU, thereby generalizing the numerical integration of the SI to a matrix product; and distributing the computation of the matrix entries evenly across the thread blocks for parallel execution, so that the SI results for multiple parameter points and multiple spatial points are obtained at once. The SI results comprise a head integral result and a tail integral result. The computation in each thread block comprises: performing the matrix product with the CUDA matrix operation unit (Tensor Core), computing the piecewise integrals of the head and the tail, and, when computing the tail integral, applying the Euler transformation to the piecewise integral results to accelerate convergence.
The method of the embodiments of the present application reuses the spectral-domain Green's function and the Bessel function across the parameter sweep to optimize the computational architecture, and converts the Sommerfeld integral into a product of two matrices. By exploiting the parallel computing capability of the GPU and its dedicated matrix operation unit, the Tensor Core, and by deriving an Euler transformation expression that accelerates the tail integral, it provides a parallel scheme that computes the Sommerfeld integrals for multiple frequencies or multiple planar layered-media parameters in a single pass.
Optionally, in one embodiment of the present application, filling matrices with the SI computation tasks for the multiple parameter points and multiple spatial points held on the initialized GPU, and generalizing the numerical integration of the SI to a matrix product, comprises:
assuming that the initialized GPU holds the Sommerfeld integral computation tasks for a set of parameter points and a set of spatial points, and specifying that each thread block computes the SI for M parameter points and N spatial points;
arranging the SIs computed by each thread block in an M×N matrix, in which the entries of a column are the SIs at different parameter points and the entries of a row are the SIs at different spatial points, so that the numerical integration of the SI is generalized to a matrix product and a first matrix and a second matrix are obtained.
Optionally, in one embodiment of the present application, the matrix product multiplies the first matrix by the second matrix to give the M×N result. The first matrix is an M×K matrix composed of the spectral-domain Green's function evaluated at the M parameter points and the K integration sampling points. The entries of the second matrix are the products of the Bessel function and the integration weight coefficients.
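The idea of turning a batch of quadratures into one matrix product can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the patented implementation: the spectral function exp(-a·k) is a stand-in for the layered-media Green's function (chosen because its integral is known in closed form), the composite Simpson rule stands in for the source's quadrature, and all function names are hypothetical. On a GPU the `matmul` step is what would run on Tensor Cores.

```python
import math

def bessel_j0(x, n=48):
    """J0(x) from Bessel's integral (1/pi) * int_0^pi cos(x sin t) dt.

    The integrand has period pi, so the n-point rectangle rule converges
    very quickly as long as |x| stays well below 2 * n.
    """
    return sum(math.cos(x * math.sin(math.pi * i / n)) for i in range(n)) / n

def simpson_rule(a, b, k):
    """Nodes and weights of the composite Simpson rule with k (odd) points."""
    h = (b - a) / (k - 1)
    nodes = [a + i * h for i in range(k)]
    weights = [h / 3 * (1 if i in (0, k - 1) else 4 if i % 2 else 2)
               for i in range(k)]
    return nodes, weights

def matmul(a, b):
    """Plain (M x K) @ (K x N) product of nested lists."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a]

def batched_si(params, rhos, k_max=30.0, k_points=1501):
    """Integrate exp(-a*k) * J0(k*rho) over k for every (a, rho) pair."""
    nodes, weights = simpson_rule(0.0, k_max, k_points)
    # First matrix (M x K): spectral function at each parameter point and
    # sample. It does not depend on the spatial point rho.
    first = [[math.exp(-a * k) for k in nodes] for a in params]
    # Second matrix (K x N): quadrature weight times Bessel function.
    # It does not depend on the parameter point a.
    second = [[w * bessel_j0(k * rho) for rho in rhos]
              for k, w in zip(nodes, weights)]
    return matmul(first, second)

params = [1.0, 2.0, 3.0]   # M = 3 hypothetical parameter points
rhos = [0.5, 1.0, 2.0]     # N = 3 hypothetical spatial points
result = batched_si(params, rhos)
```

Each entry `result[m][n]` approximates the integral of exp(-a_m k) J_0(k ρ_n) over k from 0 to infinity, whose closed form is 1/sqrt(a_m² + ρ_n²). The point of the factorization is reuse: the first matrix is evaluated once per parameter point and the second once per spatial point, instead of once per pair.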
Optionally, in one embodiment of the present application, the Sommerfeld integral comprises a head integral and a tail integral, both of which are piecewise integrals. The head integral is expressed as:
The quantities in this expression are: the spectral-domain Green's function; the vertical coordinates of the field point and the source point; the transverse wavenumber obtained from transmission line theory based on the field and source positions; the lateral distance between the field point and the source point; the Bessel function of the first kind and its order; the major axis A; the quadrature weights and sampling points; the SI result at the i-th sampling point along the elliptical path; and the number of integration sampling points N.
The tail integral is expressed as:
The quantities in this expression are: the spectral-domain Green's function; the vertical coordinates of the field point and the source point; the transverse wavenumber obtained from transmission line theory based on the field and source positions; the Bessel function of the first kind and its order; the lateral distance between the field point and the source point; the major axis A; the quadrature weights and sampling points; L, the number of sampling points of the subintervals into which the tail integration interval is divided; N, the number of sampling points within each subinterval; and the result of the Euler transformation applied to the tail subinterval integrals.
Optionally, in one embodiment of the present application, the SI tail integral after the K-th recursion is expressed as:
where N is the number of sampling points of the subintervals into which the tail integral is divided, and the remaining quantities are the piecewise integral values of the tail integral and the coefficients applied to them.
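The statement that K recursions reduce to a weighted sum of the piecewise integral values can be illustrated with the classical Euler transformation, in which neighbouring partial sums are averaged repeatedly. The sketch below is an assumption-laden illustration: it uses plain equal-weight averaging (the source may use a weighted variant), the function names are hypothetical, and the alternating series for ln 2 stands in for the sequence of tail subinterval integrals.

```python
import math

def euler_recursive(partial_sums):
    """Average neighbouring partial sums until a single value is left."""
    level = list(partial_sums)
    while len(level) > 1:
        level = [(a + b) / 2 for a, b in zip(level, level[1:])]
    return level[0]

def euler_coefficients(k):
    """Coefficient of each term u_0..u_k after k averaging passes.

    k passes over S_0..S_k give sum_j C(k, j) / 2**k * S_j. Writing
    S_j = u_0 + ... + u_j, the term u_i picks up sum_{j >= i} C(k, j) / 2**k.
    """
    total = 2 ** k
    coeffs, running = [], 0
    for j in range(k, -1, -1):
        running += math.comb(k, j)
        coeffs.append(running / total)
    return coeffs[::-1]

def euler_closed_form(terms):
    """Apply the precomputed coefficients directly to the terms."""
    coeffs = euler_coefficients(len(terms) - 1)
    return sum(c * u for c, u in zip(coeffs, terms))

# Stand-in for the tail subinterval integrals: 1 - 1/2 + 1/3 - ... = ln 2.
terms = [(-1) ** i / (i + 1) for i in range(21)]
partial_sums, acc = [], 0.0
for u in terms:
    acc += u
    partial_sums.append(acc)

raw = partial_sums[-1]                     # plain truncated sum
recursive = euler_recursive(partial_sums)  # 20 averaging passes
closed = euler_closed_form(terms)          # one weighted sum, same value
```

The closed form matters on a GPU because the coefficients depend only on the number of terms: they can be computed once and applied as a single weighted sum, rather than running a data-dependent recursion in every thread. With 21 terms the plain sum is still off by more than 0.02, while the transformed value agrees with ln 2 to better than 1e-8.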
To achieve the above objectives, an embodiment of the second aspect of the present application proposes a GPU-based device for fast calculation of Green's functions in layered media, comprising a CPU and a GPU, the CPU including a memory, wherein:
the CPU is configured to initialize the GPU, fill matrices with the SI computation tasks for the multiple parameter points and multiple spatial points held on the initialized GPU, thereby generalizing the numerical integration of the SI to a matrix product, and store the resulting data in the memory;
the GPU is configured to distribute the computation of the matrix entries evenly across the thread blocks for parallel execution, obtain the SI results for multiple parameter points and multiple spatial points at once, and transfer the integral results to the memory of the CPU over the PCIe bus, wherein the SI results comprise a head integral result and a tail integral result, and the computation in each thread block comprises:
performing the matrix product with the CUDA matrix operation unit (Tensor Core), computing the piecewise integrals of the head and the tail, and, when computing the tail integral, applying the Euler transformation to the piecewise integral results to accelerate convergence.
Optionally, in one embodiment of the present application, filling matrices with the SI computation tasks for the multiple parameter points and multiple spatial points held on the initialized GPU, and generalizing the numerical integration of the SI to a matrix product, comprises:
assuming that the initialized GPU holds the Sommerfeld integral computation tasks for a set of parameter points and a set of spatial points, and specifying that each thread block computes the SI for M parameter points and N spatial points;
arranging the SIs computed by each thread block in an M×N matrix, in which the entries of a column are the SIs at different parameter points and the entries of a row are the SIs at different spatial points, so that the numerical integration of the SI is generalized to a matrix product and a first matrix and a second matrix are obtained.
Optionally, in one embodiment of the present application, the matrix product multiplies the first matrix by the second matrix to give the M×N result. The first matrix is an M×K matrix composed of the spectral-domain Green's function evaluated at the M parameter points and the K integration sampling points. The entries of the second matrix are the products of the Bessel function and the integration weight coefficients.
Optionally, in one embodiment of the present application, the Sommerfeld integral comprises a head integral and a tail integral, both of which are piecewise integrals. The head integral is expressed as:
The quantities in this expression are: the spectral-domain Green's function; the vertical coordinates of the field point and the source point; the transverse wavenumber obtained from transmission line theory based on the field and source positions; the lateral distance between the field point and the source point; the Bessel function of the first kind and its order; the major axis A; the quadrature weights and sampling points; the SI result at the i-th sampling point along the elliptical path; and the number of integration sampling points N.
The tail integral is expressed as:
The quantities in this expression are: the spectral-domain Green's function; the vertical coordinates of the field point and the source point; the transverse wavenumber obtained from transmission line theory based on the field and source positions; the Bessel function of the first kind and its order; the lateral distance between the field point and the source point; the major axis A; the quadrature weights and sampling points; L, the number of sampling points of the subintervals into which the tail integration interval is divided; N, the number of sampling points within each subinterval; and the result of the Euler transformation applied to the tail subinterval integrals.
Optionally, in one embodiment of the present application, the implementation of the Euler extrapolation method is simplified through formula derivation, and the SI tail integral after the K-th recursion is expressed as:
where N is the number of sampling points of the subintervals into which the tail integral is divided, and the remaining quantities are the piecewise integral values of the tail integral and the coefficients applied to them.
Additional aspects and advantages of the present application will be set out in part in the description below, and in part will become apparent from that description or be learned through practice of the present application.
Brief Description of the Drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow chart of a GPU-based method for fast calculation of Green's functions in layered media provided in Embodiment 1 of the present application;
FIG. 2 is a geometric representation of a planar layered medium according to an embodiment of the present application;
FIG. 3 is a diagram of the Sommerfeld numerical integration path according to an embodiment of the present application;
FIG. 4 is an example of matrix multiplication computed with Tensor Cores according to an embodiment of the present application;
FIG. 5 is a schematic diagram of the single-pass parallel computation scheme for multi-parameter Sommerfeld integrals according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a three-layer layered-medium model according to an embodiment of the present application;
FIG. 7 is a first example plot of the magnitude and relative error of layered Green's function components for the three-layer medium at different frequencies;
FIG. 8 is a second example plot of the magnitude and relative error of layered Green's function components for the three-layer medium at different frequencies;
FIG. 9 is a third example plot of the magnitude and relative error of layered Green's function components for the three-layer medium at different frequencies;
FIG. 10 is a fourth example plot of the magnitude and relative error of layered Green's function components for the three-layer medium at different frequencies;
FIG. 11 is a first example plot of the magnitude and relative error of layered Green's function components for the three-layer medium at different dielectric constants;
FIG. 12 is a second example plot of the magnitude and relative error of layered Green's function components for the three-layer medium at different dielectric constants;
FIG. 13 is a third example plot of the magnitude and relative error of layered Green's function components for the three-layer medium at different dielectric constants;
FIG. 14 is a fourth example plot of the magnitude and relative error of layered Green's function components for the three-layer medium at different dielectric constants;
FIG. 15 is a first example plot of the magnitude and relative error of layered Green's function components for the three-layer medium at different medium thicknesses;
FIG. 16 is a second example plot of the magnitude and relative error of layered Green's function components for the three-layer medium at different medium thicknesses;
FIG. 17 is a third example plot of the magnitude and relative error of layered Green's function components for the three-layer medium at different medium thicknesses;
FIG. 18 is a fourth example plot of the magnitude and relative error of layered Green's function components for the three-layer medium at different medium thicknesses;
FIG. 19 is a schematic structural diagram of a GPU-based device for fast calculation of Green's functions in layered media provided in an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which identical or similar reference numerals denote identical or similar elements, or elements with identical or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present application and should not be construed as limiting it.
In related research, many different integral-equation formulations and Green's functions have been used to analyze specific layered-media problems, for example the electric field integral equation (EFIE) and the mixed potential integral equation (MPIE). The MPIE has been applied successfully to planar microstrip antennas with high accuracy. The most important step in solving the MPIE in layered media is calculating the Green's function. Starting from Maxwell's equations, the field quantities in the spatial domain are transformed to the spectral domain by a Fourier transform. The resulting equations have the same form as the transmission line equations, so the layered medium can be treated as an equivalent transmission line structure. Transmission line theory then yields a general expression for the spectral-domain Green's function, and evaluating the Sommerfeld integral gives the spatial-domain Green's function in the layered medium. When the MPIE-based method of moments is used to analyze the electromagnetic properties of layered media, the Green's function is first combined with the mixed potential integral equation. The current distribution is then expanded in RWG basis functions using the spatial-domain Green's function, and the equation is tested with the Galerkin method to obtain a matrix equation. Finally, the matrix equation is solved for the current distribution, which gives the electromagnetic response of the target in the layered medium. The most commonly used MPIE can be expressed as:
The quantities in this expression are the magnetic vector potential and electric scalar potential Green's functions, the known current source, the angular frequency, and the permittivity and permeability of free space. The matrix form of the Green's function is expressed as:
FIG. 1 is a flow chart of a GPU-based method for fast calculation of Green's functions in layered media provided in Embodiment 1 of the present application.
As shown in FIG. 1, the method comprises the following steps:
Step 101: initialize the three-dimensional grid of the GPU and the number of threads per thread block;
Step 102: fill matrices with the SI computation tasks for the multiple parameter points and multiple spatial points held on the initialized GPU, thereby generalizing the numerical integration of the SI to a matrix product, and distribute the computation of the matrix entries evenly across the thread blocks for parallel execution, so that the SI results for multiple parameter points and multiple spatial points are obtained at once, wherein the SI results comprise a head integral result and a tail integral result, and the computation in each thread block comprises:
performing the matrix product with the CUDA matrix operation unit (Tensor Core), computing the piecewise integrals of the head and the tail, and, when computing the tail integral, applying the Euler transformation to the piecewise integral results to accelerate convergence.
In some embodiments, multiple sample values of other layered-media structure parameters may also be assigned to the GPU, and the matrices are filled with the SI calculation tasks for these sample values and multiple spatial points. This enables fast sweeps over layered-media structure parameters.
In some embodiments, computing the matrix product on Tensor Cores through CUDA greatly improves efficiency compared with evaluating the numerical integration directly.
In some embodiments, each thread block computes the head integral first and then the tail integral, and the Sommerfeld integral is the sum of the two. The two are computed serially. The piecewise head and tail computations are similar, except that the tail integral has one additional step: applying the Euler transform to the piecewise results.
In some embodiments, the tail-integral computation generally requires an extrapolation algorithm applied to the piecewise results to accelerate convergence. Commonly used extrapolation algorithms include the Euler transform, the weighted-averages transform, the Levin transform and the Shanks transform.
The GPU-based method for fast calculation of the Green's function in layered media according to this embodiment optimizes the computing architecture by reusing the spectral-domain Green's function and the Bessel function during parameter sweeps, recasts the Sommerfeld integral as a product of two matrices, exploits the GPU's parallel computing capability and its dedicated matrix-operation unit (the Tensor Core), and uses a derived closed-form Euler-transform expression to accelerate the tail integral. Together these yield a parallel scheme that computes the Sommerfeld integrals for multiple frequencies or multiple planar layered-media parameters in one pass.
Optionally, in one embodiment of the present application, filling the matrices with the SI calculation tasks for the multiple parameter points and multiple spatial points assigned to the initialized GPU, and recasting the numerical integration of the SI as a matrix product, includes:
setting the Sommerfeld-integral calculation task of the initialized GPU to cover all parameter points and all spatial points, and determining that each thread block computes the SIs of $M$ parameter points and $N$ spatial points;
arranging the SIs computed by each thread block in an $M \times N$ matrix, in which each column holds the SIs for the different parameter points and each row holds the SIs for the different spatial points, so that the numerical integration of the SI is recast as a matrix product of a first matrix and a second matrix.
Optionally, in one embodiment of the present application, the matrix product multiplies the first matrix by the second matrix. The first matrix is an $M \times K$ matrix holding the spectral-domain Green's function at $M$ parameter points and $K$ integration sample points, and each entry of the second matrix is the product of a Bessel-function value and a quadrature weight.
Optionally, in one embodiment of the present application, the Sommerfeld integral comprises a head integral and a tail integral, both evaluated piecewise. The head integral is expressed as:
where $\tilde{G}$ denotes the spectral-domain Green's function; $z$ and $z'$ are the vertical coordinates of the field point and the source point, respectively; $k_\rho$ is the transverse wavenumber, obtained from transmission-line theory for the given field and source positions; $\rho$ is the lateral distance between the field point and the source point; $J_n$ is the Bessel function of the first kind and $n$ is its order; $A$ is the major axis of the elliptical path; the weights and sample points are those of the quadrature rule; each summand is the SI contribution at the $i$-th sample point on the elliptical path; and $N$ is the number of integration sample points;
The tail integral is expressed as:
where $\tilde{G}$ denotes the spectral-domain Green's function; $z$ and $z'$ are the vertical coordinates of the field point and the source point, respectively; $k_\rho$ is the transverse wavenumber, obtained from transmission-line theory for the given field and source positions; $J_n$ is the Bessel function of the first kind and $n$ is its order; $\rho$ is the lateral distance between the field point and the source point; $A$ is the major axis; the weights and sample points are those of the quadrature rule; $L$ is the number of subintervals into which the tail integration range is divided; $N$ is the number of sample points in each subinterval; and the result is the value obtained after applying the Euler transform to the subinterval integrals of the tail.
Optionally, in one embodiment of the present application, the implementation of the Euler extrapolation is simplified by deriving a closed-form expression. In the computation, the SI tail integral after the $K$-th recursion is expressed as:
where $N$ is the number of sample points of the subintervals into which the tail integral is divided, and the remaining two symbols are, respectively, the coefficient applied to each piecewise integral value and the piecewise integral values of the tail integral.
The GPU-based method for fast calculation of the Green's function in layered media is described in detail below through a specific embodiment.
A planar layered medium is one in which the material discontinuities occur along only one direction of three-dimensional space, while the medium does not vary in the two directions orthogonal to it. It can usually be represented as in FIG. 2: the dielectric interfaces lie on planes of constant $z$, and each layer has its own relative permittivity and permeability. The top layer is air or vacuum, with the permittivity and permeability of free space, and a time-harmonic dependence is assumed.
In the spectral domain, the spectral-domain Green's function is obtained from the transmission-line equations, and the spatial-domain Green's function then follows from a two-dimensional inverse Fourier transform. It takes the form of the Sommerfeld integral:
(1)
where $\tilde{G}$ denotes the spectral-domain Green's function; $z$ and $z'$ are the vertical coordinates of the field point and the source point, respectively; $k_\rho$ is the transverse wavenumber, obtained from transmission-line theory for the field and source positions $z$ and $z'$; $J_n$ is the Bessel function of the first kind and $n$ is its order; and $\rho$ is the lateral distance between the field point and the source point.
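For orientation, the Sommerfeld integral is commonly written in the layered-media literature in the form below. This is the textbook convention, offered here as an assumption about equation (1); the prefactor and the power of $k_\rho$ used in this application may differ.

```latex
S_n\{\tilde{G}(k_\rho; z, z')\}
  = \frac{1}{2\pi} \int_0^{\infty}
    \tilde{G}(k_\rho; z, z')\, J_n(k_\rho \rho)\, k_\rho^{\,n+1}\, \mathrm{d}k_\rho
```

In this form the integrand factors into a part that depends on frequency and layer parameters ($\tilde{G}$) and a part that depends on the lateral distance ($J_n(k_\rho\rho)$), which is the property the reuse scheme relies on.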
As shown in FIG. 3, the integration path is split into a head integral and a tail integral. The head integral follows a semi-elliptical path. The major axis $A$ is generally taken larger than the largest wavenumber in the layered medium, and the axis parameters are chosen as follows:
(2)
where $k_0$ is the free-space wavenumber. With the tail integral taken along the real axis, the integral in equation (1) can be rewritten as:
(3)
The SI head integral follows an elliptical path in the complex plane and is evaluated piecewise to ensure accuracy. Using the Gauss-Kronrod quadrature rule, the head integral can be written as:
where the weights and sample points are those of the quadrature rule, and each summand is the Sommerfeld-integral contribution at the $i$-th sample point on the elliptical path.
The Sommerfeld tail integral is a semi-infinite integral along the real axis. To speed up convergence and improve efficiency, the classical Euler transform is used: the infinite integral is truncated and split into $L$ piecewise integrals. Each piecewise integral can be written as:
(5)
where the weights and sample points are those of the quadrature rule. Each subinterval is the interval between two adjacent zeros of the Bessel function of the first kind $J_n$.
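The splitting of the tail at consecutive Bessel zeros can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the spectral-domain Green's function is replaced by 1, the order is fixed at $n = 0$, Simpson's rule stands in for the Gauss-Kronrod rule, and the helper names `bessel_j0`, `j0_zero`, `simpson` and `segment_integrals` are hypothetical.

```python
import math

def bessel_j0(x, m=64):
    """J0(x) from its integral representation (1/pi) * int_0^pi cos(x sin t) dt,
    evaluated with an m-point midpoint rule (accurate for moderate x)."""
    h = math.pi / m
    return sum(math.cos(x * math.sin((i + 0.5) * h)) for i in range(m)) / m

def j0_zero(k):
    """k-th positive zero of J0 (k = 1, 2, ...), by bisection around (k - 1/4) * pi."""
    lo = (k - 0.25) * math.pi - 0.5
    hi = (k - 0.25) * math.pi + 0.5
    f_lo = bessel_j0(lo)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        f_mid = bessel_j0(mid)
        if (f_lo < 0) == (f_mid < 0):
            lo, f_lo = mid, f_mid   # zero lies in the upper half
        else:
            hi = mid                # zero lies in the lower half
    return 0.5 * (lo + hi)

def simpson(f, a, b, n=40):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    s += 2 * sum(f(a + 2 * i * h) for i in range(1, n // 2))
    return s * h / 3

def segment_integrals(num_segments):
    """Integrals of J0 over the intervals between consecutive zeros."""
    zeros = [j0_zero(k) for k in range(1, num_segments + 2)]
    return [simpson(bessel_j0, zeros[i], zeros[i + 1]) for i in range(num_segments)]
```

Because the breakpoints sit at zeros of the oscillating factor, the segment values alternate in sign and shrink slowly in magnitude, which is the kind of sequence an extrapolation step such as the Euler transform is designed for.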
(1) Acceleration by optimizing the computing architecture
For the Sommerfeld integral in equation (1), the integrand is the product of the spectral-domain Green's function and a Bessel function. The spectral-domain Green's function depends on the frequency and the layered-media parameters but not on the lateral source-field distance, whereas the Bessel function depends only on the lateral source-field distance. Consequently, in a multi-parameter Sommerfeld integral computation, the Bessel function needs to be evaluated only once for a given source-field pair across different frequencies or layered-media parameters. Likewise, the spectral-domain Green's function needs to be evaluated only once for a given frequency or set of layered-media parameters across different source-field pairs. Reusing both functions greatly improves the efficiency of multi-parameter Sommerfeld integral computation.
To quantify this speedup, take a frequency sweep as an example and make the following assumptions:
1) The first unit cost is the time to compute the spectral-domain Green's function for one frequency point at one spatial point.
2) The second unit cost is the time to compute the Bessel function of the first kind for one spatial point.
3) The third unit cost is the time to form the weighted sum of the above results over all integration points. This weighted-sum time is usually much lower than either of the other two.
Consider the total time to compute the Sommerfeld integrals for all parameter points and all spatial points. With the conventional scheme, which parallelizes over spatial points for one parameter point at a time, the total computing time can be written as:
(6)
If the intermediate data are reused effectively through the architecture optimization proposed in this application, the computing time becomes
(7)
Therefore, compared with looping over single-parameter-point SI computations, the theoretical speedup is:
(8)
Clearly, the speedup grows with the number of parameter points and the number of spatial points. In practice both numbers are usually large, especially the number of spatial points, so reusing the spectral-domain Green's function and the Bessel function can yield a very high speedup.
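The reasoning behind the speedup can be illustrated with a small cost model. The formulas below are one plausible reading of the three unit costs, stated per integration sample point; they are assumptions for illustration and are not claimed to reproduce equations (6) to (8). The function and parameter names are hypothetical.

```python
def sweep_times(n_param, n_space, t_green, t_bessel, t_sum):
    """Return (conventional, reuse, speedup) for an assumed cost model.

    conventional: every (parameter point, spatial point) pair recomputes the
                  Green's function and the Bessel function.
    reuse:        the Green's function is computed once per parameter point,
                  the Bessel function once per spatial point, and only the
                  weighted sum is paid for every pair.
    """
    conventional = n_param * n_space * (t_green + t_bessel + t_sum)
    reuse = (n_param * t_green
             + n_space * t_bessel
             + n_param * n_space * t_sum)
    return conventional, reuse, conventional / reuse
```

Under this model the speedup increases with both point counts and approaches `(t_green + t_bessel + t_sum) / t_sum` when both are large, so the benefit is greatest when the weighted sum is cheap relative to the function evaluations.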
(2) Acceleration of the integration by matrix operations
On the GPU, the Sommerfeld-integral calculation tasks for all parameter points and all spatial points are distributed evenly over the thread blocks. Each thread block computes the Sommerfeld integrals of $M$ parameter points and $N$ spatial points, and all thread blocks execute the integration concurrently. To compute an SI frequency sweep over $N$ spatial distances and $M$ frequency points, the SIs can be arranged in an $M \times N$ matrix in which each column holds the SIs for the different parameter points and each row holds the SIs for the different spatial distances. In this way the numerical integration of the SI (both the head integral and the tail integral) is recast as a matrix product, and the SI results for multiple parameter points and multiple spatial points are obtained in one pass:
(9)
In equation (9), the first matrix holds the spectral-domain Green's function at $M$ parameter points and $K$ integration sample points, and each entry of the second matrix is the product of a Bessel-function value and a quadrature weight.
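The recasting of quadrature as a matrix product can be sketched on the CPU as follows. This is an illustrative sketch under stated assumptions: `toy_spectral_green` is a hypothetical smooth stand-in for the real spectral-domain Green's function, the order is fixed at $n = 0$, a midpoint rule replaces the application's quadrature, and the $k_\rho$ weight in the integrand follows the common textbook form rather than a formula confirmed by this text.

```python
import math

def bessel_j0(x, m=64):
    """J0(x) via the midpoint rule on its integral representation."""
    h = math.pi / m
    return sum(math.cos(x * math.sin((i + 0.5) * h)) for i in range(m)) / m

def toy_spectral_green(k_rho, freq):
    """Hypothetical stand-in: depends on k_rho and the parameter, not on rho."""
    return math.exp(-k_rho / freq) / (1.0 + k_rho * k_rho)

def midpoint_rule(a, b, k):
    """K sample points and weights on [a, b]."""
    h = (b - a) / k
    return [a + (i + 0.5) * h for i in range(k)], [h] * k

def si_by_matrix_product(freqs, rhos, nodes, weights):
    # First matrix, M x K: Green's function at each parameter point and sample point.
    a = [[toy_spectral_green(x, f) for x in nodes] for f in freqs]
    # Second matrix, K x N: quadrature weight times Bessel factor at each spatial point.
    b = [[w * x * bessel_j0(x * r) for r in rhos] for x, w in zip(nodes, weights)]
    k = len(nodes)
    # M x N result: one quadrature sum per (parameter point, spatial point) pair.
    return [[sum(row[i] * b[i][n] for i in range(k)) for n in range(len(rhos))]
            for row in a]

def si_direct(freq, rho, nodes, weights):
    """Reference: the same quadrature evaluated for a single pair."""
    return sum(toy_spectral_green(x, freq) * w * x * bessel_j0(x * rho)
               for x, w in zip(nodes, weights))
```

Each entry of the first matrix is computed once and used for all $N$ spatial points, and each entry of the second matrix is computed once and used for all $M$ parameter points, which is where the reuse comes from.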
Modern NVIDIA GPUs have two kinds of hardware units that can execute the matrix product in equation (9): CUDA cores and Tensor Cores. CUDA cores are the basic processing units of the GPU; they perform simple floating-point operations and are optimized for parallel workloads. Tensor Cores are newer processing units designed specifically to accelerate the tensor operations widely used in deep learning and artificial intelligence. Compared with CUDA cores, which serve general-purpose parallel workloads, Tensor Cores execute matrix multiply-add operations more efficiently. As of CUDA 12.2, a Tensor Core operation multiplies only one fixed-size matrix tile by another at a time. The product of the first and second matrices within each thread block therefore has to be computed as a blocked matrix multiplication, as shown in FIG. 4: the two matrices are partitioned into small tiles, the tile products are computed on the Tensor Cores, and the partial results are combined to obtain the product matrix.
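The blocked multiplication can be sketched in plain Python as follows. The tile size `tile` is a hypothetical parameter chosen for illustration; the fixed tile shapes a Tensor Core actually supports depend on the data type and architecture and are not stated here. The innermost three loops play the role of one tile-by-tile multiply-accumulate.

```python
def tiled_matmul(a, b, tile=16):
    """Multiply a (m x k) by b (k x n) by accumulating tile-sized sub-products."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):              # tile row of the result
        for j0 in range(0, n, tile):          # tile column of the result
            for k0 in range(0, k, tile):      # walk along the shared dimension
                # One tile multiply-accumulate: C_tile += A_tile * B_tile.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c
```

The `min(...)` bounds handle matrices whose dimensions are not multiples of the tile size; a GPU implementation would more typically pad the matrices to a whole number of tiles.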
(3) Acceleration by Euler extrapolation
In the SI tail-integral computation, the Euler extrapolation algorithm is needed to accelerate the convergence of the infinite integral. As described in Algorithm 1, the input to the Euler transform is the sequence of $L$ piecewise integral values, which is stored in the shared memory of the thread block. Specifically:
Algorithm 1. Input: the piecewise integral values. Output: the tail integral value.
Procedure: initialize; set k = 0; while the loop condition holds, apply the recursion step; end; return the tail integral value.
However, for multi-parameter Sommerfeld tail integrals, shared memory can hardly provide the storage the Euler transform requires, and the frequent reads and writes to shared memory may cause bank conflicts. To solve these problems, a closed-form expression for the Euler transform is derived, which simplifies the numerical computation. According to Algorithm 1, the SI tail integral after the $k$-th recursion can be written explicitly as:
(10)
The coefficients follow the arrangement of Pascal's triangle (Yang Hui's triangle):
(11)
From equations (10) and (11):
(12)
In this way, the adaptive loop of Algorithm 1 is avoided and the piecewise integral values no longer occupy the shared memory of the thread block, which avoids the loss of efficiency caused by heavy memory use.
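The equivalence between a recursive Euler transform and a closed form with binomial coefficients can be sketched as follows. The recursion used here is the classical repeated averaging of neighboring partial sums, $S_i^{(k+1)} = \tfrac{1}{2}\bigl(S_i^{(k)} + S_{i+1}^{(k)}\bigr)$, taken for a fixed number of steps; this is an assumption about the form of Algorithm 1, and the function names are hypothetical. After $K$ steps the result equals $2^{-K}\sum_{i=0}^{K}\binom{K}{i} S_i$, which can be rewritten as a weighted sum of the segment values themselves.

```python
import math

def euler_recursive(partial_sums):
    """Repeatedly average neighbouring partial sums until one value remains."""
    s = list(partial_sums)
    while len(s) > 1:
        s = [0.5 * (s[i] + s[i + 1]) for i in range(len(s) - 1)]
    return s[0]

def euler_closed_form(segments):
    """Same result as a single weighted sum of the K + 1 segment values.

    Segment j receives the coefficient 2**-K * sum_{i >= j} C(K, i),
    so no intermediate sequence has to be stored or updated.
    """
    k = len(segments) - 1
    total = 2 ** k
    remaining = total                 # sum_{i >= 0} C(k, i)
    result = 0.0
    for j, u in enumerate(segments):
        result += (remaining / total) * u
        remaining -= math.comb(k, j)  # drop the i = j term for the next segment
    return result
```

As a usage example, the alternating series $\sum_{j\ge 0} (-1)^j/(j+1) = \ln 2$ behaves like a slowly converging tail: with 16 terms the plain partial sum is still off in the second decimal place, while both functions above agree with each other and give a far more accurate value. Since the coefficients depend only on $K$, they can be precomputed once and applied to every tail integral.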
(4) The one-pass parallel scheme for multi-parameter Sommerfeld integrals is shown in FIG. 5 and proceeds as follows:
1) Initialize the thread hierarchy: set the GPU's three-dimensional grid to (32, 32, 1) and the number of threads per thread block to (32, 1, 1).
2) Fill the two matrices: the computation of the matrix entries is divided evenly among the 1024 thread blocks, and within each block the work is assigned to 32 threads that execute in parallel. The filled matrices are stored in registers.
3) Compute the piecewise head and tail integrals by matrix multiplication: the matrix product is executed on CUDA cores and Tensor Cores, yielding the SI results for multiple parameter points and multiple spatial points in one pass.
4) For the tail integral, apply the Euler transform to the piecewise results to accelerate convergence; for the head integral, store the resulting matrix directly in global memory.
5) The GPU transfers the integration results to the CPU over the PCIe bus.
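One way the $M \times N$ work tiles might be mapped onto the fixed grid of 1024 thread blocks is sketched below. The block-stride assignment is an assumption made for illustration; the text does not state how tiles are scheduled when there are more tiles than blocks. The function name and its parameters are hypothetical.

```python
def tiles_for_block(block_id, n_blocks, n_param, n_space, m, n):
    """Tiles handled by one thread block under a block-stride loop.

    Each tile covers m parameter points and n spatial points and is
    identified by the index of its first parameter point and first
    spatial point.
    """
    tiles_p = -(-n_param // m)        # ceiling division
    tiles_s = -(-n_space // n)
    total = tiles_p * tiles_s
    return [((t // tiles_s) * m, (t % tiles_s) * n)
            for t in range(block_id, total, n_blocks)]
```

For example, with 64 parameter points, 500,000 spatial points and the partition $M = 64$, $N = 96$, calling `tiles_for_block(b, 32 * 32, 64, 500000, 64, 96)` for every block index `b` covers each tile exactly once, with every block receiving either five or six tiles.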
This application addresses the high-performance computation of layered Green's functions in layered media and proposes a fast parameter-sweep method for them. By reusing the integrand factors across parameters and exploiting the GPU's efficient parallel computing capability, it achieves accurate and efficient computation of layered Green's functions for multiple parameter points or multiple layered-media parameters.
To demonstrate the benefits, two components of the layered Green's function of a three-layer microstrip structure are computed as examples, with multi-parameter computations over frequency, the relative permittivity of the layered medium, and the thickness of the layered medium. The layered-media parameters of the example model are shown in FIG. 6. The computing platform uses an NVIDIA RTX 6000 Ada as GPU1, an NVIDIA GeForce RTX 4090 as GPU2, and an Intel Xeon Platinum 8280 as the CPU.
(1) Frequency sweep
In the band from 2 GHz to 8 GHz, 64 parameter points are sampled at equal intervals of 0.1 GHz. On the interface between free space and the dielectric ($z = 0$, $z' = 0$), 500,000 spatial points are sampled at equal intervals over the range of lateral source-field distance. With the task partition $M = 64$, $N = 96$, the computed results and relative errors are shown in FIGS. 7 to 10. These figures plot only three of the 64 frequencies: 2 GHz, 4 GHz and 6 GHz. They show that the relative errors of both components remain uniformly low, which demonstrates that the proposed method has good accuracy.
To show the time advantage of the proposed method, the computing times of the two methods for the 64 parameter points and 500,000 spatial points above are listed in Table 1. Under the task partition $M = 64$, $N = 96$, the proposed method is 1914 times faster in the head integral and 1226 times faster in the tail integral than the conventional method parallelized with OpenMP. The total time is reduced from 24960.62 s to less than 14.84 s, a speedup of more than 1600 times.
Table 1. Time to compute the layered Green's function component of the three-layer medium with different methods ($M = 64$, $N = 96$)
(2) Relative permittivity sweep
At a frequency of 8 GHz and a medium thickness of 0.254 mm, 64 relative permittivity values are sampled at equal intervals of 0.1 in the range 3.2 to 9.6. On the interface between free space and the dielectric ($z = 0$, $z' = 0$), 500,000 spatial points are sampled at equal intervals over the range of lateral source-field distance, and the two corresponding components are computed. The results and relative errors are shown in FIGS. 11 to 14, which plot only the relative permittivity values 3.6, 6.6 and 9.6. The relative errors of the proposed method remain uniformly low, which shows that the method also applies to permittivity sweeps. The computing time is essentially the same as for the multi-parameter-point computation above.
(3) Layered-medium thickness sweep
At a frequency of 8 GHz and a relative permittivity of 9.6, 64 medium thicknesses are sampled at equal intervals of 0.02 mm in the range 0.254 mm to 1.534 mm. On the interface between free space and the dielectric ($z = 0$, $z' = 0$), 500,000 spatial points are sampled at equal intervals over the range of lateral source-field distance, and the two corresponding components are computed. The results and relative errors are shown in FIGS. 15 to 18, which plot only the thicknesses 0.254 mm, 0.854 mm and 1.454 mm. The relative errors again remain uniformly low, which shows that the proposed method also applies to computing multiple layer thicknesses at once. The computing time is again essentially the same as for the multi-parameter-point computation above.
The numerical experiments show that the proposed method keeps the relative error with respect to the conventional method uniformly low while offering a large advantage in computing time. Compared with the single-parameter-point method accelerated with OpenMP on a high-end CPU, the time for the three-layer medium example is reduced from 24960.62 s to less than 14.84 s, a speedup of 1682 times. The proposed method should significantly accelerate the simulation and parameter optimization of microstrip circuits and microwave integrated circuits.
To implement the above embodiments, the present application further provides a GPU-based device for fast calculation of the Green's function in layered media.
FIG. 19 is a schematic structural diagram of a GPU-based device for fast calculation of the Green's function in layered media, provided in an embodiment of the present application.
As shown in FIG. 19, the device includes a CPU and a GPU, and the CPU includes a memory, wherein:
the CPU is configured to initialize the GPU, fill matrices with the SI calculation tasks for the multiple parameter points and multiple spatial points assigned to the initialized GPU, recast the numerical integration of the SI as a matrix product, and store the resulting data in the memory;
the GPU is configured to distribute the computation of the matrix entries evenly across the thread blocks for parallel execution, obtain the SI results for multiple parameter points and multiple spatial points in one pass, and transfer the integration results to the memory of the CPU over the PCIe bus. The SI results comprise the head-integral results and the tail-integral results, and the computation in each thread block includes:
using the CUDA matrix-operation unit, the Tensor Core, to perform the matrix product and compute the piecewise head and tail integrals, and, when computing the tail integral, applying the Euler transform to the piecewise results to accelerate convergence.
Optionally, in one embodiment of the present application, filling the matrices with the SI calculation tasks for the multiple parameter points and multiple spatial points assigned to the initialized GPU, and recasting the numerical integration of the SI as a matrix product, includes:
setting the Sommerfeld-integral calculation task of the initialized GPU to cover all parameter points and all spatial points, and determining that each thread block computes the SIs of $M$ parameter points and $N$ spatial points;
arranging the SIs computed by each thread block in an $M \times N$ matrix, in which each column holds the SIs for the different parameter points and each row holds the SIs for the different spatial points, so that the numerical integration of the SI is recast as a matrix product of a first matrix and a second matrix.
Optionally, in one embodiment of the present application, the matrix product multiplies the first matrix by the second matrix. The first matrix is an $M \times K$ matrix holding the spectral-domain Green's function at $M$ parameter points and $K$ integration sample points, and each entry of the second matrix is the product of a Bessel-function value and a quadrature weight.
Optionally, in one embodiment of the present application, the Sommerfeld integral comprises a head integral and a tail integral, both evaluated piecewise. The head integral is expressed as:
where $\tilde{G}$ denotes the spectral-domain Green's function; $z$ and $z'$ are the vertical coordinates of the field point and the source point, respectively; $k_\rho$ is the transverse wavenumber, obtained from transmission-line theory for the given field and source positions; $\rho$ is the lateral distance between the field point and the source point; $J_n$ is the Bessel function of the first kind and $n$ is its order; $A$ is the major axis of the elliptical path; the weights and sample points are those of the quadrature rule; each summand is the SI contribution at the $i$-th sample point on the elliptical path; and $N$ is the number of integration sample points;
The tail integral is expressed as:
where $\tilde{G}$ denotes the spectral-domain Green's function; $z$ and $z'$ are the vertical coordinates of the field point and the source point, respectively; $k_\rho$ is the transverse wavenumber, obtained from transmission-line theory for the given field and source positions; $J_n$ is the Bessel function of the first kind and $n$ is its order; $\rho$ is the lateral distance between the field point and the source point; $A$ is the major axis; the weights and sample points are those of the quadrature rule; $L$ is the number of subintervals into which the tail integration range is divided; $N$ is the number of sample points in each subinterval; and the result is the value obtained after applying the Euler transform to the subinterval integrals of the tail.
可选地,在本申请的一个实施例中,通过公式推导简化Euler外推方法的实现方法,在计算时第K次递归后的SI尾部积分表示为:Optionally, in one embodiment of the present application, the implementation method of simplifying the Euler extrapolation method is derived by formula, and the SI tail integral after the Kth recursion during calculation is expressed as:
Here N is the number of sampling points that divide the tail integral into subintervals, each coefficient multiplies the corresponding piecewise integral value, and the piecewise integral values are the integrals over the individual segments of the tail.
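One way to see how a recursive Euler extrapolation collapses into fixed coefficients applied to the segment integrals is sketched below in Python. It assumes the classical Euler transformation by repeated averaging of partial sums, for which K recursions give the closed-form weights c_i = 2^(-K) * sum_{j=i..K} C(K, j); the exact coefficients used in the present application are not reproduced in this text, so this form, the function names, and the test series are illustrative assumptions only.

```python
from math import comb

import numpy as np


def euler_coefficients(num_recursions):
    """Weights c_i, i = 0..K, so that the K-times averaged sum is c @ u."""
    k = num_recursions
    binom = np.array([comb(k, j) for j in range(k + 1)], dtype=float)
    # Reverse cumulative sum gives sum_{j=i..K} C(K, j) at index i.
    return np.cumsum(binom[::-1])[::-1] / 2.0 ** k


def euler_recursive(segment_integrals):
    """Reference version: average neighbouring partial sums until one is left."""
    partial = np.cumsum(np.asarray(segment_integrals))
    while partial.size > 1:
        partial = 0.5 * (partial[:-1] + partial[1:])
    return partial[0]


# Alternating series standing in for the segment integrals (sum tends to ln 2).
K = 20
segments = np.array([(-1.0) ** i / (i + 1) for i in range(K + 1)])
coeffs = euler_coefficients(K)

tail_direct = float(coeffs @ segments)          # single dot product
tail_recursive = float(euler_recursive(segments))

# Batched form: one row of segment integrals per (field, source) pair,
# so all tails are obtained with one matrix-vector product.
batch = np.vstack([segments, 2.0 * segments])
tail_batch = batch @ coeffs
```

Since the coefficients depend only on K, they can be computed once and reused, which replaces the data-dependent recursion with a matrix product that suits GPU execution.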
It should be noted that the foregoing explanation of the embodiments of the GPU-based fast calculation method for the layered-medium Green's function also applies to the GPU-based fast calculation device of this embodiment, and is not repeated here.
In this specification, a description that refers to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that the specific features, structures, materials, or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the present application. Illustrative uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine the different embodiments or examples described in this specification, as well as the features of those embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance, or as implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In the description of this application, "plurality" means at least two, for example two or three, unless otherwise expressly and specifically defined.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a custom logical function or process. The scope of the preferred embodiments of the present application includes alternative implementations in which functions are performed out of the order shown or discussed, including substantially simultaneously or in reverse order, depending on the functions involved, as will be understood by those skilled in the technical field to which the embodiments of the present application belong.
The logic and/or steps represented in a flowchart, or otherwise described herein, may be regarded, for example, as an ordered list of executable instructions for implementing logical functions. They may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection having one or more wires (electronic device), a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it as necessary, and then stored in a computer memory.
It should be understood that the various parts of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following technologies known in the art: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
A person of ordinary skill in the art will understand that all or part of the steps of the methods in the above embodiments may be carried out by a program that instructs the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination of them.
In addition, the functional units in the embodiments of the present application may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and is sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present application have been shown and described above, it should be understood that these embodiments are exemplary and are not to be construed as limiting the present application. A person of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present application.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410503575.9A (CN118069969B) | 2024-04-25 | 2024-04-25 | GPU-based fast calculation method and device for layered medium Green's function |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118069969A | 2024-05-24 |
CN118069969B | 2024-07-09 |
Family
ID=91109463
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202410503575.9A (CN118069969B) | Active | 2024-04-25 | 2024-04-25 | GPU-based fast calculation method and device for layered medium Green's function |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118069969B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN118069969B (en) | 2024-07-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |