CN103713938A - Multi-graphics-processing-unit (GPU) cooperative computing method based on Open MP under virtual environment - Google Patents
- Publication number
- CN103713938A (application CN201310695055.4A)
- Authority
- CN
- China
- Prior art keywords
- gpu
- matrix
- data
- openmp
- host side
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses an OpenMP-based multi-GPU (graphics processing unit) cooperative computing method for virtualized environments. The method includes the following steps: OpenMP creates as many host-side threads as there are GPUs; each host-side thread controls one GPU, device memory is allocated for each thread on its device, and the kernel function is launched. Each thread sets its own private host-side and device-side data pointers; the data are computed and then merged: the results are copied back to the private host side through each GPU's device-side data pointer, and merging is carried out on the private host side. Compared with the prior art, the invention achieves multi-GPU cooperative computing in a virtualized environment and accelerates a single task with multiple GPUs, which is of theoretical and practical significance for supercomputing, cloud computing and grid computing on heterogeneous CPU (central processing unit) + GPU platforms.
Description
Technical field
The present invention relates to single-task multi-GPU computing in virtualized environments, and in particular to an OpenMP-based multi-GPU cooperative computing method in a virtualized environment.
Background
Existing multi-GPU cooperative computing techniques are all based on physical machines. OpenMP is generally used for CPU parallel computing; its use with GPUs is limited to the examples provided in the official NVIDIA SDK, and there is no complete API supporting multiple GPUs. gVirtuS is currently a relatively mature GPU virtualization solution that makes CUDA programming on GPUs possible in a virtualized environment, but it targets a single GPU, and multi-GPU operation has not been studied. In short, the prior art cannot use multiple GPUs simultaneously to accelerate a task with a very large data volume in a virtualized environment.
Summary of the invention
The present invention overcomes the deficiencies of the prior art and provides an OpenMP-based multi-GPU cooperative computing method in a virtualized environment.
To solve the above technical problem, the technical solution adopted by the invention is:
An OpenMP-based multi-GPU cooperative computing method in a virtualized environment, comprising the following steps:
Step S01: the GPU virtualization (gVirtuS) server-side component is deployed on the server; calls that would otherwise be executed locally are intercepted at the client together with their parameters and executed on the physical machine; on the host side, the physical machine uses OpenMP to create as many host-side threads as there are GPUs; each host-side thread controls one GPU, and an interface (API) function maps the ID of each host-side thread to the device number of a GPU; the GPU computation target is defined as an N*N matrix; the interface function is cudaSetDevice(cpu_thread_id);
In the prior art there is no GPU driver in the virtualized environment. In step S01, once the GPU virtualization (gVirtuS) server-side component is deployed on the server, the actual execution is forwarded to the server, and when the server finishes, the result is passed back to the virtual machine;
Step S02: device memory is allocated for each thread on its device, and each thread launches the kernel function to compute the compound matrix operation; the amount of device memory is allocated according to the size of the data to be computed, and the kernel function is a matrix-multiplication kernel;
Step S03, data decomposition: each thread sets its own private host-side and device-side data pointers; each private host-side pointer points to a different start position within the original pointer, and the CUDA copy function, namely cudaMemcpy(), copies N/n of the data starting from the position of the thread's private host-side pointer, thereby partitioning the problem, where N is the number of rows or columns of the matrix and n is the number of GPUs;
Step S04, data computation: the matrices on the n GPUs are computed according to the rules of matrix multiplication; the OpenMP synchronization module controls when each GPU outputs its result, and the data are output synchronously via the cudaDeviceSynchronize() function;
Step S05, data merging: the data synchronized in step S04 are copied back to the private host side through each GPU's private device-side data pointer and merged on the private host side; after the computation completes, the server passes the result to the client over socket communication; the client is a virtual machine.
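Steps S01 to S05 can be illustrated with a minimal CPU-only sketch. It rests on loud assumptions: plain Python threads stand in for the OpenMP host-side threads, a pure-Python function stands in for the matrix-multiplication kernel, and no CUDA or gVirtuS call is made. The names `fake_kernel` and `multi_device_matmul` are hypothetical and do not come from the patent.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_kernel(a_rows, b):
    """Stand-in for the matrix-multiplication kernel: multiply a row block by b."""
    cols = list(zip(*b))  # columns of b
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a_rows]


def multi_device_matmul(a, b, n_devices):
    """Split a into n_devices row blocks, multiply each in its own thread, merge."""
    n = len(a)
    if n % n_devices != 0:
        raise ValueError("this sketch assumes N is divisible by the device count")
    block = n // n_devices  # N/n rows per device (step S03)

    def worker(thread_id):
        # Step S01: thread_id plays the role of the device number.
        # Step S03: each thread starts at a different offset of the original data.
        rows = a[thread_id * block:(thread_id + 1) * block]
        # Steps S02/S04: "launch" the stand-in kernel on this thread's block.
        return fake_kernel(rows, b)

    with ThreadPoolExecutor(max_workers=n_devices) as pool:
        # list(pool.map(...)) collects results in thread_id order and blocks
        # until every worker has finished, standing in for the synchronization.
        partial = list(pool.map(worker, range(n_devices)))

    # Step S05: merge the partial results on the host side.
    return [row for part in partial for row in part]
```

Calling `multi_device_matmul(a, b, 4)` on two N*N lists of lists with N divisible by 4 gives the same product as a single-threaded multiplication; only the way the rows are distributed changes.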
In step S03, the number of GPUs is 4, namely GPU0, GPU1, GPU2 and GPU3; the matrices are A, B, C and D; and the data decomposition of A*B+C*D comprises the following steps:
1) GPU0 multiplies one half of matrix A by B, GPU1 multiplies the other half of matrix A by B, GPU2 multiplies one half of matrix C by D, and GPU3 multiplies the other half of matrix C by D;
2) the OpenMP synchronization module waits until the multiplications on GPU0, GPU1, GPU2 and GPU3 have all completed; all the data are then copied to GPU0 by the cudaMemcpy() function, and the host side checks the correctness of the result.
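The four-way split of A*B+C*D can be sketched as follows. This is a CPU-only illustration under stated assumptions: four Python threads stand in for GPU0 to GPU3, the final merge happens in the calling thread rather than on GPU0, N is assumed to be even, and the function names `matmul` and `ab_plus_cd` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul(x, y):
    """Plain matrix product of two lists of lists."""
    cols = list(zip(*y))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in x]


def ab_plus_cd(a, b, c, d):
    """Compute A*B + C*D with four workers, each handling half of A or half of C."""
    if len(a) % 2 != 0 or len(c) % 2 != 0:
        raise ValueError("this sketch assumes an even number of rows")
    half_a, half_c = len(a) // 2, len(c) // 2
    # (row block, right-hand matrix) per worker, in the order GPU0..GPU3.
    tasks = [(a[:half_a], b), (a[half_a:], b), (c[:half_c], d), (c[half_c:], d)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Unpacking consumes all four results, so it waits for every product.
        p0, p1, p2, p3 = pool.map(lambda t: matmul(*t), tasks)
    ab = p0 + p1  # stack the two halves of A*B
    cd = p2 + p3  # stack the two halves of C*D
    return [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(ab, cd)]
```

The two products are independent until the final addition, which is why the halves of A and the halves of C can be assigned to different devices without any communication during the multiplication.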
The data computation uses matrix multiplication and comprises the following steps: matrix A is divided into 4 blocks A/4 (each N/4*N), and each block is multiplied by matrix B, giving 4 blocks AB/4; combining the 4 AB/4 blocks yields the result matrix AB. Matrix multiplication is computed in this way on GPU0, GPU1, GPU2 and GPU3 respectively. The AB matrix is an N*N matrix.
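Written out for four row blocks, the partition reads as below. The block names A_1 to A_4 are introduced here only for illustration, and real-valued entries are assumed.

```latex
A = \begin{bmatrix} A_1 \\ A_2 \\ A_3 \\ A_4 \end{bmatrix},\quad
A_i \in \mathbb{R}^{(N/4)\times N},\quad B \in \mathbb{R}^{N\times N},
\qquad
AB = \begin{bmatrix} A_1 B \\ A_2 B \\ A_3 B \\ A_4 B \end{bmatrix},\quad
A_i B \in \mathbb{R}^{(N/4)\times N}.
```

Each product A_i B depends only on its own row block and on the full matrix B, so the four blocks can be computed on separate GPUs and stacked afterwards.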
Compared with the prior art, the beneficial effects of the invention are as follows: the invention achieves multi-GPU cooperative computing in a virtualized environment and accelerates a single task with multiple GPUs, which is of great theoretical and practical significance for supercomputing, cloud computing and grid computing on heterogeneous CPU+GPU platforms.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention.
Fig. 2 is a schematic diagram of the data decomposition algorithm of the invention.
Fig. 3 is a schematic diagram of the data computation algorithm of the invention.
Fig. 4 compares the computing time of A*B+C*D in multi-GPU and single-GPU environments.
Embodiments
The invention is further described below with reference to the accompanying drawings.
As shown in Fig. 1, the OpenMP-based multi-GPU cooperative computing method in a virtualized environment comprises the following steps:
Step S01: the GPU virtualization (gVirtuS) server-side component is deployed on the server; calls that would otherwise be executed locally are intercepted at the client together with their parameters and executed on the physical machine; on the host side, the physical machine uses OpenMP to create as many host-side threads as there are GPUs; each host-side thread controls one GPU, and the cudaSetDevice(cpu_thread_id) function maps the ID of each host-side thread to the device number of a GPU; the GPU computation target is defined as an N*N matrix;
Step S02: device memory is allocated for each thread on its device, and each thread launches the kernel function to compute the compound matrix operation; the amount of device memory is allocated according to the size of the data to be computed, and the kernel function is a matrix-multiplication kernel;
Step S03, data decomposition: each thread sets its own private host-side and device-side data pointers; each private host-side pointer points to a different start position within the original pointer, and the cudaMemcpy() function copies N/n of the data starting from the position of the thread's private host-side pointer, thereby partitioning the problem, where N is the number of rows of the matrix and n is the number of GPUs;
Step S04, data computation: the matrices on the n GPUs are computed according to the rules of matrix multiplication; the OpenMP synchronization module controls when each GPU outputs its result, and the data are output synchronously via the cudaDeviceSynchronize() function;
Step S05, data merging: the data synchronized in step S04 are copied back to the private host side through each GPU's private device-side data pointer and merged on the private host side; after the computation completes, the server passes the result to the client (the virtual machine) over socket communication.
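The pointer arithmetic behind step S03 can be sketched as follows. The sketch assumes the matrix is stored as a flat row-major buffer, which the source does not state; it only says that each private host-side pointer starts at a different position of the original pointer. The helper names `block_ranges` and `split_flat` are hypothetical.

```python
def block_ranges(n_rows, n_cols, n_devices):
    """Return (start_element, element_count) for each device's row block."""
    if n_rows % n_devices != 0:
        raise ValueError("this sketch assumes N is divisible by the device count")
    rows_per_device = n_rows // n_devices   # N/n rows per device
    count = rows_per_device * n_cols        # elements copied per device
    return [(i * count, count) for i in range(n_devices)]


def split_flat(buffer, n_rows, n_cols, n_devices):
    """Slice a flat row-major buffer the way per-thread host pointers would."""
    return [buffer[start:start + count]
            for start, count in block_ranges(n_rows, n_cols, n_devices)]
```

In a CUDA program the start element would become the offset added to the original host pointer, and the element count multiplied by the element size would become the byte count of the copy; both of those mappings are illustrative and are not spelled out in the source.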
As shown in Fig. 2, in step S03 the number of GPUs is 4, namely GPU0, GPU1, GPU2 and GPU3; the matrices are A, B, C and D; and the data decomposition of A*B+C*D comprises the following steps:
1) GPU0 multiplies one half of matrix A by B, GPU1 multiplies the other half of matrix A by B, GPU2 multiplies one half of matrix C by D, and GPU3 multiplies the other half of matrix C by D;
2) the OpenMP synchronization module waits until the multiplications on GPU0, GPU1, GPU2 and GPU3 have all completed; all the data are then copied to GPU0 by the cudaMemcpy() function, and the host side checks the correctness of the result.
As shown in Fig. 3, the data computation uses matrix multiplication and comprises the following steps: matrix A is divided into 4 blocks A/4 (each N/4*N), and each block is multiplied by matrix B, giving 4 blocks AB/4; combining the 4 AB/4 blocks yields the result matrix AB. Matrix multiplication is computed in this way on GPU0, GPU1, GPU2 and GPU3 respectively, and the AB matrix is an N*N matrix.
Fig. 4 compares the computing time of A*B+C*D with multiple GPUs and with a single GPU in a virtualized environment. As the figure shows, when the matrix order increases, the single-GPU computing time rises exponentially and becomes very long, whereas with OpenMP-based multi-GPU cooperative computing in a virtualized environment the computing time grows approximately linearly with the matrix order, so efficiency is high.
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make improvements and modifications without departing from the principle of the invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the invention.
Claims (10)
1. An OpenMP-based multi-GPU cooperative computing method in a virtualized environment, characterized by comprising the following steps:
Step S01: a GPU virtualization server-side component is deployed on the server; calls that would otherwise be executed locally are intercepted at the client together with their parameters and executed on the physical machine; on the host side, the physical machine uses OpenMP to create as many host-side threads as there are GPUs; each host-side thread controls one GPU, and an interface (API) function maps the ID of each host-side thread to the device number of a GPU; the GPU computation target is defined as an N*N matrix computation;
Step S02: device memory is allocated for each thread on its device, and each thread launches the kernel function;
Step S03, data decomposition: each thread sets its own private host-side and device-side data pointers; each private host-side pointer points to a different start position within the original pointer, and a CUDA copy function copies N/n of the data starting from the position of the thread's private host-side pointer, thereby partitioning the problem, where N is the number of rows of the matrix and n is the number of GPUs;
Step S04, data computation: the matrices on the n GPUs are computed according to the rules of matrix multiplication; OpenMP controls when each GPU outputs its result, and the data are output synchronously;
Step S05, data merging: the data synchronized in step S04 are copied back to the private host side through each GPU's private device-side data pointer and merged on the private host side; after the computation completes, the server passes the result to the client over socket communication.
2. The OpenMP-based multi-GPU cooperative computing method in a virtualized environment according to claim 1, characterized in that: in step S03 the number of GPUs is 4, namely GPU0, GPU1, GPU2 and GPU3; the matrices are A, B, C and D; and the data decomposition of A*B+C*D comprises the following steps:
1) GPU0 multiplies one half of matrix A by B, GPU1 multiplies the other half of matrix A by B, GPU2 multiplies one half of matrix C by D, and GPU3 multiplies the other half of matrix C by D;
2) the OpenMP synchronization module waits until the multiplications on GPU0, GPU1, GPU2 and GPU3 have all completed; all the data are then copied to GPU0, and the host side checks the correctness of the result.
3. The OpenMP-based multi-GPU cooperative computing method in a virtualized environment according to claim 2, characterized in that the data computed on GPU1, GPU2 and GPU3 are copied to GPU0 by the cudaMemcpy() function.
4. The OpenMP-based multi-GPU cooperative computing method in a virtualized environment according to claim 1 or 2, characterized in that the data computation uses matrix multiplication and comprises the following steps: matrix A is divided into 4 blocks A/4 (each N/4*N), and each block is multiplied by matrix B, giving 4 blocks AB/4; combining the 4 AB/4 blocks yields the result matrix AB; matrix multiplication is computed in this way on GPU0, GPU1, GPU2 and GPU3 respectively.
5. The OpenMP-based multi-GPU cooperative computing method in a virtualized environment according to claim 1, characterized in that the interface (API) function is the cudaSetDevice(cpu_thread_id) function.
6. The OpenMP-based multi-GPU cooperative computing method in a virtualized environment according to claim 1, characterized in that the synchronization in step S04 is performed by the cudaDeviceSynchronize() function.
7. The OpenMP-based multi-GPU cooperative computing method in a virtualized environment according to claim 1, characterized in that the CUDA copy function is the cudaMemcpy() function.
8. The OpenMP-based multi-GPU cooperative computing method in a virtualized environment according to claim 1, characterized in that the client is a virtual machine.
9. The OpenMP-based multi-GPU cooperative computing method in a virtualized environment according to claim 1, characterized in that the amount of device memory is allocated according to the size of the data to be computed.
10. The OpenMP-based multi-GPU cooperative computing method in a virtualized environment according to claim 1, characterized in that the kernel function is a matrix-multiplication kernel.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310695055.4A CN103713938A (en) | 2013-12-17 | 2013-12-17 | Multi-graphics-processing-unit (GPU) cooperative computing method based on Open MP under virtual environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103713938A true CN103713938A (en) | 2014-04-09 |
Family
ID=50406941
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310695055.4A Pending CN103713938A (en) | 2013-12-17 | 2013-12-17 | Multi-graphics-processing-unit (GPU) cooperative computing method based on Open MP under virtual environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103713938A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050120160A1 (en) * | 2003-08-20 | 2005-06-02 | Jerry Plouffe | System and method for managing virtual servers |
CN103136035A (en) * | 2011-11-30 | 2013-06-05 | 国际商业机器公司 | Method and device for thread management of hybrid threading mode program |
CN102609990A (en) * | 2012-01-05 | 2012-07-25 | 中国海洋大学 | Massive-scene gradually-updating algorithm facing complex three dimensional CAD (Computer-Aided Design) model |
CN102650950A (en) * | 2012-04-10 | 2012-08-29 | 南京航空航天大学 | Platform architecture supporting multi-GPU (Graphics Processing Unit) virtualization and work method of platform architecture |
CN103279330A (en) * | 2013-05-14 | 2013-09-04 | 江苏名通信息科技有限公司 | MapReduce multiple programming model based on virtual machine GPU computation |
Non-Patent Citations (1)
Title |
---|
石林: "GPU通用计算虚拟化方法研究", 《中国博士学位论文全文数据库 信息科技辑》 * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104216783A (en) * | 2014-08-20 | 2014-12-17 | 上海交通大学 | Method for automatically managing and controlling virtual GPU (Graphics Processing Unit) resource in cloud gaming |
CN104216783B (en) * | 2014-08-20 | 2017-07-11 | 上海交通大学 | Virtual GPU resource autonomous management and control method in cloud game |
WO2016093428A1 (en) * | 2014-12-11 | 2016-06-16 | 한화테크윈 주식회사 | Mini integrated control device |
WO2016093427A1 (en) * | 2014-12-11 | 2016-06-16 | 한화테크윈 주식회사 | Mini integrated control device |
CN107797843A (en) * | 2016-09-02 | 2018-03-13 | 华为技术有限公司 | A kind of method and apparatus of container function enhancing |
CN107797843B (en) * | 2016-09-02 | 2021-04-20 | 华为技术有限公司 | A method and apparatus for enhancing the function of a container |
CN110546642A (en) * | 2018-10-17 | 2019-12-06 | 阿里巴巴集团控股有限公司 | Secure Multiparty Computation Without Trusted Initializers |
WO2020077959A1 (en) * | 2018-10-17 | 2020-04-23 | Alibaba Group Holding Limited | Secure multi-party computation with no trusted initializer |
US11386212B2 (en) | 2018-10-17 | 2022-07-12 | Advanced New Technologies Co., Ltd. | Secure multi-party computation with no trusted initializer |
CN110543711A (en) * | 2019-08-26 | 2019-12-06 | 中国原子能科学研究院 | A Parallel Realization and Optimization Method for Thermal-hydraulic Subchannel Simulation of Numerical Reactor |
CN110543711B (en) * | 2019-08-26 | 2021-07-20 | 中国原子能科学研究院 | A Parallel Realization and Optimization Method for Numerical Reactor Thermal-Hydraulic Subchannel Simulation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20140409 |