Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is obvious that the described embodiments are only a part of the embodiments of the present invention, rather than all of them; all other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
FIG. 1 is a general flowchart of the tag-based parallelization method for a serial program according to the present invention, comprising the steps of:
(1) marking the serial program;
(2) analyzing the mark by the code analysis system, and recording the parameters of the mark clauses;
(3) extracting parallel code segments from the basic parallel code library by the code analysis system, and filling the parallel code segments with the mark clause parameters;
(4) splicing the filled parallel code segments to obtain the parallel program corresponding to the original serial program.
Specifically, the step (1) of marking the serial program includes:
the serial program refers to a serial API function. In the field of software engineering, API (Application Programming Interface) functions are predefined functions that represent specific callable software function modules and are basic building blocks of computer software. According to software engineering convention, a serial API function consists of three parts: a function name, a parameter list and a function body.
The mark is an identification field used to express parallel semantics and indicate the parallel position. It comprises a mark name and mark clauses: the mark name is fixed, has a form distinct from ordinary program code, and is identified by the subsequent code analysis system; the mark clauses provide the parameters required for parallelization, are filled in by the developer, and are related to the serial program. The mark clauses comprise a data source, a data destination and a data batch number; their parameters can be taken from the serial program parameter list or specified by the developer. The data source clause provides the information of the data packet to be processed in parallel, the data destination clause provides the information of the result data packet after parallel processing is finished, and the data batch number clause provides the splittable batch number of the data packet in the data source clause, the splitting principle being that each batch of data after splitting can be processed independently by the serial program.
The flow of this step is as follows: the developer adds a mark above the function name of the serial program definition.
(2) The code analysis system analyzes the mark and records the mark clause parameters.
The code analysis system is an independently executable program responsible for reading the marked serial program file and parsing it into the corresponding parallel program; it is an important implementation tool of the method. The code analysis system comprises an analysis module and a code extraction module: the analysis module is responsible for reading and parsing the marked serial program, and the code extraction module is responsible for extracting code segments from the basic parallel code library.
The flow of the step is as follows: and (3) reading the marked serial program file obtained in the step (1) by the code analysis system, scanning from beginning to end, and analyzing and recording the marked clause parameters and the marked serial program when the mark name is identified.
(3) The code analysis system extracts parallel code segments from the basic parallel code library and fills them with the mark clause parameters.
The basic parallel code library is a file that records parallel code segments for multiple parallel stages on multiple parallel platforms. The parallel platforms currently covered by the basic parallel code library are a shared storage structure hardware platform with the OpenMP programming model, and a distributed storage structure hardware platform with the MPI programming model, hereinafter referred to as the shared storage platform and the distributed storage platform respectively. The multiple parallel stages refer to the execution process, i.e. the program structure, of a parallel program, which can be divided into three parallel stages: data partitioning and distribution, data calculation, and data collection. In general, the basic parallel code library comprises the three types of code segments (data partitioning and distribution, data calculation, data collection) for both the shared storage platform and the distributed storage platform. The basic parallel code library reserves an expansion interface and can be extended to other parallel platforms.
Parallel code segment extraction means that the code analysis system selects the required parallel code segments from the basic parallel code library according to the parallel platform and the parallel stage. Taking the data collection stage under the shared storage platform as an example, the code analysis system queries the code segment set under the shared storage platform directory of the basic parallel code library and returns the code segments of the data collection stage.
The flow of the step is as follows: and (3) the code analysis system sequentially searches parallel code segments of three parallel stages (data division and distribution, data calculation and data collection) under the parallel platform from a basic parallel code library according to the parallel platform and the parallel stages, and fills the marked clause parameters obtained in the step (2) into corresponding fixed point positions of the parallel code segments to obtain parallel code segments of all stages containing the marked clause parameters.
(4) The filled parallel code segments are spliced to obtain the parallel program corresponding to the original serial program.
The parallel program refers to the parallel API function corresponding to the serial API function, and can be called by developers. According to software engineering convention, a parallel API function is composed of a function name, a parameter list and a function body, just like a serial API function.
The flow of the step is as follows: regarding the function body, under the shared storage platform, sequentially splicing the code segments obtained in the step (3) according to the sequence of data division and distribution, data collection and data calculation, under the distributed storage platform, the splicing sequence is data division and distribution, data calculation and data collection, and the function body of the parallel API function is obtained. Regarding the function name, the serial API function name plus a _ prallel suffix is taken as the function name of the parallel API function. Regarding the parameter list, the serial API function parameter list plus numprocs parameters is taken as the parameter list of the parallel API function as a whole, the numprocs parameters refer to the number of parallel processing units, and the shared storage platform and the distributed storage platform represent the thread number and the node thread number respectively.
According to a preferred embodiment of the invention: the embodiment parallelizes the serial program Func_serial(srcname, dstname, num, var1, var2) into parallel programs under a shared storage platform and a distributed storage platform respectively.
Step (1), marking the serial program, specifically:
the mark #sigma parallel_task is added above the serial program function name. After marking, the code is as follows:
#sigma parallel_task src_data(srcname;srcdatatype;srcsize)
dst_data(dstname;dstdatatype;dstsize)
group(num)
Func_serial(srcname,dstname,num,var1,var2) // function name and parameter list
{
... // function body
}
The mark name #sigma parallel_task is used by the subsequent code analysis system for identification; the mark clauses provide the parameters required for code analysis, are filled in by the developer, and are related to the serial program. The mark clauses src_data(srcname;srcdatatype;srcsize), dst_data(dstname;dstdatatype;dstsize) and group(num) respectively represent the data source, the data destination and the data batch number. The data source clause parameters srcname, srcdatatype and srcsize respectively indicate the address, data type and data quantity of the data packet to be processed in parallel; the data destination clause parameters dstname, dstdatatype and dstsize respectively indicate the address, data type and data quantity of the result data packet after parallel processing is finished; and the data batch number clause provides the splittable batch number of the data source.
Step (2), the code analysis system analyzes the mark and records the mark clause parameters, specifically:
the algorithmic idea of the code analysis system is shown in FIG. 2.
The analysis module reads the marked serial program file obtained in step (1) and scans it from beginning to end. When the mark name #sigma parallel_task is identified, the analysis module analyzes and records each mark clause parameter (srcname, srcdatatype, srcsize, etc.), and simultaneously records the marked serial program function name Func_serial and its parameter list (srcname, dstname, num, var1, var2). The unmarked part is not parsed.
Step (3), the code analysis system extracts parallel code segments from the basic parallel code library and fills them with the mark clause parameters, specifically:
the implementation of the shared storage platform is different from that of the distributed storage platform, and firstly, taking the shared storage platform as an example:
(a) Data partitioning and distribution phase
The code extraction module of the code analysis system queries the code segment set under the shared storage platform directory of the basic parallel code library and returns the data partitioning and distribution code segments. The code segments are as follows:
average_allocation(t_testlocals,step,step_before,numprocs,①);
stepsize=③/①;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
Data_trans(②,numprocs,displs,②_in);
the analysis module of the code analysis system fills the corresponding point locations (the point locations are built into the code segments) with the mark clause parameters obtained in step (2): point location ① is filled with num, point location ② with srcname, and point location ③ with srcsize. The process is completed automatically by the code analysis system. The code segments obtained after filling are:
average_allocation(t_testlocals,step,step_before,numprocs,num);
stepsize=srcsize/num;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
Data_trans(srcname,numprocs,displs,srcname_in);
several functions in the code segments are introduced below:
the data partitioning function average_allocation divides num batches of data among numprocs threads and stores the results in the step and step_before arrays; t_testlocals is a partitioning proportionality coefficient, with a default of 1. With t_testlocals = 1 (even division), the data partitioning is:

step_i ≈ num / numprocs, step_before_i = step_0 + step_1 + ... + step_(i-1), 0 ≤ i < numprocs

wherein step_i and step_before_i respectively represent the data batch number and the batch number offset obtained by thread i; the batch number offset is the sum of the batch numbers obtained by threads 0 through i-1.
The DataPartition function calculates the data quantity and the data quantity offset of each thread and stores the results in the sendrecvcnts and displs arrays. The calculation is as follows:

sendrecvcnts_i = stepsize * step_i, 0 ≤ i < numprocs
displs_i = stepsize * step_before_i, 0 ≤ i < numprocs

wherein sendrecvcnts_i and displs_i respectively denote the data quantity obtained by thread i and its data quantity offset; the data quantity offset is the sum of the data quantities obtained by threads 0 through i-1.
The data distribution function Data_trans distributes data to each thread, i.e. it distributes the data packet to be processed, whose address is srcname, to each thread. Because the shared storage platform adopts a shared storage structure, all threads share the same physical memory, so the data distribution operation can be converted into a memory address operation. The address calculation is as follows:
srcname_in_i = srcname + displs_i, 0 ≤ i < numprocs

wherein srcname_in_i indicates the input data address assigned to thread i.
(b) Data calculation phase
The code extraction module of the code analysis system queries the code segment set under the shared storage platform directory of the basic parallel code library and returns the data calculation code segment. The code segment is as follows:
omp_set_num_threads(numprocs);
#pragma omp parallel
{
int i=omp_get_thread_num();
Func_serial(①_in[i],②_out[i],step[i],var1,var2);
}
the analysis module of the code analysis system fills the corresponding point locations with the mark clause parameters obtained in step (2): point location ① is filled with srcname and point location ② with dstname. The process is completed automatically by the code analysis system. The code segments obtained after filling are:
omp_set_num_threads(numprocs);
#pragma omp parallel
{
int i=omp_get_thread_num();
Func_serial(srcname_in[i],dstname_out[i],step[i],var1,var2);
}
In this code segment, omp_set_num_threads is the OpenMP statement for setting the thread number; it sets the thread number of the parallel domain to numprocs. After the parallel domain is opened by #pragma omp parallel, each thread executes the code in the parallel domain in parallel and performs data calculation by calling the serial API function Func_serial. The function parameters srcname_in[i] and dstname_out[i] respectively represent the input data address and output data address assigned to thread i. Here srcname_in[i] has the same meaning as srcname_in_i above, and likewise for dstname_out[i].
(c) Data collection phase
The code extraction module of the code analysis system queries the code segment set under the shared storage platform directory of the basic parallel code library and returns the data collection code segments. The code segments are as follows:
stepsize=③/①;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
Data_trans(②,numprocs,displs,②_out);
the analysis module of the code analysis system fills the corresponding point locations with the mark clause parameters obtained in step (2): point location ① is filled with num, point location ② with dstname, and point location ③ with dstsize. The process is completed automatically by the code analysis system. The code segments obtained after filling are:
stepsize=dstsize/num;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
Data_trans(dstname,numprocs,displs,dstname_out);
In this code segment, the Data_trans function is used to collect data from each thread to the output data packet address dstname; its principle has been introduced in (a) and is not repeated here, and the same holds for the DataPartition function.
According to a preferred embodiment of the present invention, the distributed storage platform is taken as an example:
(a) Data partitioning and distribution phase
The code extraction module of the code analysis system queries the code segment set under the distributed storage platform directory of the basic parallel code library and returns the data partitioning and distribution code segments. The code segments are as follows:
average_allocation(t_testlocals,step,step_before,numprocs,①);
stepsize=③/①;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
MPI_Scatterv(②,sendrecvcnts,displs,MPI_④,
②_in,sendrecvcnts[myid],MPI_④,0,MPI_COMM_WORLD);
The analysis module of the code analysis system fills the corresponding point locations with the mark clause parameters obtained in step (2): point location ① is filled with num, point location ② with srcname, point location ③ with srcsize, and point location ④ with srcdatatype. The process is completed automatically by the code analysis system. The code segments obtained after filling are:
average_allocation(t_testlocals,step,step_before,numprocs,num);
stepsize=srcsize/num;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
MPI_Scatterv(srcname,sendrecvcnts,displs,MPI_srcdatatype,srcname_in,
sendrecvcnts[myid],MPI_srcdatatype,0,MPI_COMM_WORLD);
The average_allocation and DataPartition functions in the code segment have been introduced above and are not described here again. The MPI data distribution function MPI_Scatterv is a standard MPI distribution function.
(b) Data calculation phase
The code extraction module of the code analysis system queries the code segment set under the distributed storage platform directory of the basic parallel code library and returns the data calculation code segment. The code segment is as follows:
Func_serial(①_in,②_out,step[myid],var1,var2);
The analysis module of the code analysis system fills the corresponding point locations with the mark clause parameters obtained in step (2): point location ① is filled with srcname and point location ② with dstname. The process is completed automatically by the code analysis system. The code segment obtained after filling is:
Func_serial(srcname_in,dstname_out,step[myid],var1,var2);
each node of the distributed storage platform performs data calculation by calling the serial program Func _ serial.
(c) Data collection phase
The code extraction module of the code analysis system queries the code segment set under the distributed storage platform directory of the basic parallel code library and returns the data collection code segments. The code segments are as follows:
stepsize=③/①;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
MPI_Allgatherv(②_out,sendrecvcnts[myid],MPI_④,②,sendrecvcnts,
displs,MPI_④,MPI_COMM_WORLD);
The analysis module of the code analysis system fills the corresponding point locations with the mark clause parameters obtained in step (2): point location ① is filled with num, point location ② with dstname, point location ③ with dstsize, and point location ④ with dstdatatype. The process is completed automatically by the code analysis system. The code segments obtained after filling are:
stepsize=dstsize/num;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
MPI_Allgatherv(dstname_out,sendrecvcnts[myid],MPI_dstdatatype,dstname,
sendrecvcnts,displs,MPI_dstdatatype,MPI_COMM_WORLD);
in this code segment, the MPI data collection function MPI _ Allgatherv is an MPI standard function, and functions to collect data from each node to the output packet address dstname. The specific principle of the DataPartition function has been described above and will not be described again.
Step (4), splicing the filled parallel code segments to obtain the parallel program finally converted from the serial program, specifically:
taking the shared storage platform as an example, the analysis module of the code analysis system splices the code segments obtained in step (3) in the order of data partitioning and distribution, data collection, and data calculation, and then adds fixed code such as variable definitions to obtain the function body of the parallel API function. The serial API function name plus a _parallel suffix is taken as the function name of the parallel API function. The serial API function parameter list plus a numprocs parameter is taken as the parameter list of the parallel API function, where numprocs is the thread number.
The resulting parallel program is as follows:
Func_serial_parallel(srcname,dstname,num,var1,var2,numprocs)
{
// temporary variable definitions
// data partitioning
average_allocation(t_testlocals,step,step_before,numprocs,num);
stepsize=srcsize/num;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
// data distribution
Data_trans(srcname,numprocs,displs,srcname_in);
// data collection
stepsize=dstsize/num;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
Data_trans(dstname,numprocs,displs,dstname_out);
// data calculation
omp_set_num_threads(numprocs);
#pragma omp parallel
{
int i=omp_get_thread_num();
Func_serial(srcname_in[i],dstname_out[i],step[i],var1,var2);
}
}
Taking the distributed storage platform as an example, the code segments obtained in step (3) are spliced in the order of data partitioning and distribution, data calculation, and data collection, and fixed code such as variable definitions is added to obtain the function body of the parallel API function. The serial API function name plus a _parallel suffix is taken as the function name of the parallel API function. The serial API function parameter list plus a numprocs parameter is taken as the parameter list of the parallel API function, where numprocs is the node process number.
The resulting parallel program is as follows:
Func_serial_parallel(srcname,dstname,num,var1,var2,numprocs)
{
// temporary variable definitions
// data partitioning
average_allocation(t_testlocals,step,step_before,numprocs,num);
stepsize=srcsize/num;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
// data distribution
MPI_Scatterv(srcname,sendrecvcnts,displs,MPI_srcdatatype,srcname_in,sendrecvcnts[myid],MPI_srcdatatype,0,MPI_COMM_WORLD);
// data calculation
Func_serial(srcname_in,dstname_out,step[myid],var1,var2);
// data collection
stepsize=dstsize/num;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
MPI_Allgatherv(dstname_out,sendrecvcnts[myid],MPI_dstdatatype,dstname,sendrecvcnts,displs,MPI_dstdatatype,MPI_COMM_WORLD);
}
Although illustrative embodiments of the present invention have been described above to facilitate understanding by those skilled in the art, it should be understood that the present invention is not limited to the scope of these embodiments. Various changes will be apparent to those skilled in the art, and all inventions utilizing the inventive concepts set forth herein are intended to be protected, provided they do not depart from the spirit and scope of the present invention as defined by the appended claims.