Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is obvious that the described embodiments are only a part of the embodiments of the present invention, rather than all of them; all other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
FIG. 1 is a general flowchart of the tag-based parallelization method for a serial program according to the present invention, comprising the steps of:
(1) marking the serial program;
(2) analyzing the mark by the code analysis system, and recording the parameters of the mark clauses;
(3) extracting parallel code segments from the basic parallel code library by the code analysis system, and filling the parallel code segments with the mark clause parameters;
(4) splicing the filled parallel code segments to obtain the parallel program corresponding to the original serial program.
Specifically, the step (1) of marking the serial program includes:
the serial program refers to a serial API function. In the field of software engineering, API (Application Programming Interface) functions are predefined functions that represent specific callable software function modules and are basic building blocks of computer software. According to software engineering convention, a serial API function consists of three parts: a function name, a parameter list and a function body.
The mark is an identification field used to express parallel semantics and indicate the parallel position. It comprises a mark name and mark clauses: the mark name is fixed, has a form distinct from ordinary program code, and is identified by the subsequent code analysis system; the mark clauses provide the parameters required for parallelization, are filled in by the developer, and are related to the serial program. The mark clauses comprise a data source, a data destination and a data batch number; their parameters can be taken from the serial program parameter list or specified by the developer. The data source clause provides the information of the data packet to be processed in parallel, the data destination clause provides the information of the result data packet after parallel processing is finished, and the data batch number clause provides the splittable batch number of the data packet in the data source clause, the splitting principle being that each batch of data after splitting can be processed independently by the serial program.
The flow of this step is as follows: the developer adds a mark above the function name of the serial program definition.
(2) The code analysis system analyzes the mark and records the mark clause parameters.
The code analysis system is an independently executable program responsible for reading the marked serial program file and parsing it into the corresponding parallel program; it is an important implementation tool of the method. The code analysis system comprises an analysis module and a code extraction module: the analysis module is responsible for reading and parsing the marked serial program, and the code extraction module is responsible for extracting code segments from the basic parallel code library.
The flow of the step is as follows: and (3) reading the marked serial program file obtained in the step (1) by the code analysis system, scanning from beginning to end, and analyzing and recording the marked clause parameters and the marked serial program when the mark name is identified.
(3) The code analysis system extracts parallel code segments from the basic parallel code library and fills them with the mark clause parameters.
The basic parallel code library is a file that records parallel code segments for multiple parallel stages on multiple parallel platforms. The parallel platforms currently covered by the basic parallel code library are a shared storage structure hardware platform with the OpenMP programming model, and a distributed storage structure hardware platform with the MPI programming model, hereinafter referred to as the shared storage platform and the distributed storage platform respectively. The multiple parallel stages refer to the execution process, i.e. the program structure, of a parallel program, which can be divided into three parallel stages: data partitioning and distribution, data calculation, and data collection. In general, the basic parallel code library comprises the three types of code segments (data partitioning and distribution, data calculation, data collection) for both the shared storage platform and the distributed storage platform. The basic parallel code library reserves an expansion interface and can be extended to other parallel platforms.
Parallel code segment extraction means that the code analysis system selects the required parallel code segments from the basic parallel code library according to the parallel platform and the parallel stage. Taking the data collection stage under the shared storage platform as an example, the code analysis system queries the code segment set under the shared storage platform directory of the basic parallel code library and returns the code segments of the data collection stage.
The flow of the step is as follows: and (3) the code analysis system sequentially searches parallel code segments of three parallel stages (data division and distribution, data calculation and data collection) under the parallel platform from a basic parallel code library according to the parallel platform and the parallel stages, and fills the marked clause parameters obtained in the step (2) into corresponding fixed point positions of the parallel code segments to obtain parallel code segments of all stages containing the marked clause parameters.
(4) The filled parallel code segments are spliced to obtain the parallel program corresponding to the original serial program.
The parallel program refers to the parallel API function corresponding to the serial API function, and can be called by developers. According to software engineering convention, a parallel API function is composed of a function name, a parameter list and a function body, just like a serial API function.
The flow of the step is as follows: regarding the function body, under the shared storage platform, sequentially splicing the code segments obtained in the step (3) according to the sequence of data division and distribution, data collection and data calculation, under the distributed storage platform, the splicing sequence is data division and distribution, data calculation and data collection, and the function body of the parallel API function is obtained. Regarding the function name, the serial API function name plus a _ prallel suffix is taken as the function name of the parallel API function. Regarding the parameter list, the serial API function parameter list plus numprocs parameters is taken as the parameter list of the parallel API function as a whole, the numprocs parameters refer to the number of parallel processing units, and the shared storage platform and the distributed storage platform represent the thread number and the node thread number respectively.
According to a preferred embodiment of the invention: the embodiment parallelizes the serial program Func_serial(srcname, dstname, num, var1, var2) into parallel programs under a shared storage platform and a distributed storage platform respectively.
Step (1), marking the serial program, specifically:
the mark #sigma parallel_task is added above the serial program function name. After marking, the code is as follows:
#sigma parallel_task src_data(srcname;srcdatatype;srcsize)
dst_data(dstname;dstdatatype;dstsize)
group(num)
Func_serial(srcname,dstname,num,var1,var2) // function name and parameter list
{
... // function body
}
The mark name #sigma parallel_task is used by the subsequent code analysis system for identification; the mark clauses provide the parameters required for code analysis, are filled in by the developer, and are related to the serial program. The mark clauses src_data(srcname;srcdatatype;srcsize), dst_data(dstname;dstdatatype;dstsize) and group(num) respectively represent the data source, the data destination and the data batch number. The data source clause parameters srcname, srcdatatype and srcsize respectively indicate the address, data type and data quantity of the data packet to be processed in parallel; the data destination clause parameters dstname, dstdatatype and dstsize respectively indicate the address, data type and data quantity of the result data packet after parallel processing is finished; and the data batch number clause provides the splittable batch number of the data source.
Step (2), the code analysis system analyzes the mark and records the mark clause parameters, specifically:
the algorithmic idea of the code analysis system is shown in FIG. 2.
The analysis module reads the marked serial program file obtained in step (1) and scans it from beginning to end. When the mark name #sigma parallel_task is identified, the analysis module analyzes and records each mark clause parameter (srcname, srcdatatype, srcsize, etc.), and simultaneously records the marked serial program function name Func_serial and its parameter list (srcname, dstname, num, var1, var2). The unmarked part is not parsed.
Step (3), the code analysis system extracts parallel code segments from the basic parallel code library and fills them with the mark clause parameters, specifically:
the implementation of the shared storage platform is different from that of the distributed storage platform, and firstly, taking the shared storage platform as an example:
(a) Data partitioning and distribution phase
The code extraction module of the code analysis system queries the code segment set under the shared storage platform directory of the basic parallel code library and returns the data partitioning and distribution code segments. The code segments are as follows:
average_allocation(t_testlocals,step,step_before,numprocs,①);
stepsize=③/①;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
Data_trans(②,numprocs,displs,②_in);
the analysis module of the code analysis system fills the corresponding point locations (the point locations are built into the code segments) with the mark clause parameters obtained in step (2): point location ① is filled with num, point location ② with srcname, and point location ③ with srcsize. The process is completed automatically by the code analysis system. The code segments obtained after filling are:
average_allocation(t_testlocals,step,step_before,numprocs,num);
stepsize=srcsize/num;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
Data_trans(srcname,numprocs,displs,srcname_in);
several functions in the code segments are introduced below:
the data partitioning function average_allocation divides num batches of data among numprocs threads and stores the results in the step and step_before arrays; t_testlocals is a partitioning proportionality coefficient, with a default of 1. With t_testlocals = 1 (even division), the data partitioning is:

step_i ≈ num / numprocs, step_before_i = step_0 + step_1 + ... + step_(i-1), 0 ≤ i < numprocs

wherein step_i and step_before_i respectively represent the data batch number and the batch number offset obtained by thread i; the batch number offset is the sum of the batch numbers obtained by threads 0 through i-1.
The DataPartition function calculates the data quantity and the data quantity offset of each thread and stores the results in the sendrecvcnts and displs arrays. The calculation is as follows:

sendrecvcnts_i = stepsize * step_i, 0 ≤ i < numprocs
displs_i = stepsize * step_before_i, 0 ≤ i < numprocs

wherein sendrecvcnts_i and displs_i respectively denote the data quantity obtained by thread i and its data quantity offset; the data quantity offset is the sum of the data quantities obtained by threads 0 through i-1.
The data distribution function Data_trans distributes data to each thread, i.e. it distributes the data packet to be processed, whose address is srcname, to each thread. Because the shared storage platform adopts a shared storage structure, all threads share the same physical memory, so the data distribution operation can be converted into a memory address operation. The address calculation is as follows:
srcname_in_i = srcname + displs_i, 0 ≤ i < numprocs

wherein srcname_in_i indicates the input data address assigned to thread i.
(b) Data calculation phase
The code extraction module of the code analysis system queries the code segment set under the shared storage platform directory of the basic parallel code library and returns the data calculation code segment. The code segment is as follows:
omp_set_num_threads(numprocs);
#pragma omp parallel
{
int i=omp_get_thread_num();
Func_serial(①_in[i],②_out[i],step[i],var1,var2);
}
the analysis module of the code analysis system fills the corresponding point locations with the mark clause parameters obtained in step (2): point location ① is filled with srcname and point location ② with dstname. The process is completed automatically by the code analysis system. The code segments obtained after filling are:
omp_set_num_threads(numprocs);
#pragma omp parallel
{
int i=omp_get_thread_num();
Func_serial(srcname_in[i],dstname_out[i],step[i],var1,var2);
}
In this code segment, omp_set_num_threads is the OpenMP statement for setting the thread number; it sets the thread number of the parallel domain to numprocs. After the parallel domain is opened by #pragma omp parallel, each thread executes the code in the parallel domain in parallel and performs data calculation by calling the serial API function Func_serial. The function parameters srcname_in[i] and dstname_out[i] respectively represent the input data address and output data address assigned to thread i. Here srcname_in[i] has the same meaning as srcname_in_i above, and likewise for dstname_out[i].
(c) Data collection phase
The code extraction module of the code analysis system queries the code segment set under the shared storage platform directory of the basic parallel code library and returns the data collection code segments. The code segments are as follows:
stepsize=③/①;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
Data_trans(②,numprocs,displs,②_out);
the analysis module of the code analysis system fills the corresponding point locations with the mark clause parameters obtained in step (2): point location ① is filled with num, point location ② with dstname, and point location ③ with dstsize. The process is completed automatically by the code analysis system. The code segments obtained after filling are:
stepsize=dstsize/num;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
Data_trans(dstname,numprocs,displs,dstname_out);
In this code segment, the Data_trans function is used to collect data from each thread to the output data packet address dstname; its principle has been introduced in (a) and is not repeated here, and the same holds for the DataPartition function.
According to a preferred embodiment of the present invention, the distributed storage platform is taken as an example:
(a) Data partitioning and distribution phase
The code extraction module of the code analysis system queries the code segment set under the distributed storage platform directory of the basic parallel code library and returns the data partitioning and distribution code segments. The code segments are as follows:
average_allocation(t_testlocals,step,step_before,numprocs,①);
stepsize=③/①;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
MPI_Scatterv(②,sendrecvcnts,displs,MPI_④,
②_in,sendrecvcnts[myid],MPI_④,0,MPI_COMM_WORLD);
The analysis module of the code analysis system fills the corresponding point locations with the mark clause parameters obtained in step (2): point location ① is filled with num, point location ② with srcname, point location ③ with srcsize, and point location ④ with srcdatatype. The process is completed automatically by the code analysis system. The code segments obtained after filling are:
average_allocation(t_testlocals,step,step_before,numprocs,num);
stepsize=srcsize/num;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
MPI_Scatterv(srcname,sendrecvcnts,displs,MPI_srcdatatype,srcname_in,
sendrecvcnts[myid],MPI_srcdatatype,0,MPI_COMM_WORLD);
The average_allocation and DataPartition functions in the code segment have been introduced above and are not described here again. The MPI data distribution function MPI_Scatterv is a standard MPI distribution function.
(b) Data calculation phase
The code extraction module of the code analysis system queries the code segment set under the distributed storage platform directory of the basic parallel code library and returns the data calculation code segment. The code segment is as follows:
Func_serial(①_in,②_out,step[myid],var1,var2);
The analysis module of the code analysis system fills the corresponding point locations with the mark clause parameters obtained in step (2): point location ① is filled with srcname and point location ② with dstname. The process is completed automatically by the code analysis system. The code segment obtained after filling is:
Func_serial(srcname_in,dstname_out,step[myid],var1,var2);
each node of the distributed storage platform performs data calculation by calling the serial program Func _ serial.
(c) Data collection phase
The code extraction module of the code analysis system queries the code segment set under the distributed storage platform directory of the basic parallel code library and returns the data collection code segments. The code segments are as follows:
stepsize=③/①;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
MPI_Allgatherv(②_out,sendrecvcnts[myid],MPI_④,②,sendrecvcnts,
displs,MPI_④,MPI_COMM_WORLD);
The analysis module of the code analysis system fills the corresponding point locations with the mark clause parameters obtained in step (2): point location ① is filled with num, point location ② with dstname, point location ③ with dstsize, and point location ④ with dstdatatype. The process is completed automatically by the code analysis system. The code segments obtained after filling are:
stepsize=dstsize/num;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
MPI_Allgatherv(dstname_out,sendrecvcnts[myid],MPI_dstdatatype,dstname,
sendrecvcnts,displs,MPI_dstdatatype,MPI_COMM_WORLD);
in this code segment, the MPI data collection function MPI _ Allgatherv is an MPI standard function, and functions to collect data from each node to the output packet address dstname. The specific principle of the DataPartition function has been described above and will not be described again.
Step (4), splicing the filled parallel code segments to obtain the parallel program finally converted from the serial program, specifically:
taking the shared storage platform as an example, the analysis module of the code analysis system splices the code segments obtained in step (3) in the order of data partitioning and distribution, data collection, and data calculation, and then adds fixed code such as variable definitions to obtain the function body of the parallel API function. The serial API function name plus a _parallel suffix is taken as the function name of the parallel API function. The serial API function parameter list plus a numprocs parameter is taken as the parameter list of the parallel API function, where numprocs is the thread number.
The resulting parallel program is as follows:
Func_serial_parallel(srcname,dstname,num,var1,var2,numprocs)
{
// temporary variable definitions
// data partitioning
average_allocation(t_testlocals,step,step_before,numprocs,num);
stepsize=srcsize/num;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
// data distribution
Data_trans(srcname,numprocs,displs,srcname_in);
// data collection
stepsize=dstsize/num;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
Data_trans(dstname,numprocs,displs,dstname_out);
// data calculation
omp_set_num_threads(numprocs);
#pragma omp parallel
{
int i=omp_get_thread_num();
Func_serial(srcname_in[i],dstname_out[i],step[i],var1,var2);
}
}
Taking the distributed storage platform as an example, the code segments obtained in step (3) are spliced in the order of data partitioning and distribution, data calculation, and data collection, and fixed code such as variable definitions is added to obtain the function body of the parallel API function. The serial API function name plus a _parallel suffix is taken as the function name of the parallel API function. The serial API function parameter list plus a numprocs parameter is taken as the parameter list of the parallel API function, where numprocs is the node process number.
The resulting parallel program is as follows:
Func_serial_parallel(srcname,dstname,num,var1,var2,numprocs)
{
// temporary variable definitions
// data partitioning
average_allocation(t_testlocals,step,step_before,numprocs,num);
stepsize=srcsize/num;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
// data distribution
MPI_Scatterv(srcname,sendrecvcnts,displs,MPI_srcdatatype,srcname_in,sendrecvcnts[myid],MPI_srcdatatype,0,MPI_COMM_WORLD);
// data calculation
Func_serial(srcname_in,dstname_out,step[myid],var1,var2);
// data collection
stepsize=dstsize/num;
DataPartition(numprocs,step,step_before,stepsize,sendrecvcnts,displs);
MPI_Allgatherv(dstname_out,sendrecvcnts[myid],MPI_dstdatatype,dstname,sendrecvcnts,displs,MPI_dstdatatype,MPI_COMM_WORLD);
}
Although illustrative embodiments of the present invention have been described above to facilitate understanding by those skilled in the art, it should be understood that the present invention is not limited to the scope of these embodiments. Various changes will be apparent to those skilled in the art, and all inventions utilizing the inventive concepts set forth herein are intended to be protected, provided they do not depart from the spirit and scope of the present invention as defined by the appended claims.