CN103744643A - Method and device for a multi-node parallel architecture under a multi-threaded program

Info

Publication number: CN103744643A
Application number: CN201410012455.5A
Authority: CN
Inventors: 沈铂; 张广勇; 卢晓伟; 吴韶华
Original assignee: Inspur Beijing Electronic Information Industry Co Ltd
Current assignee: Inspur Beijing Electronic Information Industry Co Ltd
Priority date: 2014-01-10
Filing date: 2014-01-10
Publication date: 2014-04-23
Anticipated expiration: 2034-01-10
Also published as: CN103744643B

Abstract

The invention discloses a method and a device for a multi-node parallel architecture under a multi-threaded program. The method comprises: creating a master process and slave processes in the main function of an original program, and obtaining information identifying the master process and the slave processes respectively; retaining, in the main function of the master process, the original program's operations of creating threads and binding thread functions to them; deleting the original computation code from the hotspot thread function of the master process and adding code that communicates with the child processes; and deleting the original program's main-function content from the main function of each child process and adding the computation part of the original program's hotspot thread function together with a part that communicates with the master process. By adding inter-process communication to the existing multi-threaded program framework, the method and the device realize a multi-node parallel architecture under a multi-threaded framework, improve the scalability and performance of the program, and make full use of computing resources across nodes.

Description

Method and device for a multi-node parallel architecture under a multi-threaded program

Technical field

The invention relates to computer software optimization technology, and in particular to a method and a device for a multi-node parallel architecture under a multi-threaded program.

Background

In a broad sense, parallel computing means executing multiple computing tasks simultaneously within one program. It is typically used where performance requirements are extremely high, for example in scientific computing fields such as weather forecasting and oil exploration. Because parallel computing can make full use of CPU computing resources, it is increasingly widely used.

Parallel computing is usually implemented in one of two ways: multi-process or multi-thread. A multi-threaded program can only be used for parallel computing on a single node (a node being one computing device in a network); its advantages are simple inter-thread communication and low thread overhead. A multi-process program can be used for parallel computing across multiple nodes, but communication between nodes goes through the network and is comparatively expensive.

A design that combines the two usually adopts a multi-process framework with multiple threads inside each process, so as to make full use of multiple nodes and of the CPU computing resources within each node.

Multi-process programs are usually developed with the Message Passing Interface (MPI). MPI is a general-purpose, industrial-grade inter-process communication interface that lets processes communicate conveniently by passing messages and provides a convenient way to run multiple processes on a cluster.

Multi-threaded programs can perform inter-thread parallel computing in several ways, for example with OpenMP or pThread. With OpenMP, the source program is modified only slightly by adding compiler directives, after which a multi-thread library parallelizes the code automatically and hides many details. The OpenMP library is implemented underneath by calling pThread. The pThread library is the low-level thread library on Linux, and Windows has a corresponding thread application programming interface (API). Using the pThread library, the behavior of threads can be controlled manually, which gives finer-grained control.

However, if the original computing program is multi-threaded, it must be transformed into a multi-process program in order to make full use of multi-node computing resources. The current approach is to rewrite the code substantially, changing the multi-threaded program framework into a multi-process one, which is time-consuming and laborious. At present there is no method for quickly transforming a multi-threaded program so that the multi-threaded program framework is retained while the computation part adopts a multi-process architecture that makes full use of computing resources across nodes.

Summary of the invention

The technical problem to be solved by the invention is to provide a method for a multi-node parallel architecture under a multi-threaded program, which allows a multi-threaded program to be transformed quickly into a multi-process computing architecture under a multi-threaded framework.

To solve the above technical problem, the invention provides a method for a multi-node parallel architecture under a multi-threaded program, including:

creating master and slave processes in the main function of the original program, and obtaining information identifying the master and slave processes respectively;

retaining, in the main function of the master process, the original program's operations of creating threads and binding thread functions to them; deleting the original computation code from the hotspot thread function of the master process and adding code that communicates with the child processes; and deleting the original program's main-function content from the main function of each child process and adding the computation part of the original program's hotspot thread function and a part that communicates with the master process.

Further, before the above steps, the method also includes:

measuring the original program to identify the hotspot thread function, and determining through analysis that the hotspot thread function can be optimized into a multi-process parallel architecture.

Further, creating the master and slave processes in the main function of the original program and obtaining the information identifying the master and slave processes respectively specifically includes:

creating the master and slave processes in the main function with the Message Passing Interface initialization function, and obtaining the corresponding process numbers with the Message Passing Interface function for obtaining the process number, where process number 0 denotes the master process and all non-zero process numbers denote slave processes.

Further, deleting the original computation code from the hotspot thread function of the master process and adding code that communicates with the child processes specifically includes:

on entering the hotspot thread function of the master process, the master process sends task-related information, including the task length and the task data, to the child processes; once all the information has been sent, the master process enters a waiting state and waits for the child processes to return their calculation results.

Further, deleting the original program's main-function content from the main function of the child process and adding the computation part of the original program's hotspot thread function and the part that communicates with the master process specifically includes:

the child process waits for the master process to send the task length, and allocates memory space according to the task length;

the child process waits to receive the task data sent by the master process and stores the received task data in the allocated memory space; it computes on the received task data using the sub-function, or the corresponding code, called by the computation part of the original program's hotspot thread function; when the computation ends, it sends the calculation result to the master process; this step is repeated until the program ends.

To solve the above technical problem, the invention also provides a device for a multi-node parallel architecture under a multi-threaded program, including a main function module and a master process calculation module connected to each other, wherein:

the main function module is used to initialize process parameters, to run the corresponding branch for the master process and for the slave processes, and to have multiple child processes receive the data to be computed from the master process, perform the computation, and send the calculation results to the master process;

the master process calculation module is used to pass the data to be computed to the child processes and to receive the calculation results of the child processes.

Further, the main function module runs the process initialization function through the process creation module and initializes the process parameters, including creating threads and binding thread functions; through the main function judgment module it determines whether a process is the master process or a slave process and runs the corresponding branch for each; and through the child process calculation module it has multiple child processes receive the data to be computed from the master process, compute on the received data by calling the core calculation module, and send their calculation results to the master process.

By adding inter-process communication to the existing multi-threaded program framework, the invention realizes a multi-node parallel architecture under a multi-threaded framework and improves the scalability and performance of the program, thereby making full use of computing resources across nodes.

Brief description of the drawings

Fig. 1 is a flowchart of an embodiment of the method for a multi-node parallel architecture under a multi-threaded program according to the invention;

Fig. 2 is a schematic structural diagram of a device formed by an existing original computing program;

Fig. 3 is a schematic structural diagram of an embodiment of the device for a multi-node parallel architecture under a multi-threaded program according to the invention, obtained by transforming the device shown in Fig. 2.

Detailed description

The technical solution of the invention is described in detail below with reference to the accompanying drawings and preferred embodiments. It should be understood that the embodiments listed below serve only to illustrate and explain the invention and do not limit its technical solution.

Fig. 1 shows the flow of an embodiment of the method for a multi-node parallel architecture under a multi-threaded program according to the invention, which includes the following steps:

Step S110: measure the original program to identify the hotspot thread function, and determine through analysis that the hotspot thread function can be optimized into a multi-process parallel architecture.

One or more hotspot thread functions may be identified.

Step S120: create master and slave processes in the main function of the original program, and obtain information identifying the master and slave processes respectively.

MPI is taken as an example for creating the processes: the MPI initialization function is used in the main function to create the master and slave processes, and the MPI function for obtaining the process number is used to obtain the process number of each, where process number 0 denotes the master process and all non-zero process numbers denote slave processes. The master and slave processes can therefore be told apart by checking the process number, and each case is handled in its own branch.

It should be noted that how processes are created depends on the approach the programmer uses. With MPI, for example, the master and slave processes share the same source code. From the programmer's point of view, process creation is done in the initialization function, whereas the processes are actually created when the program is launched. Since the invention is aimed at programmers, the programmer's point of view is adopted here and the initialization function is regarded as the operation that creates the processes. That is, every process has a main function and executes the initialization function inside it to create the corresponding process, but all processes use the same source code and perform different operations depending on their process number.
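The single-source, branch-on-process-number pattern described above can be illustrated with a minimal Python sketch. It does not use MPI: each rank is simulated by a thread, and the names `main`, `master_branch`, `slave_branch` and `launch` are hypothetical helpers introduced only for this illustration.

```python
import threading


def master_branch(rank, size):
    # Rank 0 keeps the original thread framework (steps S130 and S140).
    return f"rank {rank}: master of {size - 1} slave(s)"


def slave_branch(rank, size):
    # Non-zero ranks only run the hotspot computation (step S150).
    return f"rank {rank}: slave"


def main(rank, size, results):
    # Every simulated rank runs this same function; only the branch differs.
    if rank == 0:
        results[rank] = master_branch(rank, size)
    else:
        results[rank] = slave_branch(rank, size)


def launch(size):
    # Assumption: one thread per rank stands in for the MPI launcher.
    results = {}
    ranks = [threading.Thread(target=main, args=(r, size, results))
             for r in range(size)]
    for t in ranks:
        t.start()
    for t in ranks:
        t.join()
    return [results[r] for r in range(size)]


if __name__ == "__main__":
    for line in launch(3):
        print(line)
```

In a real MPI program the launcher starts the processes and the rank comes from the MPI library; here the rank is simply passed in as an argument.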

Step S130: retain, in the main function of the master process, the original program's operations of creating threads and binding thread functions to them.

Step S140: delete the original computation code from the hotspot thread function of the master process, and add code that communicates with the child processes.

On entering the hotspot thread function of the master process, the master process sends task-related information such as the task length and the task data to the child processes. Once all the information has been sent, the master process enters a waiting state and waits for the child processes to return their calculation results.

Step S150: delete the original program's main-function content from the main function of the child process, and add the computation part of the original program's hotspot thread function and the part that communicates with the master process.

The specific steps are:

the child process waits for the master process to send the task length, and allocates memory space according to the task length;

the child process waits to receive the task data sent by the master process and stores the received task data in the allocated memory space; it computes on the received task data using the sub-function, or the corresponding code, called by the computation part of the original program's hotspot thread function; when the computation ends, it sends the calculation result to the master process; this step is repeated until the program ends.
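The child-side steps above can be sketched as a small service loop. This is an illustration under stated assumptions, not the patent's code: a thread and two `queue.Queue` objects stand in for a child process and its MPI channels, `calc_func` is a placeholder computation, and the `STOP` marker is a hypothetical way to end the loop, since the text only says the step repeats until the program ends.

```python
import queue
import threading

STOP = None  # hypothetical shutdown marker, not part of the patent text


def calc_func(x):
    # Placeholder for the sub-function called by the hotspot thread function.
    return x * x


def child_loop(inbox, outbox):
    while True:
        length = inbox.get()              # wait for the task length
        if length is STOP:
            break
        buffer = [None] * length          # allocate space sized by the length
        buffer[:] = inbox.get()           # wait for the task data and store it
        result = [calc_func(x) for x in buffer]   # compute on the received data
        outbox.put(result)                # send the result back to the master


def run_demo(tasks):
    # The caller plays the master: send length, send data, wait for the result.
    inbox, outbox = queue.Queue(), queue.Queue()
    child = threading.Thread(target=child_loop, args=(inbox, outbox))
    child.start()
    results = []
    for task in tasks:
        inbox.put(len(task))
        inbox.put(task)
        results.append(outbox.get())
    inbox.put(STOP)
    child.join()
    return results
```

Sending the length before the data mirrors the order given in the text, which lets the receiver size its buffer before the payload arrives.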

Corresponding to the above method embodiment, the invention also provides a device embodiment of the multi-node parallel architecture under a multi-threaded program, whose structure is obtained by improving the structure of an existing original computing program.

Fig. 2 shows the structure of a device formed by a typical existing multi-threaded computing program, wherein:

the main function module 210 contains the thread creation module 220 and the calculation module 230, and creates threads and binds thread functions by calling the thread creation module 220;

the thread creation module 220 is used to create threads and bind thread functions;

the calculation module 230 includes the hotspot calculation function that implements the parallel computation, which is bound to a thread as its thread function.

In fact, any function, whether or not it is a hotspot calculation function, can be bound to a thread and become a thread function. Here, the hotspot calculation function of the calculation module 230 is the one bound as a thread function.

It should be noted that there may be several thread functions, each bound to a different thread. In the following embodiment only one thread function is transformed, so the other functions are not listed.

The core calculation module 240 is contained in the calculation module 230. It is usually the body of a loop, and takes the form of a code segment or sub-function that completes one unit of computation.

There is no order dependency between calls to the core calculation module 240. That is, when the core calculation module 240 is called multiple times, the calls may be made in any order.
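The absence of order dependency is what makes it safe to hand the calls to different processes. A minimal sketch of the property, assuming a placeholder `calc_func` that depends only on its own input: running the calls in a shuffled order yields the same results as running them sequentially.

```python
import random


def calc_func(x):
    # Placeholder core computation: depends only on its own input.
    return 3 * x + 1


def run_in_order(items, order):
    # Call calc_func once per item, visiting the items in the given order.
    results = [None] * len(items)
    for i in order:
        results[i] = calc_func(items[i])
    return results


items = list(range(8))
sequential = run_in_order(items, range(len(items)))

shuffled_order = list(range(len(items)))
random.Random(0).shuffle(shuffled_order)   # fixed seed, reproducible order
shuffled = run_in_order(items, shuffled_order)
```

If the core computation read or wrote state shared between iterations, the two results could differ, and the hotspot function would not qualify for this transformation.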

By improving the device structure shown in Fig. 2, the invention arrives at the structure of the device embodiment shown in Fig. 3, which includes a main function module 310 and a master process calculation module 330 connected to each other, wherein:

the main function module 310 is used to initialize process parameters, to run the corresponding branch for the master process and for the slave processes, and to have multiple child processes receive the data to be computed from the master process, perform the computation, and send the calculation results to the master process;

the master process calculation module 330 is used to pass the data to be computed to the child processes and to receive the calculation results of the child processes.

In the above device embodiment,

the main function module 310 runs the process initialization function through the process creation module 220 and initializes the process parameters, including creating threads and binding thread functions; through the main function judgment module 320 it determines whether a process is the master process or a slave process and runs the corresponding branch for each; and through the child process calculation module 340 it has multiple child processes receive the data to be computed from the master process, compute on the received data by calling the core calculation module 240, and send their calculation results to the master process.

An application example corresponding to the above method and device embodiments is given below. The pseudocode of the original program it processes is as follows:
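The original listing is not reproduced in this text. The following Python sketch is only a reconstruction inferred from the description of the example: the names `tFunc1`, `tFunc2`, `calcFunc`, `Create_Threads` and the loop count `N` appear in the text, while the function bodies, the value of `N` and the extra arguments are assumptions.

```python
import threading

N = 8  # loop count of the hotspot thread function (value assumed)


def calcFunc(i):
    # Core computation; calls are independent of one another (body assumed).
    return i * i


def tFunc1(out):
    # Hotspot thread function: calls calcFunc N times in a loop.
    for i in range(N):
        out[i] = calcFunc(i)


def tFunc2(log):
    # Lightweight thread function that is left unchanged later on.
    log.append("tFunc2 done")


def Create_Threads(func, *args):
    # Create a thread and bind the thread function to it.
    t = threading.Thread(target=func, args=args)
    t.start()
    return t


def main():
    out, log = [None] * N, []
    threads = [Create_Threads(tFunc1, out), Create_Threads(tFunc2, log)]
    for t in threads:
        t.join()
    return out, log
```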

Applying the above device embodiment of the invention, the original program is given multi-node parallelism using the MPI programming interface.

First, the program is tested and analyzed. This shows that tFunc1 accounts for most of the program's running time, and that the repeated calls to the calcFunc function inside it do not depend on one another and can be executed in parallel. tFunc2 accounts for little of the running time, so it is not rewritten to use processes.

Then the main function (main) is rewritten by adding the functions MPI_Init and MPI_Finalize, which initialize and shut down the MPI library respectively.

After MPI_Init, an MPI function is used to obtain the rank number (that is, the process number) of the process. Each process obtains a different rank number. Whether the rank number is 0 determines whether the process is the master or a slave (0 is the master process, non-zero is a slave process), and the corresponding branch is executed.

If the rank number is 0, the thread-creation code in the original program's main function, Create_Threads(tFunc1) and Create_Threads(tFunc2), is copied into the master-process branch for rank = 0.

If the rank number is not 0, the process is a child process and must perform the computing task. It first uses an MPI function to receive the task length. The task length depends on the number of child processes and is a share of the loop count N in the tFunc1 function; for example, with K child processes the task length M may be N/K. The child process then allocates memory according to M and receives the task data from the master process into the newly allocated memory. It performs the computation with the calculation function calcFunc() from the original program, with the outer loop now running M times instead of N. After the computation, it uses an MPI function to send the data back to the master process.
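The text gives M = N/K only as one possible choice and does not say what happens when K does not divide N. The sketch below shows one common way to compute per-child task lengths; the function name `split_tasks` and the policy of giving the first few children one extra item are assumptions, not taken from the patent.

```python
def split_tasks(n, k):
    """Split n loop iterations among k children as (start, length) pairs.

    Assumed policy: each child gets n // k items, and the first n % k
    children get one extra item so that every iteration is assigned.
    """
    base, extra = divmod(n, k)
    chunks = []
    start = 0
    for rank in range(k):
        length = base + (1 if rank < extra else 0)
        chunks.append((start, length))
        start += length
    return chunks
```

For example, `split_tasks(8, 4)` gives every child a length of 2, which matches M = N/K, while `split_tasks(10, 3)` assigns lengths 4, 3 and 3.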

The master process needs to execute tFunc1 from the original program. The tFunc1 function is transformed as follows:

First the task allocation is computed. M can be computed in several ways; this example uses M = N/K. A loop over the K slave nodes uses MPI functions to distribute the number of tasks and the task data to each slave node. A second loop of K iterations uses MPI functions to receive the data from the slave nodes and aggregate it.

tFunc2 remains unchanged.

The program is launched in the way MPI programs are launched, so that it runs as multiple processes.

The pseudocode of the resulting new program is as follows:
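The new listing is likewise not reproduced in this text. The Python sketch below assembles the steps described above into one runnable program, under clearly stated assumptions: ranks are simulated by threads, the hypothetical helpers `send` and `recv` (built on per-rank queues) stand in for the MPI send and receive calls, N is divisible by K, and the function bodies are illustrative.

```python
import queue
import threading

N = 8   # loop count of the original tFunc1 (value assumed)
K = 2   # number of simulated slave ranks; N is assumed divisible by K
mail = [queue.Queue() for _ in range(K + 1)]   # one inbox per simulated rank


def send(dest, msg):      # stand-in for an MPI send call
    mail[dest].put(msg)


def recv(rank):           # stand-in for an MPI receive call
    return mail[rank].get()


def calcFunc(x):
    return x * x


def tFunc1(out):
    # Rewritten hotspot thread function: distribute tasks, then gather results.
    m = N // K                                   # M = N / K
    for s in range(1, K + 1):
        send(s, m)                               # task length
        send(s, list(range((s - 1) * m, s * m))) # task data
    for _ in range(K):
        src, part = recv(0)                      # results may arrive in any order
        out[(src - 1) * m:src * m] = part


def tFunc2(log):
    log.append("tFunc2 done")                    # unchanged


def main(rank, out, log):
    # MPI_Init and the rank query would appear here in a real MPI program.
    if rank == 0:
        threads = [threading.Thread(target=tFunc1, args=(out,)),
                   threading.Thread(target=tFunc2, args=(log,))]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    else:
        m = recv(rank)                           # receive the task length
        buf = [None] * m                         # allocate memory for M items
        buf[:] = recv(rank)                      # receive the task data
        send(0, (rank, [calcFunc(buf[i]) for i in range(m)]))
    # MPI_Finalize would appear here.


def launch():
    out, log = [None] * N, []
    ranks = [threading.Thread(target=main, args=(r, out, log))
             for r in range(K + 1)]
    for t in ranks:
        t.start()
    for t in ranks:
        t.join()
    return out, log
```

Calling `launch()` plays the role of starting the program with an MPI launcher using K + 1 processes. The master still creates both threads exactly as before, and only the body of tFunc1 has changed.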

Those skilled in the art should understand that the modules and steps of the invention described above can be implemented with general-purpose computing devices, and can be concentrated on a single computing device or distributed over a network of multiple computing devices. They can also be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; alternatively, they can each be made into a separate integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The invention is therefore not limited to any specific combination of hardware and software.

Although the embodiments of the invention are disclosed above, the content described is only an implementation adopted to make the invention easier to understand and is not intended to limit it. Any person skilled in the art to which the invention belongs may make modifications and changes in the form and details of the implementation without departing from the spirit and scope disclosed by the invention, but the scope of patent protection of the invention remains that defined by the appended claims.

Claims

1. A method for a multi-node parallel architecture under a multi-threaded program, comprising:

creating master and slave processes in the main function of an original program, and obtaining information identifying the master and slave processes respectively;

retaining, in the main function of the master process, the original program's operations of creating threads and binding thread functions to them; deleting the original computation code from the hotspot thread function of the master process and adding code that communicates with the child processes; and deleting the original program's main-function content from the main function of each child process and adding the computation part of the original program's hotspot thread function and a part that communicates with the master process.

2. The method according to claim 1, characterized in that, before the steps of the method, it further comprises:

measuring the original program to identify the hotspot thread function, and determining through analysis that the hotspot thread function can be optimized into a multi-process parallel architecture.

3. The method according to claim 2, characterized in that creating the master and slave processes in the main function of the original program and obtaining the information identifying the master and slave processes respectively specifically comprises:

creating the master and slave processes in the main function with the Message Passing Interface initialization function, and obtaining the corresponding process numbers with the Message Passing Interface function for obtaining the process number, where process number 0 denotes the master process and all non-zero process numbers denote slave processes.

4. The method according to claim 3, characterized in that deleting the original computation code from the hotspot thread function of the master process and adding code that communicates with the child processes specifically comprises:

on entering the hotspot thread function of the master process, the master process sends task-related information, including the task length and the task data, to the child processes; once all the information has been sent, the master process enters a waiting state and waits for the child processes to return their calculation results.

5. The method according to claim 3, characterized in that deleting the original program's main-function content from the main function of the child process and adding the computation part of the original program's hotspot thread function and the part that communicates with the master process specifically comprises:

the child process waits for the master process to send the task length, and allocates memory space according to the task length;

the child process waits to receive the task data sent by the master process and stores the received task data in the allocated memory space; it computes on the received task data using the sub-function, or the corresponding code, called by the computation part of the original program's hotspot thread function; when the computation ends, it sends the calculation result to the master process; this step is repeated until the program ends.

6. A device for a multi-node parallel architecture under a multi-threaded program, characterized in that it includes a main function module and a master process calculation module connected to each other, wherein:

the main function module is used to initialize process parameters, to run the corresponding branch for the master process and for the slave processes, and to have multiple child processes receive the data to be computed from the master process, perform the computation, and send the calculation results to the master process;

the master process calculation module is used to pass the data to be computed to the child processes and to receive the calculation results of the child processes.

7. The device according to claim 6, characterized in that:

the main function module runs the process initialization function through the process creation module and initializes the process parameters, including creating threads and binding thread functions; through the main function judgment module it determines whether a process is the master process or a slave process and runs the corresponding branch for each; and through the child process calculation module it has multiple child processes receive the data to be computed from the master process, compute on the received data by calling the core calculation module, and send their calculation results to the master process.