
CN104932947B - Barrier synchronization method and device - Google Patents

Barrier synchronization method and device

Info

Publication number
CN104932947B
Authority
CN
China
Prior art keywords
fence
processor core
synchronization
queue
predetermined
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410098952.1A
Other languages
Chinese (zh)
Other versions
CN104932947A (en
Inventor
徐卫志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Institute of Computing Technology of CAS
Original Assignee
Huawei Technologies Co Ltd
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd, Institute of Computing Technology of CAS filed Critical Huawei Technologies Co Ltd
Priority to CN201410098952.1A priority Critical patent/CN104932947B/en
Publication of CN104932947A publication Critical patent/CN104932947A/en
Application granted granted Critical
Publication of CN104932947B publication Critical patent/CN104932947B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Landscapes

  • Multi Processors (AREA)

Abstract

The invention discloses a barrier synchronization method and device, relating to the communications field, which address the drop in processing performance of chips with multi-core or many-core processors caused by access bottlenecks as the number of threads grows. Specifically: a first processor core determines that the thread program it is currently processing has reached a predetermined barrier synchronization point, the first processor core being any one of the processor cores included in the chip; the core determines a target barrier synchronization device according to the barrier identifier corresponding to the predetermined barrier synchronization point; and the core sends a barrier synchronization message to the target barrier synchronization device, the message containing the barrier identifier and the number of thread programs participating in the synchronization. The invention is used in barrier synchronization.

Description

Barrier synchronization method and device

Technical Field

The present invention relates to the communications field, and in particular to a barrier synchronization method and device.

Background

Traditional single-core processors usually raise the processor's clock frequency through superscalar and pipelining techniques in order to improve performance, but a higher clock frequency increases power consumption and makes heat dissipation harder. Meanwhile, as semiconductor processes have advanced, the number of transistors that can be integrated on a chip has grown steadily. To improve processor performance while lowering power consumption and keeping heat dissipation manageable, architecture designers proposed multi-core and many-core processors that use coarse-grained, thread-level parallelism.

Because multi-core and many-core processors process data with multiple threads, barrier synchronization is needed to ensure that data propagates correctly between thread programs and that the execution semantics of the thread programs are correct; barrier synchronization is therefore very important for multi-core and many-core processors. In the prior art, barrier synchronization is implemented by placing a single synchronization management device on the chip. Specifically, in a chip with a multi-core or many-core processor, when the thread program handled by a processor core reaches a predetermined synchronization point, that core sends the synchronization management device a notification message indicating that its thread program has reached the predetermined synchronization point. The synchronization management device checks whether all thread programs participating in the synchronization have reached the predetermined synchronization point and, once they all have, sends a continue instruction to the processor core of each participating thread program so that all the processor cores resume processing their thread programs.

The prior art has at least the following problem: because only one synchronization management device is placed on the chip, every processor core in a chip with a multi-core or many-core processor must send its notification message to that same device whenever its thread program reaches a predetermined synchronization point. As the number of threads grows, this creates a serious access bottleneck that slows the cooperative execution of the thread programs and thus degrades the processing performance of the chip.

Summary of the Invention

The present invention provides a barrier synchronization method and device that address the degradation in processing performance of chips with multi-core or many-core processors caused by access bottlenecks as the number of threads grows.

To achieve the above object, the present invention adopts the following technical solutions:

A first aspect of the present invention provides a barrier synchronization method applied to a chip with a multi-core or many-core processor, the chip being provided with at least two barrier synchronization devices, the method including:

a first processor core determines that the thread program it is currently processing has reached a predetermined barrier synchronization point; the first processor core is any one of the processor cores included in the chip;

determining a target barrier synchronization device according to the barrier identifier corresponding to the predetermined barrier synchronization point;

sending a barrier synchronization message to the target barrier synchronization device; the barrier synchronization message contains the barrier identifier and the number of thread programs participating in the synchronization.

With reference to the first aspect, in a possible implementation, the determining of the target barrier synchronization device according to the barrier identifier corresponding to the predetermined barrier synchronization point includes:

determining the target barrier synchronization device from the barrier identifier corresponding to the predetermined barrier synchronization point according to a preset rule; the preset rule includes a mapping between barrier identifiers and barrier synchronization devices.
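As an illustration only, the preset rule could be as simple as taking the barrier identifier modulo the number of devices. The sketch below assumes integer barrier identifiers and devices indexed from 0; the names `select_barrier_device` and `build_sync_message` and the modulo rule itself are hypothetical and are not taken from the claims, which only require some mapping between identifiers and devices.

```python
def select_barrier_device(barrier_id: int, num_devices: int) -> int:
    """Map a barrier identifier to a device index (assumed rule: modulo)."""
    if num_devices < 2:
        # The method is stated for chips with at least two devices.
        raise ValueError("expected at least two barrier synchronization devices")
    return barrier_id % num_devices


def build_sync_message(barrier_id: int, participants: int) -> dict:
    """Build a message carrying the two fields named in the first aspect."""
    return {"barrier_id": barrier_id, "participants": participants}
```

Any deterministic rule would serve, provided every core applies the same one, so that all messages for one barrier identifier arrive at the same device.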

With reference to the first aspect and the foregoing possible implementation, in another possible implementation, after the sending of the barrier synchronization message to the target barrier synchronization device, the method further includes:

suspending processing of the currently processed thread program and entering a waiting state.

With reference to the first aspect and the foregoing possible implementations, in another possible implementation, after the suspending of processing of the currently processed thread program and entering the waiting state, the method further includes:

receiving a confirmation message sent by the target barrier synchronization device; the confirmation message is used to notify the first processor core to continue processing the currently processed thread program;

continuing to process the currently processed thread program.
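The core-side sequence (send, wait, resume on confirmation) can be sketched in software as follows. This is a minimal model under stated assumptions, not the hardware mechanism: each device is modeled as a `queue.Queue` inbox, the confirmation message as a `threading.Event`, and the device is chosen by an assumed modulo rule. The `core_id` field and the function name `reach_barrier` are hypothetical additions for illustration.

```python
import queue
import threading


def reach_barrier(core_id, barrier_id, participants, device_inboxes, ack,
                  timeout=None):
    """Send the barrier message, then wait until the confirmation arrives.

    Returns True if the confirmation was received, False on timeout.
    """
    # Assumed preset rule: barrier identifier modulo the number of devices.
    target = device_inboxes[barrier_id % len(device_inboxes)]
    target.put({
        "core_id": core_id,            # hypothetical field, lets the device reply
        "barrier_id": barrier_id,
        "participants": participants,
    })
    # Waiting state: the thread program is suspended until the device confirms.
    return ack.wait(timeout)
```

A caller would resume the thread program only when `reach_barrier` returns `True`.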

A second aspect of the present invention provides a barrier synchronization method applied to a chip with a multi-core or many-core processor, the chip being provided with at least two barrier synchronization devices, the method including:

a target barrier synchronization device receives a barrier synchronization message sent by a first processor core; the barrier synchronization message is sent by the first processor core when it determines that the thread program it is currently processing has reached a predetermined barrier synchronization point; the first processor core is any one of the processor cores included in the chip; the barrier synchronization message contains the barrier identifier corresponding to the predetermined barrier synchronization point and the number of thread programs participating in the synchronization; the target barrier synchronization device is the barrier synchronization device that handles the barrier synchronization messages sent by processor cores for the predetermined barrier synchronization point;

incrementing by 1, according to the barrier identifier corresponding to the predetermined barrier synchronization point, the count value of a count field contained in a first queue; the first queue is the queue that corresponds to the barrier identifier and records the state of all thread programs participating in the synchronization; the first queue contains the barrier identifier, a queue state, and the count field.

With reference to the second aspect, in a possible implementation, before the incrementing by 1 of the count value of the count field contained in the first queue according to the barrier identifier, the method further includes:

determining whether the first queue exists;

when the first queue does not exist, creating the first queue and updating the queue state to an in-use state.
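The device-side bookkeeping described above (look up the queue for a barrier identifier, create it if absent, then increment its count) might be modeled as below. The class and field names (`BarrierQueue`, `BarrierDevice`, `state`, `count`) and the string `"in_use"` are illustrative assumptions; the text only specifies that the queue holds the barrier identifier, a queue state, and a count field.

```python
from dataclasses import dataclass


@dataclass
class BarrierQueue:
    barrier_id: int
    state: str = "in_use"   # set to the in-use state on creation
    count: int = 0          # number of thread programs that have arrived


class BarrierDevice:
    def __init__(self):
        # One queue per barrier identifier handled by this device.
        self.queues = {}

    def on_message(self, barrier_id: int) -> BarrierQueue:
        """Handle one barrier synchronization message."""
        q = self.queues.get(barrier_id)
        if q is None:
            # The first queue does not exist yet: create it.
            q = BarrierQueue(barrier_id)
            self.queues[barrier_id] = q
        q.count += 1
        return q
```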

With reference to the second aspect and the foregoing possible implementation, in another possible implementation, the first queue further contains identification information of the processor cores whose thread programs have reached the predetermined barrier synchronization point;

after the receiving of the barrier synchronization message sent by the first processor core, the method further includes:

adding the identification information of the first processor core to the first queue.

With reference to the second aspect and the foregoing possible implementations, in another possible implementation, before the adding of the identification information of the first processor core to the first queue, the method further includes:

determining whether the number of pieces of identification information of processor cores whose thread programs have reached the predetermined barrier synchronization point is less than a preset threshold; the preset threshold is less than or equal to the maximum number of threads supported by the chip;

when the number of pieces of identification information of processor cores whose thread programs have reached the predetermined barrier synchronization point is determined to be less than the preset threshold, performing the adding of the identification information of the first processor core to the first queue.

With reference to the second aspect and the foregoing possible implementations, in another possible implementation, the method further includes:

when the number of pieces of identification information of processor cores whose thread programs have reached the predetermined barrier synchronization point is determined to be not less than the preset threshold, saving the identification information of the first processor core to memory.
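The threshold check amounts to a bounded on-device list with overflow to memory. A minimal sketch, assuming both stores can be modeled as Python lists; the function name `record_core` and the parameter names are hypothetical:

```python
def record_core(queue_ids: list, memory_ids: list, core_id: int,
                threshold: int) -> str:
    """Store core_id in the queue while it has room, otherwise in memory.

    Returns "queue" or "memory" to show where the identifier was stored.
    """
    if len(queue_ids) < threshold:
        queue_ids.append(core_id)
        return "queue"
    # The queue already holds `threshold` identifiers: spill to memory.
    memory_ids.append(core_id)
    return "memory"
```

With a threshold of 2, for example, the first two arriving cores are kept in the queue and every later arrival is saved to memory.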

With reference to the second aspect and the foregoing possible implementations, in another possible implementation, the first queue further includes a bit sequence indicating, for each of the thread programs participating in the synchronization, whether that thread program has reached the predetermined barrier synchronization point; each bit in the bit sequence is mapped to the identification information of a processor core;

after the receiving of the barrier synchronization message sent by the first processor core, the method further includes:

updating the bit corresponding to the identification information of the first processor core from a first flag to a second flag; the first flag indicates that the thread program processed by the processor core has not reached the predetermined barrier synchronization point, and the second flag indicates that the thread program processed by the processor core has reached the predetermined barrier synchronization point.
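The bit-sequence variant can be illustrated with an integer used as a bit mask. The sketch assumes that bit `i` corresponds to the core whose identifier is `i`, that the first flag is 0, and that the second flag is 1; the text does not fix these encodings, and the function names are hypothetical.

```python
def mark_arrived(bits: int, core_id: int) -> int:
    """Flip the bit for core_id from the first flag (0) to the second (1)."""
    return bits | (1 << core_id)


def has_arrived(bits: int, core_id: int) -> bool:
    """Report whether the core's thread program has reached the barrier."""
    return (bits >> core_id) & 1 == 1


def arrived_cores(bits: int) -> list:
    """List the identifiers of all cores whose bit carries the second flag."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]
```

Because each core occupies one bit, the storage cost is fixed by the number of cores rather than by the number of arrivals.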

With reference to the second aspect and the foregoing possible implementations, in another possible implementation, after the incrementing by 1 of the count value of the count field contained in the first queue according to the barrier identifier, the method further includes:

determining whether the count value of the count field is equal to the number of thread programs participating in the synchronization;

when the count value of the count field is equal to the number of thread programs participating in the synchronization, obtaining the identification information of the processor core corresponding to each of the thread programs participating in the synchronization;

sending, according to the identification information of the processor core corresponding to each of the thread programs participating in the synchronization, a confirmation message to each of those processor cores; the confirmation message is used to notify the processor core to continue processing its own thread program.

With reference to the second aspect and the foregoing possible implementations, in another possible implementation, the obtaining of the identification information of the processor core corresponding to each of the thread programs participating in the synchronization includes:

obtaining, from the first queue, the identification information of the processor core corresponding to each of the thread programs participating in the synchronization.

With reference to the second aspect and the foregoing possible implementations, in another possible implementation, the obtaining of the identification information of the processor core corresponding to each of the thread programs participating in the synchronization includes:

obtaining, from the first queue and the memory, the identification information of the processor core corresponding to each of the thread programs participating in the synchronization.
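The release step can be sketched as follows: once the count equals the number of participating thread programs, the device gathers the core identifiers from the queue and, where the overflow variant is used, from memory, and confirms each core. The function name `release_if_complete` and the `send_ack` callback are illustrative assumptions standing in for whatever on-chip messaging the device uses.

```python
def release_if_complete(count, participants, queue_ids, memory_ids, send_ack):
    """Send a confirmation to every participating core once all have arrived.

    Returns the list of core identifiers that were confirmed; the list is
    empty while the barrier is still incomplete.
    """
    if count != participants:
        return []
    # Identifiers may live in the queue, in memory, or in both.
    released = list(queue_ids) + list(memory_ids)
    for core_id in released:
        send_ack(core_id)
    return released
```

Passing an empty `memory_ids` covers the variant in which all identifiers are taken from the first queue alone.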

A third aspect of the present invention provides a first processor core applied to a chip with a multi-core or many-core processor, the chip being provided with at least two barrier synchronization devices, the first processor core including:

a first determining unit, configured to determine that the currently processed thread program has reached a predetermined barrier synchronization point; the first processor core is any one of the processor cores included in the chip;

a second determining unit, configured to determine a target barrier synchronization device according to the barrier identifier corresponding to the predetermined barrier synchronization point;

a sending unit, configured to send a barrier synchronization message to the target barrier synchronization device determined by the second determining unit; the barrier synchronization message contains the barrier identifier and the number of thread programs participating in the synchronization.

With reference to the third aspect, in a possible implementation, the second determining unit is specifically configured to:

determine the target barrier synchronization device from the barrier identifier corresponding to the predetermined barrier synchronization point according to a preset rule; the preset rule includes a mapping between barrier identifiers and barrier synchronization devices.

With reference to the third aspect and the foregoing possible implementation, in another possible implementation, the first processor core further includes:

a first processing unit, configured to suspend processing of the currently processed thread program and enter a waiting state after the sending unit sends the barrier synchronization message to the target barrier synchronization device.

With reference to the third aspect and the foregoing possible implementations, in another possible implementation, the first processor core further includes:

a receiving unit, configured to receive a confirmation message sent by the target barrier synchronization device after the first processing unit suspends processing of the currently processed thread program and enters the waiting state; the confirmation message is used to notify the first processor core to continue processing the currently processed thread program;

a second processing unit, configured to continue processing the currently processed thread program.

A fourth aspect of the present invention provides a target barrier synchronization device applied to a chip with a multi-core or many-core processor, the chip being provided with at least two barrier synchronization devices, the target barrier synchronization device including:

a receiving unit, configured to receive a barrier synchronization message sent by a first processor core; the barrier synchronization message is sent by the first processor core when it determines that the thread program it is currently processing has reached a predetermined barrier synchronization point; the first processor core is any one of the processor cores included in the chip; the barrier synchronization message contains the barrier identifier corresponding to the predetermined barrier synchronization point and the number of thread programs participating in the synchronization; the target barrier synchronization device is the barrier synchronization device that handles the barrier synchronization messages sent by processor cores for the predetermined barrier synchronization point;

a processing unit, configured to increment by 1 the count value of a count field contained in a first queue, according to the barrier identifier that corresponds to the predetermined barrier synchronization point and is contained in the barrier synchronization message obtained by the receiving unit; the first queue is the queue that corresponds to the barrier identifier and records the state of all thread programs participating in the synchronization; the first queue contains the barrier identifier, a queue state, and the count field.

With reference to the fourth aspect, in a possible implementation, the target barrier synchronization device further includes:

a judging unit, configured to determine whether the first queue exists before the processing unit increments by 1 the count value of the count field contained in the first queue according to the barrier identifier;

a creating and updating unit, configured to create the first queue and update the queue state to an in-use state when the judging unit determines that the first queue does not exist.

With reference to the fourth aspect and the foregoing possible implementation, in another possible implementation, the first queue further contains identification information of the processor cores whose thread programs have reached the predetermined barrier synchronization point;

the target barrier synchronization device further includes:

an adding unit, configured to add the identification information of the first processor core to the first queue after the receiving unit receives the barrier synchronization message sent by the first processor core.

With reference to the fourth aspect and the foregoing possible implementations, in another possible implementation,

the judging unit is further configured to determine, before the adding unit adds the identification information of the first processor core to the first queue, whether the number of pieces of identification information of processor cores whose thread programs have reached the predetermined barrier synchronization point is less than a preset threshold; the preset threshold is less than or equal to the maximum number of threads supported by the chip;

the adding unit is specifically configured to add the identification information of the first processor core to the first queue when the judging unit determines that the number of pieces of identification information of processor cores whose thread programs have reached the predetermined barrier synchronization point is less than the preset threshold.

With reference to the fourth aspect and the foregoing possible implementations, in another possible implementation, the target barrier synchronization device further includes:

a saving unit, configured to save the identification information of the first processor core to memory when the judging unit determines that the number of pieces of identification information of processor cores whose thread programs have reached the predetermined barrier synchronization point is not less than the preset threshold.

With reference to the fourth aspect and the foregoing possible implementations, in another possible implementation, the first queue further includes a bit sequence indicating, for each of the thread programs participating in the synchronization, whether that thread program has reached the predetermined barrier synchronization point; each bit in the bit sequence is mapped to the identification information of a processor core;

the target barrier synchronization device further includes:

an updating unit, configured to update the bit corresponding to the identification information of the first processor core from a first flag to a second flag after the receiving unit receives the barrier synchronization message sent by the first processor core; the first flag indicates that the thread program processed by the processor core has not reached the predetermined barrier synchronization point, and the second flag indicates that the thread program processed by the processor core has reached the predetermined barrier synchronization point.

With reference to the fourth aspect and the foregoing possible implementations, in another possible implementation,

the judging unit is further configured to determine, after the processing unit increments by 1 the count value of the count field contained in the first queue according to the barrier identifier, whether the count value of the count field is equal to the number of thread programs participating in the synchronization;

the target barrier synchronization device further includes:

an obtaining unit, configured to obtain the identification information of the processor core corresponding to each of the thread programs participating in the synchronization when the judging unit determines that the count value of the count field is equal to the number of thread programs participating in the synchronization;

a sending unit, configured to send, according to the identification information obtained by the obtaining unit, a confirmation message to the processor core corresponding to each of the thread programs participating in the synchronization; the confirmation message is used to notify the processor core to continue processing its own thread program.

With reference to the fourth aspect and the foregoing possible implementations, in another possible implementation, the obtaining unit is specifically configured to:

obtain, from the first queue, the identification information of the processor core corresponding to each of the thread programs participating in the synchronization.

With reference to the fourth aspect and the foregoing possible implementations, in another possible implementation, the obtaining unit is specifically configured to:

obtain, from the first queue and the memory, the identification information of the processor core corresponding to each of the thread programs participating in the synchronization.

In the barrier synchronization method and device provided by the present invention, when the first processor core determines that the thread program it is currently processing has reached a predetermined barrier synchronization point, it determines a target barrier synchronization device according to the barrier identifier corresponding to that synchronization point and then sends a barrier synchronization message to that device. Because the device that handles a core's barrier synchronization message is chosen according to the predetermined barrier synchronization point, different barrier synchronization points can be mapped to different barrier synchronization devices. This avoids an access bottleneck as the number of threads grows and improves the processing performance of chips with multi-core or many-core processors.
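A small worked example may make the load argument concrete. The sketch counts how many barrier synchronization messages each device receives when every barrier has the same number of participating threads. The modulo mapping and the function name `messages_per_device` are assumptions made for illustration, and the figures describe message counts only, not measured performance.

```python
from collections import Counter


def messages_per_device(barrier_ids, threads_per_barrier, num_devices):
    """Count the messages each device receives under a modulo mapping."""
    load = Counter()
    for barrier_id in barrier_ids:
        # Every participating thread sends one message per barrier.
        load[barrier_id % num_devices] += threads_per_barrier
    return dict(load)
```

With 8 barrier identifiers (0 to 7) and 16 threads per barrier, a single device would receive all 128 messages, whereas 4 devices under this mapping would receive 32 each.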

Description of the Drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments or the prior art. Evidently, the accompanying drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a flowchart of a barrier synchronization method according to Embodiment 1 of the present invention;

FIG. 2 is a flowchart of a barrier synchronization method according to Embodiment 2 of the present invention;

FIG. 3 is a schematic structural diagram of a chip with a multi-core or many-core processor on which four barrier synchronization devices are provided, according to Embodiment 3 of the present invention;

FIG. 4 is a flowchart of a barrier synchronization method according to Embodiment 3 of the present invention;

FIG. 5 is a schematic composition diagram of a first processor core according to Embodiment 4 of the present invention;

FIG. 6 is a schematic composition diagram of another first processor core according to Embodiment 4 of the present invention;

FIG. 7 is a schematic composition diagram of a target barrier synchronization device according to Embodiment 5 of the present invention;

FIG. 8 is a schematic composition diagram of another target barrier synchronization device according to Embodiment 5 of the present invention;

FIG. 9 is a schematic composition diagram of barrier synchronization equipment according to Embodiment 6 of the present invention;

FIG. 10 is a schematic composition diagram of barrier synchronization equipment according to Embodiment 7 of the present invention.

Detailed Description

The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

In addition, the terms "system" and "network" are often used interchangeably in this document. The term "and/or" describes only an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate three cases: A exists alone, both A and B exist, or B exists alone. The character "/" in this document generally indicates an "or" relationship between the objects before and after it.

Embodiment 1

Embodiment 1 of the present invention provides a barrier synchronization method applied to a chip with a multi-core or many-core processor, on which at least two barrier synchronization devices are provided. As shown in FIG. 1, the method may include:

101. A first processor core determines that the thread program it is currently processing has executed to a predetermined barrier synchronization point. The first processor core is any one of the processor cores included in the chip.

102. The first processor core determines a target barrier synchronization device according to the barrier identifier corresponding to the predetermined barrier synchronization point.

The barrier synchronization method of this embodiment is applied to a chip with a multi-core or many-core processor, on which at least two barrier synchronization devices are provided. When any one of the processor cores, that is, the first processor core, determines that the thread program it is currently processing has executed to the predetermined barrier synchronization point, it first determines the target barrier synchronization device from among the at least two barrier synchronization devices on the chip, according to the barrier identifier corresponding to that synchronization point.

103. The first processor core sends a barrier synchronization message to the target barrier synchronization device.

The barrier synchronization message contains the barrier identifier and the number of thread programs participating in the synchronization. The first processor core sends the barrier synchronization message to the target barrier synchronization device it has determined.

According to the barrier synchronization method provided by this embodiment of the present invention, when the first processor core determines that the thread program it is currently processing has executed to a predetermined barrier synchronization point, it determines a target barrier synchronization device according to the barrier identifier corresponding to that synchronization point, and then sends a barrier synchronization message to the target barrier synchronization device. Because the device that processes a core's barrier synchronization message is selected according to the predetermined barrier synchronization point, different barrier synchronization points can be mapped to different barrier synchronization devices. This avoids an access bottleneck as the number of threads increases and improves the processing performance of chips with multi-core or many-core processors.

Moreover, barrier synchronization is implemented in hardware, which is faster than a software implementation and further improves the processing performance of chips with multi-core or many-core processors.

Embodiment 2

Embodiment 2 of the present invention provides a barrier synchronization method applied to a chip with a multi-core or many-core processor, on which at least two barrier synchronization devices are provided. As shown in FIG. 2, the method may include:

201. A target barrier synchronization device receives a barrier synchronization message sent by a first processor core.

The barrier synchronization message is sent by the first processor core when it determines that the thread program it is currently processing has executed to a predetermined barrier synchronization point. The first processor core is any one of the processor cores included in the chip. The message contains the barrier identifier corresponding to the barrier synchronization point and the number of thread programs participating in the synchronization. The target barrier synchronization device is the barrier synchronization device used to process the barrier synchronization messages sent by the processor cores corresponding to the predetermined barrier synchronization point.

202. The target barrier synchronization device increments by 1 the count value of the count field in a first queue, according to the barrier identifier.

The first queue is the queue that corresponds to the barrier identifier and identifies the state of all thread programs participating in the synchronization. The first queue contains the barrier identifier, a queue state, and a count field.

Specifically, when the first processor core determines that the thread program it is currently processing has executed to the predetermined barrier synchronization point, it first determines the target barrier synchronization device according to the barrier identifier corresponding to that synchronization point, and then sends a barrier synchronization message to the target barrier synchronization device. The target barrier synchronization device thus receives the message by which the first processor core reports that its thread program has executed to the predetermined barrier synchronization point. According to the barrier identifier contained in the received message, the device increments by 1 the count value of the count field in the first queue corresponding to that barrier identifier, thereby recording how many of the participating thread programs have executed to the predetermined barrier synchronization point.

According to the barrier synchronization method provided by this embodiment of the present invention, the target barrier synchronization device receives the barrier synchronization message sent by the first processor core and, according to the barrier identifier contained in the message, increments by 1 the count value of the count field in the first queue corresponding to that barrier identifier, thereby recording how many of the participating thread programs have executed to the predetermined barrier synchronization point. Because the first processor core selects the target barrier synchronization device that processes its barrier synchronization message according to the predetermined barrier synchronization point, different barrier synchronization points can be mapped to different barrier synchronization devices. This avoids an access bottleneck as the number of threads increases and improves the processing performance of chips with multi-core or many-core processors.

Moreover, barrier synchronization is implemented in hardware, which is faster than a software implementation and further improves the processing performance of chips with multi-core or many-core processors.

Embodiment 3

Embodiment 3 of the present invention provides a barrier synchronization method applied to a chip with a multi-core or many-core processor. At least two barrier synchronization devices are provided on the chip, and they are distributed at different positions of the on-chip network. As an example, FIG. 3 is a schematic structural diagram of a chip with a multi-core or many-core processor on which four barrier synchronization devices are provided. As shown in FIG. 4, the barrier synchronization method provided by this embodiment may include:

301. A first processor core determines that the thread program it is currently processing has executed to a predetermined barrier synchronization point.

The first processor core is any one of the processor cores included in the chip.

302. The first processor core determines a target barrier synchronization device according to the barrier identifier corresponding to the predetermined barrier synchronization point.

After the first processor core determines that the thread program it is currently processing has executed to the predetermined barrier synchronization point, it determines, according to the barrier identifier corresponding to that synchronization point, which of the at least two barrier synchronization devices on the chip is to process its barrier synchronization message. That device is the target barrier synchronization device.

In a possible implementation of this embodiment, the first processor core may determine the target barrier synchronization device as follows: according to the barrier identifier corresponding to the predetermined barrier synchronization point, the first processor core selects the target barrier synchronization device from among the at least two barrier synchronization devices on the chip by applying a preset rule, which may include a mapping between barrier identifiers and barrier synchronization devices. Take round-robin assignment as an example, and assume that there are more barrier synchronization points than barrier synchronization devices and that the chip contains four barrier synchronization devices. A processor core can then use the barrier identifier corresponding to its barrier synchronization point to select, in round-robin fashion, the device that processes its barrier synchronization message. If the barrier identifier of a given barrier synchronization point is 7, the processor core takes the remainder of the barrier identifier divided by 4 and obtains 3 as the number of the barrier synchronization device that processes its barrier synchronization message, which determines the target barrier synchronization device.
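The round-robin example above can be sketched in a few lines of Python. This is only an illustration of the remainder rule described in the text; the function name `select_barrier_device` and the zero-based device numbering are assumptions, not part of the embodiment.

```python
def select_barrier_device(barrier_id: int, num_devices: int) -> int:
    """Map a barrier identifier to a device number by taking the remainder.

    Hypothetical helper: the text only states that the barrier identifier
    is divided by the number of devices and the remainder is used.
    """
    if num_devices < 2:
        # The chip described in the text carries at least two devices.
        raise ValueError("at least two barrier synchronization devices are expected")
    return barrier_id % num_devices


# Barrier identifier 7 on a chip with 4 devices selects device number 3.
print(select_barrier_device(7, 4))
```

Any other mapping that sends different barrier identifiers to different devices would serve the same purpose; the remainder rule is simply the case the text works through.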

It should be noted that determining the target barrier synchronization device in round-robin fashion is only one possible implementation provided by this embodiment. How the target barrier synchronization device is determined from the barrier identifier corresponding to the barrier synchronization point may depend on the requirements of the actual application scenario, and this embodiment imposes no specific limitation.

It should be noted that, in possible implementations of this embodiment, the preset rule may also be that each processor core corresponds to one barrier synchronization device, or that each bank of the shared cache corresponds to one barrier synchronization device. The preset rule may be agreed on in advance, and this embodiment imposes no specific limitation on it.

303. The first processor core sends a barrier synchronization message to the target barrier synchronization device.

The barrier synchronization message contains the barrier identifier and the number of thread programs participating in the synchronization. After the first processor core determines the target barrier synchronization device according to the barrier identifier corresponding to the predetermined barrier synchronization point, it can send that device the barrier synchronization message, which reports that the thread program processed by the core has executed to the predetermined barrier synchronization point.

It should be noted that the barrier synchronization message may be delivered to the target barrier synchronization device by adding a synchronization instruction to the processor core; this embodiment imposes no specific limitation. The synchronization instruction may be non-preemptive or preemptive.

304. The first processor core suspends processing of the thread program it is currently processing and enters a waiting state.

After the first processor core sends the barrier synchronization message to the target barrier synchronization device, it may suspend processing of the current thread program and enter a waiting state, so as to achieve barrier synchronization.
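As a rough software model of steps 302 to 304 on the core side, the sketch below builds the message, places it in the inbox of the selected device, and marks the core as waiting. The class names, the per-device inbox lists, and the `core_id` field in the message are assumptions made for illustration; the text only states that the message carries the barrier identifier and the number of participating thread programs, and it describes a hardware mechanism rather than software.

```python
from dataclasses import dataclass
from enum import Enum, auto


class CoreState(Enum):
    RUNNING = auto()
    WAITING = auto()


@dataclass(frozen=True)
class BarrierSyncMessage:
    barrier_id: int         # barrier identifier of the synchronization point
    participant_count: int  # number of thread programs participating
    core_id: int            # assumption: the sender is identified explicitly


class Core:
    def __init__(self, core_id, device_inboxes):
        self.core_id = core_id
        self.device_inboxes = device_inboxes  # one list per barrier device
        self.state = CoreState.RUNNING

    def reach_barrier(self, barrier_id, participant_count):
        # Step 302: pick the target device (remainder rule from the example).
        target = barrier_id % len(self.device_inboxes)
        # Step 303: send the barrier synchronization message.
        message = BarrierSyncMessage(barrier_id, participant_count, self.core_id)
        self.device_inboxes[target].append(message)
        # Step 304: suspend the current thread program and wait.
        self.state = CoreState.WAITING
        return target


inboxes = [[] for _ in range(4)]
core = Core(core_id=5, device_inboxes=inboxes)
print(core.reach_barrier(barrier_id=7, participant_count=16), core.state)
```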

305. The target barrier synchronization device receives the barrier synchronization message sent by the first processor core.

The barrier synchronization message is sent by the first processor core when it determines that the thread program it is currently processing has executed to the predetermined barrier synchronization point. The message contains the barrier identifier corresponding to the predetermined barrier synchronization point and the number of thread programs participating in the synchronization. The target barrier synchronization device is the barrier synchronization device used to process the barrier synchronization messages sent by the processor cores corresponding to the predetermined barrier synchronization point.

Once the first processor core has determined that the thread program it is processing has executed to the predetermined barrier synchronization point and has determined the target barrier synchronization device, it can send the barrier synchronization message to that device. The target barrier synchronization device then receives the message, which contains the barrier identifier corresponding to the predetermined barrier synchronization point and the number of thread programs participating in the synchronization.

306. The target barrier synchronization device determines whether a first queue exists.

After the target barrier synchronization device receives the barrier synchronization message sent by the first processor core, it can determine, according to the barrier identifier corresponding to the predetermined barrier synchronization point contained in the message, whether a first queue corresponding to that barrier identifier exists. The first queue is the queue that corresponds to the barrier identifier and identifies the state of all thread programs participating in the synchronization. It contains a barrier identifier, a queue state, and a count field. The barrier identifier uniquely identifies the barrier synchronization point; the queue state indicates the state of the queue, which may be in use or idle; and the count field records the number of thread programs that have executed to the predetermined barrier synchronization point.

It should be noted that, in this embodiment, one barrier synchronization device may maintain at least one queue, each of which identifies the state of all thread programs participating in the same synchronization. This makes the barrier synchronization method provided by this embodiment scalable, so that it is well suited to chips with multi-core processors and even better suited to chips with many-core processors.

307. When the first queue does not exist, the target barrier synchronization device creates the first queue and updates the queue state to in use.

When the target barrier synchronization device determines that no first queue corresponding to the barrier identifier exists, it may create a first queue containing that barrier identifier and update the queue state of the first queue from idle to in use.

It should be noted that if the target barrier synchronization device determines that a first queue corresponding to the barrier identifier exists, step 308 is performed directly.

308. The target barrier synchronization device increments by 1 the count value of the count field in the first queue, according to the barrier identifier corresponding to the predetermined barrier synchronization point.

It should be noted that, in a possible implementation of this embodiment in which each processor core corresponds to one barrier synchronization device, the target barrier synchronization device may perform step 308 directly upon receiving the barrier synchronization message sent by the first processor core, without determining whether a first queue corresponding to the barrier identifier exists.
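Steps 306 to 308 amount to a lookup-or-create followed by an increment. The following Python sketch models that behavior in software under stated assumptions: the class names, the dictionary keyed by barrier identifier, and the method name `on_message` are all hypothetical, and the embodiment itself describes a hardware device.

```python
from dataclasses import dataclass
from enum import Enum


class QueueState(Enum):
    IDLE = "idle"
    IN_USE = "in_use"


@dataclass
class BarrierQueue:
    barrier_id: int                      # uniquely identifies the sync point
    state: QueueState = QueueState.IDLE  # in use or idle
    count: int = 0                       # thread programs that have arrived


class BarrierDevice:
    def __init__(self):
        self.queues = {}  # barrier_id -> BarrierQueue (assumed storage layout)

    def on_message(self, barrier_id):
        queue = self.queues.get(barrier_id)   # step 306: does the queue exist?
        if queue is None:                     # step 307: create it, mark in use
            queue = BarrierQueue(barrier_id)
            queue.state = QueueState.IN_USE
            self.queues[barrier_id] = queue
        queue.count += 1                      # step 308: increment the count
        return queue


device = BarrierDevice()
device.on_message(7)
print(device.on_message(7).count)
```

Because the device keeps one queue per barrier identifier, several barrier synchronization points can be in progress on the same device at once.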

In a possible implementation of this embodiment, the first queue further contains the identification information of the processor cores corresponding to the thread programs that have executed to the predetermined barrier synchronization point. In this case, after the target barrier synchronization device increments by 1 the count value of the count field in the first queue according to the barrier identifier corresponding to the predetermined barrier synchronization point, the following steps 309 to 311 may be performed.

309. The target barrier synchronization device determines whether the number of pieces of identification information of processor cores corresponding to thread programs that have executed to the predetermined barrier synchronization point is less than a preset threshold.

The preset threshold is less than or equal to the maximum number of threads supported by the chip. After the target barrier synchronization device increments by 1 the count value of the count field in the first queue according to the barrier identifier, in the received barrier synchronization message, that corresponds to the predetermined barrier synchronization point, it determines whether the number of pieces of identification information of processor cores corresponding to thread programs that have executed to the predetermined barrier synchronization point is less than the preset threshold. If the number is less than the preset threshold, step 310 is performed; if the number is not less than the preset threshold, step 311 is performed.

310. The target barrier synchronization device adds the identification information of the first processor core to the first queue.

311. The target barrier synchronization device saves the identification information of the first processor core to memory.

It should be noted that, in this embodiment, when the preset threshold is less than the maximum number of threads supported by the chip and the number of pieces of identification information of processor cores corresponding to thread programs that have executed to the predetermined barrier synchronization point is not less than the preset threshold, saving the identification information of the first processor core to memory reduces the space the target barrier synchronization device needs for storing processor core identification information.
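The threshold check of steps 309 to 311 can be illustrated as follows. The function name, the use of plain Python lists for the queue storage and for memory, and the returned label are assumptions for illustration only.

```python
def record_core_id(queue_ids, memory_ids, core_id, threshold):
    """Store a core identifier in the queue, or spill it to memory.

    Hypothetical helper: queue_ids models the identifiers held in the
    first queue and memory_ids models the identifiers saved to memory.
    """
    if len(queue_ids) < threshold:   # step 309: below the preset threshold?
        queue_ids.append(core_id)    # step 310: keep it in the first queue
        return "queue"
    memory_ids.append(core_id)       # step 311: otherwise save it to memory
    return "memory"


queue_ids, memory_ids = [], []
for core_id in (4, 9, 1):
    print(core_id, record_core_id(queue_ids, memory_ids, core_id, threshold=2))
```

With a threshold of 2, the first two identifiers stay in the queue and the third is spilled, so the on-device storage never has to grow beyond the threshold.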

In another possible implementation of this embodiment, the first queue further includes a bit sequence that identifies whether each of the thread programs participating in the synchronization has executed to the predetermined barrier synchronization point, and each bit in the bit sequence is mapped to the identification information of a processor core. In this case, after the target barrier synchronization device increments by 1 the count value of the count field in the first queue according to the barrier identifier corresponding to the predetermined barrier synchronization point, the following step 312 may be performed.

312. The target barrier synchronization device updates the bit corresponding to the identification information of the first processor core from a first flag to a second flag.

The first flag indicates that the thread program processed by the processor core has not executed to the predetermined barrier synchronization point, and the second flag indicates that it has. For example, the first flag is 0 and the second flag is 1. When the first queue is created, every bit in the bit sequence mapped to processor core identification information is set to 0. When the target barrier synchronization device receives the barrier synchronization message of the first processor core, it updates the bit corresponding to the identification information of the first processor core from 0 to 1, indicating that the thread program processed by the first processor core has executed to the predetermined barrier synchronization point.

It should be noted that, in this possible implementation, a bit sequence is used to identify whether each thread program participating in the synchronization has executed to the predetermined barrier synchronization point, which reduces the space the target barrier synchronization device needs for storing processor core identification information.
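The bit-sequence variant of step 312 can be sketched with an integer used as a bitmask. The mapping assumed here, in which bit position i stands for the processor core whose identifier is i, is an illustrative choice; the text only requires that some mapping between bits and core identification information exists.

```python
from typing import List


def mark_arrived(bits: int, core_id: int) -> int:
    """Set the bit for core_id from 0 (not arrived) to 1 (arrived)."""
    return bits | (1 << core_id)


def arrived_core_ids(bits: int) -> List[int]:
    """Recover the core identifiers whose bits are set to 1."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


bits = 0                      # all bits start at 0 when the queue is created
bits = mark_arrived(bits, 3)  # core 3 reaches the synchronization point
bits = mark_arrived(bits, 0)  # core 0 reaches the synchronization point
print(bin(bits), arrived_core_ids(bits))
```

One bit per core replaces a stored identifier per core, which is where the space saving described in the text comes from.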

313. The target barrier synchronization device determines whether the count value of the count field is equal to the number of thread programs participating in the synchronization.

After the target barrier synchronization device increments by 1 the count value of the count field in the first queue, and either saves the identification information of the first processor core or updates the bit corresponding to that identification information, it can determine whether all thread programs participating in the synchronization have executed to the predetermined barrier synchronization point, that is, whether the count value of the count field in the first queue is equal to the number of thread programs participating in the synchronization.

314. When the count value of the count field is equal to the number of thread programs participating in the synchronization, the target barrier synchronization device obtains the identification information of the processor core corresponding to each of the thread programs participating in the synchronization.

In one possible implementation, the target barrier synchronization device uses a bit sequence to identify whether the thread programs participating in the synchronization have executed to the predetermined barrier synchronization point. Because each bit in the bit sequence is mapped to the identification information of a processor core, the target barrier synchronization device may obtain the identification information of the processor core corresponding to each participating thread program from the first queue.

In another possible implementation, the target barrier synchronization device records processor core identification information to indicate whether each participating thread program has reached the predetermined barrier synchronization point. In this case, the device acquires the identification information of the processor core corresponding to each participating thread program either from the first queue alone, or from the first queue together with the memory.

It should be noted that when the count value of the count field is not equal to the number of participating thread programs, some participating thread programs have not yet reached the predetermined barrier synchronization point. The target barrier synchronization device can then continue to receive barrier synchronization messages from the processor cores corresponding to the other participating thread programs, and performs step 314 once the count value equals the number of participating thread programs.

315. Based on the identification information of the processor core corresponding to each participating thread program, the target barrier synchronization device sends a confirmation message to each of those processor cores.

After acquiring the identification information of the processor core corresponding to each participating thread program, the target barrier synchronization device can use it to send each of those processor cores a confirmation message, which notifies the processor core that it may continue processing the thread program it is responsible for.
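Steps 314 and 315 can be sketched in software as follows. This is only an illustrative model of what the patent describes as a hardware device: the class `BarrierQueue`, the helper `send_acks`, the `send` callback, and the `"ACK"` payload are all hypothetical names chosen for the example.

```python
class BarrierQueue:
    """Software model of one queue kept by a barrier synchronization device."""

    def __init__(self, barrier_id, participants):
        self.barrier_id = barrier_id
        self.participants = participants  # number of participating thread programs
        self.count = 0                    # the count field
        self.core_ids = []                # cores that have reached the barrier

    def arrive(self, core_id):
        """Record one arrival.

        Returns the list of cores to acknowledge once the count equals the
        number of participants (step 314), otherwise an empty list.
        """
        self.count += 1
        self.core_ids.append(core_id)
        if self.count == self.participants:
            return list(self.core_ids)
        return []


def send_acks(core_ids, send):
    """Step 315: send a confirmation message to every participating core."""
    for core_id in core_ids:
        send(core_id, "ACK")


queue = BarrierQueue(barrier_id=7, participants=3)
for core in (0, 2, 5):
    ready = queue.arrive(core)
    if ready:
        send_acks(ready, lambda core_id, msg: print(f"core {core_id} <- {msg}"))
```

Until the last participant arrives, `arrive` returns an empty list and the device simply keeps waiting for further barrier synchronization messages.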

316. The first processor core receives the confirmation message sent by the target barrier synchronization device.

317. The first processor core continues processing the currently processed thread program.

After the first processor core receives the confirmation message from the target barrier synchronization device, it knows that all participating thread programs have reached the predetermined barrier synchronization point, and it can continue processing its current thread program.

In the barrier synchronization method provided by this embodiment of the present invention, the target barrier synchronization device receives the barrier synchronization message sent by the first processor core and, according to the barrier identifier contained in that message, increments by 1 the count value of the count field in the first queue corresponding to the barrier identifier, thereby recording how many participating thread programs have reached the predetermined barrier synchronization point. Because the first processor core determines, from the predetermined barrier synchronization point, the target barrier synchronization device that handles its barrier synchronization message, different barrier synchronization points can be mapped to different barrier synchronization devices. This avoids an access bottleneck as the number of threads grows and improves the processing performance of a chip with a multi-core or many-core processor.

In addition, implementing barrier synchronization in hardware gives a higher processing speed than a software method, which further improves the processing performance of a chip with a multi-core or many-core processor. A single barrier synchronization device can also maintain at least one queue that identifies the state of all thread programs participating in the same synchronization, which gives it good scalability.

Embodiment 4

Embodiment 4 of the present invention provides a first processor core, applied in a chip that has a multi-core or many-core processor and on which at least two barrier synchronization devices are provided. As shown in FIG. 5, the first processor core may include a first determining unit 41, a second determining unit 42, and a sending unit 43.

The first determining unit 41 is configured to determine that the currently processed thread program has reached a predetermined barrier synchronization point. The first processor core is any one of the processor cores included in the chip.

The second determining unit 42 is configured to determine the target barrier synchronization device according to the barrier identifier corresponding to the predetermined barrier synchronization point.

The sending unit 43 is configured to send a barrier synchronization message to the target barrier synchronization device determined by the second determining unit 42. The barrier synchronization message contains the barrier identifier and the number of thread programs participating in the synchronization.

In this embodiment of the present invention, further optionally, the second determining unit 42 is specifically configured to determine the target barrier synchronization device according to a preset rule, based on the barrier identifier corresponding to the predetermined barrier synchronization point. The preset rule includes a mapping between barrier identifiers and barrier synchronization devices.
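As an illustration of such a preset rule, the sketch below maps a barrier identifier to a device index by taking the identifier modulo the number of devices. The modulo rule is an assumption made for the example; the text above only requires that some mapping between barrier identifiers and barrier synchronization devices exists. The function name `select_barrier_device` is likewise hypothetical.

```python
def select_barrier_device(barrier_id, num_devices):
    """Map a barrier identifier to the index of a barrier synchronization device.

    Assumed rule for illustration: device index = barrier_id mod num_devices.
    """
    if num_devices < 2:
        # The chip described here carries at least two such devices.
        raise ValueError("expected at least two barrier synchronization devices")
    return barrier_id % num_devices


# With four devices, barriers 0..5 are spread over devices 0, 1, 2, 3, 0, 1.
print([select_barrier_device(b, 4) for b in range(6)])  # prints [0, 1, 2, 3, 0, 1]
```

Any deterministic rule works as long as every core applies the same one, so that all messages for a given barrier identifier reach the same device.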

In this embodiment of the present invention, further optionally, as shown in FIG. 6, the first processor core may further include a first processing unit 44.

The first processing unit 44 is configured to suspend processing of the currently processed thread program and enter a waiting state after the sending unit 43 sends the barrier synchronization message to the target barrier synchronization device.

In this embodiment of the present invention, further optionally, the first processor core may further include a receiving unit 45 and a second processing unit 46.

The receiving unit 45 is configured to receive the confirmation message sent by the target barrier synchronization device after the first processing unit 44 has suspended processing of the currently processed thread program and entered the waiting state. The confirmation message notifies the first processor core to continue processing the currently processed thread program.

The second processing unit 46 is configured to continue processing the currently processed thread program.
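The core-side sequence handled by units 43 to 46 (send the message, wait, receive the confirmation, resume) can be modelled as a small state machine. The sketch is a software analogy only. The names `ProcessorCore`, `CoreState`, `reach_barrier`, and `on_ack` are hypothetical, and the `core_id` field in the message is an assumption: the text specifies only that the message carries the barrier identifier and the number of participating thread programs.

```python
from enum import Enum


class CoreState(Enum):
    RUNNING = "running"
    WAITING = "waiting"


class ProcessorCore:
    """Software model of the first processor core's barrier behaviour."""

    def __init__(self, core_id, send):
        self.core_id = core_id
        self.send = send              # callback: send(device_id, message)
        self.state = CoreState.RUNNING

    def reach_barrier(self, barrier_id, participants, device_id):
        """Send the barrier synchronization message, then suspend (units 43, 44)."""
        message = {
            "barrier_id": barrier_id,
            "participants": participants,
            "core_id": self.core_id,  # assumed field, so the device can reply
        }
        self.send(device_id, message)
        self.state = CoreState.WAITING

    def on_ack(self):
        """Receive the confirmation message and resume (units 45, 46)."""
        if self.state is CoreState.WAITING:
            self.state = CoreState.RUNNING


outbox = []
core = ProcessorCore(3, lambda device_id, msg: outbox.append((device_id, msg)))
core.reach_barrier(barrier_id=9, participants=4, device_id=1)
print(core.state.value)  # prints waiting
core.on_ack()
print(core.state.value)  # prints running
```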

It should be noted that, for a detailed description of the functional modules of the first processor core provided by this embodiment of the present invention, reference may be made to the corresponding content in the method embodiments; the details are not repeated here.

With the first processor core provided by this embodiment of the present invention, when the first processor core determines that the currently processed thread program has reached a predetermined barrier synchronization point, it determines the target barrier synchronization device according to the barrier identifier corresponding to that point and then sends a barrier synchronization message to that device. Because the device that handles a core's barrier synchronization message is determined from the predetermined barrier synchronization point, different barrier synchronization points can be mapped to different barrier synchronization devices. This avoids an access bottleneck as the number of threads grows and improves the processing performance of a chip with a multi-core or many-core processor.

In addition, implementing barrier synchronization in hardware gives a higher processing speed than a software method, which further improves the processing performance of a chip with a multi-core or many-core processor. A single barrier synchronization device can also maintain at least one queue that identifies the state of all thread programs participating in the same synchronization, which gives it good scalability.

Embodiment 5

Embodiment 5 of the present invention provides a target barrier synchronization device, applied in a chip that has a multi-core or many-core processor and on which at least two barrier synchronization devices are provided. As shown in FIG. 7, the target barrier synchronization device may include a receiving unit 51 and a processing unit 52.

The receiving unit 51 is configured to receive a barrier synchronization message sent by a first processor core. The first processor core sends the barrier synchronization message when it determines that the currently processed thread program has reached a predetermined barrier synchronization point. The first processor core is any one of the processor cores included in the chip, and the barrier synchronization message contains the barrier identifier corresponding to the predetermined barrier synchronization point and the number of thread programs participating in the synchronization. The target barrier synchronization device is the barrier synchronization device that handles the barrier synchronization messages sent by processor cores for the predetermined barrier synchronization point.

The processing unit 52 is configured to increment by 1 the count value of the count field in a first queue, according to the barrier identifier, corresponding to the predetermined barrier synchronization point, that is contained in the barrier synchronization message received by the receiving unit 51. The first queue is the queue that corresponds to the barrier identifier and identifies the state of all thread programs participating in the synchronization. The first queue contains the barrier identifier, a queue state, and the count field.

In this embodiment of the present invention, further optionally, as shown in FIG. 8, the target barrier synchronization device may further include a judging unit 53 and a creating and updating unit 54.

The judging unit 53 is configured to determine whether the first queue exists before the processing unit 52 increments by 1 the count value of the count field in the first queue according to the barrier identifier.

The creating and updating unit 54 is configured to create the first queue and update the queue state to the in-use state when the judging unit 53 determines that the first queue does not exist.
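The lookup-or-create behaviour of units 53 and 54, followed by the increment performed by unit 52, can be sketched as below. A dictionary keyed by barrier identifier stands in for the device's queues; the field names (`barrier_id`, `state`, `count`), the `"in_use"` value, and the function names are assumptions made for the example.

```python
def get_or_create_queue(queues, barrier_id):
    """Return the queue for `barrier_id`, creating it if it does not exist.

    A newly created queue holds the barrier identifier, a queue state set to
    in use, and a count field starting at 0.
    """
    queue = queues.get(barrier_id)
    if queue is None:
        queue = {"barrier_id": barrier_id, "state": "in_use", "count": 0}
        queues[barrier_id] = queue
    return queue


def handle_barrier_message(queues, barrier_id):
    """Check for the queue first, then add 1 to its count field."""
    queue = get_or_create_queue(queues, barrier_id)
    queue["count"] += 1
    return queue


queues = {}
handle_barrier_message(queues, 7)   # first arrival creates the queue
handle_barrier_message(queues, 7)   # second arrival reuses it
print(queues[7])  # prints {'barrier_id': 7, 'state': 'in_use', 'count': 2}
```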

In this embodiment of the present invention, further optionally, the first queue also contains the identification information of the processor cores corresponding to the thread programs that have reached the predetermined barrier synchronization point.

The target barrier synchronization device may further include an adding unit 55.

The adding unit 55 is configured to add the identification information of the first processor core to the first queue after the receiving unit 51 receives the barrier synchronization message sent by the first processor core.

In this embodiment of the present invention, further optionally, the judging unit 53 is further configured to determine, before the adding unit 55 adds the identification information of the first processor core to the first queue, whether the number of pieces of identification information of processor cores corresponding to thread programs that have reached the predetermined barrier synchronization point is less than a preset threshold. The preset threshold is less than or equal to the maximum number of threads supported by the chip.

The adding unit 55 is specifically configured to add the identification information of the first processor core to the first queue when the judging unit 53 determines that the number of pieces of identification information of processor cores corresponding to thread programs that have reached the predetermined barrier synchronization point is less than the preset threshold.

In this embodiment of the present invention, further optionally, the target barrier synchronization device may further include a saving unit 56.

The saving unit 56 is configured to save the identification information of the first processor core to memory when the judging unit 53 determines that the number of pieces of identification information of processor cores corresponding to thread programs that have reached the predetermined barrier synchronization point is not less than the preset threshold.
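The threshold rule applied by the adding unit 55 and the saving unit 56 can be sketched as follows. Two Python lists stand in for the storage inside the first queue and for the memory used as overflow. The sketch assumes the threshold is compared against the number of identifiers already held in the queue; the function names are hypothetical.

```python
def record_core(queue_ids, memory_ids, core_id, threshold):
    """Store a core's identification information.

    If fewer than `threshold` identifiers are held in the queue, the new one
    is added to the queue; otherwise it is saved to memory.
    """
    if len(queue_ids) < threshold:
        queue_ids.append(core_id)
    else:
        memory_ids.append(core_id)


def all_core_ids(queue_ids, memory_ids):
    """Collect every recorded core ID from the queue and from memory."""
    return queue_ids + memory_ids


queue_ids, memory_ids = [], []
for core in (4, 1, 6, 2):
    record_core(queue_ids, memory_ids, core, threshold=2)

print(queue_ids)                            # prints [4, 1]
print(memory_ids)                           # prints [6, 2]
print(all_core_ids(queue_ids, memory_ids))  # prints [4, 1, 6, 2]
```

Capping the number of identifiers held in the queue keeps each queue a fixed size, while the overflow path still lets the device reach every participating core when it sends confirmation messages.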

In this embodiment of the present invention, further optionally, the first queue also includes a bit sequence that indicates, for each thread program participating in the synchronization, whether it has reached the predetermined barrier synchronization point. Each bit in the bit sequence is mapped to the identification information of a processor core.

The target barrier synchronization device may further include an updating unit 57.

The updating unit 57 is configured to update the bit corresponding to the identification information of the first processor core from a first flag to a second flag after the receiving unit 51 receives the barrier synchronization message sent by the first processor core. The first flag indicates that the thread program processed by a processor core has not reached the predetermined barrier synchronization point, and the second flag indicates that the thread program processed by a processor core has reached the predetermined barrier synchronization point.
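The update performed by unit 57 amounts to setting a single bit. In the sketch below the first flag is taken to be 0 and the second flag 1, and bit i is assumed to correspond to core ID i. Both choices, and the function names, are assumptions for illustration; the text fixes neither the flag values nor the mapping.

```python
FIRST_FLAG = 0   # assumed value: thread program has not reached the barrier
SECOND_FLAG = 1  # assumed value: thread program has reached the barrier


def mark_arrived(bits, core_id):
    """Set the bit for `core_id` to the second flag (assumed: bit i <-> core i)."""
    return bits | (SECOND_FLAG << core_id)


def has_arrived(bits, core_id):
    """Report whether the bit for `core_id` currently holds the second flag."""
    return (bits >> core_id) & 1 == SECOND_FLAG


bits = 0b0001                 # only core 0 has arrived so far
bits = mark_arrived(bits, 2)  # core 2 sends its barrier synchronization message
print(bin(bits))              # prints 0b101
print(has_arrived(bits, 1))   # prints False
```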

In this embodiment of the present invention, further optionally, the judging unit 53 is further configured to determine, after the processing unit 52 increments by 1 the count value of the count field in the first queue according to the barrier identifier, whether the count value of the count field equals the number of thread programs participating in the synchronization.

The target barrier synchronization device may further include an acquiring unit 58 and a sending unit 59.

The acquiring unit 58 is configured to acquire the identification information of the processor core corresponding to each participating thread program when the judging unit 53 determines that the count value of the count field equals the number of thread programs participating in the synchronization.

The sending unit 59 is configured to send a confirmation message to the processor core corresponding to each participating thread program, based on the identification information of those processor cores acquired by the acquiring unit 58. The confirmation message notifies the processor core to continue processing the thread program it is responsible for.

In this embodiment of the present invention, further optionally, the acquiring unit 58 is specifically configured to acquire, from the first queue, the identification information of the processor core corresponding to each participating thread program.

In this embodiment of the present invention, further optionally, the acquiring unit 58 is specifically configured to acquire, from the first queue and the memory, the identification information of the processor core corresponding to each participating thread program.

It should be noted that, for a detailed description of the functional modules of the target barrier synchronization device provided by this embodiment of the present invention, reference may be made to the corresponding content in the method embodiments; the details are not repeated here.

The target barrier synchronization device provided by this embodiment of the present invention receives the barrier synchronization message sent by the first processor core and, according to the barrier identifier contained in that message, increments by 1 the count value of the count field in the first queue corresponding to the barrier identifier, thereby recording how many participating thread programs have reached the predetermined barrier synchronization point. Because the first processor core determines, from the predetermined barrier synchronization point, the target barrier synchronization device that handles its barrier synchronization message, different barrier synchronization points can be mapped to different barrier synchronization devices. This avoids an access bottleneck as the number of threads grows and improves the processing performance of a chip with a multi-core or many-core processor.

In addition, implementing barrier synchronization in hardware gives a higher processing speed than a software method, which further improves the processing performance of a chip with a multi-core or many-core processor. A single barrier synchronization device can also maintain at least one queue that identifies the state of all thread programs participating in the same synchronization, which gives it good scalability.

Embodiment 6

Embodiment 6 of the present invention provides a barrier synchronization apparatus, applied in a chip that has a multi-core or many-core processor and on which at least two barrier synchronization devices are provided. As shown in FIG. 9, the barrier synchronization apparatus may include at least one processor core, a memory 62, a communication interface 63, and a bus 64. The at least one processor core, the memory 62, and the communication interface 63 are connected through the bus 64 and communicate with one another over it, where:

The bus 64 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus 64 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, FIG. 9 shows it as a single thick line, but this does not mean that there is only one bus or one type of bus.

The memory 62 is used to store executable program code, which includes computer operation instructions. The memory 62 may include high-speed RAM and may also include non-volatile memory, for example at least one magnetic disk memory.

The processor core may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.

The communication interface 63 is mainly used to implement communication between the devices of this embodiment.

Any one of the at least one processor core (that is, the first processor core 61 described in the embodiments of the present invention) is specifically configured to perform the following functions:

The first processor core 61 is configured to: determine that the currently processed thread program has reached a predetermined barrier synchronization point, where the first processor core 61 is any one of the processor cores included in the chip; determine the target barrier synchronization device according to the barrier identifier corresponding to the predetermined barrier synchronization point; and send a barrier synchronization message to the target barrier synchronization device, where the barrier synchronization message contains the barrier identifier and the number of thread programs participating in the synchronization.

In this embodiment of the present invention, further optionally, the first processor core 61 is specifically configured to determine the target barrier synchronization device according to a preset rule, based on the barrier identifier corresponding to the predetermined barrier synchronization point. The preset rule includes a mapping between barrier identifiers and barrier synchronization devices.

In this embodiment of the present invention, further optionally, the first processor core 61 is further configured to suspend processing of the currently processed thread program and enter a waiting state after sending the barrier synchronization message to the target barrier synchronization device.

In this embodiment of the present invention, further optionally, the first processor core 61 is further configured to: after suspending processing of the currently processed thread program and entering the waiting state, receive the confirmation message sent by the target barrier synchronization device, where the confirmation message notifies the first processor core 61 to continue processing the currently processed thread program; and continue processing the currently processed thread program.

It should be noted that, for a detailed description of the functional modules of the first processor core provided by this embodiment of the present invention, reference may be made to the corresponding content in the method embodiments; the details are not repeated here.

With the first processor core provided by this embodiment of the present invention, when the first processor core determines that the currently processed thread program has reached a predetermined barrier synchronization point, it determines the target barrier synchronization device according to the barrier identifier corresponding to that point and then sends a barrier synchronization message to that device. Because the device that handles a core's barrier synchronization message is determined from the predetermined barrier synchronization point, different barrier synchronization points can be mapped to different barrier synchronization devices. This avoids an access bottleneck as the number of threads grows and improves the processing performance of a chip with a multi-core or many-core processor.

In addition, implementing barrier synchronization in hardware gives a higher processing speed than a software method, which further improves the processing performance of a chip with a multi-core or many-core processor. A single barrier synchronization device can also maintain at least one queue that identifies the state of all thread programs participating in the same synchronization, which gives it good scalability.

Embodiment 7

Embodiment 7 of the present invention provides a barrier synchronization apparatus, applied in a chip that has a multi-core or many-core processor and on which at least two barrier synchronization devices are provided. As shown in FIG. 10, the barrier synchronization apparatus may include at least one processor 71, a memory 72, a communication interface 73, and a bus 74. The at least one processor 71, the memory 72, and the communication interface 73 are connected through the bus 74 and communicate with one another over it, where:

The bus 74 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus 74 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, FIG. 10 shows it as a single thick line, but this does not mean that there is only one bus or one type of bus.

The memory 72 is used to store executable program code, which includes computer operation instructions. The memory 72 may include high-speed RAM and may also include non-volatile memory, for example at least one magnetic disk memory.

The processor 71 may be a CPU, an ASIC, or one or more integrated circuits configured to implement the embodiments of the present invention.

The communication interface 73 is mainly used to implement communication between the devices of this embodiment.

The processor 71 is configured to execute the executable program code in the memory 72, and specifically to perform the following functions:

The processor 71 is configured to: receive a barrier synchronization message sent by a first processor core, where the first processor core sends the barrier synchronization message when it determines that the currently processed thread program has reached a predetermined barrier synchronization point, the first processor core is any one of the processor cores included in the chip, and the barrier synchronization message contains the barrier identifier corresponding to the predetermined barrier synchronization point and the number of thread programs participating in the synchronization; and increment by 1 the count value of the count field in a first queue according to the barrier identifier corresponding to the predetermined barrier synchronization point. The first queue is the queue that corresponds to the barrier identifier and identifies the state of all thread programs participating in the synchronization. The first queue contains the barrier identifier, a queue state, and the count field.

In this embodiment of the present invention, further optionally, the processor 71 is also configured to determine, before incrementing by 1 the count value of the count field included in the first queue according to the barrier identifier, whether the first queue exists and, when the first queue does not exist, to create the first queue and update the queue state to the in-use state.
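
As a non-limiting illustration only, the receive, create-if-absent, and increment steps described above can be modeled in software as follows. This is a minimal sketch, not the hardware implementation: the names `BarrierSyncDevice`, `BarrierQueue`, `QueueState`, and `on_sync_message`, and the use of a dictionary keyed by barrier identifier, are assumptions introduced for the example.

```python
from dataclasses import dataclass
from enum import Enum


class QueueState(Enum):
    IDLE = 0    # assumed name for a queue that is not allocated to a barrier
    IN_USE = 1  # the "in-use" state described in the text


@dataclass
class BarrierQueue:
    # The three fields the text says the first queue contains.
    barrier_id: int
    state: QueueState = QueueState.IDLE
    count: int = 0


class BarrierSyncDevice:
    """Hypothetical software model of one barrier synchronization device."""

    def __init__(self):
        self.queues = {}  # barrier identifier -> BarrierQueue

    def on_sync_message(self, barrier_id, num_participants):
        # num_participants is carried by the message; it is needed for the
        # completion check, which is not part of this sketch.
        queue = self.queues.get(barrier_id)
        if queue is None:
            # The first queue does not exist yet: create it and mark it in use.
            queue = BarrierQueue(barrier_id)
            queue.state = QueueState.IN_USE
            self.queues[barrier_id] = queue
        queue.count += 1  # one more thread program has reached the barrier
        return queue


device = BarrierSyncDevice()
device.on_sync_message(barrier_id=3, num_participants=4)
queue = device.on_sync_message(barrier_id=3, num_participants=4)
```

In this sketch, two messages carrying the same barrier identifier land in the same queue, so only the first message creates it.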

In this embodiment of the present invention, further optionally, the first queue also includes the identification information of the processor cores corresponding to the thread programs that have executed to the predetermined barrier synchronization point.

The processor 71 is also configured to add the identification information of the first processor core to the first queue after receiving the barrier synchronization message sent by the first processor core.

In this embodiment of the present invention, further optionally, the processor 71 is also configured to determine, before adding the identification information of the first processor core to the first queue, whether the number of pieces of identification information of processor cores corresponding to thread programs that have executed to the predetermined barrier synchronization point is less than a preset threshold, the preset threshold being less than or equal to the maximum number of threads supported by the chip; and, when that number is determined to be less than the preset threshold, to add the identification information of the first processor core to the first queue.

In this embodiment of the present invention, further optionally, the processor 71 is also configured to save the identification information of the first processor core to memory when the number of pieces of identification information of processor cores corresponding to thread programs that have executed to the predetermined barrier synchronization point is determined to be not less than the preset threshold.
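
By way of example only, the threshold check described above might be modeled as follows. The class name `CoreIdStore`, its attribute names, and the use of a second Python list to stand in for memory are assumptions made for this sketch; the text does not specify how identification information is laid out in memory.

```python
class CoreIdStore:
    """Hypothetical model of where arriving core identifiers are recorded."""

    def __init__(self, threshold):
        self.threshold = threshold  # preset threshold (<= max threads of the chip)
        self.in_queue = []          # identifiers held in the first queue
        self.in_memory = []         # identifiers saved to memory (stand-in)

    def record(self, core_id):
        if len(self.in_queue) < self.threshold:
            # Fewer identifiers than the preset threshold: keep it in the queue.
            self.in_queue.append(core_id)
        else:
            # Not less than the threshold: save the identifier to memory.
            self.in_memory.append(core_id)


store = CoreIdStore(threshold=2)
for core_id in (4, 9, 1, 7):
    store.record(core_id)
```

With a threshold of 2, the first two identifiers stay in the queue and the remaining ones are saved to memory, which bounds the storage each queue needs on the device.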

In this embodiment of the present invention, further optionally, the first queue also includes a bit sequence used to identify whether each of the thread programs participating in the synchronization has executed to the predetermined barrier synchronization point, each bit in the bit sequence having a mapping relationship with the identification information of a processor core.

The processor 71 is also configured to update, after receiving the barrier synchronization message sent by the first processor core, the bit corresponding to the identification information of the first processor core from a first flag to a second flag. The first flag indicates that the thread program processed by a processor core has not executed to the predetermined barrier synchronization point, and the second flag indicates that the thread program processed by a processor core has executed to the predetermined barrier synchronization point.
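
As an illustrative sketch only, the bit sequence can be modeled as an integer bitmask. The class name `ArrivalBits`, the choice of 0 for the first flag and 1 for the second flag, and the mapping of core identifiers to bit positions by their order in a list are all assumptions of this example.

```python
class ArrivalBits:
    """Hypothetical model of the per-barrier bit sequence."""

    def __init__(self, core_ids):
        # Assumed mapping: the i-th listed core identifier owns bit i.
        self.bit_of = {core_id: i for i, core_id in enumerate(core_ids)}
        self.bits = 0  # every bit starts at the first flag (0, not arrived)

    def mark_arrived(self, core_id):
        # Update the core's bit from the first flag (0) to the second flag (1).
        self.bits |= 1 << self.bit_of[core_id]

    def has_arrived(self, core_id):
        return bool((self.bits >> self.bit_of[core_id]) & 1)


arrival = ArrivalBits(core_ids=[10, 11, 12, 13])
arrival.mark_arrived(12)
```

Here core 12 maps to bit 2, so after the update the bit sequence reads `0b0100` and only core 12 is reported as arrived.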

In this embodiment of the present invention, further optionally, the processor 71 is also configured to determine, after incrementing by 1 the count value of the count field included in the first queue according to the barrier identifier, whether the count value of the count field equals the number of thread programs participating in the synchronization. When the count value equals that number, the processor 71 obtains the identification information of the processor core corresponding to each thread program participating in the synchronization and, according to that identification information, sends an acknowledgment message to the processor core corresponding to each thread program participating in the synchronization. The acknowledgment message notifies the processor core to continue processing the thread program that it is to process.

In this embodiment of the present invention, further optionally, the processor 71 is also configured to obtain, from the first queue, the identification information of the processor core corresponding to each thread program participating in the synchronization.

In this embodiment of the present invention, further optionally, the processor 71 is also configured to obtain, from the first queue and from the memory, the identification information of the processor core corresponding to each thread program participating in the synchronization.
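
The completion check and acknowledgment described above can be sketched as follows, purely for illustration. The function `on_arrival`, the `send_ack` callback, and the small default threshold are hypothetical; the sketch covers the variant in which identification information is obtained from both the first queue and memory.

```python
class BarrierQueue:
    """Hypothetical model of the first queue plus identifiers saved to memory."""

    def __init__(self, barrier_id):
        self.barrier_id = barrier_id
        self.count = 0
        self.core_ids = []         # identifiers held in the queue
        self.spilled_core_ids = [] # identifiers saved to memory (stand-in)


def on_arrival(queue, core_id, num_participants, send_ack, threshold=2):
    """Record one arrival; acknowledge every core once all have arrived."""
    if len(queue.core_ids) < threshold:
        queue.core_ids.append(core_id)
    else:
        queue.spilled_core_ids.append(core_id)
    queue.count += 1
    if queue.count == num_participants:
        # All participants reached the barrier: collect identifiers from the
        # queue and from memory, then tell each core to continue.
        for cid in queue.core_ids + queue.spilled_core_ids:
            send_ack(cid)
        return True
    return False


acks = []
queue = BarrierQueue(barrier_id=7)
results = [on_arrival(queue, c, 3, acks.append) for c in (0, 5, 2)]
```

In this run no acknowledgment is sent for the first two arrivals; the third arrival makes the count equal the number of participants, and all three cores are then acknowledged.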

It should be noted that, for a specific description of the functional modules in the target barrier synchronization device provided by this embodiment of the present invention, reference may be made to the corresponding content in the method embodiments; details are not repeated here.

The target barrier synchronization device provided by this embodiment of the present invention receives the barrier synchronization message sent by the first processor core and, according to the barrier identifier contained in the message, increments by 1 the count value of the count field included in the first queue corresponding to that barrier identifier, thereby recording how many of the thread programs participating in the synchronization have executed to the predetermined barrier synchronization point. Because the first processor core determines, according to the predetermined barrier synchronization point, the target barrier synchronization device that processes its barrier synchronization message, different barrier synchronization points can be mapped to different barrier synchronization devices. This avoids an access bottleneck as the number of threads increases and improves the processing performance of chips with multi-core or many-core processors.
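
As one possible illustration of mapping barrier synchronization points to different devices, the sketch below uses a modulo rule. The text only requires that a preset rule map barrier identifiers to barrier synchronization devices; the modulo rule and the function name `select_device` are assumptions made for this example, not the rule used by the invention.

```python
def select_device(barrier_id, num_devices):
    """Return the index of the device that handles this barrier identifier."""
    if num_devices < 2:
        # The chip described in the text has at least two such devices.
        raise ValueError("at least two barrier synchronization devices expected")
    return barrier_id % num_devices  # assumed preset rule


# Eight barrier identifiers spread over four devices.
assignments = {bid: select_device(bid, 4) for bid in range(8)}
```

Under this assumed rule, consecutive barrier identifiers go to different devices, so messages for different barriers do not all converge on a single device.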

Moreover, implementing barrier synchronization in hardware is faster than a software approach, which further improves the processing performance of chips with multi-core or many-core processors. In addition, a barrier synchronization device can maintain at least one queue identifying the states of all thread programs participating in the same synchronization, which gives it good scalability.

From the description of the above implementations, those skilled in the art can clearly understand that, for convenience and brevity of description, the division into the functional modules above is used only as an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to perform all or some of the functions described above. For the specific working process of the device described above, reference may be made to the corresponding process in the foregoing method embodiments; details are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another device, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in another form.

The units described as separate components may or may not be physically separate, and a component shown as a unit may be one physical unit or multiple physical units; that is, it may be located in one place or distributed over several different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

The above are merely specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (26)

1. A barrier synchronization method, characterized in that it is applied to a chip having a multi-core or many-core processor, at least two barrier synchronization devices being provided on the chip, the method comprising:
a first processor core determining that a currently processed thread program has executed to a predetermined barrier synchronization point, the first processor core being any one of all processor cores included in the chip;
determining a target barrier synchronization device according to a barrier identifier corresponding to the predetermined barrier synchronization point; and
sending a barrier synchronization message to the target barrier synchronization device, the barrier synchronization message including the barrier identifier and the number of thread programs participating in the synchronization.
2. The method according to claim 1, characterized in that determining the target barrier synchronization device according to the barrier identifier corresponding to the predetermined barrier synchronization point comprises:
determining the target barrier synchronization device according to the barrier identifier corresponding to the predetermined barrier synchronization point and a preset rule, the preset rule including a mapping relationship between barrier identifiers and barrier synchronization devices.
3. The method according to claim 1, characterized in that, after sending the barrier synchronization message to the target barrier synchronization device, the method further comprises:
suspending processing of the currently processed thread program and entering a wait state.
4. The method according to claim 3, characterized in that, after suspending processing of the currently processed thread program and entering the wait state, the method further comprises:
receiving an acknowledgment message sent by the target barrier synchronization device, the acknowledgment message being used to notify the first processor core to continue processing the currently processed thread program; and
continuing to process the currently processed thread program.
5. A barrier synchronization method, characterized in that it is applied to a chip having a multi-core or many-core processor, at least two barrier synchronization devices being provided on the chip, the method comprising:
a target barrier synchronization device receiving a barrier synchronization message sent by a first processor core, the barrier synchronization message being sent by the first processor core upon determining that a currently processed thread program has executed to a predetermined barrier synchronization point, the first processor core being any one of all processor cores included in the chip, the barrier synchronization message including a barrier identifier corresponding to the predetermined barrier synchronization point and the number of thread programs participating in the synchronization, and the target barrier synchronization device being the barrier synchronization device used to process barrier synchronization messages sent by the processor cores corresponding to the predetermined barrier synchronization point; and
incrementing by 1, according to the barrier identifier corresponding to the predetermined barrier synchronization point, the count value of a count field included in a first queue, the first queue being the queue that corresponds to the barrier identifier and identifies the states of all thread programs participating in the synchronization, and the first queue including the barrier identifier, a queue state, and the count field.
6. The method according to claim 5, characterized in that, before incrementing by 1 the count value of the count field included in the first queue according to the barrier identifier, the method further comprises:
determining whether the first queue exists; and
when the first queue does not exist, creating the first queue and updating the queue state to an in-use state.
7. The method according to claim 5, characterized in that the first queue further includes identification information of the processor cores corresponding to thread programs that have executed to the predetermined barrier synchronization point;
after receiving the barrier synchronization message sent by the first processor core, the method further comprises:
adding the identification information of the first processor core to the first queue.
8. The method according to claim 7, characterized in that, before adding the identification information of the first processor core to the first queue, the method further comprises:
determining whether the number of pieces of identification information of processor cores corresponding to thread programs that have executed to the predetermined barrier synchronization point is less than a preset threshold, the preset threshold being less than or equal to the maximum number of threads supported by the chip; and
when the number of pieces of identification information of processor cores corresponding to thread programs that have executed to the predetermined barrier synchronization point is determined to be less than the preset threshold, performing the adding of the identification information of the first processor core to the first queue.
9. The method according to claim 8, characterized in that it further comprises:
when the number of pieces of identification information of processor cores corresponding to thread programs that have executed to the predetermined barrier synchronization point is determined to be not less than the preset threshold, saving the identification information of the first processor core to memory.
10. The method according to claim 5, characterized in that the first queue further includes a bit sequence used to identify whether each of the thread programs participating in the synchronization has executed to the predetermined barrier synchronization point, each bit in the bit sequence having a mapping relationship with the identification information of a processor core;
after receiving the barrier synchronization message sent by the first processor core, the method further comprises:
updating the bit corresponding to the identification information of the first processor core from a first flag to a second flag, the first flag indicating that the thread program processed by a processor core has not executed to the predetermined barrier synchronization point, and the second flag indicating that the thread program processed by a processor core has executed to the predetermined barrier synchronization point.
11. The method according to claim 5, characterized in that, after incrementing by 1 the count value of the count field included in the first queue according to the barrier identifier, the method further comprises:
determining whether the count value of the count field equals the number of thread programs participating in the synchronization;
when the count value of the count field equals the number of thread programs participating in the synchronization, obtaining the identification information of the processor core corresponding to each thread program participating in the synchronization; and
sending, according to the identification information of the processor core corresponding to each thread program participating in the synchronization, an acknowledgment message to the processor core corresponding to each thread program participating in the synchronization, the acknowledgment message being used to notify the processor core to continue processing the thread program that it is to process.
12. The method according to claim 11, characterized in that obtaining the identification information of the processor core corresponding to each thread program participating in the synchronization comprises:
obtaining, from the first queue, the identification information of the processor core corresponding to each thread program participating in the synchronization.
13. The method according to claim 11, characterized in that obtaining the identification information of the processor core corresponding to each thread program participating in the synchronization comprises:
obtaining, from the first queue and from memory, the identification information of the processor core corresponding to each thread program participating in the synchronization.
14. A first processor core, characterized in that it is applied to a chip having a multi-core or many-core processor, at least two barrier synchronization devices being provided on the chip, the first processor core comprising:
a first determining unit, configured to determine that a currently processed thread program has executed to a predetermined barrier synchronization point, the first processor core being any one of all processor cores included in the chip;
a second determining unit, configured to determine a target barrier synchronization device according to a barrier identifier corresponding to the predetermined barrier synchronization point; and
a sending unit, configured to send a barrier synchronization message to the target barrier synchronization device obtained by the second determining unit, the barrier synchronization message including the barrier identifier and the number of thread programs participating in the synchronization.
15. The first processor core according to claim 14, characterized in that the second determining unit is specifically configured to:
determine the target barrier synchronization device according to the barrier identifier corresponding to the predetermined barrier synchronization point and a preset rule, the preset rule including a mapping relationship between barrier identifiers and barrier synchronization devices.
16. The first processor core according to claim 14, characterized in that it further comprises:
a first processing unit, configured to suspend processing of the currently processed thread program and enter a wait state after the sending unit sends the barrier synchronization message to the target barrier synchronization device.
17. The first processor core according to claim 16, characterized in that it further comprises:
a receiving unit, configured to receive, after the first processing unit suspends processing of the currently processed thread program and enters the wait state, an acknowledgment message sent by the target barrier synchronization device, the acknowledgment message being used to notify the first processor core to continue processing the currently processed thread program; and
a second processing unit, configured to continue processing the currently processed thread program.
18. A target barrier synchronization device, characterized in that it is applied to a chip having a multi-core or many-core processor, at least two barrier synchronization devices being provided on the chip, the target barrier synchronization device comprising:
a receiving unit, configured to receive a barrier synchronization message sent by a first processor core, the barrier synchronization message being sent by the first processor core upon determining that a currently processed thread program has executed to a predetermined barrier synchronization point, the first processor core being any one of all processor cores included in the chip, the barrier synchronization message including a barrier identifier corresponding to the predetermined barrier synchronization point and the number of thread programs participating in the synchronization, and the target barrier synchronization device being the barrier synchronization device used to process barrier synchronization messages sent by the processor cores corresponding to the predetermined barrier synchronization point; and
a processing unit, configured to increment by 1, according to the barrier identifier corresponding to the predetermined barrier synchronization point included in the barrier synchronization message obtained by the receiving unit, the count value of a count field included in a first queue, the first queue being the queue that corresponds to the barrier identifier and identifies the states of all thread programs participating in the synchronization, and the first queue including the barrier identifier, a queue state, and the count field.
19. The target barrier synchronization device according to claim 18, characterized in that it further comprises:
a judging unit, configured to determine whether the first queue exists before the processing unit increments by 1 the count value of the count field included in the first queue according to the barrier identifier; and
a creating and updating unit, configured to create the first queue and update the queue state to an in-use state when the judging unit finds that the first queue does not exist.
20. The target barrier synchronization device according to claim 18, characterized in that the first queue further includes identification information of the processor cores corresponding to thread programs that have executed to the predetermined barrier synchronization point;
the target barrier synchronization device further comprises:
an adding unit, configured to add the identification information of the first processor core to the first queue after the receiving unit receives the barrier synchronization message sent by the first processor core.
21. The target barrier synchronization device according to claim 20, characterized in that the target barrier synchronization device further comprises:
a judging unit, configured to determine, before the adding unit adds the identification information of the first processor core to the first queue, whether the number of pieces of identification information of processor cores corresponding to thread programs that have executed to the predetermined barrier synchronization point is less than a preset threshold, the preset threshold being less than or equal to the maximum number of threads supported by the chip;
the adding unit being specifically configured to add the identification information of the first processor core to the first queue when the judging unit determines that the number of pieces of identification information of processor cores corresponding to thread programs that have executed to the predetermined barrier synchronization point is less than the preset threshold.
22. The target barrier synchronization device according to claim 21, characterized in that it further comprises:
a storage unit, configured to save the identification information of the first processor core to memory when the judging unit determines that the number of pieces of identification information of processor cores corresponding to thread programs that have executed to the predetermined barrier synchronization point is not less than the preset threshold.
23. The target barrier synchronization device according to claim 18, characterized in that the first queue further includes a bit sequence used to identify whether each of the thread programs participating in the synchronization has executed to the predetermined barrier synchronization point, each bit in the bit sequence having a mapping relationship with the identification information of a processor core;
the target barrier synchronization device further comprises:
an updating unit, configured to update, after the receiving unit receives the barrier synchronization message sent by the first processor core, the bit corresponding to the identification information of the first processor core from a first flag to a second flag, the first flag indicating that the thread program processed by a processor core has not executed to the predetermined barrier synchronization point, and the second flag indicating that the thread program processed by a processor core has executed to the predetermined barrier synchronization point.
24. The target barrier synchronization device according to claim 18, characterized in that the target barrier synchronization device further comprises:
a judging unit, configured to determine, after the processing unit increments by 1 the count value of the count field included in the first queue according to the barrier identifier, whether the count value of the count field equals the number of thread programs participating in the synchronization;
an obtaining unit, configured to obtain, when the judging unit finds that the count value of the count field equals the number of thread programs participating in the synchronization, the identification information of the processor core corresponding to each thread program participating in the synchronization; and
a sending unit, configured to send, according to the identification information obtained by the obtaining unit of the processor core corresponding to each thread program participating in the synchronization, an acknowledgment message to the processor core corresponding to each thread program participating in the synchronization, the acknowledgment message being used to notify the processor core to continue processing the thread program that it is to process.
25. The target barrier synchronization device according to claim 24, characterized in that the obtaining unit is specifically configured to:
obtain, from the first queue, the identification information of the processor core corresponding to each thread program participating in the synchronization.
26. The target barrier synchronization device according to claim 24, characterized in that the obtaining unit is specifically configured to:
obtain, from the first queue and from memory, the identification information of the processor core corresponding to each thread program participating in the synchronization.
CN201410098952.1A 2014-03-17 2014-03-17 A kind of fence synchronous method and equipment Active CN104932947B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410098952.1A CN104932947B (en) 2014-03-17 2014-03-17 A kind of fence synchronous method and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410098952.1A CN104932947B (en) 2014-03-17 2014-03-17 A kind of fence synchronous method and equipment

Publications (2)

Publication Number Publication Date
CN104932947A CN104932947A (en) 2015-09-23
CN104932947B true CN104932947B (en) 2018-06-05

Family

ID=54120120

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410098952.1A Active CN104932947B (en) 2014-03-17 2014-03-17 A kind of fence synchronous method and equipment

Country Status (1)

Country Link
CN (1) CN104932947B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110716812A (en) * 2019-09-12 2020-01-21 无锡江南计算技术研究所 Distributed synchronous management method and device supporting high concurrency
CN112783663B (en) * 2021-01-15 2023-06-13 中国人民解放军国防科技大学 A scalable fence synchronization method and device

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101925881A (en) * 2008-01-25 2010-12-22 学校法人早稻田大学 Multiprocessor system and multiprocessor system synchronization method
CN102591722A (en) * 2011-12-31 2012-07-18 龙芯中科技术有限公司 NoC (Network-on-Chip) multi-core processor multi-thread resource allocation processing method and system
CN103116527A (en) * 2013-03-05 2013-05-22 中国人民解放军国防科学技术大学 Super-large-scale barrier synchronization method based on network controller
CN103336571A (en) * 2013-06-13 2013-10-02 中国科学院计算技术研究所 Method and system for reducing power consumption of multi-thread program

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
JP4276028B2 (en) * 2003-08-25 2009-06-10 株式会社日立製作所 Multiprocessor system synchronization method
US20120179896A1 (en) * 2011-01-10 2012-07-12 International Business Machines Corporation Method and apparatus for a hierarchical synchronization barrier in a multi-node system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant