
CN108984115A - Data parallel write-in, read method, apparatus and system - Google Patents

Data parallel write-in, read method, apparatus and system

Info

Publication number
CN108984115A
Authority
CN
China
Prior art keywords
vector
data
read
storage
written
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810614178.3A
Other languages
Chinese (zh)
Other versions
CN108984115B (en
Inventor
刘大可
苗志东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN201810614178.3A
Publication of CN108984115A
Application granted
Publication of CN108984115B
Expired - Fee Related (current legal status)
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601: Interfaces specially adapted for storage systems
    • G06F3/0602: Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/062: Securing storage systems
    • G06F3/0622: Securing storage systems in relation to access
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601: Interfaces specially adapted for storage systems
    • G06F3/0602: Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061: Improving I/O performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Complex Calculations (AREA)

Abstract

The present invention provides data parallel writing and reading methods, devices and a system. The writing method includes: converting the write data index of a data vector to be written into a one-dimensional write address; obtaining, according to the one-dimensional write address and a preset number of write data, a write data enable vector, a first storage index vector and a first storage address vector for the data vector to be written; reordering the write data enable vector, the first storage address vector and the data vector to be written according to the first storage index vector; and storing the reordered data vector to be written in the parallel memory according to the reordered write data enable vector and the reordered first storage address vector. The invention supports parallel writing of data along one or more dimensions, which improves the flexibility and efficiency of data writing.

Description

Data parallel writing and reading method, device and system

Technical field

The invention belongs to the technical field of data access and, more specifically, relates to data parallel writing and reading methods, devices and a system.

Background

In recent years, artificial intelligence has been widely applied in many fields. Artificial intelligence algorithms usually process large amounts of data, so speeding up their execution requires optimizing not only the computing system but also the storage system.

The GPU is a widely used hardware platform for artificial intelligence algorithms, and GPU storage uses a multi-level cache structure designed for matrix computation to optimize the storage system. In embedded applications, constraints such as power consumption mean that artificial intelligence algorithms are usually implemented on customized programmable chips rather than on GPUs. One class of these embedded chips is the vector computer chip, which is also well suited to accelerating artificial intelligence algorithms and usually uses a vector memory as its storage system.

However, GPUs are large and consume a lot of power, which greatly limits their use in embedded applications. Artificial intelligence algorithms perform many matrix operations, so the data they process often takes the form of data blocks, either one-dimensional or multi-dimensional. Different algorithms need to read and write data in parallel and contiguously along one or more dimensions. A vector memory, however, can only access a vector of one fixed length at a time; its data access is not flexible enough to meet the needs of complex and varied artificial intelligence algorithms.

Summary of the invention

To overcome, or at least partially solve, the problems that existing data access systems are large, power-hungry and inflexible in data access, the present invention provides data parallel writing and reading methods, devices and a system.

According to a first aspect of the present invention, a data parallel writing method is provided, comprising:

converting the write data index of a data vector to be written into a one-dimensional write address; wherein the data vector to be written is a one-dimensional or multi-dimensional vector in a multidimensional data matrix to be written, and the write data index is the index, in the multidimensional data matrix to be written, of the first element to be written among all elements of the data vector to be written;

obtaining, according to the one-dimensional write address and a preset number of write data, a write data enable vector, a first storage index vector and a first storage address vector for the data vector to be written; wherein each element of the write data enable vector indicates whether the element at the corresponding position in the data vector to be written is written; the first storage index vector is the vector formed by the indexes of the storage subunits of the parallel memory that correspond to the elements of the data vector to be written; and the first storage address vector is the vector formed by the addresses, within those storage subunits, that correspond to the elements of the data vector to be written;

reordering the write data enable vector, the first storage address vector and the data vector to be written according to the first storage index vector, and storing the reordered data vector to be written in the parallel memory according to the reordered write data enable vector and the reordered first storage address vector.

According to a second aspect of the present invention, a data parallel reading method is provided, comprising:

converting the read data index of a data vector to be read into a one-dimensional read address; wherein the data vector to be read is a one-dimensional or multi-dimensional vector in a multidimensional data matrix to be read, and the read data index is the index, in the multidimensional data matrix to be read, of the first element to be read among all elements of the data vector to be read;

obtaining, according to the one-dimensional read address and a preset number of read data, a read data enable vector, a second storage index vector and a second storage address vector for the data vector to be read; wherein each element of the read data enable vector indicates whether the element at the corresponding position in the data vector to be read is read; the second storage index vector is the vector formed by the indexes of the storage subunits of the parallel memory that hold the elements of the data vector to be read; and the second storage address vector is the vector formed by the addresses of those elements within the storage subunits;

reordering the read data enable vector and the second storage address vector according to the second storage index vector, reading a stored data vector from the parallel memory according to the reordered read data enable vector and the reordered second storage address vector, and reordering the stored data vector according to the second storage index vector to obtain the data vector to be read.

According to a third aspect of the present invention, a data parallel writing device is provided, comprising:

a first conversion module, configured to convert the write data index of a data vector to be written into a one-dimensional write address; wherein the data vector to be written is a one-dimensional or multi-dimensional vector in a multidimensional data matrix to be written, and the write data index is the index, in the multidimensional data matrix to be written, of the first element to be written among all elements of the data vector to be written;

a first acquisition module, configured to obtain, according to the one-dimensional write address and a preset number of write data, a write data enable vector, a first storage index vector and a first storage address vector for the data vector to be written; wherein each element of the write data enable vector indicates whether the element at the corresponding position in the data vector to be written is written; the first storage index vector is the vector formed by the indexes of the storage subunits of the parallel memory that correspond to the elements of the data vector to be written; and the first storage address vector is the vector formed by the addresses, within those storage subunits, that correspond to the elements of the data vector to be written;

a storage module, configured to reorder the write data enable vector, the first storage address vector and the data vector to be written according to the first storage index vector, and to store the reordered data vector to be written in the parallel memory according to the reordered write data enable vector and the reordered first storage address vector.

According to a fourth aspect of the present invention, a data parallel reading device is provided, comprising:

a second conversion module, configured to convert the read data index of a data vector to be read into a one-dimensional read address; wherein the data vector to be read is a one-dimensional or multi-dimensional vector in a multidimensional data matrix to be read, and the read data index is the index, in the multidimensional data matrix to be read, of the first element to be read among all elements of the data vector to be read;

a second acquisition module, configured to obtain, according to the one-dimensional read address and a preset number of read data, a read data enable vector, a second storage index vector and a second storage address vector for the data vector to be read; wherein each element of the read data enable vector indicates whether the element at the corresponding position in the data vector to be read is read; the second storage index vector is the vector formed by the indexes of the storage subunits of the parallel memory that correspond to the elements of the data vector to be read; and the second storage address vector is the vector formed by the addresses, within those storage subunits, that correspond to the elements of the data vector to be read;

a reading module, configured to reorder the read data enable vector and the second storage address vector according to the second storage index vector, to read a stored data vector from the parallel memory according to the reordered read data enable vector and the reordered second storage address vector, and to reorder the stored data vector according to the second storage index vector to obtain the data vector to be read.

According to a fifth aspect of the present invention, a data parallel reading and writing system is provided, comprising:

a parallel memory, together with the data parallel writing device and the data parallel reading device described above.

The present invention provides data parallel writing and reading methods, devices and a system. The writing method converts the write data index of the data vector to be written into a one-dimensional write address; obtains, from the one-dimensional write address and the preset number of write data, the write data enable vector, the first storage index vector and the first storage address vector of the data vector to be written; reorders the write data enable vector, the first storage address vector and the data vector to be written according to the first storage index vector; and stores the reordered data vector to be written in the parallel memory according to the reordered write data enable vector and the reordered first storage address vector. It thereby supports parallel writing of data along one or more dimensions and improves the flexibility and efficiency of data writing.

Description of the drawings

FIG. 1 is a schematic diagram of the overall flow of the data parallel writing method provided by an embodiment of the present invention;

FIG. 2 is a schematic diagram of writing a four-dimensional data vector in parallel along two dimensions in the data parallel writing method provided by an embodiment of the present invention;

FIG. 3 is a schematic diagram of writing a four-dimensional data vector in parallel along one dimension in the data parallel writing method provided by an embodiment of the present invention;

FIG. 4 is a schematic diagram of converting the write data index into a one-dimensional write address in the data parallel writing method provided by an embodiment of the present invention;

FIG. 5 is a schematic diagram of the structure of the write reordering network in the data parallel writing method provided by an embodiment of the present invention;

FIG. 6 is a schematic diagram of the overall flow of the data parallel reading method provided by an embodiment of the present invention;

FIG. 7 is a schematic diagram of the overall structure of the data parallel writing device provided by an embodiment of the present invention;

FIG. 8 is a schematic diagram of the overall structure of the data parallel reading device provided by an embodiment of the present invention.

Detailed description

Specific implementations of the present invention are described in further detail below with reference to the drawings and embodiments. The following embodiments illustrate the present invention but are not intended to limit its scope.

One embodiment of the present invention provides a data parallel writing method. FIG. 1 is a schematic diagram of the overall flow of the method, which includes: S101, converting the write data index of a data vector to be written into a one-dimensional write address; wherein the data vector to be written is a one-dimensional or multi-dimensional vector in a multidimensional data matrix to be written, and the write data index is the index, in the multidimensional data matrix to be written, of the first element to be written among all elements of the data vector to be written;

Here, the data vector to be written is the data vector that needs to be written in parallel. The write data index is the index, in the multidimensional data matrix to be written, of the first element to be written in the data vector to be written, where an element to be written is an element that needs to be written. The write data index is converted under the control of the control signals w_ctrl0 and w_ctrl1 to generate a one-dimensional write address. For example, the data vector to be written is a four-dimensional vector [dim3, dim2, dim1, dim0], where dim3, dim2, dim1 and dim0 denote different dimensions. The sizes of the data vector to be written in these dimensions are DIM3, DIM2, DIM1 and DIM0 respectively. The maximum number of data written in parallel is S, and the data vector to be written, W_DATA, is a vector of length N. Contiguous writing along the two dimensions dim1 and dim0 is shown in FIG. 2. Parallel contiguous writing along two dimensions requires two parameters: K, the maximum number of parallel reads and writes in dimension dim1, and L, the maximum number of parallel reads and writes in dimension dim0, with K*L=S. Parallel contiguous writing along the single dimension dim0 is shown in FIG. 3; in this case L=S and K=0.

S102, obtaining, according to the one-dimensional write address and the preset number of write data, the write data enable vector, the first storage index vector and the first storage address vector of the data vector to be written; wherein each element of the write data enable vector indicates whether the element at the corresponding position in the data vector to be written is written; the first storage index vector is the vector formed by the indexes of the storage subunits of the parallel memory that correspond to the elements of the data vector to be written; and the first storage address vector is the vector formed by the addresses, within those storage subunits, that correspond to the elements of the data vector to be written;

Here, the preset number of write data W_M is the number of elements of the data vector to be written that need to be written. The write data enable vector, the first storage index vector and the first storage address vector of the data vector to be written are computed from the one-dimensional write address w_base and the preset number of write data W_M. Each of these three vectors has length N. Each element of the write data enable vector W_BE is 0 or 1 and indicates whether the element at the corresponding position in the data vector to be written W_DATA is written: 1 means written, 0 means not written. The first storage index vector W_BI is the vector formed by the indexes of the storage subunits of the parallel memory into which the elements of W_DATA are to be stored. The first storage address vector W_BA is the vector formed by the addresses, within those storage subunits, at which the elements of W_DATA are to be stored.

S103, reordering the write data enable vector, the first storage address vector and the data vector to be written according to the first storage index vector, and storing the reordered data vector to be written in the parallel memory according to the reordered write data enable vector and the reordered first storage address vector.

Specifically, the write data enable vector, the first storage address vector and the data vector to be written are fed into a write reordering network. According to the first storage index vector, the write data enable vector W_BE, the first storage address vector W_BA and the data vector to be written W_DATA are reordered to obtain the reordered write data enable vector W_BE_R, first storage address vector W_BA_R and data vector to be written W_DATA_R. Here W_BA_R gives, for each storage subunit, the address at which it stores its element of the data vector to be written; W_BE_R gives, for each storage subunit, whether that subunit is enabled; and W_DATA_R gives, for each storage subunit, the element of the data vector to be written that it stores.
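The final store step of S103 can be sketched in software as follows. This is a minimal illustration, not the patented hardware: the parallel memory is modeled as a list of Python lists (`banks`, a hypothetical name), one per storage subunit, and the function name `store` is likewise an assumption. In hardware all storage subunits would be accessed in the same cycle; the loop here only models that behavior sequentially.

```python
def store(banks, w_be_r, w_ba_r, w_data_r):
    """Write one reordered vector into the banked memory.

    Position b of each reordered vector belongs to storage subunit b:
    w_be_r[b] enables the write, w_ba_r[b] is the address inside the
    subunit, and w_data_r[b] is the value to store.
    """
    for bank, (enable, addr, value) in enumerate(zip(w_be_r, w_ba_r, w_data_r)):
        if enable:
            banks[bank][addr] = value


# Four subunits with four addresses each (sizes chosen only for illustration).
banks = [[None] * 4 for _ in range(4)]
store(banks, [1, 0, 1, 1], [2, 2, 1, 1], ["c", "d", "a", "b"])
print(banks)
```

Subunit 1 is disabled in this example, so its value "d" is not stored.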

This embodiment converts the write data index of the data vector to be written into a one-dimensional write address; obtains, from the one-dimensional write address and the preset number of write data, the write data enable vector, the first storage index vector and the first storage address vector of the data vector to be written; reorders the write data enable vector, the first storage address vector and the data vector to be written according to the first storage index vector; and stores the reordered data vector to be written in the parallel memory according to the reordered write data enable vector and the reordered first storage address vector. It thereby supports parallel writing of data along one or more dimensions and improves the flexibility and efficiency of data writing.

On the basis of the above embodiment, step S101 in this embodiment specifically includes: reordering the write data index, and splitting the index value of each preset dimension that is written in parallel in the reordered write data index into multiple index values;

For example, as shown in FIG. 4, under the control of w_ctrl0 the write data index [dim0, dim1, dim2, dim3] is reordered to obtain the reordered write data index [dimnp1, dimnp0, dimp1, dimp0]. Here dimp1 and dimp0 are the two dimensions written in parallel, and dimnp1 and dimnp0 are the two dimensions that are not written in parallel. The sizes of the data vector to be written in the four dimensions dimnp1, dimnp0, dimp1 and dimp0 are DIMNP1, DIMNP0, DIMP1 and DIMP0 respectively.

Let the maximum numbers of parallel writes in the two parallel write dimensions dimp1 and dimp0 be K and L respectively. K and L can be set according to the performance of the parallel memory. According to K, dimp1 is split into two dimensions, dimp1_p and dimp1_b, where dimp1_p=dimp1%K and dimp1_b=dimp1//K. The size of the data vector to be written in dimension dimp1_p is DIMP1_P=K, and its size in dimension dimp1_b is DIMP1_B=DIMP1//K. According to L, dimp0 is split into two dimensions, dimp0_p and dimp0_b, where dimp0_p=dimp0%L and dimp0_b=dimp0//L. The size of the data vector to be written in dimension dimp0_p is DIMP0_P=L, and its size in dimension dimp0_b is DIMP0_B=DIMP0//L. The data vector to be written is thus split from four dimensions into six, and the split write data index is [dimnp1, dimnp0, dimp1_b, dimp1_p, dimp0_b, dimp0_p]. The sizes of the data vector to be written in these dimensions are DIMNP1, DIMNP0, DIMP1_B, DIMP1_P, DIMP0_B and DIMP0_P respectively.
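The index split described above is ordinary integer division and remainder. The following minimal sketch illustrates it; the function name `split_parallel_index` and the example values are assumptions made for illustration, and the returned names follow the split index list [dimp1_b, dimp1_p, dimp0_b, dimp0_p].

```python
def split_parallel_index(dimp1, dimp0, K, L):
    """Split each parallel index into a block part (_b) and an in-block part (_p).

    K and L are the maximum numbers of parallel writes in the dimp1 and
    dimp0 dimensions; the in-block part is the remainder and the block
    part is the integer quotient.
    """
    dimp1_p, dimp1_b = dimp1 % K, dimp1 // K
    dimp0_p, dimp0_b = dimp0 % L, dimp0 // L
    return dimp1_b, dimp1_p, dimp0_b, dimp0_p


# Example values chosen only for illustration: K = 2, L = 4.
result = split_parallel_index(dimp1=5, dimp0=7, K=2, L=4)
print(result)  # (2, 1, 1, 3)
```

With K = 2, index 5 in dimp1 lies in block 2 at offset 1; with L = 4, index 7 in dimp0 lies in block 1 at offset 3.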

Under the control of w_ctrl1, the split write data index is reordered again, and the one-dimensional write address is computed from the re-reordered write data index.

For example, the split write data index [dimnp1, dimnp0, dimp1_b, dimp1_p, dimp0_b, dimp0_p] is reordered again to obtain the re-reordered write data index [dnp3, dnp2, dnp1, dnp0, dp1, dp0], where dp1=dimp1_p and dp0=dimp0_p, and dnp3, dnp2, dnp1 and dnp0 are obtained by reordering dimnp1, dimnp0, dimp1_b and dimp0_b. The sizes of the data vector to be written in the corresponding dimensions after this second reordering are DNP3, DNP2, DNP1, DNP0, DP1 and DP0, obtained by applying the same reordering to DIMNP1, DIMNP0, DIMP1_B, DIMP1_P, DIMP0_B and DIMP0_P. The one-dimensional write address w_base is computed from the re-reordered write data index by the formula:

w_base = dp0 + dp1*DP0 + dnp0*DP0*DP1 + dnp1*DP0*DP1*DNP0 + dnp2*DP0*DP1*DNP0*DNP1 + dnp3*DP0*DP1*DNP0*DNP1*DNP2.

This embodiment does not limit the number of dimensions of the data vector to be written, the dimensions that are split, or the number of dimensions they are split into, nor does it limit which dimensions are left out of the second reordering.
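The address computation can be read as a mixed-radix flattening. The sketch below assumes the usual layout in which the stride of each index is the product of the sizes of all faster-varying dimensions, so the stride of dnp3 is DP0*DP1*DNP0*DNP1*DNP2. The function name `one_dim_address` and the example sizes are hypothetical.

```python
def one_dim_address(index, sizes):
    """Flatten a multi-dimensional index into one address.

    Both sequences are ordered fastest-varying first, i.e.
    index = (dp0, dp1, dnp0, dnp1, dnp2, dnp3) and
    sizes = (DP0, DP1, DNP0, DNP1, DNP2, DNP3).
    """
    addr, stride = 0, 1
    for i, size in zip(index, sizes):
        addr += i * stride   # contribution of this dimension
        stride *= size       # stride of the next, slower dimension
    return addr


# Example sizes chosen only for illustration.
sizes = (4, 2, 3, 2, 5, 2)   # DP0, DP1, DNP0, DNP1, DNP2, DNP3
last = (3, 1, 2, 1, 4, 1)    # the largest valid index in every dimension
w_base = one_dim_address(last, sizes)
print(w_base)  # 479, i.e. 4*2*3*2*5*2 - 1
```

The largest index maps to the last address of the flattened block, which is a quick sanity check that the strides are consistent.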

在上述实施例的基础上,本实施例中根据一维写入地址和预设写入数据个数,获取待写入数据向量的写入数据使能向量、第一存储索引向量和第一存储地址向量的步骤具体包括:根据预设写入数据个数,确定写入数据使能向量中值为1的元素的个数,根据待写入数据向量的长度与预设写入数据个数之间的差值,确定写入数据使能向量中值为0的元素的个数;根据待写入数据向量中各元素在待写入数据向量中的索引、一维写入地址和待写入数据向量的长度,获取第一存储索引向量和第一存储地址向量。On the basis of the above-mentioned embodiments, in this embodiment, according to the one-dimensional write address and the number of preset write data, the write data enable vector, the first storage index vector and the first storage index vector of the data vector to be written are obtained. The step of addressing the vector specifically includes: according to the preset number of written data, determining the number of elements with a value of 1 in the write data enable vector, and according to the length of the data vector to be written and the preset number of written data, Determine the number of elements with a value of 0 in the write data enable vector; according to the index of each element in the data vector to be written in the data vector to be written, the one-dimensional write address and the The length of the data vector, get the first storage index vector and the first storage address vector.

Specifically, the write data enable vector W_BE follows from W_M: W_BE=[W_M{1},(N-W_M){0}], that is, W_M ones followed by N-W_M zeros. The first W_M elements of the data vector to be written, W_DATA, are written, and the other elements of W_DATA are not. The first storage index vector W_BI is calculated as W_BI=(w_base+[0,1,2,…,N-1])%N. The first storage address vector W_BA is calculated as W_BA=(w_base+[0,1,2,…,N-1])//N.
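The three formulas can be sketched in Python as follows. This is an illustration only; the function name `write_vectors` is hypothetical, and `%` and `//` are taken to be remainder and integer quotient, as in Python.

```python
def write_vectors(w_base, w_m, n):
    """Return (W_BE, W_BI, W_BA) for an n-subunit parallel memory.

    w_base: one-dimensional write address of the first element
    w_m:    preset number of write data (0 <= w_m <= n)
    """
    lanes = range(n)
    w_be = [1 if i < w_m else 0 for i in lanes]   # W_M ones, then zeros
    w_bi = [(w_base + i) % n for i in lanes]      # storage subunit per element
    w_ba = [(w_base + i) // n for i in lanes]     # address inside that subunit
    return w_be, w_bi, w_ba


w_be, w_bi, w_ba = write_vectors(w_base=2, w_m=4, n=4)
```

Because consecutive elements map to consecutive subunits modulo N, any N consecutive addresses fall in N distinct subunits, which is what allows them to be accessed in the same cycle.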

On the basis of the above embodiments, the step of reordering the write data enable vector, the first storage address vector and the data vector to be written according to the first storage index vector specifically includes: obtaining, according to the first storage index vector, a write data index vector composed of the indexes, within the data vector to be written, of the elements assigned to each storage subunit; and reordering the write data enable vector, the first storage address vector and the data vector to be written according to the write data index vector.

Specifically, the write data index vector W_BI_R is the counterpart of W_BI: it is the vector of indexes, within the data vector to be written W_DATA, of the elements that each storage subunit of the parallel memory is to receive. It can be calculated from W_BI by the formula W_BI_R=(N-W_BI[0]+[0,1,2,…,N-1])%N. The write data enable vector W_BE, the first storage address vector W_BA and the data vector to be written W_DATA are reordered according to W_BI_R. In the write reordering network, W_BE, W_BA and W_DATA are each reordered independently in the same way according to W_BI_R. The structure of the write reordering network is shown in FIG. 5.
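A minimal Python sketch of this reordering follows. It assumes the reordering network behaves as a gather, so that output position j (storage subunit j) takes input element W_BI_R[j]; that reading and the name `reorder_for_subunits` are assumptions made for illustration.

```python
def reorder_for_subunits(w_bi, w_be, w_ba, w_data):
    """Reorder W_BE, W_BA and W_DATA into storage-subunit order."""
    n = len(w_bi)
    # W_BI_R[j]: index in W_DATA of the element destined for subunit j
    w_bi_r = [(n - w_bi[0] + j) % n for j in range(n)]

    def gather(vec):
        return [vec[k] for k in w_bi_r]

    return w_bi_r, gather(w_be), gather(w_ba), gather(w_data)


w_bi_r, w_be_r, w_ba_r, w_data_r = reorder_for_subunits(
    w_bi=[2, 3, 0, 1], w_be=[1, 1, 1, 1], w_ba=[0, 0, 1, 1], w_data=[6, 7, 8, 9]
)
```

Since W_BI is a cyclic shift of [0,1,…,N-1], W_BI_R is the opposite cyclic shift, so the network only has to implement rotations rather than arbitrary permutations.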

For example, a data vector to be written, indexed by [dim3, dim2, dim1, dim0], needs to be written into a parallel memory composed of 4 storage subunits, that is, N=4. The sizes in these dimensions are DIM3=12, DIM2=10, DIM1=8 and DIM0=6. The index values of the write data index in the four dimensions are interleaved under the control of w_ctrl0, with outputs related to inputs by: dimnp1=dim1, dimnp0=dim0, dimp1=dim3, dimp0=dim2. The corresponding sizes are DIMNP1=DIM1=8, DIMNP0=DIM0=6, DIMP1=DIM3=12, DIMP0=DIM2=10. The maximum numbers of parallel reads and writes in dimensions dimp1 and dimp0 are K=2 and L=2 respectively. According to K and L, dimp1 and dimp0 are each split into two dimensions: dimp1_b=dimp1//K, dimp1_p=dimp1%K, dimp0_b=dimp0//L, dimp0_p=dimp0%L. The sizes in the dimensions obtained by the split are DIMP1_B=DIMP1//K=6, DIMP1_P=K=2, DIMP0_B=DIMP0//L=5, DIMP0_P=L=2. Of the six dimensions obtained by the split (dimnp1, dimnp0, dimp1_b, dimp1_p, dimp0_b and dimp0_p), dimp1_p and dimp0_p are used directly as dp1 and dp0 of the final six dimensions; dimnp1, dimnp0, dimp1_b and dimp0_b are interleaved under the control of w_ctrl1, with outputs related to inputs by: dnp3=dimp1_b, dnp2=dimp0_b, dnp1=dimnp1, dnp0=dimnp0. The final sizes are DNP3=DIMP1_B=6, DNP2=DIMP0_B=5, DNP1=DIMNP1=8, DNP0=DIMNP0=6, DP1=DIMP1_P=2, DP0=DIMP0_P=2.

When the write data index is w_index=[0,2,0,0], the preset number of write data is W_M=4 and the data vector to be written is W_DATA=[6,7,8,9], the parallel write proceeds as follows:

w_base is calculated from the write data index w_index:

w_base=dp0+dp1*DP0+dnp0*DP0*DP1+dnp1*DP0*DP1*DNP0+dnp2*DP0*DP1*DNP0*DNP1+dnp3*DP0*DP1*DNP0*DNP1*DNP2=2.

W_BE, W_BA and W_BI_R are calculated from w_base and W_M:

W_BE=[W_M{1},(N-W_M){0}]=[1,1,1,1];

W_BI=(w_base+[0,1,2,…,N-1])%N=[2,3,0,1];

W_BI_R=(N-W_BI[0]+[0,1,2,…,N-1])%N=[2,3,0,1];

W_BA=(w_base+[0,1,2,…,N-1])//N=[0,0,1,1];

W_BE, W_BA and W_DATA are input into the write reordering network, and under the control of W_BI_R the outputs are:

W_BE_R=[1,1,1,1];

W_BA_R=[1,1,0,0];

W_DATA_R=[8,9,6,7].

The data vector to be written is written in parallel according to W_BE_R, W_BA_R and W_DATA_R. Since every element of W_BE_R is 1, all storage subunits are enabled; W_BA_R gives the address within each storage subunit, and W_DATA_R gives the data to be stored in each storage subunit.
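The write flow from w_base onward can be sketched end to end in Python. The memory model (a list of N lists, one per storage subunit) and the name `parallel_write` are assumptions for illustration; the sketch starts from the stated w_base=2 and does not model the index transformation that precedes it.

```python
def parallel_write(subunits, w_base, w_data, w_m):
    """Write up to len(subunits) elements, one per storage subunit."""
    n = len(subunits)
    w_be = [1 if i < w_m else 0 for i in range(n)]
    w_bi = [(w_base + i) % n for i in range(n)]
    w_ba = [(w_base + i) // n for i in range(n)]
    w_bi_r = [(n - w_bi[0] + j) % n for j in range(n)]
    # Subunit j receives element w_bi_r[j]; all subunits act in the same step.
    for unit, src in enumerate(w_bi_r):
        if w_be[src]:
            subunits[unit][w_ba[src]] = w_data[src]


# Four storage subunits, each with four addresses (depth chosen arbitrarily).
memory = [[None] * 4 for _ in range(4)]
parallel_write(memory, w_base=2, w_data=[6, 7, 8, 9], w_m=4)
```

After the call, subunits 2 and 3 hold 6 and 7 at address 0, and subunits 0 and 1 hold 8 and 9 at address 1, matching W_BA_R and W_DATA_R above.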

In another embodiment of the present invention, a data parallel reading method is provided. FIG. 6 is a schematic overall flowchart of the data parallel reading method provided by the embodiment of the present invention. The method includes: S601, transforming the read data index of the data vector to be read into a one-dimensional read address; wherein the data vector to be read is a one-dimensional or multi-dimensional vector in the multi-dimensional data matrix to be read, and the read data index is the index, in the multi-dimensional data matrix to be read, of the first element to be read among all elements of the data vector to be read;

The data vector to be read is the data vector that needs to be read in parallel. The read data index is the index, in the multi-dimensional data matrix to be read, of the first element to be read in the data vector to be read, where an element to be read is an element that needs to be read. The read data index is transformed under the control of the control signals r_ctrl0 and r_ctrl1 to generate a one-dimensional read address. The one-dimensional read address is generated in the same way as the one-dimensional write address.

S602, obtaining the read data enable vector, the second storage index vector and the second storage address vector of the data vector to be read according to the one-dimensional read address and the preset number of read data; wherein each element of the read data enable vector indicates whether the element at the corresponding position of the data vector to be read is read; the second storage index vector is the vector of the indexes of the storage subunits of the parallel memory that hold each element of the data vector to be read; and the second storage address vector is the vector of the addresses of each element of the data vector to be read within its storage subunit;

The preset number of read data is the number of elements of the data vector to be read that need to be read. The read data enable vector, the second storage index vector and the second storage address vector of the data vector to be read are calculated from the one-dimensional read address r_base and the preset number of read data R_M. Each of these three vectors has length N. Each element of the read data enable vector R_BE is 0 or 1 and indicates whether the element at the corresponding position of the data vector to be read R_DATA is read, where 1 means read and 0 means not read. The second storage index vector R_BI is the vector of the indexes of the storage subunits of the parallel memory that hold each element of R_DATA. The second storage address vector R_BA is the vector of the addresses of each element of R_DATA within its storage subunit.

S603, reordering the read data enable vector and the second storage address vector according to the second storage index vector, reading a stored data vector from the parallel memory according to the reordered read data enable vector and second storage address vector, and reordering the stored data vector according to the second storage index vector to obtain the data vector to be read.

Specifically, the read data enable vector, the second storage address vector and the data vector to be read are input into the read reordering network. The read data enable vector R_BE and the second storage address vector R_BA are reordered according to the second storage index vector R_BI to obtain the reordered read data enable vector R_BE_R and second storage address vector R_BA_R. R_BA_R gives, for each storage subunit, the address at which the element of the data vector to be read was previously stored, and R_BE_R indicates, for each storage subunit, whether that element is read. The stored data vector R_DATA_R is read from the parallel memory according to R_BE_R and R_BA_R. The elements of R_DATA_R are arranged in storage-subunit order; they have the same values as the elements of the data vector to be read R_DATA but in a different order. R_DATA_R is reordered according to the second storage index vector R_BI to obtain R_DATA.

In this embodiment, the read data index of the data vector to be read is transformed into a one-dimensional read address; the read data enable vector, the second storage index vector and the second storage address vector of the data vector to be read are obtained according to the one-dimensional read address and the preset number of read data; the read data enable vector and the second storage address vector are reordered according to the second storage index vector; the stored data vector is read from the parallel memory according to the reordered read data enable vector and second storage address vector; and the stored data vector is reordered according to the second storage index vector to obtain the data vector to be read. Data can thus be read in parallel along one or more dimensions, which improves the flexibility and efficiency of data reading.

On the basis of the above embodiment, step S601 of this embodiment, transforming the read data index of the data vector to be read into a one-dimensional read address, specifically includes: reordering the read data index, and splitting the index value of each preset dimension that is read in parallel in the reordered read data index into multiple index values; reordering the split read data index again, and calculating the one-dimensional read address from the read data index after this second reordering.

For example, the read data index [dim0, dim1, dim2, dim3] is reordered under the control of r_ctrl0 to obtain the reordered read data index [dimnp1, dimnp0, dimp1, dimp0], where dimp1 and dimp0 are the two dimensions read in parallel and dimnp1 and dimnp0 are the two dimensions that are not read in parallel. The sizes of the data vector to be read in the four dimensions dimnp1, dimnp0, dimp1 and dimp0 are DIMNP1, DIMNP0, DIMP1 and DIMP0 respectively.

The maximum numbers of parallel reads in the two parallel read dimensions dimp1 and dimp0 are defined as K and L respectively. K and L can be set according to the performance of the parallel memory. According to K, dimp1 is split into two dimensions, dimp1_p and dimp1_b, where dimp1_p=dimp1%K and dimp1_b=dimp1//K. The size of the data vector to be read is DIMP1_P=K in dimension dimp1_p and DIMP1_B=DIMP1//K in dimension dimp1_b. According to L, dimp0 is split into two dimensions, dimp0_p and dimp0_b, where dimp0_p=dimp0%L and dimp0_b=dimp0//L. The size of the data vector to be read is DIMP0_P=L in dimension dimp0_p and DIMP0_B=DIMP0//L in dimension dimp0_b. The data vector to be read is thereby split from four dimensions into six, and the split read data index is [dimnp1, dimnp0, dimp1_b, dimp1_p, dimp0_b, dimp0_p]. The sizes of the data vector to be read in these dimensions are DIMNP1, DIMNP0, DIMP1_B, DIMP1_P, DIMP0_B and DIMP0_P respectively.
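Splitting one parallel dimension amounts to a quotient and remainder on the index and a matching split of the size. A small Python sketch follows; the name `split_dimension` is hypothetical, and the size is assumed to be a multiple of the maximum parallel count so that the block count is exact.

```python
def split_dimension(index, size, max_parallel):
    """Split one parallel dimension into a block part and an in-block part.

    Returns ((index_b, index_p), (size_b, size_p)) where
    index_b = index // max_parallel and index_p = index % max_parallel.
    """
    index_b, index_p = divmod(index, max_parallel)
    size_b, size_p = size // max_parallel, max_parallel
    return (index_b, index_p), (size_b, size_p)


# Illustrative values: a dimension of size 12 with K = 2, index 5.
(idx_b, idx_p), (size_b, size_p) = split_dimension(index=5, size=12, max_parallel=2)
```

The in-block part (the `_p` index) later becomes one of the lowest-order dimensions of the address, so elements that differ only in that part land at consecutive addresses and therefore in different storage subunits.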

The split read data index is reordered again, and the one-dimensional read address is calculated from the read data index after this second reordering.

For example, the split read data index [dimnp1, dimnp0, dimp1_b, dimp1_p, dimp0_b, dimp0_p] is reordered a second time under the control of r_ctrl1 to obtain the read data index [dnp3, dnp2, dnp1, dnp0, dp1, dp0], where dp1=dimp1_p and dp0=dimp0_p, and dnp3, dnp2, dnp1 and dnp0 are obtained by reordering dimnp1, dimnp0, dimp1_b and dimp0_b. After this second reordering, the sizes of the data vector to be read in the corresponding dimensions are DNP3, DNP2, DNP1, DNP0, DP1 and DP0, obtained by applying the same reordering to DIMNP1, DIMNP0, DIMP1_B, DIMP1_P, DIMP0_B and DIMP0_P. The one-dimensional read address r_base is calculated from the reordered read data index by the formula:

r_base=dp0+dp1*DP0+dnp0*DP0*DP1+dnp1*DP0*DP1*DNP0+dnp2*DP0*DP1*DNP0*DNP1+dnp3*DP0*DP1*DNP0*DNP1*DNP2. This embodiment does not limit the number of dimensions of the data vector to be read, the dimensions that are split, or the number of dimensions they are split into, nor does it limit which dimensions are left out of the second reordering.

On the basis of the above embodiment, the step of obtaining the read data enable vector, the second storage index vector and the second storage address vector of the data vector to be read according to the one-dimensional read address and the preset number of read data specifically includes:

determining, according to the preset number of read data, the number of elements with value 1 in the read data enable vector; determining, according to the difference between the length of the data vector to be read and the preset number of read data, the number of elements with value 0 in the read data enable vector; and obtaining the second storage index vector and the second storage address vector according to the index of each element within the data vector to be read, the one-dimensional read address and the length of the data vector to be read.

Specifically, the read data enable vector R_BE follows from R_M: R_BE=[R_M{1},(N-R_M){0}]. That is, the first R_M elements of the data vector to be read, R_DATA, are read, and the other elements of R_DATA are not. The second storage index vector R_BI is calculated as R_BI=(r_base+[0,1,2,…,N-1])%N. The second storage address vector R_BA is calculated as R_BA=(r_base+[0,1,2,…,N-1])//N.

On the basis of the above embodiments, the step of reordering the read data enable vector and the second storage address vector according to the second storage index vector specifically includes: obtaining, according to the second storage index vector, a read data index vector composed of the indexes, within the data vector to be read, of the elements held by each storage subunit; and reordering the read data enable vector and the second storage address vector according to the read data index vector.

Specifically, the read data index vector R_BI_R is the counterpart of R_BI: it is the vector of indexes, within the data vector to be read R_DATA, of the elements that each storage subunit of the parallel memory is to supply. It can be calculated from R_BI by the formula R_BI_R=(N-R_BI[0]+[0,1,2,…,N-1])%N. The read data enable vector R_BE and the second storage address vector R_BA are reordered according to R_BI_R. In the read reordering network, R_BE and R_BA are each reordered independently in the same way according to R_BI_R; the read reordering network has the same structure as the write reordering network. The stored data vector R_DATA_R is read from the parallel memory according to R_BE_R and R_BA_R. The elements of R_DATA_R are arranged in storage-subunit order; they have the same values as the elements of R_DATA but in a different order. R_DATA_R is reordered according to R_BI to obtain R_DATA.
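The two reorderings on the read side undo each other: R_BI_R moves per-element values into storage-subunit order, and R_BI moves per-subunit data back into element order. The Python sketch below checks this for every start address of a four-subunit memory; both reorderings are modeled as gathers, and the helper names are illustrative assumptions.

```python
def read_permutations(r_base, n):
    """Return (R_BI, R_BI_R) for start address r_base and n storage subunits."""
    r_bi = [(r_base + i) % n for i in range(n)]
    r_bi_r = [(n - r_bi[0] + j) % n for j in range(n)]
    return r_bi, r_bi_r


def gather(vec, order):
    """Output position p takes vec[order[p]]."""
    return [vec[k] for k in order]


elements = ["e0", "e1", "e2", "e3"]
round_trips = []
for r_base in range(8):
    r_bi, r_bi_r = read_permutations(r_base, 4)
    in_subunit_order = gather(elements, r_bi_r)   # element order -> subunit order
    restored = gather(in_subunit_order, r_bi)     # subunit order -> element order
    round_trips.append(restored == elements)
```

Every entry of `round_trips` is true, which is why the data returned by the storage subunits can be restored with R_BI alone.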

For example, the data vector W_DATA=[6,7,8,9] written into the parallel memory in the above embodiment is read back. That is, with the data vector to be read R_DATA=[6,7,8,9], the read data index r_index=[0,2,0,0] and the preset number of read data R_M=4, the parallel read proceeds as follows:

r_base is calculated from the read data index r_index:

r_base=dp0+dp1*DP0+dnp0*DP0*DP1+dnp1*DP0*DP1*DNP0+dnp2*DP0*DP1*DNP0*DNP1+dnp3*DP0*DP1*DNP0*DNP1*DNP2=2.

R_BE, R_BA and R_BI_R are calculated from r_base and R_M:

R_BE=[R_M{1},(N-R_M){0}]=[1,1,1,1];

R_BI=(r_base+[0,1,2,…,N-1])%N=[2,3,0,1];

R_BI_R=(N-R_BI[0]+[0,1,2,…,N-1])%N=[2,3,0,1];

R_BA=(r_base+[0,1,2,…,N-1])//N=[0,0,1,1];

R_BE and R_BA are input into the read reordering network, and under the control of R_BI_R the outputs are:

R_BE_R=[1,1,1,1];

R_BA_R=[1,1,0,0];

The stored data vector R_DATA_R is read from the parallel memory according to R_BE_R and R_BA_R. Since every element of R_BE_R is 1, all storage subunits are enabled; R_BA_R gives the address within each storage subunit, and the data read is R_DATA_R=[8,9,6,7]. R_DATA_R has the same element values as R_DATA but in a different order, so it must be reordered. R_DATA_R is reordered under the control of R_BI to obtain R_DATA=[6,7,8,9].
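The read flow from r_base onward can be sketched in Python against the same kind of memory model as a list of N lists, one per storage subunit. The model, the name `parallel_read` and the use of None for lanes that are not read are assumptions for illustration.

```python
def parallel_read(subunits, r_base, r_m):
    """Read up to len(subunits) elements, one per storage subunit."""
    n = len(subunits)
    r_be = [1 if i < r_m else 0 for i in range(n)]
    r_bi = [(r_base + i) % n for i in range(n)]
    r_ba = [(r_base + i) // n for i in range(n)]
    r_bi_r = [(n - r_bi[0] + j) % n for j in range(n)]
    # Each subunit j serves element r_bi_r[j] at address r_ba[r_bi_r[j]].
    r_data_r = [
        subunits[unit][r_ba[src]] if r_be[src] else None
        for unit, src in enumerate(r_bi_r)
    ]
    # Restore element order: element i was held by subunit r_bi[i].
    return [r_data_r[unit] for unit in r_bi]


# Memory contents after the write example: subunit -> list indexed by address.
memory = [[None, 8], [None, 9], [6, None], [7, None]]
r_data = parallel_read(memory, r_base=2, r_m=4)
```

With R_M smaller than N, the trailing lanes are simply left unread; for instance R_M=2 returns only the first two elements.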

In another embodiment of the present invention, a data parallel writing device is provided; see FIG. 7. The device is used to implement the above embodiments of the data parallel writing method. The descriptions and definitions given for the data parallel writing method in the foregoing embodiments therefore also apply to the execution modules of this embodiment.

The data parallel writing device includes: a first transformation module 701, configured to transform the write data index of the data vector to be written into a one-dimensional write address, wherein the data vector to be written is a one-dimensional or multi-dimensional vector in the multi-dimensional data matrix to be written, and the write data index is the index, in the multi-dimensional data matrix to be written, of the first element to be written among all elements of the data vector to be written; a first obtaining module 702, configured to obtain the write data enable vector, the first storage index vector and the first storage address vector of the data vector to be written according to the one-dimensional write address and the preset number of write data, wherein each element of the write data enable vector indicates whether the element at the corresponding position of the data vector to be written is written, the first storage index vector is the vector of the indexes of the storage subunits of the parallel memory corresponding to each element of the data vector to be written, and the first storage address vector is the vector of the addresses within the storage subunits corresponding to each element of the data vector to be written; and a storing module 703, configured to reorder the write data enable vector, the first storage address vector and the data vector to be written according to the first storage index vector, and to store the reordered data vector to be written into the parallel memory according to the reordered write data enable vector and first storage address vector.

On the basis of the above embodiment, the first transformation module of this embodiment is specifically configured to: reorder the write data index, and split the index value of each preset dimension that is written in parallel in the reordered write data index into multiple index values; reorder the split write data index again, and calculate the one-dimensional write address from the write data index after this second reordering.

On the basis of the above embodiment, the first obtaining module of this embodiment is specifically configured to: determine, according to the preset number of write data, the number of elements with value 1 in the write data enable vector; determine, according to the difference between the length of the data vector to be written and the preset number of write data, the number of elements with value 0 in the write data enable vector; and obtain the first storage index vector and the first storage address vector according to the index of each element within the data vector to be written, the one-dimensional write address and the length of the data vector to be written.

On the basis of the above embodiments, the storing module of this embodiment is specifically configured to: obtain, according to the first storage index vector, a write data index vector composed of the indexes, within the data vector to be written, of the elements assigned to each storage subunit; and reorder the write data enable vector, the first storage address vector and the data vector to be written according to the write data index vector.

The write data index of the data vector to be written is transformed into a one-dimensional write address; the write data enable vector, the first storage index vector and the first storage address vector of the data vector to be written are obtained according to the one-dimensional write address and the preset number of write data; the write data enable vector, the first storage address vector and the data vector to be written are reordered according to the first storage index vector; and the reordered data vector to be written is stored into the parallel memory according to the reordered write data enable vector and first storage address vector. Data can thus be written in parallel along one or more dimensions, which improves the flexibility and efficiency of data writing.

In another embodiment of the present invention, a data parallel reading device is provided; see FIG. 8. The device is used to implement the above embodiments of the data parallel reading method. The descriptions and definitions given for the data parallel reading method in the foregoing embodiments therefore also apply to the execution modules of this embodiment.

The data parallel reading device includes: a second transformation module 801, configured to transform the read data index of the data vector to be read into a one-dimensional read address, wherein the data vector to be read is a one-dimensional or multi-dimensional vector in the multi-dimensional data matrix to be read, and the read data index is the index, in the multi-dimensional data matrix to be read, of the first element to be read among all elements of the data vector to be read; a second obtaining module 802, configured to obtain the read data enable vector, the second storage index vector and the second storage address vector of the data vector to be read according to the one-dimensional read address and the preset number of read data, wherein each element of the read data enable vector indicates whether the element at the corresponding position of the data vector to be read is read, the second storage index vector is the vector of the indexes of the storage subunits of the parallel memory corresponding to each element of the data vector to be read, and the second storage address vector is the vector of the addresses within the storage subunits corresponding to each element of the data vector to be read; and a reading module 803, configured to reorder the read data enable vector and the second storage address vector according to the second storage index vector, to read a stored data vector from the parallel memory according to the reordered read data enable vector and second storage address vector, and to reorder the stored data vector according to the second storage index vector to obtain the data vector to be read.

On the basis of the above embodiment, the second transformation module in this embodiment is specifically configured to: reorder the read data index; split the index value of each preset dimension that is read in parallel, within the reordered read data index, into multiple index values; reorder the split read data index again; and compute the one-dimensional read address from the twice-reordered read data index.
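The reorder, split, reorder, compute sequence above can be sketched as follows. This is only an illustrative sketch: the function name `linear_address`, its parameters, the quotient/remainder split rule, and the mixed-radix (last dimension fastest) address computation are assumptions made for the example and are not taken from this embodiment.

```python
def linear_address(index, shape, first_order, split_factors, second_order):
    """Illustrative index-to-address transform (all names are hypothetical).

    index, shape   : multidimensional index and matrix shape, same length
    first_order    : permutation applied to the dimensions first
    split_factors  : {position after first reorder: factor P}; the index i of a
                     dimension of size S becomes (i // P, i % P) with sizes (S // P, P)
    second_order   : permutation applied to the expanded dimensions
    """
    idx = [index[d] for d in first_order]
    shp = [shape[d] for d in first_order]

    # Split each selected dimension into a high part and a low part (assumed rule).
    split_idx, split_shp = [], []
    for pos, (i, s) in enumerate(zip(idx, shp)):
        if pos in split_factors:
            p = split_factors[pos]
            if s % p != 0:
                raise ValueError("dimension size must be a multiple of the split factor")
            split_idx += [i // p, i % p]
            split_shp += [s // p, p]
        else:
            split_idx.append(i)
            split_shp.append(s)

    # Second reordering of the expanded index.
    idx2 = [split_idx[d] for d in second_order]
    shp2 = [split_shp[d] for d in second_order]

    # Mixed-radix accumulation, last dimension varying fastest (assumed rule).
    address = 0
    for i, s in zip(idx2, shp2):
        address = address * s + i
    return address


# A 4 x 8 matrix, element (2, 5), both dimensions split, low parts moved innermost.
example = linear_address((2, 5), (4, 8), (0, 1), {0: 2, 1: 4}, (0, 2, 1, 3))
```

With no split and identity orderings the function reduces to ordinary row-major addressing, which is a convenient sanity check for any concrete choice of orderings.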

On the basis of the above embodiment, the second acquisition module in this embodiment is specifically configured to: determine the number of elements with value 1 in the read data enable vector according to the preset number of read data, and determine the number of elements with value 0 in the read data enable vector according to the difference between the length of the data vector to be read and the preset number of read data; and obtain the second storage index vector and the second storage address vector according to the index of each element within the data vector to be read, the one-dimensional read address, and the length of the data vector to be read.
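As a hedged illustration of this step, the sketch below builds the three vectors for one access. The counts of 1s and 0s in the enable vector follow the text above; placing the 1s first, and the modulo/division mapping from address to storage subunit, are assumptions chosen for the example (a common low-order interleaving), as are the names `access_vectors`, `start_address`, and `num_subunits`.

```python
def access_vectors(start_address, length, count, num_subunits=None):
    """Build enable, storage index, and storage address vectors (illustrative).

    start_address : one-dimensional address of the first element
    length        : length of the data vector
    count         : preset number of data items actually accessed
    num_subunits  : number of storage subunits; assumed equal to length if omitted
    """
    if not 0 <= count <= length:
        raise ValueError("count must lie between 0 and length")
    if num_subunits is None:
        num_subunits = length

    # 'count' ones and 'length - count' zeros; putting the ones first is an assumption.
    enable = [1] * count + [0] * (length - count)

    # Assumed interleaving: consecutive addresses fall in consecutive subunits.
    storage_index = [(start_address + i) % num_subunits for i in range(length)]
    storage_address = [(start_address + i) // num_subunits for i in range(length)]
    return enable, storage_index, storage_address


enable, storage_index, storage_address = access_vectors(6, 4, 3)
```

Under this assumed mapping, when the number of subunits equals the vector length, the storage index vector is always a permutation of the subunit indexes, so no two elements compete for the same subunit.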

On the basis of the above embodiments, the reading module in this embodiment is specifically configured to: obtain, according to the second storage index vector, a read data index vector formed by the indexes, within the data vector to be read, of the elements corresponding to the index of each storage subunit; and reorder the read data enable vector and the second storage address vector according to the read data index vector.
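Read this way, the read data index vector is the inverse permutation of the second storage index vector: for each storage subunit it records which element of the data vector that subunit serves. A minimal sketch follows, assuming the storage index vector is a permutation of the subunit indexes; the helper names `read_index_vector` and `reorder_by` are hypothetical.

```python
def read_index_vector(storage_index):
    """Invert the element -> subunit mapping into a subunit -> element mapping."""
    inverse = [0] * len(storage_index)
    for element, subunit in enumerate(storage_index):
        inverse[subunit] = element
    return inverse


def reorder_by(vector, order):
    """Return a new vector whose k-th entry is vector[order[k]]."""
    return [vector[k] for k in order]


storage_index = [1, 2, 3, 0]      # element i lives in subunit storage_index[i]
enable = [1, 1, 1, 0]             # per-element enable bits
storage_address = [1, 1, 1, 2]    # per-element address inside its subunit

order = read_index_vector(storage_index)
enable_per_subunit = reorder_by(enable, order)
address_per_subunit = reorder_by(storage_address, order)
```

After the reordering, position k of each vector describes what subunit k must do, which is the form a bank of independently addressed memories needs.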

In this embodiment, the read data index of the data vector to be read is transformed into a one-dimensional read address; the read data enable vector, the second storage index vector, and the second storage address vector of the data vector to be read are obtained according to the one-dimensional read address and the preset number of read data; the read data enable vector and the second storage address vector are reordered according to the second storage index vector; a stored data vector is read from the parallel memory according to the reordered read data enable vector and second storage address vector; and the stored data vector is reordered according to the second storage index vector to obtain the data vector to be read. Data can therefore be read in parallel along one or more dimensions, which improves the flexibility and efficiency of data reading.

Another embodiment of the present invention provides a data parallel read/write system, which comprises a parallel memory, the data parallel writing device of any one of the above writing device embodiments, and the data parallel reading device of any one of the above reading device embodiments.
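To make the interplay of the three components concrete, the following toy model writes a vector into a list of storage subunits and reads it back. It is a sketch under stated assumptions, not the claimed implementation: the parallel memory is modelled as a list of Python lists, the number of subunits equals the vector length, the address-to-subunit mapping is simple modulo interleaving, and the names `plan`, `parallel_write`, and `parallel_read` are hypothetical.

```python
def plan(start_address, length, count):
    """Enable bits, element -> subunit map, element -> row map, subunit -> element map."""
    enable = [1] * count + [0] * (length - count)
    subunit_of = [(start_address + i) % length for i in range(length)]
    row_of = [(start_address + i) // length for i in range(length)]
    element_of = [0] * length
    for element, subunit in enumerate(subunit_of):
        element_of[subunit] = element
    return enable, subunit_of, row_of, element_of


def parallel_write(subunits, start_address, data, count):
    """Each subunit stores the element routed to it, if that element is enabled."""
    enable, _, row_of, element_of = plan(start_address, len(data), count)
    for subunit, element in enumerate(element_of):
        if enable[element]:
            subunits[subunit][row_of[element]] = data[element]


def parallel_read(subunits, start_address, length, count):
    """Each subunit returns one word; the words are then put back in element order."""
    enable, subunit_of, row_of, element_of = plan(start_address, length, count)
    stored = [
        subunits[s][row_of[e]] if enable[e] else None
        for s, e in enumerate(element_of)
    ]
    return [stored[s] for s in subunit_of]


memory = [[None] * 4 for _ in range(4)]           # 4 subunits, 4 rows each
parallel_write(memory, 5, ["a", "b", "c", "d"], 3)  # only the first 3 elements are written
result = parallel_read(memory, 5, 4, 4)
```

In this run the fourth element is masked on write, so reading all four positions back returns the three stored values followed by the untouched initial content of the fourth location.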

Finally, the methods of the present application are merely preferred embodiments and are not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (11)

1. A data parallel writing method, characterized by comprising:
transforming the write data index of a data vector to be written into a one-dimensional write address; wherein the data vector to be written is a one-dimensional or multidimensional vector in a multidimensional data matrix to be written, and the write data index is the index, in the multidimensional data matrix to be written, of the first element to be written among all elements of the data vector to be written;
obtaining, according to the one-dimensional write address and a preset number of write data, a write data enable vector, a first storage index vector, and a first storage address vector of the data vector to be written; wherein each element of the write data enable vector indicates whether the element at the corresponding position in the data vector to be written is written; the first storage index vector is the vector formed by the indexes of the storage subunits of a parallel memory that correspond to the elements of the data vector to be written; and the first storage address vector is the vector formed by the addresses, within the storage subunits, that correspond to the elements of the data vector to be written;
reordering the write data enable vector, the first storage address vector, and the data vector to be written according to the first storage index vector, and storing the reordered data vector to be written into the parallel memory according to the reordered write data enable vector and the reordered first storage address vector.
2. The method according to claim 1, characterized in that the step of transforming the write data index of the data vector to be written into a one-dimensional write address specifically comprises:
reordering the write data index, and splitting the index value of each preset dimension that is written in parallel, within the reordered write data index, into multiple index values;
reordering the split write data index again, and computing the one-dimensional write address from the twice-reordered write data index.
3. The method according to claim 1, characterized in that the step of obtaining, according to the one-dimensional write address and the preset number of write data, the write data enable vector, the first storage index vector, and the first storage address vector of the data vector to be written specifically comprises:
determining the number of elements with value 1 in the write data enable vector according to the preset number of write data, and determining the number of elements with value 0 in the write data enable vector according to the difference between the length of the data vector to be written and the preset number of write data;
obtaining the first storage index vector and the first storage address vector according to the index of each element within the data vector to be written, the one-dimensional write address, and the length of the data vector to be written.
4. The method according to any one of claims 1 to 3, characterized in that the step of reordering the write data enable vector, the first storage address vector, and the data vector to be written according to the first storage index vector specifically comprises:
obtaining, according to the first storage index vector, a write data index vector formed by the indexes, within the data vector to be written, of the elements corresponding to the index of each storage subunit;
reordering the write data enable vector, the first storage address vector, and the data vector to be written according to the write data index vector.
5. A data parallel reading method, characterized by comprising:
transforming the read data index of a data vector to be read into a one-dimensional read address; wherein the data vector to be read is a one-dimensional or multidimensional vector in a multidimensional data matrix to be read, and the read data index is the index, in the multidimensional data matrix to be read, of the first element to be read among all elements of the data vector to be read;
obtaining, according to the one-dimensional read address and a preset number of read data, a read data enable vector, a second storage index vector, and a second storage address vector of the data vector to be read; wherein each element of the read data enable vector indicates whether the element at the corresponding position in the data vector to be read is read; the second storage index vector is the vector formed by the indexes of the storage subunits of a parallel memory that correspond to the elements of the data vector to be read; and the second storage address vector is the vector formed by the addresses, within the storage subunits, that correspond to the elements of the data vector to be read;
reordering the read data enable vector and the second storage address vector according to the second storage index vector, reading a stored data vector from the parallel memory according to the reordered read data enable vector and the reordered second storage address vector, and reordering the stored data vector according to the second storage index vector to obtain the data vector to be read.
6. The method according to claim 5, characterized in that the step of transforming the read data index of the data vector to be read into a one-dimensional read address specifically comprises:
reordering the read data index, and splitting the index value of each preset dimension that is read in parallel, within the reordered read data index, into multiple index values;
reordering the split read data index again, and computing the one-dimensional read address from the twice-reordered read data index.
7. The method according to claim 5, characterized in that the step of obtaining, according to the one-dimensional read address and the preset number of read data, the read data enable vector, the second storage index vector, and the second storage address vector of the data vector to be read specifically comprises:
determining the number of elements with value 1 in the read data enable vector according to the preset number of read data, and determining the number of elements with value 0 in the read data enable vector according to the difference between the length of the data vector to be read and the preset number of read data;
obtaining the second storage index vector and the second storage address vector according to the index of each element within the data vector to be read, the one-dimensional read address, and the length of the data vector to be read.
8. The method according to any one of claims 5 to 7, characterized in that the step of reordering the read data enable vector and the second storage address vector according to the second storage index vector specifically comprises:
obtaining, according to the second storage index vector, a read data index vector formed by the indexes, within the data vector to be read, of the elements corresponding to the index of each storage subunit;
reordering the read data enable vector and the second storage address vector according to the read data index vector.
9. A data parallel writing device, characterized by comprising:
a first transformation module, configured to transform the write data index of a data vector to be written into a one-dimensional write address; wherein the data vector to be written is a one-dimensional or multidimensional vector in a multidimensional data matrix to be written, and the write data index is the index, in the multidimensional data matrix to be written, of the first element to be written among all elements of the data vector to be written;
a first acquisition module, configured to obtain, according to the one-dimensional write address and a preset number of write data, a write data enable vector, a first storage index vector, and a first storage address vector of the data vector to be written; wherein each element of the write data enable vector indicates whether the element at the corresponding position in the data vector to be written is written; the first storage index vector is the vector formed by the indexes of the storage subunits of a parallel memory that correspond to the elements of the data vector to be written; and the first storage address vector is the vector formed by the addresses, within the storage subunits, that correspond to the elements of the data vector to be written;
a storing module, configured to reorder the write data enable vector, the first storage address vector, and the data vector to be written according to the first storage index vector, and to store the reordered data vector to be written into the parallel memory according to the reordered write data enable vector and the reordered first storage address vector.
10. A data parallel reading device, characterized by comprising:
a second transformation module, configured to transform the read data index of a data vector to be read into a one-dimensional read address; wherein the data vector to be read is a one-dimensional or multidimensional vector in a multidimensional data matrix to be read, and the read data index is the index, in the multidimensional data matrix to be read, of the first element to be read among all elements of the data vector to be read;
a second acquisition module, configured to obtain, according to the one-dimensional read address and a preset number of read data, a read data enable vector, a second storage index vector, and a second storage address vector of the data vector to be read; wherein each element of the read data enable vector indicates whether the element at the corresponding position in the data vector to be read is read; the second storage index vector is the vector formed by the indexes of the storage subunits of a parallel memory that correspond to the elements of the data vector to be read; and the second storage address vector is the vector formed by the addresses, within the storage subunits, that correspond to the elements of the data vector to be read;
a reading module, configured to reorder the read data enable vector and the second storage address vector according to the second storage index vector, to read a stored data vector from the parallel memory according to the reordered read data enable vector and the reordered second storage address vector, and to reorder the stored data vector according to the second storage index vector to obtain the data vector to be read.
11. A data parallel read/write system, characterized by comprising a parallel memory and the devices according to claims 9 and 10.
CN201810614178.3A 2018-06-14 2018-06-14 Data parallel writing and reading method, device and system Expired - Fee Related CN108984115B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810614178.3A CN108984115B (en) 2018-06-14 2018-06-14 Data parallel writing and reading method, device and system

Publications (2)

Publication Number Publication Date
CN108984115A true CN108984115A (en) 2018-12-11
CN108984115B CN108984115B (en) 2020-07-28

Family

ID=64540446

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810614178.3A Expired - Fee Related CN108984115B (en) 2018-06-14 2018-06-14 Data parallel writing and reading method, device and system

Country Status (1)

Country Link
CN (1) CN108984115B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1971537A (en) * 2005-11-25 2007-05-30 杭州中天微系统有限公司 Access method of matrix data and storage device of the matrix data
CN101478608A (en) * 2009-01-09 2009-07-08 南京联创科技股份有限公司 Fast operating method for mass data based on two-dimensional hash
US20150234662A1 (en) * 2014-02-19 2015-08-20 Mediatek Inc. Apparatus for mutual-transposition of scalar and vector data sets and related method
CN104978148A (en) * 2014-04-09 2015-10-14 瑞萨电子(中国)有限公司 Data writing method and device and data reading method and device

Also Published As

Publication number Publication date
CN108984115B (en) 2020-07-28

Similar Documents

Publication Publication Date Title
US11182667B2 (en) Minimizing memory reads and increasing performance by leveraging aligned blob data in a processing unit of a neural network environment
CN109948774B (en) Neural network accelerator based on network layer binding operation and implementation method thereof
CN106021182B (en) A kind of row transposition architecture design method based on Two-dimensional FFT processor
JP2013037517A5 (en)
CN103955446B (en) DSP-chip-based FFT computing method with variable length
US11030714B2 (en) Wide key hash table for a graphics processing unit
US20190361631A1 (en) Storage device, chip and method for controlling storage device
US12056055B2 (en) Data processing device and related product
US11487342B2 (en) Reducing power consumption in a neural network environment using data management
CN116010299A (en) Data processing method, device, equipment and readable storage medium
CN115221102A (en) Method for optimizing convolution operation of system on chip and related product
CN110046154A (en) The method and apparatus of filter operation are efficiently executed in relational database in memory
JP7034336B2 (en) Methods, equipment, and related products for processing data
CN109359729A (en) It is a kind of to realize data cached system and method on FPGA
WO2015094721A2 (en) Apparatuses and methods for writing masked data to a buffer
US9268744B2 (en) Parallel bit reversal devices and methods
CN108984115B (en) Data parallel writing and reading method, device and system
CN104992425B (en) A kind of DEM super-resolution methods accelerated based on GPU
CN118427136A (en) Direct memory access device, operation method and data processing device
US11914587B2 (en) Systems and methods for key-based indexing in storage devices
KR102502326B1 (en) Overhead reduction in data transfer protocol for nand memory
WO2021082723A1 (en) Operation apparatus
CN102622318B (en) Storage controlling circuit and vector data addressing method controlled by same
US12106109B2 (en) Data processing apparatus and related product
CN118502681B (en) Method, system for storing data sets and method for training models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200728

Termination date: 20210614