CN102508803A - Matrix transposition memory controller - Google Patents

Info

Publication number
CN102508803A
Authority
CN
China
Prior art keywords
matrix
data
transposition
storage unit
processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011103936607A
Other languages
Chinese (zh)
Inventor
李丽
潘红兵
郑艳丽
王佳文
沙金
何书专
郑维山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN2011103936607A
Publication of CN102508803A
Legal status: Pending

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention discloses a matrix transposition memory controller comprising a processor, a bus, a direct memory access (DMA) unit, a network interface, a storage unit, an interrupt and a program memory. The processor exchanges data with the DMA unit, the network interface, the interrupt and the program memory over the bus; the storage unit is connected to the bus and to the network interface through the DMA unit; the network interface is connected to the bus and, through the network-on-chip, to an external memory. The invention uses SRAM as the storage unit, which is simple to control and easy to read and write; under processor control, the advantages of SRAM as a memory can be fully exploited. The invention provides three transposition modes. Because the processor controls the transposition, the transposition method can be selected flexibly, so the controller suits matrices of various sizes and scales well. In addition, the invention uses DMA as the data channel, which provides a high data transfer rate. The invention is suitable for use in a network-on-chip.

Figure 201110393660

Description

A matrix transposition memory controller
Technical field
The present invention relates to a transposition memory controller for matrix transposition, in particular one that handles matrices of multiple sizes and, through hardware/software co-design, can flexibly select the transposition algorithm. Specifically, it is a matrix transposition memory controller suitable for a network-on-chip.
Background
Matrix data structures are widely used in fields such as engineering computation, image processing and signal processing, and processing such data often involves matrix transposition. The speed of transposition strongly affects data processing efficiency.
For large matrices, reading the matrix directly into a cache requires frequent accesses to external memory, which greatly reduces transposition efficiency. Matrix transposition is therefore generally done in hardware.
When SDRAM is used as the memory, its column-wise read/write speed is very low, so each memory access consumes a great deal of time. SRAM, on the other hand, has low integration density and a high price. How to choose the memory correctly and make it carry out matrix transposition efficiently is a common problem in matrix transposition.
At present, matrix transposition is mainly implemented either by using a programmable logic device (PLD) to control the storage unit or by using a microprocessor to control the transposition storage. With the former, the transposition control is relatively complex; with the latter, the data transfer rate is often low. Combining the two methods sensibly merges their advantages. Moreover, existing designs usually choose the transposition method only after the matrix size and the memory size have been fixed, so these controllers transpose matrices of one fixed size and cannot flexibly select a transposition algorithm for matrices of multiple sizes.
Network-on-chip (NoC) is a design method for systems-on-chip. NoC-based systems adapt well to the globally asynchronous, locally synchronous clocking commonly used in today's complex system-on-chip designs. The NoC approach brings a new on-chip communication method that performs significantly better than conventional bus-based systems. It is regarded as the inevitable direction of multi-core development under future integration processes.
Summary of the invention
The object of the invention is to provide a matrix transposition memory controller that uses hardware/software co-design, can flexibly select a transposition method for matrices of multiple sizes, and provides a high data transfer rate.
The object of the invention is achieved through the following technical solution:
A matrix transposition memory controller, characterized in that it comprises a processor, a bus, a direct memory access unit (DMA), a network interface (NI), a storage unit (SRAM), an interrupt and a program memory. The processor exchanges data with the DMA, the network interface, the interrupt and the program memory over the bus; the storage unit is connected to the bus and the network interface through the DMA; the network interface is connected to the bus and, through the network-on-chip, to the external memory.
In the invention, data transfer in the controller consists of two processes, data input and data output, both carried out under processor control. During data input, the DMA configures the data addresses under processor control, and the NI, also under processor control, reads data from the external memory and stores it in the SRAM at the addresses generated by the DMA. Once the data is in the SRAM, the processor configures the DMA to generate addresses, the data is read out of the SRAM in an order that transposes the matrix that was read in, and the data is then output through the NI.
Matrix transposition has three modes, determined by the relation between the matrix size and the capacity of the storage unit:
1) The SRAM can hold the whole matrix. The transposition controller reads the data in, the processor configures the DMA to store the matrix in the SRAM in the row direction, and the matrix is then read out in the column direction, which completes the transposition.
2) The matrix data exceeds the SRAM capacity. The transposition controller fills the SRAM with matrix data in the row direction, then configures the external memory addresses and outputs the data just read, column by column, to the corresponding external memory addresses. It then reads the next part of the original matrix into the SRAM and outputs it to the corresponding external memory addresses. This process repeats until all the matrix data has been output, which completes the transposition.
3) The matrix data is much larger than the SRAM capacity. The matrix can be partitioned into blocks, each exactly the size of the SRAM. The data of one block is read into the SRAM row by row. As in the second mode, the transposition controller outputs the SRAM data to the external memory in the column direction. It then reads the second block, writes it to the SRAM in the row direction, reads it out in the column direction and outputs it to the external memory. This process repeats until all the matrix data has been output, which completes the transposition.
These are the three transposition algorithms. The processor selects among them by controlling the DMA. The processor's configuration mainly comprises the matrix row count, the matrix column count and the choice of transposition algorithm.
In transposition mode 3), the row count H and the column count P of the block matrix must be determined. They are determined by the time T1 needed to set up the data channel for data input and the time T2 needed to set up the data channel for data output. The total data-channel setup time for transposing one block is H*T1+P*T2. Because H*P is fixed and equal to the SRAM capacity, the inequality
$H T_1 + P T_2 \ge 2\sqrt{H P\, T_1 T_2}$
can be used to calculate the values of H and P.
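As a worked sketch of how such an inequality fixes the block shape, assume it is the arithmetic-geometric mean inequality and write $M$ for the SRAM capacity in data words, so that $H P = M$. The closed forms below are an illustration derived under that assumption, not a formula quoted from the patent.

```latex
\[
  H T_1 + P T_2 \;\ge\; 2\sqrt{H P\, T_1 T_2} \;=\; 2\sqrt{M\, T_1 T_2},
  \qquad \text{with equality if and only if } H T_1 = P T_2 .
\]
\[
  H T_1 = P T_2,\quad H P = M
  \;\Longrightarrow\;
  H = \sqrt{\frac{M\, T_2}{T_1}}, \qquad
  P = \sqrt{\frac{M\, T_1}{T_2}} .
\]
```

The right-hand side of the first line does not depend on how $M$ is split into $H$ and $P$, so the setup time is smallest when the two terms are equal. In practice the values are then rounded to block dimensions whose product is still $M$.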
The invention uses SRAM as the memory and implements transposition storage with a microcontroller and a programmable logic device (PLD) working together. SRAM is simple to control and easy to read and write, and it does not suffer from column-wise access being slower than row-wise access; under processor control, its advantages as a memory can be fully exploited. Because the processor controls the transposition, the transposition method can be selected flexibly, so the invention applies to matrices of various sizes and has good scalability. In addition, the invention uses DMA as the data channel, which provides a high data transfer rate.
The invention can be attached to a network-on-chip router through the network interface, as shown in Fig. 1. It sets up a data channel to the external memory through the network-on-chip and transfers data over it. The invention is a matrix transposition memory controller suitable for a network-on-chip.
Description of drawings
Fig. 1 is a schematic diagram of the invention attached to a network-on-chip;
Fig. 2 is a structural diagram of the invention;
Fig. 3 is a data flow diagram of the transposition memory controller of the invention;
Fig. 4 shows how the transposition function is implemented;
Fig. 5 is a schematic diagram of transposition algorithm 1;
Fig. 6 is a schematic diagram of transposition algorithm 2;
Fig. 7 is a flow chart of data output in transposition algorithm 2;
Fig. 8 is a schematic diagram of transposition algorithm 3.
Embodiment
The invention is further described below with reference to the drawings and an embodiment.
A matrix transposition memory controller suitable for a network-on-chip is shown in Fig. 2. It comprises a processor 1, a bus 2, a direct memory access unit (DMA) 3, a network interface (NI) 4, an SRAM 5, an interrupt 6 and a program memory 7. The controller uses a bus structure: the embedded processor 1 communicates through the bus protocol with functional modules such as the DMA 3, the NI 4, the interrupt 6 and the program memory 7. The SRAM 5 is connected to the other modules through the DMA 3. Besides being connected to the bus 2, the NI 4 is connected directly to the DMA 3 at one end and to the external memory at the other. This controller uses an ARM core as the processor and the AHB bus protocol.
Fig. 3 is a data flow diagram of the transposition memory controller. Data transfer consists of two processes, data input and data output, both carried out under processor control. During data input, the DMA configures the data addresses under the control of the ARM core, and the NI module, configured by the ARM core, reads data from the external memory and stores it in the SRAM at the addresses generated by the DMA. Once the data is in the SRAM, the ARM core configures the DMA to generate addresses, the data is read out of the SRAM in an order that transposes the matrix that was read in, and the data is then output through the NI.
The data flow shows that the transposition is achieved mainly by the DMA generating addresses under processor control and thereby controlling the SRAM addresses; Fig. 4 shows this implementation. During data input, the DMA generates addresses and the data is stored at the corresponding SRAM addresses. In this phase the SRAM address starts at 0 and increments by one for each datum. During data output, the DMA generates addresses under the control of the ARM core and reads the corresponding data from the SRAM, which completes the transposition. The transposition is therefore controlled mainly by the addresses the DMA generates during data output.
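The address pattern described above can be sketched in a few lines of Python. This is only an illustrative software model, assuming a row-major matrix of `rows` by `cols` words; the function names are hypothetical and do not come from the patent.

```python
def write_addresses(rows, cols):
    """Input phase: addresses count up from 0, so the matrix lands row by row."""
    return list(range(rows * cols))


def read_addresses(rows, cols):
    """Output phase: walk each column from top to bottom, one column after another."""
    return [r * cols + c for c in range(cols) for r in range(rows)]


def transpose_via_sram(flat, rows, cols):
    """Model the SRAM as a list: sequential writes, then column-order reads."""
    sram = [None] * (rows * cols)
    for addr, value in zip(write_addresses(rows, cols), flat):
        sram[addr] = value
    return [sram[addr] for addr in read_addresses(rows, cols)]
```

For a 2 x 3 matrix stored as `[1, 2, 3, 4, 5, 6]`, the read addresses are `0, 3, 1, 4, 2, 5`, so the output stream is the 3 x 2 transpose in row-major order.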
Matrix transposition has three modes, determined by the relation between the matrix size and the SRAM capacity. In this embodiment the SRAM size is 64 Mbit and each datum is 64 bits, so the SRAM can hold 1M data. The matrix has C rows and R columns. The three modes are:
1) The SRAM can hold the whole matrix: 4K*256 = 1M.
As shown in Fig. 5, with C = 4K and R = 256 the matrix can be read in completely. After the transposition controller reads in the data, the processor configures the DMA to store the matrix in the SRAM in the row direction; reading the matrix out in the column direction then completes the transposition.
2) The matrix data exceeds the SRAM capacity: 4K*1K = 4M > 1M, with C = 4K and R = 1K. In this case the transposition controller must read from the external memory several times and output several times to complete the transposition. The procedure, shown in Fig. 6, is as follows.
A. The SRAM can hold 1M data, so at first only 1M of matrix data can be read, in order. The column count is R = 1K, and filling the SRAM takes N rows, where R*N = M. So N = 1K rows are read into the SRAM first.
B. The transposition controller outputs the first column in the SRAM to the external memory. The external memory address is configured to start at A, and the N = 1K data are stored consecutively. The controller then reconfigures the external memory address and outputs the N data of the second column in the SRAM. This time the external memory address starts at A+C, that is, A plus the row count C of the original matrix (each row of the transposed matrix holds C data), and increments by one for each datum of the column.
C. The remaining columns are transferred in the same way; after R = 1K repetitions the controller has output all the data in the SRAM to the external memory. The data output flow is shown in Fig. 7.
D. The transposition controller then reads the second group of data from the original matrix in the row direction, namely rows N+1 to 2N, which fills the SRAM again. After the data is read in, the output process above is repeated. When each column is stored to the external memory, its address continues immediately after the last datum of the same column from the previous output. This continues until the data is all output.
E. Repeat the above process until all the matrix data has been output, which completes the transposition.
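Steps A to E can be modelled in software as follows. This is a minimal sketch, not the controller's implementation: the external memory is a Python list, `base` plays the role of address A, and the sketch assumes that whole rows fill the SRAM exactly and that each transposed row occupies `rows` consecutive addresses, so that a later pass can append to the same transposed row.

```python
def transpose_mode2(ext_in, rows, cols, sram_words, base=0):
    """Transpose a rows x cols row-major matrix that does not fit in the SRAM."""
    if sram_words % cols or rows % (sram_words // cols):
        raise ValueError("sketch assumes whole rows fill the SRAM and passes divide evenly")
    band = sram_words // cols                  # N: rows read per pass (R * N = M)
    ext_out = [None] * (base + rows * cols)
    for first_row in range(0, rows, band):
        # Input: one sequential read of N complete rows fills the SRAM.
        sram = ext_in[first_row * cols:(first_row + band) * cols]
        # Output: one burst per column; column j continues where the
        # previous pass stopped inside transposed row j.
        for j in range(cols):
            dst = base + j * rows + first_row
            for i in range(band):
                ext_out[dst + i] = sram[i * cols + j]
    return ext_out[base:]
```

With a 4 x 2 matrix and a 4-word SRAM, the function makes two passes of two rows each and writes two column bursts per pass, which mirrors the one read setup and R write setups per pass described above.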
3) The matrix data is much larger than the SRAM capacity: 4K*8K >> 1M, with C = 4K and R = 8K. When the matrix is too large, and especially when its column count R is too large, repeatedly accessing the external memory and setting up data channels consumes a great deal of time. The transposition controller can therefore partition the matrix into blocks to complete the transposition.
This transposition algorithm, shown in Fig. 8, is as follows.
A. Partition the matrix so that each block exactly fills the SRAM. A block has N rows and P columns, with N*P = M. How the block row count (written H in that calculation) and the column count P are computed is described below.
B. Read the data of the block into the SRAM row by row. The transposition controller first reads the P data of the first row, d(0,0) to d(0,P-1), then the P data of the second row, d(1,0) to d(1,P-1), and so on up to row N, at which point the SRAM is full.
C. As in the second transposition mode, the transposition controller outputs the data in the SRAM to the external memory in the column direction.
D. Next read the second block, which runs from the (P+1)-th to the 2P-th datum of every row:
the first row from d(0,P) to d(0,2P-1), the second row from d(1,P) to d(1,2P-1), and so on up to row N, from d(N-1,P) to d(N-1,2P-1). The data is written to the SRAM in the row direction as before. Data output is also the same as above; the address of each column's data in the external memory continues from the previous group.
E. Repeat the above process until all the matrix data has been output, which completes the transposition.
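The blocked algorithm can be sketched in the same style. The code below is an illustrative model under stated assumptions: the block dimensions divide the matrix dimensions exactly, the external memory is a Python list, and `base` stands for the start address of the transposed matrix. None of the names are taken from the patent.

```python
def transpose_mode3(ext_in, rows, cols, block_rows, block_cols, base=0):
    """Transpose a rows x cols row-major matrix block by block."""
    if rows % block_rows or cols % block_cols:
        raise ValueError("sketch assumes the block size divides the matrix size")
    ext_out = [None] * (base + rows * cols)
    for r0 in range(0, rows, block_rows):
        for c0 in range(0, cols, block_cols):
            # Input: block_rows short bursts, one per block row (P words each).
            sram = []
            for i in range(block_rows):
                start = (r0 + i) * cols + c0
                sram.extend(ext_in[start:start + block_cols])
            # Output: block_cols bursts, one per block column (N words each).
            for j in range(block_cols):
                dst = base + (c0 + j) * rows + r0
                for i in range(block_rows):
                    ext_out[dst + i] = sram[i * block_cols + j]
    return ext_out[base:]
```

Each block costs `block_rows` read bursts and `block_cols` write bursts in this model, which is the H*T1 + P*T2 channel-setup term that the block-size calculation tries to minimise.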
In the third transposition algorithm, the row count and column count of the block matrix must be calculated. They are determined by the time T1 needed to set up the data channel for data input and the time T2 needed to set up the data channel for data output. A data channel must be set up each time a row of the block is read from the external memory, and again each time a column is output, so the total data-channel setup time for transposing one block is H*T1 + P*T2. The most suitable H and P are the ones that minimise H*T1 + P*T2. Because H*P = 1M is fixed, the inequality
$H T_1 + P T_2 \ge 2\sqrt{H P\, T_1 T_2}$
can be used to calculate the values of H and P, which are then rounded to obtain the most suitable result. Taking T1 = 700 clock cycles and T2 = 400 clock cycles as an example gives H = 1K and P = 1K.
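The rounding step can be checked numerically. The sketch below is an assumption-laden illustration: it takes 1M to mean 2^20 words, requires H*P to equal the SRAM capacity exactly, and simply searches all exact factorisations for the smallest setup cost. The function name `best_block` is hypothetical.

```python
import math


def best_block(sram_words, t_read, t_write):
    """Return (H, P, cost) minimising H*t_read + P*t_write with H*P == sram_words."""
    best = None
    for d in range(1, math.isqrt(sram_words) + 1):
        if sram_words % d:
            continue
        # Try the divisor as the row count and as the column count.
        for h in (d, sram_words // d):
            p = sram_words // h
            cost = h * t_read + p * t_write
            if best is None or cost < best[2]:
                best = (h, p, cost)
    return best
```

Under these assumptions `best_block(2**20, 700, 400)` returns `(1024, 1024, 1126400)`. The unrounded optimum, where H*T1 = P*T2, lies near H = 774 and P = 1355, and the nearest exact factorisation of 2^20 is H = P = 1K, which agrees with the example values in the text.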
The three matrix transposition algorithms have been described above. The algorithm should be chosen according to the size of the matrix. In the invention, the processor selects the algorithm by controlling the DMA. The processor's configuration mainly comprises the matrix row count, the matrix column count and the choice of transposition algorithm. The DMA follows the algorithm configured by the processor and applies the correct transposition method to the matrix.
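The configuration the processor writes can be pictured as a small record holding the row count, the column count and the chosen algorithm. The patent does not state a numeric boundary between "larger" and "much larger", so the selection rule below is only one possible assumption: it compares the channel-setup cost per SRAM fill of mode 2 (one read setup plus one write setup per column) with the best blocked cost of mode 3. The names `TransposeConfig` and `suggest_config` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class TransposeConfig:
    rows: int   # matrix row count (C in the text)
    cols: int   # matrix column count (R in the text)
    mode: int   # transposition algorithm: 1, 2 or 3


def best_block_setup(sram_words, t_read, t_write):
    """Smallest H*t_read + P*t_write over exact factorisations H*P == sram_words."""
    small = [d for d in range(1, math.isqrt(sram_words) + 1) if sram_words % d == 0]
    heights = small + [sram_words // d for d in small]
    return min(h * t_read + (sram_words // h) * t_write for h in heights)


def suggest_config(rows, cols, sram_words=2**20, t_read=700, t_write=400):
    """Pick a mode from the matrix size (assumed rule, not taken from the patent)."""
    if rows * cols <= sram_words:
        return TransposeConfig(rows, cols, 1)
    mode2_setup = t_read + cols * t_write          # setup cost per SRAM fill, mode 2
    mode3_setup = best_block_setup(sram_words, t_read, t_write)
    row_fits = cols <= sram_words                  # mode 2 needs at least one full row
    mode = 2 if row_fits and mode2_setup <= mode3_setup else 3
    return TransposeConfig(rows, cols, mode)
```

With the default values, this rule reproduces the three example cases of the embodiment: 4K*256 maps to mode 1, 4K*1K to mode 2 and 4K*8K to mode 3.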
The time needed for transposition in the three cases is calculated below.
(1) 4K*256 matrix. The transposition time comprises one data-channel setup for reading, one data-channel setup for writing, and the time to input and output the data, so the number of clock cycles needed is:
N1 = 2M + 700 + 400
(2) 4K*1K matrix. The transposition controller must fill and empty the SRAM four times. Each pass comprises one data-channel setup for reading, 1K data-channel setups for writing, and the data input and output, so the number of clock cycles needed is:
N2 = (2M + 700 + 400*1K) * 4
(3) 4K*8K matrix. The transposition controller must fill and empty the SRAM 32 times. Each pass comprises 1K data-channel setups for reading, 1K data-channel setups for writing, and the data input and output, so the number of clock cycles needed is:
N3 = (2M + 700*1K + 400*1K) * 32
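The three cycle counts can be evaluated with a short script. It is a sketch under two assumptions that the text does not spell out: 1K means 1024 (so 1M is 2^20), and the 2M term stands for one cycle per word on input plus one cycle per word on output. The helper name `cycles` is hypothetical.

```python
K = 1024
M = K * K            # SRAM capacity in 64-bit words (64 Mbit / 64 bit)
T1, T2 = 700, 400    # channel setup cycles for reading and for writing


def cycles(fills, read_setups, write_setups, sram_words=M):
    """Total cycles: per SRAM fill, words in plus words out plus channel setups."""
    per_fill = 2 * sram_words + read_setups * T1 + write_setups * T2
    return fills * per_fill


n1 = cycles(1, 1, 1)      # 4K*256: one fill, one read setup, one write setup
n2 = cycles(4, 1, K)      # 4K*1K: four fills, 1K write setups per fill
n3 = cycles(32, K, K)     # 4K*8K: 32 fills, 1K read and 1K write setups per fill
print(n1, n2, n3)         # 2098252 10029808 103153664
```

In this model the channel setups add about 0.05% to the data movement time in case (1), about 20% in case (2) and about 54% in case (3), which is why the block shape matters most for the largest matrices.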
The invention is attached to a network-on-chip router through the network interface, sets up a data channel to the external memory through the network-on-chip, and transfers data over it.

Claims (5)

1. A matrix transposition memory controller, characterized in that: the matrix transposition memory controller comprises a processor (1), a bus (2), a direct memory access unit (3), a network interface (4), a storage unit (5), an interrupt (6) and a program memory (7); the processor (1) exchanges data with the direct memory access unit (3), the network interface (4), the interrupt (6) and the program memory (7) through the bus (2); the storage unit (5) is connected to the bus (2) and the network interface (4) through the direct memory access unit (3); the network interface (4) is connected to the bus (2) and is connected to an external memory through a network-on-chip.

2. The matrix transposition memory controller according to claim 1, characterized in that: the processor (1) controls data transfer, which comprises two processes, data input and data output; during data input, under the control of the processor (1), the direct memory access unit (3) configures the data addresses; the network interface (4) reads data from the external memory and stores it in the storage unit (5) at the addresses generated by the direct memory access unit (3); after the data has been stored in the storage unit (5), the processor (1) configures the direct memory access unit (3) to generate addresses, the data is read out of the storage unit (5) so as to transpose the matrix that was read in, and the data is output through the network interface (4).

3. The matrix transposition memory controller according to claim 1, characterized in that: matrix transposition has three modes, determined by the relation between the matrix size and the capacity of the storage unit (5), namely:
1) the storage unit (5) holds all the data of a matrix; the transposition controller reads the data in, the processor (1) configures the direct memory access unit (3) to store the matrix in the storage unit (5) in the row direction, and the matrix is then read out in the column direction, which completes the transposition;
2) the matrix data exceeds the capacity of the storage unit (5); the transposition controller fills the storage unit (5) with matrix data in the row direction, then configures the external memory addresses and outputs the data just read, column by column, to the corresponding external memory addresses; it then reads the next part of the original matrix into the storage unit (5) and outputs it to the corresponding external memory addresses; this process repeats until all the matrix data has been output, which completes the transposition;
3) the matrix data is much larger than the capacity of the storage unit (5); the matrix can be partitioned into blocks, each exactly the size of the storage unit (5); the data of one block is read into the storage unit (5) row by row; as in transposition mode 2), the transposition controller outputs the data of the storage unit (5) to the external memory in the column direction; it then reads the second block, writes the data to the storage unit (5) in the row direction, reads it out in the column direction and outputs it to the external memory; this process repeats until all the matrix data has been output, which completes the transposition.

4. The matrix transposition memory controller according to claim 3, characterized in that: the processor (1) selects the transposition algorithm by controlling the direct memory access unit (3); the content configured by the processor (1) comprises the matrix row count, the matrix column count and the choice of matrix transposition algorithm.

5. The matrix transposition memory controller according to claim 3, characterized in that: in transposition mode 3), the row count H and the column count P of the block matrix are determined; H and P are determined by the time T1 needed to set up the data channel for data input and the time T2 needed to set up the data channel for data output; the total data-channel setup time for the matrix transposition is H*T1+P*T2; H*P is fixed and equal to the capacity of the storage unit SRAM (5); the values of H and P are obtained from the inequality
$H T_1 + P T_2 \ge 2\sqrt{H P\, T_1 T_2}$
and are rounded to obtain the most suitable result.
CN2011103936607A 2011-12-02 2011-12-02 Matrix transposition memory controller Pending CN102508803A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN2011103936607A CN102508803A (en) | 2011-12-02 | 2011-12-02 | Matrix transposition memory controller

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN2011103936607A CN102508803A (en) | 2011-12-02 | 2011-12-02 | Matrix transposition memory controller

Publications (1)

Publication Number | Publication Date
CN102508803A (en) | 2012-06-20

Family

ID=46220894

Family Applications (1)

Application Number | Status | Priority Date | Filing Date | Title
CN2011103936607A CN102508803A (en) | Pending | 2011-12-02 | 2011-12-02 | Matrix transposition memory controller

Country Status (1)

Country | Link
CN (1) | CN102508803A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20080208942A1 (en) * | 2007-02-23 | 2008-08-28 | Nara Won | Parallel Architecture for Matrix Transposition
CN101706760A (en) * | 2009-10-20 | 2010-05-12 | 北京龙芯中科技术服务中心有限公司 | Matrix transposition automatic control circuit system and matrix transposition method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张兵等: "基于NUMA MPSoC 的FFT 并行化算法设计及实现", 《微电子学与计算机》 *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103389967A (en) * 2013-08-21 2013-11-13 中国人民解放军国防科学技术大学 Device and method for matrix transposition based on static random access memory (SRAM)
CN103389967B (en) * 2013-08-21 2016-06-01 中国人民解放军国防科学技术大学 The device and method of a kind of matrix transposition based on SRAM
CN103501419A (en) * 2013-10-24 2014-01-08 北京时代奥视数码技术有限公司 Method for realizing image transposition based on FPGA (Field Programmable Gata Array)
CN106933756A (en) * 2015-12-31 2017-07-07 北京国睿中数科技股份有限公司 For the quick transposition methods of DMA and device of variable matrix
US11551067B2 (en) 2017-04-06 2023-01-10 Shanghai Cambricon Information Technology Co., Ltd Neural network processor and neural network computation method
US10896369B2 (en) 2017-04-06 2021-01-19 Cambricon Technologies Corporation Limited Power conversion in neural networks
US11049002B2 (en) 2017-04-06 2021-06-29 Shanghai Cambricon Information Technology Co., Ltd Neural network computation device and method
US11010338B2 (en) 2017-04-06 2021-05-18 Shanghai Cambricon Information Technology Co., Ltd Data screening device and method
US10671913B2 (en) 2017-04-06 2020-06-02 Shanghai Cambricon Information Technology Co., Ltd Computation device and method
CN108733625B (en) * 2017-04-19 2021-06-08 上海寒武纪信息科技有限公司 Computing device and method
CN108733625A (en) * 2017-04-19 2018-11-02 上海寒武纪信息科技有限公司 Arithmetic unit and method
WO2018192161A1 (en) * 2017-04-19 2018-10-25 上海寒武纪信息科技有限公司 Operation apparatus and method
CN109471612A (en) * 2018-09-18 2019-03-15 北京中科寒武纪科技有限公司 Arithmetic unit and method
CN111045965A (en) * 2019-10-25 2020-04-21 南京大学 Hardware implementation method for multi-channel conflict-free splitting, computer equipment and readable storage medium for operating method
CN114282161A (en) * 2020-09-27 2022-04-05 中科寒武纪科技股份有限公司 Matrix conversion circuit, matrix conversion method, integrated circuit chip, computing device and board card
CN114282161B (en) * 2020-09-27 2025-06-06 中科寒武纪科技股份有限公司 Matrix conversion circuit, method, integrated circuit chip, computing device and board
CN113986200A (en) * 2021-10-29 2022-01-28 上海阵量智能科技有限公司 Matrix transposition circuit, artificial intelligence chip and electronic equipment
CN116910437A (en) * 2023-09-12 2023-10-20 腾讯科技(深圳)有限公司 Matrix transposition device, method, AI processor and computer equipment
CN116910437B (en) * 2023-09-12 2023-12-12 腾讯科技(深圳)有限公司 Matrix transposition device, matrix transposition method, AI processor and computer equipment
WO2025055512A1 (en) * 2023-09-12 2025-03-20 腾讯科技(深圳)有限公司 Matrix transposition apparatus and method, ai processor, and computer device

Similar Documents

Publication Publication Date Title
CN102508803A (en) Matrix transposition memory controller
US10372653B2 (en) Apparatuses for providing data received by a state machine engine
CN107609644B (en) Method and system for data analysis in a state machine
CN107590085B (en) Dynamically reconfigurable array data path with multi-level buffering and its control method
CN100405343C (en) Asynchronous data cache device
CN106875012A (en) FPGA-based pipelined acceleration system for deep convolutional neural networks
CN102163141B (en) Addressing module structure for realizing digital signal processor
CN101625635B (en) Method, system and equipment for processing circular task
CN103218348A (en) Method and system for processing fast Fourier transform
CN111159094A (en) RISC-V based near data stream type calculation acceleration array
CN112189324B (en) Bandwidth matched scheduler
CN103714044A (en) Efficient matrix transposition cluster and transposition method based on network-on-chip
WO2013097223A1 (en) Multi-granularity parallel storage system and storage
CN110414672B (en) Convolution operation method, device and system
CN118627565B (en) A configurable convolution operation acceleration device and method based on systolic array
WO2013097228A1 (en) Multi-granularity parallel storage system
CN104795091A (en) System and method for realizing ZBT (zero bus turnaround) reading and writing timing sequence stability in FPGA (field programmable gate array)
CN103455367B (en) Management unit and method for implementing multi-task scheduling in reconfigurable systems
CN106569968B (en) Inter-array data transmission structure and scheduling method for reconfigurable processors
CN110890120B (en) A general block chain application processing acceleration method and system based on resistive memory
CN115496190A (en) Efficient reconfigurable hardware accelerator for convolutional neural network training
CN102201817B (en) Low-power-consumption LDPC decoder based on optimization of memory folding architecture
CN117951427A (en) A fully connected Ising model reconfigurable processing circuit supporting multiple algorithms
Ryazanova et al. Development of multiprocessor system-on-chip based on soft processor cores schoolMIPS
CN112486904B (en) Register file design method and device for reconfigurable processing unit array

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 2012-06-20