Summary of the invention
The purpose of this invention is to provide a matrix transpose memory controller that combines software and hardware, can flexibly select a transposition method for matrices of various sizes, and provides a high data transfer rate.
The object of the invention is achieved through the following technical solution:
A matrix transpose memory controller, characterized in that it comprises a processor, a bus, a direct memory access unit (DMA), a network interface (NI), a storage unit (SRAM), an interrupt module and a program memory. The processor exchanges data with the DMA, the network interface, the interrupt module and the program memory over the bus. The storage unit is connected to the bus and the network interface through the DMA. The network interface is connected to the bus and, through the network-on-chip, to the external memory.
In the present invention, data transfer in the controller consists of two processes, data input and data output, both carried out under the control of the processor. During data input, the processor directs the DMA to configure the data addresses, and the NI, also under processor control, reads data from the external memory and stores it in the SRAM at the addresses produced by the DMA. After the data has been read into the SRAM, the processor configures the DMA to generate addresses, the data is read out of the SRAM in transposed order, and it is then output through the NI.
Matrix transposition has three modes, determined by the relation between the matrix size and the storage capacity of the storage unit:
1) The SRAM can hold the entire matrix. The transposition controller reads in the data, the processor configures the DMA to store the matrix in the SRAM in the row direction, and the matrix is then read out in the column direction, which completes the transposition.
2) The matrix is larger than the storage space of the SRAM. The transposition controller fills the SRAM with matrix data in the row direction, then configures the external memory addresses in the column direction and outputs the data it has read to the corresponding external memory addresses in turn. It then reads the next part of the original matrix into the SRAM and outputs it to the corresponding external memory addresses. This process is repeated until all data of the matrix has been output, which completes the transposition.
3) The matrix is much larger than the storage space of the SRAM. The matrix can be partitioned into blocks, each exactly the size of the SRAM. The data of one block is read into the SRAM row by row. As in the second mode, the transposition controller outputs the SRAM data to the external memory in the column direction. The second block is then read, written into the SRAM in the row direction, read out in the column direction, and output to the external memory. This process is repeated until all data of the matrix has been output, which completes the transposition.
The above are three algorithms for matrix transposition. The choice among them is made by the DMA under the control of the processor. The main items configured by the processor are the number of matrix rows, the number of matrix columns, and the selected transposition algorithm.
In transposition mode 3), the number of rows H and the number of columns P of the block must be determined. They are decided by $T_1$, the time needed to set up a data channel for data input, and $T_2$, the time needed to set up a data channel for data output. The total time spent setting up data channels when transposing one block is $H \cdot T_1 + P \cdot T_2$. Because $H \cdot P$ is fixed and equal to the storage space of the SRAM, the inequality $H \cdot T_1 + P \cdot T_2 \ge 2\sqrt{H \cdot P \cdot T_1 \cdot T_2}$ can be used to calculate the values of H and P.
The present invention uses SRAM as its memory and implements transposed storage jointly with a microcontroller and a programmable logic device (PLD). With SRAM as the memory, control is simple and read and write operations are easy, and there is no problem of column accesses being slower than row accesses; under the control of the processor, the advantages of SRAM can be fully exploited. Because the transposition is carried out under processor control, the transposition method can be selected flexibly, which makes the invention applicable to matrices of all sizes and gives it good scalability. In addition, the invention uses the DMA as the data channel, which provides a high data transfer rate.
The present invention can be attached to a network-on-chip router through the network interface, as shown in Fig. 1. A data channel to the external memory is set up through the network-on-chip, and data is transferred over it. The present invention is thus a matrix transpose memory controller suited to a network-on-chip.
Embodiment
The present invention is further described below with reference to the accompanying drawings and an embodiment.
A matrix transpose memory controller suited to a network-on-chip is shown in Fig. 2. It comprises a processor 1, a bus 2, a direct memory access unit (DMA) 3, a network interface (NI) 4, an SRAM 5, an interrupt module 6 and a program memory 7. The transposition controller uses a bus architecture: the embedded processor 1 communicates over the bus protocol with functional modules such as the DMA 3, the NI 4, the interrupt module 6 and the program memory 7. The SRAM 5 is connected to the other modules through the DMA 3. The NI 4, besides being connected to the bus 2, is directly connected to the DMA 3 at one end and to the external memory at the other. The transposition memory controller uses an ARM core as the processor and the AHB bus protocol.
Fig. 3 is the data flow diagram of the transposition memory controller of the present invention. Data transfer consists of two processes, data input and data output, both carried out under the control of the processor. During data input, the ARM core directs the DMA to configure the data addresses, and the NI module, configured by the ARM core, reads data from the external memory and stores it in the SRAM at the addresses produced by the DMA. After the data has been read into the SRAM, the ARM core configures the DMA to generate addresses, the data is read out of the SRAM in transposed order, and it is then output through the NI.
As the data flow shows, the transposition is achieved mainly by the DMA generating addresses under processor control, which governs how the SRAM is addressed. Fig. 4 shows how the transposition function is implemented. During data input, the DMA generates addresses and the data is stored at the corresponding SRAM addresses; in this process the SRAM address starts at 0 and increments by one each time. During data output, the DMA generates addresses under the control of the ARM core and reads the corresponding data from the SRAM, which completes the transposition. The control of the transposition therefore lies mainly in the addresses the DMA generates during data output.
Matrix transposition has three modes, determined by the relation between the matrix size and the storage capacity of the SRAM. In this embodiment the SRAM size is 64 Mbit and each data item is 64 bits, so the SRAM can store 1M data items. The matrix has C rows and R columns. The three modes are:
1) The SRAM can hold the entire matrix: 4K*256=1M;
As shown in Fig. 5, with C=4K and R=256 the whole matrix can be read in at once. When the transposition controller reads in the data, the processor configures the DMA to store the matrix in the SRAM in the row direction; the matrix is then read out in the column direction, which completes the transposition.
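The row-direction write followed by a column-direction read can be sketched in a few lines of Python. This is only an illustrative model, not the controller's implementation: the function names `column_order_addresses` and `transpose_in_sram` are hypothetical, and the SRAM is modeled as a plain list whose write addresses count up from 0.

```python
def column_order_addresses(rows, cols):
    """Read addresses that visit a row-major rows x cols matrix column by column.

    Models the addresses the DMA would generate on output (an assumption
    made for illustration; the names here are not from the source).
    """
    return [r * cols + c for c in range(cols) for r in range(rows)]


def transpose_in_sram(data, rows, cols):
    # Input phase: data arrives row by row and is stored at addresses 0, 1, 2, ...
    sram = list(data)
    # Output phase: reading in column order yields the transposed matrix.
    return [sram[addr] for addr in column_order_addresses(rows, cols)]


# A 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] stored row by row.
result = transpose_in_sram([1, 2, 3, 4, 5, 6], rows=2, cols=3)
print(result)  # [1, 4, 2, 5, 3, 6], i.e. [[1, 4], [2, 5], [3, 6]]
```

Only the read-address sequence differs from a straight copy, which is why the transposition can be controlled entirely through the addresses generated on output.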
2) The matrix is larger than the storage space of the SRAM: 4K*1K=4M>1M, C=4K, R=1K. In this case the transposition controller must read from the external memory several times and output several times to complete the transposition. The procedure is shown in Fig. 6 and described in detail below.
A. The SRAM can store 1M data items, so at first only 1M data items of the matrix can be read, in order. The matrix has R=1K columns, and filling the SRAM takes N rows, where R*N=M and M is the SRAM capacity of 1M data items. The first N=1K rows are therefore read into the SRAM.
B. The transposition controller outputs the first column of the data in the SRAM to the external memory. The external memory address is configured to start at A, and N=1K data items are stored sequentially. The controller then reconfigures the external memory address and outputs the N data items of the second column in the SRAM to the external memory. This time the external memory address starts at A plus the row length of the transposed matrix, which equals the number of rows C of the original matrix, that is, at A+C, and increments by one for each data item of the column. (The offset must be C rather than the column count R so that, as step D requires, later groups can continue each output row directly after the data already written.)
C. The remaining columns are transferred in the same way. After R=1K repetitions, the transposition controller has output all the data in the SRAM to the external memory. The data output flow is shown in Fig. 7.
D. The transposition controller then reads the second group of data from the original matrix in the row direction, namely rows N+1 to 2N, filling the SRAM again. Once the data has been read in, the output procedure above is repeated to write the data to the external memory. When each column is stored to the external memory, its address continues directly after the last data item written to that row in the previous output. This continues until all the data in the SRAM has been output.
E. The above process is repeated until all data of the matrix has been output, which completes the transposition.
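Steps A to E can be modeled as follows. This is a minimal sketch under stated assumptions, not the controller's actual logic: the function name `transpose_by_row_strips` is hypothetical, both memories are modeled as Python lists, and the number of rows is assumed to be a multiple of N. The sketch spaces consecutive output columns `rows` words apart (the row length of the transposed matrix), an assumption made so that each later SRAM fill can continue every output row where the previous fill stopped.

```python
def transpose_by_row_strips(src, rows, cols, sram_words, base=0):
    """Transpose a row-major rows x cols matrix using an SRAM of sram_words items.

    Hypothetical model of transposition mode 2: the SRAM is filled with N full
    rows at a time and drained column by column into external memory.
    """
    n = sram_words // cols          # rows per SRAM fill (N, with R*N = M)
    assert n >= 1 and rows % n == 0, "sketch assumes rows is a multiple of N"
    dest = [None] * (base + rows * cols)
    for fill in range(rows // n):
        # Steps A / D: read N consecutive rows into the SRAM in the row direction.
        first = fill * n * cols
        sram = src[first:first + n * cols]
        # Steps B / C: output each SRAM column to its own run of N addresses.
        for c in range(cols):
            start = base + c * rows + fill * n   # continues after the previous fill
            for r in range(n):
                dest[start + r] = sram[r * cols + c]
    return dest


# 4 x 2 matrix [[0, 1], [2, 3], [4, 5], [6, 7]] with an SRAM of 4 items (N = 2).
print(transpose_by_row_strips(list(range(8)), rows=4, cols=2, sram_words=4))
# [0, 2, 4, 6, 1, 3, 5, 7], i.e. [[0, 2, 4, 6], [1, 3, 5, 7]]
```

Each SRAM fill needs one contiguous read but one separate write run per column, which is what makes this mode costly when the matrix has many columns.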
3) The matrix is much larger than the storage space of the SRAM: 4K*8K>>1M, C=4K, R=8K. When the matrix is very large, and especially when its number of columns R is very large, repeatedly accessing the external memory and setting up data channels consumes a great deal of time. The transposition controller can therefore partition the matrix into blocks to complete the transposition.
This transposition algorithm is shown in Fig. 8 and described in detail below.
A. The matrix is partitioned into blocks, each exactly the size of the SRAM. A block has N rows and P columns, with N*P=M. How the number of block rows (denoted H in that calculation) and the number of block columns P are computed is explained further below.
B. The data of one block is read into the SRAM row by row. The transposition controller first reads the P data items of the first row, d(0,0) to d(0,P-1), then the P data items of the second row, d(1,0) to d(1,P-1), and so on up to row N, at which point the SRAM is full.
C. As in the second transposition mode, the transposition controller outputs the data in the SRAM to the external memory in the column direction.
D. The second block is then read: in each row, from the (P+1)-th data item to the 2P-th. The first row runs from d(0,P) to d(0,2P-1), the second row from d(1,P) to d(1,2P-1), and so on up to row N, which runs from d(N-1,P) to d(N-1,2P-1). As before, the data is written into the SRAM in the row direction. Data output is also the same as above, and the address of each column in the external memory continues from where the previous group ended.
E. The above process is repeated until all data of the matrix has been output, which completes the transposition.
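The block-wise procedure can be sketched in the same style. This is an illustrative model only: the function name `transpose_by_blocks` is hypothetical, memories are Python lists, and the matrix dimensions are assumed to be exact multiples of the block dimensions. The output address of each column is computed so that it lands in its row of the transposed matrix, which is one way to realize the "continue from the previous group" rule.

```python
def transpose_by_blocks(src, rows, cols, n, p, base=0):
    """Transpose a row-major rows x cols matrix in n x p blocks (n * p = SRAM size).

    Hypothetical model of transposition mode 3.
    """
    assert rows % n == 0 and cols % p == 0, "sketch assumes exact tiling"
    dest = [None] * (base + rows * cols)
    for block_row in range(rows // n):
        for block_col in range(cols // p):
            # Step B / D: read n row segments of p items each into the SRAM.
            sram = []
            for r in range(n):
                segment = (block_row * n + r) * cols + block_col * p
                sram.extend(src[segment:segment + p])
            # Step C: output the p columns of the block, n items per column.
            for c in range(p):
                start = base + (block_col * p + c) * rows + block_row * n
                for r in range(n):
                    dest[start + r] = sram[r * p + c]
    return dest


# 4 x 4 matrix holding 0..15, transposed in 2 x 2 blocks.
print(transpose_by_blocks(list(range(16)), rows=4, cols=4, n=2, p=2))
# [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]
```

Compared with mode 2, each block needs `n` separate reads but only `p` separate writes, so the block shape trades read setups against write setups.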
In the third transposition algorithm, the number of rows and columns of the block must be calculated. They are determined by $T_1$, the time needed to set up a data channel for data input, and $T_2$, the time needed to set up a data channel for data output. A data channel must be set up each time a row of the block is read from the external memory, and again each time a column is output, so the total time spent setting up data channels when transposing one block is $H \cdot T_1 + P \cdot T_2$. The most suitable H and P are those that minimize $H \cdot T_1 + P \cdot T_2$. Because $H \cdot P = 1\text{M}$ is fixed, the inequality $H \cdot T_1 + P \cdot T_2 \ge 2\sqrt{H \cdot P \cdot T_1 \cdot T_2}$, in which equality holds when $H \cdot T_1 = P \cdot T_2$, can be used to calculate the values of H and P, which are then rounded to obtain the most suitable result. Taking $T_1 = 700$ clock cycles and $T_2 = 400$ clock cycles as an example gives H=1K, P=1K.
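The example figures can be checked numerically. The sketch below is an illustration under stated assumptions, not the patent's procedure: it takes 1M to mean 2^20 data items, computes the unconstrained optimum from the condition H*T1 = P*T2, and then assumes that "rounding" means restricting H and P to powers of two whose product equals the SRAM capacity. The function name `best_block_shape` is hypothetical.

```python
import math


def best_block_shape(sram_words, t1, t2):
    """Return ((ideal_h, ideal_p), (h, p)) for a block with h * p == sram_words.

    ideal_h, ideal_p: real-valued minimizer of h*t1 + p*t2 (where h*t1 == p*t2).
    h, p: best power-of-two pair (the power-of-two rounding is an assumption).
    """
    ideal_h = math.sqrt(sram_words * t2 / t1)
    ideal_p = sram_words / ideal_h
    candidates = []
    h = 1
    while h <= sram_words:
        if sram_words % h == 0:
            candidates.append((h, sram_words // h))
        h *= 2
    best = min(candidates, key=lambda hp: hp[0] * t1 + hp[1] * t2)
    return (ideal_h, ideal_p), best


ideal, rounded = best_block_shape(1 << 20, t1=700, t2=400)
print(ideal)    # roughly (774, 1355)
print(rounded)  # (1024, 1024)
```

Under these assumptions the real-valued optimum lies between the power-of-two shapes 512 x 2K and 1K x 1K, and 1K x 1K gives the lower setup cost, which agrees with the H=1K, P=1K stated above.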
Three matrix transposition algorithms have been described above. The appropriate algorithm is chosen according to the size of the matrix. In the present invention, the selection is carried out by the DMA under the control of the processor. The main items configured by the processor are the number of matrix rows, the number of matrix columns, and the selected transposition algorithm. The DMA applies the algorithm configured by the processor and thereby provides the correct transposition method for the matrix.
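The configuration described here (rows, columns, selected algorithm) can be pictured as a small record plus a selection rule. The sketch below is hypothetical throughout: `TransposeConfig`, `select_mode` and in particular the `max_strip_cols` threshold are assumptions introduced for illustration, since the text does not state where "larger" ends and "much larger" begins. The default threshold of 1K columns is chosen only so that the three example matrices of this embodiment map to modes 1, 2 and 3.

```python
from dataclasses import dataclass

SRAM_WORDS = 1 << 20  # 64 Mbit SRAM / 64-bit data items, taking 1M as 2**20


@dataclass
class TransposeConfig:
    """Items the processor configures (field names are hypothetical)."""
    rows: int
    cols: int
    mode: int  # 1, 2 or 3


def select_mode(rows, cols, sram_words=SRAM_WORDS, max_strip_cols=1024):
    """Pick a transposition mode; max_strip_cols is an assumed threshold."""
    if rows * cols <= sram_words:
        mode = 1          # whole matrix fits in the SRAM
    elif cols <= max_strip_cols:
        mode = 2          # fill the SRAM with full rows
    else:
        mode = 3          # partition into SRAM-sized blocks
    return TransposeConfig(rows, cols, mode)


for shape in [(4096, 256), (4096, 1024), (4096, 8192)]:
    print(select_mode(*shape))
```

A real selection rule would more likely compare the channel setup costs of modes 2 and 3 rather than use a fixed column threshold.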
The time needed for transposition in the three cases is calculated below.
(1) 4K*256 matrix: the transposition time comprises one data channel setup for reading, one data channel setup for writing, and the time to input and output the data, so the number of clock cycles needed is:
N1=2M+700+400
(2) 4K*1K matrix: the transposition controller must fill and drain the SRAM four times (4M/1M). Each time, the time consumed comprises one data channel setup for reading, 1K data channel setups for writing, and the time to input and output the data, so the number of clock cycles needed is:
N2=(2M+700+400*1K)*4
(3) 4K*8K matrix: the transposition controller must fill and drain the SRAM 32 times. Each time, the time consumed comprises 1K data channel setups for reading, 1K data channel setups for writing, and the time to input and output the data, so the number of clock cycles needed is:
N3=(2M+700*1K+400*1K)*32
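The three formulas can be evaluated directly. The sketch below assumes that K and M denote 2^10 and 2^20 (consistent with a 64 Mbit SRAM holding 1M 64-bit items) and uses the example setup times of 700 and 400 clock cycles; the helper name `transpose_cycles` is hypothetical.

```python
K = 1 << 10
M = 1 << 20
T1 = 700   # cycles to set up a data channel for input (example value)
T2 = 400   # cycles to set up a data channel for output (example value)


def transpose_cycles(fills, read_setups, write_setups, words=M):
    """Clock cycles for `fills` SRAM fill-and-drain passes.

    Each pass moves `words` items in and `words` items out (the 2M term)
    and pays the given number of read and write channel setups.
    """
    return (2 * words + read_setups * T1 + write_setups * T2) * fills


n1 = transpose_cycles(fills=1, read_setups=1, write_setups=1)    # 4K*256
n2 = transpose_cycles(fills=4, read_setups=1, write_setups=K)    # 4K*1K
n3 = transpose_cycles(fills=32, read_setups=K, write_setups=K)   # 4K*8K
print(n1, n2, n3)  # 2098252 10029808 103153664
```

In the third case the channel setups account for roughly a third of each pass (about 1.1M of 3.2M cycles), which is why the block shape is chosen to minimize them.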
The present invention is attached to a network-on-chip router through the network interface, sets up a data channel to the external memory through the network-on-chip, and transfers data over it.
The present invention uses SRAM as its memory and implements transposed storage jointly with a microcontroller and a programmable logic device (PLD). With SRAM as the memory, control is simple and read and write operations are easy, and there is no problem of column accesses being slower than row accesses; under the control of the processor, the advantages of SRAM can be fully exploited. Because the transposition is carried out under processor control, the transposition method can be selected flexibly, which makes the invention applicable to matrices of all sizes and gives it good scalability. In addition, the invention uses the DMA as the data channel, which provides a high data transfer rate.