CN102508803A

CN102508803A - Matrix transposition memory controller

Info

Publication number: CN102508803A
Application number: CN2011103936607A
Authority: CN
Inventors: 李丽; 潘红兵; 郑艳丽; 王佳文; 沙金; 何书专; 郑维山
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2011-12-02
Filing date: 2011-12-02
Publication date: 2012-06-20

Abstract

The invention discloses a matrix transpose memory controller comprising a processor, a bus, a direct memory access (DMA) unit, a network interface, a storage unit, an interrupt unit and a program memory. The processor exchanges data with the DMA unit, the network interface, the interrupt unit and the program memory over the bus; the storage unit is connected to the bus and the network interface through the DMA unit; the network interface is connected to the bus and, through the network-on-chip, to the external memory. The invention uses SRAM as the storage unit, which is simple to control and easy to read and write, and under processor control the advantages of SRAM as a memory can be fully exploited. The invention provides three transposition modes. Because the transposition is carried out under processor control, the transposition method can be selected flexibly, matrices of various sizes can be transposed, and the design scales well. The invention also uses DMA as the data channel, which provides a high data transfer rate. The invention is suitable for a network-on-chip.

Description

A matrix transpose memory controller

Technical field

The present invention relates to a transposition memory controller for matrix transposition, and in particular to one that uses hardware/software co-design so that the transposition algorithm can be selected flexibly for matrices of various sizes. Specifically, it is a matrix transpose memory controller suitable for a network-on-chip.

Background technology

Matrices are widely used as a data structure in fields such as engineering computation and image and signal processing, and processing them often involves matrix transposition. The speed of the transposition strongly affects data-processing efficiency.

For large matrices, reading the matrix directly into a cache requires frequent accesses to external memory, which greatly reduces transposition efficiency. Matrix transposition is therefore generally implemented in hardware.

When SDRAM is used as the memory, its column read/write speed is very low, so each memory access costs a great deal of time. SRAM, on the other hand, has low integration density and a high price. Choosing the right memory and using it to transpose matrices efficiently is a common problem in matrix transposition.

At present, matrix transposition is mainly implemented either by using a programmable logic device (PLD) to control the storage unit or by using a microprocessor to control the transposing storage. With the former, the transposition control is fairly complex; with the latter, the data transfer rate is often low. Combining the two approaches sensibly can merge the advantages of both. Furthermore, existing implementations choose a transposition method only after the matrix size and the memory size have been fixed, so these controllers handle matrices of one fixed size and cannot flexibly select a transposition algorithm for matrices of various sizes.

Network-on-chip (NoC) is a design methodology for systems-on-chip. NoC-based systems adapt well to the globally asynchronous, locally synchronous clocking commonly used in today's complex system-on-chip designs. The NoC approach brings a new on-chip communication method that significantly improves on the performance of traditional bus-based systems, and it is regarded as the inevitable direction of multi-core technology under future integration processes.

Summary of the invention

The purpose of the invention is to provide a matrix transpose memory controller that uses hardware/software co-design, can flexibly select the transposition method for matrices of various sizes, and provides a high data transfer rate.

The purpose of the invention is achieved by the following technical scheme:

A matrix transpose memory controller, characterized in that it comprises a processor, a bus, a direct memory access (DMA) unit, a network interface (NI), a storage unit (SRAM), an interrupt unit and a program memory. The processor exchanges data with the DMA unit, the network interface, the interrupt unit and the program memory over the bus; the storage unit is connected to the bus and the network interface through the DMA unit; the network interface is connected to the bus and, through the network-on-chip, to the external memory.

In the invention, data transfer consists of two processes, data input and data output, both carried out under the control of the processor. During data input, the processor has the DMA configure the data addresses, and the NI, also under processor control, reads data from the external memory and stores it in the SRAM at the addresses generated by the DMA. After the data has been read into the SRAM, the processor configures the DMA to generate addresses, the data is read out of the SRAM in transposed order, and the data is then output through the NI.

Matrix transposition has three modes, determined by the relation between the matrix size and the storage capacity of the storage unit:

1) The SRAM can hold all the data of the matrix. The transposition controller reads the data in, the processor configures the DMA to store the matrix in the SRAM in row order, and the matrix is then read out in column order, which completes the transposition;

2) The matrix is larger than the storage space of the SRAM. The transposition controller fills the SRAM by reading matrix data in row order, then configures the external memory addresses in column order and outputs the data just read to the corresponding external memory addresses in turn. It then reads the next part of the original matrix into the SRAM and outputs it to the corresponding external memory addresses. This process is repeated until all the data of the matrix has been output, which completes the transposition;

3) The matrix is much larger than the storage space of the SRAM. The matrix can be partitioned into blocks, each exactly the size of the SRAM storage space. All the data of one block is read into the SRAM row by row; as in the second mode, the transposition controller then outputs the SRAM data to the external memory in column order. The second block is then read, written to the SRAM in row order, read out in column order and output to the external memory. This process is repeated until all the data of the matrix has been output, which completes the transposition.

These are the three transposition algorithms. The algorithm is selected by the DMA under processor control. The processor mainly configures the number of matrix rows, the number of matrix columns and the choice of transposition algorithm.

In transposition mode 3), the number of rows H and the number of columns P of the block must be determined. They are determined by T1, the time needed to set up a data channel for data input, and T2, the time needed to set up a data channel for data output. The total time spent setting up data channels when transposing one block is H*T1+P*T2. Because H*P is fixed and equal to the storage space of the SRAM, the inequality

$$H T_1 + P T_2 \ge 2\sqrt{H T_1 \cdot P T_2}$$

in which equality holds when $H T_1 = P T_2$, can be used to calculate the values of H and P.
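Assuming the inequality meant here is the arithmetic-geometric mean inequality, the minimisation can be written out as follows. The symbol M for the fixed product H*P (the SRAM capacity in data words) is introduced here for illustration only.

```latex
\begin{aligned}
H T_1 + P T_2 &\ge 2\sqrt{H T_1 \cdot P T_2} = 2\sqrt{M\,T_1 T_2}, \qquad M = H P,\\
\text{equality} &\iff H T_1 = P T_2
\;\Longrightarrow\;
H = \sqrt{\frac{M\,T_2}{T_1}}, \qquad P = \sqrt{\frac{M\,T_1}{T_2}}.
\end{aligned}
```

The right-hand side does not depend on how M is split into H and P, so the set-up time is smallest when the two terms are balanced; the real-valued H and P are then rounded to usable sizes.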

The invention uses SRAM as the memory and implements the transposing storage with a microcontroller and a PLD working together. SRAM is simple to control and easy to read and write, and it does not suffer from column accesses being slower than row accesses; under processor control its advantages as a memory can be fully exploited. Because the matrix transposition is carried out under processor control, the transposition method can be selected flexibly, matrices of various sizes can be transposed, and the design scales well. The invention also uses DMA as the data channel, which provides a high data transfer rate.

The invention can be attached to a network-on-chip router through the network interface, as shown in Fig. 1. A data channel to the external memory is set up through the network-on-chip for data transfer. The invention is a matrix transpose memory controller suitable for a network-on-chip.

Description of drawings

Fig. 1 is a schematic diagram of the invention attached to a network-on-chip;

Fig. 2 is a structural diagram of the invention;

Fig. 3 is the data flow diagram of the transposition memory controller of the invention;

Fig. 4 shows how the transposition function is implemented;

Fig. 5 is a schematic diagram of transposition algorithm one;

Fig. 6 is a schematic diagram of transposition algorithm two;

Fig. 7 is the data output flowchart of transposition algorithm two;

Fig. 8 is a schematic diagram of transposition algorithm three.

Embodiment

The invention is further described below with reference to the drawings and an embodiment.

A matrix transpose memory controller suitable for a network-on-chip, as shown in Fig. 2, comprises a processor 1, a bus 2, a direct memory access (DMA) unit 3, a network interface (NI) 4, an SRAM 5, an interrupt unit 6 and a program memory 7. The transposition controller uses a bus structure: the embedded processor 1 communicates with functional modules such as the DMA 3, the NI 4, the interrupt unit 6 and the program memory 7 through the bus protocol. The SRAM 5 is connected to the other modules through the DMA 3. Besides being connected to the bus 2, the NI 4 has one end connected directly to the DMA 3 and the other end connected to the external memory. The transposition memory controller uses an ARM core as the processor and the AHB bus protocol.

Fig. 3 is the data flow diagram of the transposition memory controller of the invention. Data transfer consists of two processes, data input and data output, both carried out under the control of the processor. During data input, under the control of the ARM core, the DMA configures the data addresses, and the NI module, configured by the ARM core, reads data from the external memory and stores it in the SRAM at the addresses generated by the DMA. After the data has been read into the SRAM, the ARM core configures the DMA to generate addresses, the data is read out of the SRAM in transposed order, and the data is then output through the NI.

As the data flow shows, the matrix is transposed mainly by having the DMA, under processor control, generate the addresses that drive the SRAM; Fig. 4 shows how this transposition function is implemented. During data input, the DMA generates addresses and the data is stored at the corresponding SRAM addresses; in this process the SRAM address starts at 0 and increments by one each time. During data output, the DMA generates addresses under ARM control and reads the corresponding data from the SRAM, which completes the transposition. Control of the transposition therefore lies mainly in the addresses the DMA generates during data output.
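The address generation described above can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware design: the function names `write_addresses` and `read_addresses` and the 2x3 example are assumptions made for the sketch.

```python
def write_addresses(rows, cols):
    """Input phase: SRAM addresses count up from 0 in arrival (row) order."""
    return list(range(rows * cols))


def read_addresses(rows, cols):
    """Output phase: walk the stored matrix column by column."""
    return [r * cols + c for c in range(cols) for r in range(rows)]


# Hypothetical 2x3 matrix whose elements a..f arrive row by row.
sram = [None] * 6
for addr, value in zip(write_addresses(2, 3), "abcdef"):
    sram[addr] = value

# Reading with the column-order addresses yields the transposed matrix.
transposed = [sram[addr] for addr in read_addresses(2, 3)]
```

Only the read-side address sequence differs from a plain copy, which matches the statement that the transposition is controlled by the addresses generated during data output.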

Matrix transposition has three modes, determined by the relation between the matrix size and the storage capacity of the SRAM. The SRAM size is chosen as 64 Mbit and each data word is 64 bits, so the SRAM can store 1M data words. The matrix has C rows and R columns. The three modes are:

1) The SRAM can hold all the data of the matrix: 4K*256=1M;

As shown in Fig. 5, with C=4K and R=256 the whole matrix can be read in. After the transposition controller reads the data in, the processor configures the DMA to store the matrix in the SRAM in row order; reading the matrix out in column order then completes the transposition.

2) The matrix is larger than the storage space of the SRAM: 4K*1K=4M>1M, C=4K, R=1K. In this case the transposition controller must read from the external memory several times and output several times to complete the transposition. The procedure, shown in Fig. 6, is as follows.

A. The SRAM can store 1M data words, so at first only 1M words of the matrix can be read, in order. The matrix has R=1K columns, and filling the SRAM takes N rows, where R*N=M and M is the SRAM capacity in data words. That is, N=1K rows are first read into the SRAM.

B. The transposition controller outputs the first column of the SRAM to the external memory. The external memory address is configured to start at A, and N=1K data words are stored sequentially. The controller then reconfigures the external memory address and outputs the N data words of the second column of the SRAM to the external memory. This time the external memory address starts at A plus the number of columns of the original matrix, i.e. A+R, and increments by one for each word of the column.

C. The remaining columns are transferred in the same way; after R=1K repetitions the transposition controller has output all the data in the SRAM to the external memory. The data output flow is shown in Fig. 7.

D. The transposition controller then reads the second group of data from the original matrix in row order, i.e. rows N+1 to 2N of the original matrix, filling the SRAM again. After the data has been read in, the output procedure above is repeated to write the data to the external memory. When each column is stored to the external memory, its address continues immediately after the last word written for that column by the previous output, and so on until all the data has been output.

E. The above process is repeated until all the data of the matrix has been output, which completes the transposition.
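Steps A to E can be modelled in software roughly as below. This is a minimal sketch under stated assumptions: `transpose_mode2` is a hypothetical name, plain Python lists stand in for the external memory and the SRAM, and SRAM column c is placed at destination offset c*rows (the length of one transposed row). That stride is an assumption of the sketch, chosen so that each later group continues directly after the previous one, as step D requires.

```python
def transpose_mode2(src, rows, cols, sram_words):
    """Transpose a row-major matrix in groups of full rows that fit the SRAM."""
    group_rows = sram_words // cols        # N: rows per SRAM fill (R*N = M)
    dst = [None] * (rows * cols)           # stands in for the external memory
    for first_row in range(0, rows, group_rows):
        n = min(group_rows, rows - first_row)
        # Input: one contiguous burst of n full rows fills the SRAM.
        sram = src[first_row * cols:(first_row + n) * cols]
        # Output: one burst per SRAM column, written right after the words
        # that the previous group stored for the same column.
        for c in range(cols):
            base = c * rows + first_row
            for i in range(n):
                dst[base + i] = sram[i * cols + c]
    return dst


# Hypothetical small case: 4x2 matrix, SRAM holds 4 words (2 rows per group).
src = list(range(8))
result = transpose_mode2(src, rows=4, cols=2, sram_words=4)
```

Each group needs a single read burst because its rows are contiguous in the source, but one write burst per column, which is where the per-column channel set-up cost of this mode comes from.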

3) The matrix is much larger than the storage space of the SRAM: 4K*8K>>1M, C=4K, R=8K. When the matrix is too large, and especially when its column count R is too large, repeatedly accessing the external memory and setting up data channels consumes a great deal of time. The transposition controller can therefore partition the matrix into blocks to complete the transposition.

This transposition algorithm, shown in Fig. 8, is described below.

A. The matrix is partitioned into blocks, each exactly the size of the SRAM storage space. Each block has N rows and P columns, with N*P=M. How the block row count (denoted H in the calculation) and column count P are computed is described further below.

B. The data of one block is read into the SRAM row by row. The transposition controller first reads the P data words d(0,0) to d(0,P-1) of the first row, then the P data words d(1,0) to d(1,P-1) of the second row, and so on up to row N, filling the SRAM.

C. As in the second transposition mode, the transposition controller outputs the data in the SRAM to the external memory in column order.

D. The second block is then read, starting at the (P+1)-th data word of each row and ending at the 2P-th.

The first row runs from d(0,P) to d(0,2P-1), the second row from d(1,P) to d(1,2P-1), and so on up to row N, which runs from d(N-1,P) to d(N-1,2P-1). The above process is repeated, writing the data to the SRAM in row order. Data output is also the same as above: the external memory address of each column continues from where the previous group ended.
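The block-wise procedure can be sketched as follows. It is an illustrative software model only: `transpose_mode3` and its parameter names are hypothetical, lists stand in for the external memory and the SRAM, and the destination offsets assume the transposed matrix is stored row by row.

```python
def transpose_mode3(src, rows, cols, blk_rows, blk_cols):
    """Transpose a row-major matrix one SRAM-sized block at a time."""
    dst = [None] * (rows * cols)           # stands in for the external memory
    for r0 in range(0, rows, blk_rows):
        for c0 in range(0, cols, blk_cols):
            h = min(blk_rows, rows - r0)
            p = min(blk_cols, cols - c0)
            # Input: one burst per block row, stored in the SRAM in row order.
            sram = []
            for r in range(r0, r0 + h):
                start = r * cols + c0
                sram.extend(src[start:start + p])
            # Output: one burst per block column, read from the SRAM by column.
            for c in range(p):
                base = (c0 + c) * rows + r0
                for i in range(h):
                    dst[base + i] = sram[i * p + c]
    return dst


# Hypothetical 3x4 matrix holding 0..11, transposed with 2x2 blocks.
src = list(range(12))
result = transpose_mode3(src, rows=3, cols=4, blk_rows=2, blk_cols=2)
```

Unlike the second mode, every block row is a separate read burst because a block row is only part of a matrix row; in exchange, the number of write bursts per SRAM fill drops from the full column count to the block column count.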

In the third transposition algorithm, the row count and column count of the block must be calculated. They are determined by T1, the time needed to set up a data channel for data input, and T2, the time needed to set up a data channel for data output. A data channel must be set up each time a block row is read from the external memory, and again each time a column is output, so the total channel set-up time for transposing one block is H*T1+P*T2. The most suitable H and P are those that minimise H*T1+P*T2. Because H*P=1M is fixed, the inequality

$$H T_1 + P T_2 \ge 2\sqrt{H T_1 \cdot P T_2}$$

in which equality holds when $H T_1 = P T_2$, gives the values of H and P, which are then rounded to obtain the most suitable result. Taking T1=700 clock cycles and T2=400 clock cycles as an example gives H=1K, P=1K.
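The worked example can be checked numerically. The sketch below assumes that the rounding step means restricting H and P to powers of two whose product equals the SRAM capacity; the text does not state the rounding rule, and `best_block_shape` is a hypothetical helper.

```python
import math


def best_block_shape(sram_words, t_read, t_write):
    """Search power-of-two H with H*P == sram_words for the lowest set-up time.

    Assumes sram_words is itself a power of two.
    """
    best = None
    h = 1
    while h <= sram_words:
        p = sram_words // h
        cost = h * t_read + p * t_write    # H*T1 + P*T2
        if best is None or cost < best[0]:
            best = (cost, h, p)
        h *= 2
    return best[1], best[2]


M = 1 << 20                                # 1M data words
T1, T2 = 700, 400                          # example set-up times from the text

# Real-valued optimum from the balance condition H*T1 == P*T2.
h_ideal = math.sqrt(M * T2 / T1)
p_ideal = math.sqrt(M * T1 / T2)

# Nearest practical choice under the power-of-two assumption.
h, p = best_block_shape(M, T1, T2)
```

Under that assumption the search returns H=1024 and P=1024, in line with the H=1K, P=1K of the example, while the unrounded optimum lies near H=774 and P=1355.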

The three matrix transposition algorithms have now been described. The appropriate algorithm is chosen according to the size of the matrix. In the invention the algorithm is selected by the DMA under processor control. The processor mainly configures the number of matrix rows, the number of matrix columns and the choice of transposition algorithm; the DMA applies the algorithm configured by the processor and so provides the correct transposition method for the matrix.
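How the processor might pick the algorithm before configuring the DMA can be sketched as below. The text only says the choice depends on matrix size, so the tie-break between modes 2 and 3 (comparing channel set-up time per SRAM fill) is an assumed policy, and `select_mode` with its parameters is hypothetical.

```python
def select_mode(rows, cols, sram_words, t_read, t_write, blk_rows, blk_cols):
    """Return 1, 2 or 3; the mode 2 versus mode 3 rule is an assumed policy."""
    if rows * cols <= sram_words:
        return 1                           # the whole matrix fits in the SRAM
    # Channel set-up time per SRAM fill, using the counts given in the text:
    # mode 2 opens one read channel plus one write channel per matrix column;
    # mode 3 opens one read channel per block row and one write channel per
    # block column.
    mode2_setup = t_read + cols * t_write
    mode3_setup = blk_rows * t_read + blk_cols * t_write
    return 2 if mode2_setup <= mode3_setup else 3


M = 1 << 20                                # 1M-word SRAM from the embodiment
K = 1 << 10
modes = [
    select_mode(4 * K, 256, M, 700, 400, K, K),     # 4K*256 matrix
    select_mode(4 * K, K, M, 700, 400, K, K),       # 4K*1K matrix
    select_mode(4 * K, 8 * K, M, 700, 400, K, K),   # 4K*8K matrix
]
```

With the example figures this policy reproduces the assignment used in the embodiment: mode 1 for the 4K*256 matrix, mode 2 for 4K*1K and mode 3 for 4K*8K.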

The time needed for transposition in the three cases is calculated below.

(1) 4K*256 matrix. The transposition time comprises setting up a data channel once for reading, setting up a data channel once for writing, and the time to input and output the data, so the number of clock cycles needed is:

N1=2M+700+400

(2) 4K*1K matrix. The transposition controller needs to move data into and out of the SRAM eight times in total, that is, four input passes each followed by an output pass. Each such pass pair comprises setting up one data channel for reading, setting up 1K data channels for writing, and inputting and outputting the data, so the number of clock cycles needed is:

N2=(2M+700+400*1K)*4

(3) 4K*8K matrix. The transposition controller needs to fill and drain the SRAM 32 times. Each time comprises setting up 1K data channels for reading, setting up 1K data channels for writing, and inputting and outputting the data, so the number of clock cycles needed is:

N3=(2M+700*1K+400*1K)*32
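The three cycle counts can be evaluated directly. The sketch below simply plugs in the example values T1=700 and T2=400 clock cycles and takes 1K=1024 and 1M=1024*1024; the variable names are chosen here for illustration.

```python
K = 1 << 10
M = 1 << 20                                # SRAM capacity in data words
T1, T2 = 700, 400                          # read / write channel set-up cycles

n1 = 2 * M + T1 + T2                       # 4K*256: one fill, one drain
n2 = (2 * M + T1 + T2 * K) * 4             # 4K*1K: four fills, 1K writes each
n3 = (2 * M + T1 * K + T2 * K) * 32        # 4K*8K: 32 blocks of 1K x 1K
```

Evaluated this way, N1 is 2,098,252 cycles, N2 is 10,029,808 cycles and N3 is 103,153,664 cycles. Channel set-up is a negligible share of N1 but roughly a third of the per-block cost in N3.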

The invention is attached to a network-on-chip router through the network interface and sets up a data channel to the external memory through the network-on-chip for data transfer.

Claims

1. A matrix transpose memory controller, characterized in that: the matrix transpose memory controller comprises a processor (1), a bus (2), a direct memory access unit (3), a network interface (4), a storage unit (5), an interrupt unit (6) and a program memory (7); the processor (1) exchanges data with the direct memory access unit (3), the network interface (4), the interrupt unit (6) and the program memory (7) through the bus (2); the storage unit (5) is connected to the bus (2) and the network interface (4) through the direct memory access unit (3); the network interface (4) is connected to the bus (2) and is connected to the external memory through the network-on-chip.

2. The matrix transpose memory controller according to claim 1, characterized in that: the processor (1) controls data transfer, which comprises two processes, data input and data output; during data input, under the control of the processor (1), the direct memory access unit (3) configures the data addresses, and the network interface (4) reads data from the external memory and stores it in the storage unit (5) at the addresses generated by the direct memory access unit (3); after the data has been read into the storage unit (5), the processor (1) configures the direct memory access unit (3) to generate addresses, the data is read from the storage unit (5) so that the matrix read in is transposed, and the data is output through the network interface (4).

3. The matrix transpose memory controller according to claim 1, characterized in that: matrix transposition has three modes, determined by the relation between the matrix size and the storage space of the storage unit (5), namely:

1) the storage unit (5) can hold all the data of the matrix; the transposition controller reads the data in, the processor (1) configures the direct memory access unit (3) to store the matrix in the storage unit (5) in row order, and the matrix is then read out in column order, completing the transposition;

2) the matrix is larger than the storage space of the storage unit (5); the transposition controller fills the storage unit (5) by reading matrix data in row order, then configures the external memory addresses in column order and outputs the data read in to the corresponding external memory addresses in turn; it then reads the next part of the original matrix into the storage unit (5) and outputs it to the corresponding external memory addresses; this process is repeated until all the data of the matrix has been output, completing the transposition;

3) the matrix is much larger than the storage space of the storage unit (5); the matrix is partitioned into blocks, each exactly the size of the storage space of the storage unit (5); all the data of one block is read into the storage unit (5) row by row; as in mode 2), the transposition controller outputs the data of the storage unit (5) to the external memory in column order; the second block is then read, written to the storage unit (5) in row order, read out in column order and output to the external memory; this process is repeated until all the data of the matrix has been output, completing the transposition.

4. The matrix transpose memory controller according to claim 3, characterized in that: the processor (1) controls the direct memory access unit (3) to select the transposition algorithm; the content configured by the processor (1) includes the number of matrix rows, the number of matrix columns and the choice of matrix transposition algorithm.

5. The matrix transpose memory controller according to claim 3, characterized in that: in transposition mode 3), the number of rows H and the number of columns P of the block are determined by T1, the time needed to set up a data channel during data input, and T2, the time needed to set up a data channel during data output; the total time spent setting up data channels for matrix transposition is H*T1+P*T2; H*P is fixed and equal to the storage space of the storage unit SRAM (5); using the inequality

$$H T_1 + P T_2 \ge 2\sqrt{H T_1 \cdot P T_2}$$

the values of H and P are obtained and then rounded to get the most suitable result.