[go: up one dir, main page]

CN109727187B - Method and device for adjusting storage position of multiple region of interest data - Google Patents

Method and device for adjusting storage position of multiple region of interest data Download PDF

Info

Publication number
CN109727187B
CN109727187B CN201910005972.2A CN201910005972A CN109727187B CN 109727187 B CN109727187 B CN 109727187B CN 201910005972 A CN201910005972 A CN 201910005972A CN 109727187 B CN109727187 B CN 109727187B
Authority
CN
China
Prior art keywords
data
roi
address
region
local address
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910005972.2A
Other languages
Chinese (zh)
Other versions
CN109727187A (en
Inventor
李军
李建军
张鑫语
李晓森
黄畅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Horizon Robotics Technology Research and Development Co Ltd
Original Assignee
Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Horizon Robotics Technology Research and Development Co Ltd filed Critical Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority to CN201910005972.2A priority Critical patent/CN109727187B/en
Publication of CN109727187A publication Critical patent/CN109727187A/en
Application granted granted Critical
Publication of CN109727187B publication Critical patent/CN109727187B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A method and apparatus for adjusting storage locations of a plurality of region of interest data is disclosed. The method may include: determining a plurality of local address ranges corresponding to the address ranges of the storage areas storing the plurality of region of interest data to be adjusted according to the scores of the plurality of region of interest data to be adjusted; determining, for first data that is region of interest data to be currently adjusted, a first address within a first local address range of a plurality of local address ranges to be adjusted; storing second data at the first address from the storage area into the buffer memory; and storing the first data at a first address in the storage area. By the method and the device according to the embodiment of the disclosure, the in-situ adjustment of the storage positions of the plurality of region-of-interest data can be realized efficiently.

Description

Method and device for adjusting storage position of multiple region of interest data
Technical Field
The present disclosure relates generally to the field of artificial intelligence, and in particular to a method and apparatus for adjusting storage locations of a plurality of region of interest data.
Background
One or more objects of interest in the input image or video may be detected using methods such as RCNN (Regions with CNN features), spatial pyramid pooling network (Spatial Pyramid Pooling Network, SPP-Net), fast RCNN (Fast RCNN), faster RCNN (Faster RCNN), etc., which require that multiple regions of interest (Region of Interest, ROIs) be generated (or recommended) first and then subsequent processing be performed based on the generated ROI data.
In the ROI-based object detection method, it is necessary to readjust the storage position of each generated ROI data according to the score of the ROI.
The usual space-time complexity for controlling the process of adjusting the position of the ROI data is high and is not suitable for efficient implementation on artificial intelligence chips with limited hardware resources.
Disclosure of Invention
According to one aspect of the present disclosure, a method and apparatus for adjusting storage locations of a plurality of region of interest data is provided. The method may include: determining a plurality of local address ranges corresponding to the address ranges of the storage areas storing the plurality of region of interest data to be adjusted according to the scores of the plurality of region of interest data to be adjusted; determining, for first data that is region of interest data to be currently adjusted, a first address within a first local address range of a plurality of local address ranges to be adjusted; storing second data at the first address from the storage area into the buffer memory; and storing the first data at a first address in the storage area.
According to another aspect of the present disclosure, there is also provided an apparatus for adjusting storage locations of a plurality of region of interest data. The apparatus may comprise a buffer memory and a processor, wherein the buffer memory may be configured to store the region of interest data to be currently adjusted, and the processor may be configured to perform at least the above method at start-up.
According to another aspect of the present disclosure, there is also provided a computer readable non-transitory storage medium having stored thereon program instructions that, when executed, perform at least the above-described method.
By the method and the device according to the embodiment of the disclosure, the in-situ adjustment of the positions of the plurality of region-of-interest data can be realized with high efficiency.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing embodiments thereof in more detail with reference to the accompanying drawings. The accompanying drawings are included to provide a further understanding of embodiments of the disclosure, and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure, without limitation to the disclosure. In the drawings, like reference numerals generally refer to like parts or steps.
Fig. 1 illustrates an example of a storage region SR storing N pieces of ROI data according to an embodiment of the present disclosure.
Fig. 2 illustrates an example of a method for adjusting a storage location of ROI data according to an embodiment of the present disclosure.
Fig. 3 shows an example of a procedure in performing the example method of fig. 2 for N ROI data as shown in fig. 1.
Fig. 4 shows an example of performing step S130 and step S140 in the example method of fig. 2.
Fig. 5 shows an example of a procedure in performing the example method of fig. 2 for N ROI data as shown in fig. 1.
Fig. 6 illustrates an example of an apparatus for adjusting a storage location of ROI data according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, example embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present disclosure and not all of the embodiments of the present disclosure, and that the present disclosure is not limited by the example embodiments described herein.
SUMMARY
In multi-stage object detection methods such as RCNN, SPP-Net, fast RCNN, and Fast RCNN, which are excellent in detection effect, it is generally necessary to sort a large number (for example, thousands, tens of thousands, or even hundreds of thousands) of ROIs generated, that is, adjust the storage positions of ROI data in a memory.
In a typical scheme for controlling the adjustment of the storage location of ROI data, it is generally desirable that the memory be configured to provide a storage space that is greater than or equal to twice the total data amount of ROI data, wherein the memory may include the aforementioned buffer memory and/or memory for storing the initial ROI data and the final ordering result.
For artificial intelligence chips, only low capacity on-chip memory is typically configured due to various considerations and/or limitations of hardware cost and chip space, e.g., static memory or cache memory, etc., which may range in total from a few hundred kilobytes to a few megabytes. Thus, artificial intelligence chips cannot independently order ROI data in amounts up to hundreds of thousands or tens of thousands without data exchange with off-chip memory.
Even if only thousands of ROI data are ordered, in an artificial intelligence chip, more on-chip memory resources will be occupied and even more on-chip memory will need to be provided, since it is necessary to allocate in on-chip memory, for example, a memory area having at least the same capacity as a memory area storing all ROI data, which will increase the hardware cost of the artificial intelligence chip.
In addition, an artificial intelligence chip may be fitted into a terminal device such as a cellular phone, a navigator, or the like, in order to detect one or more objects of interest in an image or video captured, for example, via a camera or the like on the terminal device. In such a case, the artificial intelligence chip may be controlled to store the generated ROI data into an off-chip memory provided inside the terminal device and outside the artificial intelligence chip, and even the ordering of the ROI data stored in the off-chip memory may be controlled to be performed by a processor in the terminal device.
However, terminal devices such as cell phones, navigators, etc. often also have high demands on processing efficiency, power consumption, hardware costs, etc. The higher spatial complexity of a typical ROI ordering scheme (e.g., requiring a buffer area of the same size as the storage area in memory storing all ROI data) will result in higher hardware costs for the terminal device, e.g., requiring more memory to be configured.
Therefore, it is always desirable to be able to reduce the hardware cost of an artificial intelligence chip or a terminal device equipped with the artificial intelligence chip as much as possible while ensuring the efficiency of execution and the accuracy and precision of the result.
The method and apparatus for adjusting a storage location of ROI data according to embodiments of the present disclosure aim to at least partially solve or alleviate one or more of the above technical problems, so that an artificial intelligence chip or an artificial intelligence chip-equipped terminal device can efficiently implement in-place adjustment of a storage location of ROI data without having to configure a cache region whose capacity may need to be greater than or equal to a storage region in a memory for storing all ROI data.
Exemplary method
Fig. 1 shows N ROI data ROIs that have been generated and stored consecutively in a storage region SR of a memory 1 、ROI 2 、……、ROI N Where N is an integer greater than 1, such as hundreds of thousands, tens of thousands or thousands, and the ROI 1 、ROI 2 、……、ROI N Addresses ADDR stored in the storage areas SR, respectively 1 、ADDR 2 、……、ADDR N Where it is located. In other words, N ROI data ROIs 1 、ROI 2 、……、ROI N Continuously stored in address space ADDR corresponding to continuity 1 To ADDR N (also referred to herein as a "full address range") in the storage region SR.
Here, since each ROI data may generally include a plurality of data items (e.g., coordinates, width, height, score, etc. of the ROI), the address ADDR in the region SR is stored 1 、ADDR 2 、……、ADDR N Corresponding to the first address of the unit memory area for storing each ROI data in the memory area SR, instead of the first address of each physical memory unit of the memory area SR. Herein, for simplicity, it is assumed that each ROI data occupies only one memory cell of the memory region SR, or only one memory line or cache line or buffer of the memory region SR in the case where the memory region SR is a two-dimensional memory region determined according to a two-dimensional memory.
The storage positions of the respective ROI data in the storage region SR need to be adjusted according to the scores of the respective ROI data so that, in one order, for the respective storage in ADDR after adjustment i And ADDR i+1 Is of the ROI of (2) i And ROI i+1 (i is any integer greater than or equal to 1 and less than N), ROI i Score of less than or equal to ROI i+1 For the scores of (a) or, in another order, for the scores stored separately in ADDR after adjustment i And ADDR i+1 Is of the ROI of (2) i And ROI i+1 ,ROI i Score of greater than or equal to ROI i+1 Is a score of (2).
FIG. 2 illustrates a method for adjusting ROI in accordance with an embodiment of the present disclosure 1 、ROI 2 、……、ROI N An example of a method of storing a location in an SR, the example method may include steps S110, S120, S130, and S140.
In step S110, the ROI may be followed 1 、ROI 2 、……、ROI N Score for each ROI in the address book, determining address range ADDR with full 1 To ADDR N Corresponding multiple local address ranges LS 1 To LS K Wherein K is an integer greater than 1, such as 256, and LS 1 To LS K The size of each local address range (or the number of consecutive addresses included in each local address range, or the storage capacity corresponding to each local address range) may depend on the ROI 1 、ROI 2 、……、ROI N The number of ROI data having a score corresponding to the local address range.
Then, the ROI can be followed 1 、ROI 2 、……、ROI N Any of the data (e.g., ROI 1 ) Initially, for a region of interest 1 、ROI 2 、……、ROI N Currently to be adjusted data ROI in and currently cached in the buffer memory BUF cur (also referred to herein as "first data") steps S120 to S140 are performed.
In step S120, LS may be determined 1 To LS K LS in (3) k (also referred to herein as a "first local address range," where K is an integer greater than or equal to 1 and less than or equal to K) of ADDR within des (also referred to herein as a "first address"), wherein the ROI cur Has a structure similar to LS k Corresponding score, and ADDR des Data at ROI nxt (also referred to herein as "second data") is not adjusted ROI data.
In step S130, ADDR may be combined des Data at ROI nxt Is stored from the storage area SR into the buffer memory BUF.
In step S140, the ROI may be set cur ADDR stored as adjusted ROI data in storage region SR des Where it is located.
In the ROI cached in the buffer BUF nxt In case not the adjusted ROI data or the non-adjusted ROI data is still included in the storage region SR, it may be e.g. according to the ROI in the buffer memory BUF nxt Determining new ROI data to be adjusted (i.e., new ROI cur ) The above steps S120 to S140 are continued. This is repeated until all the ROI data in the storage region SR become the adjusted ROI data.
In this context, "adjusted ROI data" in the storage region SR may mean that the ROI data is an ROI 1 、ROI 2 、……、ROI N And its storage location in the storage region SR is an adjusted storage location, so that the ROI data can be skipped in subsequent processing (if any); "unregulated ROI data" in the storage region SR may mean that the ROI data is a ROI 1 、ROI 2 、……、ROI N An original ROI data that has never been previously accessed and/or processed; "not the adjusted ROI data" may mean that the data may be the non-adjusted ROI data, or may be special ROI data or special data having a special value such as NULL value (NULL) or "-1" (for example, may be used for marking, or may function to simplify processing, etc.).
By the method according to embodiments of the present disclosure, the ROI data can be adjusted "in-situ" directly on the originally allocated storage region SR. In addition to the storage region SR for storing the original individual ROI data itself, for use in the adjustment process The buffer memory BUF providing the auxiliary buffer area may comprise two buffer lines or buffers (in a further embodiment, the buffer memory BUF may also be two memory areas different from the memory area SR in a memory comprising the memory area SR) for buffering intermediate data involved in the adjustment process, i.e. the aforementioned ROI cur And ROI nxt . Even with respect to artificial intelligence chips that typically only configure a few hundred K bytes to a few M bytes of on-chip memory, the hardware cost of the buffer memory BUF is negligible.
Moreover, in the method according to the embodiment of the present disclosure, it is not necessary to save additional data such as links, pointers, etc. for indicating the context of the respective ROI data. This means that the capacity of each of the storage region SR and the buffer memory BUF depends only on the size of the ROI data itself. For example, if one ROI data itself (which may include information about the ROI itself such as ROI coordinates, width, height, confidence, etc., but does not include information indicating links, pointers, etc. to be in tandem with other ROI data) has a size of s bytes, the total capacity of the storage space (including the storage region SR and buffer memory BUF) required for the method according to the embodiment of the present disclosure is only (n+2) s bytes without considering necessary overhead such as data alignment.
In addition, the method according to the embodiment of the disclosure can complete the adjustment of the storage positions of all the ROI data while traversing all the ROI data at one time, and has simple and efficient control logic.
Thus, by example methods according to embodiments of the present disclosure, in-situ adjustment of ROI data can be efficiently achieved with minimal hardware cost, with minimal spatial and temporal complexity. For example, in the case where the example method according to the embodiments of the present disclosure is implemented in an artificial intelligence chip to adjust the storage location of ROI data initially stored in on-chip memory, the artificial intelligence chip may not have to expand the on-chip memory or add another buffer memory, for example, whose capacity may need to be greater than or equal to the storage area in the on-chip memory for storing all of the ROI data, and the artificial intelligence chip can efficiently perform the adjustment of the storage location of the ROI data through simple control logic.
It should be appreciated that in the different loops determined according to steps S120 to S140, the first data ROI cur First local address range LS k First address ADDR des Second data ROI nxt Etc. and possibly going out below, such as "second address", "second local address range", etc. may each have or correspond to a different data value/data item/data content.
Further details of methods according to embodiments of the present disclosure are described below in connection with examples.
In various embodiments, ROI data may be generated for one or more objects of interest in an input image or video (e.g., persons, animals, vehicles, markers, buildings, lane lines, etc. in the image or video) by a region recommendation network, selective search, or the like in a suitable manner, wherein each ROI data generated has a corresponding score, such score may generally be the confidence of the ROI, and in some embodiments may be a confidence replacement value obtained by omitting nonlinear processing in calculating the confidence of the ROI.
In the ROI-based object detection method (including multi-stage object detection methods such as RCNN, SPP-Net, fast RCNN, and single-stage object detection methods such as Tiny-SSD) which are excellent in detection effect, it is necessary to sort respective ROI data according to the score of the respective ROI data, or to adjust the storage position of the respective ROI data in a memory according to the score of the respective ROI data so that the respective ROI data are sequentially stored in the memory after adjustment.
Because of hardware limitations, such as registers or buffers for registering or caching the scores of the ROI data typically have a limited number of bits, the scores of the actually generated ROI data as well as other data items of the ROI data (e.g., coordinates of the ROI, width of the ROI, height of the ROI, etc.) can each only use numbers with a predetermined accuracyThe value represents. For example, a hardware configuration capable of supporting a floating point number of 32-bit precision may be employed so as to be capable of representing 2 of the ROI data 32 = 4294967296 scores.
As previously described, in GPU or TPU based artificial intelligence systems, there may be no restrictions on resources and power consumption, and thus a 32 bit precision floating point representation may be employed without stress. However, for artificial intelligence chips and terminal devices such as cell phones that require tight control over hardware costs, a 32-bit precision floating point representation would result in higher hardware costs.
By analyzing the principle of the algorithm of the ROI-based object detection method and conducting a lot of experiments, it is found that at least for the score of the ROI data, a lower precision than the usual 32-bit precision can be employed. For example, it is confirmed through a large number of works that the score of the ROI can be quantized to a predetermined accuracy of 6 bits or more, for example, 6-bit accuracy or 8-bit accuracy, without affecting the accuracy of the final detection result.
Thus, in embodiments of the present disclosure, the score of the ROI data may be quantized to a predetermined precision, for example, may be lower than a usual 32-bit precision, such as a 6-bit precision, an 8-bit precision, a 10-bit precision, a 16-bit precision, and the like.
For example, in the case of quantization to 6-bit precision, each ROI data will have 2 6 =64 scores; in the case of quantization to 8-bit precision, the individual ROI data will have 2 8 =256 scores; in the case of quantization to 10-bit precision, the individual ROI data will have 2 10 =1024 scores; and so on. The score of the ROI data is expressed with a predetermined precision lower than the usual 32-bit precision, and the hardware cost can be remarkably reduced and the processing efficiency can be improved.
As described previously, the ROI generated and stored in the storage region SR 1 、ROI 2 、……、ROI N The number N of (c) may typically be hundreds of thousands, tens of thousands or thousands. In the case of quantizing the score of each ROI to a predetermined precision such as 6-bit precision or 8-bit precision, the maximum number of kinds of ROI scores that may occur (for example, at 6-to-6 ratio64 in the case of a very precise, 256 in the case of an 8-bit precision) will also be much smaller than N, so that the ROI 1 、ROI 2 、……、ROI N May have the same score.
In one embodiment, the ROI may be generated for any one of the scores at a predetermined accuracy 1 、ROI 2 、……、ROI N The number of ROI data with such a score. For example, as shown in FIG. 3, a corresponding counter CNT may be set for each ROI score at a predetermined accuracy 1 To CNTs M Where, for example, in the case of an 8-bit precision, M may be equal to 256, and for example, a counter CNT 1 Counter CNT for counting the number of ROI data with first type ROI score 2 For counting the number of ROI data with the second ROI score, and so on. However, it should be understood that this does not mean that the first ROI score must be a score value of 1 and the first ROI score must be a score value of 2. In this context, for descriptive convenience, a counter CNT is used 1 To CNTs M Instead of a score value representing the ROI score corresponding to the counter, or in one embodiment, the score of each ROI data may be mapped to a certain value within a natural number range including 1 to M corresponding to a predetermined accuracy.
For convenience of subsequent adjustment/ordering, and also for descriptive convenience, it may be provided that for any two counters CNT p And CNTs n (p < n < M) and a counter CNT p The corresponding ROI score is smaller than the counter CNT n The corresponding ROI score. However, the present disclosure is not limited thereto, and other specifications may be made to the correspondence between the counter and the ROI score as needed (e.g., requirement of ordering), for example, the counter CNT may be also specified p The corresponding ROI score is greater than the counter CNT n The corresponding ROI score.
Obviously for counter CNT 1 To CNTs M Any one of the counter CNTs (s is greater thanOr any integer equal to 1 and less than or equal to M) that may be the range [0, n]And a value within, wherein 0 may mean that the value of the score of any one of all the ROI data generated and stored in the SR is not s at a predetermined precision, and N may mean that the value of the score of all the ROI data generated and stored in the SR is s at a predetermined precision. In fact, as described above, in the case where the predetermined precision is 6 bits or more, the counter CNT s The initial value before step S110 will typically not be N, or even a value close to it.
Then, in step S110, the counter CNT may be used 1 To CNTs M Each counter CNT in (a) s Initial value determination and storage region SR full address range ADDR 1 To ADDR N Corresponding multiple local address ranges LS 1 To LS K
For example, K may be the same as M, and if the counter CNT 1 An initial value of C1 and a counter CNT 2 Initial value of C2, i.e. ROI stored in SR 1 、ROI 2 、……、ROI N Wherein the value of the score of the C1 ROI data under the preset precision is 1, and the value of the score of the C2 ROI data under the preset precision is 2, the counter CNT 1 Corresponding local address range LS 1 May include consecutive C1 addresses ADDR in the memory region SR 1 To ADDR C1 And with a counter CNT 2 Corresponding local address range LS 2 May include consecutive C2 addresses ADDR in the storage region SR C1+1 To ADDR C1+1+C2 Wherein if c1=0, then and counter CNT 2 Corresponding local address range LS 2 May include consecutive C2 addresses ADDR in the storage region SR 1 To ADDR 1+c2 And at this point the counter CNT may be disabled 1 So that the local address range LS 1 In fact becomes ineffective and can therefore be left unattended. In a further example, it is actually trueThe number K of fixed local address ranges may be less than or equal to the number M of actually set counters. For example, K may be less than M in the event that one or more counters are disabled.
Similarly, a local address range LS may be determined 1 To LS K And similarly, a counter with an initial value of 0 may be disabled.
For descriptive convenience, in the example of fig. 3, the counter CNT 1 To CNTs M The initial value of each of which is not 0. Accordingly, local address range LS 1 To LS K Each of which may include a full address range ADDR 1 To ADDR N One or more consecutive addresses within, and a full address range ADDR 1 To ADDR N Any one of the addresses belonging only to the local address range LS 1 To LS K Is defined in the memory, is a local address range of the memory.
In the example of FIG. 3, the data ROI currently to be adjusted cur Is the ROI 1 、ROI 2 、……、ROI N ROI in (a) i (i is an integer greater than or equal to 1 and less than or equal to N), and may have been buffered in a cache line or buffer L1 (also referred to herein as a "first buffer") of the buffer memory BUF before performing step S120.
For example, if ROI cur (e.g., ROI in the example of FIG. 3) i ) Is the ROI 1 、ROI 2 、……、ROI N The ROI data of the first adjusted ROI in (a) may be stored from the storage region SR before performing step S120 cur Original address ADDR of (a) org (e.g., ADDR in the example of FIG. 3) i ) Read-out ROI cur And caches it in cache line L1 of the buffer memory BUF.
In determining the ROI cur Thereafter, for example, in the case of ROI cur After buffering in the buffer memory BUF, ADDR can be set org The data at is marked as not unprocessed ROI data anymore.
In one embodiment, the ROI may be defined as cur After the cache line L1 from the storage region SR to the buffer memory BUF, the address ADDR in the storage region SR is cached org The ADDR can also be modified by flushing or setting the data at a special data such as NULL or "-1 i ROI data at such that at ADDR org Some data item (e.g., score) in the modified ROI data at that point has a special value such as "-1" or null data.
Then, in step S120, the ROI cached in the buffer memory BUF may be determined cur Determining a local address range LS 1 To LS M In a local address range LS corresponding to the score k . For example, it can be based on the ROI in the buffer memory BUF cur To determine the score of the counter CNT described above 1 To CNTs M In the counter CNT corresponding to the score k Thereby determining the local address range LS k
Obviously, the local address range LS at this point k The data at least one address in (a) is not adjusted ROI data (may be unadjusted ROI data, or may be null data or other special data). Thus, a local address range LS may be determined k One address ADDR of (a) des (e.g., ADDR in the example of FIG. 3) j ) At address ADDR des Data ROI of (2) nxt (e.g., ROI in the example of FIG. 3) j ) Not the adjusted ROI data.
In one embodiment, the local address ranges LS may be detected individually k A data state at each address in the memory, thereby determining the address ADDR des
In another embodiment, the local address range LS may be based on k Corresponding counter CNT k The count value or data in (a) determines the address ADDR des . For example, it may be based on a counter CNT k In a local address range LS k Reference address (e.g. LS k The head address of (a) and from which the local address range LS can be determined k Address ADDR of (a) des . Then, at an appropriate time, for example, when determining address ADDR des After that, at the read address ADDR des ROI at nxt Thereafter, at the time of ROI cur Write to address ADDR des After that, the CNT is updated k Such that the updated data corresponds to the local address range LS k An address within the memory that has not been accessed.
For example, the counter CNT can be determined k Each count value and local address range LS k Corresponding relation between each address in (a) and using a counter CNT k The current count value is from the local address range LS k To determine the address ADDR des . For example, the counter CNT may be controlled k Count down from an initial value (i.e., the number of ROI data having a score corresponding to k) at an appropriate timing, and count up a counter CNT k Each count value of the local address range LS k Starting from the head address to the local address range LS k Offset in the tail address direction of (a). Counter CNT with initial value other than 0 k When the count value of (c) becomes 0, it means that all the storage positions of ROI data having a score corresponding to k have been adjusted. Then, it can be based on the counter CNT k Current count value CNT of k V and local Address Range LS k Determines the address ADDR des . Thereby, a local address range LS can be ensured k Address ADDR of (a) des The ROI data at is not adjusted ROI data and is at address ADDR des Then up to the local address range LS k The ROI data at each address in the tail addresses of (a) is the adjusted ROI data.
In one embodiment, local address ranges LS, respectively, may be provided 1 To LS M Or counter CNT 1 To CNTs M Corresponding address register AR 1 To AR M (not shown in fig. 3), wherein the address register AR k For preserving local address ranges LS k For example, a first address. That is to say, with the local address range LS k Corresponding registerThe/or counter may for example comprise an address register AR k . As previously described, such hardware overhead is fully acceptable even for artificial intelligence chips due to the small value of M.
In further embodiments, the local address range LS may be stored in, for example, a AND memory or a buffer memory 1 To LS M The first address or the last address of each local address range of the register is not necessarily set separately.
It should be understood that the present disclosure is not limited to the examples described above. For example, the counter CNT described herein as an example, since a relative address representation (e.g., count value or address offset) and an absolute address representation can be easily inter-converted 1 To CNTs M The manner of use of (c) may be modified to be based on any other suitable manner of processing of the relative address representation (e.g., count value or address offset) and may be modified to be based on absolute address representation.
At the determined address ADDR des Thereafter, the original stored in ADDR can be in step S130 des ROI at nxt As ROI nxt Buffered in the buffer memory BUF and the ROI in the buffer memory BUF is buffered in step S140 cur To cover ADDR des The way of the original data at the address is stored to ADDR des Where it is located.
In one embodiment, the ROI in the buffer memory BUF may be set in step S130 cur From cache line L1 of the buffer memory BUF to another cache line or buffer L2 (also referred to herein as a "second buffer") of the buffer memory BUF, and then to be stored in ADDR des ROI at nxt Cached in cache line L1 of the buffer memory BUF. Then, in step S140, the ROI in the cache line L2 may be set cur To cover ADDR des Raw data at (i.e., ROI nxt ) Is stored to ADDR des Where it is located.
In another embodiment, the ROI in cache line L1 of the buffer memory BUF cur Can be in the previous step S120To the buffer L2 of the buffer memory BUF. Then, in step S130, ADDR des ROI at nxt May be buffered directly into the cache line L1 of the buffer memory BUF. Then, in step S140, the ROI in the cache line L2 may be set cur To cover ADDR des The way of the original data at the address is stored to ADDR des Where it is located. In this embodiment, steps S130 and S140 can be controlled to be implemented in one read/write cycle.
For example, as shown in FIG. 4, step S130 may be performed on the rising edge of a clock to control the ADDR des ROI at nxt Buffered in line L1 of buffer memory BUF, and step S140 is performed on the falling edge of the clock to control the ROI in line L2 to be buffered cur To cover ADDR des The way of the original data at the address is stored to ADDR des Where it is located.
Thereby, the ROI to be currently adjusted cur (e.g., ROI in the example of FIG. 3) i ) ADDR that can be stored as adjusted ROI data to SR des At and with LS k The count value of the corresponding counter CNTk can be updated at an appropriate time in the process, for example, reduced by 1, and become the ADDR if the count value is still not 0 des Address ADDR before (a) des-1 Corresponding values. If CNT k If the count value of (c) is 0, meaning that all the storage locations of the ROI data having a score corresponding to k have been adjusted, and in one embodiment the counter CNT may be disabled k
As shown in fig. 3, in one case, the ROI to be processed in each cycle cur Or the ROI determined in each cycle to be processed in the next cycle nxt Can be directly the ROI 1 To ROI N Is included in the image data, is an unadjusted ROI data.
Thus, the ROI can be set nxt (i.e., ROI) j ) ROI as the next cycle cur . For example, the ROI may be set at step S140 cur =ROI i ADDR stored in storage region SR des =ADDR j After this and enter the next step of the cycleBefore S120, the variable ROI cur From ROI i Updating to ROI j . Then, for the new ROI cur =ROI j Steps S120 to S140 are performed again.
For example, the number of loops may be counted, and the processing may be ended when the number of loops reaches N.
In other cases (e.g., current ROI cur Is the ROI 1 To ROI N The last ROI data to be adjusted, ADDR determined in step S120 des And reading the ROI cur ADDR of (a) org Similarly, ADDR determined in step S120 des Has been processed in the previous loop, etc.), the ROI buffered in the buffer memory BUF in step S130 nxt It is possible not to unadjusted ROI data, but for example ROI data that has been processed in a previous cycle or ROI data to which a flag is applied (for example, the data itself is set to special data such as null data, "-1", or a data item in the data such as a score has a special value such as "-1"), or ROI being processed in a current cycle cur (at this time, ADDR des And ADDR (ADDR) org The same), etc.
To this end, it is possible to follow step S140 and before step S120 of the next cycle is performed, for example the ROI currently cached in BUF nxt In the case that the number of the adjusted ROI data in the storage region SR is smaller than N and not the non-adjusted ROI data, another non-accessed address ADDR in the storage region SR is determined new (also referred to herein as a "second address") and uses address ADDR new ROI data update in BUF at ROI nxt And as the next ROI to be adjusted cur
To determine a new address ADDR new In one embodiment, LS may be determined 1 To LS K LS in (3) k, (also referred to herein as a "second local address range," wherein K' is an integer greater than or equal to 1 and less than or equal to K, which may be the same or different from K), wherein LS k, May include at least one address that has not been accessed, e.g., with LS k, Corresponding counter CNT k’ Is not currently 0. Then, it can be based on the counter CNT k’ Determines a new address ADDR from the count value of (a) new
In one embodiment, as shown in FIG. 5, step S150 may be entered after step S140 to detect the ROI currently cached in the BUF nxt Whether an unadjusted ROI. In one embodiment, the ROI currently cached in cache line L1 of the BUF may be detected nxt Whether it is special data such as null data or "-1". In another embodiment, the ROI currently cached in cache line L1 of the BUF may be detected nxt Whether certain data items or fields (e.g., scores) have a particular value such as "-1".
Return "yes" at step S150 (e.g., ROI nxt Not special data or not including special data items) can be ROI nxt As a new ROI cur And for a new ROI cur Step S120 is performed.
In the case where step S150 returns "NO", it may proceed to step S160 to re-determine the ROI nxt . For example, it can be based on CNTs 1 To CNTs M A counter CNT of which the count value of any one is still not 0 k’ Thereby determining LS k’ Then according to the counter CNT k’ Determination of the current count value belonging to LS k, Address ADDR of (a) new . Address ADDR may then be used new ROI data at as new ROI nxt Replacement of ROI buffered in BUF in step S130 nxt And proceeds to step S120 (i.e., "successfully determining a new ROI" in the example of fig. 5 nxt "case").
If CNT is found in step S160 1 To CNTs M The count value of each of (2) is 0, it may mean that the number of adjusted ROI data in the storage region SR has reached N, i.e., all ROIs have been adjusted. In addition, for example, the number of cycles may be counted, and the number of cycles may be counted The process is ended when the number reaches N.
In the above example, in step S150, by detecting the ROI currently cached in cache line L1 of the BUF nxt Whether it is special data such as null data or "-1" or ROI currently cached in cache line L1 of BUF nxt Whether or not certain data items or fields (e.g., scores) of (i) have a particular value such as "-1", determine the current ROI nxt Whether an unadjusted ROI. In a further embodiment, for example, an address register may be provided, in which the initial data may be the original memory address of the first processed ROI data in the memory region SR. Then, it may be detected in step S150 that the determination in step S120 is ADDR des Whether the address is the same as the address stored in the address register. If the same, step S150 may return to "no", otherwise "yes" is returned. Then, in step S160, if a new ROI is successfully determined nxt Address ADDR may be used new The data in the address register is updated and the method proceeds to S120.
As described above, the counter CNT 1 To CNTs M Can be much smaller than the ROI 1 To ROI N Is a number of (3). Also, in some embodiments, a list of counters may be established, a counter may be controlled to be deleted or disabled from the list when the count value of a certain counter becomes 0, and then an attempt may be made to determine a new ROI based on the counter located at the head or tail of the list in step S160 nxt . Therefore, the execution of steps S150 and S160 can be very rapid and even involve no further loop processing.
Methods according to embodiments of the present disclosure can be used to traverse the ROI 1 To ROI N In-situ adjustment of the storage location of each ROI data is accomplished at the same time as each ROI data.
Thus, by example methods according to embodiments of the present disclosure, in-situ adjustment of ROI data can be efficiently achieved with minimal hardware cost, with minimal spatial and temporal complexity.
For example, in the case where the example method according to the embodiments of the present disclosure is implemented in an artificial intelligence chip to adjust the storage location of ROI data initially stored in on-chip memory, the artificial intelligence chip may not have to expand the on-chip memory or add another buffer memory, for example, whose capacity may need to be greater than or equal to the storage area in the on-chip memory for storing all of the ROI data, and the artificial intelligence chip can efficiently perform the adjustment of the storage location of the ROI data through simple control logic.
It should be appreciated that the example methods according to embodiments of the present disclosure may be applied to any scenario where it is desirable to sort in-situ a plurality of ROI data that has been generated and stored in succession, and are not limited to application in an artificial intelligence chip.
Exemplary apparatus
Fig. 6 illustrates an example DEV of an apparatus for adjusting storage locations of a plurality of region of interest data, which may include a processor or controller CON and a buffer memory BUF, according to an embodiment of the disclosure.
The processor or controller CON may be, for example, a processor circuit or control logic circuit developed based on a field programmable gate array, ARM processor, or the like, and may be configured to perform steps in a method (e.g., the example method shown in fig. 3 or 5) according to embodiments of the present disclosure according to predetermined instructions upon startup (e.g., upon power-up).
The buffer memory BUF may be any suitable type of storage means such as a cache memory, registers, static memory, etc. The buffer memory BUF may store or buffer intermediate data involved in the processing or logic control of the processor CON, including for example the ROI as described above, according to instructions of the processor CON or under control of the processor CON cur And/or ROI nxt
In one embodiment, the buffer memory BUF may include two buffers or cache lines (e.g., L1 and L2 in the example of fig. 4), and the capacity of each buffer may depend on the size of a single ROI data.
In addition, as shown in FIG. 6, an example apparatusThe DEV may also include a plurality of counters CNT 1 To CNTs M To count the number of ROI data having scores corresponding to 1 to M, respectively, to assist the processor CON in determining and storing the ROIs in the memory 1 To ROI N A plurality of local address ranges corresponding to the address ranges of the storage region SR.
In one embodiment, counter CNT 1 To CNTs M The number M of (c) may depend on the accuracy of the score of each ROI data. For example, in the case of an 8-bit precision, M may be equal to 256, and for example a counter CNT 1 Counter CNT for counting the number of ROI data with first type ROI score 2 For counting the number of ROI data with the second ROI score, and so on.
It should be understood that an apparatus according to an embodiment of the present disclosure is not limited to the example in fig. 6. For example, in further embodiments, an apparatus according to embodiments of the present disclosure may also include other components/modules/circuits/units/elements such as timing controllers for instruction timing control, interrupt controllers for instruction interrupt control, crossbar switches and/or multiplexers for implementing interconnections between components, coprocessors, and the like.
In addition, in fig. 6, devices/components/elements/modules/circuits at both ends of the connection may be directly connected or coupled together, for example, through a data bus and/or an instruction bus, or may be indirectly connected or coupled together via other intermediary devices/components/elements/modules/circuits, such as a Crossbar (Crossbar), where an arrow of the connection may indicate a flow direction of the instruction and/or data of interest, but does not mean that the instruction and/or data may flow only in a direction indicated by the arrow.
In one embodiment, an apparatus according to embodiments of the present disclosure may be implemented, for example, as or as part of an on-chip apparatus of an artificial intelligence chip. In further embodiments, an apparatus according to embodiments of the present disclosure may be applied in any scenario or apparatus where it is desirable to sort in-situ a plurality of ROI data that has been generated and stored in succession.
Exemplary computer program product and computer readable storage Medium
In addition to the methods and apparatus described above, embodiments of the present disclosure may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform steps in a method according to various embodiments of the present disclosure described in the "exemplary methods" section of the present description.
The computer program product may include program code for performing the operations of embodiments of the present disclosure in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium, such as a computer-readable non-transitory storage medium, having stored thereon program instructions that, when executed by a processor, cause the processor to perform steps in a method according to various embodiments of the present disclosure described in the "exemplary methods" section above in the present specification.
A computer readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present disclosure have been described above in connection with specific embodiments, however, it should be noted that the advantages, benefits, effects, etc. mentioned in the present disclosure are merely examples and not limiting, and these advantages, benefits, effects, etc. are not to be considered as necessarily possessed by the various embodiments of the present disclosure. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, since the disclosure is not necessarily limited to practice with the specific details described.
The block diagrams of the devices, apparatuses, devices, systems referred to in this disclosure are merely illustrative examples and are not intended to require or imply that the connections, arrangements, configurations must be made in the manner shown in the block diagrams. As will be appreciated by one of skill in the art, the devices, apparatuses, devices, systems may be connected, arranged, configured in any manner. Words such as "including," "comprising," "having," and the like are words of openness and mean "including but not limited to," and are used interchangeably therewith. The terms "or" and "as used herein refer to and are used interchangeably with the term" and/or "unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to.
It is also noted that in the apparatus, devices and methods of the present disclosure, components or steps may be disassembled and/or assembled. Such decomposition and/or recombination should be considered equivalent to the present disclosure.
In this document, modifiers such as "first," "second," etc. without a literal term are intended to distinguish between different elements/components/circuits/modules/means/steps, and do not emphasize order, positional relationship, importance, priority levels, etc. In contrast, modifiers with adjectives such as "first", "second", etc., may be used to emphasize different element/component/circuit/module/means/step sequence, positional relationship, importance, priority level, etc.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the disclosure to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.

Claims (14)

1. A method for adjusting storage locations of a plurality of region of interest data, comprising:
determining a plurality of local address ranges corresponding to the address ranges of the storage areas storing the plurality of region of interest data according to the scores of the plurality of region of interest data;
determining a first address within a first local address range of the plurality of local address ranges for first data that is region of interest data to be currently adjusted;
storing second data at the first address from the storage area into a buffer memory; and
the first data is stored to a first address in the storage area.
2. The method of claim 1, wherein a size of each of the plurality of local address ranges is dependent on a number of region of interest data of the plurality of region of interest data having a score corresponding to the local address range.
3. The method of claim 1, wherein the first data has a score corresponding to the first local address range and the second data of the first address is not adjusted region of interest data.
4. The method of claim 3, wherein determining a first address within a first local address range of the plurality of local address ranges comprises:
the first address is determined from a count value of a counter corresponding to the first local address range.
5. The method of claim 4, further comprising:
and updating the count value of a counter corresponding to the first local address range, so that the updated count value corresponds to an address which is not accessed in the first local address range.
6. The method of claim 1, wherein storing the second data at the first address from the storage area into a buffer memory comprises:
storing first data cached in a first buffer area of the buffer memory into a second buffer area of the buffer memory; and
and caching the second data from the storage area into a first buffer area of the buffer memory.
7. The method of claim 6, wherein storing the first data in the buffer memory at the first address in the storage area comprises:
first data in the second buffer is stored to a first address in the storage area.
8. The method of any of claims 1 to 7, further comprising:
and determining the second data in the buffer memory as the next region of interest data to be adjusted in the case that the second data is the region of interest data which is not adjusted.
9. The method of any of claims 1 to 7, further comprising:
in the case where the second data is not the unadjusted region of interest data and the number of the adjusted region of interest data in the memory area is smaller than the number of the plurality of region of interest data, the data at the second address in the memory area that is not accessed is determined as the next region of interest data to be adjusted.
10. The method of claim 9, wherein determining data at the second address in the memory area that has not been accessed as the next region of interest data to be adjusted comprises:
Determining a second local address range of the plurality of local address ranges, wherein the second local address range comprises at least one address which is not accessed; and
the second address is determined from a count value in a counter corresponding to the second local address range.
11. A computer readable non-transitory storage medium having stored thereon program instructions that, when executed, perform at least the method of any of claims 1 to 10.
12. An apparatus for adjusting storage locations of a plurality of region of interest data, comprising:
a buffer memory configured to store region of interest data to be currently adjusted; and
a processor configured to perform at least the method of any one of claims 1 to 10 at start-up.
13. The apparatus of claim 12, wherein the buffer memory comprises two buffers, each buffer having a capacity that depends on a size of a single region of interest data.
14. The apparatus of claim 12 or 13, further comprising:
a plurality of counters, each counter corresponding to a respective local address range, the number of the plurality of counters being dependent on the accuracy of the score for each region of interest data.
CN201910005972.2A 2019-01-03 2019-01-03 Method and device for adjusting storage position of multiple region of interest data Active CN109727187B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910005972.2A CN109727187B (en) 2019-01-03 2019-01-03 Method and device for adjusting storage position of multiple region of interest data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910005972.2A CN109727187B (en) 2019-01-03 2019-01-03 Method and device for adjusting storage position of multiple region of interest data

Publications (2)

Publication Number Publication Date
CN109727187A CN109727187A (en) 2019-05-07
CN109727187B true CN109727187B (en) 2023-05-30

Family

ID=66298058

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910005972.2A Active CN109727187B (en) 2019-01-03 2019-01-03 Method and device for adjusting storage position of multiple region of interest data

Country Status (1)

Country Link
CN (1) CN109727187B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110428359B (en) * 2019-08-09 2022-12-06 南京地平线机器人技术有限公司 Apparatus and method for processing region of interest data
CN112406884B (en) * 2019-08-20 2022-03-15 北京地平线机器人技术研发有限公司 Vehicle driving state recognition method and device, storage medium and electronic equipment
CN112860602B (en) * 2019-11-12 2024-05-03 北京地平线机器人技术研发有限公司 Method and device for controlling storage operation of region-of-interest data

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7634633B2 (en) * 2006-11-30 2009-12-15 Motorola, Inc. Method and apparatus for memory address generation using dynamic stream descriptors
JP2012003408A (en) * 2010-06-15 2012-01-05 Rohm Co Ltd Drive recorder
RU2580016C1 (en) * 2014-10-17 2016-04-10 Закрытое акционерное общество "Лаборатория Касперского" Method for transfer of control between memory areas
US9875031B2 (en) * 2015-09-30 2018-01-23 Western Digital Technologies, Inc. Data retention management for data storage device

Also Published As

Publication number Publication date
CN109727187A (en) 2019-05-07

Similar Documents

Publication Publication Date Title
CN109690512B (en) GPU remote communication with trigger operation
US5511210A (en) Vector processing device using address data and mask information to generate signal that indicates which addresses are to be accessed from the main memory
CN109727187B (en) Method and device for adjusting storage position of multiple region of interest data
US20210089871A1 (en) Processing system and method for binary weight convolutional neural network
CN111984189B (en) Neural network computing device, data reading method, data storage method and related equipment
CN111355962A (en) Video decoding caching method suitable for multiple reference frames, computer device and computer readable storage medium
JP2018503924A (en) Providing memory bandwidth compression using continuous read operations by a compressed memory controller (CMC) in a central processing unit (CPU) based system
CN117223005A (en) Accelerator, computer system and method
CN109089120B (en) Analysis-aided encoding
US7603492B2 (en) Automatic generation of streaming data interface circuit
US11669736B2 (en) Executing neural networks on electronic devices
US20230196093A1 (en) Neural network processing
CN111258649A (en) Processors, Chips and Electronics
CN104731519B (en) Cache memory management device and dynamic image system and method using the cache memory management device
US11741349B2 (en) Performing matrix-vector multiply operations for neural networks on electronic devices
US10448020B2 (en) Intelligent MSI-X interrupts for video analytics and encoding
US20230195651A1 (en) Host device performing near data processing function and accelerator system including the same
CN110800301A (en) Control method and device of coding equipment and storage medium
US20220391676A1 (en) Quantization evaluator
US20130278775A1 (en) Multiple Stream Processing for Video Analytics and Encoding
CN104025026A (en) Accessing Configuration and Status Registers for a Configuration Space
CN114546251A (en) Weight matrix data storage method, data acquisition method and device and electronic equipment
CN109324826B (en) Counting device and counting method
CN111324793A (en) Method and device for controlling operation of storing data of region of interest
US20190378477A1 (en) Image processing system and memory managing method thereof

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant