CN109727187B

CN109727187B - Method and device for adjusting storage position of multiple region of interest data

Info

Publication number: CN109727187B
Application number: CN201910005972.2A
Authority: CN
Inventors: 李军; 李建军; 张鑫语; 李晓森; 黄畅
Original assignee: Beijing Horizon Robotics Technology Research and Development Co Ltd
Current assignee: Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority date: 2019-01-03
Filing date: 2019-01-03
Publication date: 2023-05-30
Anticipated expiration: 2039-01-03
Also published as: CN109727187A

Abstract

A method and apparatus for adjusting storage locations of a plurality of region of interest data is disclosed. The method may include: determining a plurality of local address ranges corresponding to the address ranges of the storage areas storing the plurality of region of interest data to be adjusted according to the scores of the plurality of region of interest data to be adjusted; determining, for first data that is region of interest data to be currently adjusted, a first address within a first local address range of a plurality of local address ranges to be adjusted; storing second data at the first address from the storage area into the buffer memory; and storing the first data at a first address in the storage area. By the method and the device according to the embodiment of the disclosure, the in-situ adjustment of the storage positions of the plurality of region-of-interest data can be realized efficiently.

Description

Method and device for adjusting storage position of multiple region of interest data

Technical Field

The present disclosure relates generally to the field of artificial intelligence, and in particular to a method and apparatus for adjusting storage locations of a plurality of region of interest data.

Background

One or more objects of interest in the input image or video may be detected using methods such as RCNN (Regions with CNN features), spatial pyramid pooling network (Spatial Pyramid Pooling Network, SPP-Net), fast RCNN (Fast RCNN), faster RCNN (Faster RCNN), etc., which require that multiple regions of interest (Region of Interest, ROIs) be generated (or recommended) first and then subsequent processing be performed based on the generated ROI data.

In the ROI-based object detection method, it is necessary to readjust the storage position of each generated ROI data according to the score of the ROI.

The usual space-time complexity for controlling the process of adjusting the position of the ROI data is high and is not suitable for efficient implementation on artificial intelligence chips with limited hardware resources.

Disclosure of Invention

According to one aspect of the present disclosure, a method and apparatus for adjusting storage locations of a plurality of region of interest data is provided. The method may include: determining a plurality of local address ranges corresponding to the address ranges of the storage areas storing the plurality of region of interest data to be adjusted according to the scores of the plurality of region of interest data to be adjusted; determining, for first data that is region of interest data to be currently adjusted, a first address within a first local address range of a plurality of local address ranges to be adjusted; storing second data at the first address from the storage area into the buffer memory; and storing the first data at a first address in the storage area.

According to another aspect of the present disclosure, there is also provided an apparatus for adjusting storage locations of a plurality of region of interest data. The apparatus may comprise a buffer memory and a processor, wherein the buffer memory may be configured to store the region of interest data to be currently adjusted, and the processor may be configured to perform at least the above method at start-up.

According to another aspect of the present disclosure, there is also provided a computer readable non-transitory storage medium having stored thereon program instructions that, when executed, perform at least the above-described method.

By the method and the device according to the embodiment of the disclosure, the in-situ adjustment of the positions of the plurality of region-of-interest data can be realized with high efficiency.

Drawings

The above and other objects, features and advantages of the present disclosure will become more apparent by describing embodiments thereof in more detail with reference to the accompanying drawings. The accompanying drawings are included to provide a further understanding of embodiments of the disclosure, and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure, without limitation to the disclosure. In the drawings, like reference numerals generally refer to like parts or steps.

Fig. 1 illustrates an example of a storage region SR storing N pieces of ROI data according to an embodiment of the present disclosure.

Fig. 2 illustrates an example of a method for adjusting a storage location of ROI data according to an embodiment of the present disclosure.

Fig. 3 shows an example of a procedure in performing the example method of fig. 2 for N ROI data as shown in fig. 1.

Fig. 4 shows an example of performing step S130 and step S140 in the example method of fig. 2.

Fig. 5 shows an example of a procedure in performing the example method of fig. 2 for N ROI data as shown in fig. 1.

Fig. 6 illustrates an example of an apparatus for adjusting a storage location of ROI data according to an embodiment of the present disclosure.

Detailed Description

Hereinafter, example embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present disclosure and not all of the embodiments of the present disclosure, and that the present disclosure is not limited by the example embodiments described herein.

SUMMARY

In multi-stage object detection methods such as RCNN, SPP-Net, fast RCNN, and Fast RCNN, which are excellent in detection effect, it is generally necessary to sort a large number (for example, thousands, tens of thousands, or even hundreds of thousands) of ROIs generated, that is, adjust the storage positions of ROI data in a memory.

In a typical scheme for controlling the adjustment of the storage location of ROI data, it is generally desirable that the memory be configured to provide a storage space that is greater than or equal to twice the total data amount of ROI data, wherein the memory may include the aforementioned buffer memory and/or memory for storing the initial ROI data and the final ordering result.

For artificial intelligence chips, only low capacity on-chip memory is typically configured due to various considerations and/or limitations of hardware cost and chip space, e.g., static memory or cache memory, etc., which may range in total from a few hundred kilobytes to a few megabytes. Thus, artificial intelligence chips cannot independently order ROI data in amounts up to hundreds of thousands or tens of thousands without data exchange with off-chip memory.

Even if only thousands of ROI data are ordered, in an artificial intelligence chip, more on-chip memory resources will be occupied and even more on-chip memory will need to be provided, since it is necessary to allocate in on-chip memory, for example, a memory area having at least the same capacity as a memory area storing all ROI data, which will increase the hardware cost of the artificial intelligence chip.

In addition, an artificial intelligence chip may be fitted into a terminal device such as a cellular phone, a navigator, or the like, in order to detect one or more objects of interest in an image or video captured, for example, via a camera or the like on the terminal device. In such a case, the artificial intelligence chip may be controlled to store the generated ROI data into an off-chip memory provided inside the terminal device and outside the artificial intelligence chip, and even the ordering of the ROI data stored in the off-chip memory may be controlled to be performed by a processor in the terminal device.

However, terminal devices such as cell phones, navigators, etc. often also have high demands on processing efficiency, power consumption, hardware costs, etc. The higher spatial complexity of a typical ROI ordering scheme (e.g., requiring a buffer area of the same size as the storage area in memory storing all ROI data) will result in higher hardware costs for the terminal device, e.g., requiring more memory to be configured.

Therefore, it is always desirable to be able to reduce the hardware cost of an artificial intelligence chip or a terminal device equipped with the artificial intelligence chip as much as possible while ensuring the efficiency of execution and the accuracy and precision of the result.

The method and apparatus for adjusting a storage location of ROI data according to embodiments of the present disclosure aim to at least partially solve or alleviate one or more of the above technical problems, so that an artificial intelligence chip or an artificial intelligence chip-equipped terminal device can efficiently implement in-place adjustment of a storage location of ROI data without having to configure a cache region whose capacity may need to be greater than or equal to a storage region in a memory for storing all ROI data.

Exemplary method

Fig. 1 shows N ROI data ROIs that have been generated and stored consecutively in a storage region SR of a memory ₁ 、ROI ₂ 、……、ROI _N Where N is an integer greater than 1, such as hundreds of thousands, tens of thousands or thousands, and the ROI ₁ 、ROI ₂ 、……、ROI _N Addresses ADDR stored in the storage areas SR, respectively ₁ 、ADDR ₂ 、……、ADDR _N Where it is located. In other words, N ROI data ROIs ₁ 、ROI ₂ 、……、ROI _N Continuously stored in address space ADDR corresponding to continuity ₁ To ADDR _N (also referred to herein as a "full address range") in the storage region SR.

Here, since each ROI data may generally include a plurality of data items (e.g., coordinates, width, height, score, etc. of the ROI), the address ADDR in the region SR is stored ₁ 、ADDR ₂ 、……、ADDR _N Corresponding to the first address of the unit memory area for storing each ROI data in the memory area SR, instead of the first address of each physical memory unit of the memory area SR. Herein, for simplicity, it is assumed that each ROI data occupies only one memory cell of the memory region SR, or only one memory line or cache line or buffer of the memory region SR in the case where the memory region SR is a two-dimensional memory region determined according to a two-dimensional memory.

The storage positions of the respective ROI data in the storage region SR need to be adjusted according to the scores of the respective ROI data so that, in one order, for the respective storage in ADDR after adjustment _i And ADDR _i+1 Is of the ROI of (2) _i And ROI _i+1 (i is any integer greater than or equal to 1 and less than N), ROI _i Score of less than or equal to ROI _i+1 For the scores of (a) or, in another order, for the scores stored separately in ADDR after adjustment _i And ADDR _i+1 Is of the ROI of (2) _i And ROI _i+1 ，ROI _i Score of greater than or equal to ROI _i+1 Is a score of (2).

FIG. 2 illustrates a method for adjusting ROI in accordance with an embodiment of the present disclosure ₁ 、ROI ₂ 、……、ROI _N An example of a method of storing a location in an SR, the example method may include steps S110, S120, S130, and S140.

In step S110, the ROI may be followed ₁ 、ROI ₂ 、……、ROI _N Score for each ROI in the address book, determining address range ADDR with full ₁ To ADDR _N Corresponding multiple local address ranges LS ₁ To LS _K Wherein K is an integer greater than 1, such as 256, and LS ₁ To LS _K The size of each local address range (or the number of consecutive addresses included in each local address range, or the storage capacity corresponding to each local address range) may depend on the ROI ₁ 、ROI ₂ 、……、ROI _N The number of ROI data having a score corresponding to the local address range.

Then, the ROI can be followed ₁ 、ROI ₂ 、……、ROI _N Any of the data (e.g., ROI ₁ ) Initially, for a region of interest ₁ 、ROI ₂ 、……、ROI _N Currently to be adjusted data ROI in and currently cached in the buffer memory BUF _cur (also referred to herein as "first data") steps S120 to S140 are performed.

In step S120, LS may be determined ₁ To LS _K LS in (3) _k (also referred to herein as a "first local address range," where K is an integer greater than or equal to 1 and less than or equal to K) of ADDR within _des (also referred to herein as a "first address"), wherein the ROI _cur Has a structure similar to LS _k Corresponding score, and ADDR _des Data at ROI _nxt (also referred to herein as "second data") is not adjusted ROI data.

In step S130, ADDR may be combined _des Data at ROI _nxt Is stored from the storage area SR into the buffer memory BUF.

In step S140, the ROI may be set _cur ADDR stored as adjusted ROI data in storage region SR _des Where it is located.

In the ROI cached in the buffer BUF _nxt In case not the adjusted ROI data or the non-adjusted ROI data is still included in the storage region SR, it may be e.g. according to the ROI in the buffer memory BUF _nxt Determining new ROI data to be adjusted (i.e., new ROI _cur ) The above steps S120 to S140 are continued. This is repeated until all the ROI data in the storage region SR become the adjusted ROI data.

In this context, "adjusted ROI data" in the storage region SR may mean that the ROI data is an ROI ₁ 、ROI ₂ 、……、ROI _N And its storage location in the storage region SR is an adjusted storage location, so that the ROI data can be skipped in subsequent processing (if any); "unregulated ROI data" in the storage region SR may mean that the ROI data is a ROI ₁ 、ROI ₂ 、……、ROI _N An original ROI data that has never been previously accessed and/or processed; "not the adjusted ROI data" may mean that the data may be the non-adjusted ROI data, or may be special ROI data or special data having a special value such as NULL value (NULL) or "-1" (for example, may be used for marking, or may function to simplify processing, etc.).

By the method according to embodiments of the present disclosure, the ROI data can be adjusted "in-situ" directly on the originally allocated storage region SR. In addition to the storage region SR for storing the original individual ROI data itself, for use in the adjustment process The buffer memory BUF providing the auxiliary buffer area may comprise two buffer lines or buffers (in a further embodiment, the buffer memory BUF may also be two memory areas different from the memory area SR in a memory comprising the memory area SR) for buffering intermediate data involved in the adjustment process, i.e. the aforementioned ROI _cur And ROI _nxt . Even with respect to artificial intelligence chips that typically only configure a few hundred K bytes to a few M bytes of on-chip memory, the hardware cost of the buffer memory BUF is negligible.

Moreover, in the method according to the embodiment of the present disclosure, it is not necessary to save additional data such as links, pointers, etc. for indicating the context of the respective ROI data. This means that the capacity of each of the storage region SR and the buffer memory BUF depends only on the size of the ROI data itself. For example, if one ROI data itself (which may include information about the ROI itself such as ROI coordinates, width, height, confidence, etc., but does not include information indicating links, pointers, etc. to be in tandem with other ROI data) has a size of s bytes, the total capacity of the storage space (including the storage region SR and buffer memory BUF) required for the method according to the embodiment of the present disclosure is only (n+2) s bytes without considering necessary overhead such as data alignment.

In addition, the method according to the embodiment of the disclosure can complete the adjustment of the storage positions of all the ROI data while traversing all the ROI data at one time, and has simple and efficient control logic.

Thus, by example methods according to embodiments of the present disclosure, in-situ adjustment of ROI data can be efficiently achieved with minimal hardware cost, with minimal spatial and temporal complexity. For example, in the case where the example method according to the embodiments of the present disclosure is implemented in an artificial intelligence chip to adjust the storage location of ROI data initially stored in on-chip memory, the artificial intelligence chip may not have to expand the on-chip memory or add another buffer memory, for example, whose capacity may need to be greater than or equal to the storage area in the on-chip memory for storing all of the ROI data, and the artificial intelligence chip can efficiently perform the adjustment of the storage location of the ROI data through simple control logic.

It should be appreciated that in the different loops determined according to steps S120 to S140, the first data ROI _cur First local address range LS _k First address ADDR _des Second data ROI _nxt Etc. and possibly going out below, such as "second address", "second local address range", etc. may each have or correspond to a different data value/data item/data content.

Further details of methods according to embodiments of the present disclosure are described below in connection with examples.

In various embodiments, ROI data may be generated for one or more objects of interest in an input image or video (e.g., persons, animals, vehicles, markers, buildings, lane lines, etc. in the image or video) by a region recommendation network, selective search, or the like in a suitable manner, wherein each ROI data generated has a corresponding score, such score may generally be the confidence of the ROI, and in some embodiments may be a confidence replacement value obtained by omitting nonlinear processing in calculating the confidence of the ROI.

In the ROI-based object detection method (including multi-stage object detection methods such as RCNN, SPP-Net, fast RCNN, and single-stage object detection methods such as Tiny-SSD) which are excellent in detection effect, it is necessary to sort respective ROI data according to the score of the respective ROI data, or to adjust the storage position of the respective ROI data in a memory according to the score of the respective ROI data so that the respective ROI data are sequentially stored in the memory after adjustment.

Because of hardware limitations, such as registers or buffers for registering or caching the scores of the ROI data typically have a limited number of bits, the scores of the actually generated ROI data as well as other data items of the ROI data (e.g., coordinates of the ROI, width of the ROI, height of the ROI, etc.) can each only use numbers with a predetermined accuracyThe value represents. For example, a hardware configuration capable of supporting a floating point number of 32-bit precision may be employed so as to be capable of representing 2 of the ROI data ³² = 4294967296 scores.

As previously described, in GPU or TPU based artificial intelligence systems, there may be no restrictions on resources and power consumption, and thus a 32 bit precision floating point representation may be employed without stress. However, for artificial intelligence chips and terminal devices such as cell phones that require tight control over hardware costs, a 32-bit precision floating point representation would result in higher hardware costs.

By analyzing the principle of the algorithm of the ROI-based object detection method and conducting a lot of experiments, it is found that at least for the score of the ROI data, a lower precision than the usual 32-bit precision can be employed. For example, it is confirmed through a large number of works that the score of the ROI can be quantized to a predetermined accuracy of 6 bits or more, for example, 6-bit accuracy or 8-bit accuracy, without affecting the accuracy of the final detection result.

Thus, in embodiments of the present disclosure, the score of the ROI data may be quantized to a predetermined precision, for example, may be lower than a usual 32-bit precision, such as a 6-bit precision, an 8-bit precision, a 10-bit precision, a 16-bit precision, and the like.

For example, in the case of quantization to 6-bit precision, each ROI data will have 2 ⁶ =64 scores; in the case of quantization to 8-bit precision, the individual ROI data will have 2 ⁸ =256 scores; in the case of quantization to 10-bit precision, the individual ROI data will have 2 ¹⁰ =1024 scores; and so on. The score of the ROI data is expressed with a predetermined precision lower than the usual 32-bit precision, and the hardware cost can be remarkably reduced and the processing efficiency can be improved.

As described previously, the ROI generated and stored in the storage region SR ₁ 、ROI ₂ 、……、ROI _N The number N of (c) may typically be hundreds of thousands, tens of thousands or thousands. In the case of quantizing the score of each ROI to a predetermined precision such as 6-bit precision or 8-bit precision, the maximum number of kinds of ROI scores that may occur (for example, at 6-to-6 ratio64 in the case of a very precise, 256 in the case of an 8-bit precision) will also be much smaller than N, so that the ROI ₁ 、ROI ₂ 、……、ROI _N May have the same score.

In one embodiment, the ROI may be generated for any one of the scores at a predetermined accuracy ₁ 、ROI ₂ 、……、ROI _N The number of ROI data with such a score. For example, as shown in FIG. 3, a corresponding counter CNT may be set for each ROI score at a predetermined accuracy ₁ To CNTs _M Where, for example, in the case of an 8-bit precision, M may be equal to 256, and for example, a counter CNT ₁ Counter CNT for counting the number of ROI data with first type ROI score ₂ For counting the number of ROI data with the second ROI score, and so on. However, it should be understood that this does not mean that the first ROI score must be a score value of 1 and the first ROI score must be a score value of 2. In this context, for descriptive convenience, a counter CNT is used ₁ To CNTs _M Instead of a score value representing the ROI score corresponding to the counter, or in one embodiment, the score of each ROI data may be mapped to a certain value within a natural number range including 1 to M corresponding to a predetermined accuracy.

For convenience of subsequent adjustment/ordering, and also for descriptive convenience, it may be provided that for any two counters CNT _p And CNTs _n (p < n < M) and a counter CNT _p The corresponding ROI score is smaller than the counter CNT _n The corresponding ROI score. However, the present disclosure is not limited thereto, and other specifications may be made to the correspondence between the counter and the ROI score as needed (e.g., requirement of ordering), for example, the counter CNT may be also specified _p The corresponding ROI score is greater than the counter CNT _n The corresponding ROI score.

Obviously for counter CNT ₁ To CNTs _M Any one of the counter CNTs (s is greater thanOr any integer equal to 1 and less than or equal to M) that may be the range [0, n]And a value within, wherein 0 may mean that the value of the score of any one of all the ROI data generated and stored in the SR is not s at a predetermined precision, and N may mean that the value of the score of all the ROI data generated and stored in the SR is s at a predetermined precision. In fact, as described above, in the case where the predetermined precision is 6 bits or more, the counter CNT _s The initial value before step S110 will typically not be N, or even a value close to it.

Then, in step S110, the counter CNT may be used ₁ To CNTs _M Each counter CNT in (a) _s Initial value determination and storage region SR full address range ADDR ₁ To ADDR _N Corresponding multiple local address ranges LS ₁ To LS _K 。

For example, K may be the same as M, and if the counter CNT ₁ An initial value of C1 and a counter CNT ₂ Initial value of C2, i.e. ROI stored in SR ₁ 、ROI ₂ 、……、ROI _N Wherein the value of the score of the C1 ROI data under the preset precision is 1, and the value of the score of the C2 ROI data under the preset precision is 2, the counter CNT ₁ Corresponding local address range LS ₁ May include consecutive C1 addresses ADDR in the memory region SR ₁ To ADDR _C1 And with a counter CNT ₂ Corresponding local address range LS ₂ May include consecutive C2 addresses ADDR in the storage region SR _C1+1 To ADDR _C1+1+C2 Wherein if c1=0, then and counter CNT ₂ Corresponding local address range LS ₂ May include consecutive C2 addresses ADDR in the storage region SR ₁ To ADDR _1+c2 And at this point the counter CNT may be disabled ₁ So that the local address range LS ₁ In fact becomes ineffective and can therefore be left unattended. In a further example, it is actually trueThe number K of fixed local address ranges may be less than or equal to the number M of actually set counters. For example, K may be less than M in the event that one or more counters are disabled.

Similarly, a local address range LS may be determined ₁ To LS _K And similarly, a counter with an initial value of 0 may be disabled.

For descriptive convenience, in the example of fig. 3, the counter CNT ₁ To CNTs _M The initial value of each of which is not 0. Accordingly, local address range LS ₁ To LS _K Each of which may include a full address range ADDR ₁ To ADDR _N One or more consecutive addresses within, and a full address range ADDR ₁ To ADDR _N Any one of the addresses belonging only to the local address range LS ₁ To LS _K Is defined in the memory, is a local address range of the memory.

In the example of FIG. 3, the data ROI currently to be adjusted _cur Is the ROI ₁ 、ROI ₂ 、……、ROI _N ROI in (a) _i (i is an integer greater than or equal to 1 and less than or equal to N), and may have been buffered in a cache line or buffer L1 (also referred to herein as a "first buffer") of the buffer memory BUF before performing step S120.

For example, if ROI _cur (e.g., ROI in the example of FIG. 3) _i ) Is the ROI ₁ 、ROI ₂ 、……、ROI _N The ROI data of the first adjusted ROI in (a) may be stored from the storage region SR before performing step S120 _cur Original address ADDR of (a) _org (e.g., ADDR in the example of FIG. 3) _i ) Read-out ROI _cur And caches it in cache line L1 of the buffer memory BUF.

In determining the ROI _cur Thereafter, for example, in the case of ROI _cur After buffering in the buffer memory BUF, ADDR can be set _org The data at is marked as not unprocessed ROI data anymore.

In one embodiment, the ROI may be defined as _cur After the cache line L1 from the storage region SR to the buffer memory BUF, the address ADDR in the storage region SR is cached _org The ADDR can also be modified by flushing or setting the data at a special data such as NULL or "-1 _i ROI data at such that at ADDR _org Some data item (e.g., score) in the modified ROI data at that point has a special value such as "-1" or null data.

Then, in step S120, the ROI cached in the buffer memory BUF may be determined _cur Determining a local address range LS ₁ To LS _M In a local address range LS corresponding to the score _k . For example, it can be based on the ROI in the buffer memory BUF _cur To determine the score of the counter CNT described above ₁ To CNTs _M In the counter CNT corresponding to the score _k Thereby determining the local address range LS _k 。

Obviously, the local address range LS at this point _k The data at least one address in (a) is not adjusted ROI data (may be unadjusted ROI data, or may be null data or other special data). Thus, a local address range LS may be determined _k One address ADDR of (a) _des (e.g., ADDR in the example of FIG. 3) _j ) At address ADDR _des Data ROI of (2) _nxt (e.g., ROI in the example of FIG. 3) _j ) Not the adjusted ROI data.

In one embodiment, the local address ranges LS may be detected individually _k A data state at each address in the memory, thereby determining the address ADDR _des 。

In another embodiment, the local address range LS may be based on _k Corresponding counter CNT _k The count value or data in (a) determines the address ADDR _des . For example, it may be based on a counter CNT _k In a local address range LS _k Reference address (e.g. LS _k The head address of (a) and from which the local address range LS can be determined _k Address ADDR of (a) _des . Then, at an appropriate time, for example, when determining address ADDR _des After that, at the read address ADDR _des ROI at _nxt Thereafter, at the time of ROI _cur Write to address ADDR _des After that, the CNT is updated _k Such that the updated data corresponds to the local address range LS _k An address within the memory that has not been accessed.

For example, the counter CNT can be determined _k Each count value and local address range LS _k Corresponding relation between each address in (a) and using a counter CNT _k The current count value is from the local address range LS _k To determine the address ADDR _des . For example, the counter CNT may be controlled _k Count down from an initial value (i.e., the number of ROI data having a score corresponding to k) at an appropriate timing, and count up a counter CNT _k Each count value of the local address range LS _k Starting from the head address to the local address range LS _k Offset in the tail address direction of (a). Counter CNT with initial value other than 0 _k When the count value of (c) becomes 0, it means that all the storage positions of ROI data having a score corresponding to k have been adjusted. Then, it can be based on the counter CNT _k Current count value CNT of _k V and local Address Range LS _k Determines the address ADDR _des . Thereby, a local address range LS can be ensured _k Address ADDR of (a) _des The ROI data at is not adjusted ROI data and is at address ADDR _des Then up to the local address range LS _k The ROI data at each address in the tail addresses of (a) is the adjusted ROI data.

In one embodiment, local address ranges LS, respectively, may be provided ₁ To LS _M Or counter CNT ₁ To CNTs _M Corresponding address register AR ₁ To AR _M (not shown in fig. 3), wherein the address register AR _k For preserving local address ranges LS _k For example, a first address. That is to say, with the local address range LS _k Corresponding registerThe/or counter may for example comprise an address register AR _k . As previously described, such hardware overhead is fully acceptable even for artificial intelligence chips due to the small value of M.

In further embodiments, the local address range LS may be stored in, for example, a AND memory or a buffer memory ₁ To LS _M The first address or the last address of each local address range of the register is not necessarily set separately.

It should be understood that the present disclosure is not limited to the examples described above. For example, the counter CNT described herein as an example, since a relative address representation (e.g., count value or address offset) and an absolute address representation can be easily inter-converted ₁ To CNTs _M The manner of use of (c) may be modified to be based on any other suitable manner of processing of the relative address representation (e.g., count value or address offset) and may be modified to be based on absolute address representation.

At the determined address ADDR _des Thereafter, the original stored in ADDR can be in step S130 _des ROI at _nxt As ROI _nxt Buffered in the buffer memory BUF and the ROI in the buffer memory BUF is buffered in step S140 _cur To cover ADDR _des The way of the original data at the address is stored to ADDR _des Where it is located.

In one embodiment, the ROI in the buffer memory BUF may be set in step S130 _cur From cache line L1 of the buffer memory BUF to another cache line or buffer L2 (also referred to herein as a "second buffer") of the buffer memory BUF, and then to be stored in ADDR _des ROI at _nxt Cached in cache line L1 of the buffer memory BUF. Then, in step S140, the ROI in the cache line L2 may be set _cur To cover ADDR _des Raw data at (i.e., ROI _nxt ) Is stored to ADDR _des Where it is located.

In another embodiment, the ROI in cache line L1 of the buffer memory BUF _cur Can be in the previous step S120To the buffer L2 of the buffer memory BUF. Then, in step S130, ADDR _des ROI at _nxt May be buffered directly into the cache line L1 of the buffer memory BUF. Then, in step S140, the ROI in the cache line L2 may be set _cur To cover ADDR _des The way of the original data at the address is stored to ADDR _des Where it is located. In this embodiment, steps S130 and S140 can be controlled to be implemented in one read/write cycle.

For example, as shown in FIG. 4, step S130 may be performed on the rising edge of a clock to control the ADDR _des ROI at _nxt Buffered in line L1 of buffer memory BUF, and step S140 is performed on the falling edge of the clock to control the ROI in line L2 to be buffered _cur To cover ADDR _des The way of the original data at the address is stored to ADDR _des Where it is located.

Thereby, the ROI to be currently adjusted _cur (e.g., ROI in the example of FIG. 3) _i ) ADDR that can be stored as adjusted ROI data to SR _des At and with LS _k The count value of the corresponding counter CNTk can be updated at an appropriate time in the process, for example, reduced by 1, and become the ADDR if the count value is still not 0 _des Address ADDR before (a) _des-1 Corresponding values. If CNT _k If the count value of (c) is 0, meaning that all the storage locations of the ROI data having a score corresponding to k have been adjusted, and in one embodiment the counter CNT may be disabled _k 。

As shown in fig. 3, in one case, the ROI to be processed in each cycle _cur Or the ROI determined in each cycle to be processed in the next cycle _nxt Can be directly the ROI ₁ To ROI _N Is included in the image data, is an unadjusted ROI data.

Thus, the ROI can be set _nxt (i.e., ROI) _j ) ROI as the next cycle _cur . For example, the ROI may be set at step S140 _cur ＝ROI _i ADDR stored in storage region SR _des ＝ADDR _j After this and enter the next step of the cycleBefore S120, the variable ROI _cur From ROI _i Updating to ROI _j . Then, for the new ROI _cur ＝ROI _j Steps S120 to S140 are performed again.

For example, the number of loops may be counted, and the processing may be ended when the number of loops reaches N.

In other cases (e.g., current ROI _cur Is the ROI ₁ To ROI _N The last ROI data to be adjusted, ADDR determined in step S120 _des And reading the ROI _cur ADDR of (a) _org Similarly, ADDR determined in step S120 _des Has been processed in the previous loop, etc.), the ROI buffered in the buffer memory BUF in step S130 _nxt It is possible not to unadjusted ROI data, but for example ROI data that has been processed in a previous cycle or ROI data to which a flag is applied (for example, the data itself is set to special data such as null data, "-1", or a data item in the data such as a score has a special value such as "-1"), or ROI being processed in a current cycle _cur (at this time, ADDR _des And ADDR (ADDR) _org The same), etc.

To this end, it is possible to follow step S140 and before step S120 of the next cycle is performed, for example the ROI currently cached in BUF _nxt In the case that the number of the adjusted ROI data in the storage region SR is smaller than N and not the non-adjusted ROI data, another non-accessed address ADDR in the storage region SR is determined _new (also referred to herein as a "second address") and uses address ADDR _new ROI data update in BUF at ROI _nxt And as the next ROI to be adjusted _cur 。

To determine a new address ADDR _new In one embodiment, LS may be determined ₁ To LS _K LS in (3) _k， (also referred to herein as a "second local address range," wherein K' is an integer greater than or equal to 1 and less than or equal to K, which may be the same or different from K), wherein LS _k， May include at least one address that has not been accessed, e.g., with LS _k， Corresponding counter CNT _k’ Is not currently 0. Then, it can be based on the counter CNT _k’ Determines a new address ADDR from the count value of (a) _new 。

In one embodiment, as shown in FIG. 5, step S150 may be entered after step S140 to detect the ROI currently cached in the BUF _nxt Whether an unadjusted ROI. In one embodiment, the ROI currently cached in cache line L1 of the BUF may be detected _nxt Whether it is special data such as null data or "-1". In another embodiment, the ROI currently cached in cache line L1 of the BUF may be detected _nxt Whether certain data items or fields (e.g., scores) have a particular value such as "-1".

Return "yes" at step S150 (e.g., ROI _nxt Not special data or not including special data items) can be ROI _nxt As a new ROI _cur And for a new ROI _cur Step S120 is performed.

In the case where step S150 returns "NO", it may proceed to step S160 to re-determine the ROI _nxt . For example, it can be based on CNTs ₁ To CNTs _M A counter CNT of which the count value of any one is still not 0 _k’ Thereby determining LS _k’ Then according to the counter CNT _k’ Determination of the current count value belonging to LS _k， Address ADDR of (a) _new . Address ADDR may then be used _new ROI data at as new ROI _nxt Replacement of ROI buffered in BUF in step S130 _nxt And proceeds to step S120 (i.e., "successfully determining a new ROI" in the example of fig. 5 _nxt "case").

If CNT is found in step S160 ₁ To CNTs _M The count value of each of (2) is 0, it may mean that the number of adjusted ROI data in the storage region SR has reached N, i.e., all ROIs have been adjusted. In addition, for example, the number of cycles may be counted, and the number of cycles may be counted The process is ended when the number reaches N.

In the above example, in step S150, by detecting the ROI currently cached in cache line L1 of the BUF _nxt Whether it is special data such as null data or "-1" or ROI currently cached in cache line L1 of BUF _nxt Whether or not certain data items or fields (e.g., scores) of (i) have a particular value such as "-1", determine the current ROI _nxt Whether an unadjusted ROI. In a further embodiment, for example, an address register may be provided, in which the initial data may be the original memory address of the first processed ROI data in the memory region SR. Then, it may be detected in step S150 that the determination in step S120 is ADDR _des Whether the address is the same as the address stored in the address register. If the same, step S150 may return to "no", otherwise "yes" is returned. Then, in step S160, if a new ROI is successfully determined _nxt Address ADDR may be used _new The data in the address register is updated and the method proceeds to S120.

As described above, the counter CNT ₁ To CNTs _M Can be much smaller than the ROI ₁ To ROI _N Is a number of (3). Also, in some embodiments, a list of counters may be established, a counter may be controlled to be deleted or disabled from the list when the count value of a certain counter becomes 0, and then an attempt may be made to determine a new ROI based on the counter located at the head or tail of the list in step S160 _nxt . Therefore, the execution of steps S150 and S160 can be very rapid and even involve no further loop processing.

Methods according to embodiments of the present disclosure can be used to traverse the ROI ₁ To ROI _N In-situ adjustment of the storage location of each ROI data is accomplished at the same time as each ROI data.

Thus, by example methods according to embodiments of the present disclosure, in-situ adjustment of ROI data can be efficiently achieved with minimal hardware cost, with minimal spatial and temporal complexity.

For example, in the case where the example method according to the embodiments of the present disclosure is implemented in an artificial intelligence chip to adjust the storage location of ROI data initially stored in on-chip memory, the artificial intelligence chip may not have to expand the on-chip memory or add another buffer memory, for example, whose capacity may need to be greater than or equal to the storage area in the on-chip memory for storing all of the ROI data, and the artificial intelligence chip can efficiently perform the adjustment of the storage location of the ROI data through simple control logic.

It should be appreciated that the example methods according to embodiments of the present disclosure may be applied to any scenario where it is desirable to sort in-situ a plurality of ROI data that has been generated and stored in succession, and are not limited to application in an artificial intelligence chip.

Exemplary apparatus

Fig. 6 illustrates an example DEV of an apparatus for adjusting storage locations of a plurality of region of interest data, which may include a processor or controller CON and a buffer memory BUF, according to an embodiment of the disclosure.

The processor or controller CON may be, for example, a processor circuit or control logic circuit developed based on a field programmable gate array, ARM processor, or the like, and may be configured to perform steps in a method (e.g., the example method shown in fig. 3 or 5) according to embodiments of the present disclosure according to predetermined instructions upon startup (e.g., upon power-up).

The buffer memory BUF may be any suitable type of storage means such as a cache memory, registers, static memory, etc. The buffer memory BUF may store or buffer intermediate data involved in the processing or logic control of the processor CON, including for example the ROI as described above, according to instructions of the processor CON or under control of the processor CON _cur And/or ROI _nxt 。

In one embodiment, the buffer memory BUF may include two buffers or cache lines (e.g., L1 and L2 in the example of fig. 4), and the capacity of each buffer may depend on the size of a single ROI data.

In addition, as shown in FIG. 6, an example apparatusThe DEV may also include a plurality of counters CNT ₁ To CNTs _M To count the number of ROI data having scores corresponding to 1 to M, respectively, to assist the processor CON in determining and storing the ROIs in the memory ₁ To ROI _N A plurality of local address ranges corresponding to the address ranges of the storage region SR.

In one embodiment, counter CNT ₁ To CNTs _M The number M of (c) may depend on the accuracy of the score of each ROI data. For example, in the case of an 8-bit precision, M may be equal to 256, and for example a counter CNT ₁ Counter CNT for counting the number of ROI data with first type ROI score ₂ For counting the number of ROI data with the second ROI score, and so on.

It should be understood that an apparatus according to an embodiment of the present disclosure is not limited to the example in fig. 6. For example, in further embodiments, an apparatus according to embodiments of the present disclosure may also include other components/modules/circuits/units/elements such as timing controllers for instruction timing control, interrupt controllers for instruction interrupt control, crossbar switches and/or multiplexers for implementing interconnections between components, coprocessors, and the like.

In addition, in fig. 6, devices/components/elements/modules/circuits at both ends of the connection may be directly connected or coupled together, for example, through a data bus and/or an instruction bus, or may be indirectly connected or coupled together via other intermediary devices/components/elements/modules/circuits, such as a Crossbar (Crossbar), where an arrow of the connection may indicate a flow direction of the instruction and/or data of interest, but does not mean that the instruction and/or data may flow only in a direction indicated by the arrow.

In one embodiment, an apparatus according to embodiments of the present disclosure may be implemented, for example, as or as part of an on-chip apparatus of an artificial intelligence chip. In further embodiments, an apparatus according to embodiments of the present disclosure may be applied in any scenario or apparatus where it is desirable to sort in-situ a plurality of ROI data that has been generated and stored in succession.

Exemplary computer program product and computer readable storage Medium

In addition to the methods and apparatus described above, embodiments of the present disclosure may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform steps in a method according to various embodiments of the present disclosure described in the "exemplary methods" section of the present description.

The computer program product may include program code for performing the operations of embodiments of the present disclosure in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server.

Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium, such as a computer-readable non-transitory storage medium, having stored thereon program instructions that, when executed by a processor, cause the processor to perform steps in a method according to various embodiments of the present disclosure described in the "exemplary methods" section above in the present specification.

A computer readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The basic principles of the present disclosure have been described above in connection with specific embodiments, however, it should be noted that the advantages, benefits, effects, etc. mentioned in the present disclosure are merely examples and not limiting, and these advantages, benefits, effects, etc. are not to be considered as necessarily possessed by the various embodiments of the present disclosure. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, since the disclosure is not necessarily limited to practice with the specific details described.

The block diagrams of the devices, apparatuses, devices, systems referred to in this disclosure are merely illustrative examples and are not intended to require or imply that the connections, arrangements, configurations must be made in the manner shown in the block diagrams. As will be appreciated by one of skill in the art, the devices, apparatuses, devices, systems may be connected, arranged, configured in any manner. Words such as "including," "comprising," "having," and the like are words of openness and mean "including but not limited to," and are used interchangeably therewith. The terms "or" and "as used herein refer to and are used interchangeably with the term" and/or "unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to.

It is also noted that in the apparatus, devices and methods of the present disclosure, components or steps may be disassembled and/or assembled. Such decomposition and/or recombination should be considered equivalent to the present disclosure.

In this document, modifiers such as "first," "second," etc. without a literal term are intended to distinguish between different elements/components/circuits/modules/means/steps, and do not emphasize order, positional relationship, importance, priority levels, etc. In contrast, modifiers with adjectives such as "first", "second", etc., may be used to emphasize different element/component/circuit/module/means/step sequence, positional relationship, importance, priority level, etc.

The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the disclosure to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.

Claims

1. A method for adjusting storage locations of a plurality of region of interest data, comprising:

determining a plurality of local address ranges corresponding to the address ranges of the storage areas storing the plurality of region of interest data according to the scores of the plurality of region of interest data;

determining a first address within a first local address range of the plurality of local address ranges for first data that is region of interest data to be currently adjusted;

storing second data at the first address from the storage area into a buffer memory; and

the first data is stored to a first address in the storage area.

2. The method of claim 1, wherein a size of each of the plurality of local address ranges is dependent on a number of region of interest data of the plurality of region of interest data having a score corresponding to the local address range.

3. The method of claim 1, wherein the first data has a score corresponding to the first local address range and the second data of the first address is not adjusted region of interest data.

4. The method of claim 3, wherein determining a first address within a first local address range of the plurality of local address ranges comprises:

the first address is determined from a count value of a counter corresponding to the first local address range.

5. The method of claim 4, further comprising:

and updating the count value of a counter corresponding to the first local address range, so that the updated count value corresponds to an address which is not accessed in the first local address range.

6. The method of claim 1, wherein storing the second data at the first address from the storage area into a buffer memory comprises:

storing first data cached in a first buffer area of the buffer memory into a second buffer area of the buffer memory; and

and caching the second data from the storage area into a first buffer area of the buffer memory.

7. The method of claim 6, wherein storing the first data in the buffer memory at the first address in the storage area comprises:

first data in the second buffer is stored to a first address in the storage area.

8. The method of any of claims 1 to 7, further comprising:

and determining the second data in the buffer memory as the next region of interest data to be adjusted in the case that the second data is the region of interest data which is not adjusted.

9. The method of any of claims 1 to 7, further comprising:

in the case where the second data is not the unadjusted region of interest data and the number of the adjusted region of interest data in the memory area is smaller than the number of the plurality of region of interest data, the data at the second address in the memory area that is not accessed is determined as the next region of interest data to be adjusted.

10. The method of claim 9, wherein determining data at the second address in the memory area that has not been accessed as the next region of interest data to be adjusted comprises:

Determining a second local address range of the plurality of local address ranges, wherein the second local address range comprises at least one address which is not accessed; and

the second address is determined from a count value in a counter corresponding to the second local address range.

11. A computer readable non-transitory storage medium having stored thereon program instructions that, when executed, perform at least the method of any of claims 1 to 10.

12. An apparatus for adjusting storage locations of a plurality of region of interest data, comprising:

a buffer memory configured to store region of interest data to be currently adjusted; and

a processor configured to perform at least the method of any one of claims 1 to 10 at start-up.

13. The apparatus of claim 12, wherein the buffer memory comprises two buffers, each buffer having a capacity that depends on a size of a single region of interest data.

14. The apparatus of claim 12 or 13, further comprising:

a plurality of counters, each counter corresponding to a respective local address range, the number of the plurality of counters being dependent on the accuracy of the score for each region of interest data.