Detailed Description
The following description of the embodiments of the present application is made clearly and completely with reference to the accompanying drawings. It is evident that the described embodiments are some, but not all, of the embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
The terms "first", "second" and the like in the description and in the claims are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, so that embodiments of the application may be practiced in orders other than those specifically illustrated and described herein. In addition, "first" and "second" distinguish objects that are generally of the same type and do not limit the number of objects; for example, the first object may be one or more objects.
Referring to fig. 1, fig. 1 is a flowchart of an image content detection method according to an embodiment of the present invention. As shown in fig. 1, the method includes the following steps:
Step 101, performing region division on the acquired image frame to obtain S image subregions, where S is an integer greater than 1.
The acquired image frame may be an image frame captured by a camera, for example, an image captured by a camera internal or external to the electronic device. This is not limiting; for example, the image may be received over a network, such as an image frame of a received video. In other words, the acquisition may be a reception, or may be reading an image frame of a local video.
The above-mentioned region division may be performed according to preset division positions to obtain the S subregions, where S may be a preset integer greater than 1, for example 2, 4 or 6, and may be set according to the application scenario.
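By way of illustration only, the division according to preset division positions might be sketched in Python as follows; the function and parameter names are illustrative assumptions and not part of the embodiment:

```python
def divide_by_preset_positions(frame, row_splits, col_splits):
    """Split a frame (a height x width array) into S subregions along
    preset division positions, e.g. row_splits=[540], col_splits=[960]
    yields S = 4 subregions for a 1080x1920 frame."""
    h, w = frame.shape[:2]
    row_edges = [0] + list(row_splits) + [h]
    col_edges = [0] + list(col_splits) + [w]
    subregions = []
    for r0, r1 in zip(row_edges[:-1], row_edges[1:]):
        for c0, c1 in zip(col_edges[:-1], col_edges[1:]):
            # Keep each subregion image together with its offset in the full frame.
            subregions.append(((r0, c0), frame[r0:r1, c0:c1]))
    return subregions
```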
Step 102, performing suspected target detection on the S image subregions respectively to obtain a suspected target detection result.
The suspected target detection may be detecting whether a suspected target is present in an image subregion, rather than detecting the target itself in detail; for example, the type, posture, behavior, attributes and the like of the target are not determined.
Further, the above-described suspected target detection may be defined as target detection with lower accuracy and/or a smaller amount of calculation than the target detection performed in step 104.
The above-mentioned suspected target detection result may indicate that a suspected target exists in one or more image subregions, or, for some image frames, that no suspected target exists in any image subregion.
The image frame may be an ultra-high-resolution or high-resolution image frame, but the present invention is not limited thereto; low-resolution image frames are also possible.
It should be noted that step 102 may be performed continuously, for example, step 102 is performed for each image frame captured by the camera or for consecutive image frames in a video.
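A minimal sketch of running step 102 over the S image subregions is given below for illustration; detect_suspected stands in for any lightweight suspected-target detector (for example the first network model described later) and is an assumed name, not an interface defined by this embodiment:

```python
def detect_suspected_targets(subregions, detect_suspected):
    """Run lightweight suspected-target detection on every subregion.

    subregions: list of (offset, subregion_image) pairs.
    Returns a list of (subregion_index, offset, boxes) entries; an empty
    list corresponds to "no suspected target exists in any subregion".
    """
    results = []
    for idx, (offset, sub_img) in enumerate(subregions):
        # boxes: e.g. a list of (row, col, height, width) boxes in
        # subregion coordinates returned by the lightweight detector.
        boxes = detect_suspected(sub_img)
        if boxes:
            results.append((idx, offset, boxes))
    return results
```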
Step 103, in the case where it is determined, according to the suspected target detection result, that an image position satisfying a preset condition exists, intercepting an image partial area including the image position in the currently acquired image frame.
The preset condition may be that the score of an image pixel exceeds a preset threshold, in which case the position of that pixel serves as the image position; or the preset condition may be that a suspected target is detected in the same image subregion in a plurality of consecutive image frames, in which case the image position corresponding to the suspected target serves as the image position. In the embodiment of the present invention, the preset condition is not limited and may be preconfigured according to the application scenario or the target detection requirements.
Intercepting the image partial area including the image position in the currently acquired image frame may be intercepting, from the image frame, an area of preset width and height centered on the image position.
The currently acquired image frame may be an image currently acquired by the camera, or may be a currently received or currently read image frame.
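A possible way to intercept an area of preset width and height centered on the image position, as described in step 103, is sketched below; the helper name is illustrative, and the crop is clamped so it never leaves the frame:

```python
def crop_partial_area(frame, center_xy, crop_w, crop_h):
    """Intercept a (crop_h x crop_w) partial area centered on center_xy
    (x = column, y = row), clamped to the frame boundary."""
    h, w = frame.shape[:2]
    cx, cy = center_xy
    x0 = max(0, min(int(cx) - crop_w // 2, w - crop_w))
    y0 = max(0, min(int(cy) - crop_h // 2, h - crop_h))
    return frame[y0:y0 + crop_h, x0:x0 + crop_w]
```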
Step 104, performing target detection on the image partial area.
The above-mentioned target detection may be determining the type of the target, detecting related attributes of the target, detecting the posture of the target, detecting the behavior of the target, and so on; the target detection is not limited in the embodiment of the present invention. Alternatively, the target detection may confirm the suspected target detected in step 102, that is, confirm whether the suspected target is the target to be detected.
In the embodiment of the invention, through the above steps, suspected target detection can be performed for each subregion, and target detection is performed on the image partial area only when the preset condition is satisfied, so that the power consumption of target detection can be reduced.
It should be noted that the embodiment of the present invention may be applied to electronic devices, such as embedded devices, mobile phones, tablet computers, wearable devices, vehicles, etc.; this is not limited.
As an alternative embodiment, there is an overlapping area between adjacent image sub-areas of the S image sub-areas.
The overlapping area between adjacent image subregions may be an image area shared by the adjoining borders of those adjacent subregions.
For example, assume that the first decoded image frame of the video is I, and I is divided into subregions of m rows and n columns; taking S equal to 4 as an example, there are S = m × n image subregions after division, and adjacent image subregions overlap by a width of t pixels. The parameters m, n and t are all preset according to actual service requirements, and the overlap of the divided subregions is shown in fig. 2.
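The overlapping division of fig. 2 might be realized, for example, as in the following sketch, where m, n and t are the preset row count, column count and overlap width mentioned above (the helper name is illustrative):

```python
def divide_with_overlap(frame, m, n, t):
    """Split frame I into S = m * n subregions of roughly equal size so that
    adjacent subregions share an overlapping band of about t pixels."""
    h, w = frame.shape[:2]
    sub_h, sub_w = h // m, w // n
    half = t // 2
    subregions = []
    for i in range(m):
        for j in range(n):
            # Expand each grid cell by t/2 pixels on every interior border,
            # so neighbouring subregions overlap by roughly t pixels.
            r0 = max(0, i * sub_h - half)
            r1 = min(h, (i + 1) * sub_h + half)
            c0 = max(0, j * sub_w - half)
            c1 = min(w, (j + 1) * sub_w + half)
            subregions.append(((r0, c0), frame[r0:r1, c0:c1]))
    return subregions
```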
In this embodiment, because adjacent image subregions have an overlapping area, the continuity of image content across subregion boundaries is preserved when each subregion is detected, which improves the accuracy of target detection.
As an optional implementation manner, when it is determined according to the suspected target detection result that an image position satisfying a preset condition exists, intercepting the image partial area including the image position in the currently acquired image frame includes:
calculating the score of pixels in the target subregion according to the position of the suspected target under the condition that the suspected target detection result indicates the target subregion with the suspected target;
and intercepting, in the case where the score of at least one pixel satisfies the preset condition, an image partial area including the at least one pixel in the currently acquired image frame.
The suspected target may be an object having related attributes of the target, or an object with a relatively high likelihood (e.g., higher than a preset threshold) of being the target. The target may be predefined as the object to be detected, such as a particular person, a particular vehicle, a particular object, etc.
The score of the at least one pixel meeting the preset condition may be that the score of the at least one pixel exceeds a preset threshold.
The currently acquired image frame may be the image frame acquired when the accumulated score of at least one element in the matrix exceeds the preset threshold, such as the image frame currently captured by the camera or currently read at that moment; alternatively, it may be an image frame captured by the camera or read after the accumulated score of at least one element in the matrix exceeds the preset threshold.
In this embodiment, the partial image area is cut out according to the score of the pixel to perform target detection, so that the accuracy of target detection can be improved.
Optionally, when the suspected target detection result indicates that a target sub-area of a suspected target exists, calculating a score of a pixel in the target sub-area according to a position of the suspected target, including:
determining, in the case where the suspected target detection result indicates that a target subregion with a suspected target exists, the center position of the suspected target, and calculating the score of each pixel in the target subregion according to the center position, where the score of each pixel is inversely related to the distance between the pixel and the center position.
In this embodiment, the score of each pixel may be determined using a two-dimensional Gaussian distribution function. For example, when the region where the suspected target is located is a region of interest (ROI), the center pixel position of the ROI region, that is, the center position, is denoted (x_c, y_c), and a pixel in the target subregion is denoted (i, j), the score of the pixel (i, j) is obtained by the following formula:
G(i, j) = exp{-0.5 × [(i - x_c)² + (j - y_c)²] / σ²}
where G(i, j) represents the score of pixel (i, j).
It should be noted that the above formula is merely an example; for example, 0.5 in the above formula may be configured as another constant, or constants may be added to the formula, which is not limited.
Further, the score of the center position may be preconfigured or obtained by the above formula, for example by taking the pixel (i, j) equal to the pixel (x_c, y_c).
σ represents the standard deviation of the two-dimensional Gaussian distribution function and determines how the pixel score decreases from the center position to the periphery of the image subregion.
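A minimal numerical sketch of the Gaussian scoring above is given below; σ (sigma) is the spread parameter of the formula and is assumed to be preset per application, and the function name is illustrative only:

```python
import numpy as np

def gaussian_scores(rows, cols, xc, yc, sigma):
    """Score every pixel (i, j) of a target subregion with
    G(i, j) = exp(-0.5 * ((i - xc)**2 + (j - yc)**2) / sigma**2),
    so the score falls off with distance from the center (xc, yc)."""
    ii, jj = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    return np.exp(-0.5 * ((ii - xc) ** 2 + (jj - yc) ** 2) / sigma ** 2)
```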
It should be noted that the embodiment of the present invention is not limited to determining the score of each pixel by using the two-dimensional Gaussian distribution function; for example, the score may be determined directly according to the distance between the pixel and the center position, e.g., the distances between the pixels in the image subregion and the center position may be divided into a plurality of intervals, with each interval corresponding to a score.
In this embodiment, the score of a pixel is inversely related to the distance between the pixel and the center position, so pixels farther from the suspected target receive lower scores; this reduces the triggering of the preset condition and thereby further saves power consumption.
In the embodiment of the present invention, calculating the score of each pixel in the target subregion from the center position is not limiting; for example, the score of each pixel in the target subregion may be calculated directly from the region occupied by the suspected target.
In addition, the score of the pixels of an image subregion in which no suspected target is detected may be zero; or, in the matrix embodiment, no score accumulation is performed for image subregions in which no suspected target is detected.
Optionally, the intercepting the image partial area including the at least one pixel in the currently acquired image frame in the case where the score of the at least one pixel satisfies the preset condition includes:
accumulating the scores of the pixels into the corresponding elements of a matrix, where the number of elements included in the matrix is the same as the number of pixels included in the acquired image frame and the elements are in one-to-one correspondence with the pixels;
and intercepting, in the case where the accumulated score of at least one element in the matrix exceeds a preset threshold, an image partial area including at least one pixel in the currently acquired image frame, where the at least one pixel corresponds one-to-one to the at least one element.
For example, if the accumulated score of a reference element in the matrix exceeds the preset threshold, an image partial area including the pixel corresponding to the reference element may be intercepted from the currently acquired image frame.
The matrix is defined in advance, and the initial value of each element in the matrix may be zero.
It should be noted that, because the image frames captured by the camera or obtained from the video are continuous, the scores of the elements in the matrix may be accumulated over frames; of course, in the embodiment of the present invention, the scores of the elements do not necessarily keep increasing and may also be attenuated to some extent.
In this embodiment, the image partial area including the at least one pixel is intercepted from the currently acquired image frame only when the accumulated score of at least one element in the matrix exceeds the preset threshold, which reduces the number of times target detection is performed and thereby further saves power consumption.
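For illustration, the accumulation and triggering just described might look as follows; M is the full-frame accumulation matrix and threshold is the preset threshold, both assumed to be preconfigured:

```python
import numpy as np

def accumulate_and_trigger(M, scores, offset, threshold):
    """Add subregion pixel scores into the full-frame matrix M at the given
    (row, col) offset and return the positions of the elements whose
    accumulated score exceeds the preset threshold."""
    r0, c0 = offset
    h, w = scores.shape
    M[r0:r0 + h, c0:c0 + w] += scores
    return np.argwhere(M > threshold)   # (row, col) pixel positions, possibly empty
```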
Optionally, the cumulative score of the target element at the first time is equal to the sum of:
the score of the pixel corresponding to the target element at the first time;
an attenuation value of a cumulative score of the target element at a time immediately preceding the first time;
wherein the target element is any element in the matrix.
Wherein each instant corresponds to an image frame.
In this embodiment, the current score of the pixel and the attenuation value of the score at the previous time may be added to obtain the score at the current time, so that triggering of the preset condition may be reduced, so as to further save power consumption.
Further, because the accumulated score of the target element at the time previous to the first time is attenuated, elements that receive no new score can decay to 0. For example, if no suspected target is detected in a certain image subregion for m consecutive frames, none of those m frames contributes any score, and the score of the corresponding element decays to 0.
In addition, the attenuation value of the accumulated score may be obtained by applying an exponential decay to the elements of the matrix (which may be denoted as M). If no new score is added, the score of every element in the matrix M decays to 0 after m frames; that is, every m consecutive frames are taken as one time statistical period, and the matrix M is attenuated within this period by the exponential decay method. If the score in the accumulation matrix M of the element corresponding to a pixel of the image frame (which may be denoted as the original image I) at the current moment is k_{x,y}(t), the element at the next moment (the next frame) is obtained by multiplying it by a preset coefficient α(t):
k_{x,y}(t+1) = α(t) × k_{x,y}(t)
α(t) = exp[-θ × t]
θ = ln(1/ε) / m
where m and ε are preset constants; this choice of θ attenuates the coefficient α(t) from 1 to near 0 (approximately ε) over the m-frame period.
Of course, in the embodiment of the present invention, the element score is not limited to the exponential decay method, for example, the element score may be decayed by a certain proportion every other image frame, for example, by 20% every other image frame.
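A sketch of the exponential decay described above, applied once per frame within an m-frame statistical period, is shown below; eps is the small residual treated as "near 0" and, like m_frames, is an assumed preset:

```python
import numpy as np

def decay_matrix(M, t, m_frames, eps=1e-3):
    """Attenuate the accumulation matrix M by alpha(t) = exp(-theta * t),
    with theta = ln(1/eps) / m_frames so that alpha falls from 1 toward
    eps (near 0) over the m-frame period."""
    theta = np.log(1.0 / eps) / m_frames
    alpha = np.exp(-theta * t)
    return M * alpha
```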
As an optional implementation manner, the performing suspected target detection on the S image subregions to obtain a suspected target detection result includes:
performing suspected target detection on the S image subregions respectively by using a first network model to obtain the suspected target detection result;
The performing target detection on the image partial area includes:
performing target detection on the image partial area using a second network model;
wherein the first network model is less computationally intensive than the second network model.
The first network model and the second network model may be pre-trained. In addition, because the first network model is less computationally intensive than the second network model, the first network model may also be referred to as a shallow target recognition network model and the second network model as a deep target recognition network model.
In this embodiment, the first network model is used to screen for suspected targets with little computational overhead and does not finely discriminate the specific type of the target, so it only needs to be fast and does not require high recognition accuracy. Target detection is then performed on the image partial area using the second network model.
In addition, the second network model may be any network that meets a specific recognition accuracy according to actual service requirements, such as a deep Faster R-CNN network, and the first network model may be a shallow SSD network; this is not limiting, and other network models may be used.
In this embodiment, the second network model only detects the intercepted image area after the preset condition is satisfied, which saves power consumption and improves calculation efficiency.
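Tying the pieces together, one possible per-frame flow of the method of fig. 1 is sketched below; light_model and heavy_model stand for the first and second network models, params collects the preset values (m, n, t, sigma, threshold, crop size), and the helper functions are the illustrative sketches given earlier, none of which are mandated by the embodiment:

```python
def process_frame(frame, M, light_model, heavy_model, params):
    """One illustrative iteration: divide, screen with the light model,
    accumulate scores, and run the heavy model only on a triggered crop."""
    subregions = divide_with_overlap(frame, params["m"], params["n"], params["t"])
    detections = detect_suspected_targets(subregions, light_model)      # step 102
    for idx, offset, boxes in detections:
        sub = subregions[idx][1]
        for (r, c, bh, bw) in boxes:
            scores = gaussian_scores(sub.shape[0], sub.shape[1],
                                     r + bh // 2, c + bw // 2, params["sigma"])
            triggered = accumulate_and_trigger(M, scores, offset, params["threshold"])
            if len(triggered):                                          # step 103
                row, col = triggered[0]
                crop = crop_partial_area(frame, (col, row),
                                         params["crop_w"], params["crop_h"])
                return heavy_model(crop)                                # step 104
    return None   # preset condition not met: the heavy model is not invoked
```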
In the embodiment of the invention, the acquired image frame is divided into S image subregions, where S is an integer greater than 1; suspected target detection is performed on the S image subregions respectively to obtain a suspected target detection result; in the case where it is determined, according to the suspected target detection result, that an image position satisfying the preset condition exists, an image partial area including the image position is intercepted from the currently acquired image frame; and target detection is performed on the image partial area. In this way, since suspected target detection is performed for each subregion and target detection is performed on the image partial area only when the preset condition is satisfied, the power consumption of target detection can be reduced.
Referring to fig. 3, fig. 3 is a block diagram of an image content detection apparatus according to an embodiment of the present invention. As shown in fig. 3, the image content detection apparatus 300 includes:
a dividing module 301, configured to perform region division on the acquired image frame to obtain S image subregions, where S is an integer greater than 1;
a first detection module 302, configured to perform suspected target detection on the S image subregions respectively to obtain a suspected target detection result;
an intercepting module 303, configured to intercept an image partial area including the image position in the currently acquired image frame in the case where it is determined, according to the suspected target detection result, that an image position satisfying the preset condition exists;
a second detection module 304, configured to perform target detection on the image partial area.
Optionally, the intercepting module includes:
a calculating unit, configured to calculate the score of pixels in the target subregion according to the position of the suspected target in the case where the suspected target detection result indicates that a target subregion with a suspected target exists;
and an intercepting unit, configured to intercept the image partial area including the at least one pixel in the currently acquired image frame in the case where the score of the at least one pixel satisfies the preset condition.
Optionally, the calculating unit is configured to determine a center position of the suspected target when the suspected target detection result indicates that the target sub-region of the suspected target exists, and calculate a score of each pixel in the target sub-region according to the center position, where the score of each pixel is inversely related to a distance between the pixel and the center position.
Optionally, the intercepting unit is configured to accumulate the scores of the pixels into the elements corresponding to the matrix, where the number of the elements included in the matrix is the same as the number of the pixels included in the acquired image frame and the elements are in one-to-one correspondence with the pixels, and intercept the image partial area including at least one pixel in the currently acquired image frame when the accumulated score of at least one element in the matrix exceeds a preset threshold, where the at least one pixel is respectively in one-to-one correspondence with the at least one element.
Optionally, the cumulative score of the target element at the first time is equal to the sum of:
the score of the pixel corresponding to the target element at the first moment;
an attenuation value of a cumulative score of the target element at a time immediately preceding the first time;
wherein the target element is any element in the matrix.
Optionally, the first detection module is configured to use a first network model to perform suspected target detection on the S image subregions, so as to obtain a suspected target detection result;
the second detection module is used for performing target detection on the image partial region by using a second network model;
wherein the first network model is less computationally intensive than the second network model.
The image content detection apparatus provided by the embodiment of the present invention can implement each process in the method embodiment of fig. 1; to avoid repetition, the description is not repeated here.
It should be noted that, the image content detection apparatus in the embodiment of the present invention may be an apparatus, or may be a component, an integrated circuit, or a chip in an electronic device.
Referring to fig. 4, fig. 4 is a block diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 4, an electronic device 400 includes a memory 401, a processor 402, and a program or an instruction stored in the memory 401 and capable of running on the processor 402, where the program or the instruction implements steps in the image content detection method described above when executed by the processor 402.
The embodiment of the invention also provides a readable storage medium, on which a program or an instruction is stored, which when executed by a processor, implements each process of the above image content detection method embodiment, and can achieve the same technical effects, so that repetition is avoided, and no further description is given here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises that element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed; the functions may also be performed in a substantially simultaneous manner or in the reverse order depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, and may of course also be implemented by hardware, although in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present application, or the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and including instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods according to the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are to be protected by the present application.