CN114596330B

CN114596330B - Commodity positioning method, device and storage medium based on dynamic vision

Info

Publication number: CN114596330B
Application number: CN202210096069.3A
Authority: CN
Inventors: 周艳华; 张盛; 邵晓盛
Original assignee: Guangzhou Gaimengda Industrial Products Co ltd
Current assignee: Guangzhou Gaimengda Industrial Products Co ltd
Priority date: 2022-01-26
Filing date: 2022-01-26
Publication date: 2025-08-19
Anticipated expiration: 2042-01-26
Also published as: CN114596330A

Abstract

The present invention discloses a commodity positioning method, device, and storage medium based on dynamic vision. The method comprises: obtaining three consecutive frames from a video and using a three-frame difference algorithm to obtain an intersection image of the three consecutive frames; performing dilation processing on the intersection image to obtain a preprocessed image; masking regions of the preprocessed image that meet preset conditions and performing contour extraction to obtain a connected domain; if the width or height of the connected domain is greater than a preset value, cropping a square area centered on the connected domain, with a side length equal to the larger of its width and height, as the moving commodity area; and if neither the width nor the height of the connected domain is greater than the preset value, cropping a square area centered on the connected domain with a side length equal to the preset value as the moving commodity area. By obtaining the intersection image with a three-frame difference algorithm, preprocessing it, and masking the regions of the preprocessed image that meet the preset conditions, the embodiments of the present invention can effectively improve the accuracy of commodity positioning.

Description

Commodity positioning method, device and storage medium based on dynamic vision

Technical Field

The present invention relates to the field of image processing technology, and in particular to a commodity positioning method, apparatus, and storage medium based on dynamic vision.

Background

Moving object detection segments a dynamic object of interest from the image background and is mainly used in image analysis and object tracking. Its results can be affected by the movement of the object, changes in the background, illumination changes on the object or the background, mutual occlusion between the object and other objects, and similar factors. Existing commodity positioning methods use algorithms such as optical flow, background subtraction, and frame differencing, but they are easily disturbed by the environment, which makes it difficult to position commodities accurately.

Disclosure of Invention

The invention provides a commodity positioning method, a commodity positioning device, and a storage medium based on dynamic vision, which solve the technical problem that existing commodity positioning methods, owing to their weak recognition capability, have difficulty positioning commodities accurately.

One embodiment of the invention provides a commodity positioning method based on dynamic vision, which comprises the following steps:

acquiring three consecutive frames from a video, and obtaining an intersection image of the three consecutive frames using a three-frame difference algorithm;

performing dilation processing on the intersection image to obtain a preprocessed image;

masking regions of the preprocessed image that meet preset conditions, and extracting contours to obtain a connected domain;

and, if the width or height of the connected domain is greater than a preset value, cropping a square area centered on the connected domain, with a side length equal to the larger of its width and height, as the moving commodity area; if neither the width nor the height of the connected domain is greater than the preset value, cropping a square area centered on the connected domain with a side length equal to the preset value as the moving commodity area.

Further, after the moving commodity area is cropped, it is input into a preset target detection model, and the category and position information of the commodity are obtained from the moving commodity area.

Further, the three consecutive frames include a t-th frame image, a (t+1)-th frame image, and a (t+2)-th frame image, and obtaining the intersection image of the three consecutive frames using the three-frame difference algorithm specifically comprises:

performing a difference operation on the t-th frame image and the (t+1)-th frame image to obtain a first differential image, and performing a difference operation on the (t+1)-th frame image and the (t+2)-th frame image to obtain a second differential image;

binarizing the first differential image and the second differential image respectively to obtain a first binarized image and a second binarized image;

and performing a logical AND operation on the first binarized image and the second binarized image to obtain their intersection image.
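The three steps above can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation. The function name and the binarization threshold `thresh=25` are assumptions, because the text does not specify a threshold. In OpenCV the same steps are commonly written with `cv2.absdiff`, `cv2.threshold`, and `cv2.bitwise_and`.

```python
import numpy as np

def three_frame_difference(f_t, f_t1, f_t2, thresh=25):
    """Illustrative three-frame difference returning a 0/255 uint8 intersection image.

    f_t, f_t1, f_t2: grayscale frames t, t+1, t+2 (H x W, uint8).
    thresh: hypothetical binarization threshold (not given in the source).
    """
    # Work in a signed type so the subtraction cannot wrap around.
    a = f_t.astype(np.int16)
    b = f_t1.astype(np.int16)
    c = f_t2.astype(np.int16)
    d1 = np.abs(b - a)  # first differential image: |I(t+1) - I(t)|
    d2 = np.abs(c - b)  # second differential image: |I(t+2) - I(t+1)|
    # Binarize each differential image.
    b1 = d1 > thresh
    b2 = d2 > thresh
    # Logical AND keeps only pixels that changed in both frame pairs.
    return (b1 & b2).astype(np.uint8) * 255
```

Consider a bright block that occupies columns 0-2, 3-5 and 6-8 in frames t, t+1 and t+2. The first differential image marks columns 0-5 and the second marks columns 3-8. Their AND keeps only columns 3-5, which is the block's position in frame t+1. This illustrates why the three-frame version suppresses the "ghost" that a plain two-frame difference leaves behind.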

Further, performing dilation processing on the intersection image to obtain the preprocessed image specifically comprises:

selecting a kernel with a width of 5 and a height of 1, and dilating the intersection image with this kernel.

Further, masking the regions of the preprocessed image that meet the preset conditions and extracting contours to obtain connected domains specifically comprises:

masking the region around the border of the preprocessed image, and extracting contours to obtain connected domains whose area is greater than the maximum area of the background connected domain and smaller than the minimum area of the commodity; or

detecting the brightness of each region in the preprocessed image, masking the regions of the preprocessed image whose brightness is not within a preset threshold range, and extracting contours to obtain connected domains whose area is greater than the maximum area of the background connected domain and smaller than the minimum area of the commodity.

Further, the preset value is 128.

One embodiment of the present invention provides a dynamic vision-based commodity positioning apparatus, comprising:

an intersection image acquisition module, configured to acquire three consecutive frames from a video and obtain an intersection image of the three consecutive frames using a three-frame difference algorithm;

a preprocessing module, configured to perform dilation processing on the intersection image to obtain a preprocessed image;

a connected domain extraction module, configured to mask regions of the preprocessed image that meet preset conditions and extract contours to obtain connected domains; and

a commodity positioning module, configured to crop, if the width or height of the connected domain is greater than a preset value, a square area centered on the connected domain with a side length equal to the larger of its width and height as the moving commodity area, and to crop, if neither the width nor the height of the connected domain is greater than the preset value, a square area centered on the connected domain with a side length equal to the preset value as the moving commodity area.

An embodiment of the present invention provides a computer-readable storage medium comprising a stored computer program. When the computer program runs, it controls the device in which the computer-readable storage medium is located to perform the dynamic vision-based commodity positioning method described above.

In the embodiment of the invention, the intersection image is obtained with a three-frame difference algorithm and preprocessed to obtain the preprocessed image. Masking the regions of the preprocessed image that meet the preset conditions effectively reduces interference from other regions on the commodity region. Extracting contours to obtain connected domains whose area is greater than the maximum area of the background connected domain and smaller than the minimum area of the commodity further reduces the influence of the background and other regions on the commodity region. Together, these steps effectively improve commodity positioning accuracy.

Furthermore, the embodiment of the invention can also compare the brightness of the preprocessed image against the preset threshold range and, according to brightness, mask the regions of the preprocessed image that would affect commodity positioning. This further improves positioning accuracy.

Drawings

FIG. 1 is a schematic flow chart of a dynamic vision-based commodity positioning method according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a dynamic vision-based commodity positioning apparatus according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort fall within the scope of the present application.

In the description of the present application, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only. They shall not be construed as indicating or implying relative importance, or as implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present application, unless otherwise specified, "a plurality of" means two or more.

In the description of the present application, unless explicitly stated or limited otherwise, the terms "mounted," "coupled," and "connected" are to be construed broadly. For example, a connection may be fixed, detachable, or integral; mechanical or electrical; direct, or indirect via an intermediate medium; or internal communication between two elements. Those of ordinary skill in the art can understand the specific meanings of these terms in the present application according to the specific circumstances.

Referring to fig. 1, an embodiment of the present invention provides a commodity positioning method based on dynamic vision, including:

S1, acquiring three consecutive frames from a video, and obtaining an intersection image of the three consecutive frames using a three-frame difference algorithm;

In the embodiment of the invention, the three-frame difference algorithm is an improvement on the difference algorithm for two adjacent frames. The difference operation is performed over three consecutive frames to obtain their intersection image. This effectively eliminates the influence that the image background has on commodity identification while the commodity is moving, so the motion contour of the commodity can be extracted accurately and commodity positioning precision improves.

S2, performing dilation processing on the intersection image to obtain a preprocessed image;

It can be understood that commodities usually move horizontally. The embodiment of the invention may therefore dilate the intersection image with a kernel of width 5 and height 1, which connects neighboring regions horizontally and preserves the integrity of the commodity's moving area. In a specific implementation, no vertical connection is performed. This keeps the moving area as small as possible and reduces the resolution of the cropped sub-image.

S3, masking regions of the preprocessed image that meet preset conditions, and extracting contours to obtain a connected domain;

In the embodiment of the present invention, the region meeting the preset condition may be the border region of the preprocessed image. For example, if the preprocessed image is a square with a side length of 100px, masking its border region leaves a square with a side length of 90px. The region meeting the preset condition may also be a region whose brightness is not within the preset threshold range. Masking these regions effectively reduces interference from irrelevant parts of the preprocessed image on commodity positioning, such as irrelevant regions at the border and regions that are too bright or too dark. Contour extraction then yields the connected domain, so the motion contour information of the commodity is extracted accurately and the accuracy of commodity positioning is effectively improved.
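The 100px to 90px example corresponds to masking a 5-pixel band on every side. The following is a minimal NumPy sketch that assumes a binary preprocessed image. The `margin` parameter is hypothetical, because the text gives only this example and not a general rule.

```python
import numpy as np

def mask_border(img, margin=5):
    """Zero out a band of `margin` pixels along every edge of a 2-D image.

    margin=5 reproduces the 100px -> 90px example; it is an assumption,
    not a value fixed by the source. The input image is left unchanged.
    """
    out = img.copy()
    if margin <= 0:  # guard: a slice like out[-0:] would cover the whole image
        return out
    out[:margin, :] = 0   # top band
    out[-margin:, :] = 0  # bottom band
    out[:, :margin] = 0   # left band
    out[:, -margin:] = 0  # right band
    return out
```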

S4, if the width or height of the connected domain is greater than a preset value, cropping a square area centered on the connected domain, with a side length equal to the larger of its width and height, as the moving commodity area; and if neither the width nor the height of the connected domain is greater than the preset value, cropping a square area centered on the connected domain with a side length equal to the preset value as the moving commodity area.

In the embodiment of the present invention, after the connected domain is extracted, it is further determined whether the width or height of the connected domain is greater than the preset value. Experiments show three things. First, the width or height of the moving area mainly lies between 100 and 200. Second, image dimensions that are integer multiples of 32 favor deep-learning inference. Third, recognition degrades when the image is scaled to 1/2 of its original size but is almost unchanged when it is scaled to 3/4. Accordingly, in a specific embodiment, the preset value is 128.

If the width or height of the connected domain is greater than 128, a square area whose side length is the larger of the width and height is cropped around the center of the connected domain as the moving commodity area. This locates the commodity as it moves dynamically in the smart cabinet. If neither the width nor the height of the connected domain is greater than 128, a square area with a side length of 128 is cropped directly around the center of the connected domain as the moving commodity area.

In a specific embodiment, if the side length of the cropped square area is greater than 128, the square area may be scaled proportionally down to 128. When the category and position information of the commodity are then obtained with the target detection model, this effectively reduces the computational load on the system while preserving the detection effect.
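The cropping rule and the optional rescaling can be sketched as follows. The sketch assumes the connected domain is described by its bounding box `(x, y, w, h)`. The text does not say how a square near the image edge is handled, so shifting it back inside the frame is an assumption. The nearest-neighbour helper `resize_nearest` is also a hypothetical stand-in; a real system would more likely use an image library's resize.

```python
import numpy as np

PRESET = 128  # preset value given in the text

def crop_square(img, bbox, preset=PRESET):
    """Crop the square moving-commodity area around a connected domain.

    bbox = (x, y, w, h): bounding box of the connected domain in pixels.
    Assumption: a square that would leave the image is shifted back inside;
    if the image is smaller than the square, the crop is truncated.
    """
    x, y, w, h = bbox
    # Larger of width/height if either exceeds the preset, otherwise the preset.
    side = max(w, h) if max(w, h) > preset else preset
    cx, cy = x + w // 2, y + h // 2  # integer center of the connected domain
    H, W = img.shape[:2]
    x0 = min(max(cx - side // 2, 0), max(W - side, 0))
    y0 = min(max(cy - side // 2, 0), max(H - side, 0))
    return img[y0:y0 + side, x0:x0 + side]

def resize_nearest(img, size=PRESET):
    """Hypothetical nearest-neighbour rescale of a crop to size x size."""
    rows = (np.arange(size) * img.shape[0]) // size  # source row per output row
    cols = (np.arange(size) * img.shape[1]) // size  # source column per output column
    return img[rows][:, cols]
```

For example, a 60x40 domain yields a 128x128 crop, while a 180x150 domain yields a 180x180 crop that `resize_nearest` brings back to 128x128.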

In one embodiment, after the moving commodity area is cropped, it is input into a preset target detection model, and the category and position information of the commodity are obtained from the moving commodity area.

In the embodiment of the invention, the target detection model can be obtained by pre-training. It performs target detection and classification on the moving commodity area, so that the category information and position information of the commodity are obtained.

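In one embodiment, the three consecutive frames include a t-th frame image, a (t+1)-th frame image, and a (t+2)-th frame image, and obtaining the intersection image of the three consecutive frames using the three-frame difference algorithm specifically comprises the following steps. A difference operation is performed on the t-th and (t+1)-th frame images to obtain a first differential image, and on the (t+1)-th and (t+2)-th frame images to obtain a second differential image. The two differential images are binarized to obtain a first binarized image and a second binarized image. A logical AND operation is then performed on the two binarized images to obtain their intersection image.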

In one embodiment, performing dilation processing on the intersection image to obtain the preprocessed image specifically comprises:

selecting a kernel with a width of 5 and a height of 1, and dilating the intersection image with this kernel to obtain the preprocessed image.

In a specific embodiment, the kernel of width 5 and height 1 is:

$$\begin{pmatrix} 0&0&0&0&0 \\ 1&1&1&1&1 \\ 0&0&0&0&0 \\ 0&0&0&0&0 \end{pmatrix}$$

In the embodiment of the invention, the intersection image is dilated with the kernel of width 5 and height 1. This connects neighboring regions horizontally and thereby preserves the integrity of the commodity's moving area.
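Dilation with the horizontal kernel can be illustrated without OpenCV. This NumPy sketch assumes a binary input and an anchor at the center of the row of ones. It ORs together horizontally shifted copies of the image, which is what binary dilation with a 1x5 row of ones does. With OpenCV, `cv2.dilate(img, np.ones((1, 5), np.uint8))` would be the usual call.

```python
import numpy as np

def dilate_horizontal(binary, width=5):
    """Binary dilation with a single row of `width` ones (odd width, centered anchor).

    Each foreground pixel spreads width // 2 pixels left and right;
    nothing spreads vertically. Returns a 0/255 uint8 image.
    """
    fg = binary > 0
    r = width // 2
    # Pad left/right with background so shifted slices stay in bounds.
    padded = np.pad(fg, ((0, 0), (r, r)), mode="constant", constant_values=False)
    out = np.zeros_like(fg)
    # OR together the copies shifted by -r .. +r columns.
    for k in range(width):
        out |= padded[:, k:k + fg.shape[1]]
    return out.astype(np.uint8) * 255
```

Two segments three columns apart on the same row merge into one continuous run, while the rows above and below stay empty. This matches the text's point that only horizontal connection is performed.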

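In one embodiment, masking the regions of the preprocessed image that meet the preset conditions and extracting contours to obtain connected domains specifically comprises: masking the region around the border of the preprocessed image, and extracting contours to obtain connected domains whose area is greater than the maximum area of the background connected domain and smaller than the minimum area of the commodity.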

It is understood that the maximum area of the background connected domain and the minimum area of the commodity can be determined by an image recognition method.

In the embodiment of the invention, the method is suitable for positioning and detection while a commodity moves in a smart cabinet. Moving images of the commodity are captured by a camera installed near the smart cabinet, and the commodity's moving area does not lie on the border of the moving image. Masking the region around the border of the preprocessed image therefore helps reduce interference from irrelevant regions on commodity identification. Furthermore, the embodiment extracts connected domains whose area is greater than the maximum area of the background connected domain and smaller than the minimum area of the commodity. This effectively distinguishes background regions from commodity regions, further reduces background interference with commodity identification, and helps improve positioning accuracy.
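The area filter can be sketched as follows. For self-containment, this sketch uses breadth-first connected-component labelling (8-connectivity) instead of contour extraction such as `cv2.findContours`, and it measures area as a pixel count. Both substitutions are assumptions, as are the parameter names `bg_max_area` and `goods_min_area`. The comparison follows the text literally: a domain is kept when its area is greater than the maximum background connected-domain area and smaller than the minimum commodity area.

```python
from collections import deque

import numpy as np

def connected_domains(binary, bg_max_area, goods_min_area):
    """Label 8-connected foreground regions and keep those whose pixel area
    satisfies bg_max_area < area < goods_min_area.

    Returns a list of (x, y, w, h, area) tuples in row-major discovery order.
    """
    fg = binary > 0
    H, W = fg.shape
    seen = np.zeros_like(fg)
    domains = []
    for sy in range(H):
        for sx in range(W):
            if not fg[sy, sx] or seen[sy, sx]:
                continue
            # Breadth-first flood fill of one connected domain.
            queue = deque([(sy, sx)])
            seen[sy, sx] = True
            xs, ys = [], []
            while queue:
                y, x = queue.popleft()
                ys.append(y)
                xs.append(x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < H and 0 <= nx < W and fg[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            area = len(xs)
            if bg_max_area < area < goods_min_area:
                x0, y0 = min(xs), min(ys)
                domains.append((x0, y0, max(xs) - x0 + 1, max(ys) - y0 + 1, area))
    return domains
```

With `bg_max_area=4`, an isolated one-pixel speck is discarded as background noise, while a 10x10 blob is kept together with its bounding box.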

Alternatively, the method comprises detecting the brightness of each region in the preprocessed image, masking the regions of the preprocessed image whose brightness is not within a preset threshold range, and extracting contours to obtain connected domains whose area is greater than the maximum area of the background connected domain and smaller than the minimum area of the commodity.

In the embodiment of the invention, regions that are too bright or too dark are not commodity regions in the smart cabinet. For example, a high-brightness region may be the lighting inside the smart cabinet, which interferes with recognizing the commodity's motion when the smart cabinet is opened.
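Brightness-based masking can be sketched as follows. The sketch rests on assumptions the text leaves open. Brightness is read from a grayscale frame aligned with the preprocessed image, because the preprocessed image itself is binary. Each "area" is treated as a square tile of hypothetical size `tile`. The parameters `low` and `high` stand in for the preset threshold range, whose values the text does not give.

```python
import numpy as np

def mask_by_brightness(preprocessed, gray, low, high, tile=16):
    """Zero out tiles of `preprocessed` whose mean brightness in `gray` lies outside [low, high].

    preprocessed: binary motion image; gray: grayscale frame of the same size.
    tile, low and high are illustrative parameters, not values from the source.
    The input image is left unchanged.
    """
    out = preprocessed.copy()
    H, W = gray.shape
    for y in range(0, H, tile):
        for x in range(0, W, tile):
            mean = gray[y:y + tile, x:x + tile].mean()
            if not (low <= mean <= high):
                # e.g. cabinet lighting (too bright) or deep shadow (too dark)
                out[y:y + tile, x:x + tile] = 0
    return out
```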

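The embodiment of the invention has the following beneficial effects. The intersection image is obtained with a three-frame difference algorithm and preprocessed. The regions of the preprocessed image that meet the preset conditions, namely the border region or regions whose brightness is not within the preset threshold range, are then masked, which reduces interference from irrelevant regions on the commodity region. Contour extraction keeps only connected domains whose area is greater than the maximum area of the background connected domain and smaller than the minimum area of the commodity, which further reduces the influence of the background. Commodity positioning accuracy is thus effectively improved.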

Based on the same inventive concept as the above embodiments, referring to fig. 2, an embodiment of the present invention provides a dynamic vision-based commodity positioning apparatus, including:

an intersection image acquisition module 21, configured to acquire three consecutive frames from a video and obtain an intersection image of the three consecutive frames using a three-frame difference algorithm;

a preprocessing module 22, configured to perform dilation processing on the intersection image to obtain a preprocessed image;

a connected domain extraction module 23, configured to mask regions of the preprocessed image that meet preset conditions and extract contours to obtain connected domains; and

a commodity positioning module 24, configured to crop, if the width or height of the connected domain is greater than a preset value, a square area centered on the connected domain with a side length equal to the larger of its width and height as the moving commodity area, and to crop, if neither the width nor the height of the connected domain is greater than the preset value, a square area centered on the connected domain with a side length equal to the preset value as the moving commodity area.

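In one embodiment, the device further comprises a target detection module, configured to input the moving commodity area into a preset target detection model and obtain the category and position information of the commodity from the moving commodity area.

In one embodiment, the intersection image acquisition module 21 is configured to: perform a difference operation on the t-th and (t+1)-th frame images to obtain a first differential image, and on the (t+1)-th and (t+2)-th frame images to obtain a second differential image; binarize the two differential images to obtain a first binarized image and a second binarized image; and perform a logical AND operation on the two binarized images to obtain the intersection image.

In one embodiment, the preprocessing module 22 is configured to: select a kernel with a width of 5 and a height of 1, and dilate the intersection image with this kernel to obtain the preprocessed image.

In one embodiment, the connected domain extraction module 23 is configured to: mask the region around the border of the preprocessed image, or detect the brightness of each region in the preprocessed image and mask the regions whose brightness is not within the preset threshold range; and extract contours to obtain connected domains whose area is greater than the maximum area of the background connected domain and smaller than the minimum area of the commodity.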

In one embodiment, the preset value is 128.

One embodiment of the present invention provides a computer-readable storage medium comprising a stored computer program. When the computer program runs, the device in which the computer-readable storage medium is located is controlled to perform the dynamic vision-based commodity positioning method described above.

The foregoing is a preferred embodiment of the present invention. It should be noted that those skilled in the art may make several improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also fall within the protection scope of the present invention.

Claims

1. A commodity positioning method based on dynamic vision, comprising:

acquiring three consecutive frames from a video, and obtaining an intersection image of the three consecutive frames using a three-frame difference algorithm;

performing dilation processing on the intersection image to obtain a preprocessed image;

masking regions of the preprocessed image that meet preset conditions and performing contour extraction to obtain connected domains, which specifically comprises: detecting the brightness of each region in the preprocessed image, masking the regions of the preprocessed image whose brightness is not within a preset threshold range, and performing contour extraction to obtain connected domains whose area is greater than the maximum area of the background connected domain and smaller than the minimum area of the commodity;

if the width or height of the connected domain is greater than a preset value, cropping a square area centered on the connected domain, with a side length equal to the larger of its width and height, as the moving commodity area; and if neither the width nor the height of the connected domain is greater than the preset value, cropping a square area centered on the connected domain with a side length equal to the preset value as the moving commodity area.

2. The commodity positioning method based on dynamic vision according to claim 1, wherein, after the moving commodity area is cropped, the moving commodity area is input into a preset target detection model, and the category and position information of the commodity are obtained from the moving commodity area.

3. The commodity positioning method based on dynamic vision according to claim 1, wherein the three consecutive frames include a t-th frame image, a (t+1)-th frame image, and a (t+2)-th frame image, and obtaining the intersection image of the three consecutive frames using the three-frame difference algorithm specifically comprises:

performing a difference operation on the t-th frame image and the (t+1)-th frame image to obtain a first differential image, and performing a difference operation on the (t+1)-th frame image and the (t+2)-th frame image to obtain a second differential image;

binarizing the first differential image and the second differential image respectively to obtain a first binarized image and a second binarized image;

performing a logical AND operation on the first binarized image and the second binarized image to obtain their intersection image.

4. The commodity positioning method based on dynamic vision according to claim 1, wherein performing dilation processing on the intersection image to obtain the preprocessed image specifically comprises:

selecting a kernel with a width of 5 and a height of 1, and performing dilation processing on the intersection image with this kernel to obtain the preprocessed image.

5. The commodity positioning method based on dynamic vision according to claim 1, wherein the preset value is 128.

6. A commodity positioning device based on dynamic vision, comprising:

an intersection image acquisition module, configured to acquire three consecutive frames from a video and obtain an intersection image of the three consecutive frames using a three-frame difference algorithm;

a preprocessing module, configured to perform dilation processing on the intersection image to obtain a preprocessed image;

a connected domain extraction module, configured to mask regions of the preprocessed image that meet preset conditions and perform contour extraction to obtain connected domains, and specifically configured to: detect the brightness of each region in the preprocessed image, mask the regions of the preprocessed image whose brightness is not within a preset threshold range, and perform contour extraction to obtain connected domains whose area is greater than the maximum area of the background connected domain and smaller than the minimum area of the commodity;

a commodity positioning module, configured to crop, if the width or height of the connected domain is greater than a preset value, a square area centered on the connected domain with a side length equal to the larger of its width and height as the moving commodity area, and to crop, if neither the width nor the height of the connected domain is greater than the preset value, a square area centered on the connected domain with a side length equal to the preset value as the moving commodity area.

7. A computer-readable storage medium, comprising a stored computer program, wherein, when the computer program runs, the device in which the computer-readable storage medium is located is controlled to perform the dynamic vision-based commodity positioning method according to any one of claims 1 to 5.