
CN111723830B - Image mapping method, device, equipment and storage medium (Google Patents)

Info

Publication number
CN111723830B
CN111723830B (application CN201910212565.9A)
Authority
CN
China
Prior art keywords
pixel
image
feature
coordinate system
pair
Prior art date
Legal status
Active
Application number
CN201910212565.9A
Other languages
Chinese (zh)
Other versions
CN111723830A (en)
Inventor
许剑华 (Xu Jianhua)
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201910212565.9A
Publication of CN111723830A
Application granted
Publication of CN111723830B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/22 - Matching criteria, e.g. proximity measures
    • G06F 18/23 - Clustering techniques
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an image mapping method, apparatus, device and storage medium. The method comprises: inputting a first image and a second image respectively into a trained first neural network, so that the first neural network processes pixels of the input first image to obtain a first pixel feature set and processes pixels of the input second image to obtain a second pixel feature set, where the first image and the second image are images acquired by two different devices for the same scene; determining, according to the first pixel feature set and the second pixel feature set, N pixel pairs satisfying a preset condition from the first image and the second image, where the two pixels of each pair lie in the first image and the second image respectively; determining, according to the position information of the pixels in each pixel pair, a coordinate mapping relation from a first coordinate system to a second coordinate system; and mapping a designated image area of the first image from the first coordinate system to the second coordinate system according to the coordinate mapping relation.

Description

Image mapping method, device, equipment and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image mapping method, apparatus, device, and storage medium.
Background
In some fields, such as video monitoring, two or more imaging devices must cooperate to cover a required scene: one imaging device acquires a panoramic image of the scene while another acquires local detail images within it. The images acquired by the two devices need to be mapped onto each other to enable panoramic monitoring together with automatic tracking and magnification of details.
In a related image mapping approach, feature extraction algorithms such as SIFT (Scale-Invariant Feature Transform), ORB (Oriented FAST and Rotated BRIEF, which detects keypoints with FAST and describes them with an improved BRIEF descriptor) and SURF (Speeded-Up Robust Features) extract features from the two images, the extracted features are matched to obtain matching feature points, and the image mapping is computed from those points. The feature extraction performance of algorithms such as SIFT, SURF and ORB is limited, however, and matching may fail for lack of feature points, making the image mapping impossible.
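This conventional pipeline can be sketched in a few lines of OpenCV. The function names are standard OpenCV APIs; the ratio-test threshold, RANSAC tolerance and minimum-match count are illustrative choices, not values taken from this disclosure:

```python
import cv2
import numpy as np

def match_and_map_sift(img1, img2, min_matches=4):
    """Conventional pipeline: SIFT keypoints, ratio-test matching, RANSAC
    homography. Returns None when too few feature points survive, which is
    exactly the failure mode discussed above."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    if des1 is None or des2 is None:
        return None
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    # Lowe's ratio test keeps only distinctive matches
    good = [p[0] for p in matches
            if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
    if len(good) < min_matches:
        return None  # insufficient feature points: mapping impossible
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H
```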
Disclosure of Invention
In view of the above, the present invention provides an image mapping method, apparatus, device and storage medium that avoid the failure of image mapping caused by matching with insufficient feature points.
The first aspect of the present invention provides an image mapping method, including:
respectively inputting a first image and a second image into a trained first neural network, so that the first neural network processes pixels in the input first image to obtain a first pixel characteristic set, and processes pixels in the input second image to obtain a second pixel characteristic set; the first image and the second image are images acquired by two different devices aiming at the same scene;
determining N pairs of pixels meeting preset conditions from the first image and the second image according to the first pixel feature set and the second pixel feature set, wherein N is greater than 1, and two pixels in each pixel pair are respectively positioned in the first image and the second image;
determining a coordinate mapping relation mapped from a first coordinate system to a second coordinate system according to the position information of the pixels in each pixel pair, wherein the first coordinate system is a coordinate system applied by a first image, and the second coordinate system is a coordinate system applied by a second image;
and mapping the designated image area in the first image from a first coordinate system to a second coordinate system according to the coordinate mapping relation.
According to one embodiment of the invention, the first neural network comprises at least a plurality of cascaded convolutional layers;
the first set of pixel characteristics is: the plurality of cascaded convolution layers perform feature extraction on the first image to obtain an M-channel feature map; the M is greater than 1;
the second set of pixel characteristics is: and the plurality of cascaded convolution layers perform feature extraction on the second image to obtain an M-channel feature map.
According to an embodiment of the present invention, each pixel feature in the M-channel feature map includes M channel features, each channel feature corresponds to a category, and the category to which a pixel feature belongs is the category corresponding to the channel feature, within that pixel feature, that satisfies a specified requirement;
determining N pairs of pixels meeting a preset condition from the first image and the second image according to the first pixel feature set and the second pixel feature set, including:
clustering is respectively carried out on a first pixel feature of the first pixel feature set and a second pixel feature of the second pixel feature set;
determining a first region and a second region from the clustered first pixel feature set and the clustered second pixel feature set respectively, wherein the categories of the first pixel features of the first region and the second pixel features of the second region are the same;
calculating, for a first pixel feature in the first region, a similarity between the first pixel feature and each second pixel feature in the second region;
determining the pixel feature pairs corresponding to the N highest target similarities among the calculated similarities, wherein a pixel feature pair comprises a first pixel feature and a second pixel feature;
for each pixel feature pair, determining the pixel in the first image corresponding to its first pixel feature and the pixel in the second image corresponding to its second pixel feature as a pixel pair.
According to one embodiment of the present invention, determining N pairs of pixels satisfying a preset condition from the first image and the second image according to the first pixel feature set and the second pixel feature set includes:
calculating the similarity between each first pixel feature in the first pixel feature set and each second pixel feature in the second pixel feature set;
determining the pixel feature pairs corresponding to the N highest target similarities among the calculated similarities, wherein a pixel feature pair comprises a first pixel feature and a second pixel feature;
for each pixel feature pair, determining the pixel in the first image corresponding to its first pixel feature and the pixel in the second image corresponding to its second pixel feature as a pixel pair.
According to one embodiment of the present invention, determining a coordinate mapping relationship mapped from a first coordinate system to a second coordinate system according to position information of pixels in each pixel pair includes:
constructing a first matrix according to the position information, in the first coordinate system, of the pixels of the pixel pairs that are located in the first image;
constructing a second matrix according to the position information, in the second coordinate system, of the pixels of the pixel pairs that are located in the second image;
and calculating a conversion relation from the first matrix to the second matrix, and determining the conversion relation as the coordinate mapping relation.
A second aspect of the present invention provides an image mapping apparatus, comprising:
the pixel-level processing module is used for respectively inputting the first image and the second image into the trained first neural network, so that the first neural network processes the pixels in the input first image to obtain a first pixel characteristic set, and processes the pixels in the input second image to obtain a second pixel characteristic set; the first image and the second image are images acquired by two different devices aiming at the same scene;
the pixel pair determining module is used for determining N pixel pairs meeting preset conditions from the first image and the second image according to the first pixel feature set and the second pixel feature set, wherein N is greater than 1, and the two pixels of each pixel pair are respectively positioned in the first image and the second image;
the coordinate mapping relation determining module is used for determining a coordinate mapping relation mapped from a first coordinate system to a second coordinate system according to the position information of the pixels in each pixel pair, wherein the first coordinate system is a coordinate system applied by a first image, and the second coordinate system is a coordinate system applied by a second image;
and the image mapping module is used for mapping the appointed image area in the first image from the first coordinate system to the second coordinate system according to the coordinate mapping relation.
According to one embodiment of the invention, the first neural network comprises at least a plurality of cascaded convolutional layers;
the first set of pixel characteristics is: the plurality of cascaded convolution layers perform feature extraction on the first image to obtain an M-channel feature map; the M is greater than 1;
the second set of pixel characteristics is: and the plurality of cascaded convolution layers perform feature extraction on the second image to obtain an M-channel feature map.
According to an embodiment of the present invention, each pixel feature in the M-channel feature map includes M channel features, each channel feature corresponds to a category, and the category to which a pixel feature belongs is the category corresponding to the channel feature, within that pixel feature, that satisfies a specified requirement;
the pixel pair determining module includes:
a clustering processing unit, configured to perform clustering processing on a first pixel feature of the first pixel feature set and a second pixel feature of the second pixel feature set, respectively;
the region determining unit is used for determining a first region and a second region from the clustered first pixel feature set and the clustered second pixel feature set respectively, wherein the first pixel feature of the first region and the second pixel feature of the second region belong to the same category;
a first similarity calculation unit configured to calculate, for a first pixel feature in the first region, a similarity between the first pixel feature and each of second pixel features in the second region;
a first pixel feature pair determining unit, configured to determine a pixel feature pair corresponding to the N highest target similarities among the calculated similarities, where the pixel feature pair includes a first pixel feature and a second pixel feature;
a first pixel pair determining unit, configured to determine, for each pixel feature pair, the pixel in the first image corresponding to its first pixel feature and the pixel in the second image corresponding to its second pixel feature as a pixel pair.
According to one embodiment of the present invention, the pixel pair determining module includes:
a second similarity calculating unit configured to calculate, for each first pixel feature in the first pixel feature set, a similarity between the first pixel feature and each second pixel feature in the second pixel feature set;
a second pixel feature pair determining unit, configured to determine a pixel feature pair corresponding to the N highest target similarities among the calculated similarities, where the pixel feature pair includes a first pixel feature and a second pixel feature;
and a third pixel pair determining unit, configured to determine, for each pixel feature pair, the pixel in the first image corresponding to its first pixel feature and the pixel in the second image corresponding to its second pixel feature as a pixel pair.
According to one embodiment of the present invention, the coordinate mapping relation determining module includes:
a first matrix construction unit, configured to construct a first matrix according to the position information, in the first coordinate system, of the pixels of the pixel pairs that are located in the first image;
a second matrix construction unit, configured to construct a second matrix according to the position information, in the second coordinate system, of the pixels of the pixel pairs that are located in the second image;
and the coordinate mapping relation determining unit is used for calculating the conversion relation from the first matrix to the second matrix and determining the conversion relation as the coordinate mapping relation.
A third aspect of the present invention provides an electronic device, including a processor and a memory; the memory stores a program that can be called by the processor; wherein the processor implements the image mapping method as described in the foregoing embodiment when executing the program.
A fourth aspect of the present invention provides a machine-readable storage medium, having stored thereon a program which, when executed by a processor, implements an image mapping method as described in the previous embodiments.
The embodiment of the invention has the following beneficial effects:
In the embodiment of the invention, the pixels of the first image and of the second image are processed by the first neural network, realizing pixel-level feature extraction and yielding the first pixel feature set and the second pixel feature set. When the two sets are matched, the pixel pairs formed by pixels of the first image and the second image can be determined from the degree of matching between pixel features. Since the pixel feature sets contain many pixel features, the matching precision is higher; the coordinate mapping relation determined from the position information of the determined pixel pairs therefore makes the image mapping more accurate, and the problem that image mapping cannot be realized because matching fails for lack of feature points is avoided.
Drawings
FIG. 1 is a flow chart of an image mapping method according to an embodiment of the invention;
FIG. 2 is a block diagram of an image mapping apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of calculating the similarity between pixel features according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a coordinate mapping relationship according to an embodiment of the present invention;
FIG. 5 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the invention; rather, they are merely examples of apparatus and methods consistent with aspects of the invention as recited in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various devices, these devices should not be limited by those terms, which are only used to distinguish one device from another of the same type. For example, a first device could also be termed a second device and, similarly, a second device could be termed a first device without departing from the scope of the present invention. The word "if" as used herein may be interpreted as "when", "upon" or "in response to determining", depending on the context.
In order to make the description of the present invention clearer and more concise, some technical terms of the present invention are explained below:
neural network: a technique for simulating the abstraction of brain structure features that a network system is formed by complex connection of a great number of simple functions, which can fit extremely complex functional relation, and generally includes convolution/deconvolution operation, activation operation, pooling operation, addition, subtraction, multiplication and division, channel merging and element rearrangement. Training the network with specific input data and output data, adjusting the connections therein, and allowing the neural network to learn the mapping between the fitting inputs and outputs.
Gun machine (box camera): an image capture device whose pose and focal length are not automatically adjusted during use.
Ball machine (dome camera): an image capture device with a horizontal motor, a vertical motor and focus adjustment capability, whose pose and focal length can be adjusted automatically during use. The dome camera can rotate by a PT angle to bring the target to the centre of the picture and then magnify it by a zoom factor Z; it is therefore also called a PTZ camera. In "PT", P is the horizontal (pan) angle of the camera's motors and T is the vertical (tilt) angle.
The image mapping method according to the embodiment of the present invention is described in more detail below, but is not limited thereto. In one embodiment, referring to fig. 1, an image mapping method may include the steps of:
S100: inputting a first image and a second image respectively into a trained first neural network, so that the first neural network processes pixels of the input first image to obtain a first pixel feature set and processes pixels of the input second image to obtain a second pixel feature set; the first image and the second image are images acquired by two different devices for the same scene;
S200: determining, according to the first pixel feature set and the second pixel feature set, N pixel pairs satisfying a preset condition from the first image and the second image, where N is greater than 1 and the two pixels of each pixel pair lie in the first image and the second image respectively;
S300: determining, according to the position information of the pixels in each pixel pair, a coordinate mapping relation mapped from a first coordinate system to a second coordinate system, where the first coordinate system is the coordinate system applied to the first image and the second coordinate system is the coordinate system applied to the second image;
S400: mapping a designated image area of the first image from the first coordinate system to the second coordinate system according to the coordinate mapping relation.
In the embodiment of the present invention, the first image and the second image are images acquired by two different devices for the same scene. For example, the first image may be a local detail image of the scene and the second image a panoramic image of the scene that contains the objects of the local detail image; of course, the first image may instead be the panoramic image and the second image the local detail image, which is not particularly limited.
The image mapping method in the embodiment of the invention can be applied to an electronic device with image processing capability. The electronic device may be the device that acquires the first image or the second image, or any other device that can obtain both images and has image processing capability; this is not particularly limited. In the embodiment of the invention, the first image may be acquired by a first device and the second image by a second device, and the two devices may be carried on the same platform. The platform may be movable or fixed; a movable platform is not limited and may be, for example, an unmanned aerial vehicle, an unmanned vehicle or a ground robot. After acquisition, the images may be transmitted to a processor on the platform to execute the corresponding processing.
The image mapping method provided by the embodiment of the invention can be applied to the monitoring of large scenes such as squares, entrances, perimeters, airports, ports, wharfs, substations, reservoirs, parks and intersections. Taking an intersection as an example, the intersection contains many target objects. The second image is a panoramic image containing all target objects of the intersection; the pixel size of each target object in the panoramic image is small, so its resolution after magnification is very low. The first image is a local detail image containing some of the target objects of the intersection, in which the pixel size of those target objects is larger. The areas where these target objects are located in the first image can be mapped onto the areas where the corresponding target objects are located in the second image, so that when the second image is magnified, the resolution of those target objects in the second image is higher than in the unmapped image and the picture is clearer.
For scenes with higher security requirements, such as a perimeter, the image mapping method provided by the embodiment of the invention can accurately monitor actions such as intrusion of a moving target into an area, crossing of a warning surface, entering the area and leaving the area, while synchronously monitoring the panoramic area, thereby realizing security with high-level requirements.
In step S100, the first image and the second image are respectively input into a trained first neural network, so that the first neural network processes the pixels in the input first image to obtain a first pixel feature set, and processes the pixels in the input second image to obtain a second pixel feature set.
To reduce the subsequent throughput, the size of the first pixel feature set may be smaller than that of the first image, and the size of the second pixel feature set smaller than that of the second image, as long as the first neural network downsamples the images by the corresponding factor during processing. For example, if the two images have width w and height h and the first neural network downsamples by a factor of 8, the two pixel feature sets have width w/8 and height h/8; each first pixel feature of the first pixel feature set corresponds to a pixel of the first image, and each second pixel feature of the second pixel feature set corresponds to a pixel of the second image. Of course, the first pixel feature set may also have the same size as the first image, and the second pixel feature set the same size as the second image.
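Because the feature sets may be downsampled, a cell of a pixel feature set must be scaled back to image coordinates before its position information is used in the later steps. A minimal sketch of this bookkeeping, assuming the uniform factor-8 downsampling of the example above (the half-stride offset to the patch centre is an illustrative convention):

```python
def feature_to_image_coords(row, col, stride=8):
    """Map a feature-set cell (row, col) to the (x, y) image pixel it
    represents; stride is the downsampling factor (8 in the example above).
    The half-stride offset points at the centre of the stride x stride
    image patch that the pixel feature summarises."""
    return (col * stride + stride // 2, row * stride + stride // 2)
```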
The first neural network may be implemented on the basis of a semantic segmentation network, for example an FCN (Fully Convolutional Network), without limitation. A conventional semantic segmentation network outputs a semantic image in which each value is the confidence that the pixel feature belongs to each category; in the embodiment of the invention, however, the first neural network outputs a pixel feature set. The difference is that the first neural network lacks the final classification layer with which the semantic segmentation network classifies pixel features. To train the first neural network, the full semantic segmentation network can be trained, and after training the output of the layer preceding its classification layer is taken as the output of the first neural network; that is, the processing of the last classification layer of the semantic segmentation network is omitted.
The semantic segmentation network may be trained, for example, by using a sample image as the input of the semantic segmentation network and the corresponding sample semantic segmentation image as the expected output, where the confidence that each pixel feature belongs to each category is calibrated in the sample semantic segmentation image.
It is understood that the first neural network may also be implemented using other neural networks, such as RNN, CNN, etc.
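A minimal PyTorch sketch of the idea described above: an FCN-style trunk is trained together with a per-pixel classification head, and at inference the feature map that feeds the head is returned as the pixel feature set. The layer widths, the value of M and the factor-8 downsampling are assumptions for illustration, not taken from this disclosure:

```python
import torch
import torch.nn as nn

class PixelFeatureNet(nn.Module):
    """FCN-style trunk whose forward() returns the M-channel feature map
    preceding the per-pixel classification layer."""
    def __init__(self, m_channels=64, num_classes=21):
        super().__init__()
        self.trunk = nn.Sequential(  # three stride-2 convs: 8x downsampling
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, m_channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # per-pixel classifier, used only while training the segmentation task
        self.classifier = nn.Conv2d(m_channels, num_classes, 1)

    def forward(self, x, with_logits=False):
        feats = self.trunk(x)  # (B, M, H/8, W/8): the pixel feature set
        return self.classifier(feats) if with_logits else feats
```

In use, feats1 = net(first_image) and feats2 = net(second_image) would yield the first and second pixel feature sets.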
In step S200, N pairs of pixels satisfying a preset condition are determined from the first image and the second image according to the first pixel feature set and the second pixel feature set, where N is greater than 1.
Pixel features of the first pixel feature set and of the second pixel feature set can be matched, and a pixel in the first image and a pixel in the second image whose pixel features match successfully are determined as a pixel pair; of course, the specific determination manner is not limited.
The value of N is not limited, as long as the determined pixel pairs suffice to calculate the coordinate mapping relation. Preferably, N can be 4; with 4 pixel pairs the subsequently calculated coordinate mapping relation can be more accurate, provided the 4 pairs are not collinear, i.e. neither the four pixels in the first image nor the four corresponding pixels in the second image are collinear. It is understood that N may also be 5 or more, which is not particularly limited.
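The non-collinearity condition can be verified with a 2-D cross-product test; a brief sketch (the numeric tolerance is an assumed value):

```python
import numpy as np

def no_three_collinear(points, eps=1e-6):
    """True if no three of the given 2-D points are (near-)collinear."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                d1, d2 = pts[j] - pts[i], pts[k] - pts[i]
                # 2-D cross product; (near) zero means collinear points
                if abs(d1[0] * d2[1] - d1[1] * d2[0]) < eps:
                    return False
    return True
```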
In step S300, a coordinate mapping relationship mapped from a first coordinate system to a second coordinate system is determined according to the position information of the pixels in each pixel pair, wherein the first coordinate system is a coordinate system applied to the first image, and the second coordinate system is a coordinate system applied to the second image.
From the position information of the pixel in the first image (first pixel for short) and of the pixel in the second image (second pixel for short) in each pixel pair, the conversion relation between the position information of the two pixels of a pair can be uniquely calculated. This conversion relation is the coordinate mapping relation from the first coordinate system to the second coordinate system and can be used to realize the mapping from the first image to the second image; the specific manner of determining the coordinate mapping relation is not limited.
In step S400, the specified image area in the first image is mapped from the first coordinate system to the second coordinate system according to the coordinate mapping relation.
The designated image area can be any local area of interest in the first image, for example the area where a monitored target object such as a person or a vehicle is located; it may also be the entire image area of the first image. When determining the designated image area, a target object in the first image may be detected by a target detection technique, the area in which the target object is located determined, and that area taken as the designated image area.
The designated image area mapped into the second coordinate system may directly cover the corresponding area of the second image; for example, a designated image area containing a target object may directly cover the area of the second image where that target object is located. Of course, the mapped designated image area may also serve other purposes, such as controlling the first device to perform a corresponding movement according to the position information of the first image or of a target area in the second coordinate system, so as to acquire an image containing the target object contained in the first image or target area.
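Once the coordinate mapping relation is available as a homography H (derived below), mapping a designated image area amounts to transforming its corner coordinates; a short sketch using the standard OpenCV call cv2.perspectiveTransform:

```python
import cv2
import numpy as np

def map_region(H, corners_xy):
    """Map region corners from the first coordinate system to the second.

    corners_xy: (K, 2) array of (x, y) corners in the first image; returns
    the corresponding (K, 2) corners in the second image."""
    pts = np.float32(corners_xy).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pts, H).reshape(-1, 2)
```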
Specifically, a first device and a second device carried on the same platform acquire images of the same scene. In one case, the first device can rotate relative to the platform during acquisition while the second device is fixed relative to it; the first image is then a partial image acquired during the rotation of the first device and the second image a panoramic image acquired by the second device, and according to the embodiment of the invention the partial images acquired during rotation can be mapped into the panoramic image, so that the finally obtained panoramic image is clearer. In another case, the focal length of the first device is variable during acquisition while that of the second device is fixed; the first image is then an image acquired while the focal length of the first device changes, and the second image an image acquired by the second device. Optionally, the first device may be, for example, a dome camera (ball machine) and the second device a box camera (gun machine).
In the embodiment of the invention, the pixels of the first image and of the second image are processed by the first neural network, realizing pixel-level feature extraction and yielding the first pixel feature set and the second pixel feature set. When the two sets are matched, the pixel pairs formed by pixels of the first image and the second image can be determined from the degree of matching between pixel features. Since the pixel feature sets contain many pixel features, the matching precision is higher; the coordinate mapping relation determined from the position information of the determined pixel pairs therefore makes the image mapping more accurate, and the problem that image mapping cannot be realized because matching fails for lack of feature points is avoided.
In one embodiment, the first neural network includes at least a plurality of cascaded convolutional layers;
the first set of pixel characteristics is: the plurality of cascaded convolution layers perform feature extraction on the first image to obtain an M-channel feature map; the M is greater than 1;
the second set of pixel characteristics is: and the plurality of cascaded convolution layers perform feature extraction on the second image to obtain an M-channel feature map.
Feature extraction is performed on the first image and the second image respectively by the cascaded convolution layers. Each convolution layer outputs a corresponding feature map, and the features in the feature maps output by later convolution layers are progressively more refined, until the M channel features corresponding to each pixel feature are determined.
In one embodiment, each pixel feature in the M-channel feature map includes M channel features, each channel feature corresponds to a category, and the category to which a pixel feature belongs is the category corresponding to the channel feature, within that pixel feature, that satisfies a specified requirement.
Referring to fig. 3, F1 is the first pixel feature set and F2 the second pixel feature set, each pixel feature of either set comprising M channel features. Each of the M channel features corresponds to a category, and its value may also be taken to characterize the confidence that the pixel feature containing it belongs to that category.
If a certain channel feature of a pixel feature has the highest similarity with the standard feature of its corresponding category, that category is the category to which the pixel feature belongs; in other words, the channel feature whose similarity with the standard feature of its category is highest is the channel feature satisfying the specified requirement. For example, let M be 2 (only by way of example; more dimensions may be used in practice and the specific dimension is not limited), let the 1st channel feature be a with category standard feature a1, and the 2nd channel feature be b with category standard feature b1. If the similarity between a and a1 is greater than the similarity between b and b1, the 1st channel feature is the channel feature satisfying the specified requirement, and the pixel feature belongs to the category corresponding to the 1st channel feature.
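A small sketch of this per-pixel category assignment. For illustration it assumes each category's standard feature is a scalar and measures similarity as negative absolute difference; the disclosure leaves the actual similarity measure open:

```python
import numpy as np

def assign_categories(feat_map, standard_feats):
    """feat_map: (M, H, W) M-channel feature map. standard_feats: length-M
    vector, standard_feats[c] being the standard feature of the category of
    channel c. Each pixel feature gets the category whose channel feature is
    closest to that category's standard feature."""
    diffs = np.abs(feat_map - np.asarray(standard_feats)[:, None, None])
    return np.argmin(diffs, axis=0)  # (H, W) category index per pixel feature
```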
In one embodiment, in step S200, determining N pairs of pixels that satisfy a preset condition from the first image and the second image according to the first pixel feature set and the second pixel feature set may include the following steps:
S201: clustering the first pixel features of the first pixel feature set and the second pixel features of the second pixel feature set respectively;
S202: determining a first region and a second region from the clustered first pixel feature set and the clustered second pixel feature set respectively, where the first pixel features of the first region and the second pixel features of the second region belong to the same category;
S203: calculating, for a first pixel feature in the first region, the similarity between that first pixel feature and each second pixel feature in the second region;
S204: determining the pixel feature pairs corresponding to the N highest target similarities among the calculated similarities, where a pixel feature pair comprises a first pixel feature and a second pixel feature;
S205: determining, for each pixel feature pair, the pixel in the first image corresponding to its first pixel feature and the pixel in the second image corresponding to its second pixel feature as a pixel pair.
In step S201, the category of each first pixel feature of the first pixel feature set is the category corresponding to the channel feature, within that first pixel feature, that satisfies the specified requirement, and likewise for each second pixel feature of the second pixel feature set. Clustering can therefore be performed on the first pixel features and on the second pixel features separately, so that all first pixel features of the first pixel feature set are aggregated by category into regions of multiple categories, and all second pixel features of the second pixel feature set are likewise aggregated by category into regions of multiple categories.
The clustering may be, for example, hierarchy-based or partition-based, as long as clustering is achieved.
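As one partition-based option, k-means can cluster the M-dimensional pixel features directly; a sketch using scikit-learn, where the number of clusters is an assumed parameter:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_pixel_features(feat_map, n_clusters=8):
    """Partition-based clustering of a pixel feature set.

    feat_map: (M, H, W); each of the H*W pixel features is treated as an
    M-dimensional vector. Returns an (H, W) cluster-label map."""
    m, h, w = feat_map.shape
    vectors = feat_map.reshape(m, -1).T  # (H*W, M)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(vectors)
    return labels.reshape(h, w)
```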
In step S202, after the clustering process the first pixel feature set and the second pixel feature set are each divided into regions of the respective categories; regions of the same category can then be found in the two sets and taken as the first region and the second region, where the first pixel features of the first region and the second pixel features of the second region belong to the same category.
The first region and the second region are regions of current interest in the first pixel feature set and the second pixel feature set.
In step S203, for a first pixel feature in the first region, a similarity between the first pixel feature and each second pixel feature in the second region is calculated.
Specifically, taking fig. 3 as an example, to calculate the similarity between a first pixel feature of the first pixel feature set and a second pixel feature of the second pixel feature set, each of them is represented by an M-dimensional vector whose components are the M channel features of the pixel feature, and the similarity S1 between the two M-dimensional vectors is calculated. S1 may be computed, for example, from the Euclidean distance or the cosine (COS) distance, the specific manner being unlimited; the similarity may also be computed by a deep metric network.
After step S203 has been executed, the number of obtained similarities is the product of the number of first pixel features in the first region and the number of second pixel features in the second region.
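The cosine (COS) option named above reduces to a few lines of NumPy; a sketch in which f is the M-dimensional vector of one first pixel feature and each row of region2_feats is the vector of one second pixel feature of the second region (all names are illustrative):

```python
import numpy as np

def cosine_similarities(f, region2_feats):
    """Similarity between one first pixel feature f (length M) and every
    second pixel feature of the second region (rows of region2_feats)."""
    f = np.asarray(f, dtype=float)
    r = np.asarray(region2_feats, dtype=float)  # (K, M)
    return r @ f / (np.linalg.norm(r, axis=1) * np.linalg.norm(f) + 1e-12)
```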
In step S204, N highest target similarities are determined from the calculated similarities, and a pixel feature pair corresponding to each target similarity is determined, where the pixel feature pair includes a first pixel feature and a second pixel feature.
In step S205, for each pixel feature pair, the pixel in the first image corresponding to its first pixel feature and the pixel in the second image corresponding to its second pixel feature are determined as a pixel pair.
In this embodiment, the first region and the second region in which the similarity needs to be calculated are determined based on the clustering result of the pixel features, and then the similarity calculation is performed on the pixel features in the first region and the second region, so that the required calculation amount can be reduced.
In another embodiment, in step S200, determining N pairs of pixels that satisfy a preset condition from the first image and the second image according to the first pixel feature set and the second pixel feature set may include the following steps:
S206: calculating, for each first pixel feature in the first pixel feature set, the similarity between that first pixel feature and each second pixel feature in the second pixel feature set;
S207: determining the pixel feature pairs corresponding to the N highest target similarities among the calculated similarities, where a pixel feature pair comprises a first pixel feature and a second pixel feature;
S208: determining, for each pixel feature pair, the pixel in the first image corresponding to its first pixel feature and the pixel in the second image corresponding to its second pixel feature as a pixel pair.
This embodiment differs from the preceding one in that the similarity is calculated for every pixel feature of the first and second pixel feature sets, so the calculation result is more accurate; the remaining points are as described above and are not repeated here.
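Steps S206 and S207 amount to an exhaustive similarity matrix followed by a top-N selection, which vectorizes naturally; a sketch under the same cosine-similarity assumption as above:

```python
import numpy as np

def top_n_feature_pairs(feats1, feats2, n=4):
    """Exhaustive matching: cosine similarity between every first pixel
    feature (rows of feats1, shape (K1, M)) and every second pixel feature
    (rows of feats2, shape (K2, M)); returns the index pairs (i1, i2) of
    the N highest target similarities."""
    a = feats1 / (np.linalg.norm(feats1, axis=1, keepdims=True) + 1e-12)
    b = feats2 / (np.linalg.norm(feats2, axis=1, keepdims=True) + 1e-12)
    sims = a @ b.T  # (K1, K2) similarity of every feature pair
    flat = np.argsort(sims, axis=None)[::-1][:n]  # N highest similarities
    return [tuple(np.unravel_index(i, sims.shape)) for i in flat]
```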
In one embodiment, in step S300, determining the coordinate mapping relationship mapped from the first coordinate system to the second coordinate system according to the position information of the pixels in each pixel pair may include the following steps:
S301: constructing a first matrix according to the position information, in the first coordinate system, of the pixels of the pixel pairs that are located in the first image;
S302: constructing a second matrix according to the position information, in the second coordinate system, of the pixels of the pixel pairs that are located in the second image;
S303: calculating the conversion relation from the first matrix to the second matrix, and determining that conversion relation as the coordinate mapping relation.
Referring to fig. 4, a first matrix is constructed from the position information of the pixels P1'-P4' of the pixel pairs in the first image M1, a second matrix is constructed from the position information of the corresponding pixels P1-P4 in the second image M2, and the conversion relation from the first matrix to the second matrix is calculated to obtain the coordinate mapping relation. When image mapping is performed according to this coordinate mapping relation, P1' is mapped to P1, P2' to P2, P3' to P3 and P4' to P4.
Specifically, an example of determining the coordinate mapping relationship is given below:
There are N pixel pairs in total, each comprising a first pixel in the first image and a second pixel in the second image. A 3×N first matrix S is constructed from the position information (coordinates in the first coordinate system) of the first pixels of the N pixel pairs, and a 3×N second matrix D from the position information (coordinates in the second coordinate system) of the second pixels:

$$S=\begin{pmatrix}x_1&x_2&\cdots&x_N\\y_1&y_2&\cdots&y_N\\1&1&\cdots&1\end{pmatrix},\qquad D=\begin{pmatrix}u_1&u_2&\cdots&u_N\\v_1&v_2&\cdots&v_N\\1&1&\cdots&1\end{pmatrix}$$

The last row of both S and D is padded with 1s, giving the two 3×N matrices, where x, y denote the coordinates of a first pixel and u, v the coordinates of the corresponding second pixel.
Let H be the conversion relation from S to D; H is a 3×3 homography matrix:

$$H=\begin{pmatrix}H_{11}&H_{12}&H_{13}\\H_{21}&H_{22}&H_{23}\\H_{31}&H_{32}&H_{33}\end{pmatrix}$$

Rearranging H into a 1×9 vector:

$$h=(H_{11},H_{12},H_{13},H_{21},H_{22},H_{23},H_{31},H_{32},H_{33})^{\top}$$
For each pixel pair, the coefficient rows required to compute the homography matrix are constructed:

$$a_{x,u}=(-x,\,-y,\,-1,\,0,\,0,\,0,\,ux,\,uy,\,u)^{\top}$$

$$a_{y,v}=(0,\,0,\,0,\,-x,\,-y,\,-1,\,vx,\,vy,\,v)^{\top}$$

Stacking $a_{x,u}^{\top}$ and $a_{y,v}^{\top}$ for all N pixel pairs gives a 2N×9 coefficient matrix A; solving Ah = 0 then yields h.
To solve Ah = 0, A is decomposed by SVD:

$$[U,\Sigma,V]=\mathrm{svd}(A)\qquad(1)$$

The left singular vectors in U and the right singular vectors in V obtained by the SVD are ordered according to the singular values in Σ, from largest to smallest; the right singular vector corresponding to the smallest element of Σ is taken as the approximate solution of h:

$$h=V[\min(\Sigma),\,:]\qquad(2)$$

where min(Σ) denotes the index of the smallest element of Σ. Once h has been solved, the homography matrix H is determined, and H can be used as the coordinate mapping relation.
After the homography matrix H is obtained, each pixel of the designated image area of the first image can be mapped from its pixel coordinates x, y in the first coordinate system to pixel coordinates u, v in the second coordinate system:

$$\begin{pmatrix}u'\\v'\\w'\end{pmatrix}=H\begin{pmatrix}x\\y\\1\end{pmatrix},\qquad u=\frac{u'}{w'},\quad v=\frac{v'}{w'}\qquad(3)$$

Based on formula (3), the mapping of the designated image area of the first image from the first coordinate system into the second coordinate system can be completed.
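Equations (1) to (3) translate directly into NumPy. A self-contained sketch of the solve described above; the final normalization by H33 is a conventional choice rather than something prescribed by the disclosure:

```python
import numpy as np

def solve_homography(first_pts, second_pts):
    """Solve Ah = 0 by SVD as in equations (1)-(2).

    first_pts, second_pts: (N, 2) arrays holding (x, y) and (u, v) for the
    N pixel pairs, N >= 4. Returns the 3x3 homography H mapping the first
    coordinate system to the second."""
    rows = []
    for (x, y), (u, v) in zip(first_pts, second_pts):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])  # a_{x,u}
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])  # a_{y,v}
    A = np.asarray(rows, dtype=float)              # (2N, 9)
    _, _, Vt = np.linalg.svd(A)    # singular values arrive sorted, largest first
    h = Vt[-1]                     # right singular vector of the smallest one
    return h.reshape(3, 3) / h[-1]  # normalize so that H33 = 1

def apply_homography(H, x, y):
    """Equation (3): map (x, y) in the first coordinate system to (u, v)."""
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]
```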
A second aspect of the present invention provides an image mapping apparatus, in one embodiment, referring to fig. 2, comprising:
the pixel-level processing module 100 is configured to input a first image and a second image into a trained first neural network, respectively, so that the first neural network processes pixels in the input first image to obtain a first pixel feature set, and processes pixels in the input second image to obtain a second pixel feature set; the first image and the second image are images acquired by two different devices aiming at the same scene;
the pixel pair determining module 200 is configured to determine, according to the first pixel feature set and the second pixel feature set, N pairs of pixel pairs that satisfy a preset condition from the first image and the second image, where N is greater than 1, and two pixels in each pixel pair are respectively in the first image and the second image;
the coordinate mapping relation determining module 300 is configured to determine, according to the position information of the pixels in each pixel pair, a coordinate mapping relation mapped from a first coordinate system to a second coordinate system, where the first coordinate system is a coordinate system applied by the first image, and the second coordinate system is a coordinate system applied by the second image;
the image mapping module 400 is configured to map the specified image area in the first image from the first coordinate system to the second coordinate system according to the coordinate mapping relationship.
According to one embodiment of the invention, the first neural network comprises at least a plurality of cascaded convolutional layers;
the first set of pixel characteristics is: the plurality of cascaded convolution layers perform feature extraction on the first image to obtain an M-channel feature map; the M is greater than 1;
the second set of pixel characteristics is: and the plurality of cascaded convolution layers perform feature extraction on the second image to obtain an M-channel feature map.
According to an embodiment of the present invention, each pixel feature in the M-channel feature map includes M channel features, each channel feature corresponds to a category, and the category to which a pixel feature belongs is the category corresponding to the channel feature, within that pixel feature, that satisfies a specified requirement;
the pixel pair determining module includes:
a clustering processing unit, configured to perform clustering processing on a first pixel feature of the first pixel feature set and a second pixel feature of the second pixel feature set, respectively;
the region determining unit is used for determining a first region and a second region from the clustered first pixel feature set and the clustered second pixel feature set respectively, wherein the first pixel feature of the first region and the second pixel feature of the second region belong to the same category;
a first similarity calculation unit configured to calculate, for a first pixel feature in the first region, the similarity between that first pixel feature and each second pixel feature in the second region;
a first pixel feature pair determining unit, configured to determine a pixel feature pair corresponding to the N highest target similarities among the calculated similarities, where the pixel feature pair includes a first pixel feature and a second pixel feature;
a first pixel pair determining unit, configured to determine, for each pixel feature pair, the pixel in the first image corresponding to its first pixel feature and the pixel in the second image corresponding to its second pixel feature as a pixel pair.
According to one embodiment of the present invention, the pixel pair determining module includes:
a second similarity calculating unit configured to calculate, for each first pixel feature in the first pixel feature set, a similarity between the first pixel feature and each second pixel feature in the second pixel feature set;
a second pixel feature pair determining unit, configured to determine a pixel feature pair corresponding to the N highest target similarities among the calculated similarities, where the pixel feature pair includes a first pixel feature and a second pixel feature;
and a third pixel pair determining unit, configured to determine, for each pixel feature pair, the pixel in the first image corresponding to its first pixel feature and the pixel in the second image corresponding to its second pixel feature as a pixel pair.
According to one embodiment of the present invention, the coordinate mapping relation determining module includes:
a first matrix construction unit, configured to construct a first matrix according to position information of pixels in the first coordinate system of the first image in the pixel pair;
a second matrix construction unit, configured to construct a second matrix according to position information of pixels in the second coordinate system of the second image in the pixel pair;
and the coordinate mapping relation determining unit is used for calculating the conversion relation from the first matrix to the second matrix and determining the conversion relation as the coordinate mapping relation.
The implementation process of the functions and roles of each module in the above device is specifically shown in the implementation process of the corresponding steps in the above method, and will not be described herein again.
For the apparatus embodiments, reference may be made to the description of the method embodiments for the relevant points, since they essentially correspond to the method embodiments. The apparatus embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units.
The invention also provides an electronic device, which comprises a processor and a memory; the memory stores a program that can be called by the processor; wherein the processor implements the image mapping method as described in the foregoing embodiments when executing the program.
The embodiment of the image mapping apparatus can be applied to an electronic device. Taking software implementation as an example, the apparatus in the logical sense is formed by the processor of the electronic device on which it resides reading the corresponding computer program instructions from non-volatile storage into memory for execution. In terms of hardware, fig. 5 shows a hardware structure diagram of the electronic device on which the image mapping apparatus 10 of an exemplary embodiment of the present invention resides; besides the processor 510, memory 530, interface 520 and non-volatile storage 540 shown in fig. 5, the electronic device may further include other hardware according to its actual functions, which is not described here.
The present invention also provides a machine-readable storage medium having stored thereon a program which, when executed by a processor, implements an image mapping method as in any of the preceding embodiments.
The present invention may take the form of a computer program product embodied on one or more storage media (including, but not limited to, magnetic disk storage, CD-ROM, optical storage, etc.) having program code embodied therein. Machine-readable storage media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology; the information may be computer-readable instructions, data structures, program modules or other data. Examples of machine-readable storage media include, but are not limited to: phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can store information accessible by a computing device.
The foregoing is merely a description of preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent replacement, improvement or the like made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (8)

1. An image mapping method, comprising:
respectively inputting a first image and a second image into a trained first neural network, so that the first neural network processes pixels in the input first image to obtain a first pixel feature set, and processes pixels in the input second image to obtain a second pixel feature set; the first image and the second image are images acquired by two different devices for the same scene, the first image being a local detail image of the scene and the second image a panoramic image of the scene containing the objects in the local detail image; the first neural network comprises at least a plurality of cascaded convolution layers; the first pixel feature set is: an M-channel feature map obtained by the plurality of cascaded convolution layers performing feature extraction on the first image, M being greater than 1; the second pixel feature set is: an M-channel feature map obtained by the plurality of cascaded convolution layers performing feature extraction on the second image; each pixel feature in the M-channel feature map includes M channel features, each channel feature corresponds to a category, and the category to which a pixel feature belongs is the category corresponding to the channel feature, within that pixel feature, that satisfies a specified requirement;
Clustering is respectively carried out on a first pixel feature of the first pixel feature set and a second pixel feature of the second pixel feature set; determining a first region and a second region from the clustered first pixel feature set and the clustered second pixel feature set respectively, wherein the categories of the first pixel feature of the first region and the second pixel feature of the second region are the same; performing similarity calculation according to pixel characteristics in the first area and the second area, and determining N pairs of pixels meeting preset conditions, wherein N is greater than 1, and two pixels in each pixel pair are respectively positioned in the first image and the second image;
determining a coordinate mapping relation mapped from a first coordinate system to a second coordinate system according to the position information of the pixels in each pixel pair, wherein the first coordinate system is a coordinate system applied by a first image, and the second coordinate system is a coordinate system applied by a second image;
and mapping the designated image area in the first image from a first coordinate system to a second coordinate system according to the coordinate mapping relation.
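
For illustration only (the following code forms no part of the claims), the pixel-level processing of claim 1 can be sketched as below. This is a minimal sketch, not the patented implementation: the framework (PyTorch), the layer widths, the value M = 16, and taking the maximum channel response as the channel feature meeting the specified requirement are all assumptions, since the claim fixes none of them.

import torch
import torch.nn as nn

class PixelFeatureNet(nn.Module):
    # A plurality of cascaded convolutional layers producing an M-channel
    # feature map at the same spatial resolution as the input image.
    def __init__(self, m_channels: int = 16):  # M > 1; M = 16 is an assumption
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, m_channels, kernel_size=3, padding=1),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, H, W) -> pixel features: (B, M, H, W)
        return self.layers(image)

net = PixelFeatureNet()
first_set = net(torch.rand(1, 3, 240, 320))   # first pixel feature set (local detail image)
second_set = net(torch.rand(1, 3, 480, 640))  # second pixel feature set (panoramic image)
# Category per pixel: assumed here to be the channel with the maximum response.
# Pixels sharing a category are what the clustering step then groups into the
# first and second regions.
categories = first_set.argmax(dim=1)  # (B, H, W) category map
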
2. The image mapping method according to claim 1, wherein performing similarity calculation according to the pixel features in the first region and the second region to determine N pixel pairs satisfying a preset condition comprises:
calculating, for each first pixel feature in the first region, the similarity between the first pixel feature and each second pixel feature in the second region;
determining the pixel feature pairs corresponding to the N highest similarities among the calculated similarities, wherein each pixel feature pair comprises a first pixel feature and a second pixel feature; and
for each pixel feature pair, determining the pixel in the first image corresponding to the first pixel feature of the pair and the pixel in the second image corresponding to the second pixel feature of the pair as a pixel pair.
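
For illustration only, the matching of claim 2 might be sketched as follows; cosine similarity and a flat top-N selection over all feature pairs are assumptions, the claim requiring only that the pairs with the N highest similarities be determined. The names top_n_pixel_pairs, feat_a and feat_b are hypothetical.

import torch
import torch.nn.functional as F

def top_n_pixel_pairs(feat_a, feat_b, coords_a, coords_b, n=8):
    # feat_a: (Ka, M) first pixel features from the first region;
    # feat_b: (Kb, M) second pixel features from the second region;
    # coords_a / coords_b: (Ka, 2) / (Kb, 2) pixel coordinates in their images.
    sim = F.cosine_similarity(feat_a.unsqueeze(1), feat_b.unsqueeze(0), dim=-1)  # (Ka, Kb)
    flat = sim.flatten().topk(n).indices
    ia = torch.div(flat, sim.shape[1], rounding_mode="floor")
    ib = flat % sim.shape[1]
    # Each pixel pair: (pixel in the first image, pixel in the second image).
    return [(tuple(coords_a[i].tolist()), tuple(coords_b[j].tolist()))
            for i, j in zip(ia.tolist(), ib.tolist())]

pairs = top_n_pixel_pairs(torch.rand(50, 16), torch.rand(80, 16),
                          torch.randint(0, 240, (50, 2)), torch.randint(0, 480, (80, 2)))
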
3. The image mapping method according to claim 1, wherein determining the coordinate mapping relationship from the first coordinate system to the second coordinate system according to the position information of the pixels in each pixel pair comprises:
constructing a first matrix according to the position information, in the first coordinate system, of the pixels in the pixel pairs that are located in the first image;
constructing a second matrix according to the position information, in the second coordinate system, of the pixels in the pixel pairs that are located in the second image; and
calculating a conversion relationship from the first matrix to the second matrix, and determining the conversion relationship as the coordinate mapping relationship.
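
For illustration only, a minimal sketch of claim 3 follows, under the assumption that the conversion relationship is an affine transform solved by least squares; the claim does not fix the transform model, and a homography fitted from the same pixel pairs (e.g. with OpenCV's cv2.findHomography) would follow the same pattern. The function name coordinate_mapping and the sample coordinates are hypothetical.

import numpy as np

def coordinate_mapping(pixel_pairs):
    # pixel_pairs: [((x1, y1), (x2, y2)), ...] with (x1, y1) in the first
    # coordinate system and (x2, y2) in the second coordinate system.
    # First matrix: homogeneous positions of the pixels located in the first image.
    first = np.array([[x, y, 1.0] for (x, y), _ in pixel_pairs])   # (N, 3)
    # Second matrix: positions of the corresponding pixels in the second image.
    second = np.array([[x, y] for _, (x, y) in pixel_pairs])       # (N, 2)
    # Conversion relationship: least-squares transform from the first matrix
    # to the second matrix.
    t, *_ = np.linalg.lstsq(first, second, rcond=None)
    return t  # (3, 2); [x, y, 1] @ t gives the mapped position

pairs = [((10, 12), (105, 210)), ((40, 15), (165, 215)),
         ((12, 60), (108, 306)), ((55, 70), (195, 326))]
t = coordinate_mapping(pairs)
print(np.array([10.0, 12.0, 1.0]) @ t)  # approx. (105, 210) in the second coordinate system
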
4. An image mapping apparatus, comprising:
a pixel-level processing module, configured to respectively input a first image and a second image into a trained first neural network, so that the first neural network processes pixels in the input first image to obtain a first pixel feature set and processes pixels in the input second image to obtain a second pixel feature set; wherein the first image and the second image are images acquired by two different devices for the same scene, the first image is a local detail image of the scene, the second image is a panoramic image of the scene, the panoramic image contains the objects in the local detail image, and the first neural network comprises at least a plurality of cascaded convolutional layers; the first pixel feature set is an M-channel feature map obtained by the plurality of cascaded convolutional layers performing feature extraction on the first image, where M is greater than 1; the second pixel feature set is an M-channel feature map obtained by the plurality of cascaded convolutional layers performing feature extraction on the second image; each pixel feature in an M-channel feature map comprises M channel features, each channel feature corresponds to a category, and the category to which a pixel feature belongs is the category corresponding to the one of its channel features that meets a specified requirement;
a pixel pair determining module, configured to cluster the first pixel features in the first pixel feature set and the second pixel features in the second pixel feature set respectively; determine a first region and a second region from the clustered first pixel feature set and the clustered second pixel feature set respectively, wherein the first pixel features of the first region and the second pixel features of the second region belong to the same category; and perform similarity calculation according to the pixel features in the first region and the second region to determine N pixel pairs satisfying a preset condition, wherein N is greater than 1 and the two pixels in each pixel pair are located in the first image and the second image respectively;
a coordinate mapping relationship determining module, configured to determine a coordinate mapping relationship from a first coordinate system to a second coordinate system according to the position information of the pixels in each pixel pair, wherein the first coordinate system is the coordinate system used by the first image and the second coordinate system is the coordinate system used by the second image; and
an image mapping module, configured to map a designated image area in the first image from the first coordinate system to the second coordinate system according to the coordinate mapping relationship.
5. The image mapping apparatus of claim 4, wherein the pixel pair determining module is specifically configured to:
calculate, for each first pixel feature in the first region, the similarity between the first pixel feature and each second pixel feature in the second region;
determine the pixel feature pairs corresponding to the N highest similarities among the calculated similarities, wherein each pixel feature pair comprises a first pixel feature and a second pixel feature; and
for each pixel feature pair, determine the pixel in the first image corresponding to the first pixel feature of the pair and the pixel in the second image corresponding to the second pixel feature of the pair as a pixel pair.
6. The image mapping apparatus of claim 4, wherein the coordinate mapping relationship determining module comprises:
a first matrix construction unit, configured to construct a first matrix according to the position information, in the first coordinate system, of the pixels in the pixel pairs that are located in the first image;
a second matrix construction unit, configured to construct a second matrix according to the position information, in the second coordinate system, of the pixels in the pixel pairs that are located in the second image; and
a coordinate mapping relationship determining unit, configured to calculate a conversion relationship from the first matrix to the second matrix and determine the conversion relationship as the coordinate mapping relationship.
7. An electronic device, comprising a processor and a memory, wherein the memory stores a program callable by the processor, and wherein the processor, when executing the program, implements the image mapping method according to any one of claims 1-3.
8. A machine-readable storage medium having stored thereon a program which, when executed by a processor, implements the image mapping method according to any one of claims 1-3.
CN201910212565.9A 2019-03-20 2019-03-20 Image mapping method, device and equipment and storage medium Active CN111723830B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910212565.9A CN111723830B (en) 2019-03-20 2019-03-20 Image mapping method, device and equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111723830A CN111723830A (en) 2020-09-29
CN111723830B (en) 2023-08-29

Family

ID=72562348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910212565.9A Active CN111723830B (en) 2019-03-20 2019-03-20 Image mapping method, device and equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111723830B (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9430711B2 (en) * 2011-02-23 2016-08-30 Nec Corporation Feature point matching device, feature point matching method, and non-transitory computer readable medium storing feature matching program
EP2966616B1 (en) * 2014-07-10 2018-06-13 Thomson Licensing Method and apparatus for tracking superpixels between related images
US10497089B2 (en) * 2016-01-29 2019-12-03 Fotonation Limited Convolutional neural network
CN106097444B (en) * 2016-05-30 2017-04-12 百度在线网络技术(北京)有限公司 Generation method and device of high-accuracy map
KR102696652B1 (en) * 2017-01-26 2024-08-21 삼성전자주식회사 Stero matching method and image processing apparatus
US10467756B2 (en) * 2017-05-14 2019-11-05 International Business Machines Corporation Systems and methods for determining a camera pose of an image

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101344965A (en) * 2008-09-04 2009-01-14 上海交通大学 Tracking system based on binocular camera
CN103942754A (en) * 2013-01-18 2014-07-23 深圳市腾讯计算机系统有限公司 Panoramic image completion method and device
CN105635551A (en) * 2014-10-29 2016-06-01 浙江大华技术股份有限公司 Method of dome camera for generating panoramic image, and dome camera
CN106683071A (en) * 2015-11-06 2017-05-17 杭州海康威视数字技术股份有限公司 Image splicing method and image splicing device
CN105823416A (en) * 2016-03-04 2016-08-03 大族激光科技产业集团股份有限公司 Method for measuring object through multiple cameras and device thereof
CN106204595A (en) * 2016-07-13 2016-12-07 四川大学 A kind of airdrome scene three-dimensional panorama based on binocular camera monitors method
JP2018055198A (en) * 2016-09-26 2018-04-05 キヤノン株式会社 Image processing device, image processing method, and program
CN108122191A (en) * 2016-11-29 2018-06-05 成都观界创宇科技有限公司 Fish eye images are spliced into the method and device of panoramic picture and panoramic video
CN107392851A (en) * 2017-07-04 2017-11-24 上海小蚁科技有限公司 Method and apparatus for generating panoramic picture
CN108986161A (en) * 2018-06-19 2018-12-11 亮风台(上海)信息科技有限公司 A kind of three dimensional space coordinate estimation method, device, terminal and storage medium
CN109165645A (en) * 2018-08-01 2019-01-08 腾讯科技(深圳)有限公司 A kind of image processing method, device and relevant device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Video tracking fusing superpixels and dynamic graph matching; Zhang Junchang, et al.; Journal of Northwestern Polytechnical University (No. 01); 140-144 *

Also Published As

Publication number Publication date
CN111723830A (en) 2020-09-29

Similar Documents

Publication Publication Date Title
Christiansen et al. Unsuperpoint: End-to-end unsupervised interest point detector and descriptor
CN112446398B (en) Image classification method and device
CN110378381B (en) Object detection method, device and computer storage medium
CN105046235B (en) The identification modeling method and device of lane line, recognition methods and device
Wu et al. 6d-vnet: End-to-end 6-dof vehicle pose estimation from monocular rgb images
US10311595B2 (en) Image processing device and its control method, imaging apparatus, and storage medium
CN111627050B (en) Training method and device for target tracking model
CN111931764B (en) Target detection method, target detection frame and related equipment
CN108831161A (en) A kind of traffic flow monitoring method, intelligence system and data set based on unmanned plane
EP3093822B1 (en) Displaying a target object imaged in a moving picture
CN111383252B (en) Multi-camera target tracking method, system, device and storage medium
CN112036381B (en) Visual tracking method, video monitoring method and terminal equipment
CN113012215A (en) Method, system and equipment for space positioning
CN110363179A (en) Ground picture capturing method, device, electronic equipment and storage medium
CN114022560A (en) Calibration method and related device and equipment
CN114972182B (en) Object detection method and device
Müller et al. Squeezeposenet: Image based pose regression with small convolutional neural networks for real time uas navigation
Lisanti et al. Continuous localization and mapping of a pan–tilt–zoom camera for wide area tracking
CN111460854B (en) Remote target detection method, device and system
CN113256731A (en) Target detection method and device based on monocular vision
CN117726747A (en) Three-dimensional reconstruction method, device, storage medium and equipment for complementing weak texture scene
Kong et al. Particle filter‐based vehicle tracking via HOG features after image stabilisation in intelligent drive system
Zhou et al. Car park occupancy analysis using UAV images
CN111723830B (en) Image mapping method, device and equipment and storage medium
CN112132895A (en) Image-based position determination method, electronic device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant