CN113947635A - Image positioning method, device, electronic device and storage medium - Google Patents
Image positioning method, device, electronic device and storage medium
- Publication number
- CN113947635A (application CN202111207034.4A)
- Authority
- CN
- China
- Prior art keywords
- image
- information
- descriptor information
- target
- weighted
- Prior art date
- Legal status (the legal status is an assumption and is not a legal conclusion)
- Pending
Classifications
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods (G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL > G06T7/00 Image analysis > G06T7/70 Determining position or orientation of objects or cameras)
- G06T2207/20081—Training; Learning (G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL > G06T2207/00 Indexing scheme for image analysis or image enhancement > G06T2207/20 Special algorithmic details)
- G06T2207/20084—Artificial neural networks [ANN] (G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL > G06T2207/00 Indexing scheme for image analysis or image enhancement > G06T2207/20 Special algorithmic details)
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Abstract
The disclosure provides an image positioning method and apparatus, an electronic device, and a storage medium, relating to the field of artificial intelligence, in particular to the technical fields of computer vision and deep learning, and applicable to image recognition and visual positioning scenes. A specific implementation scheme is as follows: determining descriptor information corresponding to key points in an image to be positioned; determining weights for the descriptor information according to representation information of the key points contained in an associated image related to the image to be positioned; and determining pose information related to the image to be positioned according to the descriptor information weighted by the weights, so as to position the image to be positioned.
Description
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to the technical fields of computer vision and deep learning, which can be used in image recognition and visual positioning scenes, and more particularly to an image positioning method and apparatus, an electronic device, and a storage medium.
Background
With the rapid development of digital image processing and computer vision, more and more researchers adopt cameras as perception sensors for fully autonomous mobile robots. Since the real world is three-dimensional while the image projected onto the camera is two-dimensional, the pose of the camera in the world coordinate system can be estimated from the three-dimensional coordinates of the feature points acquired by the camera in the current camera coordinate system and their world coordinates in a map. Relevant three-dimensional world information can thus be extracted from the perceived two-dimensional image, enabling applications in industries such as virtual reality, augmented reality, map navigation, and industrial production.
Disclosure of Invention
The disclosure provides an image positioning method, an image positioning device, an electronic device and a storage medium.
According to an aspect of the present disclosure, there is provided an image localization method, including: determining descriptor information corresponding to key points in an image to be positioned; determining the weight of the descriptor information according to the representation information of the key points in the associated image related to the image to be positioned; and determining pose information related to the image to be positioned according to the weighted descriptor information weighted by the weight so as to position the image to be positioned.
According to another aspect of the present disclosure, there is provided an image localization apparatus including: the first determining module is used for determining descriptor information corresponding to key points in the image to be positioned; the second determining module is used for determining the weight of the descriptor information according to the representation information of the key point included in the associated image related to the image to be positioned; and the third determining module is used for determining the pose information related to the image to be positioned according to the weighted descriptor information weighted by the weight so as to position the image to be positioned.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the image localization method as described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the image localization method as described above.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the image localization method as described above.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates an exemplary system architecture to which the image localization method and apparatus may be applied, according to an embodiment of the present disclosure;
FIG. 2 schematically shows a flow chart of an image localization method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a schematic diagram of an image localization method according to an embodiment of the present disclosure;
FIG. 4 schematically shows a block diagram of an image localization arrangement according to an embodiment of the present disclosure; and
FIG. 5 illustrates a schematic block diagram of an example electronic device that can be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other handling of the personal information of the users involved all comply with the relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.
Visual positioning refers to visual localization; it can determine, on a map, the 6DoF pose of an image acquisition device, of equipment comprising multiple image acquisition devices, and the like, where the pose comprises the three-dimensional coordinate information and three orientation angles of the image acquisition device. Based on the 6DoF pose determined for the image acquisition device, visual positioning essentially locates an image observable within the current visual range. In practical applications, visual positioning is mainly implemented with technologies such as image retrieval, local descriptor extraction, and pose calculation. Image retrieval for visual positioning mainly addresses the case where the point of interest is basically unchanged while changes in the position and attitude of the camera, and changes in the scene, cause the captured image to change. Some optimized visual positioning methods mainly optimize image retrieval, local descriptor extraction, and the like. For example, one method optimizes visual positioning by executing image retrieval and local descriptor extraction serially: image retrieval is performed first, followed by local descriptor extraction. As another example, another method optimizes visual positioning by extracting global and local image descriptors in parallel, performing the image retrieval and local descriptor extraction steps at the same time.
The inventors have found that in the method based on serial execution of image retrieval and local descriptor extraction, the two steps are performed sequentially, so that the global information and the local information of the image to be positioned cannot be effectively combined, and achieving the same accuracy incurs a large amount of unnecessary computation and memory consumption. The method based on parallel execution of image retrieval and local descriptor extraction extracts global and local features through a single network, cannot combine multiple kinds of local features, and is difficult to retrain and optimize.
Fig. 1 schematically illustrates an exemplary system architecture to which the image localization method and apparatus may be applied, according to an embodiment of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios. For example, in another embodiment, an exemplary system architecture to which the image positioning method and apparatus may be applied may include a terminal device, but the terminal device may implement the image positioning method and apparatus provided in the embodiments of the present disclosure without interacting with a server.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or transmit information or the like.
The terminal devices 101, 102, 103 may be various electronic devices with image capturing functionality, including but not limited to cameras, video cameras, smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for images captured by the user with the terminal devices 101, 102, 103. The background management server may analyze and otherwise process the received data such as the image, and feed back a processing result (e.g., a web page, information, or data obtained or generated according to a user request) to the terminal device. The server may be a cloud server, also called a cloud computing server or cloud host, a host product in a cloud computing service system that overcomes the drawbacks of high management difficulty and weak service extensibility in traditional physical hosts and VPS (Virtual Private Server) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be noted that the image positioning method provided by the embodiment of the present disclosure may be generally executed by the terminal device 101, 102, or 103. Accordingly, the image positioning apparatus provided by the embodiment of the present disclosure may also be disposed in the terminal device 101, 102, or 103.
Alternatively, the image positioning method provided by the embodiment of the present disclosure may also be generally executed by the server 105. Accordingly, the image positioning apparatus provided by the embodiments of the present disclosure may be generally disposed in the server 105. The image positioning method provided by the embodiment of the present disclosure may also be performed by a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the image positioning apparatus provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
For example, when image positioning is required, the terminal device 101, 102, 103 may acquire an image to be positioned and an associated image related to the image to be positioned. The acquired image to be positioned and the associated image associated with the image to be positioned are then sent to server 105. Determining, by the server 105, descriptor information corresponding to the keypoints in the image to be located; determining the weight of descriptor information according to the representation information of key points included in the associated image related to the image to be positioned; and determining pose information related to the image to be positioned according to the weighted descriptor information weighted by the weight so as to position the image to be positioned. Or a server cluster capable of communicating with the terminal devices 101, 102, 103 and/or the server 105 analyzes the image to be positioned and the associated image related to the image to be positioned, and realizes positioning of the image to be positioned.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically shows a flow chart of an image localization method according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S210 to S230.
In operation S210, descriptor information corresponding to a key point in an image to be located is determined.
In operation S220, a weight of the descriptor information is determined according to the representation information of the key point included in the associated image related to the image to be positioned.
In operation S230, pose information associated with the image to be positioned is determined according to the weighted descriptor information weighted by the weight, so as to position the image to be positioned.
According to the embodiment of the disclosure, the image to be positioned may include images acquired by various devices or apparatuses with an image acquisition function at various angles, directions and positions of an actual scene, or may include images acquired at various angles, directions and positions of a predefined virtual scene. The image may be obtained by capturing a picture directly, or by capturing a video first and then extracting video frames from it. The key points may be determined by schemes such as SIFT and ORB, and may include feature points with a high degree of distinctiveness in the image to be positioned. For example, if the image to be positioned includes a tower, the key points may include feature points corresponding to the tip of the tower, the edges of the tower, and the like in the image. The descriptor information may characterize the feature information of a key point.
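To make operation S210 concrete, the following is a minimal sketch of key point detection and local descriptor extraction with OpenCV's SIFT and ORB implementations, the two schemes named above; the image file name is a hypothetical placeholder, and this is only one possible realization of the step, not the disclosure's specific implementation.

```python
# Minimal sketch of key point detection and descriptor extraction, assuming
# OpenCV (opencv-python >= 4.4); "query.jpg" is a hypothetical file name.
import cv2

image = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)

# SIFT key points and 128-dimensional float descriptors
sift = cv2.SIFT_create()
kps_sift, desc_sift = sift.detectAndCompute(image, None)

# ORB key points and 256-bit (32-byte) binary descriptors
orb = cv2.ORB_create(nfeatures=2000)
kps_orb, desc_orb = orb.detectAndCompute(image, None)

# Each key point carries pixel coordinates (kp.pt), later used to look up heat
# information; each descriptor row is the descriptor information for one key point.
print(len(kps_sift), desc_sift.shape, len(kps_orb), desc_orb.shape)
```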
According to the embodiment of the disclosure, the associated image can be obtained through image retrieval. The image retrieval may be any image retrieval that meets the visual positioning requirements, such as content-based image retrieval, which may include NetVLAD (a feature-coding algorithm based on feature descriptors) and the like. Content-based image retrieval searches according to the image, its content semantics, and their contextual connections, using the semantic features of the image as clues to retrieve other images with similar features from an image database. For example, an image database may be constructed from all images acquired for a scene related to the image to be positioned, image retrieval may be performed on the image content of the image to be positioned, and the associated image may be determined from the retrieval results. For example, the associated image may be determined from the top N images in the retrieval results with the highest correlation with the image to be positioned, where N may be an integer greater than or equal to 1. As another example, the image retrieval may be performed according to one or more key points in the image to be positioned, and the associated image may be determined from retrieved images containing at least one of those key points.
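As an illustration of this retrieval step, the sketch below ranks database images by cosine similarity of global descriptors and returns the top N as associated images; the `embed` function (for example, a NetVLAD-style embedding) and the database arrays are assumptions, not components defined by the disclosure.

```python
# Minimal sketch of content-based retrieval of associated images by global
# descriptor similarity; the embedding function is assumed to exist elsewhere.
import numpy as np

def top_n_associated(query_vec: np.ndarray, db_vecs: np.ndarray, n: int = 5) -> np.ndarray:
    """Return indices of the n database images most similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    d = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    sims = d @ q                      # cosine similarity against every database image
    return np.argsort(-sims)[:n]      # indices of the top-N associated images

# Hypothetical usage:
# query_vec = embed(query_image); db_vecs = np.stack([embed(img) for img in database])
# associated_idx = top_n_associated(query_vec, db_vecs, n=5)
```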
According to an embodiment of the present disclosure, the representation information of the keypoints may include at least one of semantic information, content information, color intensity information, brightness information, and the like of the keypoint representation. According to the strength of the correlation between the representation information of the key point and the descriptor information thereof, corresponding weight can be configured for the descriptor information corresponding to the key point. For example, if the correlation between the representation information of a certain key point and its descriptor information is strong, a larger weight value may be configured for the descriptor information corresponding to the key point, and if the correlation between the representation information of a certain key point and its descriptor information is weak, a smaller weight value may be configured for the descriptor information corresponding to the key point.
According to an embodiment of the present disclosure, the weighted descriptor information may include the descriptor information characterizing a certain key point together with the weight information of that descriptor information. By using the weighted descriptor information of each key point in the image to be positioned in the pose calculation step, the 6DoF pose of the image acquisition device at the time of acquiring the image to be positioned can be calculated. The pose calculation step can be implemented with schemes such as COLMAP, a general-purpose structure-from-motion and multi-view stereo pipeline with graphical and command-line interfaces that conveniently performs three-dimensional reconstruction from a series of two-dimensional pictures.
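As one hedged illustration of the pose-calculation step, the sketch below recovers a 6DoF pose from 2D-3D correspondences with OpenCV's PnP + RANSAC solver; how the weighted descriptors are matched to 3D map points, and the camera intrinsics, are assumed inputs, and this is a stand-in for pipelines such as COLMAP rather than the disclosure's exact method.

```python
# Minimal sketch of 6DoF pose estimation from 2D-3D correspondences, assuming
# OpenCV; pts_3d/pts_2d would come from matching weighted descriptors to the map.
import cv2
import numpy as np

def estimate_pose(pts_3d: np.ndarray, pts_2d: np.ndarray, K: np.ndarray):
    """pts_3d: (N, 3) world points, pts_2d: (N, 2) pixel points, K: 3x3 intrinsics."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d.astype(np.float64), pts_2d.astype(np.float64), K, None
    )
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)          # 3x3 rotation: the three orientation angles
    return R, tvec.reshape(3)           # rotation + 3D position = 6DoF pose
```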
According to the embodiment of the disclosure, an algorithm can be constructed from operations S210 to S230. Taking the whole image sequence acquired by the image acquisition device for visual positioning as the input of the algorithm, each image in the sequence can serve as an image to be positioned; the weighted descriptor information corresponding to the key points in each image to be positioned is determined according to the representation information of the key points contained in the associated image related to that image, and the 6DoF pose of the image acquisition device at the time of acquiring each image in the sequence is calculated. Because a 6DoF pose contains both position information and orientation-angle information, the user's viewing angle can be simulated at each 6DoF pose to observe the environment. By determining the 6DoF poses, the user can obtain the same observation experience as in a real scene when observing the environment from any position and any viewing angle in any actual or virtual scene.
Through the embodiments of the present disclosure, pose calculation is performed with local feature descriptors configured with weights, so that unnecessary computation is reduced while using a smaller amount of computation and memory, and the accuracy of image positioning can be effectively improved on the basis of that smaller amount of computation.
The method shown in fig. 2 is further described below with reference to specific embodiments.
According to an embodiment of the present disclosure, the characterization information includes heat information. Determining the weight of the descriptor information according to the characterization information of the key points included in the associated image related to the image to be positioned may include: and performing feature extraction on the associated image to obtain a thermodynamic diagram corresponding to the associated image. And determining heat information of the key points according to the thermodynamic diagram. And determining the weight of the descriptor information corresponding to the key point according to the heat information.
According to the embodiment of the disclosure, a neural network that can both perform image retrieval and produce the thermodynamic diagram can be obtained by designing a loss (for training the thermodynamic-diagram network) on the basis of a retrieval network and training with it; inputting the image to be positioned into this neural network yields the thermodynamic diagram corresponding to the associated image related to the image to be positioned. A thermodynamic diagram can also be obtained by designing the loss in combination with the positions, on the image to be positioned, of the feature-point regions used in the pose calculation process, and then training with that loss. It should be noted that the loss can be designed freely, as long as the output layer is configured with a C×H×W layout, where C denotes the number of channels of the thermodynamic diagram, H denotes its height, and W denotes its width.
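As a rough illustration of the C×H×W output layout described above, the following is a minimal PyTorch sketch of a heat-map head attached to backbone features; the backbone, the channel count and the training loss are assumptions left open by the text.

```python
# Minimal sketch of a thermodynamic-diagram (heat map) head with a C x H x W
# output, assuming PyTorch; backbone features and the loss design are left open.
import torch
import torch.nn as nn

class HeatmapHead(nn.Module):
    def __init__(self, in_channels: int = 256, out_channels: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, in_channels, H, W) backbone features of the associated image
        # returns a heat map of shape (B, C, H, W); heat values squashed to (0, 1)
        return torch.sigmoid(self.conv(feats))
```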
According to the embodiment of the disclosure, local descriptor extraction can be firstly performed on key points in an image to be positioned, so that descriptor information corresponding to the key points in the image to be positioned is obtained. Then, a corresponding key point can be found from the thermodynamic diagram corresponding to the related image related to the image to be positioned, and the heat information of the corresponding key point in the thermodynamic diagram is determined, so that the descriptor information corresponding to the key point in the image to be positioned is weighted according to the heat information.
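The weighting itself can be pictured as sampling the heat map at the key point locations, as in the sketch below; nearest-neighbour sampling and the normalisation of heat values to [0, 1] are simplifying assumptions, not requirements of the disclosure.

```python
# Minimal sketch of weighting descriptor information by heat information:
# sample the heat map at each key point and scale the descriptor accordingly.
import numpy as np

def weight_descriptors(descriptors: np.ndarray, keypoints_xy: np.ndarray, heatmap: np.ndarray):
    """descriptors: (M, D), keypoints_xy: (M, 2) pixel coords, heatmap: (H, W)."""
    h, w = heatmap.shape
    xs = np.clip(keypoints_xy[:, 0].round().astype(int), 0, w - 1)
    ys = np.clip(keypoints_xy[:, 1].round().astype(int), 0, h - 1)
    weights = heatmap[ys, xs]
    weights = weights / (weights.max() + 1e-8)          # assumed normalisation to [0, 1]
    return descriptors * weights[:, None], weights      # weighted descriptor information
```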
According to the embodiment of the disclosure, the global information and the local information of the image to be positioned and the associated image thereof are combined in a thermodynamic diagram manner, the weights are configured for the local feature descriptors in the image to be positioned, and the accuracy of weight configuration and the accuracy of image positioning can be effectively improved.
According to the embodiment of the disclosure, M key points are included in the image to be positioned. Determining descriptor information corresponding to keypoints in the image to be located may comprise: and determining N types of descriptor information respectively corresponding to the M key points, wherein M is an integer larger than 1, N is an integer larger than or equal to 1, and M is larger than or equal to N.
According to the embodiment of the disclosure, the number of the determined key points in the image to be positioned can be multiple. The descriptor information corresponding to different key points can be extracted and obtained through an extraction mode matched with each key point, the extraction modes matched with different key points can be the same or different, and the descriptor information extracted based on different extraction modes can be of different types. For example, the number of the key points determined in the image to be located may be M, N extraction manners adapted to the M key points respectively and used for extracting the descriptor information may be provided, and after the operation of extracting the descriptor information is performed on each key point of the M key points, N types of descriptor information may be obtained.
It should be noted that the types of the descriptor information extracted based on different extraction manners may also be the same, and then after the operation of extracting the descriptor information is performed on each of the M key points, descriptor information of less than N types may be obtained.
Through the above embodiments of the present disclosure, one type of descriptor information that can most represent the keypoint information can be extracted for each keypoint, so that the effectiveness of the obtained descriptor information of each keypoint can be enhanced, and the accuracy of image positioning is improved.
According to an embodiment of the present disclosure, the descriptor information may include a plurality of types of descriptor information. Determining the weight of the descriptor information according to the characterization information of the key points included in the associated image related to the image to be positioned may include: and determining target key points corresponding to the target descriptor information of the target type. And performing feature extraction on the target associated image including the target key points to obtain a target thermodynamic diagram corresponding to the target associated image. And determining target heat information of the target key points according to the target thermodynamic diagram. And determining the weight of the target descriptor information according to the target heat information.
According to an embodiment of the present disclosure, the target type may be any one of a plurality of types. Each type of descriptor information can correspond to one or more specially trained thermodynamic diagram extraction networks, and such networks can be trained in manners such as NetVLAD or SIFT. By training a plurality of different thermodynamic diagram extraction networks and taking the image to be positioned as their input, a plurality of thermodynamic diagrams corresponding to either the image to be positioned or any of its associated images can be obtained. By extracting the thermodynamic diagram suited to a certain type of descriptor information and determining the heat information corresponding to that type of descriptor information in the thermodynamic diagram, the weight configured for that type of descriptor information can be determined. For other types of descriptor information, the configured weight can be determined using a morphological method. Therefore, when multiple local descriptor extraction schemes exist, that is, multiple types of descriptor information exist, the weighted descriptor information is determined using a thermodynamic diagram matched to each type of descriptor information, and the weighted descriptor information is fused so that the image to be positioned can be positioned according to the fused weighted descriptor information.
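A possible shape of this per-type weighting and fusion is sketched below; the dictionaries mapping descriptor types to heat maps, and the reuse of the `weight_descriptors` helper sketched above as `weight_fn`, are illustrative assumptions rather than the disclosure's concrete modules.

```python
# Minimal sketch of fusing weighted descriptor information across several
# descriptor types, each weighted with its own matched heat map.
def fuse_weighted_descriptors(keypoints_by_type, heatmaps_by_type, weight_fn):
    """keypoints_by_type: {type: (descriptors, coords)}; heatmaps_by_type: {type: heat map}."""
    fused = []
    for dtype, (desc, coords) in keypoints_by_type.items():
        weighted, w = weight_fn(desc, coords, heatmaps_by_type[dtype])
        fused.append((dtype, weighted, w))     # one weighted entry per descriptor type
    return fused                               # fused weighted descriptor information
```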
Through the embodiments of the present disclosure, the weights of target descriptor information of the same or different types can be determined in combination with the same or different target thermodynamic diagrams, and the weighted descriptor information obtained for each key point in the image to be positioned can be fused to achieve a higher-accuracy confidence, so that the accuracy of image positioning can be effectively improved.
According to an embodiment of the present disclosure, the weighted descriptor information includes a plurality of pieces of weighted descriptor information. Determining the pose information related to the image to be positioned according to the descriptor information weighted with the weights may include: determining the weight value of each piece of weighted descriptor information; sorting the plurality of pieces of weighted descriptor information according to the weight values to obtain a sorting result; obtaining a preset number of pieces of first target weighted descriptor information according to the sorting result; and determining the pose information related to the image to be positioned according to the first target weighted descriptor information.
According to the embodiment of the disclosure, in the case where the image to be positioned is determined to include a plurality of pieces of weighted descriptor information, the descriptor information can be sorted according to the weight values, and the pose information can be calculated only from the top-ranked weighted descriptor information with the larger weight values, so as to position the image to be positioned.
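A minimal sketch of this selection follows, assuming the weights are held in a NumPy array; `k` stands for the preset number.

```python
# Minimal sketch of keeping only the preset number of highest-weight entries
# (the "first target weighted descriptor information") before pose calculation.
import numpy as np

def select_top_k(weighted_desc: np.ndarray, weights: np.ndarray, k: int):
    order = np.argsort(-weights)        # sorting result: weight values, descending
    keep = order[:k]                    # preset number of top-ranked entries
    return weighted_desc[keep], keep
```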
By adopting this embodiment of the disclosure, pose calculation is carried out with only part of the weighted descriptor information, which effectively reduces the amount of computation; and because that part is selected according to the result of sorting the weighted descriptor information by weight value, the precision of the image positioning result is effectively ensured.
According to an embodiment of the present disclosure, in the case where the weighted descriptor information includes a plurality of pieces of weighted descriptor information, determining the pose information related to the image to be positioned according to the descriptor information weighted by the weights may further include: determining the weight value of each piece of weighted descriptor information; determining second target weighted descriptor information corresponding to weight values greater than or equal to a preset threshold; and determining the pose information related to the image to be positioned according to the second target weighted descriptor information.
According to the embodiment of the disclosure, in the case that it is determined that a plurality of weighted descriptor information is included in the image to be positioned, a preset threshold value may be set for the weight of the weighted descriptor information. And then carrying out binarization on the weighted descriptor information, wherein the weighted descriptor information corresponding to the weight greater than or equal to a preset threshold value is determined as 1, and the weighted descriptor information corresponding to the weight smaller than the preset threshold value is determined as 0. And then, determining the weighted descriptor information determined as 1 as second target weighted descriptor information, and calculating pose information according to the second target weighted descriptor information to realize the positioning of the image to be positioned.
According to the embodiment of the present disclosure, the preset threshold may also be set for the heat information in the thermodynamic diagram, in which case, the heat information in the thermodynamic diagram may be binarized. For example, a region having heat information (e.g., a heat value) greater than or equal to a preset threshold in the thermodynamic diagram may be determined as 1, and a region having heat information (e.g., a heat value) less than the preset threshold in the thermodynamic diagram may be determined as 0. Then, descriptor information for pose calculation is determined from the descriptor information of the feature point in the region determined as 1, a weight configured for the descriptor information for pose calculation is determined from the heat information of the region, and second target weighted descriptor information is determined from the descriptor information for pose calculation and the weight configured therefor. And calculating pose information according to the second target weighted descriptor information to realize the positioning of the image to be positioned.
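A minimal sketch of this threshold variant is given below; the threshold value, the binarisation of the heat map, and the reuse of nearest-neighbour sampling are illustrative assumptions.

```python
# Minimal sketch of the preset-threshold variant: binarise the heat map, keep
# only descriptors whose key points fall in regions at or above the threshold,
# and weight them by the heat of those regions.
import numpy as np

def select_by_threshold(descriptors: np.ndarray, keypoints_xy: np.ndarray,
                        heatmap: np.ndarray, thresh: float = 0.5):
    mask = heatmap >= thresh                                  # region >= threshold -> 1
    h, w = heatmap.shape
    xs = np.clip(keypoints_xy[:, 0].round().astype(int), 0, w - 1)
    ys = np.clip(keypoints_xy[:, 1].round().astype(int), 0, h - 1)
    keep = mask[ys, xs]                                       # key points in "1" regions
    weights = heatmap[ys, xs][keep]
    return descriptors[keep] * weights[:, None], keep         # second target weighted info
```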
It should be noted that the preset threshold set for the weight of the weighted descriptor information may be a first preset threshold, the preset threshold set for the heat information in the thermodynamic diagram may be a second preset threshold, and the first preset threshold and the second preset threshold may be the same or different. Whether the first preset threshold and the second preset threshold are the same may depend on a relationship between the heat value and the weight. For example, in the case where the heat value and the weight are the same, the first preset threshold and the second preset threshold may be the same. Under the condition that the heat value and the weight are different, the determination mode of the first preset threshold and the second preset threshold may be determined according to the conversion relationship between the heat value and the weight, or the first preset threshold and the second preset threshold may be determined in a self-defined manner, which is not limited herein.
By adopting this embodiment of the disclosure, pose calculation is carried out with only part of the weighted descriptor information, which effectively reduces the amount of computation; and because that part is selected according to the preset threshold, the precision of the image positioning result is effectively ensured.
FIG. 3 schematically illustrates a schematic diagram of an image localization method according to an embodiment of the present disclosure.
As shown in fig. 3, the image 310 to be positioned is input to the descriptor extraction module 320, and descriptor information 370 corresponding to each feature point in the image 310 to be positioned can be extracted. The image 310 to be positioned is also input to the image retrieval module 330 for image content retrieval, and one or more associated images 340 related to the image 310 to be positioned can be obtained. The thermodynamic diagram extraction module 350 may perform feature extraction on the one or more associated images 340, resulting in one or more thermodynamic diagrams 360 corresponding to the one or more associated images 340. From the descriptor information 370 and the heat information corresponding to the descriptor information 370 in the thermodynamic diagram 360, the weight of the descriptor information 370 may be determined, thereby determining the weighted descriptor information 380.
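Under the assumptions of the sketches above, the flow of fig. 3 can be summarized roughly as follows; `extract`, `embed`, `heatmap_of` and `match_to_map` are hypothetical stand-ins for the descriptor extraction module 320, the image retrieval module 330, the thermodynamic diagram extraction module 350 and the 2D-3D matching step, none of which are defined in code by the disclosure.

```python
# Rough end-to-end sketch of the fig. 3 flow, reusing the hypothetical helpers
# sketched earlier (top_n_associated, weight_descriptors, estimate_pose).
def localize(query_img, db_images, db_vecs, extract, embed, heatmap_of, match_to_map, K):
    kps, desc = extract(query_img)                            # descriptor information 370
    idx = top_n_associated(embed(query_img), db_vecs, 1)[0]   # associated image 340
    heat = heatmap_of(db_images[idx])                         # thermodynamic diagram 360
    # Simplification: the query key point positions are used directly; in practice
    # their corresponding positions in the associated image would be matched first.
    weighted, w = weight_descriptors(desc, kps, heat)         # weighted descriptor info 380
    pts_3d, pts_2d = match_to_map(weighted, kps)              # 2D-3D correspondences
    return estimate_pose(pts_3d, pts_2d, K)                   # 6DoF pose of the query image
```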
According to the embodiment of the disclosure, a plurality of key points a, b, c, d, etc. may be determined in the image 310 to be positioned, the descriptor information 370 may include descriptor information extracted one by one from the key points a, b, c, d, etc., and any one of the retrieved associated images 341, 342, 343 may include at least one of the key points. For example, the associated image 341 may include key points a, b, etc., the associated image 342 may include key point c, etc., and the associated image 343 may include key point d, etc.
According to the embodiment of the disclosure, for different key points, the same or different extraction modes can be adopted to extract the descriptor information, and the same or different types of descriptor information can be extracted according to different extraction modes. For example, for the key points a and b, a first type of descriptor information may be extracted and obtained by a first extraction method based on the descriptor extraction module 321. For the key points c and d, a second type of descriptor information can be extracted and obtained by a second extraction method based on the descriptor extraction module 322. Through the steps, the descriptor information with two different types respectively corresponding to the key points a, b, c and d can be extracted. The descriptor information is extracted through different extraction modes, and the information effectiveness of the descriptor information extracted aiming at different key points can be effectively improved.
According to the embodiment of the disclosure, for different types of descriptor information, the weights of those types can be determined more accurately by using thermodynamic diagrams adapted to each type. For example, for the associated image 341 including the key points a and b, the descriptor information corresponding to key points a and b is extracted by the first extraction method based on the descriptor extraction module 321, while for the associated image 342 including key point c and the associated image 343 including key point d, the descriptor information corresponding to key points c and d is extracted by the second extraction method based on the descriptor extraction module 322. Accordingly, different thermodynamic diagram extraction modules may be employed for determining the thermodynamic diagrams of the associated image 341 and of the associated images 342, 343. For example, the thermodynamic diagram extraction module 351 may extract features of the associated image 341 to obtain a thermodynamic diagram 361, and the thermodynamic diagram extraction module 352 may extract features of the associated images 342 and 343 to obtain thermodynamic diagrams 362 and 363. In this way, a thermodynamic diagram matched to each type of descriptor information can be obtained, and more accurate weighted descriptor information can be determined.
It should be noted that the descriptor extraction module 320 is not limited to modules 321 and 322, and the thermodynamic diagram extraction module 350 is likewise not limited to modules 351 and 352; both may be increased or reduced according to the actual scene.
By means of the above embodiments of the present disclosure, for the image retrieval and local descriptor extraction steps of a visual positioning scene in visual positioning technology based on computer vision, a scheme is provided that constrains the weights of descriptor information of the same or different types in the image to be positioned by obtaining the relevant thermodynamic diagram on the basis of image retrieval. By training thermodynamic diagrams for the retrieved images and using them to extract one type of local feature descriptor information or to fuse several types, image retrieval and local descriptor extraction are decoupled to a certain extent, and multiple image retrieval and local descriptor extraction schemes can easily be reused. In addition, this scheme can improve the precision of image retrieval and of local feature descriptor information extraction, and thus the precision of the visual positioning algorithm.
FIG. 4 schematically shows a block diagram of an image localization arrangement according to an embodiment of the present disclosure.
As shown in fig. 4, the image localization apparatus 400 includes a first determination module 410, a second determination module 420, and a third determination module 430.
A first determining module 410, configured to determine descriptor information corresponding to a keypoint in an image to be located.
The second determining module 420 is configured to determine the weight of the descriptor information according to the representation information of the key point included in the associated image related to the image to be located.
And a third determining module 430, configured to determine pose information associated with the image to be positioned according to the weighted descriptor information weighted by the weight, so as to position the image to be positioned.
According to an embodiment of the present disclosure, the characterization information includes heat information. The second determination module includes a first obtaining unit, a first determination unit, and a second determination unit.
And the first obtaining unit is used for extracting the characteristics of the associated image to obtain a thermodynamic diagram corresponding to the associated image.
And the first determining unit is used for determining the heat information of the key points according to the thermodynamic diagram.
And the second determining unit is used for determining the weight of the descriptor information corresponding to the key point according to the heat information.
According to the embodiment of the disclosure, M key points are included in the image to be positioned. The first determination module includes a third determination unit.
And a third determining unit, configured to determine N types of descriptor information respectively corresponding to the M key points. M is an integer greater than 1, N is an integer greater than or equal to 1, and M is greater than or equal to N.
According to an embodiment of the present disclosure, the descriptor information includes a plurality of types of descriptor information, and the representation information includes heat information. The second determination module includes a fourth determination unit, a second obtaining unit, a fifth determination unit, and a sixth determination unit.
And a fourth determining unit for determining a target key point corresponding to the target descriptor information of the target type.
And the second obtaining unit is used for extracting the characteristics of the target associated image including the target key points to obtain a target thermodynamic diagram corresponding to the target associated image.
And the fifth determining unit is used for determining the target heat information of the target key points according to the target thermodynamic diagram.
And a sixth determining unit for determining the weight of the target descriptor information according to the target heat information.
According to an embodiment of the present disclosure, the weighted descriptor information includes a plurality of pieces of weighted descriptor information. The third determining module comprises a seventh determining unit, a sorting unit, an obtaining unit and an eighth determining unit.
A seventh determining unit, configured to determine the weight value of each piece of weighted descriptor information.
And the sorting unit is used for sorting the plurality of pieces of weighted descriptor information according to the weight values to obtain a sorting result.
And the obtaining unit is used for obtaining a preset number of pieces of first target weighted descriptor information according to the sorting result.
And the eighth determining unit is used for determining the pose information related to the image to be positioned according to the first target weighted descriptor information.
According to an embodiment of the present disclosure, the weighted descriptor information includes a plurality of pieces of weighted descriptor information. The third determination module includes a ninth determination unit, a tenth determination unit, and an eleventh determination unit.
A ninth determining unit for determining the weight value of each piece of weighted descriptor information.
And the tenth determining unit is used for determining second target weighted descriptor information corresponding to weight values greater than or equal to the preset threshold.
And the eleventh determining unit is used for determining the pose information related to the image to be positioned according to the second target weighted descriptor information.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the image localization method as described above.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the image localization method as described above.
According to an embodiment of the present disclosure, a computer program product comprising a computer program which, when executed by a processor, implements the image localization method as described above.
FIG. 5 illustrates a schematic block diagram of an example electronic device 500 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the device 500 comprises a computing unit 501, which may perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the device 500 can also be stored. The computing unit 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The calculation unit 501 performs the respective methods and processes described above, such as the image positioning method. For example, in some embodiments, the image localization method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the image localization method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the image localization method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.
Claims (15)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111207034.4A CN113947635A (en) | 2021-10-15 | 2021-10-15 | Image positioning method, device, electronic device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113947635A true CN113947635A (en) | 2022-01-18 |
Family
ID=79331046
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111207034.4A (CN113947635A, pending) | Image positioning method, device, electronic device and storage medium | 2021-10-15 | 2021-10-15 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113947635A (en) |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200074672A1 (en) * | 2018-08-28 | 2020-03-05 | William Hoff | Determining a pose of an object from rgb-d images |
CN110020633A (en) * | 2019-04-12 | 2019-07-16 | 腾讯科技(深圳)有限公司 | Training method, image-recognizing method and the device of gesture recognition model |
CN110046600A (en) * | 2019-04-24 | 2019-07-23 | 北京京东尚科信息技术有限公司 | Method and apparatus for human testing |
CN112348886A (en) * | 2019-08-09 | 2021-02-09 | 华为技术有限公司 | Visual positioning method, terminal and server |
CN111223143A (en) * | 2019-12-31 | 2020-06-02 | 广州市百果园信息技术有限公司 | Key point detection method and device and computer readable storage medium |
CN111402228A (en) * | 2020-03-13 | 2020-07-10 | 腾讯科技(深圳)有限公司 | Image detection method, device and computer readable storage medium |
CN111291729A (en) * | 2020-03-26 | 2020-06-16 | 北京百度网讯科技有限公司 | Human body posture estimation method, device, equipment and storage medium |
CN111862199A (en) * | 2020-06-17 | 2020-10-30 | 北京百度网讯科技有限公司 | Positioning method, positioning device, electronic equipment and storage medium |
CN112330589A (en) * | 2020-09-18 | 2021-02-05 | 北京沃东天骏信息技术有限公司 | Method and device for estimating pose and computer readable storage medium |
CN112785582A (en) * | 2021-01-29 | 2021-05-11 | 北京百度网讯科技有限公司 | Training method and device for thermodynamic diagram generation model, electronic equipment and storage medium |
CN113112547A (en) * | 2021-04-23 | 2021-07-13 | 北京云迹科技有限公司 | Robot, repositioning method thereof, positioning device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| AD01 | Patent right deemed abandoned | Effective date of abandoning: 20250314 |