
CN114187344A - Map construction method, device and equipment - Google Patents

Map construction method, device and equipment

Info

Publication number
CN114187344A
CN114187344A (application number CN202111348552.8A)
Authority
CN
China
Prior art keywords
image
target
dimensional
sample
virtual camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111348552.8A
Other languages
Chinese (zh)
Other versions
CN114187344B (en)
Inventor
秦延文
李佳宁
毛慧
浦世亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202111348552.8A priority Critical patent/CN114187344B/en
Publication of CN114187344A publication Critical patent/CN114187344A/en
Application granted granted Critical
Publication of CN114187344B publication Critical patent/CN114187344B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C 21/38 Electronic maps specially adapted for navigation; Updating thereof
    • G01C 21/3804 Creation or updating of map data
    • G01C 21/3807 Creation or updating of map data characterised by the type of data
    • G01C 21/383 Indoor data
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract


Figure 202111348552

The present application provides a map construction method, apparatus, and device. The method includes: acquiring a panoramic image of a target scene and generating a first pinhole image corresponding to a first virtual camera based on the panoramic image; determining a rotation matrix between a target pose of a second virtual camera and an initial pose, and determining an extrinsic parameter matrix between the target pose and the initial pose based on the rotation matrix; determining a second pinhole image corresponding to the second virtual camera based on the rotation matrix, selecting two-dimensional feature points corresponding to an actual position in the target scene from the first pinhole image and the second pinhole image, and determining the three-dimensional map point corresponding to that actual position based on the two-dimensional feature points and the extrinsic parameter matrix; and constructing a three-dimensional visual map of the target scene based on a plurality of such three-dimensional map points. With this technical solution, a terminal device in the target scene can be globally and accurately positioned based on the three-dimensional visual map.


Description

Map construction method, device and equipment
Technical Field
The present application relates to the field of computer vision, and in particular, to a map construction method, apparatus, and device.
Background
GPS (Global Positioning System) is a high-precision radio navigation and positioning system based on artificial earth satellites that can provide accurate geographic position, velocity, and time information anywhere on or near the Earth. The Beidou satellite navigation system consists of a space segment, a ground segment, and a user segment; it provides high-precision, high-reliability positioning, navigation, and timing services to users worldwide around the clock, as well as regional navigation, positioning, and timing capabilities.
Because terminal devices are equipped with GPS or Beidou receivers, either system can be used to position a terminal device when positioning is needed. In outdoor environments, GPS or Beidou signals are usually strong, so the terminal device can be positioned accurately. In indoor environments, however, the signals are poor, and neither system can position the terminal device accurately. For example, in energy industries such as coal, electric power, and petrochemicals, positioning needs are growing and typically arise indoors, where problems such as signal shielding prevent accurate positioning of terminal devices.
Disclosure of Invention
The application provides a map construction method, which comprises the following steps:
acquiring a panoramic image of a target scene, and generating a first pinhole image corresponding to a first virtual camera based on the panoramic image; the position of the first virtual camera is the sphere center position of a visual spherical coordinate system, and the initial posture of the first virtual camera is any posture taking the sphere center position as the center;
determining a rotation matrix between a target posture of a second virtual camera and the initial posture, and determining an extrinsic parameter matrix between the target posture and the initial posture based on the rotation matrix; the position of the second virtual camera is the sphere center position of the visual spherical coordinate system, and the target posture is obtained by rotating the initial posture around a coordinate axis of the visual spherical coordinate system;
determining a second pinhole image corresponding to a second virtual camera based on the rotation matrix, selecting two-dimensional feature points corresponding to the actual position of the target scene from the first pinhole image and the second pinhole image, and determining three-dimensional map points corresponding to the actual position based on the two-dimensional feature points and the external reference matrix;
a three-dimensional visual map of a target scene is constructed based on a plurality of three-dimensional map points of the target scene.
The present application provides a map construction apparatus, the apparatus including:
the acquisition module is used for acquiring a panoramic image of a target scene;
the generating module is used for generating a first pinhole image corresponding to the first virtual camera based on the panoramic image; the position of the first virtual camera is the sphere center position of a visual spherical coordinate system, and the initial posture of the first virtual camera is any posture taking the sphere center position as the center;
a determination module to determine a rotation matrix between a target posture of a second virtual camera and the initial posture and to determine an extrinsic parameter matrix between the target posture and the initial posture based on the rotation matrix; the position of the second virtual camera is the sphere center position of the visual spherical coordinate system, and the target posture is obtained by rotating the initial posture around a coordinate axis of the visual spherical coordinate system;
the generating module is further configured to determine a second pinhole image corresponding to the second virtual camera based on the rotation matrix; the determining module is further configured to select a two-dimensional feature point corresponding to an actual position of the target scene from the first pinhole image and the second pinhole image, and determine a three-dimensional map point corresponding to the actual position based on the two-dimensional feature point and the external parameter matrix; and constructing a three-dimensional visual map of the target scene based on the plurality of three-dimensional map points of the target scene.
The present application provides a map building apparatus, including: a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor; the processor is used for executing machine executable instructions to realize the map construction method of the embodiment of the application.
According to the above technical solution, a three-dimensional visual map of the target scene can be constructed, and a terminal device in the target scene can be globally and accurately positioned based on that map. The target scene may be an indoor environment, so a vision-based indoor positioning function is realized. The method can be applied in energy industries such as coal, electric power, and petrochemicals to position personnel (e.g., workers and inspection staff) indoors, acquire personnel location information quickly, safeguard personnel safety, and manage personnel efficiently. Because the map is built from panoramic images of the target scene, which have a large field of view, repeated data acquisition of the target scene is avoided and acquisition efficiency is improved. When determining the three-dimensional map points, pinhole images are obtained by virtual-camera projection, pose constraints of the virtual cameras are added, and a virtual-camera bundle optimization strategy is used to determine the three-dimensional map points, improving the robustness, efficiency, and precision of map construction.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating a mapping method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a panoramic image based scene reconstruction and localization approach of the present application;
FIG. 3 is a schematic diagram of unfolding a panoramic image into an aperture image in one embodiment of the present application;
FIG. 4A is a schematic illustration of latitude and longitude coordinates of a spherical image and rectangular coordinates of a panoramic image;
FIG. 4B is a schematic diagram of rectangular coordinates of the first pinhole image and longitude and latitude coordinates of the spherical-view image;
FIG. 4C is a schematic diagram between the coordinates of the pinhole image and the coordinates of the viewing sphere;
fig. 5 is a schematic structural diagram of a map building apparatus according to an embodiment of the present application.
Detailed Description
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein is meant to encompass any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present application to describe various information, the information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. Depending on the context, moreover, the word "if" as used may be interpreted as "at … …" or "when … …" or "in response to a determination".
The embodiment of the application provides a map construction method, which is used for constructing a three-dimensional visual map of a target scene, and then using the three-dimensional visual map to perform global positioning on terminal equipment of the target scene, namely, using the three-dimensional visual map to perform global positioning on the terminal equipment in the moving process of the target scene. Referring to fig. 1, a schematic flow chart of a map construction method is shown, where the method may include:
step 101, acquiring a panoramic image of a target scene, and generating a first pinhole image corresponding to a first virtual camera based on the panoramic image. Illustratively, the position of the first virtual camera is a center position of a sphere from a spherical coordinate system, and the initial pose of the first virtual camera is any pose centered on the center position.
For example, generating a first pinhole image corresponding to the first virtual camera based on the panoramic image may include, but is not limited to: and generating a spherical viewing surface image corresponding to the spherical viewing surface coordinate system based on the panoramic image, and generating a first small hole image corresponding to the first virtual camera based on the spherical viewing surface image.
Generating the spherical view image corresponding to the spherical view coordinate system based on the panoramic image may include, but is not limited to: determining a mapping relationship between longitude/latitude coordinates in the spherical view image and rectangular coordinates in the panoramic image based on the width and height of the panoramic image; for each longitude/latitude coordinate in the spherical view image, determining the corresponding rectangular coordinate from the panoramic image based on that mapping relationship, and taking the pixel value at that rectangular coordinate as the pixel value of the longitude/latitude coordinate. On this basis, the spherical view image can be generated from the pixel values of all its longitude/latitude coordinates.
Generating the first pinhole image corresponding to the first virtual camera based on the spherical view image may include, but is not limited to: determining the center point coordinate of the first pinhole image based on its width and height, and determining a mapping relationship between rectangular coordinates in the first pinhole image and longitude/latitude coordinates in the spherical view image based on the center point coordinate and a target distance, where the target distance is the distance between the center point of the first pinhole image and the sphere center of the spherical view coordinate system. For each rectangular coordinate in the first pinhole image, the corresponding longitude/latitude coordinate is determined from the spherical view image based on the mapping relationship, and the pixel value of the rectangular coordinate is taken from the pixel value of that longitude/latitude coordinate. On this basis, the first pinhole image may be generated from the pixel values of all rectangular coordinates in the first pinhole image.
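For illustration, the pinhole-image generation described above can be sketched as follows (Python with NumPy; the function name, the nearest-neighbour sampling, and the equirectangular conventions are assumptions for this sketch, not details fixed by the patent):

```python
import numpy as np

def pinhole_from_sphere(sphere_img, out_w, out_h, f, R=None):
    """Sample a pinhole image from an equirectangular 'view sphere' image.

    sphere_img: H_s x W_s x 3 equirectangular image (longitude along width).
    f: target distance from the pinhole image plane to the sphere centre.
    R: optional 3x3 rotation applied to the viewing rays (virtual-camera pose).
    """
    hs, ws = sphere_img.shape[:2]
    cx, cy = out_w / 2.0, out_h / 2.0            # centre point of the pinhole image
    u, v = np.meshgrid(np.arange(out_w), np.arange(out_h))
    # a ray through each pixel of the pinhole image plane, then normalised
    rays = np.stack([u - cx, v - cy, np.full_like(u, f, dtype=float)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    if R is not None:
        rays = rays @ R.T                        # re-orient for a rotated virtual camera
    lon = np.arctan2(rays[..., 0], rays[..., 2]) + np.pi   # longitude in [0, 2*pi)
    lat = np.arcsin(np.clip(rays[..., 1], -1.0, 1.0))      # latitude in [-pi/2, pi/2]
    # nearest-neighbour lookup into the equirectangular image
    us = np.clip((lon / (2 * np.pi) * ws).astype(int), 0, ws - 1)
    vs = np.clip(((lat + np.pi / 2) / np.pi * hs).astype(int), 0, hs - 1)
    return sphere_img[vs, us]
```

Passing a rotation matrix R re-orients the viewing rays, which is how a second virtual camera's pinhole image can be sampled from the same spherical view image.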
Step 102, determining a rotation matrix between a target pose of a second virtual camera and the initial pose of the first virtual camera, and determining an extrinsic parameter matrix between the target pose and the initial pose based on the rotation matrix. For example, the position of the second virtual camera may be the sphere center position of the visual spherical coordinate system, and the target pose is obtained by rotating the initial pose about a coordinate axis of the visual spherical coordinate system.
For example, determining a rotation matrix between the target pose of the second virtual camera and the initial pose of the first virtual camera may include, but is not limited to: determining a first rotation angle between the target posture and the initial posture in the first coordinate axis direction, and determining a first sub-rotation matrix in the first coordinate axis direction based on the first rotation angle; determining a second rotation angle between the target posture and the initial posture in the second coordinate axis direction, and determining a second sub-rotation matrix in the second coordinate axis direction based on the second rotation angle; and determining a third rotation angle between the target posture and the initial posture in the third coordinate axis direction, and determining a third sub-rotation matrix in the third coordinate axis direction based on the third rotation angle. On this basis, a rotation matrix between the target pose and the initial pose is determined based on the first sub-rotation matrix, the second sub-rotation matrix, and the third sub-rotation matrix.
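The per-axis sub-rotation matrices and their composition can be sketched as follows (Python with NumPy; the composition order Rz·Ry·Rx is an assumption, since the patent does not fix the order):

```python
import numpy as np

def axis_rotations(alpha, beta, gamma):
    """Sub-rotation matrices for angles alpha, beta, gamma about x, y and z."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return Rx, Ry, Rz

def rotation_between(alpha, beta, gamma):
    """Rotation from the initial pose to the target pose, composed from the
    three sub-rotation matrices (order Rz @ Ry @ Rx assumed)."""
    Rx, Ry, Rz = axis_rotations(alpha, beta, gamma)
    return Rz @ Ry @ Rx
```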
In one possible embodiment, determining the external reference matrix between the target pose and the initial pose based on the rotation matrix may include, but is not limited to: a translation matrix between the first virtual camera and the second virtual camera is determined, and the external parameter matrix is determined based on the rotation matrix and the translation matrix.
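In the setup above the two virtual cameras share the sphere center, so the translation between them is zero and the extrinsic parameter matrix reduces to the rotation. A minimal sketch (Python with NumPy; the 4x4 homogeneous form is a common convention, not one mandated by the patent):

```python
import numpy as np

def extrinsic_matrix(R, t):
    """4x4 extrinsic parameter matrix [R | t; 0 0 0 1] between two virtual
    cameras. For virtual cameras sharing the sphere centre, t is zero."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T
```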
Step 103, determining a second pinhole image corresponding to the second virtual camera based on the rotation matrix, selecting two-dimensional feature points corresponding to an actual position in the target scene from the first pinhole image and the second pinhole image, and determining the three-dimensional map point corresponding to the actual position based on the two-dimensional feature points and the extrinsic parameter matrix.
In a possible implementation, determining the three-dimensional map point corresponding to the actual position based on the two-dimensional feature points and the extrinsic parameter matrix may include, but is not limited to: determining a target loss value of a configured loss function; determining a projection function value between the virtual camera coordinate system and the pinhole image coordinate system based on the target loss value and the two-dimensional feature points corresponding to the actual position; and determining the three-dimensional map point corresponding to the actual position based on the extrinsic parameter matrix and the projection function value, that is, obtaining a three-dimensional map point of the three-dimensional visual map.
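The patent does not spell out the optimization, but the geometric core of recovering a three-dimensional map point from its two-dimensional feature points in two pinhole images is triangulation. A minimal linear (DLT) two-view triangulation sketch (Python with NumPy; the function name and the 3x4 projection-matrix inputs are assumptions for this sketch):

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3-D point from two 3x4 projection
    matrices and the matched 2-D feature point in each pinhole image."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],   # each match contributes two linear
        x1[1] * P1[2] - P1[1],   # constraints on the homogeneous point
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                   # null-space vector = homogeneous 3-D point
    return X[:3] / X[3]
```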
Step 104, constructing a three-dimensional visual map of the target scene based on the plurality of three-dimensional map points of the target scene.
By way of example, the three-dimensional visual map may include, but is not limited to: a sample global descriptor corresponding to each sample image, the three-dimensional map points corresponding to the sample image, and a sample local descriptor corresponding to each three-dimensional map point, where the sample image is a pinhole image selected from the first pinhole image and the second pinhole image.
In a possible implementation manner, after step 104, in the global positioning process of the terminal device, a target image of the terminal device in a target scene is obtained; selecting candidate sample images from the multi-frame sample images based on the similarity between the target image and the multi-frame sample images corresponding to the three-dimensional visual map; acquiring a plurality of feature points from a target image; aiming at each feature point, determining a target three-dimensional map point corresponding to the feature point from the three-dimensional map points corresponding to the candidate sample image; and determining a global positioning pose in the three-dimensional visual map corresponding to the target image based on the plurality of feature points and the target three-dimensional map points corresponding to the plurality of feature points.
Selecting the candidate sample image from the multi-frame sample images based on the similarity between the target image and the sample images corresponding to the three-dimensional visual map may include, but is not limited to: determining the global descriptor to be detected corresponding to the target image, and determining the distance between this global descriptor and the sample global descriptor of each frame of sample image corresponding to the three-dimensional visual map; then selecting the candidate sample image from the multi-frame sample images based on those distances, where either the distance between the global descriptor to be detected and the sample global descriptor of the candidate sample image is the minimum distance, or that distance is smaller than a distance threshold.
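The candidate-selection rule above (minimum distance, or all distances under a threshold) can be sketched as follows (Python with NumPy; the names and the L2 metric are illustrative assumptions):

```python
import numpy as np

def select_candidates(query_desc, sample_descs, dist_thresh=None):
    """Pick candidate sample images by L2 distance between the query image's
    global descriptor and each sample image's global descriptor.
    Returns the minimum-distance index, or every index under dist_thresh."""
    d = np.linalg.norm(sample_descs - query_desc, axis=1)
    if dist_thresh is None:
        return [int(np.argmin(d))]
    return [int(i) for i in np.where(d < dist_thresh)[0]]
```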
The determining of the target three-dimensional map point corresponding to the feature point from the three-dimensional map points corresponding to the candidate sample image may include, but is not limited to: determining a local descriptor to be detected corresponding to the feature point, wherein the local descriptor to be detected can be used for representing a feature vector of an image block where the feature point is located, and the image block can be located in the target image; determining the distance between the local descriptor to be tested and the sample local descriptor corresponding to each three-dimensional map point corresponding to the candidate sample image; on the basis, selecting target three-dimensional map points from the three-dimensional map points corresponding to the candidate sample images based on the distance between the local descriptor to be detected and each sample local descriptor; the distance between the local descriptor to be measured and the sample local descriptor corresponding to the target three-dimensional map point may be a minimum distance, and the minimum distance is smaller than a distance threshold.
The determining of the global descriptor to be detected corresponding to the target image may include, but is not limited to: determining a bag-of-words vector corresponding to the target image based on the trained dictionary model, and determining the bag-of-words vector as a global descriptor to be tested; or inputting the target image to the trained deep learning model to obtain a target vector corresponding to the target image, and determining the target vector as a global descriptor to be detected. Of course, the above are only two examples of determining the global descriptor to be tested, and the determination method of the global descriptor to be tested is not limited.
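The final positioning step, computing a global pose from the feature points and their matched three-dimensional map points, is a perspective-n-point (PnP) problem. As a hedged stand-in for a production PnP solver (which would add RANSAC and non-linear refinement), a linear DLT estimate of the 3x4 projection matrix can be sketched (Python with NumPy; all names are illustrative):

```python
import numpy as np

def pose_from_correspondences(pts3d, pts2d):
    """Estimate a 3x4 projection matrix from n >= 6 2-D/3-D matches via DLT.
    A linear sketch of the PnP step, not a full solver."""
    A = []
    for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
        # each correspondence yields two linear equations in the 12 entries of P
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    P = vt[-1].reshape(3, 4)
    return P / P[2, 3]   # fix the overall scale (assumes P[2, 3] != 0)
```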
According to the above technical solution, a three-dimensional visual map of the target scene can be constructed, and a terminal device in the target scene can be globally and accurately positioned based on that map. The target scene may be an indoor environment, so a vision-based indoor positioning function is realized. The method can be applied in energy industries such as coal, electric power, and petrochemicals to position personnel (e.g., workers and inspection staff) indoors, acquire personnel location information quickly, safeguard personnel safety, and manage personnel efficiently. Because the map is built from panoramic images of the target scene, which have a large field of view, repeated data acquisition of the target scene is avoided and acquisition efficiency is improved. When determining the three-dimensional map points, pinhole images are obtained by virtual-camera projection, pose constraints of the virtual cameras are added, and a virtual-camera bundle optimization strategy is used to determine the three-dimensional map points, improving the robustness, efficiency, and precision of map construction.
The map construction method according to the embodiment of the present application will be described below with reference to specific embodiments.
Data sources for implementing a positioning function include GPS, lidar, millimeter-wave radar, and vision sensors (e.g., cameras). GPS is easily affected by satellite conditions, weather, and data transmission conditions, and cannot be used in indoor environments. Lidar and millimeter-wave radar offer low computational cost, depth information, and immunity to illumination, but their data are sparse and the sensors are expensive, which rules out wide deployment. By comparison, although the visual information provided by a vision sensor is affected by illumination and weather, the sensor is low-cost, small, easy to install, and information-rich, and therefore has great application prospects in map-based positioning.
In order to realize a positioning function by adopting a visual sensor, a high-precision three-dimensional visual map is generally required to be constructed, the three-dimensional visual map is used for sensing environment priori knowledge, global positioning can be carried out on the basis of the three-dimensional visual map, and a global positioning pose of the terminal equipment in the three-dimensional visual map is obtained, so that the positioning function is realized.
When constructing a three-dimensional visual map, a monocular camera is usually used to capture images of the target scene, and three-dimensional reconstruction is then performed with an SFM (Structure From Motion) algorithm to obtain the map. However, the monocular camera's narrow field of view leads to insufficient scene coverage, so images of the target scene must be captured repeatedly for reconstruction, and the mapping efficiency is low.
To address these problems, an embodiment of the present application provides a scene reconstruction and positioning method based on panoramic images: a panoramic camera collects panoramic images, and three-dimensional reconstruction is performed based on them to obtain a three-dimensional visual map, where three-dimensional reconstruction refers to building a three-dimensional point cloud of the physical scene through the SFM algorithm. Because the panoramic camera has a very large field of view, repeated image acquisition of the target scene is avoided and data acquisition efficiency is improved; the problems caused by the monocular camera's small field of view are overcome, the quality of the three-dimensional visual map is markedly better than a monocular mapping result, mapping is faster, and mapping precision and robustness are improved.
Referring to fig. 2, the panoramic-image-based scene reconstruction and positioning method may include an offline mapping process and an online positioning process. In the offline mapping process, a panoramic image of the target scene (i.e., the scene for which a three-dimensional visual map is needed) is collected, the panoramic image is expanded into pinhole images, multi-camera-bound SFM reconstruction is performed based on the pinhole images and the rigid-connection constraint (i.e., the extrinsic parameter matrix) to obtain the three-dimensional visual map, and the map information corresponding to the three-dimensional visual map is stored. In the online positioning process, a target image of the target scene is collected, pose calculation is performed on the target image based on the three-dimensional visual map of the target scene, and the global positioning pose in the three-dimensional visual map corresponding to the target image is obtained, completing the positioning process.
Firstly, a panoramic image of a target scene can be collected, and the panoramic image is expanded into at least two pinhole images, as shown in fig. 3, which is a schematic diagram of expanding the panoramic image into the pinhole images, the process includes:
step 301, acquiring a panoramic image of a target scene. For example, a panoramic video of a target scene may be acquired, and the panoramic video may be converted into a multi-frame panoramic image, which is not limited to this process.
Step 302, generating a spherical view image corresponding to the spherical view coordinate system based on the panoramic image.
For example, assuming that the mapping relationship between the spherical-view image and the panoramic image is a longitude/latitude mapping, the relationship between the longitude/latitude coordinates (λ, φ) of the spherical-view image and the rectangular coordinates (x, y) of the panoramic image is shown in equation (1):

λ = 2π·x/W, φ = π/2 − π·y/H (1)

In equation (1), λ and φ are respectively the longitude and latitude coordinates of the spherical-view image (i.e., the image in the spherical-view coordinate system), x and y are respectively the horizontal and vertical coordinates of the panoramic image, and W and H are the width and height of the panoramic image.
Referring to fig. 4A, which shows the relationship between the longitude/latitude coordinates of the spherical-view image and the rectangular coordinates of the panoramic image, the left side is a schematic diagram of the spherical-view image and the right side is a schematic diagram of the panoramic image. In the spherical-view image, the longitude coordinate has range λ ∈ [0, 2π] and the latitude coordinate has range φ ∈ [−π/2, π/2]. In the panoramic image, the aspect ratio is W : H = 2 : 1, and the center of the panoramic image is the origin (0, 0).
Continuing with fig. 4A, the mapping relationship between the longitude-latitude coordinates (λ, φ) in the spherical view image and the rectangular coordinates (u_p, v_p) in the panoramic image can be seen in formula (2):

λ = 2π·(u_p/W + 1/2), φ = π·v_p/H   formula (2)

As can be seen from formula (2), based on the width W and the height H of the panoramic image, the mapping relationship between the longitude-latitude coordinates in the spherical view image and the rectangular coordinates in the panoramic image can be determined. Obviously, for each longitude-latitude coordinate (λ, φ) in the spherical view image, the corresponding rectangular coordinate (u_p, v_p) can be determined from the panoramic image based on the mapping relationship, and the pixel value of (λ, φ) can be determined based on the pixel value of (u_p, v_p), i.e., the pixel value at (u_p, v_p) is taken as the pixel value at (λ, φ). On this basis, the pixel values of all longitude-latitude coordinates are combined into the spherical view image, so that the spherical view image is obtained.
In summary, a spherical view image in the spherical view coordinate system can be obtained. The spherical view coordinate system may be a spherical coordinate system, which is not limited herein, and the relationship between the spherical view image and the panoramic image is shown in fig. 4A.
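As a minimal sketch of this lookup (assuming the longitude-latitude mapping of formula (2), with the panorama center as the origin and λ ∈ [0, 2π], φ ∈ [−π/2, π/2]):

```python
import numpy as np

def latlon_to_panorama(lam, phi, W, H):
    # Inverse of formula (2): a view-sphere coordinate (lambda, phi) maps to
    # the panoramic rectangular coordinate (u_p, v_p), origin at the center.
    u_p = W * (lam / (2.0 * np.pi) - 0.5)
    v_p = H * phi / np.pi
    return u_p, v_p
```

Sampling the panorama at (u_p, v_p) for every (λ, φ) then assembles the spherical view image, as described above.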
And 303, generating a first pinhole image corresponding to the first virtual camera based on the view spherical surface image. The spherical viewing surface image is an image in a spherical viewing surface coordinate system, the position of the first virtual camera is a sphere center position in the spherical viewing surface coordinate system, and the initial posture of the first virtual camera is any posture with the sphere center position as a center.
For example, the view spherical image may be expanded into pinhole images from multiple viewpoints, each pinhole image corresponds to a virtual camera, the virtual camera does not exist in a real scene, and is a camera virtualized at the center of sphere position of the view spherical coordinate system, that is, the position of the virtual camera coincides with the center of sphere position of the view spherical coordinate system.
The view spherical coordinate system may be a three-dimensional coordinate system, i.e., having an X axis, a Y axis and a Z axis. For each virtual camera, the pose of the virtual camera may be any pose centered on the sphere center position. For example, the virtual camera corresponds to three directions: in one pose, the first direction coincides with the X axis of the view spherical coordinate system (i.e., rotated 0 degrees around the X axis), the second direction coincides with the Y axis (i.e., rotated 0 degrees around the Y axis), and the third direction coincides with the Z axis (i.e., rotated 0 degrees around the Z axis). For another example, the first direction is rotated 60 degrees around the X axis, the second direction coincides with the Y axis, and the third direction coincides with the Z axis. For yet another example, the first direction is rotated 120 degrees around the X axis, the second direction is rotated 60 degrees around the Y axis, and the third direction coincides with the Z axis. In summary, the pose of the virtual camera can be obtained by rotating around the coordinate axes (such as the X axis, the Y axis and the Z axis) of the view spherical coordinate system, and the rotation angles are not limited.
For example, the virtual camera in any posture may be referred to as a first virtual camera, and the posture of the first virtual camera may be referred to as an initial posture, and it is obvious that the position of the first virtual camera is a sphere center position of a view spherical coordinate system, and the initial posture of the first virtual camera is any posture centered on the sphere center position.
In this embodiment, it is taken as an example that the first direction of the initial posture coincides with the X axis of the spherical viewing surface coordinate system, the second direction of the initial posture coincides with the Y axis of the spherical viewing surface coordinate system, and the third direction of the initial posture coincides with the Z axis of the spherical viewing surface coordinate system, and the above is only an example of the initial posture, and is not limited thereto.
Referring to fig. 4B, a schematic diagram of a relationship between a rectangular coordinate of the first pinhole image (i.e., the pinhole image corresponding to the first virtual camera) and a longitude and latitude coordinate of the spherical-surface-viewing image is shown, where the left side is the schematic diagram of the spherical-surface-viewing image, the right side is the schematic diagram of the first pinhole image, and the spherical-surface-viewing image may be an image of a unit sphere.
Referring to fig. 4B, a pinhole camera, i.e., the first virtual camera, is virtualized at the sphere center of the view spherical coordinate system, and the image corresponding to the first virtual camera is the first pinhole image in the view plane. Assuming that the width of the first pinhole image is w and its height is h, the coordinate of the center point of the first pinhole image is (u0, v0) = (w/2, h/2). Suppose the rectangular coordinate of a point Q on the first pinhole image is (uq, vq), the line connecting the point Q and the sphere center position O of the view spherical coordinate system intersects the view sphere at a point Qs, and the distance between the sphere center position O and the center point of the first pinhole image is d. Based on this, the rectangular coordinate of the point Q on the first pinhole image is converted into a three-dimensional rectangular coordinate with the unit sphere center as the origin, see formula (3):

Q = (uq − u0, vq − v0, d)   formula (3)
as can be seen from fig. 4B, there is a ratio between the three-dimensional coordinates of the point Qs and the point Q, the ratio is based on the similarity principle of triangles, and the relationship of the ratio can be seen in formula (4):
Figure BDA0003355107050000102
Since OQs = 1 and OQ = √((uq − u0)² + (vq − v0)² + d²), the three-dimensional coordinate of the point Qs can be seen in formula (5):

Qs = (uq − u0, vq − v0, d) / √((uq − u0)² + (vq − v0)² + d²)   formula (5)
in practical applications, the point coordinates of the spherical surface can also be expressed by latitude and longitude, as shown in formula (6):
Figure BDA0003355107050000112
by combining the formula (3), the formula (5) and the formula (6), the relationship between the rectangular coordinates of the first small hole image and the longitude and latitude coordinates of the spherical viewing surface image can be obtained, which is shown in the formula (7) and is an example of the relationship.
Figure BDA0003355107050000113
In the formula (7), λ and
Figure BDA0003355107050000114
respectively representing the longitude and latitude coordinates, u, of the spherical-view image0=w/2,v0W denotes the width of the first pinhole image, h denotes the height of the first pinhole image, and w and h are both known values. d represents the distance between the sphere center position O of the spherical coordinate system and the center point of the first small hole image, is a known value, can be configured according to experience, and can also be calculated by adopting a certain algorithm to obtain the value of d, which is not limited. Based on the above formula (7), the rectangular coordinates on the first pinhole image and the longitude and latitude coordinates of the spherical-view image can be projected, i.e. the spherical-view image is converted into the first pinhole image.
In summary, based on the width w and the height h of the first pinhole image, the center point coordinate (u0, v0) of the first pinhole image can be determined; based on the center point coordinate and the target distance d, the mapping relationship between the rectangular coordinates (u, v) in the first pinhole image and the longitude-latitude coordinates (λ, φ) in the spherical view image can be determined, as shown in formula (7). Obviously, for each rectangular coordinate in the first pinhole image, the corresponding longitude-latitude coordinate can be determined from the spherical view image based on the mapping relationship, and the pixel value of the rectangular coordinate can be determined based on the pixel value of the longitude-latitude coordinate, i.e., the pixel value of the longitude-latitude coordinate is taken as the pixel value of the rectangular coordinate. On this basis, the pixel values of all rectangular coordinates are combined into the first pinhole image, so that the first pinhole image is obtained.
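The chain of formulas (3), (5) and (6) can be sketched as follows (a hedged example assuming the reconstructed spherical convention Qs = (cos φ cos λ, cos φ sin λ, sin φ); the function name is illustrative):

```python
import numpy as np

def pinhole_to_latlon(u, v, w, h, d):
    """Map a pixel (u, v) of the first pinhole image to view-sphere
    longitude/latitude (lambda, phi) by chaining formulas (3), (5), (6)."""
    u0, v0 = w / 2.0, h / 2.0
    X, Y, Z = u - u0, v - v0, float(d)        # formula (3): ray through the pixel
    r = np.sqrt(X * X + Y * Y + Z * Z)        # |OQ|
    Xs, Ys, Zs = X / r, Y / r, Z / r          # formula (5): unit-sphere point Qs
    phi = np.arcsin(Zs)                       # formula (6): Zs = sin(phi)
    lam = np.arctan2(Ys, Xs) % (2.0 * np.pi)  # Xs = cos(phi)cos(lam), Ys = cos(phi)sin(lam)
    return lam, phi
```

Looking up the spherical view image (or, via formula (2), the panorama) at each returned (λ, φ) fills in the pinhole image pixel by pixel.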
In a possible implementation manner, after the panoramic image of the target scene is obtained, the panoramic image may also be directly converted into the first pinhole image corresponding to the first virtual camera, instead of converting the panoramic image into the spherical view image corresponding to the spherical view coordinate system, and the first pinhole image is generated based on the spherical view image. For example, the relationship between the rectangular coordinate of the first pinhole image and the rectangular coordinate of the panoramic image can be obtained, as shown in formula (8), the derivation process of formula (8) is shown in formula (2), formula (3), formula (5) and formula (6), and the meaning of each letter in formula (8) is shown in the above formula, and is not described herein again.
u_p = W·(λ/(2π) − 1/2), v_p = H·φ/π, where λ and φ are computed from (u, v) as in formula (7)   formula (8)

As can be seen from formula (8), for each rectangular coordinate (u, v) in the first pinhole image, the corresponding rectangular coordinate (u_p, v_p) can be determined from the panoramic image based on the mapping relationship, and the pixel value of (u, v) can be determined based on the pixel value of (u_p, v_p), i.e., the pixel value at (u_p, v_p) is taken as the pixel value at (u, v). On this basis, the pixel values of all rectangular coordinates in the first pinhole image are combined into the first pinhole image, so that the first pinhole image is obtained.
Step 304, determining a rotation matrix between a target pose of the second virtual camera and an initial pose of the first virtual camera, for example, the position of the second virtual camera may be a center of sphere position of the spherical coordinate system, and the target pose is obtained by rotating the initial pose around coordinate axes of the spherical coordinate system.
For example, the view spherical image may be expanded into pinhole images at a plurality of viewpoints, each pinhole image corresponds to a virtual camera, and the virtual camera is a camera which is virtual at the spherical center position of the view spherical coordinate system, that is, the position of the virtual camera coincides with the spherical center position of the view spherical coordinate system. The pose of the virtual camera can be obtained by rotating around the coordinate axes (such as the X axis, the Y axis and the Z axis) of the spherical coordinate system, and the rotating angle is not limited.
On the basis of knowing the initial posture of the first virtual camera, the target posture of the second virtual camera is obtained by rotating the initial posture around the coordinate axis of the visual spherical coordinate system, for example, the first direction of the target posture is obtained by rotating the initial posture by 60 degrees around the X axis, the second direction of the target posture is obtained by rotating the initial posture by 60 degrees around the Y axis, and the third direction of the target posture is obtained by rotating the initial posture by 0 degree around the Z axis. Assuming that the first direction of the initial pose coincides with the X-axis, the second direction of the initial pose coincides with the Y-axis, and the third direction of the initial pose coincides with the Z-axis, then the first direction of the target pose is rotated 60 degrees about the X-axis, the second direction of the target pose is rotated 60 degrees about the Y-axis, and the third direction of the target pose coincides with the Z-axis. For another example, the first direction of the target pose is obtained by rotating the initial pose by 120 degrees around the X-axis, the second direction of the target pose is obtained by rotating the initial pose by 90 degrees around the Y-axis, the third direction of the target pose is obtained by rotating the initial pose by 0 degrees around the Z-axis, and so on, and the rotation relationship between the target pose and the initial pose is not limited.
In summary, based on the target pose of the second virtual camera and the initial pose of the first virtual camera, a first rotation angle between the target pose and the initial pose in the first coordinate axis direction (i.e., the rotation angle around the X axis relative to the initial pose) can be determined, denoted A_X; a second rotation angle in the second coordinate axis direction (i.e., the rotation angle around the Y axis relative to the initial pose) can be determined, denoted A_Y; and a third rotation angle in the third coordinate axis direction (i.e., the rotation angle around the Z axis relative to the initial pose) can be determined, denoted A_Z.

Based on the first rotation angle A_X, the second rotation angle A_Y and the third rotation angle A_Z, the rotation matrix between the target pose of the second virtual camera and the initial pose of the first virtual camera can be determined. For example, a first sub-rotation matrix R_x in the first coordinate axis direction is determined based on A_X, a second sub-rotation matrix R_y in the second coordinate axis direction is determined based on A_Y, and a third sub-rotation matrix R_z in the third coordinate axis direction is determined based on A_Z. Then, based on the first sub-rotation matrix R_x, the second sub-rotation matrix R_y and the third sub-rotation matrix R_z, the rotation matrix between the target pose and the initial pose can be determined.
Rotation around the Z axis direction does not conform to the shooting habit, i.e., the rotation angle around the Z axis relative to the initial pose is usually 0 degrees, so the third sub-rotation matrix R_z is usually the identity matrix. R_z is therefore not limited herein, and the following takes determining the rotation matrix based on the first sub-rotation matrix R_x and the second sub-rotation matrix R_y as an example.
The first sub-rotation matrix R_x is determined based on the first rotation angle A_X, see formula (9); the second sub-rotation matrix R_y is determined based on the second rotation angle A_Y, see formula (10); and the rotation matrix R is obtained based on R_x and R_y, see formula (11):

R_x = [[1, 0, 0], [0, cos A_X, −sin A_X], [0, sin A_X, cos A_X]]   formula (9)

R_y = [[cos A_Y, 0, sin A_Y], [0, 1, 0], [−sin A_Y, 0, cos A_Y]]   formula (10)

R = R_y · R_x   formula (11)
In summary, from the first rotation angle A_X and the second rotation angle A_Y between the target pose of the second virtual camera and the initial pose of the first virtual camera, the first sub-rotation matrix R_x and the second sub-rotation matrix R_y can be determined, and then the rotation matrix R between the target pose and the initial pose is obtained.
For example, the number of second virtual cameras may be at least one. For each second virtual camera, the target pose corresponding to that second virtual camera is known, and a rotation matrix R between the target pose of the second virtual camera and the initial pose of the first virtual camera is obtained. For example, referring to fig. 4C, a schematic diagram of the relationship between the coordinates of the pinhole images and the coordinates of the view sphere, viewpoints of pinhole cameras are virtualized at equal angular intervals around the Y axis at the sphere center of the view spherical coordinate system; the second rotation angle A_Y takes values in {0, π/3, 2π/3, π, 4π/3, 5π/3}, and the first rotation angle A_X is 0. On this basis, 6 virtual cameras can be obtained. For the first of them, A_X is 0 and A_Y is 0; this virtual camera is the first virtual camera. For the second of them, A_X is 0 and A_Y is π/3; this virtual camera is denoted as the second virtual camera 1, and the rotation matrix R between the target pose of the second virtual camera 1 and the initial pose of the first virtual camera is determined based on formulas (9)-(11). By analogy, for the sixth virtual camera, A_X is 0 and A_Y is 5π/3; this virtual camera is denoted as the second virtual camera 5, and the rotation matrix R between the target pose of the second virtual camera 5 and the initial pose of the first virtual camera is determined based on formulas (9)-(11).
In summary, when there are multiple second virtual cameras, the target poses of different second virtual cameras may be different, and the rotation matrix R corresponding to each second virtual camera may be determined.
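The construction of the per-camera rotation matrices from formulas (9)-(11) can be sketched as follows (R_z is taken as the identity, matching the example above):

```python
import numpy as np

def rotation_x(a_x):
    # Formula (9): sub-rotation of angle A_X about the X axis
    c, s = np.cos(a_x), np.sin(a_x)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, c, -s],
                     [0.0, s, c]])

def rotation_y(a_y):
    # Formula (10): sub-rotation of angle A_Y about the Y axis
    c, s = np.cos(a_y), np.sin(a_y)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def rotation_matrix(a_x, a_y):
    # Formula (11): R = Ry * Rx (Rz taken as the identity)
    return rotation_y(a_y) @ rotation_x(a_x)

# Six virtual cameras spaced pi/3 apart around the Y axis, as in fig. 4C
rotations = [rotation_matrix(0.0, k * np.pi / 3.0) for k in range(6)]
```

The first entry (A_X = A_Y = 0) is the identity, i.e., the first virtual camera; the remaining five are the second virtual cameras 1-5.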
Step 305, determining an external parameter matrix between the target pose and the initial pose based on the rotation matrix between the target pose of the second virtual camera and the initial pose of the first virtual camera. For example, a translation matrix between the first virtual camera and the second virtual camera may be determined, and then the external parameter matrix between the target pose and the initial pose may be determined based on the rotation matrix and the translation matrix.
For example, since the second virtual camera and the first virtual camera are fixedly connected and optically concentric, that is, the position of the second virtual camera is the sphere center position of the view spherical coordinate system and the position of the first virtual camera is also the sphere center position, the translation matrix between the first virtual camera and the second virtual camera may be t = (0, 0, 0)^T, i.e., no translation occurs between the positions of the first virtual camera and the second virtual camera.
For example, after obtaining the rotation matrix and the translation matrix, the external parameter matrix can be obtained based on them, as shown in formula (12), which is one example of determining the external parameter matrix:

T_{c_i c_0} = [[R_{c_i c_0}, t_{c_i c_0}], [0, 1]]   formula (12)
In formula (12), c_0 represents the first virtual camera (i.e., the reference camera), c_i represents the i-th second virtual camera, R_{c_i c_0} represents the rotation matrix between the i-th second virtual camera and the first virtual camera, as given by formulas (9)-(11), and t_{c_i c_0} represents the translation matrix between the i-th second virtual camera and the first virtual camera, i.e., t_{c_i c_0} = (0, 0, 0)^T. T_{c_i c_0} represents the external parameter matrix between the i-th second virtual camera and the first virtual camera, which will be used in the subsequent multi-camera binding optimization process, see the subsequent embodiments.
In summary, when there are a plurality of second virtual cameras, an external parameter matrix between each second virtual camera and the first virtual camera may be determined, and the external parameter matrix may include a rotation matrix and a translation matrix.
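Assembling the 4x4 external parameter matrix of formula (12) from a rotation and the zero translation can be sketched as:

```python
import numpy as np

def extrinsic_matrix(R):
    """Formula (12): assemble the 4x4 external parameter matrix from the
    3x3 rotation R; the translation stays (0, 0, 0)^T because the virtual
    cameras are optically concentric at the sphere center."""
    T = np.eye(4)
    T[:3, :3] = R
    return T
```

One such matrix is built per second virtual camera, relating it to the first (reference) virtual camera.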
Step 306, determining a second pinhole image corresponding to the second virtual camera based on the rotation matrix between the target pose of the second virtual camera and the initial pose of the first virtual camera. For example, the second pinhole image is determined based on the rotation matrix and the first pinhole image, or the second pinhole image is determined based on the rotation matrix and the view sphere image. Of course, the above manner is only an example, and the present embodiment does not limit this.
For example, referring to fig. 4B, a point Qs on the spherical view image may be rotated based on the rotation matrix R between the target pose of the second virtual camera and the initial pose of the first virtual camera, obtaining a rotated point Qs'. The relationship between the rotated point Qs' and the point Qs before rotation can be seen in formula (13):

Qs' = R·Qs   formula (13)

Obviously, the point Qs on the spherical view image corresponds to the point Q on the first pinhole image, and the point Qs' on the spherical view image corresponds to the point Q' on the second pinhole image. Based on this, the relationship between the point Q' on the second pinhole image and the point Q on the first pinhole image can be seen in formula (14):

(u' − u0, v' − v0, d) ∝ R·(u − u0, v − v0, d)   formula (14)

that is, the three-dimensional coordinate of Q' is obtained by rotating the three-dimensional coordinate of Q by R and rescaling its third component back to d.
in summary, it can be seen that, based on the rotation matrix R, a mapping relationship between the coordinates on the first pinhole image and the coordinates on the second pinhole image can be determined, the mapping relationship can be shown in formula (14), and based on the mapping relationship, the first pinhole image can be converted into the second pinhole image, which is not described herein again.
In another possible implementation, the formula (13) may be substituted into the formula (6) and the formula (7) to obtain a mapping relationship between the coordinates on the spherical viewing surface image and the coordinates on the second pinhole image, and based on the mapping relationship, the spherical viewing surface image may be converted into the second pinhole image, which is not described herein again.
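A hedged sketch of the pixel remapping of formula (14) (the rotation direction convention is an assumption; the function name is illustrative):

```python
import numpy as np

def remap_pixel(u, v, w, h, d, R):
    """Map a pixel (u, v) of the first pinhole image to the corresponding
    pixel of the second pinhole image, per formula (14):
    (u' - u0, v' - v0, d) is proportional to R (u - u0, v - v0, d)."""
    u0, v0 = w / 2.0, h / 2.0
    q = np.array([u - u0, v - v0, float(d)])
    qr = R @ q                      # formula (13): rotate the viewing ray
    if qr[2] <= 0.0:
        return None                 # the ray leaves the second camera's view
    s = d / qr[2]                   # re-project onto the plane Z = d
    return u0 + s * qr[0], v0 + s * qr[1]
```

With R the identity (the first virtual camera itself), every pixel maps to itself, as expected.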
For example, when generating pixels of a pinhole image (such as a first pinhole image or a second pinhole image), interpolation calculation may also be performed, for example, coordinates of the panoramic image may be floating point values, and interpolation calculation may be performed in a bilinear interpolation manner to obtain pixels of the pinhole image, which is not limited in this process.
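The bilinear interpolation mentioned above can be sketched as follows (a minimal version for a single-channel image, clamping at the border):

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample image `img` (H x W array) at a floating-point location (x, y)
    by bilinear interpolation, as used when the mapped panoramic coordinate
    of a pinhole pixel is not integral."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, img.shape[1] - 1)   # clamp at the right/bottom border
    y1 = min(y0 + 1, img.shape[0] - 1)
    fx, fy = x - x0, y - y0
    top = (1.0 - fx) * img[y0, x0] + fx * img[y0, x1]
    bot = (1.0 - fx) * img[y1, x0] + fx * img[y1, x1]
    return (1.0 - fy) * top + fy * bot
```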
In summary, the first pinhole image, the second pinhole image and the external parameter matrices between the virtual cameras can be obtained. For example, assuming there are a first virtual camera, a second virtual camera 1 and a second virtual camera 2, a first pinhole image corresponding to the first virtual camera, a second pinhole image 1 corresponding to the second virtual camera 1 and a second pinhole image 2 corresponding to the second virtual camera 2 are obtained, together with an external parameter matrix 11 between the second virtual camera 1 and the first virtual camera and an external parameter matrix 21 between the second virtual camera 2 and the first virtual camera.
And secondly, building a multi-camera binding map, for example, based on the first pinhole image, the second pinhole image and the fixed connection constraint (namely, an external reference matrix), performing multi-camera binding SFM reconstruction to obtain a three-dimensional visual map.
Illustratively, for the first pinhole image and the second pinhole image (the number of first pinhole images is one, and the number of second pinhole images is at least one), two-dimensional feature points corresponding to an actual position of the target scene may be selected from the first pinhole image and the second pinhole image. (In practical applications, multiple frames of panoramic images at different times may be obtained, each frame of panoramic image corresponds to a first pinhole image and second pinhole images, and multiple two-dimensional feature points corresponding to the actual position may be selected from these pinhole images; that is, the two-dimensional feature points are feature points in all pinhole images at different times.) For example, for a certain actual position (i.e., an actual physical position) of the target scene, a two-dimensional feature point corresponding to the actual position may be selected from the first pinhole image, and a two-dimensional feature point corresponding to the actual position may be selected from the second pinhole image, so as to obtain a plurality of two-dimensional feature points. Then, the three-dimensional map point corresponding to the actual position is determined based on the two-dimensional feature points and the external parameter matrix. For example: a target loss value of the configured loss function is determined; a projection function value between the coordinate system of the virtual camera and the coordinate system of the pinhole image is determined based on the target loss value and the two-dimensional feature points corresponding to the actual position; and the three-dimensional map point corresponding to the actual position is determined based on the external parameter matrix and the projection function value, i.e., it serves as a three-dimensional map point in the three-dimensional visual map.
For example, in the multi-camera bound SFM reconstruction process, the fixed connection constraint between cameras (i.e., the constraint that cameras in a group are related by an external parameter matrix) is considered, and the fixed connection constraint (i.e., the external parameter matrix) between the virtual cameras can be added to the optimization. The camera binding optimization mainly adds a rigid-body connection constraint between the cameras during reconstruction: each camera fixed connection group set is composed of a plurality of snapshots having the same camera fixed connection constraint, each snapshot is composed of the images taken by each camera in the camera group at the same time, and the external parameters between the cameras in the group are the external parameter matrices T_{c_i c_0} of the above embodiment. This fixed connection constraint will be added to the subsequent reconstruction process.
For example, the reprojection error of a fixed connection group can be expressed as shown in formula (15):

E(S_k) = Σ_{c_i ∈ S_k} Σ_j ρ_j( || x_j − π( T_{c_i c_0} · T_wc^{-1} · X_k ) ||² )   formula (15)
In formula (15), S_k represents a fixed connection group; assuming there are 5 second virtual cameras, S_k represents the fixed connection group formed by these 5 second virtual cameras, the value range of i is 1-5, i.e., c_1 represents the 1st second virtual camera, c_2 represents the 2nd second virtual camera, and so on. j indexes the two-dimensional feature points corresponding to the actual position (the two-dimensional feature points determined from all pinhole images): when j = 1 it denotes the first two-dimensional feature point corresponding to the actual position, when j = 2 the second, and so on. ρ_j represents the kernel function of the j-th two-dimensional feature point, used to suppress large outlier errors. x_j represents the 2D point coordinate of the j-th two-dimensional feature point in the pinhole image. π denotes the projection function between the coordinate system of the virtual camera and the coordinate system of the pinhole image, i.e., the projection function from the camera system to the image system. T_{c_i c_0} denotes the external parameter matrix within the fixed connection group, where c_0 represents the first virtual camera and c_i represents the i-th second virtual camera; that is, T_{c_i c_0} represents the external parameter matrix between the i-th second virtual camera and the first virtual camera (when i = 1, between the 1st second virtual camera and the first virtual camera; when i = 2, between the 2nd second virtual camera and the first virtual camera; and so on), and these external parameter matrices have all been obtained above. T_wc represents the pose matrix of the reference camera in the fixed connection group, usually the No. 0 camera, i.e., the pose matrix of the first virtual camera. X_k represents the three-dimensional map point corresponding to the actual position, i.e., the 3D scene point to be obtained. For any camera in the fixed connection group, its pose can be obtained from T_wc and T_{c_i c_0}, i.e., T_{wc_i} = T_wc · T_{c_i c_0}^{-1}.
In formula (15), the parameters to be optimized include the pose matrix of the reference camera (i.e., the pose matrix T_wc of the first virtual camera) and the three-dimensional map points X_k, while the external parameter matrices T_{c_i c_0} and the feature points x_j are known. Therefore, during the multi-camera bound SFM reconstruction process, the above loss function can be minimized, for example by the LM (Levenberg-Marquardt) algorithm, finally obtaining the pose matrix T_wc of the reference camera and the three-dimensional map points X_k; that is, the three-dimensional map point X_k corresponding to the actual position can be obtained.
In summary, the reprojection error function (shown in formula (15)) can be used as the configured loss function; the loss function can be minimized by the LM algorithm, and the minimized value can be used as the target loss value of the loss function. As can be seen from formula (15), based on the target loss value and the two-dimensional feature points x_j corresponding to the actual position, the value of the projection function π between the coordinate system of the virtual camera and the coordinate system of the pinhole image can be determined. Then, based on the external parameter matrix T_{c_i c_0} and the projection function value, the pose matrix T_wc of the reference camera and the three-dimensional map point X_k can be deduced; that is, the three-dimensional map point X_k corresponding to the actual position can be obtained and used as a three-dimensional map point in the three-dimensional visual map.
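Evaluating the residual of formula (15) for one fixed connection group can be sketched as follows (a hypothetical sketch: `K` is an assumed shared intrinsic matrix standing in for the projection π, and the robust kernel ρ_j is omitted, i.e., a plain squared error):

```python
import numpy as np

def reprojection_error(T_wc, extrinsics, points_3d, observations, K):
    """Squared reprojection error in the spirit of formula (15) for one
    fixed connection group; `observations` holds tuples of
    (camera index i, point index k, observed 2D point x_j)."""
    err = 0.0
    T_cw = np.linalg.inv(T_wc)               # world -> reference camera c_0
    for cam_i, pt_k, x_obs in observations:
        T_ci_w = extrinsics[cam_i] @ T_cw    # T_{c_i c_0} * T_wc^{-1}
        Xc = T_ci_w @ np.append(points_3d[pt_k], 1.0)
        proj = K @ Xc[:3]
        x_proj = proj[:2] / proj[2]          # projection pi(.)
        err += float(np.sum((x_proj - np.asarray(x_obs)) ** 2))
    return err
```

An optimizer such as LM would minimize this quantity over T_wc and the points X_k while holding the extrinsics fixed.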
For a plurality of actual positions of the target scene, the three-dimensional map points corresponding to each actual position may be obtained in the above manner, so that the three-dimensional visual map of the target scene is constructed based on the plurality of three-dimensional map points of the target scene, that is, the three-dimensional visual map may include the plurality of three-dimensional map points of the target scene.
And thirdly, storing map information corresponding to the three-dimensional visual map, for example, storing the map information corresponding to the three-dimensional visual map in a visual feature database. For example, the map information corresponding to the three-dimensional visual map may include, but is not limited to: the method includes the steps of obtaining a sample global descriptor corresponding to a sample image, a three-dimensional map point corresponding to the sample image and a sample local descriptor corresponding to the three-dimensional map point, wherein the sample image is a pinhole image selected from a first pinhole image and a second pinhole image, for example, all the pinhole images are used as the sample image, or a part of the pinhole images are selected from all the pinhole images as the sample image, which is not limited to this.
After obtaining the three-dimensional visual map of the target scene, the three-dimensional visual map may include the following information:
Pose of sample image: the sample image is an image used when constructing the three-dimensional visual map, that is, the first pinhole image, the second pinhole image, and the like; that is, the three-dimensional visual map is constructed based on the sample images, and the pose matrix of each sample image (which may be referred to as the sample image pose for short) can be stored in the three-dimensional visual map, that is, the three-dimensional visual map can include the pose of the sample image. Referring to the above embodiments, the pose of the sample image may be the pose matrix T_wc of the reference camera corresponding to the first pinhole image corresponding to the sample image.
Sample global descriptor: for each frame of sample image, the sample image may correspond to an image global descriptor, and the image global descriptor is denoted as a sample global descriptor, where the sample global descriptor represents the sample image by using a high-dimensional vector, and the sample global descriptor is used to distinguish image features of different sample images.
For each frame of sample image, a bag-of-words vector corresponding to the sample image may be determined based on a trained dictionary model, and the bag-of-words vector may be determined as the sample global descriptor corresponding to the sample image. For example, the bag-of-words (BoW) method is one way of determining a global descriptor; in the bag-of-words method, a bag-of-words vector can be constructed, which is a vector representation used for image similarity detection, and the bag-of-words vector can be used as the sample global descriptor corresponding to the sample image.
In the visual bag-of-words method, a "dictionary", which may also be referred to as a dictionary model, needs to be trained in advance: feature point descriptors from a large number of images are clustered and trained to obtain a classification tree, where each leaf node of the tree represents a visual "word", and these visual "words" form the dictionary model.
For a sample image, each feature point descriptor in the sample image may be assigned to a word, and the occurrence frequency of every word is counted, so that the frequencies of the words in the dictionary form a vector. This vector is the bag-of-words vector corresponding to the sample image; it can be used to measure the similarity of two images, and it serves as the sample global descriptor corresponding to the sample image.
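The word-frequency counting described above can be sketched as follows; the 5-word "dictionary" stands in for a trained classification tree, and all names are illustrative:

```python
import numpy as np

def bow_vector(descriptors, words):
    """Assign each local descriptor to its nearest visual word and return
    the normalized word-frequency histogram (the bag-of-words vector)."""
    d2 = ((descriptors[:, None, :] - words[None, :, :]) ** 2).sum(axis=2)
    ids = d2.argmin(axis=1)
    hist = np.bincount(ids, minlength=len(words)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
words = rng.normal(size=(5, 8))        # toy dictionary: 5 words, 8-dim
img_a = words[[0, 0, 1, 3]] + 0.01     # descriptors lying near known words
img_b = words[[0, 1, 1, 4]] + 0.01
va, vb = bow_vector(img_a, words), bow_vector(img_b, words)
# cosine similarity between the two bag-of-words vectors
similarity = float(va @ vb) / (np.linalg.norm(va) * np.linalg.norm(vb))
```

Two images sharing many words yield a high cosine similarity, which is how candidate sample images are later retrieved.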
For each frame of sample image, the sample image may be input to a trained deep learning model to obtain a target vector corresponding to the sample image, and the target vector is determined as a sample global descriptor corresponding to the sample image. For example, a deep learning method is a method for determining a global descriptor, in the deep learning method, a sample image may be subjected to multilayer convolution through a deep learning model, and a high-dimensional target vector is finally obtained, and the target vector is used as the sample global descriptor corresponding to the sample image.
In the deep learning method, a deep learning model, such as a CNN (Convolutional Neural Network) model, needs to be trained in advance; the deep learning model is generally obtained by training on a large number of images, and the training mode of the deep learning model is not limited. For a sample image, the sample image may be input to the deep learning model, the deep learning model processes the sample image to obtain a high-dimensional target vector, and the target vector is used as the sample global descriptor corresponding to the sample image.
Sample local descriptors corresponding to feature points of the sample image: for a sample image, the sample image may include a plurality of feature points, where a feature point is a specific pixel position in the sample image, the feature point may correspond to an image local descriptor, and the image local descriptor is recorded as a sample local descriptor, and the sample local descriptor describes features of image blocks in a range near the feature point (i.e., a pixel position) with a vector, and the vector may also be referred to as a descriptor of the feature point. In summary, the sample local descriptor is a feature vector for representing an image block where the feature point is located, and the image block may be located in the sample image. For each feature point of the sample image, the feature point corresponds to a three-dimensional map point in the three-dimensional visual map, that is, a sample local descriptor corresponding to the feature point may be referred to as a sample local descriptor corresponding to the three-dimensional map point.
Wherein, algorithms such as ORB (Oriented FAST and Rotated BRIEF), SIFT (Scale-Invariant Feature Transform), SURF (Speeded Up Robust Features), and the like can be adopted to extract feature points from the sample image and determine the sample local descriptors corresponding to the feature points. A deep learning algorithm (such as SuperPoint, DELF, D2-Net, etc.) may also be used to extract feature points from the sample image and determine the sample local descriptors corresponding to the feature points, which is not limited to this, as long as the feature points can be obtained and the sample local descriptors can be determined.
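To make the notion of a local descriptor concrete, the following toy sketch describes the image block around a feature point with a mean-centered, normalized pixel vector; real systems would use ORB/SIFT/SURF or a learned descriptor as named above, and this simplified stand-in is purely illustrative:

```python
import numpy as np

def patch_descriptor(image, pt, half=4):
    """Toy local descriptor: the mean-centered, L2-normalized pixel values
    of the (2*half+1) x (2*half+1) patch centered on the feature point."""
    r, c = pt
    patch = image[r - half:r + half + 1, c - half:c + half + 1].astype(float)
    vec = patch.ravel()
    vec = vec - vec.mean()
    n = np.linalg.norm(vec)
    return vec / n if n > 0 else vec

rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(64, 64))   # stand-in grayscale image
d = patch_descriptor(img, (32, 32))
```

Mean-centering and normalization give the vector some invariance to brightness and contrast changes, which is the same motivation behind real descriptor designs.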
Map point information of three-dimensional map points: map point information may include, but is not limited to: the 3D spatial position of the three-dimensional map point, all sample images in which the map point is observed, and the indices of the corresponding 2D feature points.
And fourthly, performing an online positioning process. For example, a target image of a target scene may be acquired, and pose calculation may be performed on the target image based on the three-dimensional visual map of the target scene to obtain the global positioning pose in the three-dimensional visual map corresponding to the target image, thereby completing the positioning process. For example, when global positioning of the terminal device needs to be performed based on the three-dimensional visual map, the terminal device may download the three-dimensional visual map from the server, and during the movement of the terminal device in the target scene, the terminal device may be positioned based on the three-dimensional visual map. The target scene may be an indoor environment, that is, when the terminal device moves in the indoor environment, the global positioning pose of the terminal device in the three-dimensional visual map may be determined; of course, the target scene may also be an outdoor environment or the like, and the target scene is not limited. In the above embodiments, a pose (e.g., the global positioning pose, etc.) includes a position and an orientation, and is generally represented by a rotation matrix and a translation vector, which is not limited.
The terminal device may include a vision sensor, such as a camera, for capturing a target image of the target scene during movement of the terminal device (i.e., a real-time image during movement).
The terminal device can be a wearable device (such as a video helmet, a smart watch, smart glasses and the like), and the visual sensor is deployed on the wearable device; or the terminal device is a recorder (for example, carried by a worker during work, with functions such as real-time video and audio collection, photographing, audio recording, intercom, and positioning), and the visual sensor is deployed on the recorder; alternatively, the terminal device is a camera (such as a split camera), and the visual sensor is deployed on the camera. Alternatively, the terminal device is a robot, and the visual sensor is deployed on the robot. Alternatively, the terminal device is an autonomous vehicle, and the visual sensor is deployed on the autonomous vehicle. Of course, the above are only a few examples, and this is not limited; for example, the terminal device may also be a smart phone, as long as the terminal device is deployed with a visual sensor.
For example, after the target image is obtained, a target three-dimensional map point corresponding to the target image may be determined from the three-dimensional visual map of the target scene, and the global positioning pose of the terminal device in the three-dimensional visual map may be determined based on the target three-dimensional map point.
For example, based on a three-dimensional visual map of a target scene, in one possible implementation, the following steps may be adopted to determine a global positioning pose of a terminal device in the three-dimensional visual map:
step S11, in the global positioning process of the terminal device, acquiring a target image of the terminal device in a target scene, for example, acquiring a target image in the target scene through a visual sensor, that is, a real-time video image.
And step S12, determining the global descriptor to be tested corresponding to the target image.
For example, the target image may correspond to an image global descriptor, and the image global descriptor may be denoted as a global descriptor to be detected, where the global descriptor to be detected represents the target image by using a high-dimensional vector, and the global descriptor to be detected is used to distinguish image features of different target images. For example, a bag-of-words vector corresponding to the target image is determined based on the trained dictionary model, and the bag-of-words vector is determined as the global descriptor to be detected corresponding to the target image. Or inputting the target image to the trained deep learning model to obtain a target vector corresponding to the target image, and determining the target vector as a to-be-detected global descriptor corresponding to the target image.
In summary, the global descriptor to be detected corresponding to the target image may be determined based on a visual bag-of-words method or a deep learning method, and the determination manner refers to the determination manner of the sample global descriptor, which is not described herein again.
Step S13, determining a distance between the global descriptor to be measured (i.e. the global descriptor to be measured corresponding to the target image) and the sample global descriptor corresponding to each frame of sample image corresponding to the three-dimensional visual map.
Referring to the above embodiments, the three-dimensional visual map may include a sample global descriptor corresponding to each frame of the sample image, and therefore, a distance, such as a euclidean distance, between the global descriptor to be measured and the sample global descriptor corresponding to each frame of the sample image may be determined, that is, the euclidean distance between two feature vectors is calculated.
Step S14, selecting candidate sample images from multi-frame sample images corresponding to the three-dimensional visual map based on the distance between the global descriptor to be detected and each sample global descriptor; the distance between the global descriptor to be tested and the sample global descriptor corresponding to the candidate sample image is the minimum distance; or, the distance between the global descriptor to be tested and the sample global descriptor corresponding to the candidate sample image is smaller than the distance threshold.
For example, assuming that the three-dimensional visual map corresponds to the sample image 1, the sample image 2, and the sample image 3, the distance 1 between the global descriptor to be measured and the sample global descriptor corresponding to the sample image 1 may be calculated, the distance 2 between the global descriptor to be measured and the sample global descriptor corresponding to the sample image 2 may be calculated, and the distance 3 between the global descriptor to be measured and the sample global descriptor corresponding to the sample image 3 may be calculated.
In one possible embodiment, if the distance 1 is the minimum distance, the sample image 1 is selected as the candidate sample image. Alternatively, if the distance 1 is smaller than the distance threshold (which may be configured empirically), and the distance 2 is smaller than the distance threshold, but the distance 3 is not smaller than the distance threshold, then both the sample image 1 and the sample image 2 are selected as candidate sample images. Or, if the distance 1 is the minimum distance and the distance 1 is smaller than the distance threshold, the sample image 1 is selected as the candidate sample image, but if the distance 1 is the minimum distance and the distance 1 is not smaller than the distance threshold, the candidate sample image cannot be selected, that is, the global positioning fails.
In summary, candidate sample images may be selected from the multi-frame sample images corresponding to the three-dimensional visual map.
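The candidate-selection logic of steps S13-S14 can be sketched as follows; the 2-dimensional global descriptors and the threshold value are illustrative placeholders for the high-dimensional vectors used in practice:

```python
import numpy as np

def select_candidates(query, sample_globals, dist_threshold):
    """Select candidate sample images whose global descriptors are within
    dist_threshold (Euclidean) of the query descriptor, and also report
    the single nearest sample image."""
    dists = np.linalg.norm(sample_globals - query, axis=1)
    candidates = np.flatnonzero(dists < dist_threshold)
    nearest = int(dists.argmin())
    return candidates, nearest, dists

# three sample images with toy global descriptors (cf. sample image 1/2/3)
samples = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
query = np.array([0.1, 0.0])   # global descriptor to be tested
cands, nearest, dists = select_candidates(query, samples, dist_threshold=2.0)
```

If `cands` comes back empty (even the nearest sample image exceeds the threshold), the global positioning fails, matching the failure case described above.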
Step S15, acquiring a plurality of feature points from the target image. For example, for each feature point, a local descriptor to be tested corresponding to the feature point may be determined, where the local descriptor to be tested may be used to represent a feature vector of the image block where the feature point is located, and the image block may be located in the target image.
For example, the target image may include a plurality of feature points, where a feature point may be a specific pixel position in the target image, the feature point may correspond to an image local descriptor, the image local descriptor is recorded as a local descriptor to be detected, the local descriptor to be detected describes the feature of an image block in a range near the feature point (i.e., the pixel position) with a vector, and the vector may also be referred to as a descriptor of the feature point. To sum up, the local descriptor to be measured is a feature vector for representing an image block where the feature point is located.
The characteristic points can be extracted from the target image by using algorithms such as ORB, SIFT, SURF and the like, and the local descriptors to be detected corresponding to the characteristic points are determined. A deep learning algorithm (such as SuperPoint, DELF, D2-Net, etc.) may also be used to extract feature points from the target image and determine the local descriptor to be detected corresponding to the feature points, which is not limited to this, as long as the feature points can be obtained and the local descriptor to be detected can be determined.
Step S16, for each feature point corresponding to the target image, determining a distance, such as an euclidean distance, between the local descriptor to be detected corresponding to the feature point and the sample local descriptor corresponding to each three-dimensional map point corresponding to the candidate sample image, that is, calculating the euclidean distance between two feature vectors.
Referring to the above embodiment, for each frame of sample image, the three-dimensional visual map includes the sample local descriptor corresponding to each three-dimensional map point corresponding to the sample image, and after the candidate sample image is obtained, the sample local descriptor corresponding to each three-dimensional map point corresponding to the candidate sample image may be obtained from the three-dimensional visual map.
After each feature point corresponding to the target image is obtained, the distance between the local descriptor to be detected corresponding to the feature point and the sample local descriptor corresponding to each three-dimensional map point corresponding to the candidate sample image is determined.
Step S17, aiming at each feature point, selecting a target three-dimensional map point from the three-dimensional map points corresponding to the candidate sample image based on the distance between the local descriptor to be detected corresponding to the feature point and the sample local descriptor corresponding to each three-dimensional map point corresponding to the candidate sample image; and the distance between the local descriptor to be detected and the sample local descriptor corresponding to the target three-dimensional map point is the minimum distance, and the minimum distance is smaller than the distance threshold.
For example, assuming that the candidate sample image corresponds to the three-dimensional map point 1, the three-dimensional map point 2, and the three-dimensional map point 3, the distance 1 between the local descriptor to be measured and the sample local descriptor corresponding to the three-dimensional map point 1 may be calculated, the distance 2 between the local descriptor to be measured and the sample local descriptor corresponding to the three-dimensional map point 2 may be calculated, and the distance 3 between the local descriptor to be measured and the sample local descriptor corresponding to the three-dimensional map point 3 may be calculated.
In one possible embodiment, if the distance 1 is the minimum distance, the three-dimensional map point 1 is selected as the target three-dimensional map point. Or, if the distance 1 is smaller than the distance threshold and the distance 2 is smaller than the distance threshold, but the distance 3 is not smaller than the distance threshold, both the three-dimensional map point 1 and the three-dimensional map point 2 may be selected as target three-dimensional map points. Or, if the distance 1 is the minimum distance and the distance 1 is smaller than the distance threshold, the three-dimensional map point 1 may be selected as the target three-dimensional map point, but if the distance 1 is the minimum distance and the distance 1 is not smaller than the distance threshold, no target three-dimensional map point can be selected, that is, the global positioning fails.
And aiming at each characteristic point of the target image, selecting a target three-dimensional map point corresponding to the characteristic point from the candidate sample image corresponding to the target image to obtain the matching relation between the characteristic point and the target three-dimensional map point.
And step S18, determining a global positioning pose in the three-dimensional visual map corresponding to the target image based on the plurality of feature points corresponding to the target image and the target three-dimensional map points corresponding to the plurality of feature points.
For example, the feature point 1 of the target image corresponds to the three-dimensional map point 1, the feature point 2 of the target image corresponds to the three-dimensional map point 2, and so on, thereby obtaining a plurality of matching relationship pairs, each matching relationship pair includes a two-dimensional feature point and a three-dimensional map point, the two-dimensional feature point represents a two-dimensional position in the target image, and the three-dimensional map point represents a three-dimensional position in the three-dimensional visual map, that is, the matching relationship pair includes a mapping relationship from the two-dimensional position to the three-dimensional position, that is, a mapping relationship from the two-dimensional position in the target image to the three-dimensional position in the three-dimensional visual map.
If the total number of the matching relationship pairs does not meet the number requirement, the global positioning pose in the three-dimensional visual map corresponding to the target image cannot be determined based on the matching relationship pairs. If the total number of the matching relationship pairs meets the number requirement (that is, the total number reaches a preset number value), the global positioning pose in the three-dimensional visual map corresponding to the target image can be determined based on the matching relationship pairs.
For example, a PnP (Perspective-n-Point) algorithm may be used to calculate the global positioning pose of the target image in the three-dimensional visual map, and the calculation method is not limited. For example, the input data of the PnP algorithm is a plurality of matching relationship pairs, each matching relationship pair including a two-dimensional position in the target image and a three-dimensional position in the three-dimensional visual map; based on the matching relationship pairs, the pose of the target image in the three-dimensional visual map, that is, the global positioning pose, can be calculated by the PnP algorithm.
In a possible implementation manner, after obtaining the plurality of matching relationship pairs, valid matching relationship pairs may be found from the plurality of matching relationship pairs. Based on the valid matching relationship pairs, the global positioning pose of the target image in the three-dimensional visual map can be calculated by the PnP algorithm. For example, a RANSAC (RANdom SAmple Consensus) detection algorithm may be adopted to find the valid matching relationship pairs from all the matching relationship pairs, and this process is not limited.
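To illustrate pose recovery from 2D-3D matching pairs, the following sketch estimates the 3x4 projection matrix with a plain DLT (Direct Linear Transform) solve rather than the PnP-plus-RANSAC pipeline described above; it assumes normalized image coordinates (intrinsics removed) and at least 6 non-coplanar matches, and all names are illustrative:

```python
import numpy as np

def dlt_pose(X, x):
    """DLT estimate of a 3x4 projection matrix from n >= 6 non-coplanar
    2D-3D matching pairs, each pairing a 3D map position Xw with a 2D
    normalized image position (u, v). Returned up to scale."""
    rows = []
    for Xw, (u, v) in zip(X, x):
        Xh = np.append(Xw, 1.0)
        rows.append(np.hstack([Xh, np.zeros(4), -u * Xh]))
        rows.append(np.hstack([np.zeros(4), Xh, -v * Xh]))
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    return Vt[-1].reshape(3, 4)

def proj(P, Xw):
    Xc = P @ np.append(Xw, 1.0)
    return Xc[:2] / Xc[2]

rng = np.random.default_rng(2)
P_true = np.hstack([np.eye(3), np.array([[0.2], [-0.1], [0.5]])])
X = rng.uniform(-1, 1, size=(8, 3)) + np.array([0, 0, 5.0])  # points in front
x = np.array([proj(P_true, Xw) for Xw in X])                 # 2D observations
P_est = dlt_pose(X, x)
x_reproj = np.array([proj(P_est, Xw) for Xw in X])
```

A RANSAC wrapper would repeat this solve on random minimal subsets and keep the pose with the most inlier matching pairs, which is the role the RANSAC detection algorithm plays above.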
In conclusion, the global positioning pose of the terminal device in the three-dimensional visual map can be determined.
According to the technical scheme, a three-dimensional visual map of the target scene can be constructed, the terminal equipment in the target scene is globally positioned based on the three-dimensional visual map, and the terminal equipment is accurately positioned. The three-dimensional visual map can be constructed by adopting panoramic images of the target scene; the panoramic image has a larger field angle, which avoids repeated data acquisition of the target scene and improves the data acquisition efficiency. When the three-dimensional map points are determined, the pinhole images can be obtained by projection through the virtual cameras, pose constraints of the virtual cameras are added, and the three-dimensional map points are determined by adopting a virtual camera binding optimization strategy, improving the mapping robustness, mapping efficiency, and mapping precision. By realizing the SFM reconstruction process with a panoramic camera, the accuracy and robustness of mapping can be improved, and adding the multi-camera binding optimization strategy to constrain the fixed connection relation between the virtual cameras can further improve the mapping precision. N virtual cameras which are fixedly connected with each other are obtained according to the spherical projection relation, and the fixed-connection constraint is added into the reconstruction process; this constraint can improve the registration precision of the multiple cameras and reduce the proportion of erroneous registrations. In addition, the concept of binding optimization under the virtual pinhole camera fixed-connection constraint can be extended to multi-camera systems with hardware fixed connections; the algorithm is universal, only the fixed-connection external parameters of the cameras need to be modified, and details are not repeated here.
Based on the same application concept as the method, an embodiment of the present application provides a map construction apparatus. As shown in Fig. 5, which is a structural diagram of the map construction apparatus, the apparatus may include:
an obtaining module 51, configured to obtain a panoramic image of a target scene;
a generating module 52, configured to generate a first pinhole image corresponding to the first virtual camera based on the panoramic image; the position of the first virtual camera is the sphere center position of a visual spherical coordinate system, and the initial posture of the first virtual camera is any posture taking the sphere center position as the center;
a determining module 53, configured to determine a rotation matrix between a target pose of a second virtual camera and the initial pose, and determine an external parameter matrix between the target pose and the initial pose based on the rotation matrix; the position of the second virtual camera is the sphere center position of the visual spherical coordinate system, and the target posture is obtained by rotating the initial posture around the coordinate axis of the visual spherical coordinate system;
the generating module 52 is further configured to determine a second pinhole image corresponding to the second virtual camera based on the rotation matrix; the determining module 53 is further configured to select a two-dimensional feature point corresponding to an actual position of the target scene from the first pinhole image and the second pinhole image, and determine a three-dimensional map point corresponding to the actual position based on the two-dimensional feature point and the external reference matrix; and constructing a three-dimensional visual map of the target scene based on the plurality of three-dimensional map points of the target scene.
For example, the generating module 52 is specifically configured to, when generating the first pinhole image corresponding to the first virtual camera based on the panoramic image: generate a spherical view image corresponding to the visual spherical coordinate system based on the panoramic image; and generate the first pinhole image corresponding to the first virtual camera based on the spherical view image.
For example, the generating module 52 is specifically configured to, when generating the spherical view image corresponding to the spherical view coordinate system based on the panoramic image: determining a mapping relation between longitude and latitude coordinates in the spherical view image and rectangular coordinates in the panoramic image based on the width and the height of the panoramic image; aiming at each longitude and latitude coordinate in the spherical view image, determining a rectangular coordinate corresponding to the longitude and latitude coordinate from the panoramic image based on the mapping relation, and determining a pixel value of the longitude and latitude coordinate based on a pixel value of the rectangular coordinate; and generating the apparent spherical image based on the pixel value of each longitude and latitude coordinate in the apparent spherical image.
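The mapping between panoramic pixel coordinates and longitude/latitude coordinates described above can be sketched as follows; this assumes the common equirectangular convention (longitude spanning the image width, latitude its height), and the patent's exact axis conventions and offsets may differ:

```python
import numpy as np

def pixel_to_lonlat(px, py, width, height):
    """Map an equirectangular panorama pixel to longitude/latitude,
    with lon in [-pi, pi) across the width and lat in [-pi/2, pi/2)
    down the height."""
    lon = 2.0 * np.pi * px / width - np.pi
    lat = np.pi * py / height - np.pi / 2.0
    return lon, lat

def lonlat_to_pixel(lon, lat, width, height):
    """Inverse mapping: the panorama pixel holding the value for lon/lat,
    i.e. where the spherical-view image samples the panorama."""
    px = (lon + np.pi) * width / (2.0 * np.pi)
    py = (lat + np.pi / 2.0) * height / np.pi
    return px, py

W, H = 4000, 2000                                   # toy panorama size
lon, lat = pixel_to_lonlat(1000.0, 500.0, W, H)
px, py = lonlat_to_pixel(lon, lat, W, H)            # round trip
```

Building the spherical view image then amounts to evaluating the inverse mapping for every longitude/latitude coordinate and copying (or interpolating) the panorama pixel value found there.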
For example, the generating module 52 is specifically configured to, when generating the first pinhole image corresponding to the first virtual camera based on the spherical view image: determine the center point coordinate of the first pinhole image based on the width and the height of the first pinhole image, and determine a mapping relation between rectangular coordinates in the first pinhole image and longitude and latitude coordinates in the spherical view image based on the center point coordinate and a target distance, where the target distance is the distance between the center point of the first pinhole image and the sphere center position of the visual spherical coordinate system; for each rectangular coordinate in the first pinhole image, determine the longitude and latitude coordinate corresponding to the rectangular coordinate from the spherical view image based on the mapping relation, and determine the pixel value of the rectangular coordinate based on the pixel value of the longitude and latitude coordinate; and generate the first pinhole image based on the pixel values of each rectangular coordinate in the first pinhole image.
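The pinhole-to-sphere mapping described above can be sketched as follows: each pinhole pixel defines a ray from the sphere center through the image plane at the target distance, and that ray's direction yields a longitude/latitude coordinate on the viewing sphere. The axis conventions (camera looking along +z, x right, y down) are assumptions, not taken from the patent:

```python
import numpy as np

def pinhole_to_lonlat(u, v, width, height, f):
    """Map a pinhole-image pixel (u, v) to longitude/latitude on the
    viewing sphere. f is the target distance from the image center to
    the sphere center (i.e. the virtual camera's focal length)."""
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    d = np.array([u - cx, v - cy, f], dtype=float)   # ray direction
    d /= np.linalg.norm(d)
    lon = np.arctan2(d[0], d[2])   # azimuth around the vertical axis
    lat = np.arcsin(d[1])          # elevation angle
    return lon, lat

# the pinhole image's center pixel looks straight along the optical axis
lon_c, lat_c = pinhole_to_lonlat(319.5, 239.5, 640, 480, f=300.0)
```

Filling the pinhole image then means evaluating this mapping for every rectangular coordinate and sampling the spherical view image at the resulting longitude/latitude.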
For example, the determining module 53 is specifically configured to determine a rotation matrix between the target pose and the initial pose of the second virtual camera: determining a first rotation angle in a first coordinate axis direction between the target posture and the initial posture, and determining a first sub-rotation matrix in the first coordinate axis direction based on the first rotation angle; determining a second rotation angle in a second coordinate axis direction between the target posture and the initial posture, and determining a second sub-rotation matrix in the second coordinate axis direction based on the second rotation angle; determining a third rotation angle in a third coordinate axis direction between the target posture and the initial posture, and determining a third sub-rotation matrix in the third coordinate axis direction based on the third rotation angle; determining a rotation matrix between the target pose and the initial pose based on the first, second, and third sub-rotation matrices.
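The composition of the three per-axis sub-rotation matrices can be sketched as follows; the composition order (here Rz @ Ry @ Rx) is a convention this excerpt does not pin down, so treat it as an assumption:

```python
import numpy as np

def rot_x(a):
    """Sub-rotation matrix about the first (x) coordinate axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):
    """Sub-rotation matrix about the second (y) coordinate axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(a):
    """Sub-rotation matrix about the third (z) coordinate axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

# rotation matrix between target pose and initial pose from three angles
R = rot_z(0.3) @ rot_y(-0.2) @ rot_x(0.1)
```

Any valid composition must remain a proper rotation: orthonormal with determinant +1, which is easy to verify numerically.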
For example, the determining module 53 is specifically configured to, when determining the external reference matrix between the target pose and the initial pose based on the rotation matrix: determining a translation matrix between the first virtual camera and the second virtual camera; determining the external parameter matrix based on the rotation matrix and the translation matrix.
For example, the determining module 53 is specifically configured to, when determining the three-dimensional map point corresponding to the actual position based on the two-dimensional feature point and the external reference matrix: determining a target loss value of the configured loss function; determining a projection function value between a coordinate system of a virtual camera and a coordinate system of a pinhole image based on the target loss value and the two-dimensional feature points corresponding to the actual positions; and determining the three-dimensional map point corresponding to the actual position based on the external reference matrix and the projection function value.
Illustratively, the three-dimensional visual map comprises a sample global descriptor corresponding to a sample image, a three-dimensional map point corresponding to the sample image, and a sample local descriptor corresponding to the three-dimensional map point; the sample image is a pinhole image selected from the first pinhole image and the second pinhole image. The device further comprises a positioning module, configured to acquire a target image of the terminal device in the target scene in the global positioning process of the terminal device; select candidate sample images from the multi-frame sample images based on the similarity between the target image and the multi-frame sample images corresponding to the three-dimensional visual map; acquire a plurality of feature points from the target image; for each feature point, determine a target three-dimensional map point corresponding to the feature point from the three-dimensional map points corresponding to the candidate sample images; and determine the global positioning pose in the three-dimensional visual map corresponding to the target image based on the plurality of feature points and the target three-dimensional map points corresponding to the plurality of feature points.
For example, when selecting a candidate sample image from the multi-frame sample images based on the similarity between the target image and the multi-frame sample images corresponding to the three-dimensional visual map, the positioning module is specifically configured to: determine a global descriptor to be tested corresponding to the target image, and determine the distance between the global descriptor to be tested and the sample global descriptor corresponding to each frame of sample image corresponding to the three-dimensional visual map; and select a candidate sample image from the multi-frame sample images based on the distance between the global descriptor to be tested and each sample global descriptor; wherein the distance between the global descriptor to be tested and the sample global descriptor corresponding to the candidate sample image is the minimum distance, or the distance between the global descriptor to be tested and the sample global descriptor corresponding to the candidate sample image is smaller than a distance threshold.
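Both candidate-selection variants described above (minimum distance, or all distances below a threshold) can be sketched in a few lines. The function name, the use of L2 distance, and the test values are illustrative assumptions; the application does not fix a particular distance metric.

```python
import numpy as np

def select_candidates(query_desc, sample_descs, dist_threshold=None):
    """Pick candidate sample images by global-descriptor distance.

    sample_descs: (N, D) array, one global descriptor per sample frame.
    Returns frame indices: the single nearest frame when no threshold is
    given, otherwise every frame closer than the threshold."""
    dists = np.linalg.norm(sample_descs - query_desc, axis=1)  # L2 distances
    if dist_threshold is None:
        return [int(np.argmin(dists))]                 # minimum-distance variant
    return [int(i) for i in np.where(dists < dist_threshold)[0]]  # threshold variant
```

For large maps this linear scan would normally be replaced by an approximate nearest-neighbour index, but the selection rule is the same.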
For example, when determining the target three-dimensional map point corresponding to a feature point from the three-dimensional map points corresponding to the candidate sample image, the positioning module is specifically configured to: determine a local descriptor to be tested corresponding to the feature point, where the local descriptor to be tested represents a feature vector of the image block in which the feature point is located, and the image block is located in the target image; determine the distance between the local descriptor to be tested and the sample local descriptor corresponding to each three-dimensional map point corresponding to the candidate sample image; and select the target three-dimensional map point from the three-dimensional map points corresponding to the candidate sample image based on the distance between the local descriptor to be tested and each sample local descriptor; wherein the distance between the local descriptor to be tested and the sample local descriptor corresponding to the target three-dimensional map point is the minimum distance, and the minimum distance is smaller than a distance threshold.
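The per-feature matching rule above (nearest sample local descriptor, accepted only when the minimum distance is below a threshold) can be sketched as follows; names and values are illustrative assumptions.

```python
import numpy as np

def match_map_point(local_desc, sample_local_descs, dist_threshold):
    """Find the target 3D map point for one feature point.

    sample_local_descs: (M, D) array, one local descriptor per 3D map
    point of the candidate sample image. Returns the index of the map
    point whose descriptor is nearest, or None when even the nearest
    descriptor is not closer than the threshold."""
    dists = np.linalg.norm(sample_local_descs - local_desc, axis=1)
    best = int(np.argmin(dists))
    return best if dists[best] < dist_threshold else None
```

The threshold check matters: features with no sufficiently close map descriptor are discarded rather than forced into a wrong 2D-3D correspondence.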
Based on the same application concept as the above method, an embodiment of the present application provides a map construction device, which may include: a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor; the processor is configured to execute the machine-executable instructions to implement the map construction method disclosed in the above examples of the present application.
Based on the same application concept as the above method, embodiments of the present application further provide a machine-readable storage medium storing a plurality of computer instructions; when the computer instructions are executed by a processor, the map construction method disclosed in the above examples of the present application is implemented.
The machine-readable storage medium may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the machine-readable storage medium may be: a RAM (Random Access Memory), a volatile memory, a non-volatile memory, a flash memory, a storage drive (e.g., a hard disk drive), a solid state drive, any type of storage disk (e.g., an optical disk or DVD), a similar storage medium, or a combination thereof.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (13)

1. A map construction method, characterized in that the method comprises:
acquiring a panoramic image of a target scene, and generating a first pinhole image corresponding to a first virtual camera based on the panoramic image; wherein the position of the first virtual camera is the sphere center position of a viewing spherical coordinate system, and the initial pose of the first virtual camera is any pose centered on the sphere center position;
determining a rotation matrix between a target pose of a second virtual camera and the initial pose, and determining an extrinsic parameter matrix between the target pose and the initial pose based on the rotation matrix; wherein the position of the second virtual camera is the sphere center position of the viewing spherical coordinate system, and the target pose is obtained by rotating the initial pose around a coordinate axis of the viewing spherical coordinate system;
determining a second pinhole image corresponding to the second virtual camera based on the rotation matrix, selecting two-dimensional feature points corresponding to an actual position in the target scene from the first pinhole image and the second pinhole image, and determining a three-dimensional map point corresponding to the actual position based on the two-dimensional feature points and the extrinsic parameter matrix; and
constructing a three-dimensional visual map of the target scene based on a plurality of three-dimensional map points of the target scene.

2. The method according to claim 1, characterized in that generating the first pinhole image corresponding to the first virtual camera based on the panoramic image comprises:
generating a viewing spherical image corresponding to the viewing spherical coordinate system based on the panoramic image; and
generating the first pinhole image corresponding to the first virtual camera based on the viewing spherical image.

3. The method according to claim 2, characterized in that generating the viewing spherical image corresponding to the viewing spherical coordinate system based on the panoramic image comprises:
determining, based on the width and height of the panoramic image, a mapping relationship between latitude and longitude coordinates in the viewing spherical image and rectangular coordinates in the panoramic image;
for each latitude and longitude coordinate in the viewing spherical image, determining the rectangular coordinate corresponding to the latitude and longitude coordinate from the panoramic image based on the mapping relationship, and determining the pixel value of the latitude and longitude coordinate based on the pixel value of the rectangular coordinate; and
generating the viewing spherical image based on the pixel value of each latitude and longitude coordinate in the viewing spherical image.

4. The method according to claim 2, characterized in that generating the first pinhole image corresponding to the first virtual camera based on the viewing spherical image comprises:
determining the center point coordinates of the first pinhole image based on the width and height of the first pinhole image, and determining, based on the center point coordinates and a target distance, a mapping relationship between rectangular coordinates in the first pinhole image and latitude and longitude coordinates in the viewing spherical image, the target distance being the distance between the center point of the first pinhole image and the sphere center position of the viewing spherical coordinate system;
for each rectangular coordinate in the first pinhole image, determining the latitude and longitude coordinate corresponding to the rectangular coordinate from the viewing spherical image based on the mapping relationship, and determining the pixel value of the rectangular coordinate based on the pixel value of the latitude and longitude coordinate; and
generating the first pinhole image based on the pixel value of each rectangular coordinate in the first pinhole image.

5. The method according to claim 1, characterized in that determining the rotation matrix between the target pose of the second virtual camera and the initial pose comprises:
determining a first rotation angle between the target pose and the initial pose in the direction of a first coordinate axis, and determining a first sub-rotation matrix for the first coordinate axis direction based on the first rotation angle;
determining a second rotation angle between the target pose and the initial pose in the direction of a second coordinate axis, and determining a second sub-rotation matrix for the second coordinate axis direction based on the second rotation angle;
determining a third rotation angle between the target pose and the initial pose in the direction of a third coordinate axis, and determining a third sub-rotation matrix for the third coordinate axis direction based on the third rotation angle; and
determining the rotation matrix between the target pose and the initial pose based on the first sub-rotation matrix, the second sub-rotation matrix and the third sub-rotation matrix.

6. The method according to claim 1, characterized in that determining the extrinsic parameter matrix between the target pose and the initial pose based on the rotation matrix comprises:
determining a translation matrix between the first virtual camera and the second virtual camera; and
determining the extrinsic parameter matrix based on the rotation matrix and the translation matrix.

7. The method according to claim 1, characterized in that determining the three-dimensional map point corresponding to the actual position based on the two-dimensional feature points and the extrinsic parameter matrix comprises:
determining a target loss value of a configured loss function;
determining, based on the target loss value and the two-dimensional feature points corresponding to the actual position, a projection function value between the coordinate system of the virtual camera and the coordinate system of the pinhole image; and
determining the three-dimensional map point corresponding to the actual position based on the extrinsic parameter matrix and the projection function value.

8. The method according to claim 1, characterized in that
the three-dimensional visual map comprises a sample global descriptor corresponding to a sample image, three-dimensional map points corresponding to the sample image, and sample local descriptors corresponding to the three-dimensional map points; wherein the sample image is a pinhole image selected from the first pinhole image and the second pinhole image;
after constructing the three-dimensional visual map of the target scene, the method further comprises:
during the global positioning process of a terminal device, acquiring a target image of the terminal device in the target scene; selecting a candidate sample image from multiple frames of sample images corresponding to the three-dimensional visual map based on the similarity between the target image and the multiple frames of sample images;
acquiring a plurality of feature points from the target image; for each feature point, determining a target three-dimensional map point corresponding to the feature point from the three-dimensional map points corresponding to the candidate sample image; and
determining a global positioning pose of the target image in the three-dimensional visual map based on the plurality of feature points and the target three-dimensional map points corresponding to the plurality of feature points.

9. The method according to claim 8, characterized in that selecting the candidate sample image from the multiple frames of sample images based on the similarity between the target image and the multiple frames of sample images corresponding to the three-dimensional visual map comprises:
determining a global descriptor to be tested corresponding to the target image, and determining the distance between the global descriptor to be tested and the sample global descriptor corresponding to each frame of sample image corresponding to the three-dimensional visual map; and
selecting the candidate sample image from the multiple frames of sample images based on the distance between the global descriptor to be tested and each sample global descriptor; wherein the distance between the global descriptor to be tested and the sample global descriptor corresponding to the candidate sample image is the minimum distance, or the distance between the global descriptor to be tested and the sample global descriptor corresponding to the candidate sample image is smaller than a distance threshold.

10. The method according to claim 8, characterized in that determining the target three-dimensional map point corresponding to the feature point from the three-dimensional map points corresponding to the candidate sample image comprises:
determining a local descriptor to be tested corresponding to the feature point, the local descriptor to be tested being used to represent a feature vector of the image block in which the feature point is located, the image block being located in the target image;
determining the distance between the local descriptor to be tested and the sample local descriptor corresponding to each three-dimensional map point corresponding to the candidate sample image; and selecting the target three-dimensional map point from the three-dimensional map points corresponding to the candidate sample image based on the distance between the local descriptor to be tested and each sample local descriptor;
wherein the distance between the local descriptor to be tested and the sample local descriptor corresponding to the target three-dimensional map point is the minimum distance, and the minimum distance is smaller than a distance threshold.

11. A map construction apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire a panoramic image of a target scene;
a generation module, configured to generate a first pinhole image corresponding to a first virtual camera based on the panoramic image; wherein the position of the first virtual camera is the sphere center position of a viewing spherical coordinate system, and the initial pose of the first virtual camera is any pose centered on the sphere center position; and
a determination module, configured to determine a rotation matrix between a target pose of a second virtual camera and the initial pose, and determine an extrinsic parameter matrix between the target pose and the initial pose based on the rotation matrix; wherein the position of the second virtual camera is the sphere center position of the viewing spherical coordinate system, and the target pose is obtained by rotating the initial pose around a coordinate axis of the viewing spherical coordinate system;
the generation module is further configured to determine a second pinhole image corresponding to the second virtual camera based on the rotation matrix; the determination module is further configured to select two-dimensional feature points corresponding to an actual position in the target scene from the first pinhole image and the second pinhole image, determine a three-dimensional map point corresponding to the actual position based on the two-dimensional feature points and the extrinsic parameter matrix, and construct a three-dimensional visual map of the target scene based on a plurality of three-dimensional map points of the target scene.

12. The apparatus according to claim 11, characterized in that:
when generating the first pinhole image corresponding to the first virtual camera based on the panoramic image, the generation module is specifically configured to: generate a viewing spherical image corresponding to the viewing spherical coordinate system based on the panoramic image; and generate the first pinhole image corresponding to the first virtual camera based on the viewing spherical image;
when generating the viewing spherical image corresponding to the viewing spherical coordinate system based on the panoramic image, the generation module is specifically configured to: determine, based on the width and height of the panoramic image, a mapping relationship between latitude and longitude coordinates in the viewing spherical image and rectangular coordinates in the panoramic image; for each latitude and longitude coordinate in the viewing spherical image, determine the rectangular coordinate corresponding to the latitude and longitude coordinate from the panoramic image based on the mapping relationship, and determine the pixel value of the latitude and longitude coordinate based on the pixel value of the rectangular coordinate; and generate the viewing spherical image based on the pixel value of each latitude and longitude coordinate in the viewing spherical image;
when generating the first pinhole image corresponding to the first virtual camera based on the viewing spherical image, the generation module is specifically configured to: determine the center point coordinates of the first pinhole image based on the width and height of the first pinhole image; determine, based on the center point coordinates and a target distance, a mapping relationship between rectangular coordinates in the first pinhole image and latitude and longitude coordinates in the viewing spherical image, the target distance being the distance between the center point of the first pinhole image and the sphere center position of the viewing spherical coordinate system; for each rectangular coordinate in the first pinhole image, determine the latitude and longitude coordinate corresponding to the rectangular coordinate from the viewing spherical image based on the mapping relationship, and determine the pixel value of the rectangular coordinate based on the pixel value of the latitude and longitude coordinate; and generate the first pinhole image based on the pixel value of each rectangular coordinate in the first pinhole image;
when determining the rotation matrix between the target pose of the second virtual camera and the initial pose, the determination module is specifically configured to: determine a first rotation angle between the target pose and the initial pose in the direction of a first coordinate axis, and determine a first sub-rotation matrix for the first coordinate axis direction based on the first rotation angle; determine a second rotation angle between the target pose and the initial pose in the direction of a second coordinate axis, and determine a second sub-rotation matrix for the second coordinate axis direction based on the second rotation angle; determine a third rotation angle between the target pose and the initial pose in the direction of a third coordinate axis, and determine a third sub-rotation matrix for the third coordinate axis direction based on the third rotation angle; and determine the rotation matrix between the target pose and the initial pose based on the first sub-rotation matrix, the second sub-rotation matrix and the third sub-rotation matrix;
when determining the extrinsic parameter matrix between the target pose and the initial pose based on the rotation matrix, the determination module is specifically configured to: determine a translation matrix between the first virtual camera and the second virtual camera; and determine the extrinsic parameter matrix based on the rotation matrix and the translation matrix;
when determining the three-dimensional map point corresponding to the actual position based on the two-dimensional feature points and the extrinsic parameter matrix, the determination module is specifically configured to: determine a target loss value of the configured loss function; determine a projection function value between the coordinate system of the virtual camera and the coordinate system of the pinhole image based on the target loss value and the two-dimensional feature points corresponding to the actual position; and determine the three-dimensional map point corresponding to the actual position based on the extrinsic parameter matrix and the projection function value;
wherein the three-dimensional visual map comprises a sample global descriptor corresponding to a sample image, three-dimensional map points corresponding to the sample image, and sample local descriptors corresponding to the three-dimensional map points; the sample image is a pinhole image selected from the first pinhole image and the second pinhole image; the apparatus further comprises a positioning module configured to: during the global positioning process of a terminal device, acquire a target image of the terminal device in the target scene; select a candidate sample image from the multiple frames of sample images corresponding to the three-dimensional visual map based on the similarity between the target image and the multiple frames of sample images; acquire a plurality of feature points from the target image; for each feature point, determine a target three-dimensional map point corresponding to the feature point from the three-dimensional map points corresponding to the candidate sample image; and determine a global positioning pose of the target image in the three-dimensional visual map based on the plurality of feature points and the target three-dimensional map points corresponding to the plurality of feature points;
when selecting the candidate sample image from the multiple frames of sample images based on the similarity between the target image and the multiple frames of sample images corresponding to the three-dimensional visual map, the positioning module is specifically configured to: determine a global descriptor to be tested corresponding to the target image, and determine the distance between the global descriptor to be tested and the sample global descriptor corresponding to each frame of sample image corresponding to the three-dimensional visual map; and select the candidate sample image from the multiple frames of sample images based on the distance between the global descriptor to be tested and each sample global descriptor; wherein the distance between the global descriptor to be tested and the sample global descriptor corresponding to the candidate sample image is the minimum distance, or the distance between the global descriptor to be tested and the sample global descriptor corresponding to the candidate sample image is smaller than a distance threshold;
when determining the target three-dimensional map point corresponding to the feature point from the three-dimensional map points corresponding to the candidate sample image, the positioning module is specifically configured to: determine a local descriptor to be tested corresponding to the feature point, the local descriptor to be tested being used to represent a feature vector of the image block in which the feature point is located, the image block being located in the target image; determine the distance between the local descriptor to be tested and the sample local descriptor corresponding to each three-dimensional map point corresponding to the candidate sample image; and select the target three-dimensional map point from the three-dimensional map points corresponding to the candidate sample image based on the distance between the local descriptor to be tested and each sample local descriptor; wherein the distance between the local descriptor to be tested and the sample local descriptor corresponding to the target three-dimensional map point is the minimum distance, and the minimum distance is smaller than a distance threshold.

13. A map construction device, characterized by comprising: a processor and a machine-readable storage medium, the machine-readable storage medium storing machine-executable instructions executable by the processor; the processor being configured to execute the machine-executable instructions to implement the method steps of any one of claims 1-10.
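As an illustration of the latitude-longitude mapping recited in claims 3 and 4, the sketch below samples a pinhole image from an equirectangular panorama: each pinhole pixel is back-projected through the sphere center, converted to longitude and latitude on the viewing sphere, and looked up in the panorama. The function name, the nearest-neighbour sampling, and the specific pixel-to-angle conventions are illustrative assumptions, not the application's exact formulas.

```python
import numpy as np

def pinhole_from_equirect(pano, out_w, out_h, f, R=np.eye(3)):
    """Sample a pinhole image from an equirectangular panorama.

    f is the target distance (focal length in pixels) between the pinhole
    image plane and the sphere center; R rotates the virtual camera away
    from its initial pose. Nearest-neighbour sampling for brevity."""
    H, W = pano.shape[:2]
    cx, cy = (out_w - 1) / 2.0, (out_h - 1) / 2.0            # center point
    u, v = np.meshgrid(np.arange(out_w), np.arange(out_h))
    rays = np.stack([u - cx, v - cy, np.full_like(u, f, dtype=float)], -1)
    rays = rays @ R.T                                         # rotate onto the view sphere
    x, y, z = rays[..., 0], rays[..., 1], rays[..., 2]
    lon = np.arctan2(x, z)                                    # longitude in [-pi, pi]
    lat = np.arcsin(y / np.linalg.norm(rays, axis=-1))        # latitude in [-pi/2, pi/2]
    px = ((lon / (2 * np.pi) + 0.5) * (W - 1)).round().astype(int)
    py = ((lat / np.pi + 0.5) * (H - 1)).round().astype(int)
    return pano[py.clip(0, H - 1), px.clip(0, W - 1)]
```

Passing a different rotation `R` for the second virtual camera yields the second pinhole image from the same panorama, which is what makes the two views' extrinsic parameters known by construction.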
CN202111348552.8A 2021-11-15 2021-11-15 Map construction method, device and equipment Active CN114187344B (en)

Publications (2)

Publication Number Publication Date
CN114187344A true CN114187344A (en) 2022-03-15
CN114187344B CN114187344B (en) 2025-08-22

Family

ID=80540081

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111348552.8A Active CN114187344B (en) 2021-11-15 2021-11-15 Map construction method, device and equipment

Country Status (1)

Country Link
CN (1) CN114187344B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8363928B1 (en) * 2010-12-24 2013-01-29 Trimble Navigation Ltd. General orientation positioning system
CN104748746A (en) * 2013-12-29 2015-07-01 刘进 Attitude determination and virtual reality roaming method of intelligent machine
CN105474033A (en) * 2013-12-29 2016-04-06 刘进 Attitude determination, panoramic image generation and target recognition methods for intelligent machine
CN108364252A (en) * 2018-01-12 2018-08-03 深圳市粒视界科技有限公司 A kind of correction of more fish eye lens panorama cameras and scaling method
CN109087244A (en) * 2018-07-26 2018-12-25 贵州火星探索科技有限公司 A kind of Panorama Mosaic method, intelligent terminal and storage medium
WO2020103075A1 (en) * 2018-11-22 2020-05-28 深圳印象认知技术有限公司 Image processing method and device
CN111126304A (en) * 2019-12-25 2020-05-08 鲁东大学 Augmented reality navigation method based on indoor natural scene image deep learning
CN113205591A (en) * 2021-04-30 2021-08-03 北京奇艺世纪科技有限公司 Method and device for acquiring three-dimensional reconstruction training data and electronic equipment

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114674328A (en) * 2022-03-31 2022-06-28 北京百度网讯科技有限公司 Map generation method, device, electronic device, storage medium, and vehicle
CN115272470A (en) * 2022-08-01 2022-11-01 影石创新科技股份有限公司 Camera positioning method and device, computer equipment and storage medium
CN115933718A (en) * 2022-11-07 2023-04-07 武汉大学 Unmanned aerial vehicle autonomous flight technical method integrating panoramic SLAM and target recognition
CN116402892A (en) * 2023-03-31 2023-07-07 阿里巴巴(中国)有限公司 Positioning method, device, equipment and program product
EP4553765A1 (en) * 2023-11-07 2025-05-14 Check&Visit A method of estimating a pose of a panoramic image
WO2025099170A1 (en) * 2023-11-07 2025-05-15 Check&Visit A method of estimating a pose of a panoramic image

Also Published As

Publication number Publication date
CN114187344B (en) 2025-08-22

Similar Documents

Publication Publication Date Title
US11748906B2 (en) Gaze point calculation method, apparatus and device
CN111586360B (en) Unmanned aerial vehicle projection method, device, equipment and storage medium
CN114187344B (en) Map construction method, device and equipment
CN110568447B (en) Visual positioning method, device and computer readable medium
JP6768156B2 (en) Virtually enhanced visual simultaneous positioning and mapping systems and methods
US20190012804A1 (en) Methods and apparatuses for panoramic image processing
TWI795885B (en) Visual positioning method, device and computer-readable storage medium
CN110799921A (en) Filming method, device and drone
WO2016199605A1 (en) Image processing device, method, and program
EP3274964B1 (en) Automatic connection of images using visual features
CN114120301B (en) Method, device and apparatus for determining posture
CN111737518A (en) Image display method and device based on three-dimensional scene model and electronic equipment
US10977810B2 (en) Camera motion estimation
CN114185073A (en) Pose display method, device and system
CN114882106A (en) Pose determination method and device, equipment and medium
CN110544278B (en) Rigid body motion capture method and device and AGV pose capture system
CN116170689B (en) Video generation method, device, computer equipment and storage medium
Huttunen et al. A monocular camera gyroscope
CN120374668A (en) VSLAM mapping method, device, computer equipment and storage medium
JP2005063012A (en) Omnidirectional camera motion and three-dimensional information restoration method, apparatus and program thereof, and recording medium recording the same
Yang et al. A fast and effective panorama stitching algorithm on UAV aerial images
WO2018150086A2 (en) Methods and apparatuses for determining positions of multi-directional image capture apparatuses
JP3548652B2 (en) Apparatus and method for restoring object shape
Amorós et al. Towards relative altitude estimation in topological navigation tasks using the global appearance of visual information
Butt et al. Multi-task learning for camera calibration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant