
CN118864786B - Aircraft visual navigation method based on consistent semantic constraint instance segmentation matching - Google Patents

Aircraft visual navigation method based on consistent semantic constraint instance segmentation matching

Info

Publication number
CN118864786B
Authority
CN
China
Prior art keywords
image
real
target
matching
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411353326.2A
Other languages
Chinese (zh)
Other versions
CN118864786A (en)
Inventor
Teng Xichao (滕锡超)
Ye Yibin (叶熠彬)
Li Zhang (李璋)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202411353326.2A priority Critical patent/CN118864786B/en
Publication of CN118864786A publication Critical patent/CN118864786A/en
Application granted granted Critical
Publication of CN118864786B publication Critical patent/CN118864786B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/003Navigation within 3D models or images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757Matching configurations of points or features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/17Terrestrial scenes taken from planes or by drones

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Remote Sensing (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract


The present application relates to an aircraft visual navigation method based on consistent semantic constraint instance segmentation matching. The method comprises: using a general image marking model to extract the ground-object target sets in a reference image and a real-time image respectively; using an open-set target detection model to detect, on the original images, the bounding boxes of all elements in the ground-object target set; using a general image segmentation model to segment the contour of each target within its bounding box; performing feature point matching on regions that share the same semantic meaning across the real-time image, the reference image and the navigation target library to obtain matching point pair information; and, after establishing the relationship between the two-dimensional matching points on the real-time image and the corresponding three-dimensional information of the reference image, computing the current camera position and attitude with the PnP algorithm using the onboard camera intrinsics. The method exploits regions of identical semantics to assist image matching, and improves the matching robustness and positioning accuracy of the aircraft under conditions such as low-altitude flight and large-angle oblique observation.

Description

Aircraft visual navigation method based on consistent semantic constraint instance segmentation matching
Technical Field
The application relates to the technical field of aircraft navigation, in particular to an aircraft visual navigation method based on consistent semantic constraint instance segmentation matching.
Background
Aircraft visual navigation technology mainly images the ground with visual sensors (visible light, infrared, etc.) carried on a flight platform and, combined with a reference image containing geographic position information and an image matching algorithm, estimates navigation parameters such as the pose of the aircraft. Existing visual navigation technology can be divided into relative visual navigation and absolute visual navigation. Relative visual navigation methods, chiefly visual odometry (VO) and simultaneous localization and mapping (SLAM), match image sequences and estimate the relative pose of the aircraft from the geometric relationship between consecutive frames. Absolute visual navigation methods match a real-time image captured by the aircraft against a reference image carrying geographic coordinate information, and then solve the absolute pose of the aircraft from the reference image information and an imaging model. Relative visual navigation methods need to build a map of the unknown flight environment and work well indoors, but the outdoor flight conditions of unmanned aerial vehicles (flight heights above 100 meters) and their observation mode (monocular vision) can hardly support SLAM in performing synchronous three-dimensional mapping and accurate positioning. Absolute navigation based on reference-map matching is widely applied in many fields, but these algorithms mainly suit high-altitude, near-nadir observation scenes; under low-altitude flight (the range of main interest in the current low-altitude economy being above 100 meters and below 1 km) and large-angle oblique observation, existing methods still struggle to achieve high-precision matching navigation between the real-time image and the reference map, owing to the three-dimensional relief of targets and viewpoint differences. In addition, urban scene structures are complex and varied and repeated textures abound; most existing matching methods pay no attention to the semantic features of the matching regions, so many mismatched point pairs appear on non-rigid or non-fixed targets such as trees, water areas and moving objects, making accurate pose solutions hard to obtain.
For the wide-baseline image matching problem caused by large viewpoint differences and low overlap rates, traditional image matching methods (such as SIFT, SURF and ORB) can hardly achieve robust matching. Deep-learning-based image matching models have made a series of advances on the wide-baseline problem in recent years, and models represented by SuperPoint, SuperGlue, LoFTR and DKM show some robustness to viewpoint differences, but still cannot meet the requirements of high-precision visual navigation. To fully exploit the strength of deep learning in image content understanding, scholars have in recent years used models from other visual tasks to mitigate the influence of viewpoint differences and low overlap rates on image matching. Chen Ying et al. first compute the overlapping area of the images to be matched with a deep learning method and then perform feature point matching within that area. Zhao Chenhao et al. run YOLOv5 target detection on the image, extract feature points centered on the target boxes, fuse semantic and positional information into the feature encoder, and finally remove mismatched points by checking the semantic consistency of matching point pairs. Zhang Yesheng et al. adopt a surface-to-point matching strategy: Semantic and Geometry Area Matching (SGAM) first matches regions based on semantic and geometric features and then matches within the regions at the pixel level, so the overall semantic and geometric information of an image region provides a prior for subsequent feature point matching. Zhang Yesheng et al. also proposed Matching Everything by Segmenting Anything (MESA), which first segments the image with the Segment Anything Model (SAM) and then matches the segmented regions, achieving higher registration accuracy than direct matching. In addition, the Matching Anything by Segmenting Anything (MASA) framework proposed by Li Siyuan et al., which likewise exploits the powerful segmentation capability of the SAM model, attains robust inter-frame matching and tracking of diverse targets in video sequences by jointly training image segmentation and instance-level matching-and-tracking tasks. Among these methods, Chen Ying et al. must train an overlap-region discrimination model in advance and cannot eliminate the negative effect of viewpoint differences within the overlap region; Zhao Chenhao et al. use YOLOv5, which can neither recognize and detect all target types in an image nor provide finer-grained region segmentation, so a target region still contains much interfering background; and SGAM, MESA and MASA do not use the explicit semantic information of image regions or specific targets. Ignoring explicit semantics avoids interference from region misrecognition in subsequent matching, but it also forfeits the strong constraint that explicit semantic information imposes on the matching process, so matching robustness remains insufficient for aircraft remote sensing images with large viewpoint differences, low overlap rates and diverse targets.
Disclosure of Invention
Based on the above, it is necessary to provide an aircraft visual navigation method based on consistent semantic constraint instance segmentation matching that can improve the matching robustness and positioning accuracy of an aircraft under conditions such as low-altitude flight and large-angle oblique observation.
An aircraft visual navigation method based on consistent semantic constraint instance segmentation matching, the method comprising:
The method comprises the steps of acquiring an aerospace remote sensing image data set and a general image basic model trained according to the aerospace remote sensing image data set, wherein the aerospace remote sensing image data set comprises a plurality of ground scene optical images shot by airborne cameras of an aerospace vehicle;
Establishing a navigation target library mainly comprising rigid targets and plane area targets, taking an aerial optical image with geocoding and ground elevation information as a reference image, taking a ground scene optical image as a real-time image, and respectively extracting the reference image and ground object information in the real-time image according to a general image marking model to obtain target sets simultaneously existing in the real-time image, the reference image and the navigation target library;
detecting, with an open-set target detection model, the positions on the real-time image and the reference image of all semantic elements in the target set, to obtain target box sets of the different elements;
segmenting the contours within all target boxes in the target box set with a general image segmentation model to obtain target region information;
performing feature point matching on regions with the same semantics in the real-time image and the reference image, based on the target region information and an image matching algorithm, to obtain matching point pairs between the reference image and the real-time image;
establishing, from the matching point pairs between the reference image and the real-time image, the relationship between the two-dimensional matching points on the real-time image and the corresponding three-dimensional information of the reference image; computing the position and attitude of the current camera with the PnP algorithm using the onboard camera intrinsics; and then computing the pose of the aircraft from the translation-rotation relationship between the camera coordinate system and the aircraft coordinate system, thereby realizing visual navigation of the aircraft.
According to the aircraft visual navigation method based on consistent semantic constraint instance segmentation matching, a navigation target library consisting mainly of rigid targets and planar-area targets is first established; an aerospace optical image with geocoding and ground elevation information serves as the reference image and a ground scene optical image as the real-time image, and ground-object information is extracted from both according to the general image marking model, yielding the target set present simultaneously in the real-time image, the reference image and the navigation target library. Constructing the navigation database in advance screens out, ahead of time, the targets in the flight scene that are common, fixed, and rigid or planar; detecting, segmenting and matching such targets further improves the practicality and matching accuracy of the algorithm. The target types are then fed to the open-set target detection model to obtain target box positions, and the target text together with a spatial prompt is fed to the general image segmentation model to obtain fine-grained segmentation results for the corresponding targets. General-purpose image models thus realize semantic extraction and instance segmentation of the reference and real-time images during visual navigation, while yielding fine contour information for all distribution areas of every key target in the remote sensing image. Performing feature point matching separately on identical semantic regions of the real-time and reference images makes full use of region semantics as a prior, overcoming interference from viewpoint changes, illumination changes, low overlap rates, sensor modality differences and the like, and avoiding mismatches between different semantic regions. In conclusion, the method can effectively improve the matching robustness and positioning accuracy of image-matching-based visual navigation under conditions such as low-altitude flight and large-angle oblique observation. The application is suitable for monocular visual navigation tasks on various flight platforms such as unmanned aerial vehicles and airships, and has broad application prospects and economic value.
Drawings
FIG. 1 is a flow diagram of an aircraft visual navigation method based on consistent semantic constraint instance segmentation matching in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In one embodiment, as shown in fig. 1, there is provided an aircraft visual navigation method based on consistent semantic constraint instance segmentation matching, comprising the steps of:
Step 102, acquiring an aerospace remote sensing image dataset and a general image basic model trained according to the aerospace remote sensing image dataset, wherein the aerospace remote sensing image dataset comprises a plurality of ground scene optical images shot by airborne cameras of an aerospace vehicle, and the general image basic model comprises a general image marking model, an open set target detection model and a general image segmentation model.
An aerospace remote sensing image dataset is acquired, comprising ground scene optical images captured by multiple aerospace platforms (unmanned aerial vehicles and satellites) together with labels such as ground-object target types, target boxes and target instance segmentation information. After the dataset is acquired, the general image marking model, the open-set target detection model and the general image segmentation model are trained or fine-tuned on it; all three are deep-learning-based neural network models. Because remote sensing imagery has its own modality characteristics, training or fine-tuning the corresponding models on labeled aerospace remote sensing images improves their generalization on remote sensing images.
Step 104, a navigation target library mainly comprising rigid targets and plane area targets is established, an aerial optical image with geocoding and ground elevation information is used as a reference image, a ground scene optical image is used as a real-time image, and ground object information in the reference image and the real-time image is respectively extracted according to a general image marking model to obtain target sets which are simultaneously existing in the real-time image, the reference image and the navigation target library.
The reference map carries absolute geographic coordinate information, such as longitude and latitude, and ground elevation information (a digital elevation model, DEM, or a digital surface model, DSM). The general image marking model includes, but is not limited to, the Recognize Anything Model (RAM), provided it can identify all typical object targets in the remote sensing image.
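As an illustrative sketch only (not part of the claimed method), tag extraction with RAM could look like the following. It assumes the open-source recognize-anything package; the checkpoint name, image size and helper functions are assumptions that may differ between releases.

    import torch
    from PIL import Image
    from ram import get_transform, inference_ram
    from ram.models import ram

    # Assumed checkpoint and arguments from the recognize-anything repository.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = ram(pretrained="ram_swin_large_14m.pth", image_size=384, vit="swin_l")
    model = model.eval().to(device)
    transform = get_transform(image_size=384)

    def extract_tags(path: str) -> set[str]:
        """Return the set of ground-object tags RAM recognizes in one image."""
        image = transform(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
        tags_en, _ = inference_ram(image, model)  # e.g. "building | road | tree"
        return {t.strip() for t in tags_en.split("|")}

    tags_realtime = extract_tags("realtime.png")    # target set of the real-time image
    tags_reference = extract_tags("reference.png")  # target set of the reference image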
Step 106: positions of all semantic elements in the target set are detected on the real-time image and the reference image using the open-set target detection model, yielding target box sets for the different elements.
The open-set target detection model includes, but is not limited to, the Grounding-DINO model, provided it can detect the corresponding target locations in the image from an input text prompt (returned in the form of target rectangular boxes).
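For illustration, open-set detection with Grounding-DINO might be wired up as below. This is a sketch assuming the reference groundingdino package from the IDEA-Research repository; the config/checkpoint paths, the caption contents and the thresholds are assumptions.

    import numpy as np
    from groundingdino.util.inference import load_model, load_image, predict

    # Assumed file names; substitute the paths of the actual release being used.
    model = load_model("GroundingDINO_SwinT_OGC.py", "groundingdino_swint_ogc.pth")
    image_source, image = load_image("realtime.png")

    # One caption can carry several target texts from the intersection set,
    # separated by " . " as Grounding-DINO expects.
    boxes, logits, phrases = predict(
        model=model, image=image, caption="building . road . football field .",
        box_threshold=0.35, text_threshold=0.25)

    # Boxes come back normalized as (cx, cy, w, h); convert to pixel-coordinate
    # corners so they can be handed to the segmentation step.
    h, w = image_source.shape[:2]
    b = boxes.cpu().numpy() * np.array([w, h, w, h])
    corners = np.stack([b[:, 0] - b[:, 2] / 2, b[:, 1] - b[:, 3] / 2,
                        b[:, 0] + b[:, 2] / 2, b[:, 1] + b[:, 3] / 2], axis=1)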
Step 108: the contours within all target boxes in the target box set are segmented using the general image segmentation model, yielding target region information.
The general image segmentation model includes, but is not limited to, the Segment Anything Model (SAM), which can generate arbitrary image segmentation masks from text or spatial prompts, can segment images fully automatically without such prompts, and can adapt zero-shot to new target segmentation tasks.
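As a concrete sketch, box-prompted segmentation with the official segment-anything package could proceed as follows; the checkpoint name is an assumption, and image_source and corners carry over from the detection sketch above.

    from segment_anything import SamPredictor, sam_model_registry

    sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
    predictor = SamPredictor(sam)
    predictor.set_image(image_source)  # HxWx3 uint8 RGB image

    masks = []
    for box in corners:  # pixel-coordinate (x0, y0, x1, y1) boxes from detection
        m, scores, _ = predictor.predict(box=box, multimask_output=False)
        masks.append(m[0])  # boolean HxW mask for this target instance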
Semantic extraction and instance segmentation of the reference and real-time images during visual navigation proceed as follows. Common fixed targets in the flight scene are analyzed in advance, and a navigation target library consisting mainly of rigid targets and planar-area targets is established (such targets are more amenable to detection, segmentation and matching). An aerospace optical image with geocoding and ground elevation information serves as the reference image, and an optical image of the ground captured by the aircraft in real time serves as the real-time image. The general image marking model extracts the ground-object information of each, yielding the target set present simultaneously in the real-time image, the reference image and the navigation target library. For every semantic element in this ground-object target set, the open-set target detection model locates the element on the real-time and reference images, producing per-element sets of target boxes. Finally, for every target box, the general image segmentation model segments the target contour within the box, giving more accurate target region information.
Step 110: feature point matching is performed on regions with the same semantics in the real-time image and the reference image, based on the target region information and the image matching algorithm, yielding matching point pairs between the reference image and the real-time image.
Such image matching algorithms include, but are not limited to, traditional feature point matching algorithms (e.g., SIFT, SURF, ORB) and deep learning models (e.g., SuperPoint + SuperGlue, D2-Net, DKM, LoFTR).
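As a concrete, non-normative illustration with classical features, OpenCV's SIFT can be confined to one semantic region by passing the instance mask to detectAndCompute, so that correspondences never cross semantic boundaries; the ratio threshold is an illustrative choice, and the images are assumed to be 8-bit grayscale arrays.

    import cv2
    import numpy as np

    def match_region(img_rt, img_ref, mask_rt, mask_ref, ratio=0.75):
        """Match SIFT features between two regions with the same semantics."""
        sift = cv2.SIFT_create()
        # Masks restrict detection to the segmented target region.
        kp1, des1 = sift.detectAndCompute(img_rt, mask_rt.astype(np.uint8) * 255)
        kp2, des2 = sift.detectAndCompute(img_ref, mask_ref.astype(np.uint8) * 255)
        if des1 is None or des2 is None:
            return []
        matcher = cv2.BFMatcher(cv2.NORM_L2)
        pairs = []
        for knn in matcher.knnMatch(des1, des2, k=2):
            # Lowe's ratio test to discard ambiguous matches
            if len(knn) == 2 and knn[0].distance < ratio * knn[1].distance:
                pairs.append((kp1[knn[0].queryIdx].pt, kp2[knn[0].trainIdx].pt))
        return pairs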
And 112, establishing a relation between two-dimensional matching points on the real-time graph and corresponding three-dimensional information of the reference graph according to the matching point pairs between the reference graph and the real-time graph, calculating the position and the posture of the current camera by combining the internal parameters of the airborne camera through a PnP algorithm, and calculating the pose of the aircraft according to the translational rotation relation between the camera coordinate system and the aircraft coordinate so as to realize visual navigation of the aircraft.
Feature point matching is performed between regions of the real-time and reference images that share the same semantics, giving the corresponding matching point pair information. With the onboard camera intrinsics and the geographic coordinates and three-dimensional information of the reference image known, the current position and attitude of the onboard camera (the camera capturing the real-time image) are computed from these matching point pairs with a 2D-3D PnP algorithm, and the pose of the aircraft is then computed from the translation-rotation relationship between the camera coordinate system and the aircraft coordinate system. PnP pose solving from 2D-3D matching point pairs is a standard procedure in visual navigation, with methods including the direct linear transform (DLT), OPnP and EPnP. To further improve the robustness of the PnP algorithm, the PnP solution is constrained using the aircraft's (error-bearing) inertial navigation parameters, which avoids PnP falling into local extrema caused by factors such as the observation geometry.
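A minimal pose-solving sketch with OpenCV's RANSAC-wrapped PnP follows; K denotes the known onboard camera intrinsic matrix, the EPnP flag and 3-pixel reprojection threshold are illustrative choices, and pts_3d/pts_2d are the 2D-3D correspondences described above.

    import cv2
    import numpy as np

    def solve_camera_pose(pts_3d, pts_2d, K, dist=None):
        """Solve camera position/attitude from 2D-3D matches with RANSAC PnP."""
        ok, rvec, tvec, inliers = cv2.solvePnPRansac(
            np.asarray(pts_3d, np.float64), np.asarray(pts_2d, np.float64),
            K, dist, flags=cv2.SOLVEPNP_EPNP, reprojectionError=3.0)
        if not ok:
            raise RuntimeError("PnP failed: degenerate geometry or too few matches")
        R, _ = cv2.Rodrigues(rvec)       # rotation: world frame -> camera frame
        cam_pos = (-R.T @ tvec).ravel()  # camera position in the world frame
        return R, cam_pos, inliers

The aircraft pose then follows by composing this camera pose with the known camera-to-body translation and rotation.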
In the aircraft visual navigation method based on consistent semantic constraint instance segmentation matching, a navigation target library consisting mainly of rigid targets and planar-area targets is first established; an aerospace optical image with geocoding and ground elevation information serves as the reference image and a ground scene optical image as the real-time image, and ground-object information is extracted from both according to the general image marking model, yielding the target set present simultaneously in the real-time image, the reference image and the navigation target library. Constructing the navigation database in advance screens out the common, fixed, rigid or planar-area targets in the flight scene ahead of time; detecting, segmenting and matching these targets further improves the practicality and matching accuracy of the algorithm. The target types are then fed to the open-set target detection model to obtain target box positions, and finally the target text and spatial prompt are fed to the general image segmentation model to obtain fine-grained segmentation results of the corresponding targets. General-purpose image models thus realize semantic extraction and instance segmentation of the reference and real-time images during visual navigation while producing fine contour information of all distribution areas of every key target in the remote sensing image. Matching feature points separately on identical semantic regions of the real-time and reference images makes full use of region semantics as a prior, overcoming interference from viewpoint changes, illumination changes, low overlap rates, sensor modality differences and the like, and avoiding mismatches between different semantic regions. The method suits monocular visual navigation tasks on various flight platforms such as unmanned aerial vehicles and airships, and has broad application prospects and economic value.
In one embodiment, the general image marking model is a neural network model based on deep learning, and the method for respectively extracting ground feature information in the reference map and the real-time map according to the general image marking model to obtain target sets simultaneously existing in the real-time map, the reference map and the navigation target library comprises the following steps:
identifying all typical object targets in the input image with the general image marking model, which returns the target set T_r of the real-time image and the target set T_b of the reference image, and intersecting these with the navigation target library T_d to obtain the target set T present simultaneously in the real-time image, the reference image and the navigation target library:

T = T_r ∩ T_b ∩ T_d = {t_1, t_2, …, t_n};

where T_r and T_b denote the target texts in the real-time image and the reference image respectively, and each t_i denotes a target text with the same semantic meaning in the real-time image, the reference image and the navigation target library.
In particular embodiments, the target text is, for example, "building", "road", "football field" and the like.
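In code, the intersection of the preceding formula is a one-liner over the three tag sets; the example tags below are made up (the tag-extraction sketch earlier would produce such sets).

    # Tags from the marking model (T_r, T_b) and the pre-built library (T_d)
    tags_realtime = {"building", "road", "football field", "tree"}
    tags_reference = {"building", "road", "football field", "river"}
    target_library = {"building", "road", "football field", "bridge"}

    # T = T_r ∩ T_b ∩ T_d: target texts present in all three sources
    targets = tags_realtime & tags_reference & target_library
    print(sorted(targets))  # ['building', 'football field', 'road']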
In one embodiment, the open set target detection model is a neural network model based on deep learning, and the method for detecting the positions of all semantic elements in the target set on the real-time graph and the reference graph by using the open set target detection model to obtain the target frame set of different elements comprises the following steps:
detecting, with the open-set target detection model, the targets named by the input text prompt and returning their positions in the form of rectangular boxes, where for a target type t_i in the real-time image, the position of its j-th instance is the box B_r^{i,j} = {(x_k, y_k) | k = 1, …, 4}, j = 1, …, N_r^i, with N_r^i denoting the number of targets of type t_i in the real-time image and (x_k, y_k) the pixel coordinates of the four corner points of the rectangular box;

for a target type t_i in the reference image, the position of its j-th instance is likewise the box B_b^{i,j} = {(x_k, y_k) | k = 1, …, 4}, j = 1, …, N_b^i, with N_b^i denoting the number of targets of type t_i in the reference image and (x_k, y_k) the pixel coordinates of the four corner points of the rectangular box.
In one embodiment, the general image segmentation model can generate arbitrary image segmentation masks from text or spatial prompts, can segment images fully automatically without such prompts, and can adapt zero-shot to new target segmentation tasks; segmenting the contours within all target boxes in the target box set with the general image segmentation model to obtain target region information comprises:
for a target type t_i in the real-time image, taking the text of the target type as the prompt word and applying the general image segmentation model within each rectangular box B_r^{i,j} to obtain the corresponding target region mask M_r^{i,j}; performing the same operation for target type t_i in the reference image to obtain the corresponding target region mask M_b^{i,j}.
In one embodiment, the image matching algorithm includes, but is not limited to, a traditional feature point matching algorithm or a deep learning model, and performing feature point matching on regions with the same semantics in the real-time and reference images based on the target region information and the image matching algorithm to obtain the matching point pairs between the reference image and the real-time image comprises:
for a target t_i in the set T, with its corresponding region in the real-time image being M_r^i and its corresponding region in the reference image being M_b^i, using the image matching algorithm to match feature points between the region M_r^i and the region M_b^i, which share the same semantics, to obtain the matching point correspondences {(p_r^{i,k}, p_b^{i,k})}, where p_r^{i,k} and p_b^{i,k} denote the k-th matched feature point of target t_i on the real-time image and on the reference image respectively, thereby obtaining the matching point pairs between the reference image and the real-time image.
In one embodiment, establishing the relationship between the two-dimensional matching points on the real-time image and the corresponding three-dimensional information of the reference image from the matching point pairs between the reference image and the real-time image, and then computing the position and attitude of the current camera with the PnP algorithm in combination with the onboard camera intrinsics, comprises:

from the matching point pairs between the reference image and the real-time image, establishing the relationship between the two-dimensional matching points on the real-time image and the corresponding three-dimensional information of the reference image, and, with the onboard camera intrinsics and the geographic coordinates and three-dimensional information of the reference image known, solving the current position and attitude of the onboard camera with the 2D-3D PnP algorithm.
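To make the 2D-3D construction concrete, the sketch below lifts reference-image match points to geographic 3D points using a GDAL-style geotransform and a DEM assumed to be aligned with the reference map; all numeric values are placeholders.

    import numpy as np

    geotransform = (500000.0, 0.5, 0.0, 3380000.0, 0.0, -0.5)  # assumed UTM, 0.5 m GSD
    dem = np.full((4000, 4000), 45.0)                # placeholder elevation grid, metres
    ref_pts = [(1024.3, 2048.7), (1100.0, 2100.5)]   # reference-side match pixels

    def ref_pixel_to_3d(u, v, geotransform, dem):
        """Map a reference-image pixel (u, v) to a geographic 3D point."""
        x0, dx, _, y0, _, dy = geotransform
        X = x0 + u * dx                 # easting of the pixel centre
        Y = y0 + v * dy                 # northing of the pixel centre
        Z = float(dem[int(v), int(u)])  # ground elevation sampled at the pixel
        return np.array([X, Y, Z])

    # Reference-side points become 3D; their paired real-time-image pixels stay 2D,
    # forming the 2D-3D correspondences consumed by the PnP solver sketched above.
    pts_3d = [ref_pixel_to_3d(u, v, geotransform, dem) for (u, v) in ref_pts]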
It should be understood that, although the steps in the flowchart of FIG. 1 are shown in an order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in FIG. 1 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and these sub-steps or stages are not necessarily performed sequentially; they may be performed in turn or alternately with at least a portion of other steps, or of the sub-steps or stages of other steps.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; nevertheless, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this specification.
The above examples illustrate only a few embodiments of the application and are described in detail, but they are not to be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the application, and these all fall within the scope of the application. Accordingly, the scope of protection of the present application is determined by the appended claims.

Claims (6)

1. An aircraft visual navigation method based on consistent semantic constraint instance segmentation matching, the method comprising:
acquiring an aerospace remote sensing image dataset and a general image basic model trained according to the aerospace remote sensing image dataset, wherein the aerospace remote sensing image dataset comprises a plurality of ground scene optical images shot by an airborne camera of an aerospace vehicle;
Establishing a navigation target library mainly comprising rigid targets and plane area targets, taking an air-sky optical image with geocoding and ground elevation information as a reference image, taking the ground scene optical image as a real-time image, and respectively extracting ground object information in the reference image and the real-time image according to a general image marking model to obtain target sets which are simultaneously existing in the real-time image, the reference image and the navigation target library;
detecting, with an open-set target detection model, the positions on the real-time image and the reference image of all semantic elements in the target set, to obtain target box sets of the different elements;
segmenting the contours within all target boxes in the target box set with a general image segmentation model to obtain target region information;
performing feature point matching on regions with the same semantics in the real-time image and the reference image, based on the target region information and the image matching algorithm, to obtain matching point pairs between the reference image and the real-time image;
establishing, from the matching point pairs between the reference image and the real-time image, the relationship between the two-dimensional matching points on the real-time image and the corresponding three-dimensional information of the reference image; computing the position and attitude of the current camera with the PnP algorithm using the onboard camera intrinsics; and then computing the pose of the aircraft from the translation-rotation relationship between the camera coordinate system and the aircraft coordinate system, thereby realizing visual navigation of the aircraft.
2. The method of claim 1, wherein the generic image marking model is a neural network model based on deep learning, wherein extracting ground feature information in the reference map and the real-time map respectively according to the generic image marking model obtains a set of objects simultaneously existing in the real-time map, the reference map and a navigation object library, comprises:
identifying all typical object targets in the input image with the general image marking model, which returns the target set T_r of the real-time image and the target set T_b of the reference image, and intersecting these with the navigation target library T_d to obtain the target set T present simultaneously in the real-time image, the reference image and the navigation target library:

T = T_r ∩ T_b ∩ T_d = {t_1, t_2, …, t_n};

where T_r and T_b denote the target texts in the real-time image and the reference image respectively, and each t_i denotes a target text with the same semantic meaning in the real-time image, the reference image and the navigation target library.
3. The method of claim 1, wherein the open set object detection model is a neural network model based on deep learning, and detecting positions of all semantic elements in the object set on the real-time graph and the reference graph by using the open set object detection model to obtain an object frame set of different elements comprises:
detecting, with the open-set target detection model, the targets named by the input text prompt and returning their positions in the form of rectangular boxes, where for a target type t_i in the real-time image, the position of its j-th instance is the box B_r^{i,j} = {(x_k, y_k) | k = 1, …, 4}, j = 1, …, N_r^i, with N_r^i denoting the number of targets of type t_i in the real-time image and (x_k, y_k) the pixel coordinates of the four corner points of the rectangular box;

for a target type t_i in the reference image, the position of its j-th instance is likewise the box B_b^{i,j} = {(x_k, y_k) | k = 1, …, 4}, j = 1, …, N_b^i, with N_b^i denoting the number of targets of type t_i in the reference image and (x_k, y_k) the pixel coordinates of the four corner points of the rectangular box.
4. A method according to any one of claims 1 to 3, wherein the generic image segmentation model is capable of generating arbitrary image segmentation masks from text or spatial cues, of segmenting images fully automatically without text or spatial cues, and of adapting to new object segmentation tasks with zero samples, and wherein segmenting contours in all object boxes in the set of object boxes with the generic image segmentation model, obtaining object region information, comprises:
for a target type t_i in the real-time image, taking the text of the target type as the prompt word and applying the general image segmentation model within each rectangular box B_r^{i,j} to obtain the corresponding target region mask M_r^{i,j}; performing the same operation for target type t_i in the reference image to obtain the corresponding target region mask M_b^{i,j}.
5. The method of claim 4, wherein the image matching algorithm includes, but is not limited to, a conventional feature point matching algorithm or a deep learning model, wherein the step of performing feature point matching on regions having the same semantic meaning in the real-time graph and the reference graph based on the target region information and the image matching algorithm to obtain a matching point pair between the reference graph and the real-time graph includes:
for a target t_i in the set T, with its corresponding region in the real-time image being M_r^i and its corresponding region in the reference image being M_b^i, using the image matching algorithm to match feature points between the region M_r^i and the region M_b^i, which share the same semantics, to obtain the matching point correspondences {(p_r^{i,k}, p_b^{i,k})}, where p_r^{i,k} and p_b^{i,k} denote the k-th matched feature point of target t_i on the real-time image and on the reference image respectively, thereby obtaining the matching point pairs between the reference image and the real-time image.
6. The method of claim 1, wherein establishing the relationship between the two-dimensional matching points on the real-time image and the corresponding three-dimensional information of the reference image from the matching point pairs between the reference image and the real-time image, and then computing the position and attitude of the current camera with the PnP algorithm in combination with the onboard camera intrinsics, comprises:
from the matching point pairs between the reference image and the real-time image, establishing the relationship between the two-dimensional matching points on the real-time image and the corresponding three-dimensional information of the reference image, and, with the onboard camera intrinsics and the geographic coordinates and three-dimensional information of the reference image known, solving the current position and attitude of the onboard camera with the 2D-3D PnP algorithm.
CN202411353326.2A 2024-09-26 2024-09-26 Aircraft visual navigation method based on consistent semantic constraint instance segmentation matching Active CN118864786B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411353326.2A CN118864786B (en) 2024-09-26 2024-09-26 Aircraft visual navigation method based on consistent semantic constraint instance segmentation matching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411353326.2A CN118864786B (en) 2024-09-26 2024-09-26 Aircraft visual navigation method based on consistent semantic constraint instance segmentation matching

Publications (2)

Publication Number Publication Date
CN118864786A CN118864786A (en) 2024-10-29
CN118864786B (en) 2024-11-29

Family

ID=93175497

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411353326.2A Active CN118864786B (en) 2024-09-26 2024-09-26 Aircraft visual navigation method based on consistent semantic constraint instance segmentation matching

Country Status (1)

Country Link
CN (1) CN118864786B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102353377A (en) * 2011-07-12 2012-02-15 北京航空航天大学 High altitude long endurance unmanned aerial vehicle integrated navigation system and navigating and positioning method thereof
CN110672088A (en) * 2019-09-09 2020-01-10 北京航空航天大学 A UAV autonomous navigation method imitating the homing mechanism of carrier pigeon landform perception

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118644554B (en) * 2024-07-31 2024-10-18 中国人民解放军国防科技大学 Aircraft navigation method based on monocular depth estimation and ground characteristic point matching


Also Published As

Publication number Publication date
CN118864786A (en) 2024-10-29

Similar Documents

Publication Publication Date Title
Taneja et al. City-scale change detection in cadastral 3d models using images
CN107093205B (en) A kind of three-dimensional space building window detection method for reconstructing based on unmanned plane image
CN111882612A (en) A vehicle multi-scale localization method based on 3D laser detection of lane lines
Majdik et al. Air‐ground matching: Appearance‐based GPS‐denied urban localization of micro aerial vehicles
CN109598794B (en) Construction method of three-dimensional GIS dynamic model
US20200364554A1 (en) Systems and methods for deep localization and segmentation with a 3d semantic map
CN113516664A (en) A Visual SLAM Method Based on Semantic Segmentation of Dynamic Points
JP2022520019A (en) Image processing methods, equipment, mobile platforms, programs
Taneja et al. Geometric change detection in urban environments using images
WO2021017211A1 (en) Vehicle positioning method and device employing visual sensing, and vehicle-mounted terminal
Gao et al. Ground and aerial meta-data integration for localization and reconstruction: A review
CN115717894A (en) A high-precision vehicle positioning method based on GPS and common navigation maps
Ardeshir et al. Geo-semantic segmentation
CN113838129B (en) Method, device and system for obtaining pose information
CN111060924A (en) SLAM and target tracking method
CN103697883B (en) A kind of aircraft horizontal attitude defining method based on skyline imaging
He et al. Ground and aerial collaborative mapping in urban environments
CN106127791A (en) A kind of contour of building line drawing method of aviation remote sensing image
CN117036484B (en) Visual positioning and mapping method, system, equipment and medium based on geometry and semantics
Xiao et al. Geo-spatial aerial video processing for scene understanding and object tracking
Matei et al. Image to lidar matching for geotagging in urban environments
KR102824305B1 (en) Method and System for change detection and automatic updating of road marking in HD map through IPM image and HD map fitting
CN115597592A (en) Comprehensive positioning method applied to unmanned aerial vehicle inspection
Majdik et al. Micro air vehicle localization and position tracking from textured 3d cadastral models
CN118864786B (en) Aircraft visual navigation method based on consistent semantic constraint instance segmentation matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant