
CN120849650B - Remote sensing image searching method, device and computer readable storage medium - Google Patents

Remote sensing image searching method, device and computer readable storage medium

Info

Publication number
CN120849650B
CN120849650B CN202511370079.1A
Authority
CN
China
Prior art keywords
remote sensing
sensing image
wide
regions
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202511370079.1A
Other languages
Chinese (zh)
Other versions
CN120849650A (en)
Inventor
李超
许诺
姚柯璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202511370079.1A priority Critical patent/CN120849650B/en
Publication of CN120849650A publication Critical patent/CN120849650A/en
Application granted granted Critical
Publication of CN120849650B publication Critical patent/CN120849650B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55Clustering; Classification
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/042Knowledge-based neural networks; Logical representations of neural networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/098Distributed learning, e.g. federated learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/0985Hyperparameter optimisation; Meta-learning; Learning-to-learn
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V10/763Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract


This application provides a remote sensing image search method, device, and computer-readable storage medium. It acquires wide-area remote sensing images and language instructions describing the search target. By fusing cross-modal semantic information from the wide-area remote sensing image and the language instructions, semantic alignment is achieved. This allows the language instructions to guide the search process, significantly improving the efficiency and accuracy of target sub-region discovery in massive data and complex environments, providing a foundation for efficient target localization. Under limited resource constraints, it dynamically plans a query sequence of multiple sub-regions in the wide-area remote sensing image, prioritizing the exploration of high-value areas and effectively balancing search and utilization. When executing the query sequence, it sequentially queries the corresponding sub-regions in the wide-area remote sensing image, ultimately outputting a set of sub-regions containing the target that match the language instructions. Through multi-modal information collaboration capabilities, it improves the overall system response speed and intelligence level, making it suitable for resource-constrained scenarios such as space-based Earth observation.

Description

Remote sensing image searching method, device and computer readable storage medium
Technical Field
The present application relates to the field of remote sensing image processing technologies, and in particular, to a remote sensing image searching method, apparatus and computer readable storage medium.
Background
As spatial, temporal, and spectral resolution continue to increase, the scale of image data produced by Earth observation systems has grown dramatically. This poses great challenges to the traditional processing mode of "space-based sensing, ground-based computing, manual decision-making": pressure on network transmission and ground information-processing systems rises rapidly, leading to long task-response cycles, low data utilization, and an insufficient level of intelligence in task management and control, which ultimately limits the system's effectiveness in rapidly changing environments. Large-scale data incurs transmission and processing delays, resulting in substantial waste of resources. Meanwhile, existing observation systems lack autonomy and adaptability, so task planning and execution are markedly delayed; in particular, when responding to emergencies such as earthquakes and debris flows, the latest image data cannot be acquired in time, causing irrecoverable losses. Advances in space technology have created an urgent demand for a "space-based sensing, space-based computing, space-based decision-making" model. In this mode, the information-processing module is deployed directly on an intelligent satellite cluster, shifting processing from the ground to space; image-processing algorithms are executed actively on orbit, markedly reducing the transmission load on the space-ground network and ultimately improving the flexibility and response speed of task execution. However, how to quickly locate critical areas in massive wide-area remote sensing images remains one of the key technical challenges for a space-based Earth observation system.
Disclosure of Invention
In order to overcome the problems in the related art, the present specification provides a remote sensing image searching method, apparatus and computer readable storage medium.
In a first aspect, a remote sensing image searching method is provided, the method includes:
Acquiring a wide-area remote sensing image and a language instruction describing a search target;
Dynamically planning sequences to be queried of a plurality of subareas in the wide-area remote sensing image under preset resource constraint by fusing cross-modal semantic information of the wide-area remote sensing image and the language instruction;
and executing the sequence to be queried, sequentially querying corresponding subareas in the wide-area remote sensing image, and outputting a subarea set matched with the language instruction and containing the target.
According to the remote sensing image searching method provided by the application, the cross-modal semantic information of the wide area remote sensing image and the language instruction is fused, and the sequence to be queried of a plurality of subareas in the wide area remote sensing image is dynamically planned under the preset resource constraint, and the method comprises the following steps:
semantic alignment is carried out on the wide-area remote sensing image and the language instruction, and a fused multi-modal feature representation is obtained;
constructing a current search state based on the multi-modal feature representation, the historical search results and the remaining resources;
And under the preset resource constraint, generating sequences to be queried of a plurality of subareas in the wide-area remote sensing image according to the current search state.
According to the remote sensing image searching method provided by the application, the sequence to be queried of a plurality of subareas in the wide area remote sensing image is generated according to the current searching state and is realized through a pre-trained searching strategy model;
The search strategy model is configured to be trained through reinforcement learning to maximize cumulative rewards under resource constraints, the rewards being determined based on whether targets are present within the queried sub-region.
According to the remote sensing image searching method provided by the application, the method further comprises the following steps:
clustering the image space according to semantic features of a plurality of subareas in the wide-area remote sensing image, and constructing a graph model representing the relation among clustered areas;
generating graph guidance features for macroscopic searching based on the graph model;
The constructing the current search state comprises the following steps:
and constructing a current search state based on the multi-modal feature representation, the graph guide feature, the historical search result and the residual resources.
According to the remote sensing image searching method provided by the application, the nodes of the graph model represent a plurality of clustering areas obtained by clustering the image space, and the edges of the graph model represent the association relation among the clustering areas.
According to the remote sensing image searching method provided by the application, the characteristic representation of the node in the graph model is dynamically updated according to the historical searching result in the inquiring process.
According to the remote sensing image searching method provided by the application, the resource constraint comprises query times constraint and/or movement cost constraint among sub-areas.
According to the remote sensing image searching method provided by the application, the movement cost constraint is determined based on Manhattan distance between subareas.
In a second aspect, an electronic device is provided, including a memory and a processor, where the memory stores a computer program, and when the computer program is executed by the processor, the remote sensing image searching method according to the first aspect is implemented.
In a third aspect, a computer readable storage medium is provided, where a remote sensing image search program is stored, and when executed, implements any one of the remote sensing image search methods described in the first aspect above.
The application also provides a computer program product comprising a computer program which when executed by a processor implements a remote sensing image search method as described in any one of the above.
Compared with existing approaches, which struggle to rapidly locate a specific target in a wide-area remote sensing image, the remote sensing image searching method, device, and computer-readable storage medium provided by this application have the following beneficial effects:
In the first aspect, the method acquires the wide-area remote sensing image and a language instruction describing the search target, and achieves semantic alignment by fusing cross-modal semantic information of the image and the instruction. The language instruction thereby guides the search process, significantly improving the efficiency and accuracy of target sub-region discovery under massive data and complex environments, and laying a foundation for efficient localization of the target.
In the second aspect, the method dynamically plans the query sequence over multiple sub-regions of the wide-area remote sensing image under limited resource constraints, preferentially exploring high-value regions and effectively balancing exploration and exploitation. When executing the query sequence, it queries the corresponding sub-regions in turn and finally outputs the set of target-containing sub-regions matching the language instruction. Through multi-modal information coordination, the overall response speed and intelligence level of the system are improved, making the method suitable for resource-constrained scenarios such as space-based Earth observation.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the specification and together with the description, serve to explain the principles of the specification.
Fig. 1 is a flowchart of a remote sensing image searching method according to an exemplary embodiment of the present disclosure.
Fig. 2 is a flowchart of a first embodiment of a remote sensing image searching method according to an exemplary embodiment of the present disclosure.
Fig. 3 is a flowchart of a second embodiment of a remote sensing image searching method according to an exemplary embodiment of the present disclosure.
Fig. 4 is a schematic diagram of a remote sensing image searching apparatus according to an exemplary embodiment of the present disclosure.
Fig. 5 is a schematic block diagram of a remote sensing image searching apparatus according to an exemplary embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments (or "implementations") of the present application will be clearly and completely described herein with reference to the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated.
If there are terms (e.g., upper, lower, left, right, front, rear, inner, outer, top, bottom, center, vertical, horizontal, longitudinal, lateral, length, width, counterclockwise, clockwise, axial, radial, circumferential, etc.) related to directional indications or positional relationships in embodiments of the present application, such terms are used merely to explain the relative positional relationships, movement, etc. between the components at a particular pose (as shown in the drawings), and if the particular pose is changed, the directional indications or positional relationships are correspondingly changed. In addition, the terms "first", "second", etc. in the embodiments of the present application are used for descriptive convenience only and are not to be construed as indicating or implying relative importance.
The application provides a remote sensing image searching method, remote sensing image searching equipment and a computer readable storage medium. The present application will be described in detail with reference to the accompanying drawings. The features of the examples and embodiments described below may be combined with each other.
Wide-area remote sensing images typically cover a broad extent of ground features (a single image may reach, e.g., 30000 x 30000 pixels), and quickly locating specific targets (e.g., vehicles, buildings, disaster areas) in such images is a challenging task.
Existing non-myopic search techniques use a deep-reinforcement-learning-based visual active exploration framework for wide-area geospatial exploration and introduce meta-learning to improve the model's adaptability and efficiency on new tasks. However, such methods rely solely on visual cues, have limited exploration ability in complex environments, and search inefficiently.
In order to solve the above technical problems, the present disclosure provides a remote sensing image searching method, and referring to fig. 1, fig. 1 is a schematic flow chart of a remote sensing image searching method according to an embodiment of the present disclosure.
The method aims at guiding a visual search process by utilizing language instructions, and dynamically planning a search path through reinforcement learning so as to realize efficient positioning of targets in a wide-area remote sensing image under resource constraint.
In other words, through VLAS (Visual-Language Active Search), sub-regions of the wide-area remote sensing image are screened sequentially by priority under the guidance of a specific language instruction, covering as many target objects as possible within budget or resource constraints. Furthermore, the inherent spatial correlation of adjacent sub-regions provides important clues to the search process. By integrating a machine learning model that predicts target labels with a customized algorithmic strategy, the method balances exploration (improving model efficiency) and exploitation (finding more targets) under resource limitations.
It should be noted that this remote sensing image search scheme is deployed directly on an embodied intelligent satellite cluster, shifting processing from the ground to space; the remote sensing image search algorithm is executed actively on orbit, markedly reducing the transmission load on the space-ground network and ultimately improving the flexibility and response speed of task execution.
The present application provides a first embodiment of a remote sensing image searching method, and referring to fig. 2, fig. 2 is a schematic flow chart of the first embodiment of the remote sensing image searching method provided in the embodiments of the present specification.
Specifically, the method comprises the following steps 101 to 103:
In step 101, a wide-area remote sensing image and language instructions describing a search target are acquired.
First, a search scene and parameters are initialized. The method comprises the following steps:
Acquire the wide-area remote sensing image and divide it into grid cells to obtain a plurality of sub-regions {x_1, x_2, ..., x_N}, where N is the total number of grid cells into which the image is divided and each x_i represents a sub-region of the wide-area image. Key sub-regions containing the target are then screened out from these sub-regions, achieving efficient localization of the target.
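The grid division above can be sketched as follows; the cell size and image dimensions are illustrative assumptions, not values fixed by this application.

```python
def partition_into_grid(width, height, cell):
    """Split a width x height wide-area image footprint into grid-cell
    bounding boxes (left, top, right, bottom); edge cells are clipped."""
    regions = []
    for top in range(0, height, cell):
        for left in range(0, width, cell):
            regions.append((left, top,
                            min(left + cell, width),
                            min(top + cell, height)))
    return regions

# e.g. a 30000 x 30000 pixel image with 3000-pixel cells yields N = 100 sub-regions
cells = partition_into_grid(30000, 30000, 3000)
```

Each bounding box plays the role of one sub-region x_i; the row-major index of the box serves as the sub-region number used by the query sequence.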
At the same time, a language instruction describing the search target is acquired.
The language instruction is used to explicitly define the visual target to be searched; its semantic content includes, but is not limited to, descriptions of the target object category and its optional visual attributes, spatial relationships, or scene context, such as "identify areas where large vehicles are present". In the remote sensing image search, the language instruction serves as prior knowledge to guide the semantic understanding and decision direction of the entire subsequent visual search process.
As one example, the language instructions may be, but are not limited to, in the form of text, speech, standard instructions generated by a structured template such as a form, and the like.
Further, a preset resource constraint for searching is set.
As an example, the resource constraints include a query number constraint and/or a movement cost constraint between sub-regions.
Preset a total query budget B, and initialize the historical search results h_0, the residual query budget b_0 = B, and the time step t = 0. The query budget must jointly account for the number of query executions and the movement cost between consecutive sub-regions to be queried, where the movement cost constraint is determined based on the Manhattan distance between the sub-regions to be queried.
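A minimal bookkeeping sketch of this initialization; the ternary history marks and the Manhattan-distance movement cost follow the description, while the class and variable names are illustrative assumptions.

```python
UNSEARCHED, TARGET_FOUND, TARGET_ABSENT = 0, 1, -1  # ternary history marks

class SearchState:
    """Minimal search-state bookkeeping: ternary per-region history,
    residual query budget, time step, and Manhattan movement cost."""
    def __init__(self, num_regions, total_budget, grid_cols):
        self.history = [UNSEARCHED] * num_regions  # h_0: nothing searched yet
        self.budget = total_budget                 # b_0 = B
        self.t = 0                                 # time step
        self.grid_cols = grid_cols                 # grid width, for indexing

    def manhattan(self, i, j):
        # sub-region number -> (row, col), then |d_row| + |d_col|
        ri, ci = divmod(i, self.grid_cols)
        rj, cj = divmod(j, self.grid_cols)
        return abs(ri - rj) + abs(ci - cj)
```

For a 10 x 10 grid, moving from region 0 (top-left) to region 99 (bottom-right) costs a Manhattan distance of 18, which is later charged against the residual budget.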
With this setting, under limited resource and budget constraints, high-value areas can be searched preferentially, effectively balancing exploration and exploitation.
Step 102, dynamically planning to-be-queried sequences of a plurality of subareas in the wide-area remote sensing image under preset resource constraint by fusing cross-modal semantic information of the wide-area remote sensing image and the language instruction.
Facing the tens to hundreds of terabytes of data generated daily by a global satellite network, VLAS fuses multi-modal information so that random search is unnecessary: given only a specific language instruction, the system can combine the visual observations of a drone or remote sensing satellite with human text prompts and preferentially explore high-probability areas, focusing effort on the sub-regions most likely to contain the target. This allows key sub-regions of a complex wide-area image to be locked onto rapidly under massive data and limited computing power, markedly optimizing the resource allocation of subsequent tasks (such as target detection).
In some embodiments, the dynamically planning the query sequence of the plurality of sub-regions in the wide-area remote sensing image under the preset resource constraint by fusing the cross-modal semantic information of the wide-area remote sensing image and the language instruction includes the following steps 1021 to 1023:
In step 1021, semantic alignment is performed on the wide-area remote sensing image and the language instruction, so as to obtain a fused multi-modal feature representation.
As an example, a cross-modal encoder E based on CLIP or another backbone extracts image features and language instruction features separately, and multi-modal semantic fusion is achieved through feature alignment.
Specifically, the image x and the language instruction l are encoded as follows:
The image x is encoded as image features F_v, preserving spatial location information.
The language instruction l is encoded as instruction features F_l in R^d, where d is the language feature dimension.
In step 1022, a current search state is constructed based on the multi-modal representation, historical search results, and remaining resources.
The image features F_v, the instruction features F_l, the historical search results h_0, and the residual budget b_0 are fused to generate the initial state s_0.
The historical search results use a ternary mark that records three states for each sub-region: searched with target present, searched with target absent, and unsearched.
In step 1023, under a preset resource constraint, generating a sequence to be queried of a plurality of sub-areas in the wide-area remote sensing image according to the current search state.
In some embodiments, the generating the to-be-queried sequences of the plurality of sub-regions in the wide-area remote sensing image according to the current search state is implemented through a pre-trained search strategy model;
The search strategy model outputs an action according to the current search state and selects the sub-regions to be queried. For ease of description, the VLAS search strategy model is referred to as the controller and denoted π.
Illustratively, at time step t, the controller π generates an action a_t according to the current state s_t, selecting the numbers of k sub-regions to be queried, where k is the number of sub-regions per query. π can be a decision neural network whose input is the current state and whose output is the action, i.e., a_t = π(s_t).
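A toy sketch of such a selection step: the per-region scores stand in for the policy network's outputs, and masking out already-searched regions is an illustrative detail, not a requirement stated by this application.

```python
def controller_step(scores, history, k=1):
    """Pick the k highest-scoring sub-regions that have not been searched.
    `scores` stands in for the policy network's per-region logits;
    `history` uses the ternary marks 0 (unsearched), 1 (found), -1 (absent)."""
    candidates = [i for i, h in enumerate(history) if h == 0]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    return candidates[:k]

history = [0, 1, 0, -1, 0]          # regions 1 and 3 were already queried
scores = [0.2, 0.9, 0.7, 0.5, 0.1]  # region 1 scores high but is excluded
action = controller_step(scores, history, k=2)  # selects regions 2 and 0
```

In a trained system, the scores would come from the decision network evaluated on the fused state s_t rather than from a fixed list.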
And 103, executing the sequence to be queried, sequentially querying corresponding subareas in the wide-area remote sensing image, and outputting a subarea set matched with the language instruction and containing the target.
The actions in the obtained query sequence are applied to the sub-region selection operation: the image data of each sub-region specified in the sequence is processed in turn, according to the sub-region numbers given by the query sequence. Illustratively, for each sub-region, a visual perception model is invoked for analysis, determining whether its content matches the search target described by the natural language instruction and generating a binary query result (target present / target absent). Illustratively, the visual perception model may be, but is not limited to, an object detection model (e.g., YOLO, Fast R-CNN) or an image classification model.
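The binary query step can be stubbed as below; `detector` stands in for a real perception model (the description names YOLO and Fast R-CNN as examples), and the function and variable names are illustrative assumptions.

```python
def query_subregion(region_image, detector, target_class):
    """Run a perception model on one sub-region crop and reduce its output
    to the binary result used by the search loop (target present / absent)."""
    detections = detector(region_image)   # e.g. a list of detected class labels
    return target_class in detections

# stand-in detector: pretends large vehicles appear in this crop
fake_detector = lambda img: ["large_vehicle", "building"]
present = query_subregion(None, fake_detector, "large_vehicle")
```

A real deployment would replace `fake_detector` with detector inference on the sub-region pixels and may also return confidence scores; only the present/absent bit is needed by the reward computation.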
After the whole query sequence is executed, the system integrates all successful query results to generate a position information set containing all sub-areas hitting targets, and the position information set is used as the final output of the active search task. Wherein the set of location information may be, but is not limited to, e.g., a set of coordinate lists or sub-region bounding boxes.
In some embodiments, the search policy model is configured to be trained by reinforcement learning to maximize cumulative rewards under resource constraints, the rewards being determined based on whether targets are present within the queried sub-region.
The action a_t is applied to sub-region selection, and the following operations are performed:
a. Acquire the instant reward. According to the target-presence mark y_{a_t}, the reward is computed as r_t = y_{a_t}, where y_{a_t} = 1 indicates the presence of a target object and y_{a_t} = 0 indicates its absence.
b. Update the historical search results. For the explored sub-region with number a_t, the ternary mark is set according to the query outcome, yielding the new historical search results h_{t+1}.
That is, the mark of a queried sub-region is set to the "target present" value when the target is confirmed present, to the "target absent" value when it is confirmed absent, and remains the "unsearched" value for sub-regions not yet searched.
c. Update the residual budget. The movement cost is computed from the Manhattan distance between a_{t-1} and a_t, and the residual budget is updated to b_{t+1}, where a_{t-1} is the sub-region selected in the previous step.
d. Update the current search state. The new state s_{t+1} is generated.
Thereafter, training data are collected: the transition tuple (s_t, a_t, r_t, s_{t+1}) is recorded for subsequent training of the controller π.
The search process terminates when the termination condition is met. As an example, if the residual budget b_t has been exhausted, the search flow terminates; otherwise, let t = t + 1 and return to step 102 to continue the iteration.
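Steps a to d and the termination test can be combined into one loop sketch. The reward, the ternary history update, the Manhattan movement cost, and the budget test follow the description above; all names, the unit per-query cost, and the demo inputs are illustrative assumptions.

```python
from types import SimpleNamespace

def active_search(controller, oracle, state, move_cost):
    """Run the query loop until the budget is exhausted.
    `oracle(i)` returns True iff sub-region i contains the target;
    `move_cost(i, j)` is the Manhattan-distance cost between regions."""
    transitions, prev, hits = [], None, []
    while state.budget > 0:
        i = controller(state)                  # a_t: next sub-region to query
        found = oracle(i)                      # binary query result
        state.history[i] = 1 if found else -1  # ternary history update
        if found:
            hits.append(i)
        cost = 1 + (move_cost(prev, i) if prev is not None else 0)
        state.budget -= cost                   # charge query + movement cost
        state.t += 1
        transitions.append((i, 1 if found else 0))  # (a_t, r_t) for training
        prev = i
    return hits, transitions

# demo on a 1 x 4 strip of regions with budget 6
state = SimpleNamespace(history=[0, 0, 0, 0], budget=6, t=0)
controller = lambda s: s.history.index(0)      # greedy: next unsearched region
oracle = lambda i: i in (1, 3)                 # ground truth for the demo only
hits, transitions = active_search(controller, oracle, state,
                                  lambda a, b: abs(a - b))
```

A full implementation would record the complete (s_t, a_t, r_t, s_{t+1}) tuples rather than just (a_t, r_t), and the controller would be the trained policy network π instead of a greedy rule.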
In some embodiments, the search path is optimized by reinforcement learning strategies.
As one example, a controller is trained using Reinforcement Learning (RL) in combination with Supervised Learning (SL).
First, a loss value of a loss function is calculated.
Based on the collected transition tuples (s_t, a_t, r_t, s_{t+1}), the total loss is calculated as a weighted sum of the reinforcement learning loss L_RL and the supervised learning loss L_SL:
L = L_RL + λ·L_SL,
where λ is a hyperparameter.
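One way to realize this weighted sum, assuming a REINFORCE-style surrogate for the RL term and a negative log-likelihood for the SL term (both concrete forms are our assumptions, not specified by the patent), is:

```python
import math

# Sketch of the weighted objective L = L_RL + lam * L_SL. The RL term is a
# REINFORCE-style surrogate (-log pi(a|s) * return) and the SL term a simple
# negative log-likelihood; both concrete forms are assumptions.
def rl_loss(log_probs, returns):
    return -sum(lp * g for lp, g in zip(log_probs, returns)) / len(returns)

def sl_loss(pred_probs, labels):
    return -sum(math.log(p[y]) for p, y in zip(pred_probs, labels)) / len(labels)

def total_loss(log_probs, returns, pred_probs, labels, lam=0.5):
    return rl_loss(log_probs, returns) + lam * sl_loss(pred_probs, labels)
```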
Next, the controller parameters are updated.
Based on the previously calculated loss value, the parameters of the policy controller π_θ and the basic backbone network f_φ are updated by the back-propagation algorithm, with gradients computed so as to maximize the cumulative reward.
Finally, through iterative execution of sub-region selection, strategy optimization, and controller parameter updating, a target-discovery result that maximizes discoveries within the budget limit is output.
That is, the serialized sub-region selections {A_1, …, A_T} are output, where A_t denotes the sub-region set selected at step t, finally realizing maximum discovery of the target object under the budget constraint.
Through the above embodiment, a wide-area remote sensing image and a language instruction describing the search target are acquired, and semantic alignment is achieved by mapping image sub-regions and the language instruction into the same semantic space. On this basis, a reinforcement learning agent (the controller π_θ) is constructed; it decides which sub-region to query next according to the fused multi-modal state (image features, language features, historical search state, and residual budget), and continuously optimizes the search strategy via reward signals. Rapid localization of the sub-region containing the target is thus achieved. Meanwhile, fusing the cross-modal semantic information of the wide-area remote sensing image and the language instruction markedly enhances adaptability to complex scenes, so that targets can be efficiently located in complex wide-area images.
The present application provides a second embodiment of a remote sensing image searching method, and referring to fig. 3, fig. 3 is a schematic flow chart of the second embodiment of the remote sensing image searching method provided in the embodiments of the present specification.
The method is a graph-enhanced visual-language active search (PAGE) method that improves upon the basic visual-language active search (VLAS) method. By introducing a hierarchical controller architecture and a dynamic graph model, it achieves efficient target sub-region localization in complex scenes. For convenience of description, the search strategy model adopting VLAS is called the controller and is denoted by π_θ.
Specifically, the method comprises the following steps 201 to 203:
step 201, acquiring a wide-area remote sensing image and a language instruction describing a search target.
An aerial or satellite image I is received and partitioned into blocks {x_1, …, x_N}, and a language instruction L is input, where N is the total number of grid cells into which the image is divided and each x_i represents a sub-region of the wide-area image.
A preset total query budget B is set, and the historical search results h_0, the residual query budget B_0 = B, the initial queried sub-region features, and the time step t = 0 are initialized. The query budget comprehensively accounts for both the number of query executions and the Manhattan distance between successively queried sub-regions.
Step 202, dynamically planning to-be-queried sequences of a plurality of subareas in the wide-area remote sensing image under preset resource constraint by fusing cross-modal semantic information of the wide-area remote sensing image and the language instruction.
First, the image I and the language instruction L are encoded through CLIP or another basic cross-modal encoder f_φ:
The image I is encoded as image features F_I, preserving spatial location information.
The language instruction L is encoded as instruction features F_L ∈ R^d, where d is the language feature dimension.
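Once both modalities live in a shared d-dimensional space, a cosine-similarity score between each sub-region feature and the instruction feature illustrates the alignment; the encoder itself is abstracted away here, and the random features below are placeholders, not outputs of any real model.

```python
import numpy as np

# Illustrative sketch of cross-modal alignment: sub-region features and the
# instruction feature share one d-dimensional space, and cosine similarity
# scores each sub-region against the instruction.
def cosine_scores(region_feats, text_feat):
    r = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    t = text_feat / np.linalg.norm(text_feat)
    return r @ t

rng = np.random.default_rng(0)
F_I = rng.normal(size=(4, 8))   # N=4 placeholder sub-region features, d=8
F_L = rng.normal(size=8)        # placeholder instruction feature
scores = cosine_scores(F_I, F_L)
```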
Next, a knowledge graph G = (V, E) based on visual-language data is constructed, where the nodes V represent the mean features of clustered sub-regions and the edges E represent feature similarity between classes. The knowledge graph models commonalities among sub-region features, thereby improving the accuracy of active search.
In some embodiments, the image space is clustered according to semantic features of a plurality of sub-regions in the wide-area remote sensing image, a graph model representing relations among the clustered regions is built, graph guidance features for macroscopic search are generated based on the graph model, and a current search state is built based on the multi-modal feature representation, the graph guidance features, historical search results, and residual resources.
The nodes of the graph model represent a plurality of clustering areas obtained by clustering the image space, and the edges of the graph model represent association relations among the clustering areas.
Specifically, the construction process adopts clustered-region division: the semantic features of all sub-regions of all images in a training set are clustered by the K-Means algorithm and divided into K cluster regions, numbered 1, …, K.
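A toy sketch of this clustered-region division, using a minimal K-Means loop in place of a library implementation (a production system would use an off-the-shelf K-Means):

```python
import numpy as np

# Minimal K-Means sketch: group sub-region feature vectors into k cluster regions
# by alternating nearest-center assignment and center recomputation.
def kmeans(feats, k, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(feats[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = feats[labels == j].mean(axis=0)
    return labels, centers
```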
Each graph node v_k is defined for a cluster region k and is composed of the average image features and average language features of all sub-regions in that region. The average image features are obtained by extracting the sub-region pixels, encoding them with the cross-modal encoder, and averaging; the average language features are extracted from the sub-region categories via the cross-modal encoder.
Meanwhile, the adjacency probability between clustered regions, i.e., the average probability that their sub-regions are physically adjacent, is normalized by the Sinkhorn-Knopp algorithm to generate a normalized adjacency matrix, which is then defined as the edges of the graph model. Finally, a graph convolutional network (GCN) is built on the graph G; the GCN processes the graph structure according to the queried sub-region features to extract the graph guidance feature g_t.
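The Sinkhorn-Knopp normalization step admits a compact sketch: rows and columns of the nonnegative adjacency-probability matrix are alternately rescaled until the matrix is approximately doubly stochastic.

```python
import numpy as np

# Sinkhorn-Knopp sketch: alternately rescale rows and columns of a positive
# matrix until both row sums and column sums are (approximately) 1; the result
# serves as the normalized adjacency (edge) matrix of the graph model.
def sinkhorn_knopp(mat, iters=200):
    m = mat.astype(float).copy()
    for _ in range(iters):
        m /= m.sum(axis=1, keepdims=True)  # rows sum to 1
        m /= m.sum(axis=0, keepdims=True)  # columns sum to 1
    return m
```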
Thereafter, the image features F_I, the instruction features F_L, the graph guidance feature g_t, the historical search results h_t, and the residual budget B_t are fused to generate the current search state s_t.
And under the preset resource constraint, generating sequences to be queried of a plurality of subareas in the wide-area remote sensing image according to the current search state.
As one example, sub-region features are calculated first.
At time step t, all sub-regions in the sub-region set A_{t-1} selected at step t - 1 are each encoded by f_φ, and the feature mean is calculated to obtain the queried sub-region feature.
The node sequence numbers in the graph corresponding to the current sub-region and the target sub-region are then identified.
The current region is the cluster node whose average image feature is nearest to the current sub-region feature under the Euclidean distance, and the target region is the cluster node whose language feature best matches the one-hot vector of the target-object category to be searched in the language instruction.
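Assuming node features are stored as arrays, the two identification rules can be sketched as follows; variable names are illustrative, not the patent's:

```python
import numpy as np

# Sketch of node identification: the current node is the cluster whose average
# image feature is nearest (Euclidean) to the current sub-region feature; the
# target node is the cluster whose language feature best matches the
# instruction's one-hot category vector.
def current_node(node_img_feats, region_feat):
    return int(np.linalg.norm(node_img_feats - region_feat, axis=1).argmin())

def target_node(node_lang_feats, onehot):
    return int((node_lang_feats @ onehot).argmax())
```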
At time step t, the hierarchical controller π_θ generates an action a_t according to the current search state s_t, selecting the sequence numbers of k sub-regions to be queried to form the sequence to be queried, where k is the number of sub-regions per query.
In addition, π_θ is a decision neural network: it computes graph features through the graph convolutional network and then calculates the sub-region selection according to VLAS.
And 203, executing the sequence to be queried, sequentially querying corresponding subareas in the wide-area remote sensing image, and outputting a subarea set matched with the language instruction and containing the target.
The action a_t is applied to sub-region selection, and the following operations are performed:
The instant reward is acquired. A reward r_t is computed from the target-presence label y_t, where y_t = 1 indicates that a target object is present and y_t = 0 indicates that it is absent.
The historical search results are updated. For each explored sub-region, i.e., each queried sequence number i, the history entry h(i) is set according to the query outcome, yielding the new historical search results h_{t+1}. In summary, h(i) = 1 for a sub-region confirmed after the query to contain the target, h(i) = 0 for one confirmed not to, and h(i) remains at its initial value if not yet searched.
The residual budget is updated. The movement cost is calculated from the Manhattan distance d(a_t, a_{t-1}) between the currently and previously selected sub-regions, and the residual budget B_{t+1} is obtained by deducting this movement cost (together with the query execution cost) from B_t, where a_{t-1} is the sub-region selected in the previous step.
The state is updated, generating the new state s_{t+1}.
The feature representations of the nodes in the graph model are also dynamically updated according to the historical search results during the query process. In particular, the average image feature v_k of the cluster center covering the current sub-region is adjusted in real time according to the selected sub-region feature f, namely v_k ← (1 − α)·v_k + α·f, where α is an adjustment hyperparameter, typically set to a small positive real number near 0.
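Reading the node update as an exponential-moving-average adjustment (our interpretation of the stripped formula), a minimal sketch is:

```python
# EMA-style node update sketch: nudge a cluster center's average image feature
# toward the newly queried sub-region feature with a small step alpha.
# The EMA form is an assumed reading of the patent's update rule.
def update_node(center, region_feat, alpha=0.05):
    return [(1 - alpha) * c + alpha * f for c, f in zip(center, region_feat)]
```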
If the residual budget satisfies B_{t+1} ≤ 0, the search flow terminates; otherwise, let t ← t + 1 and return to step 202 to continue the iteration.
During the search, training data is collected: the transition tuple (s_t, a_t, r_t, s_{t+1}) is recorded for subsequent optimization of the controller π_θ.
The specific optimization process is as follows:
Based on the collected transition tuples (s_t, a_t, r_t, s_{t+1}), the total loss is calculated as the weighted sum of the reinforcement learning loss L_RL and the supervised learning loss L_SL, L = L_RL + λ·L_SL, where λ is a hyperparameter.
The parameters of the controller π_θ and the basic backbone network f_φ are updated by the back-propagation algorithm, with gradients computed so as to maximize the cumulative reward.
Through the above process, the serialized sub-region selections {A_1, …, A_T} are output, where A_t denotes the sub-region set selected at step t, finally realizing maximum discovery of the target object under the budget constraint.
Through this embodiment, graph-structure modeling and a hierarchical controller are introduced on the basis of VLAS, and a macroscopic region graph is constructed through clustering, enhancing semantic understanding and strategic planning capability in complex scenes. Finer decision making is realized in complex wide-area images, and the efficiency and accuracy of target localization are markedly improved.
Compared with current methods for quickly locating a specific target in a wide-area remote sensing image, the method, device, and computer-readable storage medium of the present application have the following beneficial effects:
In the first aspect, the wide-area remote sensing image and the language instruction describing the search target are acquired, and semantic alignment is achieved by fusing their cross-modal semantic information, so that the search process is guided by the language instruction. This markedly improves the efficiency and accuracy of target sub-region discovery under massive data and complex environments, and provides a foundation for efficient target localization.
In the second aspect, the sequences to be queried of a plurality of sub-regions in the wide-area remote sensing image are dynamically planned under limited resource constraints, high-value regions are explored preferentially, and exploration and exploitation are balanced effectively. When the sequence to be queried is executed, the corresponding sub-regions in the wide-area remote sensing image are queried in turn, and a sub-region set matching the language instruction and containing the target is finally output. Through this multi-modal coordination capability, the response speed and intelligence level of the system are improved as a whole, making the method suitable for resource-limited scenarios such as space-based earth observation.
Based on the same application conception as the method, the embodiment of the application also provides a remote sensing image searching device. As shown in fig. 4, the apparatus includes:
the information acquisition module is used for acquiring the wide-area remote sensing image and a language instruction describing a search target;
the feature fusion module is used for dynamically planning sequences to be queried of a plurality of subareas in the wide-area remote sensing image under the preset resource constraint by fusing the cross-modal semantic information of the wide-area remote sensing image and the language instruction;
And the target searching module is used for executing the sequence to be queried, sequentially querying the corresponding subareas in the wide-area remote sensing image, and outputting a subarea set which is matched with the language instruction and contains the target.
The implementation process of the functions and actions of each module/sub-module/unit in the above device is specifically detailed in the implementation process of the corresponding steps in the above method, so that the same technical effects can be achieved, and will not be described herein again.
The present application also provides an electronic device, which is used to implement the above remote sensing image searching method.
Fig. 5 illustrates a physical structure diagram of a remote sensing image searching apparatus. As shown in fig. 5, the remote sensing image searching apparatus may include a processor 510, a communication interface 520, a memory 530, and a communication bus 540, where the processor 510, the communication interface 520, and the memory 530 communicate with each other through the communication bus 540. The processor 510 may invoke logic instructions in the memory 530 to perform the remote sensing image searching method.
Further, the logic instructions in the memory 530 may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
In another aspect, the present application also provides a computer program product, where the computer program product includes a computer program, where the computer program can be stored on a non-transitory computer readable storage medium, and when the computer program is executed by a processor, the computer can execute the remote sensing image searching method provided by the above methods.
In yet another aspect, the present application further provides a non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor is implemented to perform the remote sensing image searching method provided by the above methods.
It should be noted that the technical solutions or technical features described in the above embodiments may be combined or supplemented with each other without generating a conflict. The scope of the present application is not limited to the exact construction described in the above embodiments and illustrated in the accompanying drawings, but modifications, equivalents, improvements, etc. that fall within the spirit and principle of the present application are intended to be included in the scope of the present application.

Claims (8)

1. A remote sensing image searching method, characterized in that the method comprises:
acquiring a wide-area remote sensing image and a language instruction describing a search target;
dynamically planning, by fusing cross-modal semantic information of the wide-area remote sensing image and the language instruction, a to-be-queried sequence of a plurality of sub-regions in the wide-area remote sensing image under a preset resource constraint; and
executing the to-be-queried sequence, sequentially querying the corresponding sub-regions in the wide-area remote sensing image, and outputting a sub-region set that matches the language instruction and contains the target;
wherein dynamically planning, by fusing the cross-modal semantic information of the wide-area remote sensing image and the language instruction, the to-be-queried sequence of the plurality of sub-regions in the wide-area remote sensing image under the preset resource constraint comprises:
performing semantic alignment on the wide-area remote sensing image and the language instruction to obtain a fused multi-modal feature representation;
clustering the image space according to semantic features of the plurality of sub-regions in the wide-area remote sensing image, and constructing a graph model representing relationships between the clustered regions; generating, based on the graph model, graph guidance features for macroscopic search; and constructing a current search state based on the multi-modal feature representation, the graph guidance features, historical search results, and remaining resources, wherein nodes of the graph model represent the plurality of clustered regions obtained by clustering the image space, and edges of the graph model represent association relationships between the clustered regions; and
generating, under the preset resource constraint, the to-be-queried sequence of the plurality of sub-regions in the wide-area remote sensing image according to the current search state.
2. The remote sensing image searching method according to claim 1, wherein dynamically planning, by fusing the cross-modal semantic information of the wide-area remote sensing image and the language instruction, the to-be-queried sequence of the plurality of sub-regions in the wide-area remote sensing image under the preset resource constraint comprises:
performing semantic alignment on the wide-area remote sensing image and the language instruction to obtain a fused multi-modal feature representation;
constructing a current search state based on the multi-modal feature representation, historical search results, and remaining resources; and
generating, under the preset resource constraint, the to-be-queried sequence of the plurality of sub-regions in the wide-area remote sensing image according to the current search state.
3. The remote sensing image searching method according to claim 2, wherein generating the to-be-queried sequence of the plurality of sub-regions in the wide-area remote sensing image according to the current search state is implemented by a pre-trained search strategy model; and
the search strategy model is configured to be trained through reinforcement learning to maximize a cumulative reward under the resource constraint, the reward being determined according to whether a target exists in the queried sub-region.
4. The remote sensing image searching method according to claim 1, wherein feature representations of the nodes in the graph model are dynamically updated according to historical search results during the query process.
5. The remote sensing image searching method according to claim 1, wherein the resource constraint comprises a query-count constraint and/or a movement-cost constraint between sub-regions.
6. The remote sensing image searching method according to claim 5, wherein the movement-cost constraint is determined based on a Manhattan distance between sub-regions.
7. An electronic device, comprising a memory and a processor, wherein a computer program is stored in the memory, and the computer program, when executed by the processor, implements the remote sensing image searching method according to any one of claims 1 to 6.
8. A computer-readable storage medium, characterized in that a remote sensing image search program is stored on the computer-readable storage medium, and the remote sensing image search program, when executed, implements the remote sensing image searching method according to any one of claims 1 to 6.
CN202511370079.1A 2025-09-24 2025-09-24 Remote sensing image searching method, device and computer readable storage medium Active CN120849650B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202511370079.1A CN120849650B (en) 2025-09-24 2025-09-24 Remote sensing image searching method, device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202511370079.1A CN120849650B (en) 2025-09-24 2025-09-24 Remote sensing image searching method, device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN120849650A CN120849650A (en) 2025-10-28
CN120849650B true CN120849650B (en) 2025-12-09

Family

ID=97420055

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202511370079.1A Active CN120849650B (en) 2025-09-24 2025-09-24 Remote sensing image searching method, device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN120849650B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117705059A (en) * 2023-12-14 2024-03-15 中国自然资源航空物探遥感中心 Positioning method and system for remote sensing mapping image of natural resource

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20240386015A1 (en) * 2015-10-28 2024-11-21 Qomplx Llc Composite symbolic and non-symbolic artificial intelligence system for advanced reasoning and semantic search
US11263277B1 (en) * 2018-11-01 2022-03-01 Intuit Inc. Modifying computerized searches through the generation and use of semantic graph data models
US11900670B2 (en) * 2022-06-30 2024-02-13 Metrostudy, Inc. Construction stage detection using satellite or aerial imagery
CN117972126A (en) * 2024-02-05 2024-05-03 航天宏图信息技术股份有限公司 Remote sensing image retrieval method and device, electronic equipment and computer storage medium
CN119577172A (en) * 2024-10-31 2025-03-07 浪潮智慧科技有限公司 Satellite remote sensing imagery image and text retrieval method, system, terminal and medium based on multi-scale cross-modality
CN120542945A (en) * 2025-04-24 2025-08-26 武汉大学 Urban planning decision-making method and system based on multimodal remote sensing and knowledge graph

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117705059A (en) * 2023-12-14 2024-03-15 中国自然资源航空物探遥感中心 Positioning method and system for remote sensing mapping image of natural resource

Also Published As

Publication number Publication date
CN120849650A (en) 2025-10-28

Similar Documents

Publication Publication Date Title
Dornhege et al. A frontier-void-based approach for autonomous exploration in 3d
KR102556767B1 (en) Apparatus and method for visual localization
CN119291714B (en) A multi-sensor online three-dimensional detection method and device for large-scale assembly scenes
KR102615412B1 (en) Apparatus and method for performing visual localization
KR102556765B1 (en) Apparatus and method for visual localization
CN117973820B (en) Task dynamic allocation system and method based on artificial intelligence
CN119356391A (en) A three-dimensional AI perception direction method and system based on drone
KR102616028B1 (en) Apparatus and method for performing visual localization effectively
CN120849650B (en) Remote sensing image searching method, device and computer readable storage medium
CN120612683A (en) A robot perception and decision-making method based on lightweight multimodal large model
KR102616522B1 (en) Apparatus and method for performing visual localization effectively
KR102616029B1 (en) Apparatus and method for performing visual localization effectively
CN119808950A (en) A method, system, device and medium for semantic inference of resident trajectory activities
Luo et al. Learning Bird’s Eye View scene graph and knowledge-inspired policy for embodied visual navigation
Guo et al. Object goal visual navigation using semantic spatial relationships
KR102600939B1 (en) Apparatus and method for generating data for visual localization
Horney et al. An ontology controlled data fusion process for a query language
CN112800235B (en) Visual knowledge graph data modeling method and system
Wang Path planning of intelligent tennis ball picking robot integrating twin network target tracking algorithm
CN114691888A (en) Target association identification method and system based on capability data base map
Meng et al. SSR-ZSON: Zero-Shot Object Navigation via Spatial-Semantic Relations within a Hierarchical Exploration Framework
Klein et al. Optimization of machine learning models applied to robot localization in the robotatfactory 4.0 competition
CN120875431B (en) Multi-machine collaborative scheduling system of man-machine control shelter
Bailey Design of environment aware planning heuristics for complex navigation objectives
Zhao et al. Place recognition with deep superpixel features for brain-inspired navigation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant