
CN119169643B - A method for analyzing and judging the rationality of architecture diagrams based on multimodal feature fusion - Google Patents


Info

Publication number
CN119169643B
CN119169643B · Application CN202411679083.1A
Authority
CN
China
Prior art keywords
architecture diagram
text
feature
architecture
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411679083.1A
Other languages
Chinese (zh)
Other versions
CN119169643A (en)
Inventor
聂志锋
张琳
赵瑞
范军
王祎童
金杨
房璐
张廷彪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Big Data Center
Original Assignee
Beijing Big Data Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Big Data Center filed Critical Beijing Big Data Center
Priority to CN202411679083.1A priority Critical patent/CN119169643B/en
Publication of CN119169643A publication Critical patent/CN119169643A/en
Application granted granted Critical
Publication of CN119169643B publication Critical patent/CN119169643B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/191Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/1918Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/809Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, e.g. where the classifiers operate on the same input data
    • G06V10/811Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, e.g. where the classifiers operate on the same input data the classifiers operating on different input data, e.g. multi-modal recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/146Aligning or centring of the image pick-up or image-field
    • G06V30/147Determination of region of interest

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Character Discrimination (AREA)

Abstract

The invention relates to the field of intelligent rationality judgment for architecture diagram design, and discloses a method for analyzing and judging the rationality of architecture diagrams based on multimodal feature fusion. The method acquires an architecture diagram to be analyzed; recognizes and extracts the text portion of the architecture diagram and records it; identifies and acquires, through an R-CNN model, a set of region-of-interest feature maps for the image elements and records them; establishes a correspondence between the text information from the OCR module and the position and logic information of the image elements from the R-CNN model; generates an expanded architecture diagram text portion; performs word segmentation and semantic encoding on the expanded text to obtain a set of word-granularity feature vectors; generates comprehensive feature vectors of the same dimension; calculates semantic matching coefficients between the feature vectors; and judges whether the design of the architecture diagram to be analyzed meets the rationality requirements of the overall plan. The method can effectively evaluate whether the design and function of an architecture diagram meet the overall planning requirements, improving the accuracy and efficiency of analysis and review.

Description

Architecture diagram rationality analysis and judgment method based on multimodal feature fusion
Technical Field
The invention relates to the technical field of intelligent rationality judgment for architecture diagram design, and in particular to a method for analyzing and judging the rationality of architecture diagrams based on multimodal feature fusion.
Background
Architecture diagrams are widely used as a key visualization tool in many fields, including engineering, construction, information technology, project management, and business. Combining graphics and text, these diagrams describe in detail the components of a system, organization, process, or project and the relationships among them, providing a high-level overview that helps engineers, designers, and other stakeholders understand and communicate the design and implementation of complex systems more intuitively and deeply.
In conventional approaches, judging the rationality of an architecture diagram design relies primarily on manual analysis and understanding: a professional must carefully read the individual components, lines, and arrows in the diagram and infer the relationships and roles among them. This approach has significant limitations. First, manual judgment is susceptible to individual subjectivity and experience, which can lead to inconsistent and biased results. Second, for large-scale and complex architecture diagrams, manual judgment requires a great deal of time and effort and is inefficient. This limitation is particularly evident for diagrams that contain large amounts of textual information and graphical elements.
With the development of information technology, intelligent judgment of the rationality of architecture diagram design using computer technology has become a trend. Conventional automatic judgment methods generally rely on rules and templates and struggle to handle complex and variable architecture diagrams. In recent years, with the rapid development of artificial intelligence and deep learning, multimodal fusion techniques based on large models offer a new solution for judging the rationality of architecture diagram design. Multimodal fusion can combine text and image features for comprehensive analysis and processing, markedly improving the accuracy and efficiency of the judgment.
A multimodal-fusion-based technique for judging architecture diagram design rationality can use a pre-trained large model (such as CLIP) to map text and image features into the same feature space and capture the complex semantic relationships between them. This not only addresses the subjectivity and inefficiency of traditional methods, but also handles complex and variable architecture diagrams and provides more reliable judgment results.
Therefore, there is an urgent need for a multimodal-fusion-based scheme for judging architecture diagram design rationality that overcomes the limitations of traditional methods and enables efficient and accurate judgment and review of architecture diagrams.
Disclosure of Invention
The invention aims to solve the problem that traditional automatic judgment methods generally rely on rules and templates and struggle to cope with complex and variable architecture diagrams.
To realize this aim, the invention adopts the following technical scheme:
A method for analyzing and judging the rationality of architecture diagrams based on multimodal feature fusion comprises the following steps:
S1, acquiring an architecture diagram to be analyzed, and obtaining its image data as the input for subsequent processing;
S2, performing text recognition on the architecture diagram using an OCR module to obtain the architecture diagram text portion, and recording the position of each piece of text information;
S3, processing the architecture diagram with an R-CNN model to obtain a set of region-of-interest feature maps for the image elements in the diagram, and recording the position of each image element;
S4, establishing a correspondence between the text information from the OCR module and the position and logic information of the image elements from the R-CNN model, and recording the associations between them;
S5, converting the positions and logical relationships of the image elements into text descriptions, and combining these with the architecture diagram text portion to generate an expanded architecture diagram text portion;
S6, performing word-granularity semantic encoding on the expanded architecture diagram text portion to obtain a set of architecture diagram text word-granularity feature vectors;
S7, performing word-granularity semantic encoding on the architecture design rationality judgment text to obtain a set of rationality judgment text feature vectors;
S8, performing multimodal feature fusion with the CLIP model to generate a comprehensive feature vector;
S9, matching the comprehensive feature vector with the rationality judgment text feature vectors and calculating their semantic matching coefficient;
S10, comparing the semantic matching coefficient with a preset threshold to determine whether the architecture diagram to be analyzed meets the overall planning requirements.
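The matching in steps S9 and S10 can be sketched as follows. The patent does not fix the formula for the semantic matching coefficient; cosine similarity is one plausible and commonly used choice, and the threshold value 0.8 below is an arbitrary placeholder:

```python
import numpy as np

def semantic_matching_coefficient(v_comprehensive, v_judgment):
    """Cosine similarity between the comprehensive feature vector and a
    rationality-judgment text feature vector (one plausible choice of
    matching coefficient; the formula is an assumption)."""
    num = float(np.dot(v_comprehensive, v_judgment))
    den = float(np.linalg.norm(v_comprehensive) * np.linalg.norm(v_judgment))
    return num / den if den else 0.0

def meets_planning_requirement(coefficient, threshold=0.8):
    """S10: compare the matching coefficient against a preset threshold."""
    return coefficient > threshold

v1 = np.array([0.2, 0.9, 0.4])     # toy comprehensive feature vector
v2 = np.array([0.25, 0.85, 0.35])  # toy judgment-text feature vector
c = semantic_matching_coefficient(v1, v2)
print(meets_planning_requirement(c))
```

With near-parallel vectors, as here, the coefficient approaches 1 and the diagram is judged to meet the requirement.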
Further, establishing the correspondence between the text information from the OCR module and the position and logic information of the R-CNN model's image elements comprises the following steps:
checking whether each text box recognized by the OCR module overlaps any image-element bounding box identified by the R-CNN model, using the intersection over union (IoU) as the metric;
if the IoU of a text box and an image-element bounding box exceeds a preset threshold, recording the association between them;
and detecting the correspondence between large and small bounding boxes through their containment relationship and recording the hierarchy, so as to capture the hierarchical structure and module relationships in the architecture diagram.
Further, extracting the position information of text and image elements with the OCR module and the R-CNN model specifically comprises the following steps:
performing character recognition on the architecture diagram with the OCR module, extracting all text portions, and recording the position of each piece of text information in the diagram;
processing the architecture diagram with the R-CNN model, identifying and acquiring the set of region-of-interest feature maps for the image elements, and recording the position of each image element;
and converting the position and logic information of the image elements into text descriptions according to the correspondence established between the OCR module and the R-CNN model, then combining these descriptions with the architecture diagram text portion to generate the expanded architecture diagram text portion.
Further, extracting local semantic feature vectors for the image-element regions of interest from their feature maps specifically comprises the following steps:
upsampling the region-of-interest feature maps to obtain upsampled region-of-interest feature maps;
applying point (1x1) convolution encoding to the upsampled feature maps to obtain channel-modulated upsampled feature maps;
and applying two-dimensional convolution encoding to the channel-modulated upsampled feature maps to obtain the local semantic feature vectors of the image-element regions of interest.
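These three steps can be sketched numerically as below (nearest-neighbour upsampling, 1x1 "point" convolution for channel modulation, then a valid 3x3 2D convolution). The channel counts and the random weights standing in for trained ones are illustrative assumptions:

```python
import numpy as np

def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return fmap.repeat(2, axis=1).repeat(2, axis=2)

def pointwise_conv(fmap, w):
    """1x1 (point) convolution: per-pixel channel mixing.
    fmap: (C_in, H, W); w: (C_out, C_in)."""
    return np.einsum('oc,chw->ohw', w, fmap)

def conv2d_3x3(fmap, w):
    """Valid 3x3 two-dimensional convolution; w: (C_out, C_in, 3, 3)."""
    c_in, h, wd = fmap.shape
    out = np.zeros((w.shape[0], h - 2, wd - 2))
    for i in range(h - 2):
        for j in range(wd - 2):
            out[:, i, j] = np.einsum('ocij,cij->o', w, fmap[:, i:i+3, j:j+3])
    return out

rng = np.random.default_rng(0)
roi = rng.standard_normal((8, 4, 4))                          # ROI feature map
up = upsample2x(roi)                                          # (8, 8, 8)
mod = pointwise_conv(up, rng.standard_normal((16, 8)))        # channel-modulated
local = conv2d_3x3(mod, rng.standard_normal((16, 16, 3, 3)))  # (16, 6, 6)
vector = local.mean(axis=(1, 2))  # pooled local semantic feature vector
```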
Further, performing word-granularity semantic encoding on the architecture diagram text portion to obtain the set of architecture diagram text word-granularity feature vectors specifically comprises the following step:
after word segmentation of the architecture diagram text portion, obtaining the set of word-granularity feature vectors through a semantic encoder with a word embedding layer contained in the large model.
Performing word-granularity semantic encoding on the architecture design rationality judgment text to obtain its set of word-granularity feature vectors likewise comprises: after word segmentation of the rationality judgment text, obtaining the set of word-granularity feature vectors through the semantic encoder with the word embedding layer contained in the large model.
Further, obtaining the set of architecture diagram text word-granularity feature vectors through the semantic encoder after word segmentation specifically comprises the following steps:
performing word segmentation on the architecture diagram text and the rationality judgment text, so as to convert the architecture diagram text portion into a word sequence composed of multiple words;
mapping each word of the word sequence to a word embedding vector with the embedding layer of the semantic encoder, obtaining a sequence of word embedding vectors;
and applying Transformer-based global context semantic encoding to the sequence of word embedding vectors with the converter of the semantic encoder, obtaining the set of architecture diagram text word-granularity feature vectors.
Further, determining whether the architecture diagram to be analyzed meets the overall planning requirements based on the comparison between the semantic matching coefficient of the feature sequences and a preset threshold comprises: determining that the architecture diagram meets the overall planning requirements in response to the semantic matching coefficient being greater than the preset threshold.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a method for judging the rationality of architecture diagram design based on multimodal fusion. It uses the OCR (optical character recognition) capability of a large model to recognize text information in architecture diagrams, including but not limited to component names, labels, and annotations. Graphic elements in the diagram, such as boxes, lines, and arrows, are identified by computer vision techniques, and their positions, logical relationships, and pointing relationships are understood. The method fuses the annotated text feature vectors and the image-element feature vectors into a comprehensive feature vector, matches this vector against the architecture design rationality judgment feature vectors, calculates a semantic matching coefficient, and judges whether the architecture diagram to be analyzed meets the overall planning requirements based on a comparison of the coefficient against a preset threshold. The method automates the comprehensive analysis of architecture diagrams and overcomes the subjectivity and inefficiency of traditional manual analysis.
Drawings
FIG. 1 is a flow chart of a method for judging architecture diagram design rationality based on feature fusion according to an embodiment of the application;
FIG. 2 is a system architecture diagram for judging architecture diagram design rationality based on feature fusion according to an embodiment of the application;
FIG. 3 is a block diagram of a system for judging architecture diagram design rationality based on feature fusion according to an embodiment of the application;
FIG. 4 is a departmental informatization construction architecture diagram of a production supervision department in a certain region.
Reference numerals: 310, image acquisition module; 320, character recognition module; 330, region-of-interest extraction module; 335, position and logic information processing module; 337, position description generation module; 350, semantic encoding module; 360, comprehensive feature vector generation module; 370, semantic matching coefficient calculation module; 380, judgment result generation module.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in FIGS. 1 and 2, a method for judging the rationality of architecture diagrams based on multimodal fusion comprises the following steps:
S1, acquiring the architecture diagram to be analyzed;
The image data of the architecture diagram to be analyzed is obtained as the input for subsequent processing, for example the departmental informatization construction architecture diagram of a production supervision department in a certain region shown in FIG. 4.
S2, performing character recognition on the architecture diagram to be analyzed with the OCR module to obtain the architecture diagram text portion and record the positions of the text information;
Text recognition on the architecture diagram can be handled automatically by the OCR module, avoiding the tedious process of manually identifying each piece of text and saving time and labor. OCR modules are typically trained on large amounts of data and have high accuracy and recognition capability; compared with manual recognition, the OCR module can recognize the text content of the architecture diagram more quickly and accurately. The architecture diagram text portion often contains key information such as component names, labels, and annotations, which supports the subsequent semantic understanding and comprehensive analysis of the diagram. The position of each piece of text information in the diagram is recorded, including the upper-left and lower-right coordinates of its text box (e.g., x1, y1, x2, y2).
S3, identifying and acquiring the set of region-of-interest feature maps of the image elements with the R-CNN model, and recording the positions of the image elements;
An architecture diagram contains various graphic elements, such as boxes, lines, and arrows, which have interrelated positions, logical relationships, and pointing relationships. To capture the hidden correlations among the graphic elements, the R-CNN model processes the architecture diagram to be analyzed and produces the set of region-of-interest feature maps for its image elements. R-CNN is a deep learning model for object detection that can identify objects of interest in different regions of an image and extract their features. In architecture diagram parsing, the R-CNN model helps identify image elements such as boxes, lines, and arrows. It can automatically learn and capture the local regions of interest and the implicit feature information of each element, which supports a more accurate understanding of the layout and connections of the elements, yields a more comprehensive and detailed understanding of the diagram's content and structure, and provides important information for the later analysis of whether the diagram meets the overall planning requirements. The position of each image element is recorded as its bounding-box coordinates (e.g., x1, y1, x2, y2).
S4, establishing the correspondence between the text information from the OCR module and the position and logic information of the R-CNN model's image elements;
For each text box identified by the OCR module, it is checked whether it overlaps any image-element bounding box identified by the R-CNN model, using the intersection over union (IoU) as the metric; a higher IoU value indicates a greater degree of overlap between the two boxes. If the IoU of a text box and an image-element bounding box exceeds a preset threshold (e.g., 0.5), the text information and the image element are considered to correspond, and the association between them is recorded. If a small bounding box lies completely within a large bounding box, the small box is considered to be contained by the large one. The hierarchical relationship is then recorded, i.e., the relationship between a large bounding box and the small bounding boxes it contains. For example, if a large box contains four small boxes, the content corresponding to the large box is considered to include the content corresponding to the four small boxes. In this way, the hierarchical and modular relationships in the architecture diagram can be captured.
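The IoU check, threshold-based association, and containment test of this step can be sketched as follows (the box format and the 0.5 threshold follow the text; function names are illustrative):

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def contains(outer, inner):
    """True if `inner` lies completely inside `outer` (hierarchy check)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def associate(text_boxes, element_boxes, threshold=0.5):
    """Record (text_idx, element_idx) pairs whose IoU exceeds threshold."""
    return [(i, j)
            for i, t in enumerate(text_boxes)
            for j, e in enumerate(element_boxes)
            if iou(t, e) > threshold]
```

For instance, a text box (10, 10, 50, 30) and an element box (12, 8, 52, 32) have IoU 0.76, exceeding 0.5, so the pair is recorded as corresponding.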
S5, converting the positions and logical relationships of the image elements into text descriptions;
The position and logic information of the image elements is converted into text descriptions according to the correspondence established between the OCR module and the R-CNN model. For example: the "service center" includes 3 modules, namely "flow and rule management", "user and authority management", and "system and operation management". These position and logic descriptions are merged into the corresponding architecture diagram text portion to generate the expanded architecture diagram text portion. In this way, the spatial relationships, positions, and logic information of the image elements are converted into text descriptions that are easier to understand and process, providing richer information for subsequent semantic analysis and comprehensive judgment.
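The containment-to-text conversion can be sketched as below; the box coordinates and the translated module names stand in for the "service center" example in the text:

```python
def contains(outer, inner):
    """True if box `inner` (x1, y1, x2, y2) lies completely inside `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def describe_hierarchy(elements):
    """Turn containment relations among labelled boxes into sentences of the
    form '"<label>" includes N modules: ...' for the expanded text portion.
    elements: list of (label, (x1, y1, x2, y2))."""
    lines = []
    for label, box in elements:
        children = [l for l, b in elements if b != box and contains(box, b)]
        if children:
            lines.append(f'"{label}" includes {len(children)} modules: '
                         + ", ".join(f'"{c}"' for c in children))
    return lines

diagram = [
    ("service center", (0, 0, 100, 100)),
    ("flow and rule management", (5, 5, 30, 30)),
    ("user and authority management", (35, 5, 60, 30)),
    ("system and operation management", (65, 5, 90, 30)),
]
print(describe_hierarchy(diagram)[0])
```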
S6, performing word segmentation and semantic encoding on the architecture diagram text portion (annotated text plus the position and logical-relationship descriptions) to obtain a set of architecture diagram text word-granularity feature vectors;
To capture the textual semantic information of the architecture diagram, including important semantics such as component names, labels, and annotations, the text portion is first segmented into words, and a semantic encoder with a word embedding layer contained in the large model then produces the set of word-granularity feature vectors. Word segmentation converts the expanded text portion into a more structured, processable form and helps identify key words and phrases, so that the meaning and context of the text can be better understood. The specific steps are:
performing word segmentation on the architecture diagram text portion to convert it into a word sequence composed of multiple words;
mapping each word of the word sequence to a word embedding vector with the embedding layer of the semantic encoder, obtaining a sequence of word embedding vectors;
and applying Transformer-based global context semantic encoding to the sequence of word embedding vectors with the converter of the semantic encoder, obtaining the set of architecture diagram text word-granularity feature vectors.
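A toy end-to-end sketch of these three steps follows. The whitespace tokenizer, hash-seeded pseudo-embeddings, and single self-attention pass are illustrative stand-ins for a real segmenter, a trained word-embedding layer, and a full Transformer encoder:

```python
import zlib
import numpy as np

def tokenize(text):
    """Toy word segmentation by whitespace (a real system would use a
    trained segmenter, especially for Chinese text)."""
    return text.split()

def embed(tokens, dim=8):
    """Map each token to a deterministic pseudo-embedding vector."""
    return np.stack([
        np.random.default_rng(zlib.crc32(t.encode())).standard_normal(dim)
        for t in tokens
    ])

def contextualize(embs):
    """One self-attention pass (the Transformer idea): each word vector
    becomes an attention-weighted mixture of the whole sequence."""
    scores = embs @ embs.T / np.sqrt(embs.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ embs

words = tokenize("service center includes flow and rule management")
features = contextualize(embed(words))  # word-granularity feature vectors
```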
S7, performing word-granularity semantic encoding on the architecture design rationality judgment text to obtain a set of rationality judgment text feature vectors;
To capture the semantic information of the rationality judgment text, which includes important content such as functional requirements, design standards, and evaluation indicators, the judgment text is likewise segmented into words, and the semantic encoder with a word embedding layer contained in the large model produces its set of word-granularity feature vectors. The specific steps are:
performing word segmentation on the rationality judgment text to convert it into a word sequence composed of multiple words;
mapping each word of the word sequence to a word embedding vector with the embedding layer of the semantic encoder, obtaining a sequence of word embedding vectors;
and applying Transformer-based global context semantic encoding to the sequence of word embedding vectors with the converter of the semantic encoder, generating the set of word-granularity feature vectors of the rationality judgment text.
The rationality judgment text consists of specific judgment rules. For example, in a certain project, to ensure that the architecture diagram design is consistent with the overall plan, the rationality judgment rules cover the following aspects:
1. Consistency between the architecture diagram and its text description: each module, component, and element in the diagram must be consistent with the corresponding text description.
2. Consistency between the architecture diagram and the planned tasks: the diagram should clearly reflect the positioning of the planning tasks and remain consistent with the main tasks.
3. Interface relationships: the connections between this system and other related systems or platforms are described, ensuring the interfaces are complete and comply with the relevant standards and control rules.
4. Completeness of the overall framework: the diagram is checked to ensure that content such as the common application support platform, data aggregation and sharing, and the unified portal channel is clearly presented.
S8, carrying out multi-mode feature fusion by using the CLIP model to generate a comprehensive feature vector;
Convert the image elements into feature vectors using a pre-trained CLIP model, and fuse the architecture diagram annotation text feature vectors with the image element feature vectors to generate a comprehensive feature vector. The specific method is as follows:
Feature extraction: extract the annotation text feature vectors and the image element feature vectors using the text encoder and the image encoder of the CLIP model, respectively.
Feature space alignment-the CLIP model maps text and image feature vectors into the same multi-modal feature space such that text and image features with the same semantics are close in that space. In this way, the CLIP model is able to capture complex semantic relationships between text and images.
Feature fusion: perform weighted fusion of the text feature vectors and the image feature vectors to generate a comprehensive feature vector. The annotation text feature vectors and the image element feature vectors are weighted and fused using a self-attention mechanism, which dynamically assigns weights according to the content of the feature vectors and captures the complex relationships between text and image. Specifically, the annotation text feature vectors and the image element feature vectors are fed into a self-attention model, and the weighted fused feature vector is obtained by computing the attention score between each feature vector and the others. Finally, the fused feature vector is further processed by a multi-layer perceptron (MLP) to improve the richness and discriminative power of the feature representation. The multi-layer perceptron consists of several fully connected layers and applies nonlinear activation functions to the feature vector, enhancing the expressive power of the features.
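The fusion step described above (self-attention over the concatenated text and image vectors, followed by an MLP) can be sketched as follows. The pooling strategy, layer sizes, and random weights are illustrative assumptions; in practice the MLP weights would be learned.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fuse(text_feats, image_feats, w1, b1, w2, b2):
    """Attention-weighted fusion of text and image feature vectors,
    followed by a two-layer MLP, producing one comprehensive vector.

    text_feats  : (nt, d) annotation-text feature vectors
    image_feats : (ni, d) image-element feature vectors (same aligned space)
    """
    x = np.vstack([text_feats, image_feats])        # (n, d) joint sequence
    d = x.shape[1]
    attn = softmax(x @ x.T / np.sqrt(d), axis=1)    # attention scores between vectors
    fused = (attn @ x).mean(axis=0)                 # weighted fusion, mean-pooled
    hidden = np.maximum(0.0, fused @ w1 + b1)       # fully connected layer + ReLU
    return hidden @ w2 + b2                         # second fully connected layer

rng = np.random.default_rng(1)
d, h, out = 16, 32, 16
v = fuse(rng.normal(size=(4, d)), rng.normal(size=(3, d)),
         rng.normal(size=(d, h)), np.zeros(h),
         rng.normal(size=(h, out)), np.zeros(out))
print(v.shape)  # (16,)
```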
S9, matching the comprehensive feature vector with the text feature vector judged by the rationality of the architecture design, and calculating the semantic matching coefficient of the text feature vector;
in the step of matching the integrated feature vector with the architecture diagram design rationality judgment feature vector to calculate the semantic matching coefficient thereof, it is first necessary to ensure that the integrated feature vector and the architecture diagram design rationality judgment feature vector are in the same vector space. The specific method comprises the following steps:
The two sets of feature vectors are standardized so that they are comparable on the same scale. Standardization can be performed with the Z-score method, which scales the feature vectors to the same scale, eliminating magnitude differences between features and ensuring the accuracy of the matching result.
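A minimal sketch of the Z-score standardization mentioned above; `eps` is an assumed small constant added to avoid division by zero.

```python
import numpy as np

def zscore(vectors, eps=1e-8):
    """Z-score standardization: scale each feature dimension to zero
    mean and (approximately) unit variance so that magnitudes of
    different features are comparable."""
    v = np.asarray(vectors, dtype=float)
    return (v - v.mean(axis=0)) / (v.std(axis=0) + eps)

# Two features on very different scales become comparable
x = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
z = zscore(x)
print(np.allclose(z.mean(axis=0), 0.0))  # True
```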
And evaluating the semantic matching degree of the comprehensive feature vectors and the architecture diagram design rationality judgment feature vectors by calculating the cosine similarity between the comprehensive feature vectors and the architecture diagram design rationality judgment feature vectors. Cosine similarity is a commonly used index for measuring the similarity between two vectors, and is calculated by calculating the ratio of the inner product of the two vectors to the product of the modular lengths of the two vectors. Specifically, for each integrated feature vector and architecture diagram design rationality judgment feature vector, their cosine similarity values are calculated, the range of values is between-1 and 1, and a value closer to 1 indicates a higher semantic similarity of the two vectors.
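The cosine-similarity definition above (inner product divided by the product of the vector norms, giving a value in [-1, 1]) is a one-liner:

```python
import numpy as np

def cosine_similarity(a, b):
    """Ratio of the inner product to the product of the vector norms;
    the result lies in [-1, 1], with values near 1 indicating high
    semantic similarity."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity([1, 0], [1, 0]))   # 1.0  (identical direction)
print(cosine_similarity([1, 0], [-1, 0]))  # -1.0 (opposite direction)
```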
The computed semantic matching coefficient is compared with a preset threshold to determine the degree of matching between the comprehensive feature vector of the architecture diagram and the architecture diagram design rationality judgment feature vector. If the semantic matching coefficient is above the preset threshold, the comprehensive feature vector is considered to match the rationality judgment feature vector semantically, indicating that the design and functions of the architecture diagram meet the overall planning requirements; if it is below the threshold, a mismatch exists in some aspects, and further analysis and adjustment are needed.
S10, determining whether the architecture diagram to be analyzed meets the overall planning requirements based on the comparison between the semantic matching coefficient of the comprehensive feature vector and the architecture design rationality judgment text (i.e., the review rule text) feature vector and a preset threshold;
In the step of determining whether the architecture diagram to be analyzed meets the overall planning requirement or not based on the comparison result between the semantic matching coefficients of the comprehensive feature vector and the architecture diagram design rationality judgment feature vector and the preset threshold, firstly, statistics and analysis are required to be carried out on all the semantic matching coefficients. The method comprises the following specific steps:
And designing and rationally judging semantic matching coefficients of the feature vectors for each pair of comprehensive feature vectors and the architecture diagram, comparing the semantic matching coefficients with a preset threshold value, and counting the number and proportion of feature pairs with the matching coefficients higher than the threshold value. The preset threshold is determined according to actual application scenes and requirements, and a critical value capable of effectively distinguishing matching from unmatched is generally selected to ensure accuracy and reliability of an evaluation result.
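The statistics described above (number and proportion of feature pairs whose matching coefficient exceeds the threshold) can be sketched as follows; the coefficient values and the 0.7 threshold are purely illustrative.

```python
def match_statistics(coefficients, threshold):
    """Return the count and proportion of feature pairs whose semantic
    matching coefficient exceeds the preset threshold."""
    matched = [c for c in coefficients if c > threshold]
    return len(matched), len(matched) / len(coefficients)

# Four comprehensive/rule feature pairs, hypothetical coefficients
count, ratio = match_statistics([0.92, 0.85, 0.40, 0.77], threshold=0.7)
print(count, ratio)  # 3 0.75
```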
A comprehensive evaluation is carried out based on the statistics of the matching coefficients. Matching pairs above the preset threshold are considered semantically highly consistent, indicating that the corresponding parts of the architecture diagram meet the requirements of the review rules; matching pairs below the threshold are considered semantically divergent, indicating parts of the architecture diagram that may deviate from or fail to comply with the review rules. By comprehensively analyzing the matching results, it can be determined whether the architecture diagram as a whole meets the planning requirements and which parts need further optimization and adjustment.
And finally, summarizing the evaluation results and forming a detailed review report. The review report should include statistics of overall match, portions that meet the planning requirements, portions that do not meet the planning requirements and their specific reasons, improvement suggestions, and so forth. By the method, the overall planning compliance of the architecture diagram can be clarified, targeted improvement measures can be provided, and effective adjustment and optimization of designers in subsequent work can be facilitated. Through the review process of a scientific system, the design and implementation of the architecture diagram can be ensured to better accord with the expected planning, and powerful support is provided for successful implementation of projects.
The invention has the following beneficial effects:
Automatic processing: text information in the architecture diagram is automatically identified by the OCR module, avoiding the tedious process of manual item-by-item identification and saving time and labor costs.
Accuracy and efficiency: the high accuracy and recognition capability of a large model trained on massive data are used to quickly and accurately identify the text and image element information in the architecture diagram.
Multimodal fusion: the architecture diagram annotation text feature vectors and the image element feature vectors are effectively fused by a multimodal fusion technique to generate feature vectors containing comprehensive semantic information.
Comprehensive evaluation: the self-attention mechanism, multi-layer perceptron, and other techniques enhance the richness and discriminative power of the feature representation, ensuring that the design and functions of the architecture diagram meet the overall planning requirements.
Semantic matching: the semantic matching coefficient between the comprehensive feature vector and the architecture diagram design rationality judgment feature vectors is computed and compared with a preset threshold to systematically evaluate the design rationality of the architecture diagram.
Comprehensive analysis: element semantic features in the architecture diagram are captured at multiple scales and levels, systematically evaluated and judged, and a detailed review report is provided, covering the parts that do and do not meet the planning requirements, the specific reasons, and improvement suggestions.
Scientific support: deep support is provided for system design and planning, ensuring that the design and functions of the architecture diagram better conform to the intended planning and providing a strong guarantee for successful project implementation.
As shown in fig. 3, the present invention further provides a system for performing rationality judgment on architecture diagram design based on feature fusion, including:
The image acquisition module 310 is used for acquiring image data of the structure diagram to be analyzed as input of subsequent processing.
The character recognition module 320 is configured to perform character recognition on the architecture diagram to be analyzed using the OCR module, extract all text portions in the architecture diagram, including component names, labels, and annotations, and record the position of each piece of text information.
The region of interest extraction module 330 is configured to process the structure diagram to be analyzed by using the R-CNN model, identify and acquire a set of feature diagrams of the region of interest of the image elements, such as a block diagram, a line, an arrow, and the like, and record a position of each image element.
And the position and logic information processing module 335 is used for establishing the corresponding relation between the text information of the OCR module and the position and logic information of the R-CNN model image element. For each text box identified by the OCR module, it is checked whether it overlaps with any of the R-CNN model identified image element bounding boxes. The cross-over ratio (IoU, intersection over Union) was used as a measure. If IoU values of the text box and the image element boundary box exceed a preset threshold, the text information and the image element are considered to have a corresponding relation, and the association between the text information and the image element boundary box is recorded.
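The IoU-based association described above can be sketched as follows; the box format (x1, y1, x2, y2), the helper names, and the 0.5 threshold are illustrative assumptions.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) bounding boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def associate(text_boxes, element_boxes, threshold=0.5):
    """Record (text index, element index) pairs whose IoU exceeds the
    preset threshold, i.e. the text is taken to label that element."""
    return [(i, j) for i, tb in enumerate(text_boxes)
                   for j, eb in enumerate(element_boxes)
                   if iou(tb, eb) > threshold]

# An OCR text box largely contained in an R-CNN element bounding box
print(associate([(10, 10, 50, 30)], [(5, 5, 55, 35)]))  # [(0, 0)]
```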
And the position description generating module 337 converts the position and logic information of the image element into text description according to the corresponding relation between the position and logic information established by the OCR module and the R-CNN model. For example, 10 modules are included under the "service center", and the "data center" is located in the middle left position of the figure. And combining the position and logic information description into a corresponding architecture diagram text part to generate an extended architecture diagram text part.
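Converting an element's position into a coarse textual description such as "middle left" could be sketched like this; the 3x3 grid partition and the phrase format are illustrative assumptions, not the patent's actual description template.

```python
def describe_position(box, width, height):
    """Map an element's (x1, y1, x2, y2) bounding box to a coarse
    textual location on a 3x3 grid over the diagram canvas."""
    cx = (box[0] + box[2]) / 2 / width    # normalized center x
    cy = (box[1] + box[3]) / 2 / height   # normalized center y
    horiz = "left" if cx < 1 / 3 else "right" if cx > 2 / 3 else "center"
    vert = "top" if cy < 1 / 3 else "bottom" if cy > 2 / 3 else "middle"
    return f"{vert} {horiz}"

# An element occupying the left-middle band of a 1000x800 diagram
print(describe_position((50, 300, 250, 500), 1000, 800))  # middle left
```

The resulting phrase can then be merged into the architecture diagram text portion, as described above.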
The semantic coding module 350 is configured to perform word segmentation and semantic coding on the expanded architecture diagram text portion and the architecture diagram design rationality judgment portion to obtain a word granularity feature vector set of the architecture diagram description and a word granularity feature vector set of the architecture diagram design rationality judgment description.
The comprehensive feature vector generation module 360 is configured to fuse the architecture diagram annotation text feature vector set with the image element region-of-interest feature vector set to generate a comprehensive feature vector, in three steps: feature extraction, feature space alignment, and feature fusion. The richness and discriminative power of the feature representation are improved by processing the fused feature vectors with a self-attention mechanism and a multi-layer perceptron (MLP).
And the semantic matching coefficient calculating module 370 is used for matching the comprehensive feature vector with the architecture diagram design rationality judging feature vector and calculating the semantic matching coefficient of the architecture diagram design rationality judging feature vector. And ensuring that the comprehensive feature vector and the architecture diagram design rationality judgment feature vector are in the same feature space so as to effectively match. And (5) calculating cosine similarity of each pair of feature vectors, and evaluating semantic matching degree of the feature vectors.
The judging result generating module 380 is configured to determine whether the structure diagram to be analyzed meets the overall planning requirement based on the comparison between the semantic matching coefficients among the feature sequences and the preset threshold. And carrying out statistics and analysis on all semantic matching coefficients, comprehensively evaluating the overall matching degree of the architecture diagram, and generating a detailed review report, wherein the detailed review report comprises overall matching degree statistics, parts meeting planning requirements, parts not meeting requirements, specific reasons, improvement suggestions and the like.
The present invention is not limited to the preferred embodiments, but the patent protection scope of the invention is defined by the claims, and all equivalent structural changes made by the specification and the drawings are included in the scope of the invention.

Claims (7)

1. A method for analyzing and judging the rationality of an architecture diagram based on multimodal feature fusion, characterized in that it comprises the following steps:
S1, obtaining the architecture diagram to be analyzed and acquiring its image data as input for subsequent processing;
S2, performing text recognition on the architecture diagram to be analyzed using an OCR module to obtain the architecture diagram text portion, and recording the position of each piece of text information;
S3, processing the architecture diagram to be analyzed using an R-CNN model to obtain a set of feature maps of the regions of interest of the image elements in the architecture diagram, and recording the position of each image element;
S4, establishing the correspondence between the text information of the OCR module and the position and logical information of the image elements of the R-CNN model, and recording the associations between them;
S5, converting the positions and logical relationships of the image elements into text descriptions, and merging them with the architecture diagram text portion to generate an expanded architecture diagram text portion;
S6, performing word-granularity semantic encoding on the expanded architecture diagram text portion to obtain a set of word-granularity feature vectors of the architecture diagram text description;
S7, performing word-granularity semantic encoding on the architecture design rationality judgment text to obtain a set of architecture design rationality judgment text feature vectors;
S8, performing multimodal feature fusion using a CLIP model to generate a comprehensive feature vector; the specific method is as follows:
feature extraction: using the CLIP model to extract the feature vectors of the image elements from the region-of-interest feature maps of the image elements in the architecture diagram;
feature space alignment: the CLIP model maps the set of word-granularity feature vectors of the architecture diagram text description and the image element feature vectors into the same multimodal feature space, so that text and image features with the same semantics are close in that space;
feature fusion: performing weighted fusion of the set of word-granularity feature vectors of the architecture diagram text description and the image element feature vectors to generate a comprehensive feature vector;
S9, matching the comprehensive feature vector with the architecture design rationality judgment text feature vectors, and calculating their semantic matching coefficient;
S10, determining whether the architecture diagram to be analyzed meets the overall planning requirements based on the comparison between the semantic matching coefficient of the comprehensive feature vector and the architecture design rationality judgment text feature vector and a preset threshold.
2. The method for analyzing and judging the rationality of an architecture diagram based on multimodal feature fusion according to claim 1, characterized in that establishing the correspondence between the text information of the OCR module and the position and logical information of the image elements of the R-CNN model comprises the following steps:
for each text box recognized by the OCR module, checking whether it overlaps with any image element bounding box recognized by the R-CNN model, using the intersection over union (IoU) as the metric;
if the IoU value between a text box and an image element bounding box exceeds a preset threshold, recording the association between them;
detecting the correspondence between large bounding boxes and small bounding boxes through their intersection relationships and recording the hierarchical relationships, to ensure that the hierarchical structure and module relationships in the architecture diagram are captured.
3. The method for analyzing and judging the rationality of an architecture diagram based on multimodal feature fusion according to claim 2, characterized in that extracting the position information of the text and image elements using the OCR module and the R-CNN model specifically comprises the following steps:
using the OCR module to perform text recognition on the architecture diagram, extracting all architecture diagram text portions, and recording the position of each piece of text information in the architecture diagram;
using the R-CNN model to process the architecture diagram, identifying and obtaining the set of region-of-interest feature maps of the image elements, and recording the position of each image element;
according to the correspondence of position and logical information established between the OCR module and the R-CNN model, converting the position and logical information of the image elements into text descriptions and merging them with the architecture diagram text portion to generate the expanded architecture diagram text portion.
4. The method for analyzing and judging the rationality of an architecture diagram based on multimodal feature fusion according to claim 3, characterized in that extracting the local semantic feature vectors of the image element regions of interest from the region-of-interest feature maps of the image elements in the architecture diagram specifically comprises the following steps:
upsampling the region-of-interest feature maps of the image elements in the architecture diagram to obtain upsampled region-of-interest feature maps;
performing pointwise convolution encoding on the upsampled region-of-interest feature maps to obtain channel-modulated upsampled region-of-interest feature maps;
performing two-dimensional convolution encoding on the channel-modulated upsampled region-of-interest feature maps to obtain the local semantic feature vectors of the image element regions of interest in the architecture diagram.
5. The method for analyzing and judging the rationality of an architecture diagram based on multimodal feature fusion according to claim 4, characterized in that performing word-granularity semantic encoding on the architecture diagram text portion to obtain the set of word-granularity feature vectors of the architecture diagram text description specifically comprises the following steps:
performing word segmentation on the architecture diagram text portion and passing it through the semantic encoder with the word embedding layer contained in the large model to obtain the set of word-granularity feature vectors of the architecture diagram text description;
performing word-granularity semantic encoding on the architecture design rationality judgment text to obtain the set of word-granularity feature vectors of the architecture design rationality judgment text description, including: performing word segmentation on the architecture design rationality judgment text and passing it through the semantic encoder with the word embedding layer contained in the large model to obtain the set of word-granularity feature vectors of the architecture design rationality judgment text description.
6. The method for analyzing and judging the rationality of an architecture diagram based on multimodal feature fusion according to claim 5, characterized in that performing word segmentation on the architecture diagram text portion and passing it through the semantic encoder with the word embedding layer contained in the large model to obtain the set of word-granularity feature vectors of the architecture diagram text description specifically comprises the following steps:
performing word segmentation on the architecture diagram text and the architecture design rationality judgment text to convert the architecture diagram text portion into a word sequence composed of multiple words;
using the embedding layer of the semantic encoder containing the word embedding layer to map each word in the word sequence into a word embedding vector, obtaining a sequence of word embedding vectors;
the transformer of the semantic encoder containing the word embedding layer performs transformer-based global context semantic encoding on the sequence of word embedding vectors to obtain the set of word-granularity feature vectors of the architecture diagram text description.
7. The method for analyzing and judging the rationality of an architecture diagram based on multimodal feature fusion according to claim 6, characterized in that determining whether the architecture diagram to be analyzed meets the overall planning requirements based on the comparison between the semantic matching coefficients among the feature sequences and the preset threshold comprises: in response to the semantic matching coefficient among the feature sequences being greater than the preset threshold, determining that the architecture diagram to be analyzed meets the overall planning requirements.
CN202411679083.1A 2024-11-22 2024-11-22 A method for analyzing and judging the rationality of architecture diagrams based on multimodal feature fusion Active CN119169643B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411679083.1A CN119169643B (en) 2024-11-22 2024-11-22 A method for analyzing and judging the rationality of architecture diagrams based on multimodal feature fusion


Publications (2)

Publication Number Publication Date
CN119169643A CN119169643A (en) 2024-12-20
CN119169643B true CN119169643B (en) 2025-04-01

Family

ID=93884353


Country Status (1)

Country Link
CN (1) CN119169643B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119398954A (en) * 2025-01-03 2025-02-07 上海微创软件股份有限公司 Financial data management system based on large language model

Citations (2)

Publication number Priority date Publication date Assignee Title
CN116229482A (en) * 2023-02-03 2023-06-06 华北水利水电大学 Visual multimodal text detection, recognition and error correction methods in network public opinion analysis
CN117972359A (en) * 2024-03-28 2024-05-03 北京尚博信科技有限公司 Intelligent data analysis method based on multi-mode data

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
CN112818906B (en) * 2021-02-22 2023-07-11 浙江传媒学院 An intelligent cataloging method for all-media news based on multi-modal information fusion understanding
WO2023050295A1 (en) * 2021-09-30 2023-04-06 中远海运科技股份有限公司 Multimodal heterogeneous feature fusion-based compact video event description method
CN116994069B (en) * 2023-09-22 2023-12-22 武汉纺织大学 Image analysis method and system based on multi-mode information
CN117611245B (en) * 2023-12-14 2024-05-31 浙江博观瑞思科技有限公司 Data analysis management system and method for planning E-business operation activities
CN117875289A (en) * 2024-01-24 2024-04-12 河南高辉教育科技有限公司 Table information processing method, system and storage medium
CN117744785B (en) * 2024-02-19 2024-09-03 北京博阳世通信息技术有限公司 Space-time knowledge graph intelligent construction method and system based on network acquisition data




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant