CN118691917B - Bridge structure damage identification method and system based on machine vision
- Publication number
- CN118691917B (granted from application CN202411173109.5A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
- G06N3/042—Knowledge-based neural networks; Logical representations of neural networks
- G06N3/08—Learning methods
- G06T7/11—Region-based segmentation
- G06T7/40—Analysis of texture
- G06T7/90—Determination of colour characteristics
- G06V10/467—Encoded features or binary features, e.g. local binary patterns [LBP]
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The present application provides a machine-vision-based bridge structure damage identification method and system. The method obtains the image blocks of a machine vision image on which bridge structure damage identification is to be performed, and obtains t image implicit representations that express different types of features of each image block, together with the image-block connectivity matrix of the machine vision image. From the t image implicit representations and the connectivity matrix, a merged implicit representation is derived for each image block. The merged implicit representation is used to identify bridge structure damage in each image block, and thereby in the machine vision image as a whole, yielding an identification result. Because the present application does not identify damage by comparing the machine vision image against existing damage features, the above procedure can still be applied when the existing features do not cover an unknown damage condition, which increases the reliability of bridge structure damage identification.
Description
Technical Field
The present application relates to the field of machine vision technology, and in particular to a machine-vision-based method and system for identifying bridge structure damage.
Background Art
In the field of bridge structural health monitoring, machine vision has been widely adopted in recent years as a non-contact, high-efficiency inspection technique. Existing machine-vision-based bridge damage identification nevertheless still faces several challenges. On the one hand, bridge structural damage takes diverse types and forms, including cracks, corrosion and deformation, and these damages manifest differently in machine vision images. On the other hand, owing to the complexity and diversity of bridge structures, machine vision images contain considerable background noise and interference, which makes damage identification difficult. Traditional machine vision methods typically identify and classify damage by comparing the image under inspection against a library of known damage features. This approach has obvious limitations: when the damage features in the image do not match those in the library, the damage is often not identified accurately, leading to missed or false detections, and its reliability degrades further when unknown or novel damage types are encountered. A new machine-vision-based bridge damage identification technique is therefore needed that overcomes the limitations of traditional methods and improves the accuracy and reliability of damage identification.
Summary of the Invention
In view of this, embodiments of the present application provide a machine-vision-based bridge structure damage identification method and system. The technical solution of the present application is implemented as follows:
In a first aspect, an embodiment of the present application provides a machine-vision-based bridge structure damage identification method, comprising: acquiring a machine vision image of the bridge structure to be identified; extracting image implicit representations from the image blocks of the machine vision image to obtain t image implicit representations for each image block, where t∈(0,+∞), the t image implicit representations express different types of features, and the image blocks are obtained by dividing the machine vision image into blocks; merging the t image implicit representations to obtain a merged implicit representation of each image block, and performing adaptive feature integration on the merged implicit representation to obtain an adaptive implicit representation that prominently characterizes the merged implicit representation; merging, level by level, the image-block connectivity matrix of the machine vision image with the output implicit representation of each image block to obtain the merged implicit representation corresponding to the image block, where the output implicit representation is obtained from the adaptive implicit representation and the merged implicit representation, and the connectivity matrix characterizes the similarity between image blocks; and identifying bridge structure damage in each image block according to its merged implicit representation to obtain image-block damage identification results, and deriving the damage identification result of the bridge structure to be identified from the damage identification results of the individual image blocks.
In a second aspect, the present application provides a computer system comprising a memory and a processor, the memory storing a computer program executable on the processor, the processor implementing the steps of the above method when executing the program.
The beneficial effects of the present application include at least the following: the application obtains the image blocks of a machine vision image on which bridge structure damage identification is to be performed, and obtains t image implicit representations expressing different types of features of each image block together with the image-block connectivity matrix of the machine vision image; from these it derives a merged implicit representation for each image block. The merged implicit representation is used to identify damage in each image block and hence in the machine vision image, yielding an identification result. Because the application does not compare the machine vision image against existing damage features, the above procedure can still identify bridge structure damage when the existing features do not cover an unknown damage condition, which increases the reliability of the identification.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the technical solution of the present application.
Brief Description of the Drawings
The drawings herein are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with the present application and, together with the specification, serve to explain its technical solution.
FIG. 1 is a schematic flowchart of a machine-vision-based bridge structure damage identification method provided in an embodiment of the present application.
FIG. 2 is a schematic diagram of the hardware of a computer system provided in an embodiment of the present application.
Detailed Description
To make the purpose, technical solutions and advantages of the present application clearer, the technical solutions are elaborated below with reference to the drawings and embodiments. The described embodiments should not be regarded as limiting the present application; all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of protection of the present application.
An embodiment of the present application provides a machine-vision-based bridge structure damage identification method that can be executed by the processor of a computer system. Here, a computer system may be a server, a laptop, a tablet, a desktop computer, a mobile device (e.g. a mobile phone, portable video player, personal digital assistant, dedicated messaging device or portable gaming device) or any other device with data-processing capability.
FIG. 1 is a schematic flowchart of the method provided in an embodiment of the present application. As shown in FIG. 1, the method comprises:
Step S100: acquire a machine vision image of the bridge structure to be identified.
In step S100, the computer system controls and coordinates the machine vision devices (e.g. high-definition cameras, infrared thermal imagers) installed on the bridge. These devices are usually mounted at key locations such as piers, girder joints and bearings so as to capture detailed images of the structure. According to a preset configuration file or real-time instructions, the system sets the acquisition parameters, including resolution, frame rate and exposure time; for example, to capture fine cracks or deformations it may select a higher resolution and an appropriate exposure time. Depending on actual requirements, acquisition can run on a schedule (e.g. automatically during the low-traffic early-morning hours) or be triggered by specific events (e.g. immediately upon detecting abnormal vibration), with the timing controlled via an internal clock or external sensor signals. The captured image data are transmitted in real time to a central processing server; during this process the system is responsible for data integrity and security, using encrypted communication protocols to prevent leakage. The images are stored in the server's database in a specific format (e.g. JPEG, PNG) for subsequent processing and analysis.
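By way of illustration, a minimal Python sketch of such an acquisition configuration is given below. All field names and default values are hypothetical assumptions; the embodiment does not prescribe a concrete schema.

```python
# A minimal sketch of an acquisition configuration; all field names and
# defaults are illustrative assumptions, not values fixed by the embodiment.
from dataclasses import dataclass

@dataclass
class CaptureConfig:
    resolution: tuple = (4096, 3072)    # high resolution to resolve fine cracks
    frame_rate: int = 5                 # frames per second
    exposure_ms: float = 8.0            # exposure time in milliseconds
    schedule: str = "daily@02:00"       # timed acquisition in low-traffic hours
    trigger_on_vibration: bool = True   # event-triggered capture
    image_format: str = "PNG"           # lossless storage for later analysis

config = CaptureConfig()                # would normally be loaded from a preset file
```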
Step S200: extract image implicit representations from the image blocks of the machine vision image to obtain the t image implicit representations corresponding to each image block, where t∈(0,+∞); the t image implicit representations express different types of features, and the image blocks are obtained by dividing the machine vision image into blocks.
In step S200, the computer system analyses each image block of the machine vision image in depth and, through specific algorithms or models, extracts multiple image implicit representations. Each representation characterizes the block along a different dimension or aspect, providing a rich information basis for subsequent damage identification. This step is detailed below in the context of a concrete application scenario.
The computer system first divides the original image into blocks according to a preset strategy (e.g. a sliding window of fixed size with overlap). Suppose the image is divided into blocks of 128×128 pixels that partially overlap so that boundary information remains continuous. For each block, the system extracts texture features with texture-analysis algorithms such as the gray-level co-occurrence matrix (GLCM) or local binary patterns (LBP); these features are expressed as a vector, e.g. a length-64 vector whose elements are quantized values of texture attributes. To capture shape information, the system may combine an edge-detection algorithm (e.g. the Canny detector) with shape descriptors (e.g. Hu moments) to generate a shape feature vector; such vectors may contain geometric edge attributes, moment invariants and the like, describing the overall shape and contour of the block. For color images, the system also extracts color features per block, for instance via color histograms, color moments or color-coherence vectors; using a histogram in HSV color space, for example, the color space is divided into intervals and the pixels in each interval are counted to form a color feature vector. In some advanced applications the system may additionally extract semantic information from the blocks, e.g. with a pretrained deep-learning model such as a convolutional neural network (CNN) that recognizes objects or scenes in the image; the output of an intermediate CNN layer serves as a feature map and can be processed further into a semantic feature vector that implicitly represents the block's high-level content. For each image block, the computer system then merges these different types of implicit representation (texture, shape, color, semantics, etc.), usually by concatenating the feature vectors into one longer vector that serves as the block's comprehensive representation. For example, with a texture feature vector of length 64, a shape feature vector of length 10 and a color feature vector of length 256, the merged implicit representation has length 64 + 10 + 256 = 330.
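A minimal Python sketch of this per-block extraction and concatenation is given below, assuming OpenCV and scikit-image are available; the concrete descriptors and the 64/10/256 split follow the example above and are illustrative choices, not requirements of the embodiment. The helper name patch_features is hypothetical.

```python
# A minimal sketch of per-block feature extraction and concatenation; the
# descriptors and dimensions follow the worked example above.
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

def patch_features(patch_bgr):
    gray = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2GRAY)

    # Texture: 64-bin histogram of local binary patterns.
    lbp = local_binary_pattern(gray, P=8, R=1, method="default")
    f_texture, _ = np.histogram(lbp, bins=64, range=(0, 256), density=True)

    # Shape: Canny edges, the 7 Hu moment invariants, plus 3 edge statistics.
    edges = cv2.Canny(gray, 50, 150)
    hu = cv2.HuMoments(cv2.moments(edges)).flatten()
    f_shape = np.concatenate([hu, [edges.mean(), edges.std(), edges.sum() / edges.size]])

    # Colour: 16x16 hue/saturation histogram in HSV space (256 bins).
    hsv = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2HSV)
    f_color = cv2.calcHist([hsv], [0, 1], None, [16, 16], [0, 180, 0, 256]).flatten()
    f_color /= f_color.sum() + 1e-8

    # Merged implicit representation: 64 + 10 + 256 = 330 dimensions.
    return np.concatenate([f_texture, f_shape, f_color]).astype(np.float32)
```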
Step S300: merge the t image implicit representations to obtain the merged implicit representation of each image block, and perform adaptive feature integration on the merged implicit representation to obtain an adaptive implicit representation that prominently characterizes it.
In step S300, the computer system further processes the t image implicit representations extracted in step S200 (each representing one specific feature dimension of a block), with the aim of integrating these multi-dimensional features and reinforcing the key information through adaptive feature integration, thereby generating a more comprehensive and representative adaptive implicit representation. This step is crucial for the subsequent damage identification because it ensures that all relevant features are effectively integrated and exploited. First, the system merges the t image implicit representations of each block. These may include texture, shape and color feature vectors, each capturing a different aspect of the block. Merging usually means simply concatenating the vectors into one longer vector, the merged implicit representation. For example, suppose a block has texture feature vector f_texture = [0.2, 0.3, …, 0.1] (length n), shape feature vector f_shape = [0.5, −0.2, …, 0.4] (length m) and color feature vector f_color = [20, 50, …, 40] (length p). The merged implicit representation is then the concatenation f_merged = [f_texture, f_shape, f_color] = [0.2, 0.3, …, 0.1, 0.5, −0.2, …, 0.4, 20, 50, …, 40].
Next, the computer system performs adaptive feature integration on the merged implicit representation to highlight key features and reduce redundant information. This step usually relies on a machine-learning model or algorithm such as self-attention or the multi-head attention of a Transformer. The system computes similarity scores between the request (query) feature and all index (key) features, typically as a dot product scaled by √d_k, where d_k is the dimension of the index features, to decide how much each output (value) feature contributes to the current query. This amounts to a weighted sum whose weights are the similarity scores normalised by a softmax function. The calculation formula is as follows:
Attention(Q, K, V) = softmax(QKᵀ / √d_k) · V;
Finally, the system obtains a new vector, the adaptive implicit representation, which focuses more strongly on the features that matter for the bridge structure damage identification task. This vector serves as the input to the subsequent damage identification analysis.
The adaptive implicit representation not only integrates the multi-dimensional features of an image block but also reinforces key information through advanced techniques such as self-attention, so that the downstream damage identification model can capture subtle signs of structural damage more accurately.
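A minimal numpy sketch of the scaled dot-product attention in the formula above follows; the projection matrices Wq, Wk, Wv are hypothetical stand-ins for parameters that would be learned during training.

```python
# A minimal numpy sketch of scaled dot-product attention over image blocks.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax normalisation
    return weights @ V                                   # weighted sum of values

n, d = 16, 330                                           # 16 blocks, 330-dim merged vectors
rng = np.random.default_rng(0)
X = rng.normal(size=(n, d))                              # merged implicit representations
Wq, Wk, Wv = (rng.normal(size=(d, 64)) for _ in range(3))  # hypothetical projections
adaptive = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)  # shape (16, 64)
```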
Step S400: merge, level by level, the image-block connectivity matrix of the machine vision image with the output implicit representation of each image block to obtain the merged implicit representation of the block; the output implicit representation is obtained from the adaptive implicit representation and the merged implicit representation, and the connectivity matrix characterizes the similarity between image blocks.
Step S400 exploits the connectivity information between image blocks to further deepen the understanding of the damage features of the bridge structure, providing more comprehensive and accurate information for the subsequent identification.
First, the computer system builds an image-block connectivity matrix: a two-dimensional array whose numbers of rows and columns both equal the number of image blocks, each element representing the similarity or connectivity between the corresponding pair of blocks. This similarity can be computed in many ways, e.g. from the spatial distance between blocks, the similarity of their color histograms or the degree of texture-feature matching; adjacent blocks thus receive high values in the connectivity matrix while non-adjacent blocks receive low or zero values. The system then merges the connectivity matrix with the output implicit representation of each block level by level. This can involve, for example, a graph neural network (GNN) or a similar graph-processing model, since such models naturally handle graph-structured data in which nodes (here, image blocks) are connected by edges (connectivity). Specifically, the system may first use a graph convolutional network (GCN) as part of the merging submodule: a GCN updates each node's feature representation by aggregating the features of its neighbouring nodes, and this can be iterated several times, each iteration updating the node features from the current connectivity matrix and node features so that the features are merged level by level. In the embodiments of the present application, by iterating such graph convolutions the system captures the complex spatial relations between image blocks and folds these relations into each block's feature representation, finally obtaining the merged implicit representations. These merged representations contain not only a block's own feature information but also the influence of the surrounding blocks, and thus reflect the overall state of the bridge structure more completely.
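A minimal numpy sketch of one such graph-convolution step is given below, using the standard symmetrically normalised GCN update that the embodiment names as one option; the weight matrix W and the ReLU are illustrative choices.

```python
# One graph-convolution step over the block connectivity matrix A (n x n)
# and block features H (n x d); stacking such layers merges level by level.
import numpy as np

def gcn_layer(A, H, W):
    A_hat = A + np.eye(A.shape[0])                  # add self-connections
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)          # aggregate neighbours, then ReLU

# e.g. H1 = gcn_layer(A, H0, W0); H2 = gcn_layer(A, H1, W1); ...
```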
Step S500: identify bridge structure damage in each image block according to its merged implicit representation to obtain image-block damage identification results, and derive the damage identification result of the bridge structure to be identified from the results of the individual blocks.
In step S500, the computer system uses the merged implicit representations obtained in the preceding steps to perform a detailed damage identification analysis on the image blocks. This is the core of the whole machine-vision identification pipeline and directly determines the accuracy and reliability of the result. First, the merged implicit representation of each block is fed into a pre-trained damage identification model, usually a complex machine-learning model such as a deep neural network (DNN), a convolutional neural network (CNN) or a support vector machine (SVM), chosen according to the characteristics of the data and the requirements of the task. In the embodiments of the present application, a CNN is a common choice because image data are highly spatially correlated.
The task of the damage identification model is to classify or regress the merged implicit representation of each block so as to decide whether the block is damaged, possibly further subdividing the type and severity of the damage. For classification, the model outputs the probability of each block belonging to each damage class; for regression, it may directly output a quantitative damage indicator. For example, with a CNN classifier the final layer can be a softmax layer that outputs, for each block, a probability distribution over classes such as "no damage", "slight crack" and "severe crack", whereas a regression model might directly output a continuous value representing damage severity.
After obtaining the damage identification result of every image block, the computer system aggregates these local results into an overall damage assessment of the whole bridge structure, for example via a weighted average of the block results, a voting mechanism or a more complex fusion strategy.
For instance, the system can weight each image block according to its position and importance within the bridge structure and then sum the weighted identification results; alternatively, it can apply a spatial-smoothing technique to reduce the influence on the overall result of local outliers caused by image noise or identification errors.
Finally, the computer system compiles the aggregated identification results into an easily understood report for the bridge management department or maintenance personnel. The report should cover the overall damage condition of the structure, the specific damage locations, and the damage types and severities, so that the management department can take appropriate maintenance and repair measures.
As an example of the damage identification model, a CNN-based classifier can be used that, during training, has learned the mapping between the feature representations of a large corpus of bridge damage images and the damage classes. For each image block the model outputs a probability distribution such as [0.1, 0.8, 0.1], meaning a 10% probability of no damage, an 80% probability of slight cracking and a 10% probability of severe cracking. The system judges each block's condition from these probabilities and generates the overall damage report accordingly. Through the above steps the computer system can identify bridge structural damage accurately and efficiently, providing a strong guarantee for the safe operation of the bridge.
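A minimal sketch of turning such per-block probability distributions into an overall assessment is given below; the class names and the position-based weights are hypothetical illustrations of the weighted-averaging strategy described above.

```python
# Per-block class probabilities -> local labels and a weighted overall distribution.
import numpy as np

CLASSES = ["no damage", "slight crack", "severe crack"]

def aggregate(block_probs, weights):
    per_block = [CLASSES[i] for i in block_probs.argmax(axis=1)]     # local decisions
    overall = (weights[:, None] * block_probs).sum(axis=0) / weights.sum()
    return {"block_labels": per_block,
            "overall_distribution": dict(zip(CLASSES, overall.round(3)))}

probs = np.array([[0.1, 0.8, 0.1],    # the example block above
                  [0.9, 0.1, 0.0]])
report = aggregate(probs, weights=np.array([1.0, 0.5]))
```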
In one implementation of the present application, the t image implicit representations include a texture implicit representation. On this basis, step S200 (extracting image implicit representations from the image blocks of the machine vision image to obtain the t image implicit representations of each block) may include:
Step S1210: feed the machine vision image into the image-implicit-representation extraction module of the bridge structure damage identification model;
Step S1220: in the image-implicit-representation extraction module, obtain the w texture units of an image block of the machine vision image, w∈[1,+∞); the w texture units express the texture of the image block;
Step S1230: obtain the texture-unit implicit representations corresponding to the w texture units, and perform feature extraction on each of them to obtain the corresponding w texture-extraction implicit representations;
Step S1240: perform feature dimensionality reduction on the w texture-extraction implicit representations to obtain the texture implicit representation corresponding to the image block.
In step S1210, the computer system passes the acquired machine vision image to a purpose-built bridge structure damage identification model. This model is a complex machine-learning system capable of processing image data and recognizing signs of damage in the bridge structure. To focus on the extraction of texture features, the model contains a dedicated image-implicit-representation extraction module.
For example, suppose a damage identification model based on a convolutional neural network (CNN) is adopted; CNNs are widely used in computer vision for their strong image-feature-extraction capability. In such a model, the image-implicit-representation extraction module can be a combination of one or more convolutional layers responsible for the preliminary processing of the input machine vision image, generating the feature maps required by the subsequent analysis.
The implementation of the extraction module relies on a deep-learning framework (e.g. TensorFlow or PyTorch), which provides all the tools needed to build and train neural networks. In a CNN, the module may contain several convolutional layers, activation layers (e.g. ReLU) and pooling layers stacked in sequence: the convolutional layers convolve sliding kernels with the input image to extract local features; the activation layers introduce non-linearity so that the network can learn complex patterns; and the pooling layers downsample the feature maps, reducing computation and increasing the robustness of the features.
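A minimal PyTorch sketch of such a convolution / activation / pooling stack follows; the layer sizes are illustrative, not fixed by the embodiment.

```python
# A small extraction module in the conv / ReLU / pool style described above.
import torch
import torch.nn as nn

extractor = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),   # local texture filters
    nn.ReLU(),                                    # non-linearity
    nn.MaxPool2d(2),                              # downsample for robustness
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),                      # one response per channel
    nn.Flatten(),                                 # -> 32-dim feature vector
)

features = extractor(torch.randn(1, 1, 64, 64))   # one 64x64 grayscale block
```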
After the image has been passed into the model, in step S1220 the next task is to divide it into blocks inside the extraction module and to extract from each block the texture units that represent its texture characteristics. Texture units are small regions of the image with a specific texture pattern; together they constitute the texture features of an image block. To extract them effectively, the system first divides the machine vision image into overlapping or non-overlapping blocks whose size and stride can be tuned to the task, so that the whole image is covered while sufficient local detail is captured; for example, a fixed sliding window (e.g. 64×64 pixels) can be slid over the image with a given stride (e.g. 32 pixels) to generate a series of blocks. From each block the system then identifies representative texture units, which usually involves statistical analysis or pattern matching of the pixel values within the block. A simple approach is to group the pixel values into several clusters with a clustering algorithm (e.g. K-means), each cluster corresponding to one texture unit; in more complex scenes, more advanced texture-analysis techniques may be needed, such as local binary patterns (LBP), the gray-level co-occurrence matrix (GLCM) or deep-learning-based texture descriptors. For example, suppose that within one image block the system identifies three texture units: a rough texture of the concrete surface, a mottled texture of corroded reinforcement, and a linear texture possibly caused by a crack. These texture units are distinguished from other regions by their gray-level distribution, edge characteristics or statistical properties.
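A minimal sketch of the simple K-means route is given below: small windows inside one block are clustered on their LBP histograms, windows sharing a label forming one texture unit. The window size, k and the helper name texture_units are illustrative assumptions.

```python
# Cluster sub-windows of one block into texture units via LBP histograms.
import numpy as np
from skimage.feature import local_binary_pattern
from skimage.util import view_as_windows
from sklearn.cluster import KMeans

def texture_units(gray_block, win=16, k=3):
    lbp = local_binary_pattern(gray_block, P=8, R=1, method="uniform")
    windows = view_as_windows(lbp, (win, win), step=win).reshape(-1, win * win)
    hists = np.stack([np.histogram(w, bins=10, range=(0, 10), density=True)[0]
                      for w in windows])
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(hists)
    return hists, labels        # windows sharing a label form one texture unit
```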
After the texture units have been identified, step S1230 generates an implicit representation for each texture unit and strengthens its representativeness through feature extraction. An implicit representation is an abstract data structure that captures the essential characteristics of a texture unit; it can take many forms, such as a histogram, moment features or a texture descriptor, all of which reduce the unit's complex visual pattern to a mathematical object that is easy to process and compare. For example, a gray-level co-occurrence matrix (GLCM) can be used to generate a texture descriptor that reflects the coarseness, directionality and contrast of the texture by counting the gray-level dependencies between pixel pairs. Having obtained the implicit representation of each texture unit, the system performs further feature extraction on it to produce a more compact and discriminative texture-extraction implicit representation. This usually involves convolution, pooling or other feature transformations; however, because the texture units are already small and well defined, a network as complex as one processing the full image may not be needed, and simpler methods such as LBP histograms or SIFT features can be used instead. To remain consistent with the overall pipeline and to illustrate the potential of deep learning in feature extraction, it may still be assumed that the system uses a lightweight convolutional neural network, containing only a few convolutional and pooling layers, to extract the key features from the texture-unit implicit representations.
For example, suppose that for the three texture units identified earlier (rough concrete texture, reinforcement-corrosion texture, crack texture) the system generates their implicit representations (e.g. GLCM descriptors). These descriptors are then fed into a lightweight convolutional neural network for feature extraction; through its convolutional layers the network learns the distinctive spatial patterns and statistical properties of each texture unit and outputs more compact and discriminative texture-extraction implicit representations.
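A minimal sketch of such a GLCM descriptor for one texture unit (a uint8 image region) follows, assuming scikit-image; the offsets and the chosen properties are illustrative.

```python
# GLCM descriptor: co-occurrence statistics at a few offsets and angles.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_descriptor(unit_uint8):
    g = graycomatrix(unit_uint8, distances=[1, 2], angles=[0, np.pi / 2],
                     levels=256, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.concatenate([graycoprops(g, p).ravel() for p in props])  # 16 dims
```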
After obtaining the texture-extraction implicit representation of each texture unit, in step S1240 the system merges these high-dimensional feature vectors into one comprehensive texture implicit representation that stands for the texture characteristics of the whole image block. Because direct merging can yield an excessively high feature dimension with redundant information, feature dimensionality reduction is usually required.
Feature dimensionality reduction shrinks the dimension of the feature space while retaining the important information in the original data and removing noise and redundancy. In the embodiments of the present application, common reduction methods include principal component analysis (PCA), linear discriminant analysis (LDA), pooling, and non-linear techniques (e.g. t-SNE, ISOMAP), without specific limitation. For example, suppose that after feature extraction each texture unit corresponds to a 128-dimensional texture-extraction implicit representation. The system applies PCA to these representations and keeps the first 32 principal components to approximate the original data. Each image block is thereby mapped into a 32-dimensional space, and its texture implicit representation becomes a 32-dimensional vector that synthesizes the feature information of all texture units in the block while discarding redundancy and noise, providing a more compact and effective feature representation for the subsequent damage identification task.
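A minimal sketch of the 128-to-32 PCA reduction described above follows, assuming scikit-learn; averaging the reduced unit vectors into one block vector is an illustrative pooling choice.

```python
# Fit PCA once on a corpus of unit vectors, then reduce each block's units.
import numpy as np
from sklearn.decomposition import PCA

unit_vectors = np.random.rand(500, 128)          # texture-extraction representations
pca = PCA(n_components=32).fit(unit_vectors)     # keep the 32 leading components

def texture_representation(units_128d):
    reduced = pca.transform(units_128d)          # (w, 32) for one block
    return reduced.mean(axis=0)                  # one 32-dim block vector
```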
In a second implementation of the present application, the t image implicit representations may further include a shape implicit representation. On this basis, step S200 may include:
Step S2210: feed the machine vision image into the image-implicit-representation extraction module of the bridge structure damage identification model;
Step S2220: in the extraction module, obtain the r region shapes of an image block of the machine vision image, r∈[1,+∞); the r region shapes express image blocks containing different shapes;
Step S2230: perform feature extraction on each of the r region shapes to obtain the r corresponding graphic implicit representations;
Step S2240: perform feature dimensionality reduction on the r graphic implicit representations to obtain the shape implicit representation corresponding to the image block.
Step S2210 may proceed as described for step S1210. After the image has been passed into the model, the next step is to process it further inside the extraction module to identify regions of different shapes in the image. These region shapes may correspond to specific components of the bridge structure (e.g. piers, girder cross-sections) or to signs of damage (e.g. cracks, deformed areas).
To identify the region shapes effectively, the computer system can employ a variety of image-processing techniques. A common approach combines an edge-detection algorithm (e.g. the Canny detector) with a shape-matching algorithm: the edge detector first extracts the edge information, which usually corresponds to object contours, and a shape-matching algorithm (e.g. the Hough transform or shape-context matching) then assembles the extracted edges into concrete shape regions. In addition, with the development of deep learning, shape-segmentation models based on convolutional neural networks have gradually become mainstream; they segment the different shape regions directly from the raw image without an explicit edge-detection step. Network architectures such as U-Net and Mask R-CNN, which have achieved notable results in medical image segmentation, can likewise be applied to shape segmentation of bridge structure images.
In a bridge image, the system can first detect the edge contours of key components such as piers and girders with an edge-detection algorithm and then assemble these contours into concrete shape regions with a shape-matching algorithm; alternatively, if a deep-learning segmentation model is used, the system directly outputs the different shape regions of the image. These region shapes may include regular geometric shapes (e.g. rectangles, circles) as well as irregular crack shapes.
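A minimal OpenCV sketch of the edge-detection route to region shapes follows; the Canny thresholds and the minimum contour area are illustrative values.

```python
# Canny edges -> external contours -> candidate region shapes.
import cv2

def region_shapes(gray, min_area=50.0):
    edges = cv2.Canny(gray, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [c for c in contours if cv2.contourArea(c) >= min_area]
```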
After the region shapes in the image have been identified, step S2230 performs feature extraction on these shape regions to generate the corresponding graphic implicit representations. A graphic implicit representation is a high-level feature vector or matrix that describes the geometric characteristics and spatial distribution of a shape region. For each identified region shape, the computer system can use a variety of feature extraction methods to generate its graphic implicit representation, including but not limited to the following:
Geometric feature extraction: calculating basic geometric parameters of the shape region such as perimeter, area, and bounding-box size. These parameters directly reflect the size and scale of the shape.
Moment feature extraction: using moment invariants such as Hu moments and Zernike moments to describe the rotation, scaling, and translation invariance of the shape. These moment features are very useful for recognizing the same shape under different viewing angles.
Contour feature extraction: encoding the shape contour by means of Fourier descriptors, chain codes, and similar methods, which capture the curvature and direction changes of the contour.
Deep learning feature extraction: using a convolutional neural network to learn features of the shape region. The network extracts high-level semantic features of the shape through multiple convolution and pooling layers and outputs a compact feature vector as the graphic implicit representation.
For example, for each shape region in a bridge image (such as a pier or a crack region), the system can first compute its basic geometric parameters (such as perimeter and area) as preliminary features, and then use Hu moments to extract invariant shape features so as to cope with image rotation and scaling. If a deep learning model is used for feature extraction, the system directly processes the shape region with a pre-trained convolutional neural network and outputs a high-dimensional feature vector as the graphic implicit representation of that region.
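A minimal sketch of the geometric-plus-moment variant, assuming OpenCV contours (for example, those produced by the earlier sketch) as input; the 9-dimensional layout and the log-scaling of the Hu moments are illustrative choices, not requirements of the scheme.
    import cv2
    import numpy as np

    def graphic_representation(contour):
        # Basic geometric parameters of the shape region.
        area = cv2.contourArea(contour)
        perimeter = cv2.arcLength(contour, True)
        # Seven Hu moment invariants: robust to rotation, scaling,
        # and translation of the shape.
        hu = cv2.HuMoments(cv2.moments(contour)).flatten()
        # Log-scale the Hu moments, which span many orders of magnitude.
        hu = -np.sign(hu) * np.log10(np.abs(hu) + 1e-12)
        return np.concatenate(([area, perimeter], hu))   # 9-dim vector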
After the graphic implicit representation of each shape region has been extracted, these representations may be high-dimensional and contain redundant information, so step S2240 performs feature dimensionality reduction on them to generate a compact shape implicit representation.
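One hedged possibility for step S2240, building on the two sketches above, is PCA over the r graphic implicit representations followed by mean pooling; the component count and the pooling choice are assumptions for illustration.
    import numpy as np
    from sklearn.decomposition import PCA

    # Stack the r graphic implicit representations of one image block
    # (reusing contours and graphic_representation from the sketches above).
    graphic_reps = np.stack([graphic_representation(c) for c in contours])
    # Keep a few principal components (bounded by r and the feature width)
    # and mean-pool them into one compact shape implicit representation.
    k = min(4, *graphic_reps.shape)
    reduced = PCA(n_components=k).fit_transform(graphic_reps)   # shape (r, k)
    shape_rep = reduced.mean(axis=0)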
In a third implementation scheme of the present application, the t image implicit representations further include a color implicit representation. On this basis, the step S200 of extracting image implicit representations from the image blocks in the machine vision image to obtain the t image implicit representations corresponding to the image blocks may include:
Step S3210: passing the machine vision image into the image implicit representation extraction module of the bridge structure damage recognition model;
Step S3220: in the image implicit representation extraction module, dividing the machine vision image into blocks to obtain g image blocks in the machine vision image, g∈[1,+∞); the g image blocks jointly compose the machine vision image, and the g image blocks are image blocks of different color patches in the machine vision image;
Step S3230: obtaining the color implicit representation corresponding to a target image block from the color implicit representations respectively corresponding to the g image blocks, and determining the color implicit representation corresponding to the target image block as the color implicit representation corresponding to the image block in the machine vision image.
Step S3210 can refer to the aforementioned step S1210. In step S3220, in the image implicit representation extraction module, the machine vision image is divided into blocks to obtain g image blocks. Since bridge structure images are usually large and rich in detail, extracting color features from the entire image directly may be inefficient because of the excessive computation involved. Therefore, the image is divided into blocks before the color implicit representation is extracted.
The computer system can divide the image into blocks in a variety of ways. A simple and commonly used method is uniform grid division, which partitions the image into multiple equally sized, regularly arranged patches (i.e., image blocks). Each image block contains part of the image content, and adjacent blocks overlap to ensure continuity of information. Alternatively, the division can be adapted to the image content, for example by dynamically determining the size and position of the blocks based on changes in edges, texture, or color.
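A minimal sketch of the uniform grid division with overlap; the 128-pixel block size and 32-pixel overlap are illustrative values only.
    import numpy as np

    def grid_blocks(image, block=128, overlap=32):
        # Uniform grid division with overlapping strides so adjacent
        # blocks share a margin and information stays continuous.
        stride = block - overlap
        h, w = image.shape[:2]
        blocks = []
        for y in range(0, max(h - block, 0) + 1, stride):
            for x in range(0, max(w - block, 0) + 1, stride):
                blocks.append(image[y:y + block, x:x + block])
        return blocks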
After the image blocks are obtained, step S3230 generates the corresponding color implicit representation for each block. In this third implementation, however, the color features of each block are not extracted and represented independently; instead, the focus is on identifying, from the color features of all blocks, the target blocks related to a specific damage sign (such as rust) and obtaining their corresponding color implicit representations.
To extract the color implicit representation, the computer system can use a variety of color feature descriptors, including but not limited to color histograms, color moments, and color coherence vectors. In the specific scenario of bridge structure damage identification, however, more attention may be paid to color features that reflect corrosion. A method based on color space conversion and color clustering can therefore be used to identify corrosion-related color regions. First, the system can convert the image blocks from the RGB color space to a color space better suited to rust detection (such as the HSV or Lab color space); in these spaces, rust typically appears as hue and saturation values within a specific range. The system can then perform cluster analysis on the converted color values, grouping pixels with similar colors into one class and thereby identifying the corrosion-related color regions. Next, to generate the color implicit representation, the system can extract features from the identified rust regions, such as the average color value of the rust region, the variance of the color distribution, and the proportion of the area covered by rust. Together, these features constitute the color implicit representation of the rust region used in the subsequent damage identification task. In practice, it may not be necessary to run the full color implicit representation extraction on every image block. A more efficient alternative is to first roughly locate the possible rust regions (i.e., the target image blocks) with fast color screening and region-growing algorithms, and then perform detailed color implicit representation extraction only on those regions.
For a bridge structure image, the system first traverses all image blocks and uses a color screening algorithm (such as threshold-based color range tests) to quickly locate the blocks that may contain rust. These blocks are marked as target image blocks and become the focus of the subsequent color implicit representation extraction. The system then performs color space conversion and color clustering analysis on each target image block to identify the specific rust regions. Finally, the system extracts and encodes features of the rust regions (for example, using color histograms or color moments as the color implicit representation) to generate the color implicit representation corresponding to each target image block.
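The screening-then-describing flow could look roughly like the following sketch, assuming OpenCV and BGR input blocks; the HSV thresholds for rust-like hues and the 5% screening ratio are assumptions that would need calibration on real bridge imagery.
    import cv2
    import numpy as np

    # Illustrative HSV band for rust-like (orange-brown) hues.
    RUST_LOW = np.array([5, 80, 50], dtype=np.uint8)
    RUST_HIGH = np.array([25, 255, 220], dtype=np.uint8)

    def color_representation(block_bgr, min_ratio=0.05):
        hsv = cv2.cvtColor(block_bgr, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, RUST_LOW, RUST_HIGH)
        ratio = mask.mean() / 255.0          # fraction of rust-like pixels
        if ratio < min_ratio:                # fast screening: not a target block
            return None
        pixels = hsv[mask > 0].astype(np.float32)
        # Mean color, color variance, and area ratio together form the
        # color implicit representation of the rust region.
        return np.concatenate([pixels.mean(0), pixels.var(0), [ratio]])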
It should be noted that in practice several target image blocks may contain similar rust phenomena. To evaluate the corrosion of the entire bridge structure comprehensively, the system can further aggregate or fuse the color implicit representations of all target image blocks (for example, by weighted averaging or max pooling) to generate a global color implicit representation that serves as a basis for bridge structure damage identification.
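A small sketch of this aggregation step; max pooling and (area-)weighted averaging are the two variants mentioned above, and the helper name is hypothetical.
    import numpy as np

    def global_color_representation(block_reps, weights=None):
        # Fuse per-block color implicit representations into one
        # global vector for the whole bridge structure.
        reps = np.stack(block_reps)
        if weights is None:
            return reps.max(axis=0)                       # max pooling
        w = np.asarray(weights, dtype=float)
        return (reps * w[:, None]).sum(axis=0) / w.sum()  # weighted average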
In a fourth implementation scheme of the present application, the t image implicit representations further include a semantic implicit representation. On this basis, the step S200 of extracting image implicit representations from the image blocks in the machine vision image to obtain the t image implicit representations corresponding to the image blocks may include:
Step S4210: passing the machine vision image into the image implicit representation extraction module of the bridge structure damage recognition model;
Step S4220: in the image implicit representation extraction module, performing image semantic representation on the image blocks in the machine vision image to obtain the initial semantic implicit representations corresponding to the image blocks;
Step S4230: determining the image coordinates of the image block in the machine vision image, and encoding the coordinate information of the image block to obtain the coordinate implicit representation corresponding to the image block;
Step S4240: obtaining the positioning vector corresponding to the image block, and merging the initial semantic implicit representation, the coordinate implicit representation, and the positioning vector corresponding to the image block to obtain the semantic implicit representation corresponding to the image block.
Step S4210 can refer to the aforementioned step S1210. In step S4220, the computer system processes the image in blocks and performs semantic analysis on each image block to obtain its initial semantic implicit representation. The semantic implicit representation not only contains the visual feature information of the image block but also attempts to capture the actual meaning and contextual relationships that these features represent. To obtain the initial semantic implicit representation, the computer system can adopt a variety of methods: for example, using a pre-trained convolutional neural network (CNN) such as ResNet or DenseNet to extract the visual features of the image block, and converting these features into the initial semantic implicit representation through components such as fully connected layers or attention mechanisms.
In step S4230, the computer system determines the position of each image block in the original image (i.e., its image coordinates) and encodes this coordinate information to generate the coordinate implicit representation. The coordinate information can be encoded in several ways. One method is to use the coordinate values directly (such as the x and y coordinates) as part of the encoding; however, this may not be effective for high-dimensional data, because raw coordinate values carry little spatial structure information. Alternatively, position embedding can be used. As another option, the coordinate values of the image block can be converted into a two-dimensional grid index (for example, by dividing the image into small cells and assigning each cell a unique index number), and these index numbers can then be converted into embedding vectors through a lookup table or an embedding layer.
For example, in a bridge structure image, the system first determines the coordinate position of each image block (such as the pixel coordinates of its upper-left and lower-right corners). The system then converts these coordinate values into a two-dimensional grid index and maps the index to a coordinate implicit representation through an embedding layer. For instance, if the image is divided into a 10×10 grid, each image block can be assigned a unique grid index (such as (3, 5)). The embedding layer converts this index into a fixed-length embedding vector that serves as the coordinate implicit representation and reflects the relative position of the image block within the image.
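A minimal PyTorch sketch of this grid-index embedding, mirroring the 10×10 example above; the 64-dimensional embedding width is an assumption.
    import torch
    import torch.nn as nn

    GRID = 10                                        # 10x10 grid, as in the example
    coord_embedding = nn.Embedding(GRID * GRID, 64)  # 64-dim width is an assumption

    def coordinate_representation(row, col):
        # Flatten the 2-D grid index, e.g. (3, 5) -> 35, then look it up
        # in the embedding layer to get a fixed-length vector.
        flat_index = torch.tensor([row * GRID + col])
        return coord_embedding(flat_index).squeeze(0)

    vec = coordinate_representation(3, 5)            # coordinate implicit representation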
In step S4240, the computer system obtains the positioning vector of each image block and merges it with the initial semantic implicit representation and the coordinate implicit representation to generate the final semantic implicit representation. The positioning vector is a special embedding vector that represents the identity or role of the image block within the whole image or a specific context. By merging these three representations, the system can generate a more comprehensive, semantically rich feature vector for the subsequent damage identification task. The positioning vector is obtained in a way similar to the coordinate implicit representation; however, whereas the coordinate implicit representation focuses on the spatial position of the image block, the positioning vector focuses on the block's identity or role within the whole image or a specific context. In practice, the positioning vector can be obtained in several ways:
Predefined embedding vectors: each image block can be assigned a predefined embedding vector as its positioning vector. These vectors can be generated by random initialization or according to some rule (such as the size or shape of the block).
Learned embedding vectors: embedding vectors learned during training can also serve as positioning vectors. For example, when training the bridge structure damage recognition model, each image block can be treated as an independent token, and an embedding layer generates an embedding vector for each token as its positioning vector.
Context-dependent embedding vectors: in some cases the positioning vector is closely related to the context surrounding the image block. Methods such as attention mechanisms or graph neural networks can then be used to capture the contextual information and generate context-dependent positioning vectors.
After obtaining the initial semantic implicit representation, the coordinate implicit representation, and the positioning vector, the computer system merges them to generate the final semantic implicit representation. The merging can be realized by a simple concatenation operation or optimized with more sophisticated feature fusion methods (such as attention mechanisms or bilinear pooling). Whichever method is used, the merged semantic implicit representation should fully and accurately reflect the visual features, spatial position information, and contextual semantic information of the image block.
For example, in a bridge structure image, the system generates an initial semantic implicit representation and a coordinate implicit representation for each image block and obtains a positioning vector in some way (such as a predefined or learned embedding vector). The system then merges the three representations into the final semantic implicit representation, either by simply concatenating them into a longer feature vector, or by using an attention mechanism to compute a weighted sum of the three representations as a weighted feature vector. Whichever method is used, this representation is used in the subsequent damage identification task to judge whether the image block shows signs of damage.
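For the attention-weighted variant, one hedged sketch (in PyTorch) projects the three representations to a common width, scores each source, and sums them with softmax weights; the class name SemanticMerge, the projection width, and the scoring head are all illustrative, not prescribed by the scheme.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SemanticMerge(nn.Module):
        def __init__(self, dims, d=128):
            super().__init__()
            # One linear projection per source so the three vectors
            # share a common width d before weighting.
            self.proj = nn.ModuleList([nn.Linear(n, d) for n in dims])
            self.score = nn.Linear(d, 1)

        def forward(self, semantic, coordinate, positioning):
            parts = [p(x) for p, x in zip(self.proj, (semantic, coordinate, positioning))]
            stacked = torch.stack(parts)                     # (3, d)
            weights = F.softmax(self.score(stacked), dim=0)  # one weight per source
            return (weights * stacked).sum(dim=0)            # merged semantic implicit representation

    merge = SemanticMerge(dims=(256, 64, 64))
    rep = merge(torch.randn(256), torch.randn(64), torch.randn(64))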
The detailed explanation and examples above show how the fourth implementation of step S200 extracts the semantic implicit representation from the machine vision image. This process not only applies complex image processing techniques and deep learning models but also jointly represents the semantic information, spatial position information, and contextual semantic information of the image blocks. The resulting semantic implicit representation provides more comprehensive and accurate feature information for the subsequent bridge structure damage identification task.
As an implementation scheme of the present application, the step S300 of merging the t image implicit representations to obtain the merged implicit representation corresponding to the image block, and performing adaptive feature integration on the merged implicit representation to obtain an adaptive implicit representation that highlights the merged implicit representation, may include:
Step S310: passing the t image implicit representations into the implicit representation merging module of the bridge structure damage recognition model.
Step S320: in the implicit representation merging module, merging the t image implicit representations to obtain the merged implicit representation corresponding to the image block.
Step S330: weighting the merged implicit representation with weight matrices to obtain h adaptive execution features corresponding to the merged implicit representation, h∈[1,+∞).
Step S340: performing an adaptive weighting operation on the h adaptive execution features to obtain the adaptive implicit representation that highlights the merged implicit representation.
In the embodiment of the present application, step S300 effectively merges multiple types of image implicit representations (such as the texture, shape, color, and semantic implicit representations) and generates a more representative adaptive implicit representation through adaptive feature integration. This process not only improves the richness and expressiveness of the features but also provides a more solid data foundation for subsequent damage identification.
Specifically, in step S310, the computer system passes the multiple image implicit representations extracted in step S200 (t in total) to the implicit representation merging module of the bridge structure damage recognition model. This module is specifically designed to fuse multiple image implicit representations and aims to generate, through an effective merging strategy, a merged implicit representation that integrates all the feature information.
The implicit representation merging module can be a complex system integrating multiple feature fusion techniques, including but not limited to concatenation, weighted summation, principal component analysis (PCA), and attention mechanisms.
For example, suppose that a texture implicit representation (feature vector length n1), a shape implicit representation (length n2), a color implicit representation (length n3), and a semantic implicit representation (length n4) have been extracted from a certain image block. These four implicit representations reflect the feature information of the image block along different dimensions. The computer system passes them as input to the implicit representation merging module in preparation for the subsequent merging operation.
In step S320, inside the implicit representation merging module, the computer system merges the t incoming image implicit representations. The purpose of the merging is to generate a merged implicit representation that integrates all the feature information; this representation serves as the input of the subsequent adaptive feature integration.
For example, a simple merging method is direct concatenation, in which the t image implicit representations are joined end to end in a fixed order to form a longer feature vector. If the texture implicit representation is [t1, t2, ..., tn1], the shape implicit representation is [s1, s2, ..., sn2], the color implicit representation is [c1, c2, ..., cn3], and the semantic implicit representation is [m1, m2, ..., mn4], the merged implicit representation is [t1, t2, ..., tn1, s1, s2, ..., sn2, c1, c2, ..., cn3, m1, m2, ..., mn4]. Direct concatenation can make the feature dimension very high and increase the computational complexity of subsequent processing, so in practice more efficient merging methods such as weighted summation or PCA dimensionality reduction can be considered.
Suppose the merged implicit representation is a vector of length N (N = n1 + n2 + n3 + n4) that integrates the feature information of the image block along the texture, shape, color, and semantic dimensions. This vector is the input of the subsequent steps and is used to generate the adaptive implicit representation.
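The direct-concatenation variant is trivial to express; the dimensions in the usage lines below are placeholders only.
    import numpy as np

    def merge_by_concatenation(texture, shape, color, semantic):
        # Join the four implicit representations end to end; the result
        # has length N = n1 + n2 + n3 + n4.
        return np.concatenate([texture, shape, color, semantic])

    # Placeholder dimensions: n1=32, n2=16, n3=8, n4=64, so N=120.
    merged = merge_by_concatenation(np.zeros(32), np.zeros(16),
                                    np.zeros(8), np.zeros(64))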
After the merged implicit representation is obtained, step S330 weights it with weight matrices to generate the adaptive execution features (i.e., the self-attention input vectors). This process involves, for example, linear transformations and weight assignment; it aims to highlight the key features in the merged implicit representation and to prepare for the subsequent adaptive weighting operation.
Weight-matrix weighting can be realized through a linear transformation. Specifically, a weight matrix W can be defined (with a shape matching the feature dimension of the merged implicit representation), and the merged implicit representation is multiplied by this weight matrix to obtain the weighted feature vector. The self-attention mechanism involves three different weight matrices: the query weight matrix (Wq), the key weight matrix (Wk), and the value weight matrix (Wv), which are used to generate the request features (Query), index features (Key), and output features (Value), respectively.
In the embodiment of the present application, it can be assumed that a self-attention mechanism is used to generate the adaptive execution features. Specifically, the merged implicit representation is first linearly transformed by the query weight matrix Wq to generate the request features; it is then linearly transformed by Wk to generate the index features (i.e., the keys); the output features can either use the merged implicit representation itself directly or be obtained by a further transformation through the value weight matrix Wv.
Suppose that after weight-matrix weighting, h adaptive execution features are obtained (i.e., the self-attention inputs, each of which can be a vector). These features may have the same dimension as the merged implicit representation or a different dimension after further transformation. They serve as the input of the subsequent self-attention processing and are used to generate the adaptive implicit representation.
After obtaining the h adaptive execution features, step S340 performs an adaptive weighting operation on them, for example by applying a self-attention mechanism to generate the adaptive implicit representation. The self-attention mechanism computes the similarity (the attention scores) between the request features and the index features, and weights and sums the output features according to these similarities, thereby dynamically selecting the input features and adjusting their importance.
First, the dot product (or scaled dot product) between the request features and the index features is computed; the softmax function is then applied to normalize these dot-product values into a probability distribution. Finally, the output features are weighted and summed with these attention weights to obtain the weighted feature vector as the output.
In practice, the merged implicit representation can be passed through several linear transformation layers to generate the request features, index features, and output features, and the weighted feature vector is then generated as the adaptive implicit representation following the self-attention computation. This representation not only fuses the feature information of the multiple image implicit representations but also strengthens and highlights the key features through the self-attention mechanism.
Suppose the adaptive implicit representation obtained after self-attention processing is a feature vector with the same dimension as the merged implicit representation, or a further transformed one. This vector integrates the feature information of the image block along multiple dimensions, with the key features strengthened by the self-attention mechanism, and is used as a feature input in the subsequent bridge structure damage identification task.
In an optional implementation, the h adaptive execution features include index features, output features, and request features. On this basis, the step S340 of performing an adaptive weighting operation on the h adaptive execution features to obtain the adaptive implicit representation that highlights the merged implicit representation may include:
Step S341: obtaining the inverse feature corresponding to the index features, and computing the inner product of the request features and the inverse feature to obtain an inner product matrix, where the inner product matrix represents the similarities among the t image implicit representations;
Step S342: normalizing the inner product matrix to obtain the normalization result corresponding to the inner product matrix, and computing the inner product of the normalization result and the output features to obtain an adaptive merging result that jointly represents the t image implicit representations;
Step S343: processing the adaptive merging result with a multilayer perceptron to obtain the adaptive implicit representation that highlights the merged implicit representation.
In the self-attention mechanism, the index features (Key), output features (Value), and request features (Query) play the central roles. In the embodiment of the present application, these feature vectors represent different aspects of the image implicit representations, such as texture, shape, and color. To compute the similarities among these feature vectors, the system first needs to process the index features and compute their inner product with the request features.
The inverse feature is the transpose of the index features. After obtaining the inverse feature, the system computes its inner product with the request features. The inner product measures the similarity or correlation between two vectors; in the embodiment of the present application, this means the system is evaluating the degree of association among the different image implicit representations (as expressed by their corresponding index and request features).
After the inner product matrix is generated, it is normalized in step S342 to ensure the stability of subsequent computations. The normalization, for example, applies the softmax function to normalize the rows (or columns) of the inner product matrix into probability distributions. The system then computes the inner product of the normalized result and the output features to generate the adaptive merging result.
Suppose the inner product matrix is A and the softmax normalization of its i-th row is softmax(ai), where ai is the i-th row vector of A. The normalized inner product matrix can then be written as softmax(A), in which every row is a probability distribution.
The normalized inner product matrix is multiplied with the output features (the value vectors) to generate the adaptive merging result. This is equivalent to a weighted sum of the output features according to the attention weights.
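Steps S341 and S342 amount to the following computation; this numpy sketch uses the scaled variant of the inner product, a common stabilizing choice assumed here rather than stated in the scheme.
    import numpy as np

    def adaptive_merge(Q, K, V):
        # Inner product of the request features with the transposed
        # index features (step S341), scaled by sqrt(d).
        A = Q @ K.T / np.sqrt(K.shape[-1])
        A = A - A.max(axis=-1, keepdims=True)            # stabilize the exponentials
        weights = np.exp(A)
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax(A): rows are probability distributions
        # Weighted sum of the output features (step S342).
        return weights @ V                               # adaptive merging result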
After the adaptive merging result (i.e., the weighted output feature vectors) is obtained, step S343 processes it further to enhance its expressiveness and nonlinearity. The multilayer perceptron (MLP) is a commonly used processing method that transforms and learns the input features through multiple hidden layers.
A multilayer perceptron consists of several fully connected layers, each containing a certain number of neurons and introducing nonlinearity through activation functions (such as ReLU or sigmoid). In the embodiment of the present application, the system can take the adaptive merging result as the input of the multilayer perceptron and transform it through a series of hidden layers; the output of the multilayer perceptron is the desired adaptive implicit representation.
Suppose the adaptive merging result consists of two vectors (corresponding to the two request features above). The system feeds these vectors into the multilayer perceptron, which transforms them through its hidden layers and activation functions and finally outputs two adaptive implicit representation vectors. These vectors not only contain the information of the original image implicit representations but also gain expressiveness and robustness from the self-attention and multilayer perceptron processing.
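A minimal sketch of the step S343 post-processing; the 128-dimensional input, the hidden width, and the ReLU activation are illustrative assumptions.
    import torch
    import torch.nn as nn

    mlp = nn.Sequential(
        nn.Linear(128, 256),   # first fully connected layer
        nn.ReLU(),             # nonlinear activation between layers
        nn.Linear(256, 128),   # output: the adaptive implicit representation
    )
    # Two adaptive merging result vectors in, two adaptive implicit
    # representation vectors out, matching the two-request-feature example.
    adaptive_reps = mlp(torch.randn(2, 128))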
As an implementation scheme of the present application, the step S400 of merging, level by level, the image block connectivity matrix corresponding to the machine vision image and the output implicit representation corresponding to the image block to obtain the merged implicit representation corresponding to the image block may include:
Step S410: passing the image block connectivity matrix corresponding to the machine vision image and the output implicit representation corresponding to the image block into the level-by-level merging module of the bridge structure damage recognition model, where the level-by-level merging module includes a feature extraction submodule, v level-by-level merging submodules, and a merging submodule, v∈[1,+∞).
Step S420: in the feature extraction submodule, performing feature extraction on the output implicit representation corresponding to the image block to obtain the feature-extraction implicit representation corresponding to the image block.
Step S430: in the v level-by-level merging submodules, obtaining the hierarchical implicit representations respectively corresponding to the v level-by-level merging submodules based on the image block connectivity matrix corresponding to the machine vision image and the feature-extraction implicit representation corresponding to the image block, and determining the hierarchical implicit representation corresponding to a target level-by-level merging submodule among the v submodules as the level-by-level merged implicit representation corresponding to the image block; the target level-by-level merging submodule is the last of the v level-by-level merging submodules.
Step S440: in the merging submodule, merging the level-by-level merged implicit representation corresponding to the image block and the feature-extraction implicit representation corresponding to the image block to obtain the merged implicit representation corresponding to the image block.
In step S410, the computer system passes the key input data (the image block connectivity matrix and the output implicit representations) to the level-by-level merging module of the bridge structure damage recognition model. This module is specifically designed to process the spatial relationships among the image blocks together with their respective feature information, in order to generate a higher-level merged implicit representation.
The level-by-level merging module is a complex neural network containing multiple submodules, each with its own processing task: a feature extraction submodule, v level-by-level merging submodules (for example, graph convolutional networks, GCNs), and a merging submodule. Working together, these submodules carry out the transformation from the raw inputs to the final merged implicit representation.
In step S420, after receiving the output implicit representation of the image block, the feature extraction submodule first performs a further feature extraction operation on it. The purpose is to distill more refined and discriminative feature information from the output implicit representation for subsequent processing and analysis.
The feature extraction submodule may use a variety of techniques, including but not limited to linear transformations, nonlinear activation functions, and convolution operations. In this particular setting, however, since the output implicit representation is already a high-dimensional vector or matrix fusing multiple features, the feature extraction is likely to focus on dimensionality reduction and transformation so as to better suit the subsequent graph convolutional processing.
In step S430, after the feature-extraction implicit representations of the image blocks are obtained, the system passes them, together with the image block connectivity matrix, to the v level-by-level merging submodules. These submodules are usually implemented with graph neural networks (such as GCNs), because such networks naturally handle graph-structured data and effectively capture the spatial relationships among the nodes (image blocks).
A graph convolutional network is a neural network designed for graph-structured data. It aggregates information from a node's neighbors through layer-wise convolution operations to generate an updated representation of each node. In the embodiment of the present application, the GCN can take the image block connectivity matrix as the adjacency matrix of the graph and the feature-extraction implicit representations as the initial node features, and then update the node representations through multiple rounds of convolution.
In the level-by-level merging submodules, the system builds the graph-structured data from the image block connectivity matrix and the feature-extraction implicit representations. Then, starting from the first level-by-level merging submodule, GCN layers are applied one after another to aggregate node information and update the representations. Each GCN layer generates new node representations from the current node representations and the adjacency matrix. As the number of layers grows, each node's representation gradually absorbs information from more neighboring nodes, forming a more comprehensive and accurate hierarchical implicit representation.
After processing by the v level-by-level merging submodules, the system takes the hierarchical implicit representation output by the last (i.e., target) level-by-level merging submodule as the level-by-level merged implicit representation of the image block. This representation contains not only the feature information of the image block itself but also the information of its neighboring blocks and the spatial relationships encoded in the whole image block connectivity matrix.
In step S440, after obtaining the level-by-level merged implicit representation of the image block, the system does not output it directly. Instead, it merges this representation with the original feature-extraction implicit representation to generate a more complete and richer merged implicit representation.
The merging submodule may realize the merging of the two implicit representations in several ways. A simple and common method is concatenation, joining the two representations end to end into a longer vector. However, this can make the feature dimension too high and increase the computational complexity, so in practice the system can adopt more sophisticated feature fusion techniques (such as attention mechanisms or bilinear pooling) to optimize the merging.
For example, suppose the level-by-level merged implicit representation produced by the v submodules is a 256-dimensional feature vector, while the original feature-extraction implicit representation is a 128-dimensional feature vector. In the merging submodule, the system can concatenate the two vectors into a 384-dimensional merged implicit representation. Considering computational efficiency and feature redundancy, the system may first reduce the dimensionality of the two vectors (for example, via PCA) before concatenating them or applying another fusion method.
The resulting merged implicit representation serves as one of the inputs of the subsequent damage identification step. It contains both the feature information of the image block itself (via the feature-extraction implicit representation) and the spatial relationships among the image blocks (via the level-by-level merged implicit representation). Such a comprehensive feature representation helps improve the accuracy and robustness of damage identification.
In an optional implementation, the v level-by-level merging submodules include a level-by-level merging submodule Cb, where b is not greater than v. On this basis, the step S430 of obtaining, in the v level-by-level merging submodules, the hierarchical implicit representations respectively corresponding to the v submodules based on the image block connectivity matrix corresponding to the machine vision image and the feature-extraction implicit representation corresponding to the image block may cover two cases:
Case 1: when the level-by-level merging submodule Cb is the first of the v level-by-level merging submodules, context feature extraction is performed by the submodule Cb on the image block connectivity matrix corresponding to the machine vision image and the feature-extraction implicit representation corresponding to the image block, yielding the hierarchical implicit representation corresponding to the submodule Cb;
Case 2: when the level-by-level merging submodule Cb is not the first of the v level-by-level merging submodules, context feature extraction is performed by the submodule Cb on the image block connectivity matrix corresponding to the machine vision image and the hierarchical implicit representation corresponding to a level-by-level merging submodule Ca, yielding the hierarchical implicit representation corresponding to the submodule Cb; the submodule Ca is the level-by-level merging submodule immediately preceding the submodule Cb.
In Case 1, suppose that at the initial stage of bridge structure damage identification the computer system has already completed the feature extraction of the image blocks and constructed the image block connectivity matrix representing the spatial relationships among them. These inputs are now fed into the first level-by-level merging submodule Cb to generate a preliminary hierarchical implicit representation. In the present application, the subscripts a and b in Ca and Cb denote the ordinal numbers of the level-by-level merging submodules. For example, when v=5, neither a nor b can exceed 5, so a and b take values from 1 to 5; if a=1 and b=2, then Ca is the first level-by-level merging submodule and Cb is the second, and since 2-1=1, Ca is the submodule immediately preceding Cb. The C in Ca and Cb stands for the level-by-level merging submodule itself. As a plain analogy, consider five athletes at a sports meeting, all called Athletes and, for brevity, abbreviated A; to distinguish them they are numbered in the same ordinal fashion, so one athlete, Zhang San, is abbreviated A1, Li Si is A2, and so on (this is only one possible numbering, and the ordinals of Zhang San and Li Si could differ, but none exceeds 5). It is then easy to see that, by ordinary logic, the number after A cannot exceed 5. In the present application, the C in Ca and Cb is analogous to the A above. Moreover, a, b, and v are clearly all positive integers; that is, they cannot be fractions or decimals.
The computer system passes the image block connectivity matrix and the feature-extraction implicit representations of the image blocks as input to the first level-by-level merging submodule Cb. The image block connectivity matrix is a two-dimensional array whose elements represent the connectivity among image blocks (such as adjacency and distance); the feature-extraction implicit representation is a high-dimensional vector or matrix containing the various feature information of an image block.
In the first level-by-level merging submodule Cb, the system uses a graph convolutional network (GCN) or another graph neural network to perform context feature extraction on the input data. Specifically, the GCN performs graph convolution over the graph structure defined by the image block connectivity matrix, with the feature-extraction implicit representations of the image blocks as node features. In this process, the GCN considers the neighbor information of each image block and updates the representation of the current node by aggregating the features of its neighboring nodes.
After one or more rounds of graph convolution, the first level-by-level merging submodule Cb outputs the hierarchical implicit representation corresponding to each image block. These representations contain not only the feature information of the image block itself but also the information of its neighboring blocks and the spatial relationships encoded in the whole image block connectivity matrix.
For example, suppose a bridge image is divided into four image blocks (A, B, C, D) whose connectivity is described by the image block connectivity matrix, and each image block yields a 128-dimensional feature vector after feature extraction. In the first level-by-level merging submodule Cb, the system processes them with a graph neural network containing two GCN layers, as sketched after this example.
First GCN layer: based on the image block connectivity matrix and the initial feature vectors, the representation of each image block is updated through a graph convolution operation. Assuming the GCN aggregates over each node's first-order neighbors, the new representation of each node is a weighted average of its own features and those of its first-order neighbors (with the weighting coefficients learned by the GCN).
Second GCN layer: a further graph convolution is performed on the updated node representations and the image block connectivity matrix to generate the final hierarchical implicit representations, which are used in the subsequent damage identification task.
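The two-layer example could be sketched as follows, assuming the standard symmetrically normalized GCN propagation rule H' = ReLU(Â·H·W) with self-loops, which the scheme itself does not prescribe; the connectivity matrix and random weights are illustrative.
    import numpy as np

    def gcn_layer(A, H, W):
        # One GCN propagation step: normalize the adjacency with
        # self-loops, aggregate neighbor features, then apply ReLU.
        A_hat = A + np.eye(A.shape[0])
        D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
        return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

    # Four blocks A-D with 128-dim features, as in the example above.
    A = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]], dtype=float)    # illustrative connectivity
    H0 = np.random.randn(4, 128)
    W1, W2 = np.random.randn(128, 128), np.random.randn(128, 128)
    H2 = gcn_layer(A, gcn_layer(A, H0, W1), W2)  # two-layer hierarchical representations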
In Case 2, after the first level-by-level merging submodule has processed the data, preliminary hierarchical implicit representations have been generated. These representations are then passed as input to the subsequent level-by-level merging submodules (such as Cb) for further processing. This process repeats until all level-by-level merging submodules have finished, producing the final merged implicit representation.
For a level-by-level merging submodule Cb that is not the first, the input data consist of two parts: the image block connectivity matrix (as in Case 1) and the hierarchical implicit representations from the preceding level-by-level merging submodule Ca. This means that the submodule Cb performs further feature extraction and representation updating on top of the output of the submodule Ca.
As in Case 1, the submodule Cb uses a graph convolutional network or another graph neural network to perform context feature extraction on the input data. The difference is that the submodule Cb now works on the updated hierarchical implicit representations (i.e., the output of the submodule Ca) together with the spatial relationships in the image block connectivity matrix. Through a new round of graph convolution, the submodule Cb further fuses the information among the image blocks and generates more refined hierarchical implicit representations.
After this processing, the submodule Cb outputs a new hierarchical implicit representation for each image block. These representations are more comprehensive and accurate than the output of the submodule Ca because they fuse context information at more levels. As the processing proceeds, each level-by-level merging submodule generates its own hierarchical implicit representations until the last submodule finishes.
继续以桥梁图像的4个图像分块(A、B、C、D)为例。假设第一个逐级合并子模块已经生成了初步的分级隐式表示(每个分块对应一个128维的特征向量)。现在,这些表示被传递给第二个逐级合并子模块Cb进行处理。Let's continue with the example of the four image blocks (A, B, C, D) of the bridge image. Assume that the first level-by-level merging submodule has generated preliminary hierarchical implicit representations (each block corresponds to a 128-dimensional feature vector). Now, these representations are passed to the second level-by-level merging submodule C b for processing.
Cb模块首先接收来自Ca模块的分级隐式表示和图像块连通性矩阵。然后,它利用图卷积网络进行新一轮的特征提取和表示更新。假设Cb模块也包含两层GCN,则它会根据Ca模块的输出和图像块连通性矩阵生成更加精细的分级隐式表示。通过多个逐级合并子模块的处理,系统能够逐步融合图像分块之间的上下文信息。每个模块都会在前一个模块的基础上进行优化和改进,从而生成更加准确和全面的分级隐式表示。这种逐级合并的策略有助于提高损伤识别的准确性和鲁棒性。The C b module first receives the hierarchical implicit representation and image block connectivity matrix from the C a module. Then, it uses the graph convolutional network to perform a new round of feature extraction and representation update. Assuming that the C b module also contains two layers of GCN, it will generate a more refined hierarchical implicit representation based on the output of the C a module and the image block connectivity matrix. Through the processing of multiple step-by-step merging sub-modules, the system is able to gradually fuse the contextual information between image blocks. Each module will optimize and improve on the basis of the previous module to generate a more accurate and comprehensive hierarchical implicit representation. This step-by-step merging strategy helps to improve the accuracy and robustness of damage identification.
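The cascade itself can be sketched as simple module composition, reusing the hypothetical TwoLayerGCN above; the number of submodules and the hidden width are assumptions:

```python
import torch.nn as nn

class MergingCascade(nn.Module):
    """Chains several step-by-step merging submodules; each refines the
    hierarchical representation produced by its predecessor."""
    def __init__(self, n_submodules=3, dim=128):
        super().__init__()
        self.stages = nn.ModuleList(
            TwoLayerGCN(dim, dim // 2, dim) for _ in range(n_submodules)
        )

    def forward(self, feats, conn):
        h = feats                   # case 1: feature-extraction representations
        for stage in self.stages:   # case 2: each Cb consumes Ca's output
            h = stage(h, conn)
        return h                    # final merged implicit representation
```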
In summary, the step-by-step merging submodules in step S430 generate hierarchical implicit representations by progressively integrating the image block connectivity matrix with the feature extraction implicit representations of the image blocks. This process accounts not only for each image block's own feature information, but also for the spatial relationships between blocks and for multi-level contextual information. By processing through multiple step-by-step merging submodules, the system produces a more comprehensive and accurate feature representation, providing strong support for the subsequent bridge structure damage identification task.
In an optional implementation, the number of image blocks is m, the m image blocks include a target image block, and m∈(1,+∞); the image block connectivity matrix corresponding to the machine vision image includes the connectivity parameters of the target image block with respect to each of the m image blocks; and the feature extraction implicit representations of the image blocks include the feature extraction implicit representation of the target image block. In case 1, performing context feature extraction in the step-by-step merging submodule Cb on the image block connectivity matrix of the machine vision image and the feature extraction implicit representations of the image blocks, to obtain the hierarchical implicit representation of the submodule Cb, may include:
Step S4311: in the step-by-step merging submodule Cb, obtaining the connectivity features of the target image block with respect to each of the m image blocks.
Step S4312: in the step-by-step merging submodule Cb, merging the feature extraction implicit representation of the target image block with each of the m connectivity features, to obtain the merged connectivity features corresponding to the m image blocks.
Step S4313: performing a weighted sum of the m merged connectivity features, weighted by the m connectivity parameters, to obtain the image association feature of the target image block.
Step S4314: activating the image association feature of the target image block to obtain its hierarchical sub-implicit representation, which belongs to the hierarchical implicit representation of the step-by-step merging submodule Cb.
In step S4311, the computer system extracts the connectivity features related to the target image block from the image block connectivity matrix. The connectivity matrix is a two-dimensional array whose number of rows and columns both equal the number of image blocks m; each element represents the connectivity parameter between two image blocks. These parameters typically lie in [0,1], where 0 indicates no relationship between the two blocks and 1 indicates a close relationship.
In step S4312, the computer system merges the feature extraction implicit representation of the target image block with the connectivity features extracted from the connectivity matrix. The merge is usually implemented by concatenation: the feature extraction implicit representation and each connectivity feature vector are joined end to end into a new vector.
For simplicity, assume a reduced form of merging: each connectivity parameter is appended as an additional feature dimension at the end of the feature extraction implicit representation.
For example, suppose the feature extraction implicit representation of the target image block B is a 128-dimensional feature vector fB. After obtaining B's connectivity parameters [0.8, 0, 0.5, 0.3] with respect to blocks A, B, C, and D, the system appends these parameters to fB as extra feature dimensions, yielding a merged connectivity feature vector fB′ of dimension 132 (128 original dimensions + 4 connectivity dimensions). Note that in practice, since the connectivity parameters are scalars, they may first need to be mapped (for example, through an embedding layer) into vectors matching the dimensionality of the original feature vector before concatenation; here the simplified merge is used.
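In this simplified form the merge is a one-line concatenation; a sketch using the illustrative numbers above (fB itself is randomly generated here as a stand-in):

```python
import numpy as np

fB = np.random.randn(128)                  # feature extraction representation of B
conn_B = np.array([0.8, 0.0, 0.5, 0.3])    # connectivity of B w.r.t. A, B, C, D
fB_merged = np.concatenate([fB, conn_B])   # 132-dim merged connectivity feature
assert fB_merged.shape == (132,)
```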
An alternative is to use a graph neural network such as a graph convolutional network (GCN), which handles this graph-structured data naturally. In a GCN, the connectivity matrix serves as the adjacency matrix of the graph, and the feature extraction implicit representations serve as the initial node features. The graph convolution automatically fuses each node's features with those of its neighbors, without explicit feature concatenation.
In step S4313, the m merged connectivity features are summed with weights given by the m connectivity parameters, producing the image association feature of the target image block. In other words, the m connectivity parameters serve as the weights in a weighted sum over the m merged connectivity features.
In step S4314, the computer system applies an activation to the image association feature of the target image block. An activation function is a nonlinear transformation commonly used in neural networks; it introduces nonlinearity so that the network can learn more complex patterns. In image recognition and feature extraction tasks, common activation functions include ReLU (Rectified Linear Unit), Sigmoid, and Tanh. Each has advantages and drawbacks, but ReLU is widely used for its simplicity and effectiveness. If a graph neural network such as a GCN is used to process the connectivity between image blocks, the activation is usually built into the output of each graph convolutional layer: after the layer aggregates and transforms the node features, the activation function is applied automatically to introduce nonlinearity.
In summary, the case-1 implementation progressively merges the target image block's feature extraction implicit representation with its connectivity features and applies an activation function to produce the hierarchical sub-implicit representation. Note, however, that in practice a more common and effective approach is to let a graph neural network such as a GCN handle the connectivity relationships and feature fusion between image blocks automatically. Through multi-layer graph convolutions and nonlinear activations, the graph neural network progressively produces more accurate and comprehensive node representations (i.e., hierarchical implicit representations), providing strong support for the subsequent damage identification task.
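Read literally, steps S4311–S4314 amount to the following sketch for one target block. Here the connectivity features of S4311 are taken to be the neighbor blocks' representations, which is one possible reading, and all array names are assumptions:

```python
import numpy as np

def hierarchical_sub_repr(f_target, feats, conn_row):
    """S4311-S4314 for one target block.
    f_target: (d,)   feature extraction representation of the target
    feats:    (m, d) representations of all m blocks (as connectivity features)
    conn_row: (m,)   connectivity parameters of the target w.r.t. the m blocks
    """
    m = feats.shape[0]
    # S4312: merge the target representation with each connectivity feature
    merged = np.concatenate([np.tile(f_target, (m, 1)), feats], axis=1)  # (m, 2d)
    # S4313: weighted sum using the connectivity parameters as weights
    assoc = (conn_row[:, None] * merged).sum(axis=0)                     # (2d,)
    # S4314: activation yields the hierarchical sub-implicit representation
    return np.maximum(assoc, 0.0)                                        # ReLU

feats = np.random.randn(4, 128)
conn_row = np.array([0.8, 0.0, 0.5, 0.3])
sub_repr = hierarchical_sub_repr(feats[1], feats, conn_row)  # block B as target
```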
As an implementation scheme of the present application, step S500, performing bridge structure damage identification on the image blocks according to the merged implicit representation to obtain image block damage identification results, may include:
Step S510: passing the merged implicit representation into the damage identification module of the bridge structure damage identification model.
Step S520: in the damage identification module, applying multilayer perceptron processing to the merged implicit representation to obtain the damage features corresponding to the image blocks.
Step S530: normalizing the damage features to obtain the corresponding normalized damage features.
Step S540: performing bridge structure damage identification on the image blocks according to the normalized damage features to obtain the image block damage identification results.
In step S510, the computer system passes the merged implicit representation generated in step S400 to the damage identification module of the bridge structure damage identification model, for example a classifier.
In step S520, inside the damage identification module, a multilayer perceptron (MLP), a feedforward artificial neural network, further processes and analyzes the merged implicit representation. Through multiple layers of nonlinear transformation, the MLP can learn the complex mapping between the input data (the merged implicit representation) and the output target (the damage features).
An MLP typically consists of an input layer, several hidden layers, and an output layer. The input layer receives the merged implicit representation; the hidden layers transform and combine the input through weight matrices and nonlinear activation functions (such as ReLU or Sigmoid); and the output layer emits the transformed feature vector, i.e., the damage features.
For example, suppose the merged implicit representation is a 256-dimensional feature vector summarizing the image block across multiple dimensions. This vector is fed into the MLP, which receives it at the input layer, processes it through several hidden layers (each possibly containing hundreds or even thousands of neurons, with ReLU introducing nonlinearity), and finally emits a lower-dimensional feature vector (say, 32-dimensional) at the output layer. That vector is the damage feature of the image block.
After the damage features are obtained, step S530 normalizes them for the convenience and accuracy of subsequent processing. Normalization scales the feature values into a uniform range (such as [0, 1] or [-1, 1]) to remove the influence of differing scales and value ranges across features. A common choice is min-max normalization. In this embodiment, suppose the damage feature vector is f = [f1, f2, …, f32], where each element fi represents a specific damage-related feature; applying the min-max formula to each element yields the normalized damage feature vector fnorm.
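A hedged sketch of S520–S530 follows. The 256-to-32 sizes follow the example above; the hidden width is an assumption, and min-max scaling is applied within a single vector here for illustration, whereas in practice the min and max would usually be dataset statistics:

```python
import torch
import torch.nn as nn

# S520: MLP mapping the 256-dim merged representation to 32 damage features.
damage_mlp = nn.Sequential(
    nn.Linear(256, 512), nn.ReLU(),   # hidden layer (width assumed)
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 32),               # 32-dim damage feature vector
)

def min_max_normalize(f, eps=1e-8):
    """S530: scale the feature values into [0, 1] (min-max normalization)."""
    return (f - f.min()) / (f.max() - f.min() + eps)

merged_repr = torch.randn(256)            # merged implicit representation
f = damage_mlp(merged_repr)               # damage features
f_norm = min_max_normalize(f.detach())    # normalized damage features fnorm
```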
After the normalized damage features are obtained, step S540 uses them to identify bridge structure damage in the image blocks. This typically involves a classifier or regressor model that predicts, from the input normalized damage features, whether an image block is damaged and the type or degree of the damage.
In practice, different models can be chosen according to the specific needs of the task (presence of damage, damage type, damage degree, and so on). For binary classification (e.g., deciding whether an image block is damaged), logistic regression, a support vector machine (SVM), or a deep-learning convolutional neural network (CNN) classifier can be used. For multi-class classification (e.g., identifying the specific damage type), softmax regression or a multi-class SVM can be used. For regression (e.g., predicting the degree of damage), linear regression, ridge regression, or a random forest can be used.
Suppose the task is to decide whether an image block is damaged and to identify the damage type (crack, rust, etc.). In that case, a multi-class SVM classifier can serve as the damage identification model. During training, the model learns the mapping between normalized damage features and damage categories. At test time (i.e., during actual damage identification), it receives the normalized damage features as input and outputs a probability distribution vector over the damage categories. The category with the highest probability is selected as the final damage identification result.
For example, suppose the normalized damage feature vector fnorm, after processing by the multi-class SVM classifier, yields the probability distribution vector [0.1, 0.8, 0.1] over the three categories no damage, crack, and rust. Crack damage has the highest probability (0.8), so the image block is identified as crack damage.
The detailed explanation and examples above show how step S500 turns the merged implicit representation into a concrete image block damage identification result. The process involves not only technical details such as MLP feature extraction and normalization, but also higher-level machine learning choices such as the selection and application of classifier or regressor models.
In an optional implementation, the normalized damage features include the support for each of y damage categories, y∈(1,+∞). Step S540, performing bridge structure damage identification on the image blocks according to the normalized damage features to obtain image block damage identification results, may then include:
Step S541: determining the maximum support among the y supports of the normalized damage features.
Step S542: when the maximum support is greater than a reference support, performing bridge structure damage identification on the image block according to the damage category corresponding to the maximum support, to obtain the image block damage identification result.
Step S543: when the maximum support is not greater than the reference support, determining the image block as the image block damage identification result.
In step S541, the computer system works with the normalized damage features, which express, in the form of supports, how likely the image block is to belong to each damage category. A support is a value between 0 and 1 that quantifies how well the image block matches a given damage category.
Consider, for example, a large bridge with a complex structure whose common damage types include cracks, rust, and spalling. To monitor these, high-definition cameras are installed on the bridge and images are collected regularly for analysis. After the preceding steps, the normalized damage features of each image block have been obtained; they contain the supports for the y damage categories such as crack, rust, and spalling.
For example, the normalized damage features of an image block might look as follows (assuming y=3, i.e., only the crack, rust, and spalling categories are considered):
Normalized damage features = [0.7, 0.2, 0.1], where the first element, 0.7, is the support for the image block belonging to crack damage, the second element, 0.2, is the support for rust damage, and the third element, 0.1, is the support for spalling damage. In step S541, the computer system traverses this feature vector and finds its maximum, i.e., the maximum support. In this example the maximum support is 0.7, corresponding to the crack category.
In step S542, having determined the maximum support, the computer system compares it against a preset reference support. The reference support is a threshold used to decide whether an image block shows clear signs of damage: only when the maximum support exceeds it does the system consider the block damaged in the corresponding category. The reference support is usually set from extensive experimental data and expert experience; it should be high enough to ensure identification accuracy, but not so high that real damage is missed. In practice it may need to be adjusted for bridge type, image quality, environmental conditions, and other factors.
Suppose the reference support is set to 0.5. In step S542, the computer system compares the maximum support, 0.7, with the reference support, 0.5. Since 0.7 is greater than 0.5, the system concludes that the image block shows clear crack damage. It therefore identifies the block according to the damage category of the maximum support (crack) and generates the corresponding damage identification result. This result can be presented to bridge managers as a report indicating the block's specific location, the damage type, and the damage degree (the degree is not computed directly in this step, but the system can estimate it from other information or algorithms).
In step S543, when the maximum support is not greater than the reference support, the image block is determined as the image block damage identification result. If the maximum support does not exceed the reference support, the block does not match any of the preset damage categories closely enough, i.e., there is no clear sign of damage. In that case, the computer system can label the image block "no damage" or "normal" and include it in the image block damage identification results. For instance, if the maximum support were 0.3, the system would compare 0.3 with the reference support 0.5, find that 0.3 is less than 0.5, determine the block as "no damage" or "normal", and generate the corresponding result. This result is likewise reported to bridge managers to help them understand the overall health of the bridge.
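Steps S541–S543 reduce to a thresholded argmax; a minimal sketch (the category names and the 0.5 threshold follow the example above and are assumptions):

```python
import numpy as np

def identify_block(supports, categories, reference_support=0.5):
    """S541: find the maximum support; S542/S543: compare with the threshold."""
    best = int(np.argmax(supports))            # S541: maximum support
    if supports[best] > reference_support:     # S542: clear damage signal
        return categories[best]
    return "no damage"                         # S543: below the reference support

cats = ["crack", "rust", "spalling"]
print(identify_block(np.array([0.7, 0.2, 0.1]), cats))  # -> crack
print(identify_block(np.array([0.3, 0.2, 0.1]), cats))  # -> no damage
```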
In this embodiment of the present application, the training of the aforementioned bridge structure damage identification model may proceed as follows:
Step T100: obtaining the bridge structure damage identification model to be trained, bridge image templates, and the damage prior labels corresponding to the bridge image templates;
Step T200: in the bridge structure damage identification model to be trained, extracting image implicit representations from the image blocks in the bridge image template, to obtain t template image implicit representations corresponding to those image blocks, where t∈(0,∞); the t template image implicit representations represent different types of features, and the image blocks in the bridge image template are obtained by dividing the bridge image template into blocks;
Step T300: merging the t template image implicit representations to obtain the template merged implicit representation corresponding to the image blocks in the bridge image template, and performing adaptive feature integration on the template merged implicit representation to obtain a template adaptive implicit representation that highlights and characterizes it;
Step T400: merging, step by step, the template image block connectivity matrix corresponding to the bridge image template with the template output implicit representations corresponding to its image blocks, to obtain the template merged implicit representations of those image blocks; the template output implicit representations are obtained from the template adaptive implicit representation and the template merged implicit representation; the template image block connectivity matrix characterizes the similarity between the image blocks in the bridge image template;
Step T500: updating the parameters of the bridge structure damage identification model to be trained according to the template merged implicit representations and the damage prior labels, to obtain the bridge structure damage identification model.
In step T100, the computer system first prepares all resources required for training: a bridge structure damage identification model to be trained (for example a deep-learning neural network such as a convolutional neural network (CNN) or a graph convolutional network (GCN)), a set of bridge image templates serving as training samples, and the damage prior labels corresponding to those templates.
Suppose a hybrid neural network combining a CNN and a GCN is used: the CNN part extracts the visual features of the image blocks, while the GCN part handles the spatial relationships between them. A large number of high-definition images of different bridge structures (suspension bridges, cable-stayed bridges, arch bridges, etc.) are collected, covering the bridges' appearance under different lighting conditions and viewing angles.
The damage prior labels are produced by bridge engineers or expert teams who annotate each bridge image template, indicating which regions are damaged (cracks, rust, spalling, etc.) and the damage type. These labels are usually pixel-level or region-level, e.g., mask images or bounding boxes marking the damaged regions.
In step T200, the computer system uses the feature extraction module of the model to be trained (e.g., the first few layers of the CNN) to split each bridge image template into blocks and extract an image implicit representation for each block. These implicit representations carry the block's feature information across multiple dimensions.
Each bridge image template is divided into image blocks of fixed size (e.g., 64×64 pixels) or of size adapted to the content. For each block, the CNN extracts texture, color, shape, and other visual features; for example, successive convolutional layers progressively reduce the spatial resolution while increasing the number of feature-map channels, capturing increasingly abstract, high-level feature representations. After CNN processing, each image block yields a high-dimensional feature vector (e.g., 128- or 256-dimensional), which is its implicit representation.
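A hedged sketch of such a patch encoder (the layer count and channel widths are assumptions; only the 64×64 input and 128-dim output follow the example):

```python
import torch
import torch.nn as nn

# Encodes a 64x64 RGB image block into a 128-dim implicit representation.
patch_encoder = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),   # 64x64 -> 32x32
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # 32x32 -> 16x16
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(), # 16x16 -> 8x8
    nn.AdaptiveAvgPool2d(1),                               # 8x8 -> 1x1
    nn.Flatten(),                                          # -> (N, 128)
)

blocks = torch.randn(4, 3, 64, 64)   # four image blocks A, B, C, D
feats = patch_encoder(blocks)        # -> (4, 128) implicit representations
```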
In step T300, the computer system merges the template image implicit representations extracted in step T200 and strengthens the key features through adaptive feature integration (e.g., a self-attention mechanism), obtaining the template merged implicit representation and the template adaptive implicit representation.
The implicit representations of all image blocks in the same template are combined, for example by concatenation or a weighted average, to produce the template's merged implicit representation. A self-attention mechanism then weights the different features within it: linear transformations first produce the request (query), index (key), and output (value) features; the similarity matrix between queries and keys is computed and normalized with a softmax to obtain the attention weights; and finally the value features are summed with these weights to produce the adaptive implicit representation.
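A minimal scaled dot-product self-attention sketch over the block representations; the single-head form and dimension sizes are assumptions, and the request/index/output features of the text correspond to the query/key/value below:

```python
import torch
import torch.nn as nn

class AdaptiveIntegration(nn.Module):
    """Single-head self-attention over the m block representations."""
    def __init__(self, dim=128):
        super().__init__()
        self.q = nn.Linear(dim, dim)  # request (query) features
        self.k = nn.Linear(dim, dim)  # index (key) features
        self.v = nn.Linear(dim, dim)  # output (value) features

    def forward(self, h):                        # h: (m, dim)
        q, k, v = self.q(h), self.k(h), self.v(h)
        sim = q @ k.T / k.size(-1) ** 0.5        # query-key similarity matrix
        attn = torch.softmax(sim, dim=-1)        # normalized attention weights
        return attn @ v                          # adaptive implicit representation

adaptive = AdaptiveIntegration()(torch.randn(4, 128))  # -> (4, 128)
```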
In step T400, the computer system takes the image block connectivity matrix (characterizing the spatial relationships between image blocks) and the template output implicit representations (obtained from the template adaptive implicit representation and the template merged implicit representation), and extracts and fuses contextual features through the step-by-step merging submodules (e.g., a graph convolutional network, GCN).
A two-dimensional array is constructed as the image block connectivity matrix, whose element values express the connectivity (or similarity) between image blocks; for example, adjacent blocks have high connectivity values while non-adjacent blocks have low ones. The template output implicit representations and the connectivity matrix are fed into the GCN model, which progressively fuses information across image blocks through multi-layer graph convolutions. In each layer, the GCN updates every node's representation by aggregating its neighbors' information, according to the graph structure defined by the connectivity matrix and the node features of the current layer (the blocks' implicit representations). After multiple GCN layers, the template merged implicit representation, fused with contextual information, is obtained.
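One plausible way to build such a matrix is from pairwise cosine similarity of the block representations; the similarity measure and the zeroed diagonal are assumptions, not prescribed by the text:

```python
import numpy as np

def connectivity_matrix(feats):
    """Pairwise cosine similarity between block representations, scaled to [0, 1]."""
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = normed @ normed.T              # cosine similarity in [-1, 1]
    conn = (sim + 1.0) / 2.0             # rescale to [0, 1]
    np.fill_diagonal(conn, 0.0)          # no self-connectivity
    return conn

conn = connectivity_matrix(np.random.randn(4, 128))  # (4, 4) connectivity matrix
```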
In step T500, the computer system uses the template merged implicit representations as feature input, computes a loss function (e.g., cross-entropy loss or mean squared error loss) against the damage prior labels, and updates and optimizes the model parameters via backpropagation.
Suppose the cross-entropy loss is used to measure the gap between the model's predictions and the true damage labels. For each image block (or its corresponding region) in each image template, the cross-entropy loss between the predicted damage probabilities and the true damage label is computed.
Based on the gradient of the loss function, the error is backpropagated layer by layer from the output layer to the input layer, and the model parameters are updated with gradient descent or another optimization algorithm. This process is repeated over many iterations (i.e., multiple training epochs) until the loss converges to a stable low value or the preset number of epochs is reached.
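A hedged sketch of this training loop; the optimizer, learning rate, epoch count, and the small stand-in classification head and data are all assumptions:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 3))  # stand-in head
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

feats = torch.randn(4, 128)            # stand-in template merged representations
labels = torch.tensor([0, 1, 0, 2])    # stand-in damage prior labels (3 classes)

for epoch in range(100):               # preset number of training epochs
    logits = model(feats)              # predicted damage scores per block
    loss = criterion(logits, labels)   # cross-entropy vs. the prior labels
    optimizer.zero_grad()
    loss.backward()                    # backpropagate the error layer by layer
    optimizer.step()                   # gradient-descent parameter update
```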
Through repeated iteration and optimization of the above steps, the computer system trains an accurate and robust bridge structure damage identification model. The model automatically extracts key feature information from the input bridge images and fuses contextual relationships to achieve precise identification of bridge structure damage. In practice, the model can be deployed in a bridge monitoring system to monitor the structural health of bridges in real time, providing strong support for their maintenance and management.
An embodiment of the present application provides a computer system, including a memory and a processor, wherein the memory stores a computer program executable on the processor, and the processor implements some or all of the steps of the above method when executing the program.
Figure 2 is a schematic diagram of the hardware entities of a computer system provided in an embodiment of the present application. As shown in Figure 2, the hardware entities of the computer system 1000 include a processor 1001 and a memory 1002, where the memory 1002 stores a computer program executable on the processor 1001, and the processor 1001 implements the steps of the method of any of the above embodiments when executing the program.
The above is only an implementation of the present application, and the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application.