
CN119379622A - Vehicle damage assessment method and device, electronic device and storage medium - Google Patents


Info

Publication number
CN119379622A
CN119379622A (application number CN202411407089.3A)
Authority
CN
China
Prior art keywords
vehicle
damaged
encoder
picture
region feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411407089.3A
Other languages
Chinese (zh)
Inventor
赵霄鸿
齐康如
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202411407089.3A priority Critical patent/CN119379622A/en
Publication of CN119379622A publication Critical patent/CN119379622A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/096 Transfer learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q 40/08 Insurance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V 10/757 Matching configurations of points or features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Business, Economics & Management (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the application provides a vehicle damage assessment method and device, an electronic device, and a storage medium, belonging to the technical field of artificial intelligence. The method comprises: acquiring a picture of a vehicle awaiting damage assessment; inputting the vehicle picture into an encoder to generate latent features; inputting the latent features into a decoder to obtain target feature vectors; converting the latent features into multi-scale features through a multi-scale adapter; inputting the multi-scale features into an auxiliary detection head to generate a plurality of candidate region feature matrices; matching the candidate region feature matrices against the annotated ground truth in a one-to-many manner to obtain a matched target region feature matrix; and obtaining a vehicle damage assessment result corresponding to the vehicle picture from the target region feature matrix and the target feature vectors. The embodiment of the application can improve the feature expression capability of the encoder in the DETR framework, thereby effectively improving the accuracy of the DETR framework when applied to automatic vehicle damage assessment.

Description

Vehicle damage assessment method and device, electronic device, and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method and apparatus for vehicle damage assessment, an electronic device, and a storage medium.
Background
DETR (DEtection TRansformer) is a Transformer-based end-to-end object detection network that has been widely used in computer vision. When DETR is applied to automatic vehicle damage assessment, object detection of vehicle damage can be cast as a set prediction problem: much like a conventional anchor mechanism, the model converts a sequence of damage images into a set of predictions. However, most research in the industry focuses on a one-to-one matching mechanism, and experimental results show that one-to-one matching causes unstable loss in the initial stage of model training, so that ground-truth labels cannot be matched with queries stably. Moreover, DETR still requires optimization of model performance and training strategy when identifying minor damage on vehicles. Existing DETR-based approaches therefore do not achieve high accuracy in automatic vehicle damage assessment.
Disclosure of Invention
The main purpose of the embodiments of the present application is to provide a vehicle damage assessment method and device, an electronic device, and a storage medium, which can improve the feature expression capability of the encoder in the DETR framework, thereby effectively improving the accuracy of the DETR framework when applied to automatic vehicle damage assessment and making the framework better suited to vehicle damage detection.
To achieve the above object, a first aspect of the embodiments of the present application provides a vehicle damage assessment method, comprising:
acquiring a picture of a vehicle awaiting damage assessment;
inputting the vehicle picture into an encoder to generate latent features;
inputting the latent features into a decoder to obtain target feature vectors;
converting the latent features into multi-scale features through a multi-scale adapter;
inputting the multi-scale features into an auxiliary detection head to generate a plurality of candidate region feature matrices;
matching the plurality of candidate region feature matrices against the annotated ground truth in a one-to-many manner to obtain a matched target region feature matrix;
and obtaining a vehicle damage assessment result corresponding to the vehicle picture from the target region feature matrix and the target feature vectors.
In some embodiments, the method is applied to a vehicle damage assessment model comprising the encoder, the decoder, the multi-scale adapter, and the auxiliary detection head. The output of the encoder is connected to the input of the decoder and to the input of the multi-scale adapter; the output of the multi-scale adapter is connected to the input of the auxiliary detection head; and the outputs of the decoder and the auxiliary detection head are spliced together. The encoder and decoder form the main branch, while the encoder, multi-scale adapter, and auxiliary detection head form the auxiliary branch.
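The wiring described above can be sketched in plain Python. The module bodies below are hypothetical stand-ins (simple array manipulations, not the trained networks); only the data flow between the main and auxiliary branches follows the description:

```python
import numpy as np

# Hypothetical stand-ins for the trained modules; shapes are illustrative only.
def encoder(image):                     # image -> latent features (tokens x dim)
    return image.reshape(-1, image.shape[-1]).astype(float)

def decoder(latent):                    # latent -> target feature vectors
    return latent[:10]                  # pretend there are 10 object queries

def multi_scale_adapter(latent):        # latent -> feature pyramid
    return [latent, latent[::2], latent[::4]]

def auxiliary_head(pyramid):            # pyramid -> candidate region feature matrices
    return [level[:4] for level in pyramid]

def assess_damage(image):
    latent = encoder(image)             # the encoder is shared by both branches
    target_vectors = decoder(latent)    # main branch: encoder -> decoder
    candidates = auxiliary_head(multi_scale_adapter(latent))  # auxiliary branch
    matched = candidates[0]             # stand-in for the one-to-many matched regions
    # the decoder output and the matched region features are spliced together
    return np.concatenate([target_vectors, matched], axis=0)

result = assess_damage(np.zeros((32, 32, 8)))
```

With these toy shapes, the spliced output stacks 10 decoder query vectors with 4 matched region rows, giving a (14, 8) array.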
In some embodiments, the training method of the vehicle damage assessment model comprises:
calculating a first loss function of the main branch using a Hungarian matching function;
calculating a second loss function of the auxiliary branch using a Focal Loss function;
determining a target loss function from the first loss function and the second loss function;
and training the vehicle damage assessment model based on the target loss function to obtain the trained model.
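A minimal sketch of these two loss terms, using SciPy's `linear_sum_assignment` (an implementation of the Hungarian algorithm) for the main branch and a NumPy focal loss for the auxiliary branch. The cost matrix, probabilities, labels, and the weighting factor `lam` are all assumed for illustration; the text does not specify how the two losses are combined:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_match_loss(cost):
    # cost[i, j]: matching cost between prediction i and ground truth j;
    # linear_sum_assignment returns the minimum-cost one-to-one assignment
    rows, cols = linear_sum_assignment(cost)
    return float(cost[rows, cols].sum())

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    # p: predicted foreground probabilities, y: binary labels
    pt = np.where(y == 1, p, 1 - p)        # probability of the true class
    w = np.where(y == 1, alpha, 1 - alpha) # class-balancing weight
    return float((-w * (1 - pt) ** gamma * np.log(pt + 1e-8)).mean())

cost = np.array([[0.1, 0.9],
                 [0.8, 0.2]])
first_loss = hungarian_match_loss(cost)                    # main branch
second_loss = focal_loss(np.array([0.9, 0.2, 0.7]),
                         np.array([1, 0, 1]))              # auxiliary branch
lam = 1.0   # assumed weighting between branches
target_loss = first_loss + lam * second_loss
```

Here the optimal assignment picks the diagonal (0.1 + 0.2), and the focal term down-weights well-classified samples via the `(1 - pt) ** gamma` factor.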
In some embodiments, inputting the vehicle picture into an encoder to generate latent features comprises:
performing bilinear interpolation on the vehicle picture to obtain a preprocessed picture;
inputting the preprocessed picture into the encoder to generate the latent features.
In some embodiments, performing bilinear interpolation on the vehicle picture to obtain a preprocessed picture comprises:
determining the four corner points surrounding each pixel in the vehicle picture;
determining the value of each pixel as a weighted average of the values at those four corner points;
and converting the vehicle picture into the preprocessed picture according to the computed pixel values.
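The steps above can be sketched as a small NumPy bilinear resize. Each output pixel is a distance-weighted average of the four surrounding source pixels (the 3×3 output size here is just an illustration):

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    h, w = img.shape[:2]
    ys = np.linspace(0, h - 1, out_h)          # fractional source coordinates
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)             # clamp at the border
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]                    # vertical interpolation weights
    wx = (xs - x0)[None, :]                    # horizontal interpolation weights
    # weighted average of the four surrounding corner pixels
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

img = np.array([[0.0, 2.0],
                [4.0, 6.0]])
out = bilinear_resize(img, 3, 3)
```

For this 2×2 input, the center output pixel is the average of all four corners (3.0), and the borders reproduce the original values.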
In some embodiments, matching the candidate region feature matrices against the annotated ground truth in a one-to-many manner to obtain a matched target region feature matrix comprises:
matching each candidate region feature matrix against the annotated ground truth to obtain the overlap-area ratio between that candidate region and the ground-truth region;
and determining each candidate region feature matrix whose overlap-area ratio exceeds a preset threshold to be a target region feature matrix.
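The overlap-area ratio described above is the standard intersection-over-union (IoU). A minimal sketch of the one-to-many selection, with boxes and threshold chosen for illustration:

```python
def iou(box, gt):
    # boxes given as (x1, y1, x2, y2)
    ix1, iy1 = max(box[0], gt[0]), max(box[1], gt[1])
    ix2, iy2 = min(box[2], gt[2]), min(box[3], gt[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((box[2] - box[0]) * (box[3] - box[1])
             + (gt[2] - gt[0]) * (gt[3] - gt[1]) - inter)
    return inter / union if union else 0.0

def one_to_many_match(candidates, gt_box, threshold=0.5):
    # keep every candidate whose overlap ratio with the ground truth
    # exceeds the preset threshold (one ground truth, many candidates)
    return [c for c in candidates if iou(c, gt_box) > threshold]

gt = (0, 0, 10, 10)
candidates = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30)]
matched = one_to_many_match(candidates, gt)
```

Unlike the one-to-one Hungarian assignment on the main branch, this scheme can keep several candidates per ground-truth box, which is what stabilizes early training.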
In some embodiments, the encoder is a Transformer encoder, the decoder is a Transformer decoder, the multi-scale adapter uses an identity mapping structure, and the auxiliary detection head is an adaptive training sample selection (ATSS) detection head.
To achieve the above object, a second aspect of the embodiments of the present application provides a vehicle damage assessment device, comprising:
an acquisition module, for acquiring a picture of a vehicle awaiting damage assessment;
an encoding module, for inputting the vehicle picture into an encoder to generate latent features;
a decoding module, for inputting the latent features into a decoder to obtain target feature vectors;
a conversion module, for converting the latent features into multi-scale features through a multi-scale adapter;
a generating module, for inputting the multi-scale features into an auxiliary detection head to generate a plurality of candidate region feature matrices;
a matching module, for matching the candidate region feature matrices against the annotated ground truth in a one-to-many manner to obtain a matched target region feature matrix;
and an output module, for obtaining a vehicle damage assessment result corresponding to the vehicle picture from the target region feature matrix and the target feature vectors.
To achieve the above object, a third aspect of the embodiments of the present application provides an electronic device comprising a memory storing a computer program and a processor that implements the method of the first aspect when executing the computer program.
To achieve the above object, a fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of the first aspect.
The vehicle damage assessment method and device, electronic device, and storage medium acquire a picture of a vehicle awaiting damage assessment, input the picture into an encoder to generate latent features, input the latent features into a decoder to obtain target feature vectors, convert the latent features into multi-scale features through a multi-scale adapter, input the multi-scale features into an auxiliary detection head to generate a plurality of candidate region feature matrices, match the candidate region feature matrices against the annotated ground truth in a one-to-many manner to obtain a matched target region feature matrix, and obtain a vehicle damage assessment result corresponding to the picture from the target region feature matrix and the target feature vectors. By introducing an auxiliary detection head into the existing DETR framework, the latent features output by the encoder can additionally be fed into the auxiliary detection head to generate candidate region feature matrices, which are matched against the annotated ground truth in a one-to-many manner to obtain the matched target region feature matrix. The matched target region feature matrix is then spliced with the target feature vectors output by the decoder, and the vehicle damage assessment result corresponding to the picture is finally output. On this basis, the feature expression capability of the encoder in the DETR framework can be improved, thereby effectively increasing the accuracy of the framework in automatic vehicle damage assessment and making it better suited to vehicle damage detection.
Drawings
Fig. 1 is a flowchart of a vehicle damage assessment method provided by an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a vehicle damage assessment model provided by an embodiment of the present application;
Fig. 3 is a flowchart of a training method for the vehicle damage assessment model provided by an embodiment of the present application;
Fig. 4 is a flowchart of step S102 in Fig. 1;
Fig. 5 is a flowchart of step S401 in Fig. 4;
Fig. 6 is a flowchart of step S106 in Fig. 1;
Fig. 7 is a schematic structural diagram of a vehicle damage assessment device provided by an embodiment of the present application;
Fig. 8 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
It should be noted that although functional block division is performed in a device diagram and a logic sequence is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the block division in the device, or in the flowchart. The terms first, second and the like in the description and in the claims and in the above-described figures, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
First, several terms used in the present application are explained:
Artificial intelligence (AI) is a branch of computer science that studies, develops, simulates, extends, and expands the theories, methods, techniques, and applications of human intelligence. It attempts to understand the essence of intelligence and to produce intelligent machines that react in ways similar to human intelligence, covering robotics, speech recognition, image recognition, natural language processing, and expert systems. AI can simulate the information processes of human consciousness and thinking. It is also a theory, method, technique, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.
DETR (DEtection TRansformer) is a Transformer-based end-to-end object detection model proposed by the Facebook AI Research team. It uses a Transformer architecture to turn the object detection task into a set prediction problem, i.e., predicting the class, position, and number of objects by matching the encoding of the input image against a set of object queries. DETR is notable for requiring neither NMS post-processing nor anchors; its performance is comparable to Faster R-CNN, and it can easily be transferred to other tasks such as panoptic segmentation. DETR represents a new direction in object detection, offering an alternative to traditional approaches and demonstrating the potential of Transformers for the task.
ATSS (Adaptive Training Sample Selection) is a method that automatically selects positive and negative training samples according to the statistical characteristics of objects. Its core insight is that in object detection, the key difference in performance between anchor-based and anchor-free detectors lies in how positive and negative samples are defined. ATSS demonstrates this empirically and proposes an adaptive sample selection scheme to bridge the gap between the two kinds of methods.
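The adaptive part of ATSS can be sketched in a few lines: for each ground-truth box, a shortlist of candidate anchors is taken (in the original method, the top-k closest by center distance per pyramid level), and the positive threshold is set adaptively to the mean plus the standard deviation of their IoUs. The IoU values below are invented for illustration:

```python
import numpy as np

def atss_select(anchor_ious):
    # anchor_ious: IoUs of the shortlisted candidate anchors with one
    # ground-truth box. ATSS sets the positive/negative threshold
    # adaptively as mean + std of these IoUs, rather than using a
    # fixed hand-tuned value.
    t = anchor_ious.mean() + anchor_ious.std()
    return np.where(anchor_ious >= t)[0]   # indices of positive samples

ious = np.array([0.1, 0.15, 0.6, 0.7, 0.05])
positives = atss_select(ious)
```

For this example the threshold lands near 0.59, so only the two well-overlapping anchors (indices 2 and 3) are selected as positives; easy ground truths with many good anchors get a high bar, hard ones a lower bar.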
Based on the above, embodiments of the present application provide a vehicle damage assessment method and device, an electronic device, and a storage medium. The method acquires a picture of a vehicle awaiting damage assessment, inputs the picture into an encoder to generate latent features, inputs the latent features into a decoder to obtain target feature vectors, converts the latent features into multi-scale features through a multi-scale adapter, inputs the multi-scale features into an auxiliary detection head to generate a plurality of candidate region feature matrices, matches the candidate region feature matrices against the annotated ground truth in a one-to-many manner to obtain a matched target region feature matrix, and obtains a vehicle damage assessment result corresponding to the picture from the target region feature matrix and the target feature vectors. By adding an auxiliary detection head to the existing DETR framework, the latent features output by the encoder can additionally be fed into the auxiliary detection head to generate candidate region feature matrices, which are matched against the annotated ground truth in a one-to-many manner to obtain the matched target region feature matrix. The matched target region feature matrix is then spliced with the target feature vectors output by the decoder, and the vehicle damage assessment result corresponding to the picture is finally output.
On this basis, the feature expression capability of the encoder in the DETR framework can be improved, thereby effectively improving the accuracy of the DETR framework in automatic vehicle damage assessment and making it better suited to vehicle damage detection.
Embodiments of the present application provide a vehicle damage assessment method and device, an electronic device, and a storage medium, which are explained through the following embodiments, beginning with the vehicle damage assessment method.
Embodiments of the present application may acquire and process the relevant data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
The embodiments of the present application provide a vehicle damage assessment method relating to the technical field of artificial intelligence. The method may be applied in a terminal, at a server side, or as software running in a terminal or at a server side. In some embodiments, the terminal may be a smartphone, tablet computer, notebook computer, desktop computer, or the like; the server side may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, big data, and artificial intelligence platforms; and the software may be an application implementing the vehicle damage assessment method, though it is not limited to the above forms.
The application is operational with numerous general purpose or special purpose computer system environments or configurations. Such as a personal computer, a server computer, a hand-held or portable device, a tablet device, a multiprocessor system, a microprocessor-based system, a set top box, a programmable consumer electronics, a network PC, a minicomputer, a mainframe computer, a distributed computing environment that includes any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
It should be noted that, in each specific embodiment of the present application, when related processing is required according to user information, user behavior data, user history data, user location information, and other data related to user identity or characteristics, permission or consent of the user is obtained first, and the collection, use, processing, and the like of the data comply with related laws and regulations and standards. In addition, when the embodiment of the application needs to acquire the sensitive personal information of the user, the independent permission or independent consent of the user is acquired through popup or jump to a confirmation page and the like, and after the independent permission or independent consent of the user is definitely acquired, the necessary relevant data of the user for enabling the embodiment of the application to normally operate is acquired.
Fig. 1 is an optional flowchart of a vehicle damage assessment method according to an embodiment of the present application, where the method in fig. 1 may include, but is not limited to, steps S101 to S107.
Step S101, acquiring a picture of a vehicle awaiting damage assessment;
Step S102, inputting the vehicle picture into an encoder to generate latent features;
Step S103, inputting the latent features into a decoder to obtain target feature vectors;
Step S104, converting the latent features into multi-scale features through a multi-scale adapter;
Step S105, inputting the multi-scale features into an auxiliary detection head to generate a plurality of candidate region feature matrices;
Step S106, matching the candidate region feature matrices against the annotated ground truth in a one-to-many manner to obtain a matched target region feature matrix;
and Step S107, obtaining a vehicle damage assessment result corresponding to the vehicle picture from the target region feature matrix and the target feature vectors.
In step S101 of some embodiments, a picture of the vehicle awaiting damage assessment is acquired. Specifically, the picture uploaded by the user may be obtained from a mobile terminal or a cloud server. The picture may be taken from different angles with a camera-equipped mobile device, and the vehicle may include several different components. In an automatic damage assessment scenario, a customer on the self-service claims interface can have the vehicle assessed by photographing the damaged components and uploading the pictures to the vehicle damage assessment system.
In step S102 of some embodiments, the vehicle picture is input to an encoder to generate latent features. Specifically, the picture is fed into a pre-trained encoder, which converts it into a series of values, the latent features, that capture the key information in the picture, including the damage condition of the vehicle.
In step S103 of some embodiments, the potential feature is input to a decoder, resulting in a target feature vector. The potential features obtained from the encoder are passed as inputs to the decoder, which processes the potential features and outputs a target feature vector, which may be used to represent the type of damage and the extent of damage to the vehicle as part of the vehicle impairment detection result.
In step S104 of some embodiments, the potential features are converted to multi-scale features by a multi-scale adapter. The potential features output by the encoder may be converted to a multi-scale feature pyramid {F1, F2, ..., FJ} using a multi-scale adapter. The multi-scale adapter may be one or more neural network layers capable of receiving potential features and outputting feature representations of different scales. The multi-scale adapter may contain convolution kernels of different sizes, pooling layers, upsampling layers, or attention mechanisms, etc., to capture features of different scales. Based on this, the multi-scale adapter can convert potential features into multi-scale features, and the multi-scale features can capture information at different levels, thereby improving the performance and robustness of the model when handling complex tasks. In an automatic vehicle damage assessment scenario, utilizing multi-scale features can help the model more accurately capture key information in the vehicle picture and generate more reliable prediction results.
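As an illustration of the multi-scale adapter described above, the following sketch builds a small feature pyramid {F1, F2, ..., FJ} by repeated 2×2 average pooling. All names here are hypothetical, and a real adapter would typically use learned convolutions, upsampling, or attention rather than fixed pooling:

```python
import numpy as np

def multi_scale_adapter(latent, num_levels=3):
    # Build a feature pyramid {F1, F2, ..., FJ} from the encoder's
    # latent feature map (channels-first layout: C x H x W) by
    # repeated 2x2 average pooling, halving each spatial dimension.
    pyramid = [latent]
    current = latent
    for _ in range(num_levels - 1):
        c, h, w = current.shape
        # crop to even spatial sizes so the 2x2 pooling reshape is valid
        current = current[:, : h // 2 * 2, : w // 2 * 2]
        current = current.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
        pyramid.append(current)
    return pyramid

latent = np.ones((256, 32, 32))  # toy stand-in for the encoder output
pyramid = multi_scale_adapter(latent)
```

Each level of the returned pyramid captures a coarser view of the same latent map, which is the property the auxiliary detection head relies on.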
In step S105 of some embodiments, the multi-scale features are input to an auxiliary detection head, generating a plurality of candidate region feature matrices. The multi-scale features {F1, F2, ..., FJ} are fed into an auxiliary detection head that uses a one-to-many label distribution scheme. The embodiment of the application adds the auxiliary detection head on the basis of the existing DETR framework, so that the potential features output by the encoder can additionally be input to the auxiliary detection head to generate a plurality of candidate region feature matrices. It should be noted that the auxiliary detection head may be a conventional ATSS (adaptive training sample selection) detection head; this detection head has a lightweight structure, so the additional training cost is small.
In step S106 of some embodiments, a one-to-many matching method is adopted to match the plurality of candidate region feature matrices with the labeling true values, so as to obtain the target region feature matrix after matching is completed. Each candidate region feature matrix contains part of the features of a candidate detection target region; matching the candidate region feature matrices with the labeling true value yields the target region feature matrix that is relatively close to the labeling true value.
In step S107 of some embodiments, a vehicle damage assessment result corresponding to the to-be-damaged vehicle picture is obtained according to the target region feature matrix and the target feature vector. And splicing the matched target region feature matrix with the target feature vector output by the decoder, and finally outputting a vehicle damage assessment result corresponding to the vehicle picture to be damaged.
The method comprises the steps of S101 to S107, namely obtaining a vehicle picture to be damaged, inputting the vehicle picture to be damaged to an encoder to generate potential features, inputting the potential features to a decoder to obtain target feature vectors, converting the potential features into multi-scale features through a multi-scale adapter, inputting the multi-scale features to an auxiliary detection head to generate a plurality of candidate region feature matrixes, matching the candidate region feature matrixes with labeling true values in a one-to-many matching mode to obtain a matched target region feature matrix, and obtaining a vehicle damage assessment result corresponding to the vehicle picture to be damaged according to the target region feature matrixes and the target feature vectors. According to the embodiment of the application, the auxiliary detection head is added on the basis of the existing DETR framework, so that potential features output by the encoder can be additionally input into the auxiliary detection head to generate a plurality of candidate region feature matrixes, and the candidate region feature matrixes are matched with the labeling true value in a one-to-many matching mode to obtain the matched target region feature matrix. And then splicing the matched target area feature matrix with the target feature vector output by the decoder, and finally outputting a vehicle damage assessment result corresponding to the vehicle picture to be damaged. Based on the method, the characteristic expression capability of the encoder in the DETR frame can be improved, so that the accuracy of the application of the DETR frame to the automatic damage assessment scene of the vehicle is effectively improved, and the DETR frame is more suitable for vehicle damage detection.
In some embodiments, the vehicle impairment determination method according to the embodiments of the present application may be applied to a vehicle impairment model, as shown in fig. 2, where the vehicle impairment model includes an encoder, a decoder, a multi-scale adapter, and an auxiliary detection head.
Compared with the existing DETR framework, the vehicle damage assessment model of the embodiment of the application is mainly added with a multi-scale adapter and an auxiliary detection head on the basis of the vehicle damage assessment model, so that two branches, namely a main branch which is matched one to one and an auxiliary branch which is matched one to many, are formed. Specifically, the output end of the encoder is connected to the input end of the decoder and the input end of the multi-scale adapter respectively, the output end of the multi-scale adapter is connected to the input end of the auxiliary detection head, the output end of the decoder and the output end of the auxiliary detection head are spliced together, the branches where the encoder and the decoder are located form a main branch, and the branches where the encoder, the multi-scale adapter and the auxiliary detection head are located form an auxiliary branch. The vehicle damage assessment model integrates the auxiliary head and the output of the encoder, improves the training efficiency and effect of the encoder and the decoder by utilizing one-to-many label distribution, and enables the DETR framework to be more suitable for detecting vehicle damage.
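The two-branch wiring described above can be sketched as follows. The component stand-ins are toy placeholders chosen purely for illustration, not the model's actual encoder, decoder, adapter, or detection head:

```python
import numpy as np

class VehicleDamageModel:
    # Schematic wiring of the two branches:
    #   Main branch:      encoder -> decoder             (one-to-one matching)
    #   Auxiliary branch: encoder -> adapter -> aux head (one-to-many matching)
    # The encoder output (latent features) is shared by both branches.
    def __init__(self, encoder, decoder, adapter, aux_head):
        self.encoder, self.decoder = encoder, decoder
        self.adapter, self.aux_head = adapter, aux_head

    def forward(self, image):
        latent = self.encoder(image)        # shared latent features
        target_vec = self.decoder(latent)   # main-branch target feature vector
        pyramid = self.adapter(latent)      # multi-scale features
        candidates = self.aux_head(pyramid) # candidate region feature matrices
        return target_vec, candidates

# toy stand-ins for the four components
model = VehicleDamageModel(
    encoder=lambda img: img.mean(axis=0),
    decoder=lambda lat: lat.ravel()[:4],
    adapter=lambda lat: [lat, lat[::2, ::2]],
    aux_head=lambda pyr: [f.sum() for f in pyr],
)
vec, cands = model.forward(np.ones((3, 8, 8)))
```

The point of the sketch is the data flow: the decoder and the auxiliary head both consume the same encoder output, which is what lets the one-to-many auxiliary supervision strengthen the shared encoder.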
Based on the above, the vehicle damage assessment method of the embodiment of the application adds the auxiliary detection head on the basis of the existing DETR framework, so that the potential features output by the encoder can be additionally input into the auxiliary detection head to generate a plurality of candidate region feature matrixes, and the candidate region feature matrixes are matched with the labeling true value in a one-to-many matching mode to obtain the matched target region feature matrix. And then splicing the matched target area feature matrix with the target feature vector output by the decoder, and finally outputting a vehicle damage assessment result corresponding to the vehicle picture to be damaged.
In some embodiments, the vehicle damage assessment model according to the embodiments of the present application employs a Transformer encoder and a Transformer decoder, the multi-scale adapter employs an identity mapping structure, and the auxiliary detection head is an ATSS (adaptive training sample selection) auxiliary detection head.
Referring to fig. 3, in some embodiments, the training method of the vehicle impairment model may include, but is not limited to, steps S301 to S304:
step S301, calculating a first loss function of the main branch by adopting a Hungarian matching function;
Step S302, calculating a second loss function of the auxiliary branch by adopting a Focal Loss function;
Step S303, determining a target loss function according to the first loss function and the second loss function;
and step S304, training the vehicle loss assessment model based on the target loss function to obtain a trained vehicle loss assessment model.
In the training process of a vehicle impairment model, a collaborative hybrid allocation training method is employed to improve feature learning in an encoder and attention learning in a decoder.
The vehicle damage assessment model is provided with two branches, namely a main branch and an auxiliary branch, wherein the main branch adopts one-to-one matching and the auxiliary branch adopts one-to-many matching. In the training phase, the loss function of the whole algorithm is composed of two parts, namely the loss of the original one-to-one set-matching branch and the loss of the decoder layer in the auxiliary branch, expressed as follows:
Lglobal = Ldec + αLenc
Where Ldec denotes the first loss function of the main branch, computed using the Hungarian matching function; Lenc denotes the loss of the decoder layer in the auxiliary branch, computed using the Focal Loss function; α is the balance loss coefficient, and αLenc constitutes the second loss function of the auxiliary branch. All queries in the auxiliary branch are considered positive queries. The vehicle damage assessment model after collaborative hybrid distribution training can be used for target detection and vehicle damage assessment, and a prediction result Y can be obtained after the vehicle picture X to be damaged is input into the trained vehicle damage assessment model.
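A minimal numerical sketch of this composite loss, assuming the standard binary form of Focal Loss for the auxiliary branch and treating the Hungarian-matched main-branch loss as a precomputed scalar (all names are illustrative, not the patent's actual code):

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    # Binary Focal Loss for the auxiliary branch:
    # p = predicted probability of the positive class, y = 0/1 label.
    # Easy, well-classified examples are down-weighted by (1 - p_t)^gamma.
    p_t = np.where(y == 1, p, 1.0 - p)
    a_t = np.where(y == 1, alpha, 1.0 - alpha)
    return float(np.mean(-a_t * (1.0 - p_t) ** gamma * np.log(p_t)))

def global_loss(l_dec, l_enc, balance=1.0):
    # Lglobal = Ldec + alpha * Lenc, where `balance` plays the role of
    # the balance loss coefficient alpha in the formula above.
    return l_dec + balance * l_enc

l_enc = focal_loss(np.array([0.9, 0.2]), np.array([1, 0]))
total = global_loss(l_dec=0.5, l_enc=l_enc, balance=2.0)
```

Since all auxiliary-branch queries are treated as positive queries, in practice the Focal Loss term is evaluated over the full set of auxiliary predictions rather than a Hungarian-selected subset.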
Based on the above, the application provides a collaborative hybrid distribution training method suitable for automatic vehicle damage assessment, so as to learn a stronger DETR-based detector from a variety of label distribution schemes. This new training method can improve encoder feature learning by training multiple auxiliary heads supervised with one-to-many label distribution. The method also generates additional customized positive queries by extracting positive-sample coordinates from these auxiliary heads, thereby improving the training efficiency of positive samples in the decoder.
Referring to fig. 4, in some embodiments, step S102 may include, but is not limited to, steps S401 to S402:
step S401, performing bilinear interpolation processing on the vehicle picture to be damaged to obtain a preprocessed picture;
step S402, inputting the preprocessed picture to the encoder, and generating the latent feature.
In some embodiments, bilinear interpolation is performed on the vehicle picture to be damaged; bilinear interpolation has the advantages of accurate normal calculation, an accurate highlight field, no limitation on light sources or viewpoints, and high speed. The preprocessed picture is input to the encoder, generating potential features. The encoder may convert the preprocessed picture into a series of values, i.e., potential features, that capture key information in the picture of the vehicle to be damaged, including, for example, the damage condition of the vehicle.
Referring to fig. 5, in some embodiments, step S401 may include, but is not limited to, steps S501 to S503:
step S501, four corner points around each pixel point in a vehicle picture to be damaged are determined;
step S502, determining the pixel value of each pixel point according to the weighted average value of four corner points;
step S503, converting the image of the vehicle to be damaged into a preprocessing image according to the pixel value of each pixel point.
In some embodiments, bilinear interpolation is performed on the vehicle image X to be damaged, specifically, each pixel H (X, y) on the vehicle image X to be damaged is processed.
Wherein Q11(x1, y1), Q12(x1, y2), Q21(x2, y1), and Q22(x2, y2) are the four corner points around the pixel H. Through this processing, the picture is converted from the original 1000×667 picture X to a new 624×328 picture.
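Steps S501 to S503 can be sketched as follows. This is a plain reference implementation of bilinear interpolation with align-corners-style coordinate mapping, which may differ in detail from the preprocessing actually used; function and variable names are illustrative:

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    # Each output pixel H(x, y) is a weighted average of the four
    # surrounding corner points Q11(x1,y1), Q12(x1,y2), Q21(x2,y1),
    # Q22(x2,y2) in the source picture (steps S501-S503).
    in_h, in_w = img.shape[:2]
    out = np.empty((out_h, out_w) + img.shape[2:], dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # map the output pixel back into source coordinates
            y = i * (in_h - 1) / max(out_h - 1, 1)
            x = j * (in_w - 1) / max(out_w - 1, 1)
            y1, x1 = int(y), int(x)
            y2, x2 = min(y1 + 1, in_h - 1), min(x1 + 1, in_w - 1)
            dy, dx = y - y1, x - x1
            # weighted average of the four corner points
            out[i, j] = (img[y1, x1] * (1 - dy) * (1 - dx)
                         + img[y1, x2] * (1 - dy) * dx
                         + img[y2, x1] * dy * (1 - dx)
                         + img[y2, x2] * dy * dx)
    return out

small = bilinear_resize(np.arange(16.0).reshape(4, 4), 2, 2)
```

In the scenario of the embodiment, the same routine would map the 1000×667 source picture to the 624×328 preprocessed picture fed to the encoder.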
Referring to fig. 6, in some embodiments, step S106 may include, but is not limited to, steps S601 to S602:
Step S601, matching each candidate region feature matrix with a labeling true value one by one to obtain an overlapping area proportion value corresponding to a region where each candidate region feature matrix and the labeling true value overlap each other;
Step S602, determining the candidate region feature matrix with the overlapping area proportion value larger than the preset threshold value as the target region feature matrix.
In some embodiments, in the one-to-many matching auxiliary branch of the vehicle damage assessment model, each candidate region feature matrix may be compared with the labeling true value one by one, so as to obtain the overlapping area proportion value corresponding to the region where each candidate region feature matrix and the labeling true value overlap each other. A candidate region feature matrix whose overlapping area proportion value exceeds the preset threshold is relatively close to the labeling true value and is therefore determined to be a target region feature matrix. It should be noted that the preset threshold may be configured according to actual service requirements, and this embodiment does not limit its specific numerical value.
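A minimal sketch of this one-to-many matching, assuming candidate regions and the labeling true value are represented as (x1, y1, x2, y2) boxes and the overlapping area proportion is computed as intersection-over-union (an assumption for illustration; the embodiment does not fix the exact formula):

```python
def iou(box_a, box_b):
    # Overlapping-area proportion (IoU) of two boxes (x1, y1, x2, y2).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def one_to_many_match(candidates, ground_truth, threshold=0.5):
    # Steps S601-S602: keep every candidate whose overlapping-area
    # proportion with the labeling true value exceeds the preset
    # threshold; several candidates may match one ground-truth box.
    return [c for c in candidates if iou(c, ground_truth) > threshold]

gt = (0, 0, 10, 10)
cands = [(0, 0, 10, 10), (1, 1, 9, 9), (20, 20, 30, 30)]
kept = one_to_many_match(cands, gt)
```

Unlike the main branch's one-to-one Hungarian matching, several candidates here can survive against a single ground-truth box, which is exactly what supplies the auxiliary branch with extra positive samples.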
Referring to fig. 7, an embodiment of the present application further provides a vehicle damage assessment device, which can implement the vehicle damage assessment method, where the device includes:
An obtaining module 710, configured to obtain a picture of a vehicle to be damaged;
an encoding module 720, configured to input a to-be-damaged vehicle picture to an encoder, and generate a potential feature;
a decoding module 730, configured to input the potential feature to a decoder to obtain a target feature vector;
a conversion module 740 for converting the potential features into multi-scale features through the multi-scale adapter;
A generating module 750, configured to input the multi-scale features to the auxiliary detection head, and generate a plurality of candidate region feature matrices;
the matching module 760 is configured to match the plurality of candidate region feature matrices with the labeling truth value in a one-to-many matching manner, so as to obtain a target region feature matrix after matching is completed;
and the output module 770 is configured to obtain a vehicle damage assessment result corresponding to the to-be-damaged vehicle picture according to the target region feature matrix and the target feature vector.
In some embodiments of the present application, the obtaining module 710 obtains a to-be-damaged vehicle picture, the encoding module 720 inputs the to-be-damaged vehicle picture to the encoder to generate a potential feature, the decoding module 730 inputs the potential feature to the decoder to obtain a target feature vector, the converting module 740 converts the potential feature to a multi-scale feature through a multi-scale adapter, the generating module 750 inputs the multi-scale feature to the auxiliary detecting head to generate a plurality of candidate region feature matrices, the matching module 760 matches the plurality of candidate region feature matrices with labeling true values in a one-to-many matching manner to obtain a matched target region feature matrix, and the output module 770 obtains a vehicle damage determination result corresponding to the to-be-damaged vehicle picture according to the target region feature matrix and the target feature vector.
In some embodiments of the application, a picture of the vehicle to be damaged is obtained. Specifically, the image of the vehicle to be damaged uploaded by the user can be obtained from the mobile terminal or the cloud server. The image of the vehicle to be damaged can be obtained by shooting the vehicle to be damaged from different angles through a mobile device with a camera, and the vehicle to be damaged can comprise a plurality of different vehicle parts. In a vehicle automatic damage assessment scenario, a user may be at a vehicle damage self-help claim business side interface and a customer may conduct vehicle damage assessment by shooting and uploading damaged vehicle components to a vehicle damage assessment system.
In some embodiments of the application, a picture of the vehicle to be damaged is input to an encoder, generating a latent feature. Specifically, the vehicle picture to be damaged is input into a pre-trained encoder, which can convert the vehicle picture to be damaged into a series of values, namely potential features, which capture key information in the vehicle picture to be damaged, wherein the key information comprises damage conditions of the vehicle.
In some embodiments of the application, the potential features are input to a decoder to obtain the target feature vector. The potential features obtained from the encoder are passed as inputs to the decoder, which processes the potential features and outputs a target feature vector, which may be used to represent the type of damage and the extent of damage to the vehicle as part of the vehicle impairment detection result.
In some embodiments of the application, the potential features are converted to multi-scale features by a multi-scale adapter. The potential features output by the encoder may be converted to a multi-scale feature pyramid {F1, F2, ..., FJ} using a multi-scale adapter. The multi-scale adapter may be one or more neural network layers capable of receiving potential features and outputting feature representations of different scales. The multi-scale adapter may contain convolution kernels of different sizes, pooling layers, upsampling layers, or attention mechanisms, etc., to capture features of different scales. Based on this, the multi-scale adapter can convert potential features into multi-scale features, and the multi-scale features can capture information at different levels, thereby improving the performance and robustness of the model when handling complex tasks. In an automatic vehicle damage assessment scenario, utilizing multi-scale features can help the model more accurately capture key information in the vehicle picture and generate more reliable prediction results.
In some embodiments of the application, the multi-scale features are input to an auxiliary detection head to generate a plurality of candidate region feature matrices. The multi-scale features {F1, F2, ..., FJ} are fed into an auxiliary detection head that uses a one-to-many label distribution scheme. The embodiment of the application adds the auxiliary detection head on the basis of the existing DETR framework, so that the potential features output by the encoder can additionally be input to the auxiliary detection head to generate a plurality of candidate region feature matrices. It should be noted that the auxiliary detection head may be a conventional ATSS (adaptive training sample selection) detection head; this detection head has a lightweight structure, so the additional training cost is small.
In some embodiments of the present application, a one-to-many matching method is used to match the plurality of candidate region feature matrices with the labeling true values, so as to obtain the target region feature matrix after matching is completed. Each candidate region feature matrix contains part of the features of a candidate detection target region; matching the candidate region feature matrices with the labeling true value yields the target region feature matrix that is relatively close to the labeling true value.
In some embodiments of the present application, a vehicle damage determination result corresponding to a to-be-damaged vehicle picture is obtained according to the target region feature matrix and the target feature vector. And splicing the matched target region feature matrix with the target feature vector output by the decoder, and finally outputting a vehicle damage assessment result corresponding to the vehicle picture to be damaged.
Based on this, the vehicle damage assessment device according to the embodiment of the present application obtains a vehicle picture to be damaged by the obtaining module 710, the encoding module 720 inputs the vehicle picture to be damaged to the encoder to generate a potential feature, the decoding module 730 inputs the potential feature to the decoder to obtain a target feature vector, the converting module 740 converts the potential feature to a multi-scale feature through the multi-scale adapter, the generating module 750 inputs the multi-scale feature to the auxiliary detecting head to generate a plurality of candidate region feature matrices, the matching module 760 matches the plurality of candidate region feature matrices with a labeling true value in a one-to-many matching manner to obtain a matched target region feature matrix, and the output module 770 obtains a vehicle damage assessment result corresponding to the vehicle picture to be damaged according to the target region feature matrix and the target feature vector. The method comprises the steps of obtaining a vehicle picture to be damaged, inputting the vehicle picture to be damaged to an encoder to generate potential features, inputting the potential features to a decoder to obtain target feature vectors, converting the potential features to multi-scale features through a multi-scale adapter, inputting the multi-scale features to an auxiliary detection head to generate a plurality of candidate region feature matrixes, matching the candidate region feature matrixes with labeling true values in a one-to-many matching mode to obtain a matched target region feature matrix, and obtaining a vehicle damage determination result corresponding to the vehicle picture to be damaged according to the target region feature matrixes and the target feature vectors. 
According to the embodiment of the application, the auxiliary detection head is introduced on the basis of the existing DETR framework, so that potential features output by the encoder can be additionally input into the auxiliary detection head to generate a plurality of candidate region feature matrixes, and the candidate region feature matrixes are matched with the labeling true value in a one-to-many matching mode to obtain the matched target region feature matrix. And then splicing the matched target area feature matrix with the target feature vector output by the decoder, and finally outputting a vehicle damage assessment result corresponding to the vehicle picture to be damaged. Based on the method, the characteristic expression capability of the encoder in the DETR frame can be improved, so that the accuracy of the application of the DETR frame to the automatic damage assessment scene of the vehicle is effectively improved, and the DETR frame is more suitable for vehicle damage detection.
The specific implementation of the vehicle damage assessment device is basically the same as the specific embodiment of the vehicle damage assessment method, and will not be described herein.
The embodiment of the application also provides electronic equipment, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the vehicle damage assessment method when executing the computer program. The electronic equipment can be any intelligent terminal including a tablet personal computer, a vehicle-mounted computer and the like.
Referring to fig. 8, fig. 8 illustrates a hardware structure of an electronic device according to another embodiment, the electronic device includes:
The processor 801 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application-Specific Integrated Circuit (ASIC), or one or more integrated circuits, etc., for executing related programs to implement the technical solutions provided by the embodiments of the present application.
Memory 802 may be implemented in the form of Read-Only Memory (ROM), static storage, dynamic storage, or Random Access Memory (RAM), among others. The memory 802 may store an operating system and other application programs. When the technical solution provided in the embodiments of the present disclosure is implemented by software or firmware, the relevant program codes are stored in the memory 802, and the processor 801 invokes the vehicle damage assessment method of the embodiments of the present disclosure, that is: obtaining a vehicle picture to be damaged; inputting the vehicle picture to be damaged to an encoder to generate potential features; inputting the potential features to a decoder to obtain a target feature vector; converting the potential features to multi-scale features by a multi-scale adapter; inputting the multi-scale features to an auxiliary detection head to generate a plurality of candidate region feature matrices; matching the plurality of candidate region feature matrices with labeling true values in a one-to-many matching manner to obtain a matched target region feature matrix; and obtaining a vehicle damage assessment result corresponding to the vehicle picture to be damaged according to the target region feature matrix and the target feature vector. According to the embodiment of the application, the auxiliary detection head is introduced on the basis of the existing DETR framework, so that potential features output by the encoder can be additionally input into the auxiliary detection head to generate a plurality of candidate region feature matrices, and the candidate region feature matrices are matched with the labeling true value in a one-to-many matching mode to obtain the matched target region feature matrix.
And then splicing the matched target area feature matrix with the target feature vector output by the decoder, and finally outputting a vehicle damage assessment result corresponding to the vehicle picture to be damaged. Based on the method, the characteristic expression capability of the encoder in the DETR frame can be improved, so that the accuracy of the application of the DETR frame to the automatic damage assessment scene of the vehicle is effectively improved, and the DETR frame is more suitable for vehicle damage detection.
An input/output interface 803 for implementing information input and output.
The communication interface 804 is configured to implement communication interaction between the device and other devices, and may implement communication through a wired manner (such as USB, network cable, etc.) or a wireless manner (such as mobile network, Wi-Fi, Bluetooth, etc.).
A bus that transfers information between the various components of the device (e.g., processor 801, memory 802, input/output interface 803, and communication interface 804).
Wherein the processor 801, the memory 802, the input/output interface 803, and the communication interface 804 implement communication connection between each other inside the device through a bus.
The embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program, and the computer program realizes the vehicle damage assessment method when being executed by a processor.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The vehicle damage assessment method, the vehicle damage assessment device, the electronic equipment and the storage medium provided by the embodiment of the application are characterized in that a vehicle picture to be subjected to damage assessment is obtained, the vehicle picture to be subjected to damage assessment is input into an encoder to generate potential features, the potential features are input into a decoder to obtain target feature vectors, the potential features are converted into multi-scale features through a multi-scale adapter, the multi-scale features are input into an auxiliary detection head to generate a plurality of candidate region feature matrixes, the plurality of candidate region feature matrixes are matched with labeling true values in a one-to-many matching mode to obtain a matched target region feature matrix, and a vehicle damage assessment result corresponding to the vehicle picture to be subjected to damage assessment is obtained according to the target region feature matrixes and the target feature vectors. According to the embodiment of the application, the auxiliary detection head is introduced on the basis of the existing DETR framework, so that potential features output by the encoder can be additionally input into the auxiliary detection head to generate a plurality of candidate region feature matrixes, and the candidate region feature matrixes are matched with the labeling true value in a one-to-many matching mode to obtain the matched target region feature matrix. And then splicing the matched target area feature matrix with the target feature vector output by the decoder, and finally outputting a vehicle damage assessment result corresponding to the vehicle picture to be damaged. 
Based on the method, the characteristic expression capability of the encoder in the DETR frame can be improved, so that the accuracy of the application of the DETR frame to the automatic damage assessment scene of the vehicle is effectively improved, and the DETR frame is more suitable for vehicle damage detection.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable programs, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable programs, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
The embodiments described herein are intended to describe the technical solutions of the embodiments of the present application more clearly and do not constitute a limitation on those solutions. Those skilled in the art will appreciate that, as technology evolves and new application scenarios emerge, the technical solutions provided by the embodiments of the present application remain applicable to similar technical problems.
It will be appreciated by persons skilled in the art that the embodiments of the application are not limited by the illustrations; they may include more or fewer steps than those shown, combine certain steps, or include different steps.
The apparatus embodiments described above are merely illustrative; the units illustrated as separate components may or may not be physically separate, that is, they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The terms "first," "second," "third," "fourth," and the like in the description of the application and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one (item)" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate three cases: only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates that the associated objects are in an "or" relationship. "At least one of" and similar expressions mean any combination of the listed items, including any combination of single items or plural items. For example, "at least one of a, b, or c" may represent a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may be singular or plural.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division of units is merely a logical function division, and there may be other division manners in actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices, or units, which may be in electrical, mechanical, or other form.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the various embodiments of the present application. The storage medium includes various media capable of storing programs, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The preferred embodiments of the present application have been described above with reference to the accompanying drawings, which does not thereby limit the scope of the claims of the embodiments of the present application. Any modifications, equivalent substitutions, and improvements made by those skilled in the art without departing from the scope and spirit of the embodiments of the present application shall fall within the scope of the claims of the embodiments of the present application.

Claims (10)

1. A method of vehicle damage assessment, the method comprising:
acquiring a picture of a vehicle to be assessed for damage;
inputting the picture of the vehicle to be assessed for damage to an encoder to generate latent features;
inputting the latent features to a decoder to obtain a target feature vector;
converting the latent features into multi-scale features through a multi-scale adapter;
inputting the multi-scale features to an auxiliary detection head to generate a plurality of candidate region feature matrices;
matching the plurality of candidate region feature matrices against an annotated ground truth in a one-to-many manner to obtain a matched target region feature matrix; and
obtaining a vehicle damage assessment result corresponding to the picture of the vehicle to be assessed for damage according to the target region feature matrix and the target feature vector.
2. The method according to claim 1, wherein the method is applied to a vehicle damage assessment model, the vehicle damage assessment model comprising the encoder, the decoder, the multi-scale adapter, and the auxiliary detection head, wherein the output of the encoder is connected to the input of the decoder and to the input of the multi-scale adapter, respectively, the output of the multi-scale adapter is connected to the input of the auxiliary detection head, and the output of the decoder and the output of the auxiliary detection head are concatenated together; the branch in which the encoder and the decoder are located constitutes a main branch, and the branch in which the encoder, the multi-scale adapter, and the auxiliary detection head are located constitutes an auxiliary branch.
3. The method of claim 2, wherein the training method of the vehicle damage assessment model comprises:
calculating a first loss function of the main branch using a Hungarian matching function;
calculating a second loss function of the auxiliary branch using a Focal Loss function;
determining a target loss function from the first loss function and the second loss function; and
training the vehicle damage assessment model based on the target loss function to obtain the trained vehicle damage assessment model.
4. The method of claim 1, wherein the inputting the picture of the vehicle to be assessed for damage to an encoder to generate latent features comprises:
performing bilinear interpolation on the picture of the vehicle to be assessed for damage to obtain a preprocessed picture; and
inputting the preprocessed picture to the encoder to generate the latent features.
5. The method of claim 4, wherein the performing bilinear interpolation on the picture of the vehicle to be assessed for damage to obtain a preprocessed picture comprises:
determining the four corner points surrounding each pixel point in the picture of the vehicle to be assessed for damage;
determining the pixel value of each pixel point according to a weighted average of the four corner points; and
converting the picture of the vehicle to be assessed for damage into the preprocessed picture according to the pixel value of each pixel point.
6. The method of claim 1, wherein the matching the plurality of candidate region feature matrices against the annotated ground truth in a one-to-many manner to obtain a matched target region feature matrix comprises:
matching each candidate region feature matrix against the annotated ground truth one by one to obtain an overlap area ratio corresponding to the overlapping region between each candidate region feature matrix and the annotated ground truth; and
determining a candidate region feature matrix whose overlap area ratio is greater than a preset threshold as the target region feature matrix.
7. The method of any of claims 1 to 6, wherein the encoder is a Transformer encoder, the decoder is a Transformer decoder, the multi-scale adapter employs an identity mapping structure, and the auxiliary detection head is an adaptive training sample selection (ATSS) auxiliary detection head.
8. A vehicle damage assessment device, the device comprising:
an acquisition module, configured to acquire a picture of a vehicle to be assessed for damage;
an encoding module, configured to input the picture of the vehicle to be assessed for damage to an encoder to generate latent features;
a decoding module, configured to input the latent features to a decoder to obtain a target feature vector;
a conversion module, configured to convert the latent features into multi-scale features through a multi-scale adapter;
a generating module, configured to input the multi-scale features to an auxiliary detection head to generate a plurality of candidate region feature matrices;
a matching module, configured to match the plurality of candidate region feature matrices against an annotated ground truth in a one-to-many manner to obtain a matched target region feature matrix; and
an output module, configured to obtain a vehicle damage assessment result corresponding to the picture of the vehicle to be assessed for damage according to the target region feature matrix and the target feature vector.
9. An electronic device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the vehicle damage assessment method according to any one of claims 1 to 7.
10. A computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the vehicle damage assessment method according to any one of claims 1 to 7.
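The bilinear interpolation of claims 4 and 5 can be illustrated with a short sketch: the value at a fractional coordinate is the weighted average of the four surrounding corner points. This is a generic bilinear interpolation sketch, not the patented preprocessing pipeline; the 2x2 test image and the sample coordinate are invented for illustration:

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Pixel value at fractional (x, y): weighted average of the four
    surrounding corner points, weighted by proximity along each axis."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, img.shape[1] - 1)
    y1 = min(y0 + 1, img.shape[0] - 1)
    dx, dy = x - x0, y - y0
    top = (1 - dx) * img[y0, x0] + dx * img[y0, x1]        # blend along x, top row
    bottom = (1 - dx) * img[y1, x0] + dx * img[y1, x1]     # blend along x, bottom row
    return (1 - dy) * top + dy * bottom                    # blend along y

img = np.array([[0.0, 10.0],
                [20.0, 30.0]])
val = bilinear_sample(img, 0.5, 0.5)   # centre of the 2x2 grid -> 15.0
```

Resizing the input picture would apply this sampling at every target pixel, which is what "converting the picture into the preprocessed picture according to the pixel value of each pixel point" amounts to.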
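The one-to-many matching of claim 6 can likewise be sketched with plain axis-aligned boxes: every candidate whose overlap ratio with the annotated ground truth exceeds a preset threshold is retained, so one ground-truth box may match several candidates. The boxes, the 0.5 threshold, and the use of intersection-over-union as the overlap area ratio are illustrative assumptions, not values taken from the patent:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def one_to_many_match(candidates, gt_box, threshold=0.5):
    """Keep every candidate whose overlap ratio with the ground-truth
    box exceeds the threshold (one GT may match many candidates)."""
    return [c for c in candidates if iou(c, gt_box) > threshold]

gt = [0, 0, 10, 10]
cands = [[0, 0, 10, 10], [1, 1, 9, 9], [20, 20, 30, 30]]
matched = one_to_many_match(cands, gt)
# The two overlapping candidates are retained; the distant one is not.
```

This contrasts with DETR's standard one-to-one Hungarian assignment: the one-to-many rule gives the auxiliary branch many more positive samples per ground-truth region during training.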
CN202411407089.3A 2024-10-10 2024-10-10 Vehicle damage assessment method and device, electronic device and storage medium Pending CN119379622A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411407089.3A CN119379622A (en) 2024-10-10 2024-10-10 Vehicle damage assessment method and device, electronic device and storage medium


Publications (1)

Publication Number Publication Date
CN119379622A true CN119379622A (en) 2025-01-28

Family

ID=94325818


Country Status (1)

Country Link
CN (1) CN119379622A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination