
CN118587456B - Image comparison method, device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN118587456B
CN118587456B (application CN202411056302.0A)
Authority
CN
China
Prior art keywords: feature, image, characteristic, features, type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411056302.0A
Other languages
Chinese (zh)
Other versions
CN118587456A (en)
Inventor
陈畅怀
吴宇西
张翌晨
于增春
张逸扬
叶梦宇
车军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202411056302.0A
Publication of CN118587456A
Application granted
Publication of CN118587456B
Legal status: Active
Anticipated expiration

Classifications

    • G06V 10/761: Proximity, similarity or dissimilarity measures (image or video pattern matching; proximity measures in feature spaces)
    • G06V 10/40: Extraction of image or video features
    • G06V 10/764: Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract


The embodiment of the present application provides an image comparison method, device, electronic device and storage medium, relating to the field of image processing technology. The method includes: acquiring a first image and a second image; acquiring a characteristic prompt message, where the characteristic prompt message indicates the target feature type used for comparison between the first image and the second image; extracting multiple types of features from the first image to obtain a first feature set, and extracting multiple types of features from the second image to obtain a second feature set; and determining an image data comparison result according to the first feature set, the second feature set and the characteristic prompt message. The embodiments of the present application improve the efficiency of image processing.

Description

Image comparison method, device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image comparison method, an image comparison device, an electronic device, and a storage medium.
Background
With the rapid development of deep learning, analysis methods such as target recognition and image retrieval have become common technical means for processing images. In image processing, the image to be processed often needs to be compared, feature by feature, with previously processed images. However, because image characteristics vary widely, even within the same application scene multiple models are often trained separately to extract features from different kinds of images so that feature comparison can be carried out properly. For example, in a target classification scene, when the sample images to be classified include images with large style differences such as natural images, cartoon images and sketches, a separate model has to be trained for inference on each style. In a pedestrian re-recognition scene, when the sample images to be recognized suffer from various problems such as incomplete human bodies, different imaging signals (e.g., infrared and visible light), different postures, illumination changes and differences in sharpness, a separate model has to be trained for retrieval for each of these problems.
However, when there are too many problem types in the same application scene and the number of required models grows accordingly, not only does model training become overly complex, but manual analysis and judgment are also needed to match each image to be processed with an appropriate model, so the efficiency of image processing is low.
Disclosure of Invention
The embodiment of the application aims to provide an image comparison method, an image comparison device, electronic equipment and a storage medium, so as to improve the efficiency of image processing. The specific technical scheme is as follows:
In a first aspect, an embodiment of the present application provides an image comparison method, including:
Acquiring a first image and a second image;
Acquiring a characteristic prompt message, wherein the characteristic prompt message represents a target feature type for comparison in a first image and a second image;
extracting multiple types of features from the first image to obtain a first feature set, and extracting multiple types of features from the second image to obtain a second feature set;
and determining an image data comparison result according to the first feature set, the second feature set and the characteristic prompt message.
In one embodiment of the present application, the extracting the multiple types of features from the first image to obtain a first feature set, and the extracting the multiple types of features from the second image to obtain a second feature set includes:
The method comprises the steps of inputting a first image into a first feature extraction model, carrying out feature extraction on the first image through a plurality of feature extraction layers of the first feature extraction model to obtain a plurality of first image features with different granularities, carrying out feature extraction on each first image feature by utilizing a plurality of granularity feature encoders of the first feature extraction model to obtain a first feature set, wherein one granularity feature encoder is connected with one feature extraction layer, and the granularity feature encoder is used for carrying out feature extraction on the first image features output by the feature extraction layer connected with the granularity feature encoder;
and inputting the second image into the first feature extraction model to obtain a second feature set.
In one embodiment of the present application, the extracting the multiple types of features from the first image to obtain a first feature set, and the extracting the multiple types of features from the second image to obtain a second feature set includes:
the method comprises the steps of inputting a first image into a second feature extraction model, carrying out feature extraction on the first image through a plurality of feature extraction layers of the second feature extraction model to obtain a plurality of different types of first image features, and carrying out feature extraction on each first image feature by utilizing a plurality of type feature encoders of the second feature extraction model to obtain a first feature set, wherein one type feature encoder is connected with one feature extraction layer, one feature extraction layer is connected with one or more type feature encoders, and the type feature encoders are used for carrying out feature extraction on the first image features output by the feature extraction layers connected with the type feature encoders;
and inputting the second image into the second feature extraction model to obtain a second feature set.
In one embodiment of the present application, the extracting the multiple types of features from the first image to obtain a first feature set, and the extracting the multiple types of features from the second image to obtain a second feature set includes:
The method comprises the steps of inputting a first image into a third characteristic extraction model, carrying out characteristic extraction on the first image through a characteristic extraction layer in a sub-network of the third characteristic extraction model to obtain a plurality of different types of first image characteristics, and carrying out characteristic extraction on each first image characteristic by utilizing a plurality of type characteristic encoders in the sub-network to obtain a first characteristic set, wherein each sub-network comprises a characteristic extractor and a type characteristic encoder which are connected, and the type characteristic encoder is used for carrying out characteristic extraction on the first image characteristics output by the characteristic extraction layer connected with the sub-network;
and inputting the second image into the third feature extraction model to obtain a second feature set.
In one embodiment of the present application, the extracting the multiple types of features from the first image to obtain a first feature set, and the extracting the multiple types of features from the second image to obtain a second feature set includes:
Respectively inputting the first image and the second image into a visual large model, and extracting various types of features from the first image and the second image to obtain a first feature set and a second feature set;
the visual large model is obtained through training in the following mode:
Selecting a sample image with first feature type features and an image characteristic expert model for extracting the first feature type features;
taking the currently selected image characteristic expert model as a teacher model, taking the vision large model as a student model, and taking the currently selected sample image as input, training the vision large model;
And selecting a sample image with the characteristics of other characteristic types, and extracting an image characteristic expert model of the other characteristic types to train the vision large model until the training of the preset various characteristic types is completed.
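As a rough illustration of this teacher-student training scheme, the following sketch (PyTorch-style; the student/teacher interfaces, the MSE distillation loss and the optimizer settings are assumptions of this illustration, not details specified by the application) distills one image characteristic expert model into the vision large model at a time:

import torch
import torch.nn as nn

def distill_one_feature_type(student, teacher, loader, epochs=1, lr=1e-4):
    """Train the vision large model (student) to mimic one image characteristic
    expert model (teacher) on sample images carrying that feature type."""
    teacher.eval()                                   # the expert model stays frozen
    optimizer = torch.optim.Adam(student.parameters(), lr=lr)
    mse = nn.MSELoss()
    for _ in range(epochs):
        for images in loader:                        # sample images with this feature type
            with torch.no_grad():
                target = teacher(images)             # expert features serve as soft targets
            loss = mse(student(images), target)      # pull student features toward the expert's
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

# Repeated per preset feature type until all types are covered, e.g.:
# for teacher, loader in experts:
#     distill_one_feature_type(student, teacher, loader)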
In one embodiment of the present application, the determining the image data comparison result according to the first feature set, the second feature set, and the characteristic hint message includes:
encoding the characteristic prompt message to obtain a first encoding characteristic;
splicing the first coding feature and the first feature set to obtain a first splicing feature, and splicing the first coding feature and the second feature set to obtain a second splicing feature;
Performing feature screening on the first spliced features through a feature screening model to obtain first screening features;
And calculating the similarity between the first screening feature and the second screening feature to obtain the image data comparison result.
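A minimal sketch of this branch is given below (PyTorch-style). It assumes the characteristic prompt message is a multi-hot vector over feature types, the feature screening model is a small multilayer perceptron, and the comparison is a cosine similarity; none of these specifics are fixed by the application.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptGuidedScreening(nn.Module):
    """Encode the characteristic prompt message, splice it with a feature set,
    and screen out the features relevant to the target feature types."""
    def __init__(self, num_types, prompt_dim, feat_dim, out_dim):
        super().__init__()
        self.prompt_encoder = nn.Linear(num_types, prompt_dim)   # first encoding feature
        self.screening = nn.Sequential(                          # feature screening model
            nn.Linear(prompt_dim + feat_dim, out_dim), nn.ReLU(), nn.Linear(out_dim, out_dim))

    def forward(self, prompt, feature_set):
        code = self.prompt_encoder(prompt)                       # encode the prompt message
        spliced = torch.cat([code, feature_set], dim=-1)         # splicing feature
        return self.screening(spliced)                           # screening feature

model = PromptGuidedScreening(num_types=7, prompt_dim=16, feat_dim=512, out_dim=256)
prompt = torch.tensor([[1., 1., 0., 0., 0., 0., 0.]])            # e.g. texture + contour
first_set, second_set = torch.randn(1, 512), torch.randn(1, 512)
score = F.cosine_similarity(model(prompt, first_set), model(prompt, second_set))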
In one embodiment of the present application, the determining the image data comparison result according to the first feature set, the second feature set, and the characteristic hint message includes:
Calculating a first attention weight of each type of feature in the first feature set and a second attention weight of each type of feature in the second feature set based on the feature hint message;
Weighting and fusing various types of features in the first feature set according to the first attention weight to obtain a first target feature set;
weighting and fusing all types of features in the second feature set according to the second attention weight to obtain a second target feature set;
And calculating the similarity between the first target feature set and the second target feature set to obtain an image data comparison result.
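The attention-based variant can be sketched as below, again under illustrative assumptions: each feature set is stored as one vector per feature type, and the attention weights come from a softmax over the dot products between an encoded prompt and the per-type features.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptAttentionFusion(nn.Module):
    """Weight each type feature by its relevance to the characteristic prompt, then fuse."""
    def __init__(self, num_types, feat_dim):
        super().__init__()
        self.prompt_proj = nn.Linear(num_types, feat_dim)    # encode the prompt into feature space

    def forward(self, prompt, feature_set):
        # feature_set: (num_types, feat_dim), one vector per feature type
        query = self.prompt_proj(prompt)                     # (feat_dim,)
        weights = F.softmax(feature_set @ query, dim=0)      # attention weight per type
        return (weights.unsqueeze(1) * feature_set).sum(0)   # weighted fusion -> target features

fusion = PromptAttentionFusion(num_types=7, feat_dim=512)
prompt = torch.tensor([1., 1., 0., 0., 0., 0., 0.])          # e.g. texture + contour
first_target = fusion(prompt, torch.randn(7, 512))
second_target = fusion(prompt, torch.randn(7, 512))
score = F.cosine_similarity(first_target, second_target, dim=0)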
In a second aspect, an embodiment of the present application provides an image comparison method, including:
Acquiring a first image and a second image;
Acquiring a characteristic prompt message, wherein the characteristic prompt message represents a target feature type for comparison in a first image and a second image;
based on the characteristic prompt message, extracting the target feature type from the first image to obtain a third feature set, and extracting the target feature type from the second image to obtain a fourth feature set;
and determining an image data comparison result according to the third feature set and the fourth feature set.
In one embodiment of the present application, the extracting the target feature type from the first image to obtain a third feature set and the extracting the target feature type from the second image to obtain a fourth feature set based on the feature hint message includes:
determining each target type feature encoder for extracting the features of the target feature type in a second feature extraction model according to the feature prompt message;
Respectively determining target feature extraction layers connected with the target type feature encoders, and determining the layer number N of the target feature extraction layer with the deepest layer number;
performing feature extraction on the first image by using the first N feature extraction layers of the second feature extraction model to obtain a third image feature;
and extracting the characteristics of the third image characteristic by using the target type characteristic encoder to obtain a third characteristic set.
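A small sketch of this selection step, assuming a hypothetical table mapping each type feature encoder to the index of the feature extraction layer it is attached to (loosely following the example architecture described below); only the first N layers, up to the deepest layer any selected encoder needs, are then evaluated.

# Hypothetical encoder-to-layer wiring, for illustration only.
ENCODER_LAYER = {
    "texture": 1, "contour": 1,
    "pattern": 2, "accessory": 2,
    "component": 3,
    "color": 4, "global": 4,
}

def plan_extraction(target_types):
    """Pick the target type feature encoders and the number N of feature
    extraction layers to run (the deepest layer any selected encoder needs)."""
    encoders = [t for t in target_types if t in ENCODER_LAYER]
    n = max(ENCODER_LAYER[t] for t in encoders)
    return encoders, n

# For an infrared/visible-light pair the prompt may name texture and contour,
# so only the first feature extraction layer has to be evaluated:
encoders, n = plan_extraction(["texture", "contour"])   # -> (['texture', 'contour'], 1)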
In one embodiment of the present application, the extracting the target feature type from the first image to obtain a third feature set and the extracting the target feature type from the second image to obtain a fourth feature set based on the feature hint message includes:
Determining each target type feature encoder for extracting the features of the target feature type in a third feature extraction model according to the feature prompt message;
Respectively determining target feature extraction layers connected with the target type feature encoders, and determining an N-th subnetwork corresponding to the target feature extraction layer with the deepest layer number;
Extracting the characteristics of the first image by utilizing the characteristic extraction layers in the first N subnetworks of the third characteristic extraction model to obtain third image characteristics;
and extracting the characteristics of the third image characteristic by using the target type characteristic encoder to obtain a third characteristic set.
In one embodiment of the application, the second feature extraction model comprises a first feature extraction layer, a second feature extraction layer, a third feature extraction layer and a fourth feature extraction layer, wherein the type of feature encoder comprises a texture feature encoder, a contour feature encoder, a pattern feature encoder, an accessory feature encoder, a component feature encoder, a color feature encoder and a global feature encoder, the first feature extraction layer is connected with the texture feature encoder and the contour feature encoder, the second feature extraction layer is connected with the pattern feature encoder and the accessory feature encoder, the third feature extraction layer is connected with the component feature encoder, and the fourth feature extraction layer is connected with the color feature encoder and the global feature encoder.
In one embodiment of the application, the third feature extraction model comprises a first sub-network for extracting texture features, a second sub-network for extracting contour features, a third sub-network for extracting pattern features, a fourth sub-network for extracting attachment features, a fifth sub-network for extracting component features, a sixth sub-network for extracting color features, and a seventh sub-network for extracting global features.
In one embodiment of the present application, the acquiring the characteristic hint message includes:
Respectively inputting the first image and the second image into an image characteristic classification model, and extracting image type information of the first image and the second image;
And determining the common feature type of the first image and the second image based on the preset corresponding relation between the image type information and the feature type, and obtaining the feature prompt message.
In one embodiment of the present application, the image data comparison result is a target re-recognition result or an image classification result.
In a third aspect, an embodiment of the present application provides an image comparing apparatus, including:
the image acquisition module is used for acquiring a first image and a second image;
the information acquisition module is used for acquiring a characteristic prompt message, wherein the characteristic prompt message represents a target feature type for comparison in a first image and a second image;
the feature extraction module is used for extracting various types of features from the first image to obtain a first feature set, and extracting various types of features from the second image to obtain a second feature set;
And the image comparison module is used for determining an image data comparison result according to the first feature set, the second feature set and the characteristic prompt message.
In one embodiment of the present application, the feature extraction module is specifically configured to:
The method comprises the steps of inputting a first image into a first feature extraction model, carrying out feature extraction on the first image through a plurality of feature extraction layers of the first feature extraction model to obtain a plurality of first image features with different granularities, carrying out feature extraction on each first image feature by utilizing a plurality of granularity feature encoders of the first feature extraction model to obtain a first feature set, wherein one granularity feature encoder is connected with one feature extraction layer, and the granularity feature encoder is used for carrying out feature extraction on the first image features output by the feature extraction layer connected with the granularity feature encoder;
and inputting the second image into the first feature extraction model to obtain a second feature set.
In one embodiment of the present application, the feature extraction module is specifically configured to:
the method comprises the steps of inputting a first image into a second feature extraction model, carrying out feature extraction on the first image through a plurality of feature extraction layers of the second feature extraction model to obtain a plurality of different types of first image features, and carrying out feature extraction on each first image feature by utilizing a plurality of type feature encoders of the second feature extraction model to obtain a first feature set, wherein one type feature encoder is connected with one feature extraction layer, one feature extraction layer is connected with one or more type feature encoders, and the type feature encoders are used for carrying out feature extraction on the first image features output by the feature extraction layers connected with the type feature encoders;
and inputting the second image into the second feature extraction model to obtain a second feature set.
In one embodiment of the present application, the feature extraction module is specifically configured to:
The method comprises the steps of inputting a first image into a third characteristic extraction model, carrying out characteristic extraction on the first image through a characteristic extraction layer in a sub-network of the third characteristic extraction model to obtain a plurality of different types of first image characteristics, and carrying out characteristic extraction on each first image characteristic by utilizing a plurality of type characteristic encoders in the sub-network to obtain a first characteristic set, wherein each sub-network comprises a characteristic extractor and a type characteristic encoder which are connected, and the type characteristic encoder is used for carrying out characteristic extraction on the first image characteristics output by the characteristic extraction layer connected with the sub-network;
and inputting the second image into the third feature extraction model to obtain a second feature set.
In one embodiment of the present application, the feature extraction module is specifically configured to:
Respectively inputting the first image and the second image into a visual large model, and extracting various types of features from the first image and the second image to obtain a first feature set and a second feature set;
the visual large model is obtained through training in the following mode:
Selecting a sample image with first feature type features and an image characteristic expert model for extracting the first feature type features;
taking the currently selected image characteristic expert model as a teacher model, taking the vision large model as a student model, and taking the currently selected sample image as input, training the vision large model;
And selecting a sample image with the characteristics of other characteristic types, and extracting an image characteristic expert model of the other characteristic types to train the vision large model until the training of the preset various characteristic types is completed.
In one embodiment of the present application, the image comparison module is specifically configured to:
encoding the characteristic prompt message to obtain a first encoding characteristic;
splicing the first coding feature and the first feature set to obtain a first splicing feature, and splicing the first coding feature and the second feature set to obtain a second splicing feature;
Performing feature screening on the first spliced features through a feature screening model to obtain first screening features;
And calculating the similarity between the first screening feature and the second screening feature to obtain the image data comparison result.
In one embodiment of the present application, the image comparison module is specifically configured to:
Calculating a first attention weight of each type of feature in the first feature set and a second attention weight of each type of feature in the second feature set based on the feature hint message;
Weighting and fusing various types of features in the first feature set according to the first attention weight to obtain a first target feature set;
weighting and fusing all types of features in the second feature set according to the second attention weight to obtain a second target feature set;
And calculating the similarity between the first target feature set and the second target feature set to obtain an image data comparison result.
In a fourth aspect, an embodiment of the present application provides an image comparing apparatus, including:
the image acquisition module is used for acquiring a first image and a second image;
the information acquisition module is used for acquiring a characteristic prompt message, wherein the characteristic prompt message represents a target feature type for comparison in a first image and a second image;
The feature extraction module is used for extracting the target feature type from the first image to obtain a third feature set and extracting the target feature type from the second image to obtain a fourth feature set based on the feature prompt message;
And the image comparison module is used for determining an image data comparison result according to the third characteristic set and the fourth characteristic set.
In one embodiment of the present application, the feature extraction module is specifically configured to:
determining each target type feature encoder for extracting the features of the target feature type in a second feature extraction model according to the feature prompt message;
Respectively determining target feature extraction layers connected with the target type feature encoders, and determining the layer number N of the target feature extraction layer with the deepest layer number;
performing feature extraction on the first image by using the first N feature extraction layers of the second feature extraction model to obtain a third image feature;
and extracting the characteristics of the third image characteristic by using the target type characteristic encoder to obtain a third characteristic set.
In one embodiment of the present application, the feature extraction module is specifically configured to:
Determining each target type feature encoder for extracting the features of the target feature type in a third feature extraction model according to the feature prompt message;
Respectively determining target feature extraction layers connected with the target type feature encoders, and determining an N-th subnetwork corresponding to the target feature extraction layer with the deepest layer number;
Extracting the characteristics of the first image by utilizing the characteristic extraction layers in the first N subnetworks of the third characteristic extraction model to obtain third image characteristics;
and extracting the characteristics of the third image characteristic by using the target type characteristic encoder to obtain a third characteristic set.
In one embodiment of the application, the second feature extraction model comprises a first feature extraction layer, a second feature extraction layer, a third feature extraction layer and a fourth feature extraction layer, wherein the type of feature encoder comprises a texture feature encoder, a contour feature encoder, a pattern feature encoder, an accessory feature encoder, a component feature encoder, a color feature encoder and a global feature encoder, the first feature extraction layer is connected with the texture feature encoder and the contour feature encoder, the second feature extraction layer is connected with the pattern feature encoder and the accessory feature encoder, the third feature extraction layer is connected with the component feature encoder, and the fourth feature extraction layer is connected with the color feature encoder and the global feature encoder.
In one embodiment of the application, the third feature extraction model comprises a first sub-network for extracting texture features, a second sub-network for extracting contour features, a third sub-network for extracting pattern features, a fourth sub-network for extracting attachment features, a fifth sub-network for extracting component features, a sixth sub-network for extracting color features, and a seventh sub-network for extracting global features.
In one embodiment of the present application, the information acquisition module is specifically configured to:
Respectively inputting the first image and the second image into an image characteristic classification model, and extracting image type information of the first image and the second image;
And determining the common feature type of the first image and the second image based on the preset corresponding relation between the image type information and the feature type, and obtaining the feature prompt message.
In one embodiment of the present application, the image data comparison result is a target re-recognition result or an image classification result.
The embodiment of the application also provides electronic equipment, which comprises:
a memory for storing a computer program;
and the processor is used for realizing any one of the image comparison methods when executing the programs stored in the memory.
The embodiment of the application also provides a computer readable storage medium, wherein a computer program is stored in the computer readable storage medium, and the computer program realizes any one of the image comparison methods when being executed by a processor.
Embodiments of the present application also provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform any of the image comparison methods described above.
The embodiment of the application has the beneficial effects that:
According to the image comparison method provided by the embodiment of the application, a first image, a second image and a characteristic prompt message indicating the target feature type used for comparison between the first image and the second image are acquired; multiple types of features are extracted from the first image to obtain a first feature set and from the second image to obtain a second feature set; and the image data comparison result is determined according to the first feature set, the second feature set and the characteristic prompt message. During comparison of the image data, suitable features are screened for comparison based on the characteristic prompt message, which avoids training a separate model for each type of image and manually matching models to the images to be processed, reduces the complexity of image processing and improves its efficiency.
Of course, it is not necessary for any one product or method of practicing the application to achieve all of the advantages set forth above at the same time.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the application, and that other drawings may be obtained by those skilled in the art according to these drawings.
Fig. 1-1 is a schematic flow chart of a first image comparison method according to an embodiment of the present application;
FIGS. 1-2 are flowcharts illustrating a first image comparison method according to an embodiment of the present application;
FIG. 2-1 shows a possible implementation of step S102 according to an embodiment of the present application;
fig. 2-2 are flowcharts illustrating a second image comparison method according to an embodiment of the present application;
FIG. 3-1 shows a first possible implementation manner of step S103 according to an embodiment of the present application;
Fig. 3-2 are flowcharts illustrating a third image comparison method according to an embodiment of the present application;
FIG. 4-1 shows a second possible implementation manner of step S103 according to an embodiment of the present application;
Fig. 4-2 is a flowchart illustrating a fourth image comparison method according to an embodiment of the present application;
FIG. 5-1 shows a third possible implementation manner of step S103 according to an embodiment of the present application;
FIG. 5-2 is a flowchart illustrating a fifth image comparison method according to an embodiment of the present application;
FIG. 6 is a flowchart illustrating a sixth image comparison method according to an embodiment of the present application;
FIG. 7-1 shows a first possible implementation of step S104 according to an embodiment of the present application;
FIG. 7-2 is a flowchart illustrating a seventh image comparison method according to an embodiment of the present application;
FIG. 8-1 is a second possible implementation of step S104 provided by an embodiment of the present application;
FIG. 8-2 is a flowchart illustrating an eighth image comparison method according to an embodiment of the present application;
FIG. 9 is a flow chart of a second image comparison method according to an embodiment of the present application;
FIG. 10-1 shows a first possible implementation of step S903 provided by an embodiment of the present application;
FIG. 10-2 is a flowchart illustrating a ninth image comparison method according to an embodiment of the present application;
FIG. 11 is a second possible implementation of step S903 provided by an embodiment of the present application;
FIG. 12-1 is a flowchart illustrating a tenth image comparison method according to an embodiment of the present application;
FIG. 12-2 is a flowchart illustrating an eleventh image comparison method according to an embodiment of the present application;
FIG. 12-3 is a flowchart illustrating a twelfth image comparison method according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of a first image comparing apparatus according to an embodiment of the present application;
Fig. 14 is a schematic structural diagram of a second image comparing apparatus according to an embodiment of the present application;
fig. 15 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following clearly and completely describes the embodiments of the present application with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by a person skilled in the art fall within the scope of protection of the present application.
In the related art, when there are too many problem types in the same application scene, the number of required models becomes large, the training of the models becomes overly complex, manual analysis and judgment are needed to match the processing models with the images to be processed, and the resulting errors in image processing are large, so the accuracy of image processing is low. In order to solve at least one of the above problems, an embodiment of the present application provides an image comparison method, which may be performed by an electronic device, in particular a personal terminal, a server, or the like.
Hereinafter, an image comparison method provided by the embodiment of the present application will be described in detail. Referring to fig. 1-1, fig. 1-1 provides a flowchart of a first image comparison method according to an embodiment of the present application, including:
Step S101, a first image and a second image are acquired.
The first image and the second image are the image data to be compared. Specifically, each of them may be a marked or unmarked image. For example, in a target re-recognition scene, the first image is a query image of the target to be recognized and the second image is an image in a database of marked targets, and the comparison determines whether the first image contains a target consistent with the marked target in the second image. In an image classification scene, the first image is an unmarked image to be classified and the second image is a registered image with a known class label, and the comparison determines whether the classes of the first image and the second image are consistent. In an image similarity matching scene, the first image and the second image may both be unmarked, and whether they match is determined by comparing the similarity between the two images.
In one example, when the second image is a marked image and is expected to provide a sample comparison for an unmarked first image, the second image may be a plurality of images in a pre-processed image database, and the images are sequentially compared with the first image until a comparison result is determined.
Step S102, a characteristic prompt message is obtained, wherein the characteristic prompt message represents the target feature type used for comparison in the first image and the second image.
An image contains multiple types of features. For example, in a pedestrian re-recognition scene the image has the feature types illustrated in table 1, and in an image classification scene it has the feature types illustrated in table 2. Different types of features are easier or harder to extract from different images, so extracting different types of features as the processing basis for different images can improve both the efficiency and the accuracy of image processing. For example, if color features are difficult to extract from an infrared image but texture features are easy to extract, image processing should be based on the texture features extracted from the infrared image; and if the image contains an incomplete target, image processing should be based on the target component features extracted from the image.
TABLE 1
TABLE 2
When the first image and the second image are images of different types, the type features that are clear, easy to extract and common to both images are extracted as the basis for image comparison. For example, in a pedestrian re-recognition scene, when the first image is an infrared image and the second image is a visible-light image, the suitable features are those, such as contour and texture, that are unrelated to color and co-occur in the two kinds of image space; when the first image and the second image are both visible-light images of a complete human body, the suitable features are the complete features (all types of features); when the first image shows an incomplete human body and the second image shows a complete human body in visible light, the suitable features are human component features, pattern features and the like; and when the first image and the second image are front-view and back-view visible-light images of a complete human body, the suitable features are those, such as global features, human contours and clothing colors, that co-occur under different postures and viewing angles.
And acquiring characteristic prompt messages aiming at the first image and the second image, wherein the characteristic prompt messages can represent common characteristic types in the first image and the second image, namely target characteristic types for comparison of the first image and the second image.
In one example, the acquisition of the characteristic prompt message may be implemented by a preconfigured algorithm model, or may be manually selected according to the characteristic of the image and then input.
Step S103, extracting multiple types of features from the first image to obtain a first feature set, and extracting multiple types of features from the second image to obtain a second feature set.
Step S104, determining an image data comparison result according to the first feature set, the second feature set and the characteristic prompt message.
Extracting multiple types of features from the first image may mean extracting all the features in the first image, i.e. the complete features; it may mean extracting only the features of the target feature type according to the characteristic prompt message; or, after the complete features of the first image have been extracted, unclear and unsuitable features may be screened out manually. The first feature set is obtained as the final result.
Likewise, extracting multiple types of features from the second image may mean extracting all the features in the second image, i.e. the complete features, extracting only the features of the target feature type according to the characteristic prompt message, or manually screening out unclear and unsuitable features after the complete features of the second image have been extracted, finally obtaining the second feature set. When the second image is a marked image intended to provide a sample comparison for the unmarked first image, the second image may be one of a plurality of pre-processed images in an image database whose complete features are already stored; in that case the characteristic prompt message does not need to be considered, and the complete features are taken as the second feature set.
And comparing the first feature set with the second feature set, so as to determine an image data comparison result. By way of example, the comparison of feature sets may be achieved by means of feature comparison models, feature comparison algorithms, and the like.
In one embodiment of the application, in an application scene of target re-recognition/pedestrian re-recognition, an image data comparison result is a target re-recognition result, in an image classification application scene, an image data comparison result is an image classification result, and in an image similarity matching application scene, the image data comparison result is an image matching result. The image comparison process is illustrated in fig. 1-2.
As can be seen from the above, the image comparison method provided by the embodiment of the present application acquires the first image, the second image and a characteristic prompt message indicating the target feature type used for comparison between them, extracts multiple types of features from the first image to obtain a first feature set and from the second image to obtain a second feature set, and determines the image data comparison result according to the first feature set, the second feature set and the characteristic prompt message. During comparison of the image data, suitable features are screened for comparison based on the characteristic prompt message, which avoids training a separate model for each type of image and manually matching models to the images to be processed, reduces the complexity of image processing and improves its efficiency.
In one embodiment of the present application, as shown in fig. 2-1, the step S102 of obtaining the characteristic prompt message includes:
Step S201, respectively inputting the first image and the second image into an image characteristic classification model, and extracting image type information of the first image and the second image;
Step S202, determining a common feature type of the first image and the second image based on a preset corresponding relation between the image type information and the feature type, and obtaining the feature prompt message.
The image characteristic classification model is used for identifying the type of the image and obtaining image type information. The image type information can represent characteristics of the image, for example, whether the image is an infrared/visible light image, whether there is a salient pattern, an accessory, or the like.
The image characteristic classification model can be obtained by training on a plurality of marked sample images: a marked sample image is input into the model, the model outputs image type information for the sample image, the output is compared with the marked image type information to compute the model loss, and the model is adjusted with this loss until it converges, at which point training is complete.
The first image and the second image are respectively input into the image characteristic classification model to extract their image type information, and the common feature type of the first image and the second image, i.e. the target feature type suitable for extraction from both images, is determined based on a preset correspondence between image type information and feature types, giving the characteristic prompt message. Specifically, the correspondence between image type information and feature types is preset: for each image type, the feature types suitable for extraction are determined and a correspondence is established. For example, a correspondence may be established between the infrared image type and texture features and contour features, between the incomplete-target image type and component features, and between the visible-light-with-salient-pattern image type and salient pattern features.
Exemplary image type information includes at least one of subject target integrity, subject target pose, subject target color, image perspective, image light type, image sharpness, illumination information, image resolution, image style, salient patterns in images, and accessory categories.
An exemplary image comparison flow is shown, for example, in fig. 2-2.
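A hedged sketch of steps S201 and S202 follows; the classifier output is represented here by simple tags, and the correspondence table is an illustrative assumption following the examples above.

# Illustrative correspondence between image type information and feature types.
TYPE_TO_FEATURES = {
    "infrared": {"texture", "contour"},
    "visible": {"texture", "contour", "color", "global", "pattern", "accessory", "component"},
    "target_incomplete": {"component", "pattern"},
    "salient_pattern": {"pattern"},
}

def characteristic_prompt(first_tags, second_tags):
    """Map each image's type information to feature types and keep the common ones."""
    def feature_types(tags):
        feats = set()
        for tag in tags:
            feats |= TYPE_TO_FEATURES.get(tag, set())
        return feats
    return feature_types(first_tags) & feature_types(second_tags)

# e.g. an infrared query compared against a visible-light gallery image:
prompt = characteristic_prompt(["infrared"], ["visible"])   # -> {'texture', 'contour'}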
As can be seen from the above, the image comparison method provided by the embodiment of the application determines the common feature type of the first image and the second image through the image characteristic classification model, thereby obtaining the characteristic prompt message and realizing the analysis of the image characteristics. With the characteristic guidance this provides for the comparison of the first image and the second image, only the features of the target feature type need to be selected from the multiple types of features for comparison. This avoids the errors easily caused by comparing unsuitable types of features as well as the manual intervention required when selecting among multiple models, improves the accuracy of image processing, allows the processing of multiple types of images to be handled at the same time, and further improves the efficiency of image processing.
In one embodiment of the present application, as shown in fig. 3-1, the step S103 of extracting multiple types of features from the first image to obtain a first feature set, and extracting multiple types of features from the second image to obtain a second feature set includes:
step S301, inputting the first image into a first feature extraction model, carrying out feature extraction on the first image through a plurality of feature extraction layers of the first feature extraction model to obtain a plurality of first image features with different granularities, and carrying out feature extraction on each first image feature by utilizing a plurality of granularity feature encoders of the first feature extraction model to obtain a first feature set.
The granularity characteristic encoder is connected with a characteristic extraction layer and is used for extracting the characteristics of the first image characteristics output by the characteristic extraction layer connected with the granularity characteristic encoder;
step S302, inputting the second image into the first feature extraction model to obtain a second feature set.
The first feature extraction model can extract feature maps of different granularities from feature maps at different depths, where different granularities mean that the image range covered and the resolution of the processing differ. Specifically, different types of feature extraction require different extraction ranges and resolutions; for example, global features require a larger extraction range but have lower resolution requirements, while texture features need only a small extraction range but require a higher resolution.
The first image is input into the first feature extraction model, and the plural feature extraction layers of the model process the first image at different depths, yielding first image features of several granularities at suitable image depths. Each first image feature is then processed by the granularity feature encoder connected to its layer, giving multiple types of features of the first image and hence the first feature set, i.e. the complete features.
The second image is input into the first feature extraction model in the same way: the plural feature extraction layers yield second image features of several granularities, and the plural granularity feature encoders extract multiple types of features, giving the second feature set, i.e. the complete features.
By way of example, after the granularity feature encoders have extracted the multiple types of features, the obtained features may simply be spliced into the complete features, or feature transformation networks may be applied to them, performing transformations such as discretization, data smoothing, normalization (standardization) and numericalization, with the complete features obtained as required from the actually extracted features.
Illustratively, the first feature set and the second feature set may each be a high-dimensional feature vector.
In one example, the formalized representation of the first feature extraction model extracting the complete feature may be as follows:
f_1 = L_1(x)
f_2 = L_2(f_1)
...
f_M = L_M(f_(M-1))
F = [E_1(f_1), E_2(f_2), ..., E_M(f_M)]
wherein F is the complete feature set, x is the input image (the first image or the second image), L_i is the feature extraction layer of depth N_i, f_i is the feature map obtained after the input image x is processed by L_i, E_1 is the granularity-1 encoder (the meaning of the other symbols follows by analogy), and M is the number of multi-granularity encoders.
The formalized representation of a feature transformation based on the individual features is, for example:
F = T_1([E_1(f_1), ..., E_M(f_M)])
or alternatively
F = [T_1(E_1(f_1)), ..., T_M(E_M(f_M))]
or alternatively
F = T_2(T_1([E_1(f_1), ..., E_M(f_M)]))
or other combined transformations, which are not enumerated here one by one, wherein T_1 and T_2 are small feature transformation networks.
An exemplary image comparison flow for processing the first image and the second image using the first feature extraction model is shown, for example, in fig. 3-2, wherein the first feature extraction layer N1 Layers, the second feature extraction layer N2 Layers and the third feature extraction layer N3 Layers represent three feature extraction layers, respectively.
As can be seen from the above, in the image comparison method provided by the embodiment of the application, the features of different granularities of the first image and the second image are extracted to obtain complete features through the plurality of feature extraction layers and granularity feature encoders in the first feature extraction model, so that missing errors of feature comparison are avoided, and the accuracy of image processing is improved.
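The multi-granularity design can be sketched as follows (PyTorch-style; the convolutional stages, pooling encoders and dimensions are assumptions of this sketch, not the application's prescribed structure):

import torch
import torch.nn as nn

class MultiGranularityExtractor(nn.Module):
    """Sketch of the first feature extraction model: stacked feature extraction
    layers yield feature maps at several granularities, and one granularity
    feature encoder per layer turns its feature map into a feature vector."""
    def __init__(self, channels=(32, 64, 128), feat_dim=128):
        super().__init__()
        layers, encoders, in_ch = [], [], 3
        for ch in channels:                       # N1/N2/N3 feature extraction layers
            layers.append(nn.Sequential(
                nn.Conv2d(in_ch, ch, 3, stride=2, padding=1), nn.ReLU()))
            encoders.append(nn.Sequential(        # granularity feature encoder
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(ch, feat_dim)))
            in_ch = ch
        self.layers = nn.ModuleList(layers)
        self.encoders = nn.ModuleList(encoders)

    def forward(self, x):
        feats = []
        for layer, encoder in zip(self.layers, self.encoders):
            x = layer(x)                          # feature map at the next depth
            feats.append(encoder(x))              # per-granularity feature
        return torch.cat(feats, dim=-1)           # spliced into the complete feature set

extractor = MultiGranularityExtractor()
first_set = extractor(torch.randn(1, 3, 224, 224))    # first feature set
second_set = extractor(torch.randn(1, 3, 224, 224))   # second feature set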
In one embodiment of the present application, as shown in fig. 4-1, the step S103 of extracting multiple types of features from the first image to obtain a first feature set, and extracting multiple types of features from the second image to obtain a second feature set includes:
Step S401, inputting the first image into a second feature extraction model, carrying out feature extraction on the first image through a plurality of feature extraction layers of the second feature extraction model to obtain a plurality of different types of first image features, and carrying out feature extraction on each first image feature by utilizing a plurality of types of feature encoders of the second feature extraction model to obtain a first feature set.
One type of feature encoder is connected with one feature extraction layer, one feature extraction layer is connected with one or more types of feature encoders, and the type of feature encoder is used for carrying out feature extraction on first image features output by the feature extraction layer connected with the type of feature encoder;
Step S402, inputting the second image into the second feature extraction model to obtain a second feature set.
Each of the plural feature extraction layers of the second feature extraction model is used to extract one type of feature. The first image is input into the second feature extraction model, the plural feature extraction layers extract the different types of features of the first image, i.e. the first image features, and the type feature encoders connected to those layers then process the first image features to obtain the multiple types of features of the first image, giving the first feature set, i.e. the complete features.
The second feature set is obtained in a similar way: the second image is input into the second feature extraction model, the plural feature extraction layers yield the different types of second image features, and the plural type feature encoders extract the multiple types of features, giving the second feature set, i.e. the complete features.
By way of example, after the type feature encoders have extracted the multiple types of features, the obtained features may simply be spliced into the complete features, or small feature transformation networks may be applied to them, performing transformations such as discretization, data smoothing, normalization (standardization) and numericalization, with the complete features obtained as required from the actually extracted features.
Illustratively, the first feature set and the second feature set may each be a high-dimensional feature vector.
In one example, the formalized representation of the second feature extraction model extracting the complete features may be as follows:

M_i = N_i(I), i = 1, 2, ...;

F_texture = E_texture(M_1);

F_contour = E_contour(M_1);

F_pattern = E_pattern(M_2);

F_attachment = E_attachment(M_2);

wherein F_texture is the texture feature, F_contour is the contour feature, F_pattern is the pattern feature, F_attachment is the attachment feature, I is the input image, N_i is the sub-network of depth d_i, M_i is the feature map obtained after the input image I has been processed by N_i, E_texture is the texture feature encoder, E_contour is the contour feature encoder, E_pattern is the pattern feature encoder, E_attachment is the attachment feature encoder, the meaning of the other symbols follows by analogy, and K is the number of the multiple types of feature encoders.
The formalized representation of the feature transformation based on the individual features is as follows:

F_complete = Concat(F_texture, F_contour, F_pattern, F_attachment, ...);

or alternatively

F_complete = T_1(Concat(F_texture, F_contour, F_pattern, F_attachment, ...));

or alternatively

F_complete = Concat(T_1(F_texture), T_2(F_contour), ...);

or other combination transformations, which are not explicitly recited herein; wherein T_1 and T_2 are feature transformation small networks.
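A minimal sketch of the three assembly options (simple splicing, splicing followed by one transformation small network, and per-feature transformation followed by splicing) might look as follows; the feature dimensions and the linear stand-ins for the transformation small networks are assumptions.

```python
import torch
import torch.nn as nn

# Illustrative ways to assemble the complete feature from the per-type
# features; the small transformation networks T are placeholders (assumption).
feats = {name: torch.randn(1, 32) for name in
         ("texture", "contour", "pattern", "attachment")}
names = sorted(feats)

# (1) Simple splicing.
complete_a = torch.cat([feats[n] for n in names], dim=1)

# (2) Splice first, then apply one feature transformation small network.
T_joint = nn.Sequential(nn.Linear(32 * len(names), 64), nn.ReLU())
complete_b = T_joint(complete_a)

# (3) Transform each feature with its own small network, then splice.
T_each = nn.ModuleDict({n: nn.Linear(32, 16) for n in names})
complete_c = torch.cat([T_each[n](feats[n]) for n in names], dim=1)

print(complete_a.shape, complete_b.shape, complete_c.shape)
```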
An exemplary image comparison flow for processing the first image and the second image using the second feature extraction model is shown, for example, in fig. 4-2, wherein the first feature extraction layer N1 Layers, the second feature extraction layer N2 Layers, the third feature extraction layer N3 Layers, and the fourth feature extraction layer N4 Layers represent four feature extraction layers, respectively.
As can be seen from the above, in the image comparison method provided by the embodiment of the present application, the plurality of feature extraction layers and the type feature encoder in the second feature extraction model extract the different types of features of the first image and the second image to obtain complete features, so that missing errors in feature comparison are avoided, and accuracy of image processing is improved.
In one embodiment of the present application, as shown in fig. 5-1, the step S103 of extracting multiple types of features from the first image to obtain a first feature set, and extracting multiple types of features from the second image to obtain a second feature set includes:
Step S501, inputting a first image into a third feature extraction model, carrying out feature extraction on the first image through a feature extraction layer in a sub-network of the third feature extraction model to obtain a plurality of different types of first image features, and carrying out feature extraction on each first image feature by utilizing a plurality of types of feature encoders in the sub-network to obtain a first feature set.
Each sub-network comprises a feature extractor and a type feature encoder which are connected, wherein the type feature encoder is used for extracting the features of the first image output by the feature extraction layer connected with the sub-network;
step S502, inputting the second image into the third feature extraction model to obtain a second feature set.
Each of the multiple sub-networks of the third feature extraction model is used for extracting one type of feature. The first image is input into the third feature extraction model, the feature extraction layers in the multiple sub-networks of the third feature extraction model respectively extract multiple types of features of the first image, namely the first image features, and the type feature encoders connected with the feature extraction layers in the sub-networks then extract features from the first image features, so as to obtain multiple types of features of the first image, namely the first feature set (the complete features).
The second feature set is obtained in a similar manner: the second image is input into the third feature extraction model, multiple types of second image features are obtained by the feature extraction layers in the multiple sub-networks, and features are then extracted by the type feature encoders in the multiple sub-networks, so as to obtain the second feature set, namely the complete features.
It should be noted that, after the multiple types of feature encoders extract the multiple types of features, the obtained features may be simply spliced to obtain the complete features, or the features may be transformed by various types of feature transformation small networks, such as discretization, data smoothing, normalization (standardization), numerical conversion and the like, and the complete features are obtained according to the requirements on the actually obtained features.
Illustratively, the first feature set and the second feature set may each be a high-dimensional feature vector.
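For illustration only, the parallel sub-network arrangement described above might be sketched as follows, assuming each sub-network is a small convolutional feature extraction layer followed by a linear type feature encoder; all shapes, names, and dimensions are assumptions made for the example.

```python
import torch
import torch.nn as nn

class SubNetwork(nn.Module):
    """One sub-network: its own feature extraction layer plus a type feature
    encoder (a sketch; layer shapes are assumptions)."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.extraction_layer = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU())
        self.type_encoder = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, feat_dim))

    def forward(self, image):
        return self.type_encoder(self.extraction_layer(image))

class ThirdFeatureExtractionModel(nn.Module):
    def __init__(self, feature_types=("texture", "contour", "pattern",
                                      "attachment", "component", "color",
                                      "global")):
        super().__init__()
        self.subnets = nn.ModuleDict({t: SubNetwork() for t in feature_types})

    def forward(self, image):
        # Each sub-network independently extracts one type of feature;
        # splicing the results yields the complete feature set.
        return torch.cat([self.subnets[t](image) for t in self.subnets],
                         dim=1)

model = ThirdFeatureExtractionModel()
print(model(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 224])
```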
In one example, the formalized representation of the third feature extraction model extracting the complete features may be as follows:

F_texture = E_texture(I);

F_contour = E_contour(I);

F_pattern = E_pattern(I);

F_attachment = E_attachment(I);

F_component = E_component(I);

F_color = E_color(I);

F_global = E_global(I);

wherein F_texture is the texture feature, F_contour is the contour feature, F_pattern is the pattern feature, F_attachment is the attachment feature, I is the input image, E_texture is the texture feature encoder, E_contour is the contour feature encoder, E_pattern is the pattern feature encoder, E_attachment is the attachment feature encoder, the meaning of the other symbols follows by analogy, and K is the number of the multiple types of feature encoders.
The formalized representation of the feature transformation based on the individual features is as follows:

F_complete = Concat(F_texture, F_contour, F_pattern, F_attachment, ...);

or alternatively

F_complete = T_1(Concat(F_texture, F_contour, F_pattern, F_attachment, ...));

or alternatively

F_complete = Concat(T_1(F_texture), T_2(F_contour), ...);

or other combination transformations, which are not explicitly recited herein; wherein T_1 and T_2 are feature transformation small networks.
An exemplary image comparison flow for processing the first image and the second image using the third feature extraction model is shown, for example, in fig. 5-2.
As can be seen from the above, according to the image comparison method provided by the embodiment of the application, the feature extraction layers and the type feature encoders in the multiple sub-networks in the third feature extraction model extract the features of different types of the first image and the second image to obtain complete features, so that missing errors of feature comparison are avoided, and the accuracy of image processing is improved.
In one embodiment of the present application, the step S103 performs extraction of multiple types of features on the first image to obtain a first feature set, and performs extraction of multiple types of features on the second image to obtain a second feature set, where the steps include:
Respectively inputting the first image and the second image into a visual large model, and extracting various types of features from the first image and the second image to obtain a first feature set and a second feature set;
the visual large model is obtained through training in the following mode:
A sample image having features of a first feature type is selected, together with an image characteristic expert model for extracting features of the first feature type. The first feature type is selected according to actual requirements.
The currently selected image characteristic expert model is taken as a teacher model, the visual large model is taken as a student model, and the currently selected sample image is taken as input to train the visual large model.
Then a sample image having features of another feature type and an image characteristic expert model for extracting features of that feature type are selected to continue training the visual large model, until training for all the preset feature types is completed.
The visual large model may be a visual model obtained through image-text cross-modal contrastive learning training (for an image classification application scene, for example, the EVA series and EVA02 series visual models), or may be obtained through a ReID (pedestrian re-identification) feature decoupling training mode (for a pedestrian re-identification application scene, for example, HumanBench, GreyReID, etc.). The sample image training set may be composed of images having various image characteristics, image characteristic guidance may be added during training, and expert models for the respective image characteristics (for example, a non-color feature extraction model obtained by training with gray-scale image samples, a feature extraction model with color information obtained by training with RGB image samples, and a low-resolution feature extraction model obtained by training with low-resolution samples) may be used for guided training, so that the visual large model has the capability of extracting the corresponding features.
And respectively inputting the first image and the second image into the trained visual large model, and extracting various types of features from the first image and the second image to obtain a first feature set and a second feature set, namely complete features.
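A possible sketch of the described teacher-student training loop is given below; the mean squared error imitation loss, the optimizer, and the toy linear stand-ins for the expert models and the visual large model are assumptions, since the embodiment does not fix these details.

```python
import torch
import torch.nn as nn

def distill_visual_large_model(student, experts, sample_loaders,
                               epochs=1, lr=1e-4):
    """Sketch of the described training loop: for each preset feature type,
    the matching image characteristic expert model acts as teacher and the
    visual large model as student. Loss choice (MSE) is an assumption."""
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    mse = nn.MSELoss()
    for feature_type, teacher in experts.items():        # one type at a time
        teacher.eval()
        loader = sample_loaders[feature_type]             # type-specific samples
        for _ in range(epochs):
            for images in loader:
                with torch.no_grad():
                    target = teacher(images)               # expert features
                loss = mse(student(images), target)        # imitate the expert
                opt.zero_grad()
                loss.backward()
                opt.step()
    return student

# Toy usage with linear stand-ins for the real models (illustrative only).
student = nn.Linear(3 * 32 * 32, 64)
experts = {"texture": nn.Linear(3 * 32 * 32, 64),
           "color": nn.Linear(3 * 32 * 32, 64)}
loaders = {k: [torch.randn(4, 3 * 32 * 32)] for k in experts}
distill_visual_large_model(student, experts, loaders)
```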
An exemplary image comparison flow for processing the first image and the second image using the visual large model is shown, for example, in fig. 6.
As can be seen from the above, according to the image comparison method provided by the embodiment of the application, the multiple types of features of the first image and the second image are extracted through the trained visual large model to obtain complete features, so that missing errors of feature comparison are avoided, and the accuracy of image processing is improved.
In one embodiment of the present application, as shown in fig. 7-1, the step S104 of determining the image data comparison result according to the first feature set, the second feature set, and the feature hint message includes:
Step S701, coding the characteristic prompt message to obtain a first coding feature;
step S702, stitching the first coding feature and the first feature set to obtain a first stitching feature, and stitching the first coding feature and the second feature set to obtain a second stitching feature;
step S703, performing feature screening on the first spliced features through a feature screening model to obtain first screening features;
step S704, calculating a similarity between the first screening feature and the second screening feature, to obtain the image data comparison result.
The first coding feature represents the target feature type required for comparing the first image with the second image, and provides guidance for feature screening. The first coding feature is spliced with the first feature set and with the second feature set respectively, the obtained first splicing feature and second splicing feature are respectively input into the feature screening model, and the feature screening model performs feature screening on the first splicing feature and the second splicing feature under the guidance of the first coding feature, so as to obtain the first screening feature and the second screening feature of the target feature type.
The feature screening model can be an image-text multi-mode visual feature screening model, and is obtained by training sample feature pairs obtained through pre-screening in a contrast learning mode.
And calculating the similarity between the first screening feature and the second screening feature to obtain an image data comparison result. Specifically, the method can be realized by adopting a similarity function commonly used in search algorithms such as Euclidean distance, cosine similarity, Mahalanobis distance and the like, and can also be realized by adopting a similarity calculation model (the similarity calculation model is obtained by training a sample feature pair marked in advance).
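The splicing-and-screening flow of steps S701 to S704 might be sketched as follows; the linear modules are placeholders standing in for the trained prompt encoder and the image-text multi-modal feature screening model (assumptions), and cosine similarity is chosen as one of the similarity measures mentioned above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, prompt_dim = 224, 32

# Placeholder modules; in the description these would be a trained prompt
# encoder and the image-text multi-modal feature screening model (assumption).
prompt_encoder = nn.Linear(16, prompt_dim)                 # step S701
screening_model = nn.Linear(feat_dim + prompt_dim, 128)    # step S703

prompt = torch.randn(1, 16)             # characteristic prompt message
first_set = torch.randn(1, feat_dim)    # complete features of the first image
second_set = torch.randn(1, feat_dim)   # complete features of the second image

first_coding = prompt_encoder(prompt)                             # first coding feature
first_splice = torch.cat([first_coding, first_set], dim=1)        # step S702
second_splice = torch.cat([first_coding, second_set], dim=1)

first_screened = screening_model(first_splice)     # first screening feature
second_screened = screening_model(second_splice)   # second screening feature

# Step S704: cosine similarity as the image data comparison result.
similarity = F.cosine_similarity(first_screened, second_screened)
print(similarity.item())
```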
An exemplary image comparison flow is shown, for example, in fig. 7-2.
As can be seen from the above, in the image comparison method provided by the embodiment of the present application, the first feature set and the second feature set are respectively spliced with the encoded characteristic prompt message to obtain the features of the target feature type in the first feature set and the second feature set, and the similarity between the two is then calculated, so that the comparison of the first image and the second image is realized. Suitable features can be automatically screened for comparison, which reduces the complexity of image comparison and improves the efficiency of image processing in a low-cost manner.
In one embodiment of the present application, as shown in fig. 8-1, the step S104 of determining the image data comparison result according to the first feature set, the second feature set, and the feature hint message includes:
Step S801, calculating first attention weights of various types of features in the first feature set and second attention weights of various types of features in the second feature set based on the feature prompt message;
step S802, carrying out weighted fusion on various types of features in the first feature set according to the first attention weight to obtain a first target feature set;
step 803, performing weighted fusion on each type of feature in the second feature set according to the second attention weight to obtain a second target feature set;
Step S804, calculating the similarity between the first target feature set and the second target feature set, to obtain an image data comparison result.
An attention weight is calculated for each type of feature based on the characteristic prompt message. For example, a weight coefficient of 2 is given to a target feature type included in the characteristic prompt message, a weight coefficient of 1 is given to a feature type which is not included in the characteristic prompt message but is considered easy to compare according to historical experience, and a weight coefficient of 0 is given to a feature type which is not included in the characteristic prompt message and is considered difficult to compare according to historical experience; in this way the attention weight is calculated for each type of feature in the first feature set and the second feature set. The specific calculation mode can be set according to actual requirements. Illustratively, the attention weights may also be obtained by a pre-trained feature importance estimation model (obtained by training with sample feature images marked with attention weights of multiple features).
And respectively carrying out weighted fusion on various types of features in the first feature set and the second feature set according to the first attention weight and the second attention weight to obtain a first target feature set and a second target feature set which only comprise the type features required by comparison, and calculating the similarity between the first target feature set and the second target feature set to obtain an image data comparison result. Specifically, the method can be realized by adopting a similarity function commonly used in search algorithms such as Euclidean distance, cosine similarity, Mahalanobis distance and the like, and can also be realized by adopting a similarity calculation model (the similarity calculation model is obtained by training a sample feature pair marked in advance).
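A minimal sketch of the attention-weighted fusion of steps S801 to S804 is shown below; the concrete 2/1/0 weighting rule follows the example in the preceding paragraphs, while the feature dimensions and the sets of prompt-indicated and easy-to-compare feature types are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

feature_types = ["texture", "contour", "pattern", "color"]
first_set = {t: torch.randn(32) for t in feature_types}
second_set = {t: torch.randn(32) for t in feature_types}

# Illustrative weighting rule from the description: 2 for target types named
# in the characteristic prompt message, 1 for types judged easy to compare,
# 0 for the rest.
prompt_types = {"texture", "contour"}
easy_types = {"color"}
def attention_weight(t):
    return 2.0 if t in prompt_types else (1.0 if t in easy_types else 0.0)

def weighted_fusion(feature_set):
    # Weighted fusion of the per-type features into one target feature set.
    return torch.cat([attention_weight(t) * feature_set[t]
                      for t in feature_types])

first_target = weighted_fusion(first_set)     # first target feature set
second_target = weighted_fusion(second_set)   # second target feature set
similarity = F.cosine_similarity(first_target, second_target, dim=0)
print(similarity.item())                      # image data comparison result
```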
In the image classification application scene, if the second image is a plurality of registered images with classified labels, the image data comparison result indicates that the classification of the first image is the class label of the second image with the highest similarity.
An exemplary image comparison flow is shown, for example, in fig. 8-2.
As can be seen from the above, in the image comparison method provided by the embodiment of the present application, the attention weights of the first feature set and the second feature set are calculated, weighted fusion is performed on the first feature set and the second feature set respectively to obtain the first target feature set and the second target feature set which only include the required feature types, and the similarity between the two is then calculated, so that the comparison of the first image and the second image is realized. Suitable features can be automatically screened for comparison, which reduces the complexity of image comparison and improves the efficiency of image processing in a low-cost manner.
Referring to fig. 9, the embodiment of the application further provides a flowchart of an image comparison method, which includes:
Step S901, acquiring a first image and a second image;
step S902, acquiring a characteristic prompt message, wherein the characteristic prompt message represents a target feature type for comparison in a first image and a second image;
Step S903, based on the characteristic prompt message, extracting a target feature type from the first image to obtain a third feature set, and extracting a target feature type from the second image to obtain a fourth feature set;
step S904, determining an image data comparison result according to the third feature set and the fourth feature set.
And directly extracting target feature types of the first image and the second image after the characteristic prompt message is obtained, obtaining a third feature set and a fourth feature set, and determining an image data comparison result based on the third feature set and the fourth feature set.
As can be seen from the above, in the image comparison method provided by the embodiment of the present application, the first image, the second image, and the characteristic prompt message indicating the target feature type for comparison in the first image and the second image are acquired; after the characteristic prompt message is obtained, features of the target feature type are extracted from the first image and the second image to obtain the third feature set and the fourth feature set, and the image data comparison result is determined based on the third feature set and the fourth feature set. In the process of comparing the image data, suitable features are screened for comparison based on the characteristic prompt message, which avoids the situations in which multiple models have to be trained for different types of images and a matching model has to be selected among the multiple models for the image to be processed, reduces the complexity of image processing, and improves the efficiency of image processing.
In one embodiment of the present application, as shown in fig. 10-1, the step S903, based on the characteristic prompt message, performs extraction of a target feature type on the first image to obtain a third feature set, and performs extraction of a target feature type on the second image to obtain a fourth feature set, where the step includes:
Step S1001, determining each target type feature encoder for extracting the feature of the target feature type in the second feature extraction model according to the feature prompt message;
step S1002, determining target feature extraction layers connected to each target type feature encoder, and determining the number N of the target feature extraction layers with the deepest number of layers;
step S1003, performing feature extraction on the first image by using the first N feature extraction layers of the second feature extraction model to obtain a third image feature;
And step S1004, carrying out feature extraction on the third image features by using the target type feature encoder to obtain a third feature set.
Each of the feature extraction layers of the second feature extraction model is used for extracting one type of feature. Since the feature encoders of different types have independent structures and take as input feature maps of different depths of the backbone network, once it is determined which types of features need to be used, the backbone network inference process can be stopped early after the depth corresponding to the necessary feature maps has been executed, and only the target type feature encoders of the necessary types need to be executed.
Early stop means that only the necessary feature extraction inference process is executed, and the complete model inference process is stopped in advance. For example, if the characteristic prompt message indicates that only texture features and contour features need to be extracted, the feature extraction process only needs to execute the first feature extraction layer (N1 Layers) together with the texture feature encoder and the contour feature encoder; if the characteristic prompt message indicates that only pattern features and component features need to be extracted, the feature extraction process only needs to execute the first three feature extraction layers (N1 Layers, N2 Layers and N3 Layers) together with the pattern feature encoder and the component feature encoder.
According to the characteristic prompt message, determining each target type feature encoder for extracting the features of the target feature type in the second feature extraction model, determining target feature extraction layers connected with each target type feature encoder, and determining the layer number N of the target feature extraction layer with the deepest layer number.
The first image is input into the second feature extraction model, the first N feature extraction layers of the second feature extraction model are used to extract the target type features of the first image, namely the third image features, and the target type feature encoders connected with the corresponding feature extraction layers then extract features from the third image features, so as to obtain the features of the target feature types of the first image, namely the third feature set.
The fourth feature set is obtained in a similar manner to the third feature set.
For example, when the second image is a marked image, its complete features are extracted in advance and stored in the image database, the fourth feature set may also be the complete features obtained from the image database, and only the feature extraction of the first image that is not marked is performed early stop.
In one embodiment of the application, the second feature extraction model comprises a first feature extraction layer, a second feature extraction layer, a third feature extraction layer and a fourth feature extraction layer, wherein the type of feature encoder comprises a texture feature encoder, a contour feature encoder, a pattern feature encoder, an accessory feature encoder, a component feature encoder, a color feature encoder and a global feature encoder, the first feature extraction layer is connected with the texture feature encoder and the contour feature encoder, the second feature extraction layer is connected with the pattern feature encoder and the accessory feature encoder, the third feature extraction layer is connected with the component feature encoder, and the fourth feature extraction layer is connected with the color feature encoder and the global feature encoder.
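The early-stop planning step can be illustrated with the following sketch, in which the layer index attached to each type feature encoder follows the connections just described (texture and contour on the first layer, pattern and accessory/attachment on the second, component on the third, color and global on the fourth); the function name and data structure are illustrative assumptions.

```python
# Sketch of the "stop as early as possible" strategy for the second feature
# extraction model. ENCODER_TAP maps each type feature encoder to the backbone
# feature extraction layer it is connected to.
ENCODER_TAP = {"texture": 1, "contour": 1, "pattern": 2, "attachment": 2,
               "component": 3, "color": 4, "global": 4}

def plan_early_stop(target_feature_types):
    """Return the target type encoders to run and the deepest backbone layer
    N that must be executed before inference can stop."""
    wanted = set(target_feature_types)
    encoders = [t for t in ENCODER_TAP if t in wanted]
    deepest_layer = max(ENCODER_TAP[t] for t in encoders)
    return encoders, deepest_layer

# Only texture and contour features are requested: stop after layer 1.
print(plan_early_stop(["texture", "contour"]))    # (['texture', 'contour'], 1)
# Pattern and component features requested: layers 1-3 must run.
print(plan_early_stop(["pattern", "component"]))  # (['pattern', 'component'], 3)
```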
An exemplary image comparison flow is shown, for example, in fig. 10-2.
It can be seen from the above that, according to the image comparison method provided by the embodiment of the application, each target type feature encoder for extracting the features of the target feature type in the second feature extraction model is determined based on the feature prompt message, the target feature extraction layers connected with each target type feature encoder are respectively determined, the number of layers N of the target feature extraction layers with the deepest layer is determined, and the first image is subjected to feature extraction by using the first N layer feature extraction layers of the second feature extraction model to obtain the third image feature. Through the strategy of stopping as early as needed in the feature extraction process based on the target type features, the computing resources of feature extraction are saved, the processing cost is reduced, and the efficiency of feature extraction and even image processing is improved.
In one embodiment of the present application, as shown in fig. 11, the step S903, based on the characteristic prompt message, performs extraction of a target feature type on the first image to obtain a third feature set, and performs extraction of a target feature type on the second image to obtain a fourth feature set, where the step includes:
Step S1101, determining each target type feature encoder for extracting the feature of the target feature type in the third feature extraction model according to the feature prompt message;
Step S1102, determining target feature extraction layers connected to each target type feature encoder, and determining an nth sub-network corresponding to the target feature extraction layer with the deepest layer number;
step S1103, performing feature extraction on the first image by using feature extraction layers in the first N sub-networks of the third feature extraction model to obtain a third image feature;
and step S1104, performing feature extraction on the third image feature by using the object type feature encoder to obtain a third feature set.
Each of the multiple sub-networks of the third feature extraction model is used for extracting one type of feature. Since the sub-networks of different types have independent structures and take as input feature maps of different depths of the backbone network, once it is determined which types of features need to be used, the backbone network inference process can be stopped early after the depth corresponding to the necessary feature maps has been executed, and only the target feature encoders in the sub-networks of the necessary types need to be executed.
Early stop means that only the necessary feature extraction inference process is executed, and the complete model inference process is stopped in advance. For example, if the characteristic prompt message indicates that only texture features need to be extracted, the feature extraction process only needs to execute the texture feature encoder in the N1 Layers sub-network; if the characteristic prompt message indicates that only pattern features and component features need to be extracted, the feature extraction process only needs to execute the pattern feature encoder and the component feature encoder in the N1 Layers and N2 Layers sub-networks.
According to the characteristic prompt message, determining each target type feature encoder for extracting the features of the target feature type in the third feature extraction model, determining the sub-network corresponding to the target feature extraction layer connected with each target type feature encoder, and determining the Nth sub-network corresponding to the target feature extraction layer with the deepest layer number.
Inputting the first image into a third feature extraction model, respectively extracting target type features of the first image, namely third image features (processing the first image) by using target feature extraction layers in the first N subnetworks of the third feature extraction model, extracting the third image features through a target type feature encoder to obtain the features of the target feature types of the first image, and obtaining a third feature set.
The fourth feature set is obtained in a similar manner to the third feature set.
For example, when the second image is a marked image, its complete features are extracted in advance and stored in the image database, the fourth feature set may also be the complete features obtained from the image database, and only the feature extraction of the first image that is not marked is performed early stop.
In one embodiment of the application, the third feature extraction model comprises a first sub-network for extracting texture features, a second sub-network for extracting contour features, a third sub-network for extracting pattern features, a fourth sub-network for extracting attachment features, a fifth sub-network for extracting component features, a sixth sub-network for extracting color features, and a seventh sub-network for extracting global features.
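Similarly, for the third feature extraction model the selection of the necessary sub-networks might be sketched as follows; the ordering of the sub-networks follows the enumeration above and is otherwise an illustrative assumption.

```python
# Sketch of early stop for the third feature extraction model: each feature
# type has its own sub-network, so only the sub-networks matching the target
# feature types in the characteristic prompt message need to be executed.
SUB_NETWORKS = ["texture", "contour", "pattern", "attachment",
                "component", "color", "global"]

def select_sub_networks(target_feature_types):
    wanted = set(target_feature_types)
    selected = [s for s in SUB_NETWORKS if s in wanted]
    # N: index of the deepest (latest-ordered) sub-network that must run.
    deepest_index = max(SUB_NETWORKS.index(s) for s in selected) + 1
    return selected, deepest_index

print(select_sub_networks(["texture"]))               # (['texture'], 1)
print(select_sub_networks(["pattern", "component"]))  # (['pattern', 'component'], 5)
```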
It can be seen that, according to the image comparison method provided by the embodiment of the application, each target type feature encoder for extracting the features of the target feature type in the third feature extraction model is determined based on the characteristic prompt message, the sub-networks corresponding to the target feature extraction layers connected with the target type feature encoders are determined, the Nth sub-network corresponding to the target feature extraction layer with the deepest layer number is determined, and feature extraction is performed on the first image by using the feature extraction layers in the first N sub-networks of the third feature extraction model to obtain the third image features. Through the strategy of stopping as early as possible in the feature extraction process based on the target type features, the computing resources for feature extraction are saved, the processing cost is reduced, and the efficiency of feature extraction and even of image processing is improved.
For example, in the image classification application scenario, the flow example graph of image comparison may be as shown in fig. 12-1, fig. 12-2, and fig. 12-3, where after obtaining the features, a final image classification result is obtained by using a classifier.
In one embodiment of the present application, as shown in fig. 13, there is provided a schematic structural diagram of a first image comparing apparatus, including:
an image acquisition module 1301, configured to acquire a first image and a second image;
a message obtaining module 1302, configured to obtain a characteristic prompt message, where the characteristic prompt message indicates a target feature type for comparison in the first image and the second image;
the feature extraction module 1303 is configured to extract multiple types of features from the first image to obtain a first feature set, and extract multiple types of features from the second image to obtain a second feature set;
An image comparison module 1304, configured to determine an image data comparison result according to the first feature set, the second feature set, and the characteristic hint message.
As can be seen from the above, in the image comparison device provided by the embodiment of the present application, by acquiring the first image, the second image, and the characteristic prompt message indicating the type of the target feature used for comparison in the first image and the second image, extracting multiple types of features from the first image to obtain a first feature set, extracting multiple types of features from the second image to obtain a second feature set, and determining the image data comparison result according to the first feature set, the second feature set, and the characteristic prompt message. In the process of comparing the image data, proper features are screened for comparison based on the characteristic prompt message, the situations that a plurality of models are required to be trained for different types of images and matching is selected between the plurality of models and the image to be processed for processing are avoided, the complexity of image processing is reduced, and the efficiency of image processing is improved.
In one embodiment of the present application, the feature extraction module 1303 is specifically configured to:
The method comprises the steps of inputting a first image into a first feature extraction model, carrying out feature extraction on the first image through a plurality of feature extraction layers of the first feature extraction model to obtain a plurality of first image features with different granularities, carrying out feature extraction on each first image feature by utilizing a plurality of granularity feature encoders of the first feature extraction model to obtain a first feature set, wherein one granularity feature encoder is connected with one feature extraction layer, and the granularity feature encoder is used for carrying out feature extraction on the first image features output by the feature extraction layer connected with the granularity feature encoder;
and inputting the second image into the first feature extraction model to obtain a second feature set.
As can be seen from the above, the image comparison device provided by the embodiment of the application extracts the features of different granularities of the first image and the second image to obtain complete features through the plurality of feature extraction layers and granularity feature encoders in the first feature extraction model, thereby avoiding missing errors in feature comparison and improving the accuracy of image processing.
In one embodiment of the present application, the feature extraction module 1303 is specifically configured to:
the method comprises the steps of inputting a first image into a second feature extraction model, carrying out feature extraction on the first image through a plurality of feature extraction layers of the second feature extraction model to obtain a plurality of different types of first image features, and carrying out feature extraction on each first image feature by utilizing a plurality of type feature encoders of the second feature extraction model to obtain a first feature set, wherein one type feature encoder is connected with one feature extraction layer, one feature extraction layer is connected with one or more type feature encoders, and the type feature encoders are used for carrying out feature extraction on the first image features output by the feature extraction layers connected with the type feature encoders;
and inputting the second image into the second feature extraction model to obtain a second feature set.
As can be seen from the above, the image comparison device provided by the embodiment of the application extracts the different types of features of the first image and the second image to obtain the complete features through the plurality of feature extraction layers and the type feature encoder in the second feature extraction model, thereby avoiding missing errors in feature comparison and improving the accuracy of image processing.
In one embodiment of the present application, the feature extraction module 1303 is specifically configured to:
The method comprises the steps of inputting a first image into a third characteristic extraction model, carrying out characteristic extraction on the first image through a characteristic extraction layer in a sub-network of the third characteristic extraction model to obtain a plurality of different types of first image characteristics, and carrying out characteristic extraction on each first image characteristic by utilizing a plurality of type characteristic encoders in the sub-network to obtain a first characteristic set, wherein each sub-network comprises a characteristic extractor and a type characteristic encoder which are connected, and the type characteristic encoder is used for carrying out characteristic extraction on the first image characteristics output by the characteristic extraction layer connected with the sub-network;
and inputting the second image into the third feature extraction model to obtain a second feature set.
As can be seen from the above, the image comparison device provided by the embodiment of the application extracts the different types of features of the first image and the second image to obtain complete features through the feature extraction layers and the type feature encoders in the multiple sub-networks in the third feature extraction model, thereby avoiding missing errors in feature comparison and improving the accuracy of image processing.
In one embodiment of the present application, the feature extraction module 1303 is specifically configured to:
Respectively inputting the first image and the second image into a visual large model, and extracting various types of features from the first image and the second image to obtain a first feature set and a second feature set;
the visual large model is obtained through training in the following mode:
Selecting a sample image with first feature type features and an image characteristic expert model for extracting the first feature type features;
taking the currently selected image characteristic expert model as a teacher model, taking the vision large model as a student model, and taking the currently selected sample image as input, training the vision large model;
And selecting a sample image with the characteristics of other characteristic types, and extracting an image characteristic expert model of the other characteristic types to train the vision large model until the training of the preset various characteristic types is completed.
As can be seen from the above, the image comparison device provided by the embodiment of the application extracts multiple types of features of the first image and the second image through the trained visual large model to obtain complete features, thereby avoiding missing errors of feature comparison and improving the accuracy of image processing.
In one embodiment of the present application, the image comparison module 1304 is specifically configured to:
encoding the characteristic prompt message to obtain a first encoding characteristic;
splicing the first coding feature and the first feature set to obtain a first splicing feature, and splicing the first coding feature and the second feature set to obtain a second splicing feature;
Performing feature screening on the first spliced features through a feature screening model to obtain first screening features;
And calculating the similarity between the first screening feature and the second screening feature to obtain the image data comparison result.
As can be seen from the above, in the image comparison device provided by the embodiment of the present application, the first feature set and the second feature set are respectively spliced with the encoded characteristic prompt message to obtain the features of the target feature type in the first feature set and the second feature set, and the similarity between the two is then calculated, so that the comparison of the first image and the second image is realized. Suitable features can be automatically screened for comparison, which reduces the complexity of image comparison and improves the efficiency of image processing in a low-cost manner.
In one embodiment of the present application, the image comparison module 1304 is specifically configured to:
Calculating a first attention weight of each type of feature in the first feature set and a second attention weight of each type of feature in the second feature set based on the feature hint message;
Weighting and fusing various types of features in the first feature set according to the first attention weight to obtain a first target feature set;
weighting and fusing all types of features in the second feature set according to the second attention weight to obtain a second target feature set;
And calculating the similarity between the first target feature set and the second target feature set to obtain an image data comparison result.
As can be seen from the above, in the image comparison device provided by the embodiment of the present application, the attention weights of the first feature set and the second feature set are calculated, weighted fusion is performed on the first feature set and the second feature set respectively to obtain the first target feature set and the second target feature set which only include the required feature types, and the similarity between the two is then calculated, so that the comparison of the first image and the second image is realized. Suitable features can be automatically screened for comparison, which reduces the complexity of image comparison and improves the efficiency of image processing in a low-cost manner.
In one embodiment of the present application, as shown in fig. 14, there is provided a schematic structural diagram of a second image comparing apparatus, including:
an image acquisition module 1401 for acquiring a first image and a second image;
a message obtaining module 1402, configured to obtain a characteristic prompt message, where the characteristic prompt message indicates a target feature type for comparison in the first image and the second image;
a feature extraction module 1403, configured to extract a target feature type from the first image to obtain a third feature set, and extract a target feature type from the second image to obtain a fourth feature set based on the feature hint message;
The image comparison module 1404 is configured to determine an image data comparison result according to the third feature set and the fourth feature set.
As can be seen from the above, in the image comparison device provided by the embodiment of the present application, the first image, the second image, and the characteristic prompt message indicating the target feature type for comparison in the first image and the second image are acquired; after the characteristic prompt message is obtained, features of the target feature type are extracted from the first image and the second image to obtain the third feature set and the fourth feature set, and the image data comparison result is determined based on the third feature set and the fourth feature set. In the process of comparing the image data, suitable features are screened for comparison based on the characteristic prompt message, which avoids the situations in which multiple models have to be trained for different types of images and a matching model has to be selected among the multiple models for the image to be processed, reduces the complexity of image processing, and improves the efficiency of image processing.
In one embodiment of the present application, the feature extraction module 1403 is specifically configured to:
determining each target type feature encoder for extracting the features of the target feature type in a second feature extraction model according to the feature prompt message;
Respectively determining target feature extraction layers connected with the target type feature encoders, and determining the layer number N of the target feature extraction layer with the deepest layer number;
performing feature extraction on the first image by using the first N feature extraction layers of the second feature extraction model to obtain a third image feature;
and extracting the characteristics of the third image characteristic by using the target type characteristic encoder to obtain a third characteristic set.
It can be seen from the above that the image comparison device provided by the embodiment of the application determines each target type feature encoder for extracting the features of the target feature type in the second feature extraction model based on the feature prompt message, determines the target feature extraction layers connected with each target type feature encoder, determines the number of layers N of the target feature extraction layer with the deepest layer number, and performs feature extraction on the first image by using the first N layer feature extraction layers of the second feature extraction model to obtain the third image feature. Through the strategy of stopping as early as needed in the feature extraction process based on the target type features, the computing resources of feature extraction are saved, the processing cost is reduced, and the efficiency of feature extraction and even image processing is improved.
In one embodiment of the present application, the feature extraction module 1403 is specifically configured to:
Determining each target type feature encoder for extracting the features of the target feature type in a third feature extraction model according to the feature prompt message;
Respectively determining target feature extraction layers connected with the target type feature encoders, and determining an N-th subnetwork corresponding to the target feature extraction layer with the deepest layer number;
Extracting the characteristics of the first image by utilizing the characteristic extraction layers in the first N subnetworks of the third characteristic extraction model to obtain third image characteristics;
and extracting the characteristics of the third image characteristic by using the target type characteristic encoder to obtain a third characteristic set.
It can be seen from the above that the image comparison device provided by the embodiment of the application determines each target type feature encoder for extracting the features of the target feature type in the third feature extraction model based on the characteristic prompt message, determines the sub-networks corresponding to the target feature extraction layers connected with the target type feature encoders, determines the Nth sub-network corresponding to the target feature extraction layer with the deepest layer number, and performs feature extraction on the first image by using the feature extraction layers in the first N sub-networks of the third feature extraction model to obtain the third image features. Through the strategy of stopping as early as possible in the feature extraction process based on the target type features, the computing resources for feature extraction are saved, the processing cost is reduced, and the efficiency of feature extraction and even of image processing is improved.
In one embodiment of the application, the second feature extraction model comprises a first feature extraction layer, a second feature extraction layer, a third feature extraction layer and a fourth feature extraction layer, wherein the type of feature encoder comprises a texture feature encoder, a contour feature encoder, a pattern feature encoder, an accessory feature encoder, a component feature encoder, a color feature encoder and a global feature encoder, the first feature extraction layer is connected with the texture feature encoder and the contour feature encoder, the second feature extraction layer is connected with the pattern feature encoder and the accessory feature encoder, the third feature extraction layer is connected with the component feature encoder, and the fourth feature extraction layer is connected with the color feature encoder and the global feature encoder.
In one embodiment of the application, the third feature extraction model comprises a first sub-network for extracting texture features, a second sub-network for extracting contour features, a third sub-network for extracting pattern features, a fourth sub-network for extracting attachment features, a fifth sub-network for extracting component features, a sixth sub-network for extracting color features, and a seventh sub-network for extracting global features.
In one embodiment of the present application, the message obtaining module 1402 is specifically configured to:
Respectively inputting the first image and the second image into an image characteristic classification model, and extracting image type information of the first image and the second image;
And determining the common feature type of the first image and the second image based on the preset corresponding relation between the image type information and the feature type, and obtaining the feature prompt message.
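A possible sketch of deriving the characteristic prompt message from the image type information is given below; the mapping mirrors the correspondences named in the claims (infrared to texture and contour features, incomplete target to component features, visible light to pattern features), and the handling of image types that share no common feature type is left open here, as the text does not specify it.

```python
# Illustrative sketch of deriving the characteristic prompt message from the
# image type information returned by the image characteristic classification
# model. The mapping is an assumption based on the correspondences recited in
# the claims; behavior when no common feature type exists is not specified.
TYPE_TO_FEATURES = {
    "infrared": {"texture", "contour"},
    "incomplete_target": {"component"},
    "visible_light": {"pattern"},
}

def characteristic_prompt(first_image_type, second_image_type):
    first = TYPE_TO_FEATURES.get(first_image_type, set())
    second = TYPE_TO_FEATURES.get(second_image_type, set())
    return first & second  # common feature types of the two images

print(characteristic_prompt("infrared", "infrared"))  # {'texture', 'contour'} (order may vary)
```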
As can be seen from the above, the image comparison device provided by the embodiment of the application determines the common feature type of the first image and the second image through the image feature classification model, obtains the feature prompt message, and realizes the analysis of the image features. Based on the characteristic guidance provided by the comparison of the first image and the second image, only the characteristics of the target characteristic type are required to be selected from the characteristics of multiple types to be compared, so that errors easily caused by unsuitable type characteristic comparison are avoided, manual interference required to be carried out when the multiple models are selected for application is also avoided, the accuracy of image processing is improved, and the processing problem of the multiple types of images can be simultaneously solved under the condition of having the multiple models, and the efficiency of image processing is further improved.
In one embodiment of the present application, the image data comparison result is a target re-recognition result or an image classification result.
It should be noted that the human body model in this embodiment is not a human body model for a specific user, and cannot reflect personal information of a specific user.
The embodiment of the application also provides an electronic device, as shown in fig. 15, including:
a memory 1501 for storing a computer program;
The processor 1502 is configured to implement any one of the image comparison methods described above when executing the program stored in the memory 1501.
And the electronic device may further comprise a communication bus and/or a communication interface, through which the processor 1502, the communication interface, and the memory 1501 communicate with each other.
The communication bus mentioned above for the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figures, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The Memory may include random access Memory (Random Access Memory, RAM) or may include Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In yet another embodiment of the present application, there is also provided a computer-readable storage medium having stored therein a computer program which, when executed by a processor, implements the steps of any of the image comparison methods described above.
In yet another embodiment of the present application, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform any of the image comparison methods of the above embodiments.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), a solid state disk (Solid State Disk, SSD), or the like.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments in part.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (18)

1. An image comparison method, comprising:
Acquiring a first image and a second image;
Acquiring a characteristic prompt message, wherein the characteristic prompt message represents a target feature type used for comparison in the first image and the second image, and the acquiring comprises: respectively inputting the first image and the second image into an image characteristic classification model, and extracting image type information of the first image and the second image; and determining a common feature type of the first image and the second image based on a preset correspondence between image type information and feature types to obtain the characteristic prompt message, wherein the preset correspondence between image type information and feature types comprises a correspondence between an infrared image type and texture features and contour features, a correspondence between an image type with an incomplete target and component features, and a correspondence between a visible light image type and salient pattern features;
extracting multiple types of features from the first image to obtain a first feature set, and extracting multiple types of features from the second image to obtain a second feature set;
and determining an image data comparison result according to the first feature set, the second feature set and the characteristic prompt message.
2. The method of claim 1, wherein the extracting the plurality of types of features from the first image to obtain a first feature set, and the extracting the plurality of types of features from the second image to obtain a second feature set, comprises:
The method comprises the steps of inputting a first image into a first feature extraction model, carrying out feature extraction on the first image through a plurality of feature extraction layers of the first feature extraction model to obtain a plurality of first image features with different granularities, carrying out feature extraction on each first image feature by utilizing a plurality of granularity feature encoders of the first feature extraction model to obtain a first feature set, wherein one granularity feature encoder is connected with one feature extraction layer, and the granularity feature encoder is used for carrying out feature extraction on the first image features output by the feature extraction layer connected with the granularity feature encoder;
and inputting the second image into the first feature extraction model to obtain a second feature set.
3. The method of claim 1, wherein extracting the plurality of types of features from the first image to obtain the first feature set, and extracting the plurality of types of features from the second image to obtain the second feature set, comprises:
inputting the first image into a second feature extraction model, performing feature extraction on the first image through a plurality of feature extraction layers of the second feature extraction model to obtain a plurality of first image features of different types, and performing feature extraction on each first image feature by using a plurality of type feature encoders of the second feature extraction model to obtain the first feature set, wherein each type feature encoder is connected to one feature extraction layer, one feature extraction layer is connected to one or more type feature encoders, and each type feature encoder is configured to perform feature extraction on the first image feature output by the feature extraction layer to which it is connected;
and inputting the second image into the second feature extraction model to obtain the second feature set.
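A similar non-limiting sketch for the second feature extraction model of claim 3, in which one feature extraction layer may feed one or more type feature encoders; the layer-to-encoder wiring below follows claim 11, while the channel widths and encoder design are assumptions of the sketch.

import torch
import torch.nn as nn

# Wiring of feature extraction layers to type feature encoders (claim 11).
LAYER_TO_TYPES = {
    0: ["texture", "contour"],
    1: ["pattern", "accessory"],
    2: ["component"],
    3: ["color", "global"],
}

class TypeEncoder(nn.Module):
    def __init__(self, channels: int, dim: int = 128):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.proj = nn.Linear(channels, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(self.pool(x).flatten(1))

class TypeFeatureExtractor(nn.Module):
    def __init__(self, widths=(32, 64, 128, 256)):
        super().__init__()
        stages, in_c = [], 3
        for w in widths:
            stages.append(nn.Sequential(
                nn.Conv2d(in_c, w, 3, stride=2, padding=1),
                nn.ReLU(inplace=True)))
            in_c = w
        self.stages = nn.ModuleList(stages)
        self.encoders = nn.ModuleDict({
            name: TypeEncoder(widths[i])
            for i, names in LAYER_TO_TYPES.items() for name in names})

    def forward(self, image: torch.Tensor) -> dict:
        feature_set, x = {}, image
        for i, stage in enumerate(self.stages):
            x = stage(x)                                     # first image feature of this layer
            for name in LAYER_TO_TYPES[i]:
                feature_set[name] = self.encoders[name](x)   # one or more encoders per layer
        return feature_set                                   # first feature set, keyed by type

features = TypeFeatureExtractor()(torch.randn(1, 3, 224, 224))
print({k: v.shape for k, v in features.items()})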
4. The method of claim 1, wherein extracting the plurality of types of features from the first image to obtain the first feature set, and extracting the plurality of types of features from the second image to obtain the second feature set, comprises:
inputting the first image into a third feature extraction model, performing feature extraction on the first image through the feature extraction layers in the sub-networks of the third feature extraction model to obtain a plurality of first image features of different types, and performing feature extraction on each first image feature by using the type feature encoders in the sub-networks to obtain the first feature set, wherein each sub-network comprises a feature extraction layer and a type feature encoder that are connected, and the type feature encoder is configured to perform feature extraction on the first image feature output by the feature extraction layer of the sub-network to which it belongs;
and inputting the second image into the third feature extraction model to obtain the second feature set.
5. The method of claim 1, wherein extracting the plurality of types of features from the first image to obtain the first feature set, and extracting the plurality of types of features from the second image to obtain the second feature set, comprises:
respectively inputting the first image and the second image into a large vision model, and extracting a plurality of types of features from the first image and the second image to obtain the first feature set and the second feature set;
wherein the large vision model is obtained through training in the following manner:
selecting sample images having features of a first feature type, and an image characteristic expert model for extracting features of the first feature type;
taking the currently selected image characteristic expert model as a teacher model and the large vision model as a student model, and training the large vision model with the currently selected sample images as input;
and selecting sample images having features of other feature types, together with image characteristic expert models for extracting features of those feature types, to continue training the large vision model until training for all preset feature types is completed.
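The training scheme of claim 5 amounts to sequential knowledge distillation: each image characteristic expert model serves in turn as the teacher for the large vision model (the student). A minimal sketch follows, assuming feature-level mean-squared-error distillation; the expert and student models, the optimizer settings, and the per-type sample batches are hypothetical.

import torch
import torch.nn as nn

def distill_one_feature_type(student: nn.Module,
                             teacher: nn.Module,
                             sample_images: torch.Tensor,
                             epochs: int = 5,
                             lr: float = 1e-4) -> None:
    """Train the student to reproduce the teacher's features on the sample images."""
    teacher.eval()
    optimizer = torch.optim.Adam(student.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        with torch.no_grad():
            target = teacher(sample_images)   # features from the expert (teacher) model
        pred = student(sample_images)         # features from the large vision model (student)
        loss = loss_fn(pred, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Hypothetical usage: one expert model and one batch of sample images per feature type.
# experts = {"texture": texture_expert, "contour": contour_expert, ...}
# samples = {"texture": texture_images, "contour": contour_images, ...}
# for feature_type, teacher in experts.items():
#     distill_one_feature_type(student_model, teacher, samples[feature_type])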
6. The method according to claim 1, wherein determining the image data comparison result according to the first feature set, the second feature set, and the feature prompt message comprises:
encoding the feature prompt message to obtain a first encoded feature;
concatenating the first encoded feature with the first feature set to obtain a first concatenated feature, and concatenating the first encoded feature with the second feature set to obtain a second concatenated feature;
performing feature screening on the first concatenated feature and the second concatenated feature through a feature screening model to obtain a first screened feature and a second screened feature;
and calculating a similarity between the first screened feature and the second screened feature to obtain the image data comparison result.
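A minimal sketch of the comparison path of claim 6, assuming each feature set has already been flattened into a single vector; the multi-hot prompt encoding and the single-layer screening network are stand-ins for whatever encoder and feature screening model an implementation would actually use.

import torch
import torch.nn as nn
import torch.nn.functional as F

FEATURE_TYPES = ["texture", "contour", "pattern", "accessory", "component", "color", "global"]

def encode_prompt(target_types: list) -> torch.Tensor:
    """First encoded feature: a multi-hot encoding of the feature prompt message."""
    return torch.tensor([[1.0 if t in target_types else 0.0 for t in FEATURE_TYPES]])

class FeatureScreening(nn.Module):
    """Stand-in feature screening model: a single linear projection."""
    def __init__(self, in_dim: int, out_dim: int = 64):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)

prompt = encode_prompt(["texture", "contour"])              # from the feature prompt message
feat_a = torch.randn(1, 256)                                # first feature set (flattened)
feat_b = torch.randn(1, 256)                                # second feature set (flattened)

screen = FeatureScreening(256 + prompt.shape[1])
screened_a = screen(torch.cat([prompt, feat_a], dim=1))     # first screened feature
screened_b = screen(torch.cat([prompt, feat_b], dim=1))     # second screened feature
similarity = F.cosine_similarity(screened_a, screened_b).item()
print(f"comparison score: {similarity:.3f}")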
7. The method according to claim 1, wherein determining the image data comparison result according to the first feature set, the second feature set, and the feature prompt message comprises:
calculating, based on the feature prompt message, a first attention weight for each type of feature in the first feature set and a second attention weight for each type of feature in the second feature set;
weighting and fusing the types of features in the first feature set according to the first attention weights to obtain a first target feature set;
weighting and fusing the types of features in the second feature set according to the second attention weights to obtain a second target feature set;
and calculating a similarity between the first target feature set and the second target feature set to obtain the image data comparison result.
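A minimal sketch of the attention-weighted fusion of claim 7; the fixed logits that favour the feature types named in the prompt message are an assumed stand-in for a learned attention module.

import torch
import torch.nn.functional as F

def fuse(feature_set: dict, target_types: list) -> torch.Tensor:
    """Weight each type feature by its attention score and sum the result."""
    names = list(feature_set)
    # Assumption: higher logits for feature types named in the prompt message.
    logits = torch.tensor([3.0 if n in target_types else 0.0 for n in names])
    weights = torch.softmax(logits, dim=0)
    return sum(w * feature_set[n] for w, n in zip(weights, names))

set_a = {"texture": torch.randn(128), "contour": torch.randn(128), "color": torch.randn(128)}
set_b = {"texture": torch.randn(128), "contour": torch.randn(128), "color": torch.randn(128)}

target_a = fuse(set_a, ["texture", "contour"])   # first target feature set
target_b = fuse(set_b, ["texture", "contour"])   # second target feature set
score = F.cosine_similarity(target_a, target_b, dim=0).item()
print(f"comparison score: {score:.3f}")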
8. An image comparison method, comprising:
acquiring a first image and a second image;
acquiring a feature prompt message, wherein the feature prompt message indicates a target feature type to be used for comparing the first image and the second image; the acquiring comprises: respectively inputting the first image and the second image into an image characteristic classification model to extract image type information of the first image and the second image, and determining a feature type common to the first image and the second image based on a preset correspondence between image type information and feature types to obtain the feature prompt message, wherein the preset correspondence between image type information and feature types comprises a correspondence between an infrared image type and texture features and contour features, a correspondence between a target-incomplete image type and component features, and a correspondence between a visible-light image type and salient pattern features;
extracting, based on the feature prompt message, features of the target feature type from the first image to obtain a third feature set, and extracting features of the target feature type from the second image to obtain a fourth feature set;
and determining an image data comparison result according to the third feature set and the fourth feature set.
9. The method of claim 8, wherein extracting, based on the feature prompt message, the features of the target feature type from the first image to obtain the third feature set and from the second image to obtain the fourth feature set comprises:
determining, according to the feature prompt message, each target type feature encoder in a second feature extraction model for extracting features of the target feature type;
respectively determining the target feature extraction layers connected to the target type feature encoders, and determining the layer number N of the deepest target feature extraction layer;
performing feature extraction on the first image by using the first N feature extraction layers of the second feature extraction model to obtain a third image feature;
and performing feature extraction on the third image feature by using the target type feature encoders to obtain the third feature set.
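A small sketch of the layer-selection logic of claim 9, using the layer-to-encoder wiring of claim 11: the deepest feature extraction layer to which any target type feature encoder is connected fixes N, so only the first N layers of the model need to run. The dictionary form of the wiring is an assumption of the sketch.

# Layer to which each type feature encoder is connected (claim 11).
ENCODER_TO_LAYER = {
    "texture": 1, "contour": 1,
    "pattern": 2, "accessory": 2,
    "component": 3,
    "color": 4, "global": 4,
}

def deepest_layer(target_types: list) -> int:
    """Layer number N of the deepest feature extraction layer that must run."""
    return max(ENCODER_TO_LAYER[t] for t in target_types)

print(deepest_layer(["texture", "contour"]))     # 1: only the first layer is needed
print(deepest_layer(["texture", "component"]))   # 3: the first three layers are needed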
10. The method of claim 8, wherein extracting, based on the feature prompt message, the features of the target feature type from the first image to obtain the third feature set and from the second image to obtain the fourth feature set comprises:
determining, according to the feature prompt message, each target type feature encoder in a third feature extraction model for extracting features of the target feature type;
respectively determining the target feature extraction layers connected to the target type feature encoders, and determining the N-th sub-network corresponding to the deepest target feature extraction layer;
performing feature extraction on the first image by using the feature extraction layers in the first N sub-networks of the third feature extraction model to obtain a third image feature;
and performing feature extraction on the third image feature by using the target type feature encoders to obtain the third feature set.
11. The method according to claim 3 or 9, wherein the second feature extraction model comprises a first feature extraction layer, a second feature extraction layer, a third feature extraction layer, and a fourth feature extraction layer, and the type feature encoders comprise a texture feature encoder, a contour feature encoder, a pattern feature encoder, an accessory feature encoder, a component feature encoder, a color feature encoder, and a global feature encoder, wherein the first feature extraction layer is connected to the texture feature encoder and the contour feature encoder, the second feature extraction layer is connected to the pattern feature encoder and the accessory feature encoder, the third feature extraction layer is connected to the component feature encoder, and the fourth feature extraction layer is connected to the color feature encoder and the global feature encoder.
12. The method according to claim 4 or 10, wherein the third feature extraction model comprises a first sub-network for extracting texture features, a second sub-network for extracting contour features, a third sub-network for extracting pattern features, a fourth sub-network for extracting accessory features, a fifth sub-network for extracting component features, a sixth sub-network for extracting color features, and a seventh sub-network for extracting global features.
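A minimal sketch of the third feature extraction model of claims 4, 10, and 12, under the assumption (consistent with claim 10) that the feature extraction layers of the sub-networks are chained, so that the i-th sub-network consumes the output of the (i-1)-th; the widths and layer depths below are illustrative.

import torch
import torch.nn as nn

SUBNETWORK_ORDER = ["texture", "contour", "pattern", "accessory", "component", "color", "global"]

class SubNetwork(nn.Module):
    """One sub-network: a feature extraction layer followed by a type feature encoder."""
    def __init__(self, in_c: int, out_c: int, dim: int = 128):
        super().__init__()
        self.extract = nn.Sequential(                 # feature extraction layer
            nn.Conv2d(in_c, out_c, 3, stride=2, padding=1),
            nn.ReLU(inplace=True))
        self.encode = nn.Sequential(                  # type feature encoder
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(out_c, dim))

    def forward(self, x: torch.Tensor):
        x = self.extract(x)
        return x, self.encode(x)

widths = [16, 32, 48, 64, 96, 128, 160]
subnetworks = nn.ModuleList(
    SubNetwork(in_c, out_c) for in_c, out_c in zip([3] + widths[:-1], widths))

x, feature_set = torch.randn(1, 3, 256, 256), {}
for name, net in zip(SUBNETWORK_ORDER, subnetworks):
    x, feature_set[name] = net(x)                     # chained extraction, per-type encoding
print({k: v.shape for k, v in feature_set.items()})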
13. An image comparison apparatus, comprising:
an image acquisition module, configured to acquire a first image and a second image;
an information acquisition module, configured to acquire a feature prompt message, wherein the feature prompt message indicates a target feature type to be used for comparing the first image and the second image; the acquiring comprises: respectively inputting the first image and the second image into an image characteristic classification model to extract image type information of the first image and the second image, and determining a feature type common to the first image and the second image based on a preset correspondence between image type information and feature types to obtain the feature prompt message, wherein the preset correspondence between image type information and feature types comprises a correspondence between an infrared image type and texture features and contour features, a correspondence between a target-incomplete image type and component features, and a correspondence between a visible-light image type and salient pattern features;
a feature extraction module, configured to extract a plurality of types of features from the first image to obtain a first feature set, and extract a plurality of types of features from the second image to obtain a second feature set;
and an image comparison module, configured to determine an image data comparison result according to the first feature set, the second feature set, and the feature prompt message.
14. The apparatus according to claim 13, wherein the feature extraction module is specifically configured to:
input the first image into a first feature extraction model, perform feature extraction on the first image through a plurality of feature extraction layers of the first feature extraction model to obtain a plurality of first image features of different granularities, and perform feature extraction on each first image feature by using a plurality of granularity feature encoders of the first feature extraction model to obtain the first feature set, wherein each granularity feature encoder is connected to one feature extraction layer and is configured to perform feature extraction on the first image feature output by the feature extraction layer to which it is connected;
and input the second image into the first feature extraction model to obtain the second feature set;
The feature extraction module is specifically configured to:
input the first image into a second feature extraction model, perform feature extraction on the first image through a plurality of feature extraction layers of the second feature extraction model to obtain a plurality of first image features of different types, and perform feature extraction on each first image feature by using a plurality of type feature encoders of the second feature extraction model to obtain the first feature set, wherein each type feature encoder is connected to one feature extraction layer, one feature extraction layer is connected to one or more type feature encoders, and each type feature encoder is configured to perform feature extraction on the first image feature output by the feature extraction layer to which it is connected;
and input the second image into the second feature extraction model to obtain the second feature set;
The feature extraction module is specifically configured to:
input the first image into a third feature extraction model, perform feature extraction on the first image through the feature extraction layers in the sub-networks of the third feature extraction model to obtain a plurality of first image features of different types, and perform feature extraction on each first image feature by using the type feature encoders in the sub-networks to obtain the first feature set, wherein each sub-network comprises a feature extraction layer and a type feature encoder that are connected, and the type feature encoder is configured to perform feature extraction on the first image feature output by the feature extraction layer of the sub-network to which it belongs;
and input the second image into the third feature extraction model to obtain the second feature set;
The feature extraction module is specifically configured to:
respectively input the first image and the second image into a large vision model, and extract a plurality of types of features from the first image and the second image to obtain the first feature set and the second feature set;
wherein the large vision model is obtained through training in the following manner:
selecting sample images having features of a first feature type, and an image characteristic expert model for extracting features of the first feature type;
taking the currently selected image characteristic expert model as a teacher model and the large vision model as a student model, and training the large vision model with the currently selected sample images as input;
and selecting sample images having features of other feature types, together with image characteristic expert models for extracting features of those feature types, to continue training the large vision model until training for all preset feature types is completed;
the image comparison module is specifically configured to:
encode the feature prompt message to obtain a first encoded feature;
concatenate the first encoded feature with the first feature set to obtain a first concatenated feature, and concatenate the first encoded feature with the second feature set to obtain a second concatenated feature;
perform feature screening on the first concatenated feature and the second concatenated feature through a feature screening model to obtain a first screened feature and a second screened feature;
and calculate a similarity between the first screened feature and the second screened feature to obtain the image data comparison result;
the image comparison module is specifically configured to:
calculate, based on the feature prompt message, a first attention weight for each type of feature in the first feature set and a second attention weight for each type of feature in the second feature set;
weight and fuse the types of features in the first feature set according to the first attention weights to obtain a first target feature set;
weight and fuse the types of features in the second feature set according to the second attention weights to obtain a second target feature set;
and calculate a similarity between the first target feature set and the second target feature set to obtain the image data comparison result.
15. An image comparison apparatus, comprising:
an image acquisition module, configured to acquire a first image and a second image;
an information acquisition module, configured to acquire a feature prompt message, wherein the feature prompt message indicates a target feature type to be used for comparing the first image and the second image; the acquiring comprises: respectively inputting the first image and the second image into an image characteristic classification model to extract image type information of the first image and the second image, and determining a feature type common to the first image and the second image based on a preset correspondence between image type information and feature types to obtain the feature prompt message, wherein the preset correspondence between image type information and feature types comprises a correspondence between an infrared image type and texture features and contour features, a correspondence between a target-incomplete image type and component features, and a correspondence between a visible-light image type and salient pattern features;
a feature extraction module, configured to extract, based on the feature prompt message, features of the target feature type from the first image to obtain a third feature set, and extract features of the target feature type from the second image to obtain a fourth feature set;
and an image comparison module, configured to determine an image data comparison result according to the third feature set and the fourth feature set.
16. The apparatus according to claim 15, wherein the feature extraction module is specifically configured to:
determine, according to the feature prompt message, each target type feature encoder in a second feature extraction model for extracting features of the target feature type;
respectively determine the target feature extraction layers connected to the target type feature encoders, and determine the layer number N of the deepest target feature extraction layer;
perform feature extraction on the first image by using the first N feature extraction layers of the second feature extraction model to obtain a third image feature;
and perform feature extraction on the third image feature by using the target type feature encoders to obtain the third feature set;
The feature extraction module is specifically configured to:
determine, according to the feature prompt message, each target type feature encoder in a third feature extraction model for extracting features of the target feature type;
respectively determine the target feature extraction layers connected to the target type feature encoders, and determine the N-th sub-network corresponding to the deepest target feature extraction layer;
perform feature extraction on the first image by using the feature extraction layers in the first N sub-networks of the third feature extraction model to obtain a third image feature;
and perform feature extraction on the third image feature by using the target type feature encoders to obtain the third feature set;
wherein the second feature extraction model comprises a first feature extraction layer, a second feature extraction layer, a third feature extraction layer, and a fourth feature extraction layer, and the type feature encoders comprise a texture feature encoder, a contour feature encoder, a pattern feature encoder, an accessory feature encoder, a component feature encoder, a color feature encoder, and a global feature encoder, wherein the first feature extraction layer is connected to the texture feature encoder and the contour feature encoder, the second feature extraction layer is connected to the pattern feature encoder and the accessory feature encoder, the third feature extraction layer is connected to the component feature encoder, and the fourth feature extraction layer is connected to the color feature encoder and the global feature encoder;
and the third feature extraction model comprises a first sub-network for extracting texture features, a second sub-network for extracting contour features, a third sub-network for extracting pattern features, a fourth sub-network for extracting accessory features, a fifth sub-network for extracting component features, a sixth sub-network for extracting color features, and a seventh sub-network for extracting global features.
17. An electronic device, comprising:
a memory for storing a computer program;
a processor, configured to implement the method of any one of claims 1-12 when executing the computer program stored in the memory.
18. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when executed by a processor, implements the method of any of claims 1-12.
CN202411056302.0A 2024-08-02 2024-08-02 Image comparison method, device, electronic equipment and storage medium Active CN118587456B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411056302.0A CN118587456B (en) 2024-08-02 2024-08-02 Image comparison method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN118587456A CN118587456A (en) 2024-09-03
CN118587456B true CN118587456B (en) 2024-12-06

Family

ID=92537474

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411056302.0A Active CN118587456B (en) 2024-08-02 2024-08-02 Image comparison method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN118587456B (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4074366B2 (en) * 1998-02-24 2008-04-09 コニカミノルタビジネステクノロジーズ株式会社 Image search apparatus and method, and recording medium storing image search program
CN106066887B (en) * 2016-06-12 2019-05-17 北京理工大学 A kind of sequence of advertisements image quick-searching and analysis method
CN107766855B (en) * 2017-10-25 2021-09-07 南京阿凡达机器人科技有限公司 Chessman positioning method and system based on machine vision, storage medium and robot
CN117853832A (en) * 2022-09-30 2024-04-09 杭州海康威视数字技术股份有限公司 Image processing model training method, device, equipment and storage medium
CN116704238A (en) * 2023-05-16 2023-09-05 飞诺门阵(北京)科技有限公司 Image detection method and device, electronic equipment and storage medium
CN118132786A (en) * 2024-01-02 2024-06-04 太保科技有限公司 Method, device, equipment and storage medium for retrieving similar pictures

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant