CN117079063A

CN117079063A - Feature extraction model processing, sample retrieval method and device and computer equipment

Info

Publication number: CN117079063A
Application number: CN202210484050.6A
Authority: CN
Inventors: 郭卉
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2022-05-06
Filing date: 2022-05-06
Publication date: 2023-11-17
Anticipated expiration: 2042-05-06
Also published as: CN117079063B

Abstract

The present application relates to a feature extraction model processing, sample retrieval method, device and computer equipment. The method includes: performing feature extraction on the training sample through a feature extraction model to be trained to obtain a training feature vector; classifying and predicting the training sample based on the training feature vector to obtain a predicted semantic vector; and obtaining a label semantic vector of a training category label corresponding to the training sample. , the label semantic vector includes at least two activation label vector components; based on the label semantic vector, determine the expected semantic vector of the training sample, and the expected semantic vector contains the expected activation vector component corresponding to the position distribution of at least two activation label vector components; Based on the difference between each expected activation vector component in the expected semantic vector and the predicted vector component at the corresponding position in the predicted semantic vector, the feature extraction model is trained to obtain the target feature extraction model. Using this method can improve the accuracy of feature extraction.

Description

Feature extraction model processing, sample retrieval method, device and computer equipment

技术领域Technical field

本申请涉及计算机技术领域，特别是涉及一种特征提取模型处理方法、装置、计算机设备、存储介质和计算机程序产品，以及一种样本检索方法、装置、计算机设备、存储介质和计算机程序产品。The present application relates to the field of computer technology, and in particular to a feature extraction model processing method, device, computer equipment, storage medium and computer program product, and a sample retrieval method, device, computer equipment, storage medium and computer program product.

背景技术Background technique

随着计算机技术的发展，出现了机器学习技术，通过机器学习可以训练用于各种机器学习模型，例如，可以训练用于特征提取的特征提取模型，该特征提取模型可以提取得到输入样本的特征，可以基于该特征对输入样本进行识别，得到识别结果。举个实际的例子，可以对一段句子进行特征提取，得到表示该句子的特征向量，基于该特征向量对句子进行翻译。With the development of computer technology, machine learning technology has emerged. Machine learning can be used to train various machine learning models. For example, a feature extraction model for feature extraction can be trained. The feature extraction model can extract the features of the input sample. , the input sample can be identified based on this feature, and the identification result can be obtained. To give a practical example, feature extraction can be performed on a sentence to obtain a feature vector representing the sentence, and the sentence can be translated based on the feature vector.

传统技术中，可以通过训练样本对特征提取模型进行训练，然而经常存在训练得到的特征提取模型所提取得到的特征准确度比较低的情况。In traditional technology, the feature extraction model can be trained through training samples. However, there are often situations where the accuracy of features extracted by the trained feature extraction model is relatively low.

发明内容Contents of the invention

基于此，有必要针对上述技术问题，提供一种能够提高特征提取准确度的特征提取模型处理方法、装置、计算机设备、计算机可读存储介质和计算机程序产品。Based on this, it is necessary to address the above technical problems and provide a feature extraction model processing method, device, computer equipment, computer readable storage medium and computer program product that can improve the accuracy of feature extraction.

一方面，本申请提供了一种特征提取模型处理方法。所述方法包括：通过待训练的特征提取模型对训练样本进行特征提取，得到训练特征向量；基于所述训练特征向量对所述训练样本进行分类预测，得到预测语义向量；获取所述训练样本对应的训练类别标签的标签语义向量，所述标签语义向量包括至少两个激活标签向量分量；基于所述标签语义向量，确定所述训练样本的期望语义向量，所述期望语义向量中，包含与所述至少两个激活标签向量分量的位置分布对应的期望激活向量分量；基于所述期望语义向量中各个所述期望激活向量分量，与所述预测语义向量中各自对应位置的预测向量分量之间的差异，确定分类损失；基于所述分类损失对所述特征提取模型进行训练，当满足训练停止条件时，得到目标特征提取模型；所述目标特征提取模型用于提取输入样本的样本特征向量。On the one hand, this application provides a feature extraction model processing method. The method includes: performing feature extraction on the training sample through a feature extraction model to be trained to obtain a training feature vector; classifying and predicting the training sample based on the training feature vector to obtain a predicted semantic vector; and obtaining the corresponding training sample The label semantic vector of the training category label, the label semantic vector includes at least two activation label vector components; based on the label semantic vector, determine the expected semantic vector of the training sample, the expected semantic vector includes the The expected activation vector component corresponding to the position distribution of the at least two activation label vector components; based on the relationship between each of the expected activation vector components in the expected semantic vector and the predicted vector component of the corresponding position in the predicted semantic vector The difference is determined to determine the classification loss; the feature extraction model is trained based on the classification loss, and when the training stop condition is met, the target feature extraction model is obtained; the target feature extraction model is used to extract the sample feature vector of the input sample.

另一方面，本申请还提供了一种特征提取模型处理装置。所述装置包括：特征提取模块，用于通过待训练的特征提取模型对训练样本进行特征提取，得到训练特征向量；分类预测模块，用于基于所述训练特征向量对所述训练样本进行分类预测，得到预测语义向量；标签向量获取模块，用于获取所述训练样本对应的训练类别标签的标签语义向量，所述标签语义向量包括至少两个激活标签向量分量；期望向量确定模块，用于基于所述标签语义向量，确定所述训练样本的期望语义向量，所述期望语义向量中，包含与所述至少两个激活标签向量分量的位置分布对应的期望激活向量分量；分类损失确定模块，用于基于所述期望语义向量中各个所述期望激活向量分量，与所述预测语义向量中各自对应位置的预测向量分量之间的差异，确定分类损失；模型训练模块，用于基于所述分类损失对所述特征提取模型进行训练，当满足训练停止条件时，得到目标特征提取模型；所述目标特征提取模型用于提取输入样本的样本特征向量。On the other hand, this application also provides a feature extraction model processing device. The device includes: a feature extraction module, used to perform feature extraction on training samples through a feature extraction model to be trained, to obtain a training feature vector; a classification prediction module, used to perform classification prediction on the training sample based on the training feature vector , obtain the predicted semantic vector; the label vector acquisition module is used to obtain the label semantic vector of the training category label corresponding to the training sample, and the label semantic vector includes at least two activation label vector components; the expectation vector determination module is used to determine based on The label semantic vector determines the expected semantic vector of the training sample, and the expected semantic vector includes the expected activation vector component corresponding to the position distribution of the at least two activation label vector components; the classification loss determination module uses Determine the classification loss based on the difference between each of the expected activation vector components in the expected semantic vector and the predicted vector components at respective corresponding positions in the predicted semantic vector; a model training module for determining based on the classification loss The feature extraction model is trained, and when the training stop condition is met, a target feature extraction model is obtained; the target feature extraction model is used to extract the sample feature vector of the input sample.

另一方面，本申请还提供了一种计算机设备。所述计算机设备包括存储器和处理器，所述存储器存储有计算机程序，所述处理器执行所述计算机程序时实现上述特征提取模型处理方法的步骤。On the other hand, this application also provides a computer device. The computer device includes a memory and a processor. The memory stores a computer program. When the processor executes the computer program, the steps of the feature extraction model processing method are implemented.

另一方面，本申请还提供了一种计算机可读存储介质。所述计算机可读存储介质，其上存储有计算机程序，所述计算机程序被处理器执行时实现上述特征提取模型处理方法的步骤。On the other hand, this application also provides a computer-readable storage medium. The computer-readable storage medium has a computer program stored thereon, and when the computer program is executed by a processor, the steps of the feature extraction model processing method are implemented.

另一方面，本申请还提供了一种计算机程序产品。所述计算机程序产品，包括计算机程序，该计算机程序被处理器执行时实现上述特征提取模型处理方法的步骤。On the other hand, this application also provides a computer program product. The computer program product includes a computer program that implements the steps of the feature extraction model processing method when executed by a processor.

上述特征提取模型处理方法、装置、计算机设备、存储介质和计算机程序产品，通过待训练的特征提取模型对训练样本进行特征提取，得到训练特征向量，基于训练特征向量对训练样本进行分类预测，得到预测语义向量，获取训练样本对应的训练类别标签的标签语义向量，基于标签语义向量，确定训练样本的期望语义向量，进而基于期望语义向量中各个期望激活向量分量，与预测语义向量中各自对应位置的预测向量分量之间的差异，确定分类损失，通过该分类损失来训练特征提取模型，最终得到的目标特征提取模型可以学到趋向于标签语义向量的特征，从而可以在训练特征向量所在向量空间中对标签语义向量所在的语义空间进行模拟，使得提取的特征向量中可以储存表征语义的特征信息，提高了特征提取的准确性，并且由于标签语义向量包括至少两个激活标签向量分量，可以对训练类别标签进行更加准确地表征，而期望语义向量是基于标签语义向量确定的，并且包含与至少两个激活标签向量分量的位置分布对应的期望激活向量分量，可以使得计算得到的分类损失更加准确，进一步提高了特征提取的准确性。The above feature extraction model processing method, device, computer equipment, storage medium and computer program product extracts features from training samples through the feature extraction model to be trained to obtain training feature vectors, and classifies and predicts training samples based on the training feature vectors to obtain Predict the semantic vector, obtain the label semantic vector of the training category label corresponding to the training sample, determine the expected semantic vector of the training sample based on the label semantic vector, and then determine the corresponding position in the predicted semantic vector based on each expected activation vector component in the expected semantic vector. The difference between the predicted vector components determines the classification loss. The feature extraction model is trained through this classification loss. The final target feature extraction model can learn features that tend to the label semantic vector, so that it can be used in the vector space where the training feature vector is located. The semantic space where the label semantic vector is located is simulated, so that the extracted feature vector can store feature information representing the semantics, which improves the accuracy of feature extraction, and because the label semantic vector includes at least two activation label vector components, it can The training category labels are more accurately characterized, and the expected semantic vector is determined based on the label semantic vector and contains the expected activation vector components corresponding to the position distribution of at least two activation label vector components, which can make the calculated classification loss more accurate. , further improving the accuracy of feature extraction.

另一方面，本申请提供了一种样本检索方法。所述方法包括：获取查询样本和候选召回样本集合；分别将所述查询样本和候选召回样本集合中的候选召回样本输入目标特征提取模型，得到所述查询样本对应的查询特征向量和候选召回样本对应的候选召回特征向量；其中，所述目标特征提取模型是通过分类损失对待训练的特征提取模型进行训练得到的，所述分类损失是基于所述期望语义向量中各个所述期望激活向量分量，与所述预测语义向量中各自对应位置的预测向量分量之间的差异确定的，所述预测语义向量是基于训练特征向量对所述训练样本进行分类预测得到的，所述期望语义向量是基于训练样本对应的训练类别标签的标签语义向量确定的，所述标签语义向量包括至少两个激活标签向量分量，所述期望语义向量中，包含与所述至少两个激活标签向量分量的位置分布对应的期望激活向量分量，所述训练特征向量是通过所述待训练的特征提取模型对所述训练样本进行特征提取得到的；基于所述查询特征向量和所述候选召回特征向量，从所述候选召回样本集合中确定所述查询样本对应的目标检索样本。On the other hand, this application provides a sample retrieval method. The method includes: obtaining a query sample and a candidate recall sample set; respectively inputting the query sample and the candidate recall sample in the candidate recall sample set into a target feature extraction model to obtain the query feature vector and candidate recall sample corresponding to the query sample. The corresponding candidate recall feature vector; wherein, the target feature extraction model is obtained by training the feature extraction model to be trained through classification loss, and the classification loss is based on each of the expected activation vector components in the expected semantic vector, The predicted semantic vector is determined by the difference between the predicted vector components corresponding to the corresponding positions in the predicted semantic vector. The predicted semantic vector is obtained by classifying and predicting the training sample based on the training feature vector. The expected semantic vector is based on the training The label semantic vector of the training category label corresponding to the sample is determined. The label semantic vector includes at least two activation label vector components. The expected semantic vector includes the position distribution corresponding to the at least two activation label vector components. Desired activation vector component, the training feature vector is obtained by feature extraction of the training sample through the feature extraction model to be trained; based on the query feature vector and the candidate recall feature vector, from the candidate recall The target retrieval sample corresponding to the query sample is determined in the sample set.

另一方面，本申请还提供了一种样本检索装置。所述装置包括：样本获取模块，用于获取查询样本和候选召回样本集合；特征提取模块，用于分别将所述查询样本和候选召回样本集合中的候选召回样本输入目标特征提取模型，得到所述查询样本对应的查询特征向量和候选召回样本对应的候选召回特征向量；其中，所述目标特征提取模型是通过分类损失对待训练的特征提取模型进行训练得到的，所述分类损失是基于所述期望语义向量中各个所述期望激活向量分量，与所述预测语义向量中各自对应位置的预测向量分量之间的差异确定的，所述预测语义向量是基于训练特征向量对所述训练样本进行分类预测得到的，所述期望语义向量是基于训练样本对应的训练类别标签的标签语义向量确定的，所述标签语义向量包括至少两个激活标签向量分量，所述期望语义向量中，包含与所述至少两个激活标签向量分量的位置分布对应的期望激活向量分量，所述训练特征向量是通过所述待训练的特征提取模型对所述训练样本进行特征提取得到的；检索模块，用于基于所述查询特征向量和所述候选召回特征向量，从所述候选召回样本集合中确定所述查询样本对应的目标检索样本。On the other hand, this application also provides a sample retrieval device. The device includes: a sample acquisition module, used to obtain a query sample and a candidate recall sample set; a feature extraction module, used to input the query sample and the candidate recall sample in the candidate recall sample set into a target feature extraction model to obtain the desired feature extraction model. The query feature vector corresponding to the query sample and the candidate recall feature vector corresponding to the candidate recall sample; wherein, the target feature extraction model is obtained by training the feature extraction model to be trained through classification loss, and the classification loss is based on the The difference between each expected activation vector component in the expected semantic vector and the predicted vector component at the corresponding position in the predicted semantic vector is determined, and the predicted semantic vector is used to classify the training sample based on the training feature vector. Predicted, the expected semantic vector is determined based on the label semantic vector of the training category label corresponding to the training sample, the label semantic vector includes at least two activation label vector components, and the expected semantic vector includes the The expected activation vector component corresponding to the position distribution of at least two activation label vector components, the training feature vector is obtained by feature extraction of the training sample through the feature extraction model to be trained; a retrieval module, used to perform feature extraction based on the The query feature vector and the candidate recall feature vector are used to determine the target retrieval sample corresponding to the query sample from the candidate recall sample set.

另一方面，本申请还提供了一种计算机设备。所述计算机设备包括存储器和处理器，所述存储器存储有计算机程序，所述处理器执行所述计算机程序时实现上述样本检索方法的步骤。On the other hand, this application also provides a computer device. The computer device includes a memory and a processor. The memory stores a computer program. When the processor executes the computer program, the steps of the above sample retrieval method are implemented.

另一方面，本申请还提供了一种计算机可读存储介质。所述计算机可读存储介质，其上存储有计算机程序，所述计算机程序被处理器执行时实现上述样本检索方法的步骤。On the other hand, this application also provides a computer-readable storage medium. The computer-readable storage medium has a computer program stored thereon, and when the computer program is executed by a processor, the steps of the above sample retrieval method are implemented.

另一方面，本申请还提供了一种计算机程序产品。所述计算机程序产品，包括计算机程序，该计算机程序被处理器执行时实现上述样本检索方法的步骤。On the other hand, this application also provides a computer program product. The computer program product includes a computer program that implements the steps of the sample retrieval method when executed by a processor.

上述样本检索方法、装置、计算机设备、存储介质和计算机程序产品，由于目标特征提取模型是通过该分类损失来训练，最终得到的目标特征提取模型可以学到趋向于标签语义向量的特征，从而可以在训练特征向量所在向量空间中对标签语义向量所在的语义空间进行模拟，使得提取的特征向量中可以储存表征语义的特征信息，提高了特征提取的准确性，并且由于标签语义向量包括至少两个激活标签向量分量，可以对训练类别标签进行更加准确地表征，而期望语义向量是基于标签语义向量确定的，并且包含与至少两个激活标签向量分量的位置分布对应的期望激活向量分量，可以使得计算得到的分类损失更加准确，进一步提高了特征提取的准确性，进而采用目标特征提取模型提取得到的特征向量进行样本检索，可以提高样本检索的准确度。In the above sample retrieval methods, devices, computer equipment, storage media and computer program products, since the target feature extraction model is trained through the classification loss, the final target feature extraction model can learn features that tend to the label semantic vector, so that it can The semantic space where the label semantic vector is located is simulated in the vector space where the training feature vector is located, so that feature information representing semantics can be stored in the extracted feature vector, which improves the accuracy of feature extraction, and because the label semantic vector includes at least two The activation label vector component can more accurately characterize the training category label, and the expected semantic vector is determined based on the label semantic vector, and contains the expected activation vector component corresponding to the position distribution of at least two activation label vector components, so that The calculated classification loss is more accurate, further improving the accuracy of feature extraction. Using the feature vector extracted by the target feature extraction model for sample retrieval can improve the accuracy of sample retrieval.

附图说明Description of the drawings

图1为一个实施例中特征提取模型处理方法和样本检索方法的应用环境图；Figure 1 is an application environment diagram of the feature extraction model processing method and sample retrieval method in one embodiment;

图2为一个实施例中特征提取模型处理方法的流程示意图；Figure 2 is a schematic flowchart of a feature extraction model processing method in one embodiment;

图3为一个实施例中训练特征向量和期望语义向量之间的映射关系图；Figure 3 is a mapping relationship diagram between training feature vectors and expected semantic vectors in one embodiment;

图4为一个实施例中特征学习效果比对图；Figure 4 is a comparison diagram of feature learning effects in one embodiment;

图5为一个实施例中特征提取模型的训练过程示意图；Figure 5 is a schematic diagram of the training process of the feature extraction model in one embodiment;

图6为一个实施例中样本检索方法的流程示意图；Figure 6 is a schematic flow chart of a sample retrieval method in one embodiment;

图7为一个实施例中通过索引进行检索的具体过程示意图；Figure 7 is a schematic diagram of the specific process of retrieval through indexing in one embodiment;

图8为一个实施例中特征提取模型处理装置的结构框图；Figure 8 is a structural block diagram of a feature extraction model processing device in one embodiment;

图9为一个实施例中样本检索装置的结构框图；Figure 9 is a structural block diagram of a sample retrieval device in one embodiment;

图10为一个实施例中计算机设备的内部结构图；Figure 10 is an internal structure diagram of a computer device in one embodiment;

图11为另一个实施例中计算机设备的内部结构图。Figure 11 is an internal structure diagram of a computer device in another embodiment.

具体实施方式Detailed ways

为了使本申请的目的、技术方案及优点更加清楚明白，以下结合附图及实施例，对本申请进行进一步详细说明。应当理解，此处描述的具体实施例仅仅用以解释本申请，并不用于限定本申请。In order to make the purpose, technical solutions and advantages of the present application more clear, the present application will be further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not used to limit the present application.

人工智能(Artificial Intelligence,AI)是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能，感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。换句话说，人工智能是计算机科学的一个综合技术，它企图了解智能的实质，并生产出一种新的能以人类智能相似的方式做出反应的智能机器。人工智能也就是研究各种智能机器的设计原理与实现方法，使机器具有感知、推理与决策的功能。Artificial Intelligence (AI) is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can respond in a similar way to human intelligence. Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.

人工智能技术是一门综合学科，涉及领域广泛，既有硬件层面的技术也有软件层面的技术。人工智能基础技术一般包括如传感器、专用人工智能芯片、云计算、分布式存储、大数据处理技术、操作/交互系统、机电一体化等技术。人工智能软件技术主要包括计算机视觉技术、语音处理技术、自然语言处理技术以及机器学习/深度学习、自动驾驶、智慧交通等几大方向。Artificial intelligence technology is a comprehensive subject that covers a wide range of fields, including both hardware-level technology and software-level technology. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, mechatronics and other technologies. Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, machine learning/deep learning, autonomous driving, smart transportation and other major directions.

计算机视觉技术(Computer Vision,CV)是一门研究如何使机器“看”的科学，更进一步的说，就是指用摄影机和电脑代替人眼对目标进行识别、测量等机器视觉，并进一步做图形处理，使电脑处理成为更适合人眼观察或传送给仪器检测的图像。作为一个科学学科，计算机视觉研究相关的理论和技术，试图建立能够从图像或者多维数据中获取信息的人工智能系统。计算机视觉技术通常包括图像处理、图像识别、图像语义理解、图像检索、OCR、视频处理、视频语义理解、视频内容识别、三维物体重建、3D技术、虚拟现实、增强现实、同步定位与地图构建、自动驾驶、智慧交通等技术，还包括常见的人脸识别、指纹识别等生物特征识别技术。Computer vision technology (Computer Vision, CV) is a science that studies how to make machines "see". Furthermore, it refers to using cameras and computers instead of human eyes to perform machine vision such as target identification and measurement, and further to do graphics. Processing, so that computer processing becomes an image more suitable for human eye observation or transmitted to instrument detection. As a scientific discipline, computer vision studies related theories and technologies, trying to build artificial intelligence systems that can obtain information from images or multi-dimensional data. Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous positioning and map construction, Technologies such as autonomous driving and smart transportation also include common biometric recognition technologies such as facial recognition and fingerprint recognition.

机器学习(Machine Learning,ML)是一门多领域交叉学科，涉及概率论、统计学、逼近论、凸分析、算法复杂度理论等多门学科。专门研究计算机设备怎样模拟或实现人类的学习行为，以获取新的知识或技能，重新组织已有的知识结构使之不断改善自身的性能。机器学习是人工智能的核心，是使计算机设备具有智能的根本途径，其应用遍及人工智能的各个领域。机器学习和深度学习通常包括人工神经网络、置信网络、强化学习、迁移学习、归纳学习、式教学习等技术。Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. Specializes in studying how computer equipment can simulate or implement human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve its performance. Machine learning is the core of artificial intelligence and the fundamental way to make computer equipment intelligent. Its applications cover all fields of artificial intelligence. Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, teaching learning and other technologies.

本申请实施例提供的方案涉及人工智能的计算机视觉、机器学习等技术，具体通过如下实施例进行说明：The solutions provided by the embodiments of this application involve artificial intelligence, computer vision, machine learning and other technologies, which are specifically explained through the following examples:

本申请提供的特征提取模型处理方法以及样本检索方法，可以应用于如图1所示的应用环境中。其中，终端102通过网络与服务器104进行通信。数据存储系统可以存储服务器104需要处理的数据，例如训练样本数据。数据存储系统可以集成在服务器104上，也可以放在云上或其他网络服务器上。终端102可以但不限于是笔记本电脑、智能手机、平板电脑、台式电脑、智能电视、车载终端和便携式可穿戴设备。终端上可以设有应用程序，通过该应用程序可以对输入样本实现检索，获得目标检索样本，该应用程序可以是指安装在终端中的客户端，客户端(又可称为应用客户端、APP客户端)是指安装并运行在终端中的程序；应用程序也可以是指免安装的应用程序，即无需下载安装即可使用的应用程序，这类应用程序又俗称小程序，它通常作为子程序运行于客户端中；应用程序还可以是指通过浏览器打开的web应用程序；等等。服务器104可以用独立的服务器或者是多个服务器组成的服务器集群或者云服务器来实现。可以理解的是，本申请实施例并不对终端和服务器的数量进行限制，终端可以是一个或者多个，服务器也可以是一个或者多个，具体可以根据需要进行设定。其中多个指的是至少两个。The feature extraction model processing method and sample retrieval method provided by this application can be applied in the application environment as shown in Figure 1. Among them, the terminal 102 communicates with the server 104 through the network. The data storage system can store data that the server 104 needs to process, such as training sample data. The data storage system can be integrated on the server 104, or placed on the cloud or other network servers. The terminal 102 may be, but is not limited to, a laptop computer, a smartphone, a tablet computer, a desktop computer, a smart TV, a vehicle-mounted terminal, and a portable wearable device. The terminal can be provided with an application program, through which the input sample can be retrieved and the target retrieval sample can be obtained. The application program can refer to a client installed in the terminal, and the client (also known as an application client, APP Client) refers to a program installed and run in the terminal; an application can also refer to an installation-free application, that is, an application that can be used without downloading and installing. This type of application is also commonly known as a small program, which is usually used as a sub-program. The program runs in the client; the application can also refer to a web application opened through a browser; etc. The server 104 can be implemented as an independent server or a server cluster or cloud server composed of multiple servers. It can be understood that the embodiments of the present application do not limit the number of terminals and servers. There can be one or more terminals and one or more servers. The specific settings can be set as needed. Where multiple refers to at least two.

可以理解的是，本申请实施例提供的特征提取模型处理方法以及样本检索方法，可以由终端102执行，也可以由服务器104执行，还可以由终端102和服务器104协同执行。例如，服务器104可以通过本申请实施例提供的特征提取模型处理方法获得目标特征提取模型，将目标特征提取模型发送至终端102，终端102基于该目标特征提取模型可以实现样本检索。It can be understood that the feature extraction model processing method and sample retrieval method provided by the embodiment of the present application can be executed by the terminal 102, the server 104, or the terminal 102 and the server 104. For example, the server 104 can obtain the target feature extraction model through the feature extraction model processing method provided in the embodiment of the present application, and send the target feature extraction model to the terminal 102. The terminal 102 can implement sample retrieval based on the target feature extraction model.

在一个实施例中，如图2所示，提供了一种特征提取模型处理方法，由计算机设备执行，该计算机设备可以是图1中的终端102，也可以是图1中的服务器104，还可以是终端102和服务器104所构成的系统。具体地，该特征提取模型处理方法包括以下步骤：In one embodiment, as shown in Figure 2, a feature extraction model processing method is provided, which is executed by a computer device. The computer device can be the terminal 102 in Figure 1, or the server 104 in Figure 1, or It may be a system composed of the terminal 102 and the server 104. Specifically, the feature extraction model processing method includes the following steps:

步骤202，通过待训练的特征提取模型对训练样本进行特征提取，得到训练特征向量。Step 202: Extract features from the training samples through the feature extraction model to be trained to obtain training feature vectors.

其中，训练样本指的是用于训练特征提取模型的内容样本，内容可以是文本、音频或者图像中的任意一种。训练样本存在对应的训练类别标签，训练样本对应的训练类别标签用于标识训练样本的类别。例如，若训练样本为图像，则训练样本对应的训练类别标签可以是狗、猫、鱼等动物品种，珊瑚、松树、桂花等植物品种，或者放大镜、柜子、水瓶等物件种类。训练样本对应的类别标签可以是一个或者多个，多个指的是至少两个。当训练样本存在多个类别标签时，表示该训练样本可以归类到多个不同的类别，以训练样本为图片为例，若是训练样本中同时包含猫和狗，那么该训练样本可以归类到猫类别，也可以归类到狗类别。Among them, training samples refer to content samples used to train the feature extraction model, and the content can be any one of text, audio, or images. There is a corresponding training category label for the training sample, and the training category label corresponding to the training sample is used to identify the category of the training sample. For example, if the training samples are images, the training category labels corresponding to the training samples can be animal species such as dogs, cats, and fish, plant species such as corals, pine trees, osmanthus, or object types such as magnifying glasses, cabinets, and water bottles. The category labels corresponding to the training samples can be one or more, and multiple refers to at least two. When a training sample has multiple category labels, it means that the training sample can be classified into multiple different categories. Taking the training sample as a picture as an example, if the training sample contains both cats and dogs, then the training sample can be classified into The cat category can also be classified into the dog category.

待训练的特征提取模型指的需要进行参数调整的特征提取模型。特征提取模型为用于进行特征提取并输出特征向量的机器学习模型，特征提取模型中至少包括embedding(嵌入)模型，embedding模型用于输出特征向量，embedding模型输出的特征向量可以称为embedding向量，即嵌入表示向量。embedding模型可以是一层或者多层全连接层(fullconnection)构成的模型。The feature extraction model to be trained refers to the feature extraction model that requires parameter adjustment. The feature extraction model is a machine learning model used to extract features and output feature vectors. The feature extraction model at least includes an embedding model. The embedding model is used to output feature vectors. The feature vector output by the embedding model can be called an embedding vector. That is, the embedding represents the vector. The embedding model can be a model composed of one or more fully connected layers (fullconnection).

在一个实施例中，embedding模型可以对输出的特征向量进行归一化，使得特征向量的各个特征分量取值范围为-1至1。在其他实施例中，embedding模型还可以对特征向量进行二值量化，得到二值量化特征。其中，二值量化指的是将特征进行二值编码的过程，例如，可以将特征编码为取值为0、1的二进制码，在编码的过程中还可以进行比特压缩，例如，可以将特征向量压缩到48位。对特征向量进行二值量化得到的特征可以称为哈希特征，此时，embedding模型可以称为哈希量化模型。In one embodiment, the embedding model can normalize the output feature vector so that each feature component of the feature vector ranges from -1 to 1. In other embodiments, the embedding model can also perform binary quantization on the feature vector to obtain binary quantized features. Among them, binary quantization refers to the process of binary encoding of features. For example, the features can be encoded into binary codes with values 0 and 1. Bit compression can also be performed during the encoding process. For example, the features can be Vectors are compressed to 48 bits. The features obtained by binary quantization of the feature vector can be called hash features. At this time, the embedding model can be called a hash quantization model.

在一个实施例中，特征提取模型可以仅包括embedding模型，embedding模型的输入端可以连接已训练的基础神经网络模型的输出端，接收基础神经网络模型的输出作为输入。其中的基础神经网络模型可以是用于提取内容所包含的特征信息的模型，基础神经网络模型可以是基于人工智能的神经网络，例如可以是卷积神经网络(ConvolutionalNeural Networks,CNN)，还可以是ResNet101(深度残差网络101)或者ResNet18(深度残差网络18)等网络。其中卷积神经网络是一类包含卷积计算且具有深度结构的前馈神经网络(Feedforward Neural Networks)。具体地，计算机设备在获取到训练样本后，将训练样本输入已训练的基础神经网络模型中，通过已训练的基础神经网络模型提取特征信息，将提取的特征信息输入待训练的embedding模型中，得到训练特征向量。举例说明，以训练样本为图像为例，可以将图像输入已训练的CNN模型中提取得到图像特征，将提取的图像特征输入embedding模型中得到图像的训练特征向量。In one embodiment, the feature extraction model may only include an embedding model. The input end of the embedding model may be connected to the output end of the trained basic neural network model and receive the output of the basic neural network model as input. The basic neural network model may be a model used to extract feature information contained in the content. The basic neural network model may be a neural network based on artificial intelligence, for example, it may be a convolutional neural network (Convolutional Neural Networks, CNN), or it may be Networks such as ResNet101 (deep residual network 101) or ResNet18 (deep residual network 18). The convolutional neural network is a type of feedforward neural network (Feedforward Neural Networks) that includes convolutional calculations and has a deep structure. Specifically, after acquiring the training sample, the computer device inputs the training sample into the trained basic neural network model, extracts feature information through the trained basic neural network model, and inputs the extracted feature information into the embedding model to be trained, Get the training feature vector. For example, taking the training sample as an image, the image can be input into the trained CNN model to extract the image features, and the extracted image features can be input into the embedding model to obtain the training feature vector of the image.

在其他实施例中，特征提取模型可以包括基础神经网络模型和embedding模型，此时基础神经网络模型和embedding模型作为特征提取模型共同进行训练。具体地，计算机设备在获取到训练样本后，将训练样本输入待训练的基础神经网络模型中，通过待训练的基础神经网络模型提取特征信息，将提取的特征信息输入待训练的embedding模型中，得到训练特征向量。In other embodiments, the feature extraction model may include a basic neural network model and an embedding model, in which case the basic neural network model and the embedding model are trained together as a feature extraction model. Specifically, after acquiring the training sample, the computer device inputs the training sample into the basic neural network model to be trained, extracts feature information through the basic neural network model to be trained, and inputs the extracted feature information into the embedding model to be trained, Get the training feature vector.

可以理解的是，针对不同类型的样本可以训练不同的特征提取模型，例如，若样本为图像类型的样本，那么可以训练用于提取图像特征的特征提取模型，若样本为语音类型的样本，那么可以训练用于提取语音特征的特征提取模型。It is understandable that different feature extraction models can be trained for different types of samples. For example, if the sample is an image type sample, then a feature extraction model for extracting image features can be trained. If the sample is a speech type sample, then Feature extraction models for extracting speech features can be trained.

在一个实施例中，计算机设备本地存储有带标注的训练样本集合，计算机设备可以从本地存储的训练样本集合中获取训练样本，将训练样本输入待训练的特征提取模型，通过特征提取模型输出训练样本对应的训练特征向量。可以理解的是，对于同一训练样本，在不同的特征提取模型下，可以得到不同的训练特征向量。例如，训练特征向量可以是embedding模型直接输出的D维特征向量，或者是对特征向量归一化后得到的特征，还可以是哈希特征。In one embodiment, the computer device locally stores a set of annotated training samples. The computer device can obtain training samples from the locally stored training sample set, input the training samples into a feature extraction model to be trained, and output training through the feature extraction model. The training feature vector corresponding to the sample. It is understandable that for the same training sample, different training feature vectors can be obtained under different feature extraction models. For example, the training feature vector can be a D-dimensional feature vector directly output by the embedding model, or a feature obtained by normalizing the feature vector, or it can also be a hash feature.

在一个实施例中，训练样本集合中的训练样本可以通过从互联网搜索引擎进行搜索的方式来进行标注，以训练样本为图像为例，具体来说，对于某个标签，可以在搜索引擎中输入该标签，取返回的前N个图像作为带该标签的图像。如在某个搜索引擎搜索金毛，保存返回的前500张图作为训练样本集中“金毛”标签的图像样本。In one embodiment, the training samples in the training sample set can be labeled by searching from an Internet search engine. Taking the training sample as an image as an example, specifically, for a certain label, you can enter it in the search engine For this label, take the first N images returned as images with this label. For example, if you search for golden retriever on a certain search engine, save the first 500 images returned as image samples with the "golden retriever" label in the training sample set.

步骤204，基于训练特征向量对训练样本进行分类预测，得到预测语义向量。Step 204: Classify and predict the training samples based on the training feature vectors to obtain predicted semantic vectors.

其中，分类预测指的是基于训练特征向量对训练样本所属的类别进行预测，以确定出训练样本所属的具体类别。预测语义向量指的是分类预测输出的概率向量，该概率向量可以表征训练样本的类别信息，是包含语义信息的概率向量。Among them, classification prediction refers to predicting the category to which the training sample belongs based on the training feature vector to determine the specific category to which the training sample belongs. The predicted semantic vector refers to the probability vector of the classification prediction output. This probability vector can represent the category information of the training sample and is a probability vector containing semantic information.

具体地，计算机设备可以基于训练特征向量对训练样本进行分类预测，得到预测语义向量。Specifically, the computer device can perform classification prediction on the training sample based on the training feature vector to obtain the predicted semantic vector.

在一个实施例中，计算机设备可以采用分类模型对训练特征向量进行分类，即将训练特征向量输入到分类模型中，通过分类模型输出预测语义向量。其中，分类模型指的是可以进行类别识别的机器学习模型。在一些实施例中，分类模型可以包括一层或者多层全连接层，当分类模型包括多层全连接层时，前几层全连接层可以对训练特征向量抽取语义特征，因此称可以为语义层Semantic_layer，最后一层全连接层用于分类，称为分类层。语义层的数量可以根据需要进行确定，对于需要更多特征交叉的情况可以加深语义层。通过设置语义层，可以抽象出高阶特征从而更充分的挖掘分类信息。In one embodiment, the computer device can use a classification model to classify the training feature vector, that is, input the training feature vector into the classification model, and output the predicted semantic vector through the classification model. Among them, the classification model refers to a machine learning model that can perform category recognition. In some embodiments, the classification model may include one or more fully connected layers. When the classification model includes multiple fully connected layers, the first few fully connected layers can extract semantic features from the training feature vector, so it is called semantic Layer Semantic_layer, the last fully connected layer is used for classification, called the classification layer. The number of semantic layers can be determined as needed, and the semantic layers can be deepened if more feature intersections are required. By setting up a semantic layer, high-order features can be abstracted to more fully mine classification information.

步骤206，获取训练样本对应的训练类别标签的标签语义向量，标签语义向量包括至少两个激活标签向量分量。Step 206: Obtain the label semantic vector of the training category label corresponding to the training sample. The label semantic vector includes at least two activation label vector components.

其中，标签语义向量用于对训练类别标签的语义进行表征。标签向量分量指的是标签语义向量中所包含的向量分量，例如，假设标签语义向量为(1，1，0，0)，则其中1、1、0、0为标签向量分量。激活标签向量分量指的是标签语义向量中被激活的标签向量分量，即该标签向量分量为激活值。例如，可以用1表示激活，用0表示未激活，则标签语义向量为(1，1，0，0)中包括两个激活标签向量分量。本申请实施例中，对于每一个训练类别标签，用包括至少两个激活标签向量分量的标签语义向量来表达其语义信息，相较于独热编码(one-bot)可以进行更为具体和准确的表达语义信息。Among them, the label semantic vector is used to characterize the semantics of the training category label. The label vector component refers to the vector component contained in the label semantic vector. For example, assuming that the label semantic vector is (1, 1, 0, 0), then 1, 1, 0, 0 are the label vector components. The activated label vector component refers to the activated label vector component in the label semantic vector, that is, the label vector component is the activation value. For example, 1 can be used to represent activation and 0 can be used to represent inactivity, then the label semantic vector is (1, 1, 0, 0), which includes two activation label vector components. In the embodiment of the present application, for each training category label, a label semantic vector including at least two activation label vector components is used to express its semantic information, which can be more specific and accurate than one-hot encoding (one-bot). express semantic information.

具体地，训练样本属于训练样本集合，训练样本集合中各个训练样本的训练类别标签组成标签集合，预先对标签集合中个训练类别标签生成标签语义向量，并建立标签语义向量和训练类别标签之间的关联关系，从而在进行训练时，计算机设备可以从预先建立的训练类别标签与标签语义向量之间的关联关系中，获取到训练类别标签的标签语义向量。Specifically, the training samples belong to the training sample set, and the training category labels of each training sample in the training sample set form a label set. A label semantic vector is generated for each training category label in the label set in advance, and a relationship between the label semantic vector and the training category label is established. Therefore, when training, the computer device can obtain the label semantic vector of the training category label from the pre-established correlation between the training category label and the label semantic vector.

在一个实施例中，可以对标签集合中的各个训练类别标签进行元素拆分，将每一个训练类别标签拆分成多个元素，抽取各个训练类别标签之间的共现元素，如森林、公园都有草地元素，基于抽取出的所有元素的数量确定标签语义向量的向量维数，每一个元素对应一个特征位，特征位指的是特征分量所在的排序位置，进而可以生成各个训练类别标签的标签语义向量。举例说明，假设标签集合中包括三个训练类别标签，其中训练类别标签1包括三个元素(a，b，c)，训练类别标签2包括三个元素(a，d，e)，训练类别标签3包括三个元素(c，e)，则可以确定训练类别标签1的标签语义向量为(1，1，1，0，0)，训练类别标签2的标签语义向量为(1，0，0，1，1)，训练类别标签3的标签语义向量为(0，0，1，0，1)。In one embodiment, each training category label in the label set can be element-split, each training category label can be split into multiple elements, and co-occurring elements between each training category label can be extracted, such as forest and park. There are grass elements. The vector dimension of the label semantic vector is determined based on the number of all extracted elements. Each element corresponds to a feature bit. The feature bit refers to the sorting position of the feature component, and then the label of each training category can be generated. Label semantic vector. For example, assume that the label set includes three training category labels, where training category label 1 includes three elements (a, b, c), training category label 2 includes three elements (a, d, e), and training category label 3 includes three elements (c, e), then it can be determined that the label semantic vector of training category label 1 is (1, 1, 1, 0, 0), and the label semantic vector of training category label 2 is (1, 0, 0) , 1, 1), the label semantic vector of training category label 3 is (0, 0, 1, 0, 1).

在另一个实施例中，计算机设备还可以通过已训练的word2vec对标签集合中各个训练列表标签生成初始语义向量，将初始语义向量中各个特征分量与语义激活阈值进行比较，若某个特征分量大于激活阈值，则将该特征分量确定为激活特征分量，并设置该特征位的值为激活值，激活值例如可以是1，若某个特征分量小于激活阈值，则将该特征分量确定为非激活特征分量，并设置该特征位的值为非激活值，非激活值例如可以是0，最终得到标签语义向量。In another embodiment, the computer device can also generate an initial semantic vector for each training list label in the label set through the trained word2vec, and compare each feature component in the initial semantic vector with the semantic activation threshold. If a feature component is greater than If the activation threshold is set, the feature component is determined to be the activated feature component, and the value of the feature bit is set to the activation value. The activation value can be, for example, 1. If a feature component is less than the activation threshold, the feature component is determined to be inactive. feature component, and set the value of the feature bit to an inactive value. The inactive value can be, for example, 0, and finally the label semantic vector is obtained.

步骤208，基于标签语义向量，确定训练样本的期望语义向量，期望语义向量中，包含与至少两个激活标签向量分量的位置分布对应的期望激活向量分量。Step 208: Determine the expected semantic vector of the training sample based on the label semantic vector. The expected semantic vector includes the expected activation vector component corresponding to the position distribution of at least two activation label vector components.

其中，期望语义向量指的是对训练样本进行分类预测时所期望输出的语义向量，该语义向量可以作为训练过程中的监督信息，使得特征提取模型在训练过程中可以学习到语义信息。期望激活向量分量指的是被激活的期望向量分量，这里的期望向量分量指的是期望语义向量中所包含的向量分量，例如，假设期望语义向量为(1，0，0，1)，则其中1、1、0、0为期望向量分量。期望语义向量中包含与至少两个激活标签向量分量的位置分布对应的期望激活向量分量，即如果标签语义向量中在某个特征位上具有激活标签向量分量，则在期望语义向量中的对应位置处存在期望激活向量分量。举个例子，假设标签语义向量为(1，1，0，0)，该标签语义向量在第一个特征位和第二个特征位均存在激活向量分量1，那么在期望语义向量中，第一个特征位和第二个特征位必然为1。Among them, the expected semantic vector refers to the semantic vector expected to be output when classifying and predicting training samples. This semantic vector can be used as supervision information in the training process, so that the feature extraction model can learn semantic information during the training process. The expected activation vector component refers to the activated expected vector component. The expected vector component here refers to the vector component contained in the expected semantic vector. For example, assuming that the expected semantic vector is (1, 0, 0, 1), then Among them, 1, 1, 0, 0 are the expected vector components. The expected semantic vector contains the expected activation vector component corresponding to the position distribution of at least two activation label vector components, that is, if the label semantic vector has an activation label vector component on a certain feature bit, then the corresponding position in the expected semantic vector There is a desired activation vector component at . For example, assuming that the label semantic vector is (1, 1, 0, 0), the label semantic vector has activation vector component 1 in both the first feature bit and the second feature bit, then in the expected semantic vector, the One flag and the second flag must be 1.

具体地，训练样本对应的训练类别标签是用于表征训练样本所属的类别的，训练类别标签的标签语义向量可以表达训练样本的语义信息，因此，基于标签语义向量，计算机设备可以确定训练样本的期望语义向量，由于期望语义向量中，包含与至少两个激活标签向量分量的位置分布对应的期望激活向量分量，因此可以包含标签语义向量所表达的语义信息。Specifically, the training category label corresponding to the training sample is used to characterize the category to which the training sample belongs. The label semantic vector of the training category label can express the semantic information of the training sample. Therefore, based on the label semantic vector, the computer device can determine the training sample. The expected semantic vector, because the expected semantic vector contains expected activation vector components corresponding to the position distribution of at least two activation label vector components, can therefore contain semantic information expressed by the label semantic vector.

在一个实施例中，当训练样本对应一个训练类别标签时，该训练类别标签所代表的语义信息即为训练样本所包含的语义信息，因此计算机设备可以将该标签语义向量作为训练样本的期望语义向量。In one embodiment, when a training sample corresponds to a training category label, the semantic information represented by the training category label is the semantic information contained in the training sample, so the computer device can use the label semantic vector as the expected semantics of the training sample. vector.

在另一个实施例中，当训练样本对应多个训练类别标签时，期望语义向量需要能够表达各个训练类别标签的语义信息，此时，计算机设备可以对各个标签语义向量进行融合，得到能够对各个训练类别标签的语义信息进行联合表达的期望语义向量。In another embodiment, when the training sample corresponds to multiple training category labels, it is expected that the semantic vector needs to be able to express the semantic information of each training category label. At this time, the computer device can fuse the semantic vectors of each label to obtain the semantic vector of each training category. The semantic information of the training category labels is jointly expressed as an expected semantic vector.

在一个实施例中，计算机设备在对各个标签语义向量进行融合时，保证期望语义向量的维数和标签语义向量的维数一致，以避免因为标签数量过多导致期望语义向量维数过大。In one embodiment, when fusing each tag semantic vector, the computer device ensures that the dimension of the expected semantic vector is consistent with the dimension of the tag semantic vector, so as to avoid the dimension of the expected semantic vector being too large due to too many tags. .

步骤210，基于期望语义向量中各个期望激活向量分量，与预测语义向量中各自对应位置的预测向量分量之间的差异，确定分类损失。Step 210: Determine the classification loss based on the difference between each expected activation vector component in the expected semantic vector and the predicted vector component at the corresponding position in the predicted semantic vector.

其中，期望激活向量分量与预测向量分量之间位置对应指的是，期望激活向量分量在期望语义向量中的排序位置，与预测向量分量在预测语义向量中的排序位置相同。举例说明，假设期望语义向量为(a1，a2，a3，a4)，预测语义向量为(b1，b2，b3，b4)，则a1与b1位置对应，a2与b2位置对应，a3与b3位置对应，a4与b4位置对应。The position correspondence between the expected activation vector component and the predicted vector component refers to the sorting position of the expected activation vector component in the expected semantic vector, which is the same as the sorted position of the predicted vector component in the predicted semantic vector. For example, assuming that the expected semantic vector is (a1, a2, a3, a4) and the predicted semantic vector is (b1, b2, b3, b4), then a1 corresponds to the b1 position, a2 corresponds to the b2 position, and a3 corresponds to the b3 position. , a4 corresponds to the position of b4.

具体地，由于期望语义向量中，包含与至少两个激活标签向量分量的位置分布对应的期望激活向量分量，因此，期望语义向量中包含至少两个期望激活向量分量，对于每一个期望激活向量分量，计算机设备可以计算该期望激活向量分量与预测语义向量中对应位置的预测向量分量之间的差异，对各个期望激活向量分量各自对应的差异进行统计，以确定分类损失。Specifically, since the expected semantic vector contains expected activation vector components corresponding to the position distribution of at least two activation label vector components, therefore, the expected semantic vector contains at least two expected activation vector components, for each expected activation vector component , the computer device can calculate the difference between the expected activation vector component and the predicted vector component at the corresponding position in the predicted semantic vector, and perform statistics on the corresponding differences of each expected activation vector component to determine the classification loss.

可以理解的是，期望语义向量中可能还包括非期望激活向量分量，对于每一个非期望激活向量分量，计算机设备可以计算该非期望激活向量分量与预测语义向量中对应位置的预测向量分量之间的差异，最后统计各个期望激活向量分量各自对应的差异以及各个非期望激活向量分量各自对应的差异，得到分类损失。It can be understood that the expected semantic vector may also include undesired activation vector components. For each undesired activation vector component, the computer device can calculate the relationship between the undesired activation vector component and the predicted vector component at the corresponding position in the predicted semantic vector. Finally, the corresponding differences of each expected activation vector component and the corresponding differences of each undesired activation vector component are counted to obtain the classification loss.

步骤212，基于分类损失对特征提取模型进行训练，当满足训练停止条件时，得到目标特征提取模型；目标特征提取模型用于提取输入样本的样本特征向量。Step 212: Train the feature extraction model based on the classification loss. When the training stop condition is met, the target feature extraction model is obtained; the target feature extraction model is used to extract the sample feature vector of the input sample.

具体地，计算机设备可以基于分类损失调整特征提取模型的模型参数并继续训练，直至满足训练停止条件时，得到目标特征提取模型。得到的目标特征提取模型用于提取输入样本的样本特征向量，通过该样本特征向量可以从样本数据库中检索相似样本，通过该样本特征向量还可以对输入样本进行分类识别，确定输入样本所属的类别。Specifically, the computer device can adjust the model parameters of the feature extraction model based on the classification loss and continue training until the training stop condition is met, and the target feature extraction model is obtained. The obtained target feature extraction model is used to extract the sample feature vector of the input sample. Similar samples can be retrieved from the sample database through the sample feature vector. The input sample can also be classified and identified through the sample feature vector to determine the category to which the input sample belongs. .

在一个实施例中，在训练过程中，计算机设备可以使用随机梯度下降算法、Adagrad((Adaptive Gradient，自适应梯度)算法、Adadelta(AdaGrad算法的改进)、RMSprop(AdaGrad算法的改进)、Adam(Adaptive Moment Estimation，自适应矩估计)算法等中的任意一种来调整特征提取模型的模型参数。In one embodiment, during the training process, the computer device may use a stochastic gradient descent algorithm, Adagrad (Adaptive Gradient, adaptive gradient) algorithm, Adadelta (an improvement of the AdaGrad algorithm), RMSprop (an improvement of the AdaGrad algorithm), Adam ( Adaptive Moment Estimation, adaptive moment estimation) algorithm, etc. to adjust the model parameters of the feature extraction model.

在一个实施例中，训练停止条件可以是模型参数不再发生变化，也可以是损失到达最小值，还可以是训练次数达到最大迭代次数等等。在其他实施例中，计算机设备可以对特征提取模型进行多轮(epoch)迭代训练，每轮训练完成，都会得到该轮的目标损失，则训练停止条件可以是某epoch下平均目标损失不再下降。In one embodiment, the training stop condition may be that the model parameters no longer change, the loss may reach a minimum value, the training number may reach a maximum iteration number, and so on. In other embodiments, the computer device can perform multiple rounds (epoch) of iterative training on the feature extraction model. After each round of training is completed, the target loss of the round will be obtained. The training stop condition may be that the average target loss no longer decreases in a certain epoch. .

上述特征提取模型处理方法中，上述特征提取模型处理方法、装置、计算机设备、存储介质和计算机程序产品，通过待训练的特征提取模型对训练样本进行特征提取，得到训练特征向量，基于训练特征向量对训练样本进行分类预测，得到预测语义向量，获取训练样本对应的训练类别标签的标签语义向量，基于标签语义向量，确定训练样本的期望语义向量，进而基于期望语义向量中各个期望激活向量分量，与预测语义向量中各自对应位置的预测向量分量之间的差异，确定分类损失，通过该分类损失来训练特征提取模型，最终得到的目标特征提取模型可以学到趋向于标签语义向量的特征，从而可以在训练特征向量所在向量空间中对标签语义向量所在的语义空间进行模拟，使得提取的特征向量中可以储存表征语义的特征信息，提高了特征提取的准确性，并且由于标签语义向量包括至少两个激活标签向量分量，可以对训练类别标签进行更加准确地表征，而期望语义向量是基于标签语义向量确定的，并且包含与至少两个激活标签向量分量的位置分布对应的期望激活向量分量，可以使得计算得到的分类损失更加准确，进一步提高了特征提取的准确性。In the above feature extraction model processing method, the above feature extraction model processing method, device, computer equipment, storage medium and computer program product perform feature extraction on the training sample through the feature extraction model to be trained to obtain a training feature vector, based on the training feature vector Classify and predict the training samples to obtain the predicted semantic vector, obtain the label semantic vector of the training category label corresponding to the training sample, determine the expected semantic vector of the training sample based on the label semantic vector, and then based on each expected activation vector component in the expected semantic vector, The difference between the predicted vector components at the corresponding positions in the predicted semantic vector determines the classification loss, and the feature extraction model is trained through this classification loss. The final target feature extraction model can learn features that tend to the label semantic vector, thereby The semantic space where the label semantic vector is located can be simulated in the vector space where the training feature vector is located, so that the extracted feature vector can store feature information representing the semantics, which improves the accuracy of feature extraction, and because the label semantic vector includes at least two The activation label vector components can more accurately represent the training category labels, and the expected semantic vector is determined based on the label semantic vector and contains the expected activation vector components corresponding to the position distribution of at least two activation label vector components, which can This makes the calculated classification loss more accurate and further improves the accuracy of feature extraction.

在一个实施例中，训练样本对应多个训练类别标签，基于标签语义向量，确定训练样本的期望语义向量包括：将各个标签语义向量中，排序位置相同的标签向量分量组成特征分量集合；对于包含激活标签向量分量的特征分量集合，将特征分量集合所属排序位置处的期望向量分量置为激活值；对于不包含激活标签向量分量的特征分量集合，将特征分量集合所属排序位置处的期望向量分量置为非激活值；各个期望向量分量按照各自对应的排序位置进行组合形成训练样本的期望语义向量。In one embodiment, the training sample corresponds to multiple training category labels. Based on the label semantic vector, determining the expected semantic vector of the training sample includes: forming a feature component set by label vector components with the same sorting position in each label semantic vector; for Activate the feature component set of the label vector component, and set the expected vector component at the sorting position to which the feature component set belongs as the activation value; for the feature component set that does not contain the activation label vector component, set the expected vector component at the sorting position to which the feature component set belongs. Set to an inactive value; each expected vector component is combined according to its corresponding sorting position to form the expected semantic vector of the training sample.

其中，排序位置指的指标签向量分量在标签语义向量中的位置，例如对于标签语义向量(a，b，c)，其中a的排序位置为第一位，b的排序位置为第二位，c的排序位置为第三位。Among them, the sorting position refers to the position of the label vector component in the label semantic vector. For example, for the label semantic vector (a, b, c), the sorting position of a is the first position, and the sorting position of b is the second position. The sorting position of c is third.

具体地，计算机设备可以将各个标签语义向量中，排序位置相同的标签向量分量组成特征分量集合，从而对应于每一个排序位置，可以得到包含多个标签向量分量的特征分量集合，对于每一特征分量集合，计算机设备可以判断该特征分量集合中是否包含激活标签向量分量，若该特征分量集合中包含激活标签向量分量，则可以将该特征分量集合所属排序位置处的期望向量分量置为激活值，若该特征分量集合中不包含激活标签向量分量，则可以将该特征分量集合所属排序位置处的期望向量分量置为非激活值，最后将期望向量分量按照各自对应的排序位置进行组合形成训练样本的期望语义向量。Specifically, the computer device can combine the label vector components with the same sorting position in each label semantic vector to form a feature component set, so that corresponding to each sorting position, a feature component set containing multiple label vector components can be obtained. For each feature Component set, the computer device can determine whether the feature component set contains an activation label vector component, and if the feature component set contains an activation label vector component, the expected vector component at the sorting position to which the feature component set belongs can be set as the activation value , if the feature component set does not contain an activation label vector component, the expected vector component at the sorting position to which the feature component set belongs can be set to an inactive value, and finally the expected vector components are combined according to their corresponding sorting positions to form training The expected semantic vector of the sample.

举例说明，假设训练样本对应两个训练类别标签A和B，其中A的标签语义向量为(1，0，1，0)，B的标签语义向量为(0，0，1，1)，其中排序位置为第一位的特征分量集合中包含激活标签向量分量1，则该排序位置处的期望向量分量为1，排序位置为第二位的特征分量集合中不包含激活标签向量分量，则该排序位置处的期望向量分量为0，排序位置为第三位的特征分量集合中包含激活标签向量分量1，则该排序位置处的期望向量分量为1，排序位置为第四位的特征分量集合中包含激活标签向量分量1，则该排序位置处的期望向量分量为1，最终得到的期望语义向量为(1，0，1，1)。For example, assume that the training sample corresponds to two training category labels A and B, where the label semantic vector of A is (1, 0, 1, 0), and the label semantic vector of B is (0, 0, 1, 1), where If the feature component set at the first sorting position contains the activation label vector component 1, then the expected vector component at the sorting position is 1, and the feature component set at the second sorting position does not contain the activation label vector component, then the expected vector component at the sorting position is 1. The expected vector component at the sorting position is 0, and the feature component set at the third sorting position contains the activation label vector component 1, then the expected vector component at the sorting position is 1, and the feature component set at the fourth sorting position contains activation label vector component 1, then the expected vector component at this sorting position is 1, and the final expected semantic vector is (1, 0, 1, 1).

上述实施例中，通过将包含激活标签向量分量的特征分量集合所属排序位置处的期望向量分量置为激活值，将不包含激活标签向量分量的特征分量集合所属排序位置处的期望向量分量置为非激活值，可以使得得到的期望语义向量通过有限的特征位表现无限的多标签，从而节省损失计算过程中的计算量，提高训练效率。In the above embodiment, by setting the expected vector component at the sorting position of the feature component set containing the activation label vector component as the activation value, the expected vector component at the sorting position of the feature component set not containing the activation label vector component is set as The non-activation value can make the expected semantic vector represent unlimited multiple labels through limited feature bits, thereby saving the amount of calculation in the loss calculation process and improving training efficiency.

在一个实施例中，基于期望语义向量中各个期望激活向量分量，与预测语义向量中各自对应位置的预测向量分量之间的差异，确定分类损失，包括：针对期望语义向量中的每一个期望向量分量，基于期望向量分量与预测语义向量中排序位置对应的预测向量分量之间的差异，确定期望向量分量所在排序位置处的分类损失分量初始值；对期望向量分量所在排序位置处的特征分量集合中的激活标签向量分量进行统计，基于统计结果确定期望向量分量所在排序位置处的激活程度；基于激活程度对分类损失分量初始值进行加权处理，得到分类损失分量目标值；统计各个分类损失分量目标值，以确定分类损失。In one embodiment, determining the classification loss based on the difference between each expected activation vector component in the expected semantic vector and the predicted vector component at the respective corresponding position in the predicted semantic vector includes: for each expected vector in the expected semantic vector Component, based on the difference between the expected vector component and the predicted vector component corresponding to the sorted position in the predicted semantic vector, determine the initial value of the classification loss component at the sorted position where the expected vector component is located; for the feature component set at the sorted position where the expected vector component is located The activation label vector components in are counted, and the activation degree at the sorted position of the expected vector component is determined based on the statistical results; the initial value of the classification loss component is weighted based on the activation degree to obtain the classification loss component target value; each classification loss component target is counted value to determine the classification loss.

其中，统计可以是对激活标签向量分量的数量进行统计。基于统计结果得到的激活程度与统计结果呈正相关，即统计得到的数量值越大，激活程度越高，反之，统计得到的数量值越小，激活程度越低。The statistics may include statistics on the number of activation label vector components. The degree of activation based on statistical results is positively correlated with the statistical results, that is, the larger the numerical value obtained by statistics, the higher the degree of activation. On the contrary, the smaller the numerical value obtained by statistics, the lower the degree of activation.

具体地，计算机设备在确定了每一个期望向量分量所在排序位置处的分类损失分量初始值后，对于每一排序位置，可以统计该排序位置处的特征分量集合中的激活标签向量分量的数量，得到数量值，进而基于该数量值确定该排序位置处的激活程度，将激活程度作为该排序位置处的分类损失分量初始值的权重，将该权重与分类损失分量初始值相乘得到分类损失分量目标值，最后计算机设备可以将各个分类损失分量目标值相加，以确定分类损失。Specifically, after the computer device determines the initial value of the classification loss component at the sorting position where each desired vector component is located, for each sorting position, the number of activation label vector components in the feature component set at the sorting position can be counted, Obtain the quantitative value, and then determine the activation degree at the sorting position based on the quantitative value. Use the activation degree as the weight of the initial value of the classification loss component at the sorting position. Multiply this weight with the initial value of the classification loss component to obtain the classification loss component. target value, and finally the computer device can add the individual classification loss component target values to determine the classification loss.

在一个实施例中，计算机设备基于统计结果确定样本特征分量所在排序位置处的激活程度可以是：对于包含激活标签向量分量的特征分量集合，将统计得到的数量值作为该特征分量集合所在排序位置处的激活程度，而对于不包含激活标签向量分量的特征分量集合，数量值为0，则可以确定该特征分量集合所在排序位置处的激活程度为1。In one embodiment, the computer device determines the activation degree at the sorting position of the sample feature component based on the statistical results by: for a feature component set containing the activation label vector component, the statistically obtained quantitative value is used as the sorting position of the feature component set. The activation degree at the location, and for the feature component set that does not contain the activation label vector component, the quantity value is 0, then it can be determined that the activation degree at the sorted position of the feature component set is 1.

在其他实施例中，计算机设备在统计得到数量值后，可以进一步对数量值进行归一化处理，以得到最终的激活程度，其中，为避免忽略数量值为0的排序位置处的损失分量，可以将各个排序位置处统计得到的数量值加上预设的偏置值得到目标数量值，则激活程度可以是(目标数量值/各个排序位置对应的目标数量值之和)，其中预设的偏置值例如可以是1。举例说明，如上面的例子中，对于训练类别标签A和B，可以统计得到各个排序位置对应的数量值为1、0、2、1，加上预设的偏置值1后，得到目标数量值分别为2、1、3、2，则最后计算得到激活程度分别为1/4、1/8、3/8、1/4。In other embodiments, after the computer device obtains the quantitative value statistically, it can further normalize the quantitative value to obtain the final activation degree, wherein, in order to avoid ignoring the loss component at the sorting position where the quantitative value is 0, The target quantity value can be obtained by adding the statistical quantity value at each sorting position to the preset offset value. Then the activation degree can be (target quantity value/sum of the target quantity values corresponding to each sorting position), where the preset The offset value may be 1, for example. For example, in the above example, for the training category labels A and B, the quantity values corresponding to each sorting position can be statistically obtained as 1, 0, 2, 1. After adding the preset offset value of 1, the target quantity is obtained. The values are 2, 1, 3, and 2 respectively, and the final calculated activation levels are 1/4, 1/8, 3/8, and 1/4 respectively.

在一个实施例中，计算机设备在可以基于期望向量分量与预测语义向量中排序位置对应的预测向量分量之间的差异进行交叉熵损失计算，得到期望向量分量所在排序位置处的分类损失分量初始值。In one embodiment, the computer device can perform cross-entropy loss calculation based on the difference between the expected vector component and the predicted vector component corresponding to the sorted position in the predicted semantic vector, and obtain an initial value of the classification loss component at the sorted position where the expected vector component is located. .

上述实施例中，通过对期望向量分量所在排序位置处的特征分量集合中的激活标签向量分量进行统计，基于统计结果确定期望向量分量所在排序位置处的激活程度，基于激活程度对分类损失分量初始值进行加权处理，得到分类损失分量目标值，得到的损失分类目标值可以准确的反映各个分类损失分量初始值对分类损失的共享，进而统计各个分类损失分量目标值，以确定分类损失，可以得到更为准确的分类损失。In the above embodiment, by counting the activation label vector components in the feature component set at the sorted position where the expected vector component is located, the activation degree at the sorted position where the expected vector component is located is determined based on the statistical results, and the classification loss component is initialized based on the activation degree. The value is weighted to obtain the classification loss component target value. The obtained loss classification target value can accurately reflect the sharing of the classification loss by the initial value of each classification loss component, and then count the target values of each classification loss component to determine the classification loss. It can be obtained More accurate classification loss.

在一个实施例中，在基于分类损失对特征提取模型进行训练，当满足训练停止条件时，得到目标特征提取模型之前，上述方法还包括：基于训练样本的训练特征向量和期望语义向量之间的差异，确定语义约束损失，语义约束损失用于对特征提取模型所提取的训练特征向量进行语义约束；基于分类损失对特征提取模型进行训练，当满足训练停止条件时，得到目标特征提取模型，包括：基于语义约束损失和分类损失，确定目标损失；基于目标损失对特征提取模型进行训练，当满足训练停止条件时，得到目标特征提取模型。In one embodiment, before training the feature extraction model based on the classification loss and obtaining the target feature extraction model when the training stop condition is met, the above method further includes: based on the training sample between the training feature vector and the expected semantic vector. Difference, determine the semantic constraint loss, the semantic constraint loss is used to semantically constrain the training feature vector extracted by the feature extraction model; the feature extraction model is trained based on the classification loss, and when the training stop condition is met, the target feature extraction model is obtained, including : Determine the target loss based on semantic constraint loss and classification loss; train the feature extraction model based on the target loss. When the training stop condition is met, the target feature extraction model is obtained.

其中，语义约束损失与训练样本的训练特征向量和期望语义向量之间的差异呈正相关，用于对特征提取模型所提取的训练特征向量进行语义约束。通过对训练特征向量进行语义约束，可以使得训练特征向量可以学习到期望语义向量中的语义信息的分布，使得训练特征向量所表达的语义信息更加准确。Among them, the semantic constraint loss is positively related to the difference between the training feature vector and the expected semantic vector of the training sample, and is used to semantically constrain the training feature vector extracted by the feature extraction model. By applying semantic constraints to the training feature vector, the training feature vector can learn the distribution of semantic information in the expected semantic vector, making the semantic information expressed by the training feature vector more accurate.

具体地，计算机设备可以基于训练样本的训练特征向量和期望语义向量之间的差异进行回归损失计算，得到语义约束损失，进而基于语义约束损失和分类损失，确定目标损失，最终基于目标损失对特征提取模型进行参数调整并继续损失，直至训练停止条件时，得到目标特征提取模型。Specifically, the computer device can perform regression loss calculation based on the difference between the training feature vector of the training sample and the expected semantic vector to obtain the semantic constraint loss, and then determine the target loss based on the semantic constraint loss and the classification loss, and finally determine the feature based on the target loss. The extraction model performs parameter adjustment and continues loss until the training stop condition is reached, and the target feature extraction model is obtained.

上述实施例中，通过计算语义约束损失，结合语义约束损失和分类损失进行模型训练，可以使得特征提取模型学习到更为准确的语义信息，进一步提升特征提取的准确性。In the above embodiment, by calculating the semantic constraint loss and combining the semantic constraint loss and the classification loss for model training, the feature extraction model can learn more accurate semantic information and further improve the accuracy of feature extraction.

可以理解的是，为了保证提取得到的训练特征向量能够准确地学习到期望语义向量中的语义信息分布，必须保证训练特征向量的维数大于或者等于期望语义向量的维数，基于此，本申请提供以下实施例。It can be understood that in order to ensure that the extracted training feature vector can accurately learn the semantic information distribution in the expected semantic vector, it must be ensured that the dimension of the training feature vector is greater than or equal to the dimension of the expected semantic vector. Based on this, this application The following examples are provided.

在一个实施例中，基于训练样本的训练特征向量和期望语义向量之间的差异，确定语义约束损失，包括：若训练特征向量的维数和期望语义向量的维数一致，则计算训练特征向量和期望语义向量之间的向量距离，将向量距离作为训练样本对应的语义约束损失。In one embodiment, determining the semantic constraint loss based on the difference between the training feature vector of the training sample and the expected semantic vector includes: if the dimension of the training feature vector is consistent with the dimension of the expected semantic vector, then calculating the training feature The vector distance between the vector and the expected semantic vector, using the vector distance as the semantic constraint loss corresponding to the training sample.

本实施例中，若训练特征向量的维数和期望语义向量的维数一致时，那么对于训练特征向量中的每一个训练向量分量在期望语义向量中都存在对应的期望向量分量，即训练特征向量中的训练向量分量与期望语义向量中的期望向量分量存在一一映射关系。举例说明，参考图3中的(a)图，假设训练特征向量和期望语义向量的维数均为3，(a)图中的黑色圆圈代表期望语义向量中的期望向量分量，白色圆圈代表训练特征向量中的训练向量分量，由(a)图可以看出训练向量分量与期望向量分量存在一一映射关系。In this embodiment, if the dimension of the training feature vector is consistent with the dimension of the expected semantic vector, then for each training vector component in the training feature vector, there is a corresponding expected vector component in the expected semantic vector, that is, training There is a one-to-one mapping relationship between the training vector component in the feature vector and the expectation vector component in the expected semantic vector. For example, refer to (a) in Figure 3. Assume that the dimensions of the training feature vector and the expected semantic vector are both 3. The black circle in (a) represents the expected vector component in the expected semantic vector, and the white circle represents the training As for the training vector component in the feature vector, it can be seen from Figure (a) that there is a one-to-one mapping relationship between the training vector component and the expected vector component.

本实施例中，考虑到训练特征向量和期望语义向量为维数相同的向量，因此计算机设备可以直接计算训练特征向量和期望语义向量之间的向量距离得到回归损失，从而得到训练样本对应的语义约束损失。在具体实施时，向量距离例如可以是L2距离。In this embodiment, considering that the training feature vector and the expected semantic vector are vectors with the same dimension, the computer device can directly calculate the vector distance between the training feature vector and the expected semantic vector to obtain the regression loss, thereby obtaining the semantics corresponding to the training sample. Constraint loss. In specific implementation, the vector distance may be, for example, L2 distance.

上述实施例中，通过训练特征向量和期望语义向量之间的向量距离来得到语义约束损失，可以提高回归损失的计算效率，从而提高训练效率。In the above embodiment, the semantic constraint loss is obtained by using the vector distance between the training feature vector and the expected semantic vector, which can improve the calculation efficiency of the regression loss, thereby improving the training efficiency.

考虑在对样本进行表征时，需要对样本进行整体表征，而分类信息只是样本的一种高度抽象，对于相同分类下的不同样本，需要在表征分类的语义信息之后引入样本特有的信息以区分相同分类下的不同样本，因此训练特征向量的维数可以大于期望语义向量的维数的，以留出一定的特征位作为分类信息之外的补充，基于此，本申请进一步提供以下实施例。Consider that when characterizing a sample, it is necessary to characterize the sample as a whole, and the classification information is only a high degree of abstraction of the sample. For different samples under the same classification, it is necessary to introduce sample-specific information after the semantic information that characterizes the classification to distinguish the same Different samples under classification, therefore the dimension of the training feature vector can be larger than the dimension of the expected semantic vector, so as to leave certain feature bits as a supplement to the classification information. Based on this, this application further provides the following embodiments.

在一个实施例中，基于训练样本的训练特征向量和期望语义向量之间的差异，确定语义约束损失，包括：若训练特征向量的维数大于期望语义向量的维数，则对于期望语义向量中每个期望向量分量，从所述训练特征向量中选取一个训练向量分量，将选取的训练向量分量作为所述期望向量分量的映射向量分量；计算期望向量分量与映射向量分量之间的差异，得到期望向量分量对应的语义约束损失分量；统计各个语义约束损失分量，得到语义约束损失。In one embodiment, determining the semantic constraint loss based on the difference between the training feature vector of the training sample and the expected semantic vector includes: if the dimension of the training feature vector is greater than the dimension of the expected semantic vector, then for the expected semantic vector For each desired vector component, select a training vector component from the training feature vector, and use the selected training vector component as the mapping vector component of the desired vector component; calculate the difference between the desired vector component and the mapping vector component, and obtain Expect the semantic constraint loss component corresponding to the vector component; count each semantic constraint loss component to obtain the semantic constraint loss.

本实施例中，训练特征向量的维数大于期望语义向量的维数，那么在计算语义约束损失时，对于期望语义向量中每个期望向量分量，可以先从训练特征向量中确定与期望向量分量对应的训练向量分量，将确定的训练向量分量作为期望向量分量的映射向量分量，即在学习的过程中，用各个映射向量分量所在的特征位去学习各自对应的期望向量分量，那么可以基于期望向量分量和映射向量分量之间的差异进行回归损失计算，得到期望向量分量对应的语义约束损失分量，最后统计各个语义约束损失分量，得到语义约束损失分量。这里的统计可以是求和或者求平均。In this embodiment, the dimension of the training feature vector is greater than the dimension of the expected semantic vector. Then when calculating the semantic constraint loss, for each expected vector component in the expected semantic vector, the corresponding expected vector component can first be determined from the training feature vector. For the corresponding training vector component, the determined training vector component is used as the mapping vector component of the expected vector component. That is, in the learning process, the characteristic bits of each mapping vector component are used to learn the corresponding expected vector components. Then it can be based on the expected The difference between the vector component and the mapping vector component is used for regression loss calculation to obtain the semantic constraint loss component corresponding to the expected vector component. Finally, each semantic constraint loss component is counted to obtain the semantic constraint loss component. The statistics here can be summation or averaging.

可以理解的是，训练特征向量中未被选择的训练向量分量所在的特征位，即为可以用来补充类别信息之外的样本信息的表征位。It can be understood that the feature bits where the unselected training vector components in the training feature vector are located are representation bits that can be used to supplement sample information other than category information.

在一个实施例中，对于期望语义向量中每个期望向量分量，计算机设备可以按照一定的规则从训练特征向量中确定与该期望向量分量对应的训练向量分量，例如，对于每一个期望向量分量，可以从训练特征向量中选取与之位置对应的训练向量分量，将选取的训练向量分量作为该期望向量分量的映射向量分量。举例说明，假设训练特征向量为5维向量，期望语义向量为3维向量，则对于期望语义向量中的第一个特征位上的期望向量分量，将训练特征向量中第一个特征位上的训练向量分量作为映射向量分量，对于期望语义向量中的第二个特征位上的期望向量分量，将训练特征向量中第二个特征位上的训练向量分量作为映射向量分量，对于期望语义向量中的第三个特征位上的期望向量分量，将训练特征向量中第三个特征位上的训练向量分量作为映射向量分量。In one embodiment, for each expected vector component in the expected semantic vector, the computer device can determine the training vector component corresponding to the expected vector component from the training feature vector according to certain rules, for example, for each expected vector component, The training vector component corresponding to the position can be selected from the training feature vector, and the selected training vector component can be used as the mapping vector component of the expected vector component. For example, assuming that the training feature vector is a 5-dimensional vector and the expected semantic vector is a 3-dimensional vector, then for the expected vector component on the first feature bit in the expected semantic vector, the first feature bit in the training feature vector will be The training vector component is used as the mapping vector component. For the expected vector component on the second feature bit in the expected semantic vector, the training vector component on the second feature bit in the training feature vector is used as the mapping vector component. For the expected semantic vector in The expected vector component on the third feature bit of the training feature vector is the training vector component on the third feature bit as the mapping vector component.

可以理解的是，在其他一些实施例中，计算机设备还可以按照其他规则选取训练向量分量，只要保证每一个期望向量分量存在对应的映射向量分量。It can be understood that in some other embodiments, the computer device can also select training vector components according to other rules, as long as it is ensured that there is a corresponding mapping vector component for each desired vector component.

上述实施例中，若训练特征向量的维数大于期望语义向量的维数，计算期望向量分量与映射向量分量之间的差异，得到期望向量分量对应的语义约束损失分量，统计各个语义约束损失分量，得到语义约束损失，可以对语义约束损失进行准确计算，提高特征提取的准确性。In the above embodiment, if the dimension of the training feature vector is greater than the dimension of the expected semantic vector, the difference between the expected vector component and the mapping vector component is calculated, the semantic constraint loss component corresponding to the expected vector component is obtained, and each semantic constraint loss component is counted. , the semantic constraint loss is obtained, the semantic constraint loss can be accurately calculated, and the accuracy of feature extraction can be improved.

在一个实施例中，从所述训练特征向量中选取一个训练向量分量，将选取的训练向量分量作为所述期望向量分量的映射向量分量，包括：按照期望语义向量的维数，对训练特征向量中的训练向量分量进行分组，得到数量与期望语义向量的维数一致的多个向量分量组；对于期望语义向量中每个期望向量分量，从多个特征分量组中选取一个向量分量组作为目标向量分量组；从目标向量分量组中选取一个训练向量分量作为期望向量分量对应的映射向量分量。In one embodiment, selecting a training vector component from the training feature vector, and using the selected training vector component as a mapping vector component of the expected vector component includes: mapping the training feature vector according to the dimension of the expected semantic vector. The training vector components in are grouped to obtain multiple vector component groups whose number is consistent with the dimension of the expected semantic vector; for each expected vector component in the expected semantic vector, a vector component group is selected from multiple feature component groups as Target vector component group; select a training vector component from the target vector component group as the mapping vector component corresponding to the desired vector component.

具体地，计算机设备可以按照期望语义向量的维数，对训练特征向量中的训练向量分量进行分组，得到数量与期望语义向量的维数一致的多个向量分量组，向量分量组可以按照其所包含的向量的排序位置进行排序，从而对于期望语义向量中每个期望向量分量，计算机设备可以从多个特征分量组中与之排序位置对应的向量分量组作为目标向量分量组，从目标向量分量组中选择一个训练向量分量作为期望向量分量对应的映射向量分量。Specifically, the computer device can group the training vector components in the training feature vector according to the dimension of the desired semantic vector to obtain multiple vector component groups whose number is consistent with the dimension of the desired semantic vector. The vector component groups can be grouped according to the dimension of the desired semantic vector. The sorted positions of the included vectors are sorted, so that for each desired vector component in the desired semantic vector, the computer device can select a vector component group corresponding to the sorted position from the plurality of feature component groups as a target vector component group, from the target vector Select a training vector component from the component group as the mapping vector component corresponding to the desired vector component.

在一个实施例中，若训练特征向量的维数Nh为期望语义向量的维数n的整数倍，即n*P＝Nh(P大于1)，则可以对训练特征向量中的训练向量分量进行平均分组。举例说明，参考图3中的(b)图，其中期望语义向量为3维向量，训练特征向量为6维，(b)图中的黑色圆圈代表期望语义向量中的期望向量分量，白色圆圈代表训练特征向量中的训练向量分量，(b)图中将训练特征向量中的训练向量分量按照顺序划分为三组，每两个训练向量分量划分为一组，将每一组中的第一个训练向量分量作为期望向量分量对应的映射向量分量。In one embodiment, if the dimension Nh of the training feature vector is an integer multiple of the dimension n of the desired semantic vector, that is, n*P=Nh (P is greater than 1), then the training vector components in the training feature vector can be Group evenly. For example, refer to (b) in Figure 3, in which the expected semantic vector is a 3-dimensional vector and the training feature vector is 6-dimensional. The black circle in (b) represents the expected vector component in the expected semantic vector, and the white circle represents The training vector components in the training feature vector. In the figure (b), the training vector components in the training feature vector are divided into three groups in order. Each two training vector components are divided into one group, and the first one in each group is divided into three groups. The training vector component serves as the mapping vector component corresponding to the expected vector component.

上述实施例中，通过对训练特征向量中的训练向量分量进行分组，从每一组中选择一个训练向量分量作为期望向量分量对应的映射向量分量，可以使得训练特征向量中各个特征位可以更好的学习期望语义向量中的语义信息。In the above embodiment, by grouping the training vector components in the training feature vector and selecting one training vector component from each group as the mapping vector component corresponding to the expected vector component, each feature bit in the training feature vector can be better The learning expects the semantic information in the semantic vector.

由上面的实施例可知，训练特征向量的向量维数是大于或者等于期望语义向量的维数的，而期望语义向量是基于训练类别标签得到的，当训练样本存在多个训练类别标签时，期望语义向量需要融合多个标签语义向量得到，因此，期望语义向量的维数是大于或者等于标签语义向量的，基于此，本申请还提供以下实施例来生成标签语义向量。It can be seen from the above embodiment that the vector dimension of the training feature vector is greater than or equal to the dimension of the expected semantic vector, and the expected semantic vector is obtained based on the training category label. When there are multiple training category labels in the training sample, the expected semantic vector The semantic vector needs to be obtained by fusing multiple tag semantic vectors. Therefore, it is expected that the dimension of the semantic vector is greater than or equal to the tag semantic vector. Based on this, this application also provides the following embodiments to generate the tag semantic vector.

在一个实施例中，训练样本属于训练样本集合，训练样本集合中各个训练样本对应的训练类别标签组成标签集合，上述方法还包括：构建目标语义空间；目标语义空间的目标向量维数不超过训练特征向量的向量维数；将标签集合中的各个训练类别标签，分别映射至目标语义空间中，得到各个训练类别标签各自的标签语义向量。In one embodiment, the training samples belong to a training sample set, and the training category labels corresponding to each training sample in the training sample set form a label set. The above method also includes: constructing a target semantic space; the target vector dimension of the target semantic space does not exceed the training The vector dimension of the feature vector; map each training category label in the label set to the target semantic space to obtain the label semantic vector of each training category label.

具体地，计算机设备可以构建目标语义空间，并保证目标语义空间的目标向量维数不超过训练特征向量的向量维数，进而将标签集合中各个训练类别标签分别映射至目标语义空间中，得到各个训练类别标签各自的标签语义向量，计算机设备进一步可以建立训练类别标签与标签语义向量之间的映射关系，从而在训练过程中，计算机设备可以基于该映射关系获取到训练类别标签的标签语义向量。Specifically, the computer device can construct the target semantic space and ensure that the target vector dimension of the target semantic space does not exceed the vector dimension of the training feature vector, and then map each training category label in the label set to the target semantic space to obtain each The computer device can further establish a mapping relationship between the training category labels and the label semantic vectors of the training category labels, so that during the training process, the computer device can obtain the label semantic vectors of the training category labels based on the mapping relationship.

进一步，在一个具体的实施例中，构建目标语义空间，包括：生成矩阵阶数与目标向量维数匹配的目标矩阵，目标矩阵中包括多个候选表征向量，多个候选表征向量形成目标语义空间；候选表征向量中包括数量相等的第一数值和第二数值；将标签集合中的各个训练类别标签，分别映射至目标语义空间中，得到各个训练类别标签各自的标签语义向量，包括：对于语义类别标签集合中的各个语义类别标签，分别从多个候选表征向量中选取一个候选表征向量，作为各个语义类别标签对应的标签语义向量。Further, in a specific embodiment, constructing the target semantic space includes: generating a target matrix whose matrix order matches the dimension of the target vector. The target matrix includes multiple candidate representation vectors, and the multiple candidate representation vectors form the target semantic space. ; The candidate representation vector includes an equal number of first and second values; each training category label in the label set is mapped to the target semantic space, and the respective label semantic vector of each training category label is obtained, including: for semantics For each semantic category label in the category label set, one candidate representation vector is selected from multiple candidate representation vectors as the label semantic vector corresponding to each semantic category label.

其中，目标矩阵中包括多个候选表征向量，并且候选表征向量中包括数量相等的第一数值和第二数值，第一数值与第二数值为不相同的数值，第一数值可以代表用于作为激活向量分量，第二数值可以用于作为非激活向量分量。由于候选表征向量中包括数量相等的第一数值和第二数值，因此将候选表征向量作为标签语义向量时，所构建的语义空间是均匀分布的语义空间。Wherein, the target matrix includes a plurality of candidate representation vectors, and the candidate representation vectors include an equal number of first values and second values. The first values and the second values are different values, and the first value can represent the value used as Activation vector component, the second value can be used as a non-activation vector component. Since the candidate representation vector includes an equal number of first values and second values, when the candidate representation vector is used as a label semantic vector, the semantic space constructed is a uniformly distributed semantic space.

具体地，计算机设备可以确定目标语义空间的目标向量维数，在确定了目标向量维数后，生成矩阵阶数与目标向量维数匹配的目标矩阵，目标矩阵可以是阿达马矩阵(Hadamard)。已知对于n阶阿达马矩阵H，其n+1阶阿达马矩阵为：Specifically, the computer device may determine the target vector dimension of the target semantic space, and after determining the target vector dimension, generate a target matrix whose matrix order matches the target vector dimension. The target matrix may be a Hadamard matrix. It is known that for the n-order Hadamard matrix H, its n+1-order Hadamard matrix is:

故可以得到各阶阿达马矩阵，例如4阶阿达马矩阵如下：Therefore, the Hadamard matrix of each order can be obtained. For example, the 4th order Hadamard matrix is as follows:

其中，矩阵中每个值可以是1或者-1，1表示激活，-1表示不激活。同时第一行全为1，且矩阵是对称矩阵，矩阵对角线上元素的和(矩阵的迹)为0。Among them, each value in the matrix can be 1 or -1, 1 means activation, and -1 means inactivation. At the same time, the first row is all 1, and the matrix is a symmetric matrix, and the sum of the elements on the diagonal of the matrix (the trace of the matrix) is 0.

阿达马矩阵中除第一行外，其他各行的向量可以作为候选表征向量，这些候选表征向量构成目标语义空间，计算机设备可以从该目标语义空间中选择一个候选表征向量作为语义类别标签对应的标签语义向量，其中，对于不同的语义类别标签，计算机设备选择不同的候选标签向量作为标签语义向量。Except for the first row, the vectors in the other rows of the Hadamard matrix can be used as candidate representation vectors. These candidate representation vectors constitute the target semantic space. The computer device can select a candidate representation vector from the target semantic space as the label corresponding to the semantic category label. Semantic vectors, wherein for different semantic category labels, the computer device selects different candidate label vectors as label semantic vectors.

由于阿达马矩阵的各个候选表征向量中-1和1的数量相同，将这些候选表征向量作为目标语义空间中的向量时，可以实现均匀激活的效果，所构成的目标语义向量为均匀分布的语义空间。本申请实施例中，通过将训练类别标签映射至该均匀激活的语义空间中，得到的语义标签向量可以使得训练得到的特征提取模型所提取的特征向量模拟该均匀分布的语义空间，提升特征提取模型所提取特征的语义表征效果，避免某些语义信息在特征位上冗余或者被压缩。Since the numbers of -1 and 1 in each candidate representation vector of the Hadamard matrix are the same, when these candidate representation vectors are used as vectors in the target semantic space, the effect of uniform activation can be achieved, and the resulting target semantic vector is a uniformly distributed semantic space. In the embodiment of the present application, by mapping the training category labels to the uniformly activated semantic space, the obtained semantic label vectors can make the feature vectors extracted by the trained feature extraction model simulate the uniformly distributed semantic space, improving feature extraction. The semantic representation effect of the features extracted by the model prevents certain semantic information from being redundant or compressed in the feature bits.

参考图4，为一个实施例中，为特征学习效果比对图，其中左边的图表示均匀分布的语义空间，不同的圆表示不同标签在语义空间上的分布，最大圆为整体语义空间，不同标签分布在圆周上，由于语义空间采用阿达马矩阵中的向量表征每个标签，故两两标签间距离相同，右边的图代表哈希空间，本实施例中，哈希空间用于度量学习，因此在哈希空间中相似的样本相互靠近，例如图4中加下划线的两个X表示相似样本对。参考图4中的(a)图，均匀分布的语义空间与哈希相似度空间非直接关联，因此不同类别的样本夹杂着分布在哈希空间上，如(a)图中，实线的X代表属于小圆1所代表标签的样本，虚线的X代表属于小圆2所代表标签的样本，由(a)图可以看出这些样本随意分布，虚线的X和实线的X所代表的样本相互交叉。参考图4中的(b)图，基于语义空间微调的哈希空间中，除了样本对靠近外，各样本对朝其标签所在的标签语义向量方向分布，故可以哈希空间中模拟一个接近语义均匀分布的空间。Refer to Figure 4, which is a comparison diagram of feature learning effects in one embodiment. The diagram on the left represents a uniformly distributed semantic space. Different circles represent the distribution of different labels in the semantic space. The largest circle is the overall semantic space. Different circles The labels are distributed on the circle. Since the semantic space uses vectors in the Hadamard matrix to represent each label, the distance between two labels is the same. The picture on the right represents the hash space. In this embodiment, the hash space is used for metric learning. Therefore, similar samples are close to each other in the hash space. For example, the two underlined Xs in Figure 4 represent pairs of similar samples. Referring to (a) in Figure 4, the uniformly distributed semantic space is not directly related to the hash similarity space, so samples of different categories are mixed and distributed in the hash space, as shown in (a), the solid line X represents the samples belonging to the label represented by small circle 1, and the dotted line Cross each other. Referring to (b) in Figure 4, in the hash space based on semantic space fine-tuning, in addition to the proximity of sample pairs, each sample pair is distributed in the direction of the label semantic vector where its label is located, so a close semantic can be simulated in the hash space Evenly distributed space.

上述实施例中，通过模拟均匀分布的语义空间，可以进一步提升所提取特征的语义表征效果。In the above embodiment, by simulating a uniformly distributed semantic space, the semantic representation effect of the extracted features can be further improved.

在一个实施例中，基于分类损失对特征提取模型进行训练，当满足训练停止条件时，得到目标特征提取模型，包括：获取训练样本的对比训练样本对应的对比特征向量，基于训练特征向量与对比特征向量之间的差异得到特征提取损失；基于特征提取损失以及分类损失，得到目标损失；基于目标损失对特征提取模型进行参数调整并继续训练，当满足训练停止条件时，得到目标特征提取模型。In one embodiment, the feature extraction model is trained based on the classification loss. When the training stop condition is met, the target feature extraction model is obtained, which includes: obtaining the comparison feature vector corresponding to the training sample, and comparing the training feature vector and the training sample based on the comparison The difference between feature vectors is used to obtain the feature extraction loss; based on the feature extraction loss and classification loss, the target loss is obtained; based on the target loss, the parameters of the feature extraction model are adjusted and training continues. When the training stop condition is met, the target feature extraction model is obtained.

其中，对比训练样本指的是用于与训练样本进行对比以确定特征提取损失的训练样本。对比训练样本可以包括正向对比训练样本或者负向对比训练样本中的至少一种，正向对比训练样本指的是与训练样本相似的训练样本，负向对比训练样本指的是与当前训练样本不相似的训练样本。Among them, the comparison training sample refers to the training sample used for comparison with the training sample to determine the feature extraction loss. The comparison training samples may include at least one of positive comparison training samples or negative comparison training samples. Positive comparison training samples refer to training samples that are similar to the training samples, and negative comparison training samples refer to training samples that are similar to the current training samples. Dissimilar training samples.

具体地，计算机设备可以基于待训练的特征提取模型对对比训练样本进行特征提取，得到对比训练样本对应的训练特征向量作为对比特征向量，基于训练特征向量与对比特征向量之间的差异得到特征提取损失，将特征提取损失和分类损失进行加权求和，得到目标损失，基于目标损失对待训练的特征提取模型进行参数调整并继续训练，当满足训练停止条件时，得到目标特征提取模型。Specifically, the computer device can perform feature extraction on the comparison training sample based on the feature extraction model to be trained, obtain the training feature vector corresponding to the comparison training sample as the comparison feature vector, and obtain the feature extraction based on the difference between the training feature vector and the comparison feature vector. Loss, perform a weighted sum of feature extraction loss and classification loss to obtain the target loss. Based on the target loss, adjust the parameters of the feature extraction model to be trained and continue training. When the training stop condition is met, the target feature extraction model is obtained.

在一个实施例中，计算机设备基于训练特征向量与对比特征向量的差异得到特征提取损失具体可以是：计算对比特征向量和训练特征向量之间的余弦相似度，用得到的余弦相似度表征二者之间的差异，从而可以计算余弦相似度与训练标签之间的差值得到特征提取损失。其中，当对比训练样本为正向对比训练样本时，训练标签为1，当对比训练样本为负向对比训练样本时，训练标签为0。In one embodiment, the computer device obtains the feature extraction loss based on the difference between the training feature vector and the comparison feature vector. Specifically, the method may be: calculating the cosine similarity between the comparison feature vector and the training feature vector, and using the obtained cosine similarity to characterize the two. The difference between the cosine similarity and the training label can be calculated to obtain the feature extraction loss. Among them, when the comparison training sample is a positive comparison training sample, the training label is 1, and when the comparison training sample is a negative comparison training sample, the training label is 0.

在一些实施例中，计算机设备基于训练特征向量与对比特征向量的差异得到特征提取损失具体可以是：对训练特征向量与对比特征向量进行相似度分类，分为两类，相似和不相似，得到相似的概率和不相似的概率，取概率较大值为分类结果，用分类结果的概率值表征训练特征向量与对比特征向量的差异，进而可以计算分类结果的概率和训练标签之间的差值得到特征提取损失。其中，当对比训练样本为正向对比训练样本时，训练标签为1，当对比训练样本为负向对比训练样本时，训练标签为0。In some embodiments, the computer device obtains the feature extraction loss based on the difference between the training feature vector and the comparison feature vector. Specifically, the method may be: performing similarity classification on the training feature vector and the comparison feature vector, and dividing them into two categories, similar and dissimilar, to obtain Similar probability and dissimilar probability, take the larger value of the probability as the classification result, use the probability value of the classification result to represent the difference between the training feature vector and the comparison feature vector, and then calculate the difference between the probability of the classification result and the training label Get the feature extraction loss. Among them, when the comparison training sample is a positive comparison training sample, the training label is 1, and when the comparison training sample is a negative comparison training sample, the training label is 0.

在一个实施例中，计算机设备基于训练特征向量与对比特征向量的差异得到特征提取损失具体可以是：计算训练特征向量与对比特征向量之间的特征距离，用该特征距离表征训练特征向量与对比特征向量的差异，并将该特征距离作为特征提取损失，其中特征距离例如可以是欧式距离或者是L2距离。In one embodiment, the computer device obtains the feature extraction loss based on the difference between the training feature vector and the comparison feature vector. Specifically, the feature extraction loss can be calculated by: calculating the feature distance between the training feature vector and the comparison feature vector, and using the feature distance to characterize the training feature vector and the comparison feature vector. The difference in feature vectors is used as the feature extraction loss, where the feature distance can be, for example, Euclidean distance or L2 distance.

上述实施例中，计算机设备在对待训练的特征提取模型进行训练时，同时特征提取损失以及分类损失来进行参数调整，使得的训练得到的特征提取模型提取的特征更加准确。In the above embodiment, when the computer device trains the feature extraction model to be trained, the feature extraction loss and the classification loss are simultaneously adjusted to adjust parameters, so that the features extracted by the trained feature extraction model are more accurate.

在一个实施例中，对比特征向量包括正向对比训练样本对应的正向对比特征向量以及负向对比训练样本对应的负向对比特征向量；基于训练特征向量与对比特征向量之间的差异得到特征提取损失包括：获取正向特征差异值，正向特征差异值为训练特征向量与正向对比特征向量之间的特征差异值；获取负向特征差异值，负向特征差异值为训练特征向量与负向对比特征向量之间的特征差异值；基于正向特征差异值与负向特征差异值确定特征提取损失。In one embodiment, the comparison feature vector includes a positive comparison feature vector corresponding to a positive comparison training sample and a negative comparison feature vector corresponding to a negative comparison training sample; the feature is obtained based on the difference between the training feature vector and the comparison feature vector. The extraction loss includes: obtaining the positive feature difference value, which is the feature difference value between the training feature vector and the positive comparison feature vector; obtaining the negative feature difference value, which is the feature difference value between the training feature vector and the positive comparison feature vector. Negatively compare the feature difference values between feature vectors; determine the feature extraction loss based on the positive feature difference value and the negative feature difference value.

具体地，计算机设备基于待训练的特征提取模型对正向对比训练样本提取特征，得到正向对比训练样本对应的正向对比特征向量，基于待训练的特征提取模型对负向对比训练样本提取特征，得到负向对比训练样本对应的负向对比特征向量，获取训练特征向量与正向对比特征向量之间的特征差异值得到正向特征差异值，获取训练特征向量与负向对比特征向量之间的特征差异值得到负向特征差异值，最后基于正向特征差异值与负向特征差异值确定特征提取损失。Specifically, the computer device extracts features from the positive contrast training sample based on the feature extraction model to be trained, obtains the positive contrast feature vector corresponding to the positive contrast training sample, and extracts features from the negative contrast training sample based on the feature extraction model to be trained. , obtain the negative contrast feature vector corresponding to the negative contrast training sample, obtain the feature difference value between the training feature vector and the positive comparison feature vector, obtain the positive feature difference value, and obtain the difference between the training feature vector and the negative comparison feature vector The feature difference value is obtained to obtain the negative feature difference value, and finally the feature extraction loss is determined based on the positive feature difference value and the negative feature difference value.

在一个实施例中，计算机设备可以参考以下公式(1)确定特征提取损失，其中x_a为训练特征向量，x_p为正向对比特征向量，x_n为负向对比特征向量，||x_a-x_p||表示x_a和x_p之间的L2距离，即正向特征差异值，||x_a-x_n||表示x_a和x_n之间的L2距离，即负向特征差异值，公式(1)的目的是使得训练样本与负向对比训练样本的距离比该训练样本与正向对比训练样本的距离大α，α为margin(间隔项)，α的取值可以根据需要进行设定。In one embodiment, the computer device can determine the feature extraction loss with reference to the following formula (1), where x _a is the training feature vector, x _p is the positive contrast feature vector, x _n is the negative contrast feature vector, ||x _a -x _p || represents the L2 distance between x _a and x _p , that is, the positive feature difference value, ||x _a -x _n || represents the L2 distance between x _a and x _n , that is, the negative feature difference value value, the purpose of formula (1) is to make the distance between the training sample and the negative comparison training sample larger than the distance between the training sample and the positive comparison training sample α, α is the margin (spacing term), and the value of α can be as needed Make settings.

L_tri＝max(||x_a-x_p||-||x_a-x_n||+α,0) 公式(1)L _tri =max(||x _a -x _p ||-||x _a -x _n ||+α,0) Formula (1)

由公式(1)可以看出，只有在训练样本与负向对比训练样本的距离比该训练样本与正向对比训练样本的距离大α时，特征提取损失值才为0，否则，特征提取损失值大于0，因此在降低损失值的过程中，训练样本与负向对比训练样本的距离，朝着比训练样本与正向对比训练样本的距离大α的方向发展，从而使得特征提取模型提取的特征能够更加准确地表达语义信息。It can be seen from formula (1) that the feature extraction loss value is 0 only when the distance between the training sample and the negative comparison training sample is greater than the distance α between the training sample and the positive comparison training sample. Otherwise, the feature extraction loss value is 0. The value is greater than 0, so in the process of reducing the loss value, the distance between the training sample and the negative contrast training sample develops in the direction of α greater than the distance between the training sample and the positive contrast training sample, thus making the feature extraction model extract Features can express semantic information more accurately.

上述实施例中，由于特征提取损失值是基于正向特征差异值与负向特征差异值确定的，使得特征提取模型在进行相似性的度量学习时，考虑到了类间特征距离过小对分类的影响，进一步提高了特征提取模型所提取特征的语义准确性。In the above embodiment, since the feature extraction loss value is determined based on the positive feature difference value and the negative feature difference value, when the feature extraction model performs similarity metric learning, it takes into account that the feature distance between classes is too small for classification. The impact further improves the semantic accuracy of features extracted by the feature extraction model.

可以理解的是，当结合上述公式(1)计算得到的特征提取损失对特征提取模型进行训练时，特征提取模型可以进行度量学习，从而使得提取的特征具备相似度度量能力。It can be understood that when the feature extraction model is trained in combination with the feature extraction loss calculated by the above formula (1), the feature extraction model can perform metric learning, so that the extracted features have similarity measurement capabilities.

在一个实施例中，通过待训练的特征提取模型提取训练样本的训练特征向量，包括：通过待训练的特征提取模型提取训练样本的初始样本特征，并对初始样本特征进行量化处理，得到训练样本的训练特征向量；基于分类损失对特征提取模型进行训练，当满足训练停止条件时，得到目标特征提取模型，包括：基于预设的符号函数确定训练特征向量中各个量化值对应的量化目标，基于各个量化值与各自对应的量化目标之间的差异确定量化损失；基于量化损失以及分类损失，得到目标损失；基于目标损失对特征提取模型进行参数调整并继续训练，当满足训练停止条件时，得到目标特征提取模型。In one embodiment, extracting the training feature vector of the training sample through the feature extraction model to be trained includes: extracting the initial sample features of the training sample through the feature extraction model to be trained, and performing quantification processing on the initial sample features to obtain the training sample. training feature vector; the feature extraction model is trained based on the classification loss. When the training stop condition is met, the target feature extraction model is obtained, including: determining the quantification target corresponding to each quantization value in the training feature vector based on the preset symbolic function, based on The difference between each quantified value and the corresponding quantified target determines the quantified loss; based on the quantified loss and the classification loss, the target loss is obtained; based on the target loss, adjust the parameters of the feature extraction model and continue training. When the training stop condition is met, we get Target feature extraction model.

其中，量化损失指的是计算量化效果(是否足够接近-1或1)的损失，在训练过程中，期望训练特征向量中的每一位量化值都足够接近1或-1。量化损失与量化值、量化值对应的量化目标之间的差异成正相关。Among them, the quantization loss refers to the loss of calculating the quantization effect (whether it is close enough to -1 or 1). During the training process, it is expected that each quantized value in the training feature vector is close enough to 1 or -1. The quantization loss is positively related to the difference between the quantization value and the quantization target corresponding to the quantization value.

在一个实施例中，对于每个训练样本的量化处理得到的训练特征向量，Q_i为对该训练样本的量化处理得到的训练特征向量Q在第i位的值，B_i为第i位的量化目标，B_i由Q_i经过预设的符号函数-sign函数-产生，通过以下公式(3)中采用sign函数可以对训练特征向量Q的每一位Q_i分别计算其目标编码Bi，最终Q的目标编码为B，然后参考公式(2)计算得到量化损失。In one embodiment, for the training feature vector obtained by quantization processing of each training sample, Q _i is the value of the training feature vector Q obtained by quantization processing of the training sample at the i-th position, and B _i is the i-th position value. Quantification target, B _i is generated by Q _i through the preset sign function - sign function -. By using the sign function in the following formula (3), the target code Bi can be calculated separately for each bit Q _i of the training feature vector Q. Finally, The target encoding of Q is B, and then the quantization loss is calculated by referring to formula (2).

在一个实施例中，在得到量化损失后，计算机设备可以通过量化损失以及分类损失各自的预设权重，对两个损失进行加权求和，得到目标损失，进而基于目标损失调整特征提取模型的模型参数，得到调整后的特征提取模型。In one embodiment, after obtaining the quantified loss, the computer device can perform a weighted sum of the two losses through the preset weights of the quantified loss and the classification loss to obtain the target loss, and then adjust the model of the feature extraction model based on the target loss. parameters to obtain the adjusted feature extraction model.

上述实施例中，结合量化损失以及分类损失，得到目标损失，基于该目标损失对模型进行训练，训练得到的特征提取模型能够提取得到准确的量化特征。In the above embodiment, the target loss is obtained by combining the quantification loss and the classification loss. The model is trained based on the target loss. The trained feature extraction model can extract accurate quantitative features.

在一个具体的实施例中，如图5所示，为特征提取模型的训练过程示意图。本实施例中，训练样本的内容为图像，训练样本为图像样本。待训练的特征提取模型包括CNN模块，embedding模块，哈希量化模块。参考图5，计算机设备首先借助阿达马矩阵建立训练样本集合对应的均匀表达语义多标签空间，在训练过程中，将训练样本输入CNN模块，得到基础特征信息，将基础特征信息输入embedding模块得到embedding向量，再通过哈希量化模块进行量化处理得到训练特征向量，该训练特征向量进一步输入至分类模型中，分类模型输出的预测语义向量与该训练样本的期望语义向量计算分类损失，并基于期望语义向量和训练特征向量之间的差异计算语义约束损失，该语义约束损失、与度量损失以及分类损失进行加权，得到目标损失。其中，度量损失可参考上文实施例中的量化损失和特征提取损失加权求和得到。In a specific embodiment, as shown in Figure 5, it is a schematic diagram of the training process of the feature extraction model. In this embodiment, the content of the training sample is an image, and the training sample is an image sample. The feature extraction model to be trained includes CNN module, embedding module, and hash quantization module. Referring to Figure 5, the computer device first uses the Hadamard matrix to establish a uniformly expressed semantic multi-label space corresponding to the training sample set. During the training process, the training samples are input into the CNN module to obtain basic feature information, and the basic feature information is input into the embedding module to obtain the embedding. The vector is then quantized through the hash quantization module to obtain the training feature vector. The training feature vector is further input into the classification model. The predicted semantic vector output by the classification model and the expected semantic vector of the training sample calculate the classification loss, and based on the expected semantics The difference between the vector and the training feature vector calculates the semantic constraint loss, which is weighted with the metric loss and the classification loss to obtain the target loss. The metric loss can be obtained by referring to the weighted sum of the quantization loss and the feature extraction loss in the above embodiment.

本实施例中，在哈希特征学习空间中除了学习度量能力外还用每个样本模拟出语义均匀分布的语义空间，使得每个图像的哈希特征在满足度量学习相似度的同时向均匀分布的标签语义向量调整，相比起哈希、语义双空间的模型，更有效地提升了哈希特征的多位激活效果，从而减轻其过度集中于某些特征信息或哈希位过度冗余的问题。图5中哈希码度量约束损失是借助对语义空间均匀分布的模型，对非语义空间进行微调哈希层输出的重要条件，进而支持在非语义空间模拟语义空间并微调非语义空间哈希表征的学习。In this embodiment, in addition to learning metric capabilities in the hash feature learning space, each sample is used to simulate a semantic space with uniform semantic distribution, so that the hash features of each image are uniformly distributed while satisfying the metric learning similarity. The label semantic vector adjustment, compared with the hash and semantic dual space models, more effectively improves the multi-bit activation effect of the hash feature, thereby reducing its excessive concentration on certain feature information or excessive redundancy of the hash bits. question. The hash code metric constraint loss in Figure 5 is an important condition for fine-tuning the output of the hash layer in the non-semantic space with the help of a uniform distribution model of the semantic space, thereby supporting the simulation of the semantic space in the non-semantic space and fine-tuning the hash representation of the non-semantic space. of learning.

在一个实施例中，如图6所示，提供了一种样本检索方法，以该方法由计算机设备执行为例进行说明，可以理解的是，计算机设备可以是图1所示的终端102，也可以是服务器104，还可以是终端102和服务器104所组成的系统。本实施例中，样本检索方法包括以下步骤：In one embodiment, as shown in Figure 6, a sample retrieval method is provided. For illustration, the method is executed by a computer device. It can be understood that the computer device can be the terminal 102 shown in Figure 1, or It may be the server 104, or it may be a system composed of the terminal 102 and the server 104. In this embodiment, the sample retrieval method includes the following steps:

步骤602，获取查询样本和候选召回样本集合。Step 602: Obtain query samples and candidate recall sample sets.

其中，候选召回样本集合指的是数据库中可召回的内容样本组成的集合。查询样本指的是需要从数据库中召回相似样本的内容样本。例如，给定图像A，从数据库中召回图像A的相似图像，则图像A为查询样本。Among them, the candidate recall sample set refers to a set of recallable content samples in the database. Query samples refer to content samples that need to recall similar samples from the database. For example, given an image A, similar images of image A are recalled from the database, then image A is a query sample.

步骤604，分别将查询样本和候选召回样本集合中的候选召回样本输入目标特征提取模型，得到查询样本对应的查询特征向量和候选召回样本对应的候选召回特征向量。Step 604: Input the query sample and the candidate recall sample in the candidate recall sample set into the target feature extraction model to obtain the query feature vector corresponding to the query sample and the candidate recall feature vector corresponding to the candidate recall sample.

其中，目标特征提取模型是通过分类损失对待训练的特征提取模型进行训练得到的，分类损失是基于期望语义向量中各个期望激活向量分量，与预测语义向量中各自对应位置的预测向量分量之间的差异确定的，预测语义向量是基于训练特征向量对训练样本进行分类预测得到的，期望语义向量是基于训练样本对应的训练类别标签的标签语义向量确定的，标签语义向量包括至少两个激活标签向量分量，期望语义向量中，包含与至少两个激活标签向量分量的位置分布对应的期望激活向量分量，训练特征向量是通过待训练的特征提取模型对训练样本进行特征提取得到的。Among them, the target feature extraction model is obtained by training the feature extraction model to be trained through classification loss. The classification loss is based on each expected activation vector component in the expected semantic vector and the predicted vector component at the corresponding position in the predicted semantic vector. The difference is determined. The predicted semantic vector is obtained by classifying and predicting the training sample based on the training feature vector. The expected semantic vector is determined based on the label semantic vector of the training category label corresponding to the training sample. The label semantic vector includes at least two activation label vectors. component, the expected semantic vector contains the expected activation vector component corresponding to the position distribution of at least two activation label vector components, and the training feature vector is obtained by feature extraction of the training sample through the feature extraction model to be trained.

具体地，计算机设备可以将查询样本以及候选召回样本集合中各个候选召回样本分别输入目标特征提取模型中，通过目标特征提取模型分别对查询样本、各个候选召回样本进行特征提取，得到查询样本对应的查询特征向量以及各个候选召回样本各自对应的候选召回特征向量。Specifically, the computer device can input the query sample and each candidate recall sample in the candidate recall sample set into the target feature extraction model, respectively perform feature extraction on the query sample and each candidate recall sample through the target feature extraction model, and obtain the query sample corresponding to Query feature vectors and candidate recall feature vectors corresponding to each candidate recall sample.

步骤606，基于查询特征向量和候选召回特征向量，从候选召回样本集合中确定查询样本对应的目标检索样本。Step 606: Based on the query feature vector and the candidate recall feature vector, determine the target retrieval sample corresponding to the query sample from the candidate recall sample set.

在一个实施例中，计算机设备可以计算查询特征向量和各个候选召回特征向量之间的相似度，将相似度满足相似度条件的候选召回特征向量所对应的候选召回样本确定为目标检索样本。其中，相似度可以是余弦相似度，相似度条件例如可以是相似度大于预设相似度阈值或者相似度排序在预设排序阈值之前，例如，将计算得到的各个相似度从大到小排序，以选取前N个相似度，将这些相似度对应的候选召回特征向量所对应的候选召回样本确定为目标检索样本。In one embodiment, the computer device can calculate the similarity between the query feature vector and each candidate recall feature vector, and determine the candidate recall sample corresponding to the candidate recall feature vector whose similarity satisfies the similarity condition as the target retrieval sample. The similarity may be cosine similarity, and the similarity condition may be, for example, that the similarity is greater than a preset similarity threshold or that the similarity is sorted before the preset sorting threshold. For example, the calculated similarities are sorted from large to small, The top N similarities are selected, and the candidate recall samples corresponding to the candidate recall feature vectors corresponding to these similarities are determined as target retrieval samples.

在另一个实施例中计算机设备可以计算查询特征向量和各个候选召回特征向量之间的差异度，将差异度满足差异度条件的候选召回特征向量所对应的候选召回样本确定为目标检索样本。其中，差异度可以是特征距离，差异度条件例如可以是差异度小于预设差异度阈值或者差异度排序在预设排序阈值之前，例如，将计算得到的各个差异度从小到大排序，以选取前N个差异度，将这些差异度对应的候选召回特征向量所对应的候选召回样本确定为目标检索样本。In another embodiment, the computer device can calculate the degree of difference between the query feature vector and each candidate recall feature vector, and determine the candidate recall sample corresponding to the candidate recall feature vector whose difference satisfies the difference condition as the target retrieval sample. The difference degree may be a feature distance, and the difference degree condition may be, for example, that the difference degree is less than a preset difference degree threshold or the difference degree is sorted before the preset sorting threshold. For example, the calculated difference degrees are sorted from small to large to select For the first N differences, the candidate recall samples corresponding to the candidate recall feature vectors corresponding to these differences are determined as target retrieval samples.

上述样本检索方法，由于目标特征提取模型是通过该分类损失来训练，最终得到的目标特征提取模型可以学到趋向于标签语义向量的特征，从而可以在训练特征向量所在向量空间中对标签语义向量所在的语义空间进行模拟，使得提取的特征向量中可以储存表征语义的特征信息，提高了特征提取的准确性，并且由于标签语义向量包括至少两个激活标签向量分量，可以对训练类别标签进行更加准确地表征，而期望语义向量是基于标签语义向量确定的，并且包含与至少两个激活标签向量分量的位置分布对应的期望激活向量分量，可以使得计算得到的分类损失更加准确，进一步提高了特征提取的准确性，进而采用目标特征提取模型提取得到的特征向量进行样本检索，可以提高样本检索的准确度。In the above sample retrieval method, since the target feature extraction model is trained through the classification loss, the final target feature extraction model can learn features that tend to the label semantic vector, so that the label semantic vector can be compared in the vector space where the training feature vector is located. The semantic space is simulated, so that the extracted feature vector can store feature information representing semantics, which improves the accuracy of feature extraction. And because the label semantic vector includes at least two activation label vector components, the training category labels can be more refined. Accurately characterized, and the expected semantic vector is determined based on the label semantic vector and contains the expected activation vector component corresponding to the position distribution of at least two activation label vector components, which can make the calculated classification loss more accurate and further improve the feature The accuracy of extraction, and then using the feature vector extracted by the target feature extraction model for sample retrieval, can improve the accuracy of sample retrieval.

在一个实施例中，上述样本检索方法还包括：对候选召回样本集合中各个候选召回样本各自的候选召回特征向量进行特征聚类，得到多个聚类簇；各个聚类簇存在对应的聚类中心；对于每一个聚类中心，建立聚类中心与同一聚类簇中各个候选召回特征向量之间的关联关系；基于查询特征向量和候选召回特征向量，从候选召回样本集合中确定查询样本对应的目标检索样本，包括：基于查询特征向量与各个聚类中心之间的特征距离，从各个聚类中心中确定目标聚类中心；获取与目标聚类中心存在关联关系的各个候选召回特征向量，基于查询特征向量与获取的各个候选召回特征向量之间的特征距离，从获取的各个候选召回特征向量中确定目标检索样本。In one embodiment, the above sample retrieval method also includes: performing feature clustering on the candidate recall feature vectors of each candidate recall sample in the candidate recall sample set to obtain multiple clusters; each cluster cluster has a corresponding cluster. center; for each clustering center, establish the association between the clustering center and each candidate recall feature vector in the same cluster; based on the query feature vector and the candidate recall feature vector, determine the query sample correspondence from the candidate recall sample set Target retrieval samples include: determining the target cluster center from each cluster center based on the feature distance between the query feature vector and each cluster center; obtaining each candidate recall feature vector that is associated with the target cluster center, Based on the feature distance between the query feature vector and each obtained candidate recall feature vector, the target retrieval sample is determined from each obtained candidate recall feature vector.

具体地，计算机设备在进行聚类后，建立各个聚类中心与同一聚类簇中各个候选召回特征向量之间的关联关系，那么可以将聚类中心作为改聚类簇的索引，从而在进行检索时，可以先计算查询特征向量和这些索引之间的特征距离，从而筛选出相似样本所在的聚类簇，进行从筛选到的聚类簇检索得到目标检索样本。通过索引查询的方式大大减少了检索过程中的计算量，提高了检索效率。Specifically, after the computer device performs clustering, it establishes an association between each clustering center and each candidate recall feature vector in the same clustering cluster. Then the clustering center can be used as the index of the modified clustering cluster, thereby performing the clustering process. During retrieval, you can first calculate the feature distance between the query feature vector and these indexes, thereby filtering out the clusters where similar samples are located, and perform retrieval from the filtered clusters to obtain the target retrieval samples. Index query greatly reduces the amount of calculation in the retrieval process and improves retrieval efficiency.

举例说明，假设候选召回样本集合中包括1000个候选召回样本，对这些候选召回样本进行聚类后，得到10个聚类簇，那么可以则在进行检索时，计算机设备可以将查询特征向量分别与这10个聚类簇的聚类中心计算特征聚类，选取特征距离最小的前3个聚类簇，将查询特征向量与这3个聚类簇中各个候选召回特征向量计算特征聚类，从而检索得到目标检索样本，而对于其余7个聚类簇，则不需要进行计算。For example, assuming that the candidate recall sample set includes 1,000 candidate recall samples, and after clustering these candidate recall samples, 10 clusters are obtained. Then, during retrieval, the computer device can separate the query feature vector and The clustering centers of these 10 clusters are used to calculate feature clusters, and the first three clusters with the smallest feature distance are selected, and the query feature vector and each candidate recall feature vector in these three clusters are used to calculate feature clusters, so that The target retrieval sample is obtained by retrieval, but for the remaining 7 clusters, no calculation is required.

在一个实施例中，参考图7，为一个实施例中索引检索系统的示意图。图7中的(a)图，对于所有库存图像，通过本申请实施例训练得到的目标特征提取模型提取得到图像特征向量，并进行聚类，得到多个聚类簇，聚类簇与聚类中心关联以构建索引系统，图7中的(b)图，在具体检索过程中，对于查询样本，首先通过目标特征提取模型提取得到查询特征向量，与各个聚类簇的聚类中心进行比较，以确定目标聚类簇，然后通过索引系统从目标聚类簇中召回目标检索样本并进行排序返回。In one embodiment, refer to FIG. 7 , which is a schematic diagram of an index retrieval system in one embodiment. (a) in Figure 7 shows that for all inventory images, the target feature extraction model trained in the embodiment of this application extracts image feature vectors and performs clustering to obtain multiple clusters. Clustering clusters and clustering Center correlation to build an index system, (b) in Figure 7, in the specific retrieval process, for the query sample, first extract the query feature vector through the target feature extraction model, and compare it with the cluster center of each cluster cluster. To determine the target cluster, and then recall the target retrieval samples from the target cluster through the index system and sort them back.

在一个实施例中，上述样本检索方法还包括特征提取模型的处理步骤，该步骤具体包括：通过待训练的特征提取模型对训练样本进行特征提取，得到训练特征向量；基于训练特征向量对训练样本进行分类预测，得到预测语义向量；获取训练样本对应的训练类别标签的标签语义向量，标签语义向量包括至少两个激活标签向量分量；基于标签语义向量，确定训练样本的期望语义向量，期望语义向量中，包含与至少两个激活标签向量分量的位置分布对应的期望激活向量分量；基于期望语义向量中各个期望激活向量分量，与预测语义向量中各自对应位置的预测向量分量之间的差异，确定分类损失；基于分类损失对特征提取模型进行训练，当满足训练停止条件时，得到目标特征提取模型；目标特征提取模型用于提取输入样本的样本特征向量。In one embodiment, the above sample retrieval method also includes a feature extraction model processing step. This step specifically includes: performing feature extraction on the training sample through the feature extraction model to be trained to obtain a training feature vector; and extracting the training sample based on the training feature vector. Perform classification prediction to obtain the predicted semantic vector; obtain the label semantic vector of the training category label corresponding to the training sample, the label semantic vector includes at least two activation label vector components; based on the label semantic vector, determine the expected semantic vector of the training sample, the expected semantic vector , contains the expected activation vector components corresponding to the position distribution of at least two activation label vector components; based on the difference between each expected activation vector component in the expected semantic vector and the predicted vector component at the corresponding position in the predicted semantic vector, determine Classification loss; the feature extraction model is trained based on the classification loss. When the training stop condition is met, the target feature extraction model is obtained; the target feature extraction model is used to extract the sample feature vector of the input sample.

本申请还提供一种应用场景，该应用场景可适用于上述的特征提取模型处理方法以及样本检索方法。在该应用场景中，内容为图像，训练样本为图像样本，该特征提取模型处理方法以及样本检索方法可以应用于图像排重检索。This application also provides an application scenario, which is applicable to the above-mentioned feature extraction model processing method and sample retrieval method. In this application scenario, the content is an image and the training sample is an image sample. The feature extraction model processing method and sample retrieval method can be applied to image deduplication retrieval.

相关技术中，图像排重检索采用度量学习外加符号量化的松弛约束产生目标哈希特征，通过目标哈希特征召回相似样本，然而：1)此类哈希特征受训练样本约束，不能感知因数据样本缺失造成哈希编码不充分、对训练样本以外的域外样本表征困难的问题。2)不利于后续解析、分析：由于随机开始学习哈希特征，故每个哈希特征都将存在不同程度冗余位(如一个重要特征可能最终采用了0、1、2等不同数量哈希位表征)，对后续模型困难样本分析带来极大不便。如果哈希特征可以对所有可能样本空间进行有效的表征，基于对整体空间的感知，采用整体空间约束哈希分布、再对目标样本空间进行充分的哈希学习，将能极大缓解上述哈希特征的主要问题。In related technologies, image retrieval uses metric learning plus relaxed constraints of symbolic quantization to generate target hash features, and similar samples are recalled through the target hash features. However: 1) This type of hash feature is constrained by training samples and cannot be perceived due to data Missing samples cause problems such as insufficient hash coding and difficulty in characterizing out-of-domain samples other than training samples. 2) It is not conducive to subsequent parsing and analysis: Since hash features are learned randomly, each hash feature will have varying degrees of redundant bits (for example, an important feature may end up using different numbers of hashes such as 0, 1, 2, etc. Bit representation), which brings great inconvenience to the subsequent analysis of difficult samples of the model. If the hash feature can effectively represent all possible sample spaces, based on the perception of the overall space, using the overall space to constrain the hash distribution, and then performing sufficient hash learning on the target sample space, it will greatly alleviate the above-mentioned hashing problems. The main problem with features.

基于此，本申请提供一种特征提取模型处理方法，首先采用丰富的多标签建立起整体的数据空间，并最大化各多标签类别间的分类表征距离使得分类目标在语义空间均匀分布；同时使得样本哈希特征空间直接关联到其分类的语义空间，使得哈希特征可以在均匀语义空间中进一步学习排重检索需要的特征度量能力，从而使得哈希表征足够分散以支持最大化多标签类别表征的需求，进而避免了原来初始化自随机分布的哈希表征过度集中在某些哈希位、或抽取的特征过度集中在某些图像纹理上；也避免了哈希特征空间与语义空间的隔离。通过模型的迭代更新，最终实现哈希特征受分类的均匀空间表征影响下，提升多位哈希的表征能力。Based on this, this application provides a feature extraction model processing method, which first uses rich multi-labels to establish the overall data space, and maximizes the classification representation distance between each multi-label category so that the classification targets are evenly distributed in the semantic space; at the same time, it makes The sample hash feature space is directly related to the semantic space of its classification, so that the hash feature can further learn the feature measurement capabilities required for duplicate retrieval in the uniform semantic space, thereby making the hash representation dispersed enough to support maximizing multi-label category representation. requirements, thus avoiding the original initialized random distribution hash representation being overly concentrated on certain hash bits, or the extracted features being overly concentrated on certain image textures; it also avoids the isolation of the hash feature space and the semantic space. Through iterative updates of the model, the hash feature is finally affected by the uniform spatial representation of the classification, improving the representation ability of multi-bit hashing.

具体地，以下以该方法由计算机设备执行为例进行说明，可以理解的是，计算机设备可以是图1所示的终端102，也可以是服务器104，还可以是终端102和服务器104所组成的系统。上述特征提取模型处理方法和样本检索方法在该应用场景的应用如下：Specifically, the following description takes the method being executed by a computer device as an example. It can be understood that the computer device may be the terminal 102 shown in FIG. 1 , the server 104 , or a combination of the terminal 102 and the server 104 . system. The application of the above feature extraction model processing method and sample retrieval method in this application scenario is as follows:

一、数据准备1. Data preparation

1、收集相似样本对。可以从图像数据库中随机抽取两帧图像，组成图像对，通过人工标注是否为相似图像，将相似图像组成相似样本对。1. Collect similar sample pairs. Two frames of images can be randomly extracted from the image database to form an image pair, and similar images can be combined into similar sample pairs by manually annotating whether they are similar images.

2、三元组挖掘：以相似样本对作为输入，在每个batch(训练中是全量数据进行epoch轮迭代，每轮迭代中对全量N个相似度样本对每bs对作为一个batch批次，共N/bs个批次)包含bs个样本对，进行如下挖掘得到三元组：对某个样本对中的任意一张图x，从剩余的bs-1个样本对(每对随机选择一张图像)的样本中分别计算样本对的embedding，计算embedding与x的距离，按距离从小到大排序，去除前K％样本(如3％即(2*bs-2)*3/100，如对于bs＝64，为前4)后，取前10个样本作为负样本，分别与x中的正样本对组成三元组，故每个样本产生10个三元组，整个batch得到10*bs个三元组，其中记录每个三元组中图像x为anchor(a)，与x互为相似样本的图像为positive(p)，挖掘到的负样本为negative(n)，三元组又记为(apn)。2. Triplet mining: Taking similar sample pairs as input, in each batch (in training, the full amount of data is used for epoch rounds of iterations, and in each round of iterations, the full amount of N similarity sample pairs is used as a batch. A total of N/bs batches) contains bs sample pairs, and the triplet is obtained by mining as follows: for any picture x in a certain sample pair, randomly select one from the remaining bs-1 sample pairs (each pair Calculate the embedding of the sample pair from the samples of each image), calculate the distance between the embedding and For bs = 64, after the first 4), the first 10 samples are taken as negative samples, and they are combined with the positive sample pairs in x to form triples, so each sample generates 10 triples, and the entire batch gets 10*bs Triplets, in which the image x in each triplet is recorded as anchor(a), the image that is a similar sample to x is positive(p), the negative sample mined is negative(n), and the triplet is Marked as (apn).

本实施例中，需要保证提取到的特征向量对于相同图像(完全一样、极度相似的图像)可以预测为相同(度量距离尽量小)，对于不同图像(包括了仅少量相似但不够相似、不相似)需要距离尽量大(同时也需要满足保序效果：越不相似距离越远)。而除了例如视频相同镜头下的前后帧、在图像上添加色调变换等攻击手段这些类型的相似图外，大部分图像间互为相似图的概率非常低，故本实施例中将每个batch(从全量样本中采样到，可以理解为batch内两个样本对大部分构成不相同，但仍存在少量相同)的正样本对称为噪声，对于每个分镜3个相似样本对，这3个自身是相互相似的，若某个batch抽到了其中两个样本对，则必然存在噪声。此类相似样本对弱监督数据在本实施例中进行如下处理：在剩下的batch样本中去除最相似的少量样本(前K％，认为这里的样本中可能存在相同样本，不应该作为负样本学习)，剩下的均为有效负样本，选择距离最小的10个负样本作为难负样本。其中K是可控值，训练样本集中噪声越大，k越大。In this embodiment, it is necessary to ensure that the extracted feature vectors can be predicted to be the same (the metric distance is as small as possible) for the same image (exactly the same, extremely similar images), and for different images (including only a small amount of similarity but not similar enough, dissimilar ) needs to be as large as possible (it also needs to satisfy the order-preserving effect: the more dissimilar it is, the farther away it is). Except for similar images such as the front and back frames of the same video shot, adding tone transformation and other attack methods to the image, the probability that most images are similar to each other is very low, so in this embodiment, each batch ( Sampling from all samples, it can be understood that most of the two sample pairs in the batch are not the same, but there are still a small number of identical positive sample pairs called noise. For each shot of 3 similar sample pairs, these 3 themselves are similar to each other. If a certain batch draws two of the sample pairs, there must be noise. Such similar samples perform the following processing on weakly supervised data in this embodiment: remove the most similar small number of samples (top K%) from the remaining batch samples. It is considered that the same samples may exist in the samples here and should not be used as negative samples. learning), the rest are all valid negative samples, and the 10 negative samples with the smallest distance are selected as difficult negative samples. Among them, K is a controllable value. The greater the noise in the training sample set, the larger k.

3、多标签标注：对上述相似样本对的图像，标注每个图像的多个标签(即上文中的训练类别标签)，训练样本集合对应的多标签(即具体涉及哪些类别的标签)可以参考大规模开源imagenet或open-image标签体系。可以人工标注图像多标签，也可以采用多标签模型预测图像标签：如采用开源的resnet101网络结构对imagenet预训练的多标签模型作为辅助，对上述训练集的图像产生多标签预测结果。3. Multi-label annotation: For the images of the above similar sample pairs, label multiple labels for each image (i.e., the training category labels above). You can refer to the multi-labels corresponding to the training sample set (i.e., which categories of labels are involved). Large-scale open source imagenet or open-image tag system. Images can be manually labeled with multiple labels, or a multi-label model can be used to predict image labels: For example, the open source resnet101 network structure is used to assist the imagenet pre-trained multi-label model to produce multi-label prediction results for the images in the above training set.

4、对每个图像，产生其多标签标注下的联合多标签阿达马向量表征(即上文的期望语义向量)。根据目标要学习的哈希特征位Nh，设定阿达马矩阵的阶数n，保证n*P＝Nh且n>Nclass，即hadamard矩阵的阶小于等于哈希位长度、大于等于一半哈希位长度、同时大于等于多标签的类别数量。考虑到哈希特征需要表征全图，而分类信息只是图像的一种高度抽象，故n的长度应该小于等于哈希长度；又考虑到全图表征需要在该图像的分类表征外引入图像特有的信息以区分相同分类不同图像，故留出了P>＝1倍的阿达马阶数(n)作为分类信息之外的补充。其中Nclass为训练样本集合所涉及的所有标签的数量。4. For each image, generate a joint multi-label Hadamard vector representation under its multi-label annotation (i.e., the expected semantic vector above). According to the hash feature bit Nh to be learned, set the order n of the Hadamard matrix to ensure that n*P=Nh and n>Nclass, that is, the order of the Hadamard matrix is less than or equal to the hash bit length and greater than or equal to half the hash bits. The length is greater than or equal to the number of categories with multiple labels at the same time. Considering that the hash feature needs to represent the entire image, and the classification information is only a high degree of abstraction of the image, the length of n should be less than or equal to the hash length; also considering that the full image representation needs to introduce image-specific features in addition to the classification representation of the image. Information is used to distinguish different images of the same classification, so the Hadamard order (n) of P>=1 times is reserved as a supplement to the classification information. Where Nclass is the number of all labels involved in the training sample set.

对于训练样本集合涉及的每一个标签，从阿达马矩阵中除第一行外的其他行的向量中选择一个作为其阿达马向量表征(即上文中的标签语义向量)，由于一个阿达马向量只能表征一个标签，若图像存在多个标签，则可以将各个标签的阿达马向量进行融合，得到该图像的联合多标签阿达马向量表征，具体地，可以判断多个阿达马向量中属于同一排序位置的向量分量中是否有激活值，若有激活值，则将该排序位置处置为激活值，若没有激活值，则将该排序位置处置为非激活值，最后各个激活值和非激活值按照其所在排序位置进行排列组成该图像的联合多标签阿达马向量表征。For each label involved in the training sample set, select one of the vectors in the other rows of the Hadamard matrix except the first row as its Hadamard vector representation (i.e., the label semantic vector above). Since a Hadamard vector only It can represent a label. If there are multiple labels in the image, the Hadamard vectors of each label can be fused to obtain the joint multi-label Hadamard vector representation of the image. Specifically, it can be judged that multiple Hadamard vectors belong to the same order. Whether there is an activation value in the vector component of the position. If there is an activation value, the sorting position is treated as an activation value. If there is no activation value, the sorting position is treated as an inactivation value. Finally, each activation value and inactivation value are processed according to Their sorted positions are arranged to form a joint multi-label Hadamard vector representation of the image.

可以理解的是，若图像只有一个标签，则该标签的阿达马向量即为该图像的期望语义向量。It can be understood that if the image has only one label, the Hadamard vector of the label is the expected semantic vector of the image.

二、训练过程2. Training process

其中，所要训练的模型包括三个部分：基础特征提取模块、哈希量化模块、分类模型。其中，基础特征提取模型和哈希量化模快共同构成特征提取模型，基础特征提取模型采用resnet101，参数如表1，包含卷积层1、卷积层2-卷积层5以及池化层共6个部分，卷积层1为7×7×64的卷积，步长(stride)为2，卷积层2包括3×3最大池化层(max pool)和3个ResNet模块(block)，卷积层3-卷积层5分别包括3个ResNet模块、4个ResNet模块、23个ResNet模块以及3个ResNet模块。Among them, the model to be trained includes three parts: basic feature extraction module, hash quantization module, and classification model. Among them, the basic feature extraction model and the hash quantization module together constitute the feature extraction model. The basic feature extraction model uses resnet101. The parameters are as shown in Table 1, including convolution layer 1, convolution layer 2-convolution layer 5 and pooling layer. 6 parts. Convolutional layer 1 is a 7×7×64 convolution with a stride of 2. Convolutional layer 2 includes a 3×3 maximum pool layer (max pool) and 3 ResNet modules (block). , Convolutional layer 3-convolutional layer 5 include 3 ResNet modules, 4 ResNet modules, 23 ResNet modules and 3 ResNet modules respectively.

表1Table 1

哈希量化模快的参数如表2所示，包括一层全连接层，该层以最大池化层的输出为输入，输出1x256的浮点，该浮点向量可以经过sign符号函数映射到二值向量(0或1)，即为最终应用中的哈希特征。分类模型的参数如表3所示，包括分类层(Fc_class)，该分类层的输入为哈希特征生成层的输出，该分类层输出为n维的预测概率。其中，n与阿达马矩阵的阶数相同。The parameters of the hash quantization module are shown in Table 2, including a layer of fully connected layer, which takes the output of the maximum pooling layer as input and outputs a floating point of 1x256. This floating point vector can be mapped to a binary vector through the sign function. The value vector (0 or 1) is the hash feature in the final application. The parameters of the classification model are shown in Table 3, including the classification layer (Fc_class). The input of this classification layer is the output of the hash feature generation layer, and the output of this classification layer is the n-dimensional predicted probability. Among them, n is the same as the order of Hadamard matrix.

表2Table 2

层的名称layer name 输出尺寸Output size 层layer 哈希量化层Hash quantization layer 1x2561x256 全连接层Fully connected layer

表3table 3

层的名称layer name 输出尺寸Output size 层layer 分类层classification layer 1xn1xn 全连接层Fully connected layer

需要说明的，基础特征提取模型、哈希量化模快以及分类模型也可以采用其他模型结构，如基础特征提取模型采用resnet18CNN，如哈希量化模快采用多层全连接层连接等。具体训练过程如下：It should be noted that the basic feature extraction model, hash quantization model and classification model can also use other model structures. For example, the basic feature extraction model uses resnet18CNN, and the hash quantization model uses multi-layer fully connected layer connections. The specific training process is as follows:

1、参数初始化：1. Parameter initialization:

在预训练环节，卷积层1-卷积层5采用在ImageNet数据集上预训练的ResNet101的参数，新添加的层如特征层采用方差为0.01，均值为0的高斯分布进行初始化——哈希量化层、分类层采用方差为0.01，均值为0的随机正态分布进行初始化。In the pre-training process, convolutional layers 1 to 5 use the parameters of ResNet101 pretrained on the ImageNet data set. Newly added layers such as the feature layer are initialized with a Gaussian distribution with a variance of 0.01 and a mean of 0 - ha The quantification layer and the classification layer are initialized with a random normal distribution with a variance of 0.01 and a mean of 0.

2、设置学习参数：表1、表2和表3所有参数都需要学习。2. Set learning parameters: All parameters in Table 1, Table 2 and Table 3 need to be learned.

3、设置学习率：对基础特征提取模型、哈希量化模快采用lr＝0.0005学习率，分类模型采用0.005学习率。每经过10轮迭代后lr变为原来的0.1倍。在梯度回传时，语义的损失(即分类损失)首先更新了分类模型(即表3)的参数，然后回传到哈希量化模块(表2)、基础特征提取模型(表1)。这里设置哈希特征的学习率比语义标签小，可以避免多标签语义损失完全传递到量化哈希层造成语义学习过度影响哈希量化模块度量效果，造成哈希量化模块冗余。3. Set the learning rate: lr=0.0005 learning rate is used for the basic feature extraction model and hash quantization model, and 0.005 learning rate is used for the classification model. After every 10 iterations, lr becomes 0.1 times its original value. During gradient backpropagation, the semantic loss (i.e., classification loss) first updates the parameters of the classification model (i.e., Table 3), and then is transmitted back to the hash quantization module (Table 2) and the basic feature extraction model (Table 1). The learning rate of the hash feature is set to be smaller than the semantic label here, which can prevent the multi-label semantic loss from being completely transferred to the quantized hash layer, causing semantic learning to excessively affect the measurement effect of the hash quantization module, and causing the hash quantization module to be redundant.

4、学习过程：对全量数据，进行epoch轮迭代；每轮迭代处理一次全量样本，直到某epoch下平均epoch loss不再下降，得到图像特征提取模型和目标多标签分类模型。其中，对每个epoch的每轮迭代中的具体操作如下：把全量图像N个相似样本对，每bs个样本对为一批次(batch)，分成Nb＝N/bs个批次，对于每个batch，通过上述三元组挖掘获取到三元组，进行：4. Learning process: Perform epoch rounds of iterations on the full amount of data; each iteration processes a full amount of samples until the average epoch loss no longer decreases under a certain epoch, and the image feature extraction model and target multi-label classification model are obtained. Among them, the specific operations in each iteration of each epoch are as follows: divide the entire image N similar sample pairs, each bs sample pair into a batch (batch), divide it into Nb=N/bs batches, for each A batch, obtain the triples through the above triple mining, proceed:

4.1、前向计算：把所有参数都设为需要学习状态，训练时对输入的一张图片进行前向计算得到哈希特征以及分类层输出的n维的概率向量，用Q以及P表示，其中Q为1x256向量表示哈希特征，P为1*n维的概率向量，每一位表示该特征位的预测概率。4.1. Forward calculation: Set all parameters to the state that needs to be learned. During training, perform forward calculation on an input picture to obtain the hash feature and the n-dimensional probability vector output by the classification layer, represented by Q and P, where Q is a 1x256 vector representing the hash feature, P is a 1*n-dimensional probability vector, and each bit represents the predicted probability of the feature bit.

4.2、损失计算：1)对每个三元组，计算量化损失和度量损失；2)对bs对样本对中，每个图像，分别获取图像的期望语义向量，计算分类层输出的n维概率向量与期望语义向量的多标签交叉熵损失函数；3)对bs对样本对，分别计算其哈希输出结果中阿达马多标签有效位与阿达马多标签目标的L2损失，使得样本哈希位中阿达马多标签位与该样本的阿达马标签一致从而在哈希空间中模拟阿达马空间。计算三个损失的和作为总损失loss。具体如下：4.2. Loss calculation: 1) For each triplet, calculate the quantification loss and metric loss; 2) For the bs sample pair, for each image, obtain the expected semantic vector of the image, and calculate the n-dimensional probability of the classification layer output The multi-label cross-entropy loss function of the vector and the desired semantic vector; 3) For the bs pair of sample pairs, calculate the L2 loss of the Adama multi-label valid bits and the Adama multi-label target in the hash output result, so that the sample hash bits The medium Hadamard multi-label bits are consistent with the Hadamard label of the sample to simulate Hadamard space in the hash space. Calculate the sum of the three losses as the total loss. details as follows:

对batch中每个样本对(共bs个样本对)，其哈希损失为10个挖掘到的三元组哈希损失的均值、2个图像样本的阿达马分类损失均值、2个图像样本的哈希表征与阿达马表征一致的回归损失(L2损失)。其分类损失为2个图像的分类损失求均值。具体参考以下公式(4)：For each sample pair in the batch (bs sample pairs in total), the hash loss is the mean of the 10 mined triple hash losses, the mean of the Hadamard classification loss of the 2 image samples, and the mean of the Hadamard classification loss of the 2 image samples. The hash representation is consistent with the regression loss (L2 loss) of the Hadamard representation. The classification loss is the average of the classification losses of the two images. Please refer to the following formula (4) for details:

其中w₁＝0.1，w₃＝0.2，w₂＝1-w₁-w₃，本实施例中加入基于阿达马向量的多标签语义空间学习的作用是：在哈希特征之上加入均匀多激活的阿达马向量表征，可以约束哈希特征在分布上也倾向于均匀的效果。于阿达马向量除了第一行外，其他各行和均为0，即-1与1的个数相同，可以理解为对比起one-hot表征和永远为1而言类别激活位的表征是均匀的。为激活程度，可以根据多个语义标签向量中，同一特征位下的激活标签向量分量的数量确定。具体可以参考上文实施例中的描述。Among them, w ₁ =0.1, w ₃ =0.2, w ₂ =1-w ₁ -w ₃ . The effect of adding multi-label semantic space learning based on Hadamard vector in this embodiment is to add uniform multi-label semantic space learning on top of the hash feature. The activated Hadamard vector representation can constrain the distribution of hash features to tend to be uniform. Except for the first row of the Hadamard vector, the sums of the other rows are all 0, that is, the number of -1 and 1 is the same. It can be understood that compared with the one-hot representation and the representation of always 1, the representation of the category activation bit is uniform. . The degree of activation can be determined based on the number of activation label vector components under the same feature bit among multiple semantic label vectors. For details, reference may be made to the descriptions in the above embodiments.

本实施例中，考虑到由于度量学习收敛慢——在2个图像上找相似与不相似部位，相比在同一个分类的多个图像下找相同类别共性要难得多，故语义学习可以加速收敛，避免度量学习因局部表征不当造成语义相关的图像漏召回，设置W₁＝0.1。然而语义收敛快也容易给量化特征带来过拟合，上述通过量化学习率为语义学习的0.1倍、语义多标签损失权重w₁为度量学习w₂的1/9来约束。In this embodiment, considering that metric learning converges slowly - it is much more difficult to find similar and dissimilar parts on two images than it is to find commonalities in the same category in multiple images of the same category, so semantic learning can be accelerated. To converge and avoid missing recall of semantically related images due to improper local representation in metric learning, set W ₁ =0.1. However, fast semantic convergence can easily bring overfitting to quantitative features. The above constraints are achieved by setting the quantitative learning rate to 0.1 times that of semantic learning, and the semantic multi-label loss weight w ₁ to 1/9 of metric learning w ₂ .

设置W3＝0.2可以有效保证度量学习样本对靠近，再用较小的权重使样本对调整到语义表征附近，避免因调整语义损失时失去或降低度量学习效果。Setting W3 = 0.2 can effectively ensure that the metric learning sample pairs are close to each other, and then use smaller weights to adjust the sample pairs closer to the semantic representation to avoid losing or reducing the metric learning effect when adjusting the semantic loss.

以下分别对各个损失进行具体介绍：The following is a detailed introduction to each loss:

1)L_hash：由L_coding与L_triplet组成。具体参考以下公式(5)：1)L _hash : composed of L _coding and L _triplet . Please refer to the following formula (5) for details:

L_hash＝w₂₁L_triplet+w₂₂L_coding (5)L _hash =w ₂₁ L _triplet +w ₂₂ L _coding (5)

其中，L_triplet可参考上文中的公式(1)进行计算，L_coding可参考上文实施例中的公式(2)和(3)进行计算。w₂₁可以设置为1，w₂₂可以设置为0.5。由于L_reg收敛比L_triplet快，为了保证L_triplet在整体loss中处于主导地位，从而保证embedding始终具有相似度度量的能力，故本处w₂₂设为0.5(或小于1的其他值，可视情况调整)。可以理解的是，在实际应用中可以直接采用sign函数对输出的Q产生量化的二值向量，即可用于图像检索。Among them, L _triplet can be calculated with reference to the above formula (1), and L _coding can be calculated with reference to the formulas (2) and (3) in the above embodiment. w ₂₁ can be set to 1 and w ₂₂ can be set to 0.5. Since L _reg converges faster than L _triplet , in order to ensure that L _triplet is in a dominant position in the overall loss and thus ensure that embedding always has the ability to measure similarity, w ₂₂ here is set to 0.5 (or other values less than 1, which can be seen situation adjustment). It can be understood that in practical applications, the sign function can be directly used to generate a quantized binary vector for the output Q, which can be used for image retrieval.

2)L_class：对分类模型输出的概率向量，计算其与该图像的期望语义向量的多标签损失可以参考以下公式(6)，其中t[i]为1*n的0、1向量(其中由阿达马向量得到的期望语义向量中的-1替换为0即可)，o[i]为在1*n个阿达马位(即阿达马向量中的排序位置)分别的预测概率。2) L _class : For the probability vector output by the classification model, calculate the multi-label loss between it and the expected semantic vector of the image. You can refer to the following formula (6), where t[i] is the 0 and 1 vector of 1*n (where Just replace -1 in the expected semantic vector obtained by the Hadamard vector with 0), and o[i] is the predicted probability of each of the 1*n Hadamard bits (that is, the sorting position in the Hadamard vector).

3)L_reg：样本对的哈希空间中语义表征建模的回归损失Lreg如下，其中a、p代表样本对中两个样本，Hash_a表示哈希样本a的哈希输出，Hada_a表示样本a的阿达马多标签目标，可以参考以下公式(7)：3) L _reg : The regression loss Lreg for semantic representation modeling in the hash space of the sample pair is as follows, where a and p represent the two samples in the sample pair, Hash _a represents the hash output of the hash sample a, and Hada _a represents the sample For the Hadamard multi-label target of a, you can refer to the following formula (7):

L_reg＝(Hash_a-Hada_a)²+(Hash_p-Hada_p)² (7)L _reg = (Hash _a -Hada _a ) ² + (Hash _p -Hada _p ) ² (7)

当P＝1时，Nh＝n，此时期望语义向量和训练特征向量的维数相同，可以直接通过L2距离最小作为建模语义空间的约束。可以理解的是，本实施例中，训练特征向量为哈希特征，因此训练特征向量中的排序位置为哈希位。When P=1, Nh=n, at this time, the expected semantic vector and the training feature vector have the same dimension, and the minimum L2 distance can be directly used as a constraint for modeling the semantic space. It can be understood that in this embodiment, the training feature vector is a hash feature, so the sorting position in the training feature vector is a hash bit.

当P>1，即Nh是n的P倍时，公式(7)需要调整为：仅保持哈希位中，与阿达马位有映射的那些位一致即可，其中哈希位与阿达马位映射的关系可以参考图3，可以从每P个哈希位中选一位与阿达马位映射即可，其中P为Nh＝n*P中的P位长倍数，对图像j的每个阿达马位i(共n位)，均使得该阿达马位与与之对应的哈希位(Pi+q，q表示偏移)的回归损失最小。具体可以参考以下公式(8)：When P>1, that is, when Nh is P times of n, formula (7) needs to be adjusted to: only keep those bits in the hash bits that are mapped to the Hadamard bits, where the hash bits are consistent with the Hadamard bits. The mapping relationship can be referred to Figure 3. You can select one bit from each P hash bit to map to the Hadamard bit, where P is a multiple of the P bit length in Nh = n*P. For each Hadamard bit of image j Bit i (n bits in total), all minimize the regression loss between the Hadamard bit and the corresponding hash bit (Pi+q, q represents the offset). For details, please refer to the following formula (8):

4.3、模型参数更新：采用随机梯度下降算法，将上一步得到目标模型损失值进行梯度后向计算得到全部模型参数的更新值，使用该更新值对全部模型的参数进行更新。4.3. Model parameter update: Use the stochastic gradient descent algorithm to perform gradient backward calculation on the target model loss value obtained in the previous step to obtain the update value of all model parameters, and use this update value to update the parameters of all models.

三、量化特征检索应用3. Quantitative feature retrieval application

对所有库存图像利用训练得到的图像特征提取模型提取到所有的Q，经过sign函数激活得到的二值量化特征，入库。对查询图像经过上述图像特征提取模型提取Q并经过二值化量化得到二值量化特征，然后跟库存的二值量化特征一一对比，采用针对二值量化特征的汉明距离计算可以加速计算效率(相比浮点的embedding特征)，计算距离后根据从小到大排序取topK最相似的返回。从而可以检索到跟查询图像具备语义相似的库存图(即上文中的目标检索结果样本)。Use the trained image feature extraction model to extract all Qs from all inventory images, and store the binary quantified features obtained through sign function activation into the database. Extract Q from the query image through the above image feature extraction model and obtain binary quantified features through binary quantization, and then compare them one by one with the binary quantified features in the inventory. Using Hamming distance calculation for binary quantified features can speed up the calculation efficiency. (Compared to floating-point embedding features), after calculating the distance, the topK most similar ones are returned in order from small to large. In this way, it is possible to retrieve inventory images that are semantically similar to the query image (i.e., the target retrieval result sample above).

上述实施例具有以下有益效果：The above embodiments have the following beneficial effects:

一方面，借助样本丰富的多标签语义空间约束哈希特征不仅可以表征度量学习特征外，还可储存表示语义的特征信息，避免特征由于样本带偏或不足造成哈希特征学习有偏从而无法有效表征域外样本，从而提升在包含域外样本的检索场景的召回准确率。On the one hand, with the help of sample-rich multi-label semantic space constrained hash features, not only can it represent metric learning features, but it can also store feature information that represents semantics, preventing features from being biased and ineffective in hash feature learning due to biased or insufficient samples. Characterize out-of-domain samples to improve recall accuracy in retrieval scenarios that include out-of-domain samples.

另一方面，借助多标签目标的多位均匀激活的避免了独热多标签无法有效利用所有位、易于过拟合等问题，并通过梯度回传使得哈希位也集中在特定分类位所需的特征上，从而产生冗余特征等问题。On the other hand, the uniform activation of multiple bits of the multi-label target avoids problems such as the inability of one-hot multi-label to effectively utilize all bits and easy overfitting, and through gradient backpropagation, the hash bits are also concentrated in the specific classification bits. features, resulting in problems such as redundant features.

此外，借助基于阿达马的哈希码度量约束，实现哈希空间中样本整体分布模拟均匀分布的语义空间(在非语义空间模拟语义空间并微调非语义空间哈希表征的学习)，一方面，语义空间极大程度提升哈希特征多位激活效果，并通过对哈希特征中非语义位的微调达到哈希特征度量表征图片的效果；另一方面可以使得信息偏少的非语义空间也可从语义空间中获取更多的有意义监督信息从而得到更均匀的非语义哈希分布，避免了非语义哈希分布不均匀，从而在检索中因为某些特征上过度集中造成这些特征上区分度不高检索效果不好的问题。In addition, with the help of Hadamard-based hash code metric constraints, the overall distribution of samples in the hash space is realized to simulate a uniformly distributed semantic space (simulating the semantic space in the non-semantic space and fine-tuning the learning of hash representations in the non-semantic space). On the one hand, The semantic space greatly improves the multi-bit activation effect of the hash feature, and achieves the effect of hash feature measurement to represent images by fine-tuning the non-semantic bits in the hash feature; on the other hand, it can also make the non-semantic space with less information possible Obtain more meaningful supervision information from the semantic space to obtain a more uniform non-semantic hash distribution, avoiding the uneven distribution of non-semantic hashes, which will cause excessive concentration on certain features during retrieval, resulting in discrimination on these features. The problem of poor search results is not high.

本申请还提供另外一种应用场景，该应用场景可适用于上述的特征提取模型处理方法。在该应用场景中，通过与上文应用场景相同的步骤训练得到图像特征提取模型和目标多标签分类模型，将图像特征提取模型和目标多标签分类模型拼接得到目标分类模型，根据目标分类模型可以实现对输入图像进行多标签分类。This application also provides another application scenario, which is applicable to the above feature extraction model processing method. In this application scenario, the image feature extraction model and the target multi-label classification model are obtained through the same steps as the above application scenario. The image feature extraction model and the target multi-label classification model are spliced to obtain the target classification model. According to the target classification model, Implement multi-label classification of input images.

应该理解的是，虽然如上的各实施例所涉及的流程图中的各个步骤按照箭头的指示依次显示，但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明，这些步骤的执行并没有严格的顺序限制，这些步骤可以以其它的顺序执行。而且，如上的各实施例所涉及的流程图中的至少一部分步骤可以包括多个步骤或者多个阶段，这些步骤或者阶段并不必然是在同一时刻执行完成，而是可以在不同的时刻执行，这些步骤或者阶段的执行顺序也不必然是依次进行，而是可以与其它步骤或者其它步骤中的步骤或者阶段的至少一部分轮流或者交替地执行。It should be understood that although the steps in the flowcharts involved in the above embodiments are shown in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated in this article, there is no strict order restriction on the execution of these steps, and these steps can be executed in other orders. Moreover, at least some of the steps in the flowcharts involved in the above embodiments may include multiple steps or multiple stages. These steps or stages are not necessarily executed at the same time, but may be executed at different times. The execution order of these steps or stages is not necessarily sequential, but may be performed in turn or alternately with other steps or at least part of the steps or stages in other steps.

基于同样的发明构思，本申请实施例还提供了一种用于实现上述所涉及的特征提取模型处理方法的特征提取模型处理装置以及一种用于实现上述所涉及的样本检索方法的样本检索装置。这些装置所提供的解决问题的实现方案与上述方法中所记载的实现方案相似，故下面所提供的一个或多个特征提取模型处理装置实施例中以及样本检索装置实施例中的具体限定可以参见上文中对于特征提取模型处理方法的限定，在此不再赘述。Based on the same inventive concept, embodiments of the present application also provide a feature extraction model processing device for implementing the above-mentioned feature extraction model processing method and a sample retrieval device for implementing the above-mentioned sample retrieval method. . The problem-solving solutions provided by these devices are similar to the implementation solutions recorded in the above methods. Therefore, the specific limitations in one or more feature extraction model processing device embodiments and sample retrieval device embodiments provided below can be found in The above limitations on the feature extraction model processing method will not be repeated here.

在一个实施例中，如图8所示，提供了一种特征提取模型处理装置800，包括：In one embodiment, as shown in Figure 8, a feature extraction model processing device 800 is provided, including:

特征提取模块802，用于通过待训练的特征提取模型对训练样本进行特征提取，得到训练特征向量；The feature extraction module 802 is used to extract features of the training samples through the feature extraction model to be trained to obtain training feature vectors;

分类预测模块804，用于基于训练特征向量对训练样本进行分类预测，得到预测语义向量；The classification prediction module 804 is used to classify and predict training samples based on training feature vectors to obtain predicted semantic vectors;

标签向量获取模块806，用于获取训练样本对应的训练类别标签的标签语义向量，标签语义向量包括至少两个激活标签向量分量；The label vector acquisition module 806 is used to obtain the label semantic vector of the training category label corresponding to the training sample. The label semantic vector includes at least two activation label vector components;

期望向量确定模块808，用于基于标签语义向量，确定训练样本的期望语义向量，期望语义向量中，包含与至少两个激活标签向量分量的位置分布对应的期望激活向量分量；The expected vector determination module 808 is used to determine the expected semantic vector of the training sample based on the label semantic vector, where the expected semantic vector contains the expected activation vector component corresponding to the position distribution of at least two activation label vector components;

分类损失确定模块810，用于基于期望语义向量中各个期望激活向量分量，与预测语义向量中各自对应位置的预测向量分量之间的差异，确定分类损失；The classification loss determination module 810 is used to determine the classification loss based on the difference between each expected activation vector component in the expected semantic vector and the predicted vector component at the corresponding position in the predicted semantic vector;

模型训练模块812，用于基于分类损失对特征提取模型进行训练，当满足训练停止条件时，得到目标特征提取模型；目标特征提取模型用于提取输入样本的样本特征向量。The model training module 812 is used to train the feature extraction model based on the classification loss. When the training stop condition is met, the target feature extraction model is obtained; the target feature extraction model is used to extract the sample feature vector of the input sample.

上述特征提取模型处理装置，通过待训练的特征提取模型对训练样本进行特征提取，得到训练特征向量，基于训练特征向量对训练样本进行分类预测，得到预测语义向量，获取训练样本对应的训练类别标签的标签语义向量，基于标签语义向量，确定训练样本的期望语义向量，进而基于期望语义向量中各个期望激活向量分量，与预测语义向量中各自对应位置的预测向量分量之间的差异，确定分类损失，通过该分类损失来训练特征提取模型，最终得到的目标特征提取模型可以学到趋向于标签语义向量的特征，从而可以在训练特征向量所在向量空间中对标签语义向量所在的语义空间进行模拟，使得提取的特征向量中可以储存表征语义的特征信息，提高了特征提取的准确性，并且由于标签语义向量包括至少两个激活标签向量分量，可以对训练类别标签进行更加准确地表征，而期望语义向量是基于标签语义向量确定的，并且包含与至少两个激活标签向量分量的位置分布对应的期望激活向量分量，可以使得计算得到的分类损失更加准确，进一步提高了特征提取的准确性。The above-mentioned feature extraction model processing device extracts features from training samples through the feature extraction model to be trained to obtain training feature vectors, classifies and predicts training samples based on the training feature vectors, obtains predicted semantic vectors, and obtains training category labels corresponding to the training samples. The label semantic vector of , the feature extraction model is trained through this classification loss, and the final target feature extraction model can learn features that tend to the label semantic vector, so that the semantic space where the label semantic vector is located can be simulated in the vector space where the training feature vector is located. The feature information representing semantics can be stored in the extracted feature vector, which improves the accuracy of feature extraction, and because the label semantic vector includes at least two activation label vector components, the training category label can be more accurately characterized, and the expected semantics The vector is determined based on the label semantic vector and contains the expected activation vector component corresponding to the position distribution of at least two activation label vector components, which can make the calculated classification loss more accurate and further improve the accuracy of feature extraction.

在一个实施例中，期望向量确定模块，还用于将各个标签语义向量中，排序位置相同的标签向量分量组成特征分量集合；对于包含激活标签向量分量的特征分量集合，将特征分量集合所属排序位置处的期望向量分量置为激活值；对于不包含激活标签向量分量的特征分量集合，将特征分量集合所属排序位置处的期望向量分量置为非激活值；各个期望向量分量按照各自对应的排序位置进行组合形成训练样本的期望语义向量。In one embodiment, the expected vector determination module is also used to form a feature component set by label vector components with the same sorting position in each label semantic vector; for a feature component set that contains an activation label vector component, sort the feature component set to which it belongs. The expected vector component at the position is set to the activation value; for the feature component set that does not contain the activation label vector component, the expected vector component at the sorting position to which the feature component set belongs is set to the inactive value; each expected vector component is sorted according to its corresponding The positions are combined to form the expected semantic vector of the training samples.

在一个实施例中，分类损失确定模块，用于针对期望语义向量中的每一个期望向量分量，基于期望向量分量与预测语义向量中排序位置对应的预测向量分量之间的差异，确定期望向量分量所在排序位置处的分类损失分量初始值；对期望向量分量所在排序位置处的特征分量集合中的激活标签向量分量进行统计，基于统计结果确定期望向量分量所在排序位置处的激活程度；基于激活程度对分类损失分量初始值进行加权处理，得到分类损失分量目标值；统计各个分类损失分量目标值，以确定分类损失。In one embodiment, the classification loss determination module is configured to, for each expected vector component in the expected semantic vector, determine the expected vector component based on the difference between the expected vector component and the predicted vector component corresponding to the sorted position in the predicted semantic vector. The initial value of the classification loss component at the sorting position; count the activation label vector components in the feature component set at the sorting position where the expected vector component is located, and determine the activation degree at the sorting position where the expected vector component is located based on the statistical results; based on the activation degree The initial value of the classification loss component is weighted to obtain the target value of the classification loss component; the target value of each classification loss component is counted to determine the classification loss.

在一个实施例中，上述装置还包括：语义约束损失确定模块，用于基于训练样本的训练特征向量和期望语义向量之间的差异，确定语义约束损失，语义约束损失用于对特征提取模型所提取的训练特征向量进行语义约束；分类损失确定模块，还用于基于语义约束损失和分类损失，确定目标损失；基于目标损失对特征提取模型进行训练，当满足训练停止条件时，得到目标特征提取模型。In one embodiment, the above device further includes: a semantic constraint loss determination module, configured to determine the semantic constraint loss based on the difference between the training feature vector of the training sample and the expected semantic vector. The semantic constraint loss is used to determine the semantic constraint loss required by the feature extraction model. The extracted training feature vectors are semantically constrained; the classification loss determination module is also used to determine the target loss based on the semantic constraint loss and classification loss; the feature extraction model is trained based on the target loss, and when the training stop condition is met, the target feature extraction is obtained Model.

在一个实施例中，语义约束损失确定模块，还用于若训练特征向量的维数和期望语义向量的维数一致，则计算训练特征向量和期望语义向量之间的向量距离，将向量距离作为训练样本对应的语义约束损失。In one embodiment, the semantic constraint loss determination module is also used to calculate the vector distance between the training feature vector and the expected semantic vector if the dimension of the training feature vector is consistent with the dimension of the expected semantic vector, and divide the vector distance into As the semantic constraint loss corresponding to the training sample.

在一个实施例中，语义约束损失确定模块，还用于若训练特征向量的维数大于期望语义向量的维数，则对于期望语义向量中每个期望向量分量，从所述训练特征向量中选取一个训练向量分量，将选取的训练向量分量作为所述期望向量分量的映射向量分量；计算期望向量分量与映射向量分量之间的差异，得到期望向量分量对应的语义约束损失分量；统计各个语义约束损失分量，得到所述训练样本对应的语义约束损失。In one embodiment, the semantic constraint loss determination module is also configured to select from the training feature vector for each expected vector component in the expected semantic vector if the dimension of the training feature vector is greater than the dimension of the expected semantic vector. A training vector component, using the selected training vector component as the mapping vector component of the desired vector component; calculating the difference between the desired vector component and the mapping vector component to obtain the semantic constraint loss component corresponding to the desired vector component; counting each semantic constraint loss component to obtain the semantic constraint loss corresponding to the training sample.

在一个实施例中，语义约束损失确定模块，还用于按照期望语义向量的维数，对训练特征向量中的训练向量分量进行分组，得到数量与期望语义向量的维数一致的多个向量分量组；对于期望语义向量中每个期望向量分量，从多个特征分量组中选取一个向量分量组作为目标向量分量组；从目标向量分量组中选取一个训练向量分量作为期望向量分量对应的映射向量分量。In one embodiment, the semantic constraint loss determination module is also used to group the training vector components in the training feature vector according to the dimension of the desired semantic vector, and obtain multiple vectors whose number is consistent with the dimension of the desired semantic vector. Component group; for each expected vector component in the expected semantic vector, select a vector component group from multiple feature component groups as the target vector component group; select a training vector component from the target vector component group as the mapping corresponding to the expected vector component vector components.

在一个实施例中，训练样本属于训练样本集合，训练样本集合中各个训练样本对应的训练类别标签组成标签集合，上述装置还包括标签向量生成模块，用于构建目标语义空间；目标语义空间的目标向量维数不超过训练特征向量的向量维数；将标签集合中的各个训练类别标签，分别映射至目标语义空间中，得到各个训练类别标签各自的标签语义向量。In one embodiment, the training samples belong to the training sample set, and the training category labels corresponding to each training sample in the training sample set form a label set. The above device also includes a label vector generation module for constructing the target semantic space; the target of the target semantic space The vector dimension does not exceed the vector dimension of the training feature vector; each training category label in the label set is mapped to the target semantic space respectively, and the respective label semantic vector of each training category label is obtained.

在一个实施例中，标签向量生成模块，用于生成矩阵阶数与目标向量维数匹配的目标矩阵，目标矩阵中包括多个候选表征向量，多个候选表征向量形成目标语义空间；候选表征向量中包括数量相等的第一数值和第二数值；对于语义类别标签集合中的各个语义类别标签，分别从多个候选表征向量中选取一个候选表征向量，作为各个语义类别标签对应的标签语义向量。In one embodiment, the label vector generation module is used to generate a target matrix whose matrix order matches the dimension of the target vector. The target matrix includes multiple candidate representation vectors, and the multiple candidate representation vectors form the target semantic space; the candidate representation vectors includes an equal number of first and second values; for each semantic category label in the semantic category label set, a candidate representation vector is selected from multiple candidate representation vectors as the label semantic vector corresponding to each semantic category label.

在一个实施例中，分类损失确定模块，还用于获取训练样本的对比训练样本对应的对比特征向量，基于训练特征向量与对比特征向量之间的差异得到特征提取损失；基于特征提取损失以及分类损失，得到目标损失；基于目标损失对特征提取模型进行参数调整并继续训练，当满足训练停止条件时，得到目标特征提取模型。In one embodiment, the classification loss determination module is also used to obtain the comparison feature vector corresponding to the training sample, and obtain the feature extraction loss based on the difference between the training feature vector and the comparison feature vector; based on the feature extraction loss and classification Loss, the target loss is obtained; based on the target loss, the parameters of the feature extraction model are adjusted and training continues. When the training stop condition is met, the target feature extraction model is obtained.

在一个实施例中，对比特征向量包括正向对比训练样本对应的正向对比特征向量以及负向对比训练样本对应的负向对比特征向量；分类损失确定模块，还用于获取正向特征差异值，正向特征差异值为训练特征向量与正向对比特征向量之间的特征差异值；获取负向特征差异值，负向特征差异值为训练特征向量与负向对比特征向量之间的特征差异值；基于正向特征差异值与负向特征差异值确定特征提取损失。In one embodiment, the contrast feature vector includes a positive contrast feature vector corresponding to the positive contrast training sample and a negative contrast feature vector corresponding to the negative contrast training sample; the classification loss determination module is also used to obtain the positive feature difference value , the positive feature difference value is the feature difference value between the training feature vector and the positive comparison feature vector; the negative feature difference value is obtained, and the negative feature difference value is the feature difference between the training feature vector and the negative comparison feature vector value; determine the feature extraction loss based on the positive feature difference value and the negative feature difference value.

在一个实施例中，特征提取模块，还用于通过特征提取模型提取训练样本的初始样本特征，并对初始样本特征进行量化处理，得到训练样本的训练特征向量；分类损失确定模块，还用于基于预设的符号函数确定训练特征向量中各个量化值对应的量化目标，基于各个量化值与各自对应的量化目标之间的差异确定量化损失；基于量化损失以及分类损失，得到目标损失；基于目标损失对特征提取模型进行参数调整并继续训练，当满足训练停止条件时，得到目标特征提取模型。In one embodiment, the feature extraction module is also used to extract the initial sample features of the training sample through the feature extraction model, and perform quantitative processing on the initial sample features to obtain the training feature vector of the training sample; the classification loss determination module is also used to The quantification target corresponding to each quantization value in the training feature vector is determined based on the preset symbolic function, and the quantization loss is determined based on the difference between each quantization value and the corresponding quantification target; based on the quantization loss and classification loss, the target loss is obtained; based on the target The loss adjusts the parameters of the feature extraction model and continues training. When the training stop condition is met, the target feature extraction model is obtained.

在一个实施例中，如图9所示，提供了一种样本检索装置900，包括：In one embodiment, as shown in Figure 9, a sample retrieval device 900 is provided, including:

样本获取模块902，用于获取查询样本和候选召回样本集合；Sample acquisition module 902, used to acquire query samples and candidate recall sample sets;

特征提取模块904，用于分别将查询样本和候选召回样本集合中的候选召回样本输入目标特征提取模型，得到查询样本对应的查询特征向量和候选召回样本对应的候选召回特征向量；其中，目标特征提取模型是通过分类损失对待训练的特征提取模型进行训练得到的，分类损失是基于期望语义向量中各个期望激活向量分量，与预测语义向量中各自对应位置的预测向量分量之间的差异确定的，预测语义向量是基于训练特征向量对训练样本进行分类预测得到的，期望语义向量是基于训练样本对应的训练类别标签的标签语义向量确定的，标签语义向量包括至少两个激活标签向量分量，期望语义向量中，包含与至少两个激活标签向量分量的位置分布对应的期望激活向量分量，训练特征向量是通过待训练的特征提取模型对训练样本进行特征提取得到的；The feature extraction module 904 is used to input the query sample and the candidate recall sample in the candidate recall sample set into the target feature extraction model to obtain the query feature vector corresponding to the query sample and the candidate recall feature vector corresponding to the candidate recall sample; wherein, the target feature The extraction model is obtained by training the feature extraction model to be trained through classification loss. The classification loss is determined based on the difference between each expected activation vector component in the expected semantic vector and the predicted vector component at the corresponding position in the predicted semantic vector. The predicted semantic vector is obtained by classifying and predicting the training sample based on the training feature vector. The expected semantic vector is determined based on the label semantic vector of the training category label corresponding to the training sample. The label semantic vector includes at least two activation label vector components. The expected semantic vector The vector contains the expected activation vector component corresponding to the position distribution of at least two activation label vector components, and the training feature vector is obtained by feature extraction of the training sample through the feature extraction model to be trained;

检索模块906，用于基于查询特征向量和候选召回特征向量，从候选召回样本集合中确定查询样本对应的目标检索样本。The retrieval module 906 is configured to determine the target retrieval sample corresponding to the query sample from the candidate recall sample set based on the query feature vector and the candidate recall feature vector.

上述样本检索装置，由于目标特征提取模型是通过该分类损失来训练，最终得到的目标特征提取模型可以学到趋向于标签语义向量的特征，从而可以在训练特征向量所在向量空间中对标签语义向量所在的语义空间进行模拟，使得提取的特征向量中可以储存表征语义的特征信息，提高了特征提取的准确性，并且由于标签语义向量包括至少两个激活标签向量分量，可以对训练类别标签进行更加准确地表征，而期望语义向量是基于标签语义向量确定的，并且包含与至少两个激活标签向量分量的位置分布对应的期望激活向量分量，可以使得计算得到的分类损失更加准确，进一步提高了特征提取的准确性，进而采用目标特征提取模型提取得到的特征向量进行样本检索，可以提高样本检索的准确度。In the above sample retrieval device, since the target feature extraction model is trained through the classification loss, the final target feature extraction model can learn features that tend to the label semantic vector, so that the label semantic vector can be compared in the vector space where the training feature vector is located. The semantic space is simulated, so that the extracted feature vector can store feature information representing semantics, which improves the accuracy of feature extraction. And because the label semantic vector includes at least two activation label vector components, the training category labels can be more refined. Accurately characterized, and the expected semantic vector is determined based on the label semantic vector and contains the expected activation vector component corresponding to the position distribution of at least two activation label vector components, which can make the calculated classification loss more accurate and further improve the feature The accuracy of extraction, and then using the feature vector extracted by the target feature extraction model for sample retrieval, can improve the accuracy of sample retrieval.

在一个实施例中，上述装置还包括：关联关系建立模块，用于对候选召回样本集合中各个候选召回样本各自的候选召回特征向量进行特征聚类，得到多个聚类簇；各个聚类簇存在对应的聚类中心；对于每一个聚类中心，建立聚类中心与同一聚类簇中各个候选召回特征向量之间的关联关系；检索模块，用于基于查询特征向量与各个聚类中心之间的特征距离，从各个聚类中心中确定目标聚类中心；获取与目标聚类中心存在关联关系的各个候选召回特征向量，基于查询特征向量与获取的各个候选召回特征向量之间的特征距离，从获取的各个候选召回特征向量中确定目标检索样本。In one embodiment, the above device further includes: an association relationship establishment module, used to perform feature clustering on the candidate recall feature vectors of each candidate recall sample in the candidate recall sample set to obtain multiple clusters; each cluster cluster There is a corresponding clustering center; for each clustering center, establish an association between the clustering center and each candidate recall feature vector in the same cluster; the retrieval module is used to based on the query feature vector and each clustering center determine the target clustering center from each clustering center; obtain each candidate recall feature vector that is associated with the target clustering center, based on the feature distance between the query feature vector and each obtained candidate recall feature vector , determine the target retrieval sample from each obtained candidate recall feature vector.

上述特征提取模型处理装置以及样本检索装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中，也可以以软件形式存储于计算机设备中的存储器中，以便于处理器调用执行以上各个模块对应的操作。Each module in the above feature extraction model processing device and sample retrieval device can be implemented in whole or in part by software, hardware, and combinations thereof. Each of the above modules may be embedded in or independent of the processor of the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.

在一个实施例中，提供了一种计算机设备，该计算机设备可以是服务器，其内部结构图可以如图10所示。该计算机设备包括处理器、存储器、输入/输出接口(Input/Output，简称I/O)和通信接口。其中，处理器、存储器和输入/输出接口通过系统总线连接，通信接口通过输入/输出接口连接到系统总线。其中，该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质和内存储器。该非易失性存储介质存储有操作系统、计算机程序和数据库。该内存储器为非易失性存储介质中的操作系统和计算机程序的运行提供环境。该计算机设备的数据库用于存储训练样本数据。该计算机设备的输入/输出接口用于处理器与外部设备之间交换信息。该计算机设备的通信接口用于与外部的终端通过网络连接通信。该计算机程序被处理器执行时以实现一种特征提取模型处理方法或者样本检索方法。In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure diagram may be as shown in Figure 10. The computer device includes a processor, a memory, an input/output interface (Input/Output, referred to as I/O), and a communication interface. Among them, the processor, memory and input/output interface are connected through the system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein, the processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes non-volatile storage media and internal memory. The non-volatile storage medium stores operating systems, computer programs and databases. This internal memory provides an environment for the execution of operating systems and computer programs in non-volatile storage media. The database of the computer device is used to store training sample data. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements a feature extraction model processing method or a sample retrieval method.

在一个实施例中，提供了一种计算机设备，该计算机设备可以是终端，其内部结构图可以如图11所示。该计算机设备包括处理器、存储器、输入/输出接口、通信接口、显示单元和输入装置。其中，处理器、存储器和输入/输出接口通过系统总线连接，通信接口、显示单元和输入装置通过输入/输出接口连接到系统总线。其中，该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统和计算机程序。该内存储器为非易失性存储介质中的操作系统和计算机程序的运行提供环境。该计算机设备的输入/输出接口用于处理器与外部设备之间交换信息。该计算机设备的通信接口用于与外部的终端进行有线或无线方式的通信，无线方式可通过WIFI、移动蜂窝网络、NFC(近场通信)或其他技术实现。该计算机程序被处理器执行时以实现一种特征提取模型处理方法或者样本检索方法。该计算机设备的显示单元用于形成视觉可见的画面，可以是显示屏、投影装置或虚拟现实成像装置，显示屏可以是液晶显示屏或电子墨水显示屏，该计算机设备的输入装置可以是显示屏上覆盖的触摸层，也可以是计算机设备外壳上设置的按键、轨迹球或触控板，还可以是外接的键盘、触控板或鼠标等。In one embodiment, a computer device is provided. The computer device may be a terminal, and its internal structure diagram may be as shown in Figure 11. The computer device includes a processor, memory, input/output interface, communication interface, display unit and input device. Among them, the processor, memory and input/output interface are connected through the system bus, and the communication interface, display unit and input device are connected to the system bus through the input/output interface. Wherein, the processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes non-volatile storage media and internal memory. The non-volatile storage medium stores operating systems and computer programs. This internal memory provides an environment for the execution of operating systems and computer programs in non-volatile storage media. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used for wired or wireless communication with external terminals. The wireless mode can be implemented through WIFI, mobile cellular network, NFC (Near Field Communication) or other technologies. The computer program, when executed by the processor, implements a feature extraction model processing method or a sample retrieval method. The display unit of the computer device is used to form a visually visible picture and can be a display screen, a projection device or a virtual reality imaging device. The display screen can be a liquid crystal display screen or an electronic ink display screen. The input device of the computer device can be a display screen. The touch layer covered above can also be buttons, trackballs or touch pads provided on the computer equipment shell, or it can also be an external keyboard, touch pad or mouse, etc.

本领域技术人员可以理解，图10和图11中示出的结构，仅仅是与本申请方案相关的部分结构的框图，并不构成对本申请方案所应用于其上的计算机设备的限定，具体的计算机设备可以包括比图中所示更多或更少的部件，或者组合某些部件，或者具有不同的部件布置。Those skilled in the art can understand that the structures shown in Figures 10 and 11 are only block diagrams of partial structures related to the solution of the present application, and do not constitute a limitation on the computer equipment to which the solution of the present application is applied. Specifically, Computer equipment may include more or fewer components than shown in the figures, or some combinations of components, or have different arrangements of components.

在一个实施例中，提供了一种计算机设备，包括存储器和处理器，存储器中存储有计算机程序，该处理器执行计算机程序时实现上述特征提取模型方法或者样本检索的步骤。In one embodiment, a computer device is provided, including a memory and a processor. A computer program is stored in the memory. When the processor executes the computer program, it implements the above feature extraction model method or sample retrieval steps.

在一个实施例中，提供了一种计算机可读存储介质，其上存储有计算机程序，计算机程序被处理器执行时实现上述特征提取模型方法或者样本检索的步骤。In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When the computer program is executed by a processor, the above-mentioned feature extraction model method or sample retrieval steps are implemented.

在一个实施例中，提供了一种计算机程序产品，包括计算机程序，该计算机程序被处理器执行时实现上述特征提取模型方法或者样本检索的步骤。In one embodiment, a computer program product is provided, including a computer program that implements the above feature extraction model method or sample retrieval steps when executed by a processor.

需要说明的是，本申请所涉及的用户信息(包括但不限于用户设备信息、用户个人信息等)和数据(包括但不限于用于分析的数据、存储的数据、展示的数据等)，均为经用户授权或者经过各方充分授权的信息和数据，且相关数据的收集、使用和处理需要遵守相关国家和地区的相关法律法规和标准。It should be noted that the user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data used for analysis, stored data, displayed data, etc.) involved in this application are all It is information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of relevant data need to comply with the relevant laws, regulations and standards of relevant countries and regions.

本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程，是可以通过计算机程序来指令相关的硬件来完成，所述的计算机程序可存储于一非易失性计算机可读取存储介质中，该计算机程序在执行时，可包括如上述各方法的实施例的流程。其中，本申请所提供的各实施例中所使用的对存储器、数据库或其它介质的任何引用，均可包括非易失性和易失性存储器中的至少一种。非易失性存储器可包括只读存储器(Read-OnlyMemory，ROM)、磁带、软盘、闪存、光存储器、高密度嵌入式非易失性存储器、阻变存储器(ReRAM)、磁变存储器(Magnetoresistive Random Access Memory，MRAM)、铁电存储器(Ferroelectric Random Access Memory，FRAM)、相变存储器(Phase Change Memory，PCM)、石墨烯存储器等。易失性存储器可包括随机存取存储器(Random Access Memory，RAM)或外部高速缓冲存储器等。作为说明而非局限，RAM可以是多种形式，比如静态随机存取存储器(Static Random Access Memory，SRAM)或动态随机存取存储器(Dynamic RandomAccess Memory，DRAM)等。本申请所提供的各实施例中所涉及的数据库可包括关系型数据库和非关系型数据库中至少一种。非关系型数据库可包括基于区块链的分布式数据库等，不限于此。本申请所提供的各实施例中所涉及的处理器可为通用处理器、中央处理器、图形处理器、数字信号处理器、可编程逻辑器、基于量子计算的数据处理逻辑器等，不限于此。Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by instructing relevant hardware through a computer program. The computer program can be stored in a non-volatile computer-readable storage. In the media, when executed, the computer program may include the processes of the above method embodiments. Any reference to memory, database or other media used in the embodiments provided in this application may include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive memory (ReRAM), magnetic variable memory (Magnetoresistive Random) Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, etc. Volatile memory may include random access memory (Random Access Memory, RAM) or external cache memory. By way of illustration but not limitation, RAM can be in various forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM). The databases involved in the various embodiments provided in this application may include at least one of a relational database and a non-relational database. Non-relational databases may include blockchain-based distributed databases, etc., but are not limited thereto. The processors involved in the various embodiments provided in this application may be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, etc., and are not limited to this.

以上实施例的各技术特征可以进行任意的组合，为使描述简洁，未对上述实施例中的各个技术特征所有可能的组合都进行描述，然而，只要这些技术特征的组合不存在矛盾，都应当认为是本说明书记载的范围。The technical features of the above embodiments can be combined in any way. To simplify the description, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, all possible combinations should be used. It is considered to be within the scope of this manual.

以上所述实施例仅表达了本申请的几种实施方式，其描述较为具体和详细，但并不能因此而理解为对本申请专利范围的限制。应当指出的是，对于本领域的普通技术人员来说，在不脱离本申请构思的前提下，还可以做出若干变形和改进，这些都属于本申请的保护范围。因此，本申请的保护范围应以所附权利要求为准。The above-described embodiments only express several implementation modes of the present application, and their descriptions are relatively specific and detailed, but should not be construed as limiting the patent scope of the present application. It should be noted that, for those of ordinary skill in the art, several modifications and improvements can be made without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the scope of protection of this application should be determined by the appended claims.

Claims

1. A feature extraction model processing method, characterized in that the method includes:

Extract features from training samples through the feature extraction model to be trained to obtain training feature vectors;

Classify and predict the training samples based on the training feature vectors to obtain predicted semantic vectors;

Obtain the label semantic vector of the training category label corresponding to the training sample, where the label semantic vector includes at least two activation label vector components;

Based on the label semantic vector, determine an expected semantic vector of the training sample, where the expected semantic vector includes an expected activation vector component corresponding to the position distribution of the at least two activation label vector components;

Determine a classification loss based on the difference between each of the expected activation vector components in the expected semantic vector and the predicted vector components at respective corresponding positions in the predicted semantic vector;

The feature extraction model is trained based on the classification loss, and when the training stop condition is met, a target feature extraction model is obtained; the target feature extraction model is used to extract the sample feature vector of the input sample.

2. The method according to claim 1, wherein the training sample corresponds to multiple training category labels, and determining the expected semantic vector of the training sample based on the label semantic vector includes:

The label vector components with the same sorted position in each label semantic vector form a feature component set;

For a feature component set that contains an activation label vector component, set the expected vector component at the sorted position to which the feature component set belongs as the activation value;

For a feature component set that does not contain an activation label vector component, set the expected vector component at the sorting position to which the feature component set belongs to a non-activation value;

Each expected vector component is combined according to its corresponding sorting position to form the expected semantic vector of the training sample.

3. The method according to claim 2, characterized in that the difference between each of the expected activation vector components based on the expected semantic vector and the predicted vector component of the corresponding position in the predicted semantic vector , determine the classification loss, including:

For each expected vector component in the expected semantic vector, based on the difference between the expected vector component and the predicted vector component corresponding to the sorted position in the predicted semantic vector, determine the classification loss at the sorted position where the expected vector component is located. Component initial value;

Perform statistics on the activation label vector components in the feature component set at the sorted position where the expected vector component is located, and determine the activation degree at the sorted position where the expected vector component is located based on the statistical results;

Perform weighting processing on the initial value of the classification loss component based on the activation degree to obtain a target value of the classification loss component;

Count the target values of each classification loss component to determine the classification loss.

4. The method according to claim 1, characterized in that, before training the feature extraction model based on the classification loss and obtaining the target feature extraction model when the training stop condition is met, the method further include:

Determine a semantic constraint loss based on the difference between the training feature vector of the training sample and the expected semantic vector, where the semantic constraint loss is used to semantically constrain the training feature vector extracted by the feature extraction model;

The feature extraction model is trained based on the classification loss. When the training stop condition is met, the target feature extraction model is obtained, including:

determining a target loss based on the semantic constraint loss and the classification loss;

The feature extraction model is trained based on the target loss, and when the training stop condition is met, the target feature extraction model is obtained.

5. The method of claim 4, wherein determining the semantic constraint loss based on the difference between the training feature vector of the training sample and the expected semantic vector includes:

If the dimension of the training feature vector is consistent with the dimension of the expected semantic vector, then

Calculate the vector distance between the training feature vector and the expected semantic vector, and use the vector distance as the semantic constraint loss corresponding to the training sample.

6. The method of claim 4, wherein determining the semantic constraint loss based on the difference between the training feature vector of the training sample and the expected semantic vector includes:

If the dimension of the training feature vector is greater than the dimension of the expected semantic vector, then

For each expected vector component in the expected semantic vector, select a training vector component from the training feature vector, and use the selected training vector component as the mapping vector component of the expected vector component;

Calculate the difference between the expected vector component and the mapping vector component to obtain the semantic constraint loss component corresponding to the expected vector component;

Each semantic constraint loss component is counted to obtain the semantic constraint loss corresponding to the training sample.

7. The method according to claim 6, characterized in that selecting a training vector component from the training feature vector and using the selected training vector component as the mapping vector component of the expected vector component includes:

Grouping the training vector components in the training feature vector according to the dimension of the desired semantic vector to obtain a plurality of vector component groups whose number is consistent with the dimension of the desired semantic vector;

For each expected vector component in the expected semantic vector, select a vector component group from the plurality of feature component groups as a target vector component group;

Select a training vector component from the target vector component group as the mapping vector component corresponding to the desired vector component.

8. The method according to claim 1, wherein the training sample belongs to a training sample set, and the training category labels corresponding to each training sample in the training sample set form a label set, and the method further includes:

Construct a target semantic space; the target vector dimension of the target semantic space does not exceed the vector dimension of the training feature vector;

Each training category label in the label set is mapped to the target semantic space to obtain a label semantic vector of each training category label.

9. The method according to claim 8, characterized in that said constructing the target semantic space includes:

Generate a target matrix whose matrix order matches the dimension of the target vector, the target matrix includes a plurality of candidate representation vectors, the plurality of candidate representation vectors form the target semantic space; the candidate representation vector includes a quantity equal first and second values;

The step of mapping each training category label in the label set to the target semantic space to obtain the respective label semantic vector of each training category label includes:

For each training category label in the label set, one candidate representation vector is selected from the plurality of candidate representation vectors as the label semantic vector corresponding to each training category label.

10. The method according to claim 1, characterized in that the feature extraction model is trained based on the classification loss, and when the training stop condition is met, the target feature extraction model is obtained, including:

Obtain a comparison feature vector corresponding to the comparison training sample of the training sample, and obtain a feature extraction loss based on the difference between the training feature vector and the comparison feature vector;

Based on the feature extraction loss and the classification loss, a target loss is obtained;

Based on the target loss, parameters of the feature extraction model are adjusted and training is continued. When the training stop condition is met, the target feature extraction model is obtained.

11. The method according to claim 10, wherein the comparison feature vector includes a positive contrast feature vector corresponding to a positive contrast training sample and a negative contrast feature vector corresponding to a negative contrast training sample; The feature extraction loss obtained from the difference between the training feature vector and the comparison feature vector includes:

Obtain a forward feature difference value, which is a feature difference value between the training feature vector and the forward comparison feature vector;

Obtain a negative feature difference value, which is a feature difference value between the training feature vector and the negative comparison feature vector;

The feature extraction loss is determined based on the positive feature difference value and the negative feature difference value.

12. The method according to claim 1, wherein the feature extraction of the training samples through the feature extraction model to obtain a training feature vector includes:

Extract the initial sample features of the training sample through the feature extraction model, and perform quantification processing on the initial sample features to obtain the training feature vector of the training sample;

Determine the quantization target corresponding to each quantization value in the training feature vector based on the preset symbolic function, and determine the quantization loss based on the difference between each quantization value and the respective corresponding quantization target;

Based on the quantification loss and the classification loss, a target loss is obtained;

13. A sample retrieval method, characterized in that the method includes:

Obtain query samples and candidate recall sample sets;

Respectively input the query sample and the candidate recall sample in the candidate recall sample set into the target feature extraction model to obtain the query feature vector corresponding to the query sample and the candidate recall feature vector corresponding to the candidate recall sample;

Wherein, the target feature extraction model is obtained by training the feature extraction model to be trained through classification loss. The classification loss is based on each expected activation vector component in the expected semantic vector and the prediction vector of the corresponding position in the predicted semantic vector. The difference between the components is determined, the predicted semantic vector is obtained by classifying and predicting the training sample based on the training feature vector, and the expected semantic vector is determined based on the label semantic vector of the training category label corresponding to the training sample, The label semantic vector includes at least two activation label vector components, the expected semantic vector includes an expected activation vector component corresponding to the position distribution of the at least two activation label vector components, and the training feature vector is obtained by The feature extraction model to be trained is obtained by extracting features from the training samples;

Based on the query feature vector and the candidate recall feature vector, a target retrieval sample corresponding to the query sample is determined from the candidate recall sample set.

14. The method according to claim 13, characterized in that the method further comprises:

Perform feature clustering on the candidate recall feature vectors of each candidate recall sample in the candidate recall sample set to obtain multiple clusters; each cluster cluster has a corresponding cluster center;

For each clustering center, establish an association between the clustering center and each candidate recall feature vector in the same clustering cluster;

Determining the target retrieval sample corresponding to the query sample from the candidate recall sample set based on the query feature vector and the candidate recall feature vector includes:

Based on the feature distance between the query feature vector and each cluster center, determine the target cluster center from each of the cluster centers;

Each candidate recall feature vector associated with the target clustering center is obtained, and the target retrieval sample is determined based on the feature distance between the query feature vector and each obtained candidate recall feature vector.

15. A feature extraction model processing device, characterized in that the device includes:

The feature extraction module is used to extract features from training samples through the feature extraction model to be trained to obtain training feature vectors;

A classification prediction module, used to perform classification prediction on the training sample based on the training feature vector to obtain a predicted semantic vector;

A label vector acquisition module, configured to obtain a label semantic vector of a training category label corresponding to the training sample, where the label semantic vector includes at least two activation label vector components;

An expected vector determination module, configured to determine an expected semantic vector of the training sample based on the label semantic vector, where the expected semantic vector includes an expected activation vector corresponding to the position distribution of the at least two activation label vector components. weight; weight

A classification loss determination module, configured to determine the classification loss based on the difference between each of the expected activation vector components in the expected semantic vector and the predicted vector components at respective corresponding positions in the predicted semantic vector;

A model training module is used to train the feature extraction model based on the classification loss. When the training stop condition is met, a target feature extraction model is obtained; the target feature extraction model is used to extract the sample feature vector of the input sample.

16. A sample retrieval device, characterized in that the device includes:

The sample acquisition module is used to obtain query samples and candidate recall sample sets;

The feature extraction module is used to input the query sample and the candidate recall sample in the candidate recall sample set into the target feature extraction model to obtain the query feature vector corresponding to the query sample and the candidate recall feature vector corresponding to the candidate recall sample; wherein , the target feature extraction model is obtained by training the feature extraction model to be trained through classification loss. The classification loss is based on each expected activation vector component in the expected semantic vector and the predicted vector component at the corresponding position in the predicted semantic vector. The difference between them is determined, the predicted semantic vector is obtained by classifying and predicting the training sample based on the training feature vector, and the expected semantic vector is determined based on the label semantic vector of the training category label corresponding to the training sample, so The label semantic vector includes at least two activation label vector components, the expected semantic vector includes an expected activation vector component corresponding to the position distribution of the at least two activation label vector components, and the training feature vector is obtained by The feature extraction model to be trained is obtained by extracting features from the training samples;

A retrieval module, configured to determine a target retrieval sample corresponding to the query sample from the candidate recall sample set based on the query feature vector and the candidate recall feature vector.

17. A computer device, including a memory and a processor, the memory stores a computer program, characterized in that when the processor executes the computer program, the processor implements any one of claims 1 to 12 or 13 to 14. steps of the method described.

18. A computer-readable storage medium with a computer program stored thereon, characterized in that when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 12 or 13 to 14 are implemented. .

19. A computer program product, comprising a computer program, characterized in that, when executed by a processor, the computer program implements the steps of the method according to any one of claims 1 to 12 or 13 to 14.