CN110569918A

CN110569918A - Sample classification method and related device

Info

Publication number: CN110569918A
Application number: CN201910873761.0A
Authority: CN
Inventors: 石楷弘; 陈志博; 王吉; 余莉萍
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2019-09-12
Filing date: 2019-09-12
Publication date: 2019-12-13
Anticipated expiration: 2039-09-12
Also published as: CN110569918B

Abstract

The embodiments of the present application provide a sample classification method and a related device. The method clusters pedestrian images according to mutual neighbor distance, neighbor ranking distance, and absolute distance, so that images of the same person are grouped together and a sample classification result can be generated from the clustered images. The clustering effect is good, and the resulting samples are suitable for training a pedestrian re-identification model.

Description

Sample classification method and related device

Technical field

The present application relates to the technical field of artificial intelligence, and in particular to a sample classification method and a related device.

Background

Person re-identification (person re-ID) has been a research focus in computer vision in recent years: given a surveillance image of a pedestrian, the task is to retrieve images of that pedestrian across devices. Because camera devices differ from one another, and because a pedestrian's appearance is easily affected by clothing, scale, occlusion, posture, and viewing angle, person re-identification is both valuable and highly challenging as a research topic. Person re-identification is also called pedestrian re-identification and is referred to below as ReID.

ReID technology is now widely used in commerce, security, transportation, finance, and other fields. Data plays an important role in improving the performance of ReID models, and for researchers, mining labeled identities from massive collections of human body images is an important task. Clustering plays a central role in this mining process: grouping images of people with the same identity together reduces the amount of subsequent manual labeling.

Face recognition technology is already mature: on the Labeled Faces in the Wild (LFW) face database, face recognition models achieve higher accuracy than human recognition, and clustering on face features works very well. ReID technology, by comparison, has developed more slowly, and its model accuracy is still relatively weak. Clustering algorithms that perform well in face recognition often perform poorly in ReID, and the resulting samples often fall short of requirements: for example, bad archives (images of different people grouped together) and multiple archives for one person occur, or images of the same person are not clustered together. It is therefore difficult to obtain suitable samples by applying clustering algorithms from face recognition to ReID.

Summary of the invention

The embodiments of the present application provide a sample classification method and a related device, which solve the technical problem in the prior art that clustering performs poorly and suitable samples are difficult to obtain.

In a first aspect, an embodiment of the present application provides a sample classification method, including:

Acquiring a first set to be processed and a second set to be processed, where the first set to be processed includes at least one first sample image and the second set to be processed includes at least one second sample image;

Acquiring a first image feature vector and a second image feature vector, where the first image feature vector has a first correspondence with the first sample image and the second image feature vector has a second correspondence with the second sample image;

Obtaining distances between the first set to be processed and the second set to be processed according to the first image feature vector and the second image feature vector, where the distances include an absolute distance, a mutual neighbor distance, and a neighbor ranking distance;

If the absolute distance satisfies a first set condition, the mutual neighbor distance satisfies a second set condition, and the neighbor ranking distance satisfies a third set condition, generating a sample classification result corresponding to the first set to be processed and the second set to be processed.
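The merge rule in the last step can be sketched as a conjunction of three checks. This is only an illustration: the text does not state what the three set conditions are, so the sketch assumes each one is an upper-bound threshold, and the names `MergeThresholds` and `should_merge` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MergeThresholds:
    """Hypothetical upper bounds standing in for the three 'set conditions'."""
    max_absolute: float
    max_mutual_neighbor: float
    max_neighbor_ranking: float


def should_merge(absolute: float, mutual_neighbor: float,
                 neighbor_ranking: float, t: MergeThresholds) -> bool:
    # All three conditions must hold at once; failing any one blocks the merge.
    return (absolute <= t.max_absolute
            and mutual_neighbor <= t.max_mutual_neighbor
            and neighbor_ranking <= t.max_neighbor_ranking)


thresholds = MergeThresholds(max_absolute=0.3,
                             max_mutual_neighbor=5,
                             max_neighbor_ranking=2.0)
merge_ok = should_merge(0.2, 3, 1.5, thresholds)
merge_blocked = should_merge(0.2, 9, 1.5, thresholds)
```

Requiring all three distances to pass means that a pair of sets that looks close under one measure can still be kept apart by another.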

In a second aspect, an embodiment of the present application provides a sample classification device, including:

An acquiring unit, configured to acquire a first set to be processed and a second set to be processed, where the first set to be processed includes at least one first sample image and the second set to be processed includes at least one second sample image;

The acquiring unit is further configured to acquire a first image feature vector and a second image feature vector, where the first image feature vector has a first correspondence with the first sample image and the second image feature vector has a second correspondence with the second sample image;

A processing unit, configured to obtain distances between the first set to be processed and the second set to be processed according to the first image feature vector and the second image feature vector, where the distances include an absolute distance, a mutual neighbor distance, and a neighbor ranking distance;

The processing unit is further configured to generate a sample classification result corresponding to the first set to be processed and the second set to be processed if the absolute distance satisfies a first set condition, the mutual neighbor distance satisfies a second set condition, and the neighbor ranking distance satisfies a third set condition.

In a possible design, in an implementation of the second aspect of the embodiments of the present application, the processing unit is further configured to:

Determine a cosine distance between the first sample image and the second sample image according to the first image feature vector and the second image feature vector;

Determine the absolute distance between the first set to be processed and the second set to be processed according to the cosine distance, where the absolute distance is the minimum value of the cosine distance.

Obtain a first absolute distance corresponding to the first set to be processed;

Sort by the magnitude of the first absolute distance to obtain a first nearest neighbor sequence corresponding to the first set to be processed;

Obtain a second absolute distance corresponding to the second set to be processed;

Sort by the magnitude of the second absolute distance to obtain a second nearest neighbor sequence corresponding to the second set to be processed;

Determine the mutual neighbor distance between the first set to be processed and the second set to be processed according to the sequence number of the first set to be processed in the second nearest neighbor sequence and the sequence number of the second set to be processed in the first nearest neighbor sequence.

Determine the neighbor ranking distance between the first set to be processed and the second set to be processed according to the first nearest neighbor sequence and the second nearest neighbor sequence.

Acquire a target set, where the target set includes the first set to be processed and the second set to be processed;

Determine a single feature vector according to the first image feature vector and the second image feature vector;

Calculate the similarity between target sets according to the single feature vector;

Send the target sets whose similarity is less than a set threshold to a terminal device, so that the terminal device displays the target sets.

Select one sample image from the target set;

Send the sample image to the terminal device, so that the terminal device displays the sample image.

Acquire annotation information, where the annotation information is associated with the target set;

Determine a secondary sample classification result corresponding to the target set according to the annotation information.

In a third aspect, an embodiment of the present application provides a server, including:

One or more central processing units, a memory, an input/output interface, a wired or wireless network interface, and a power supply;

The memory is a transient storage memory or a persistent storage memory;

The central processing unit is configured to communicate with the memory and to perform the following steps on the server:

In a possible design, in an implementation of the third aspect of the embodiments of the present application, the central processing unit is further configured to perform the following steps:

Determine, according to the absolute distance, a first nearest neighbor sequence corresponding to the first set to be processed and a second nearest neighbor sequence corresponding to the second set to be processed, where the first nearest neighbor sequence is a queue of the second sample images sorted by cosine distance, and the second nearest neighbor sequence is a queue of the first sample images sorted by cosine distance;

Determine the mutual neighbor distance between the first set to be processed and the second set to be processed according to the sequence number of the first absolute sample image in the second nearest neighbor sequence and the sequence number of the first absolute sample image in the second nearest neighbor sequence.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to execute the methods of the above aspects.

It can be seen from the above technical solutions that the embodiments of the present application have the following advantages:

The embodiments of the present application provide a sample classification method and a related device that can cluster pedestrian images by mutual neighbor distance, neighbor ranking distance, and absolute distance. Images of the same person are grouped together, and a sample classification result can be generated from the clustered images. The clustering effect is good, and the resulting samples are suitable for training a pedestrian re-identification model.

Brief description of the drawings

FIG. 1 is an example diagram of a pedestrian re-identification scenario in an embodiment of the present application;

FIG. 2 is an example diagram of an outlier in an embodiment of the present application;

FIG. 3 is a schematic diagram of a sample classification method provided by an embodiment of the present application;

FIG. 4 is an example diagram of the first nearest neighbor sequence and the second nearest neighbor sequence in an embodiment of the present application;

FIG. 5 is an example diagram of the server calculating the neighbor ranking distance in an embodiment of the present application;

FIG. 6 is an example diagram of the display of a target set in an embodiment of the present application;

FIG. 7 is a schematic flowchart of an optional embodiment among the embodiments of the present application;

FIG. 8 is a schematic flowchart of another optional embodiment among the embodiments of the present application;

FIG. 9 is an example diagram of a sample classification device in an embodiment of the present application;

FIG. 10 is a schematic structural diagram of a server provided by an embodiment of the present application.

Detailed description

The terms "first", "second", "third", "fourth", and the like (if any) in the specification, the claims, and the above drawings of the present application are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments described herein can be practiced in orders other than those illustrated or described herein. Furthermore, the terms "including" and "corresponding to", and any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.

In the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, such words are intended to present the related concepts in a concrete manner.

To keep the description of the following embodiments clear and concise, a brief introduction to the related technologies is given first:

Artificial intelligence (AI) is the theory, methods, technology, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the capabilities of perception, reasoning, and decision-making.

Artificial intelligence is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operating/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.

Computer vision (CV) is the science of how to make machines "see". More specifically, it uses cameras and computers in place of human eyes to identify, track, and measure targets, and further processes the resulting images so that they are better suited for human observation or for transmission to instruments for inspection. As a scientific discipline, computer vision studies the related theories and technologies in an attempt to build artificial intelligence systems that can obtain information from images or multidimensional data. Computer vision technologies usually include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, 3D object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric recognition technologies such as face recognition, fingerprint recognition, and pedestrian re-identification.

The solutions provided in the embodiments of the present application involve artificial intelligence technologies such as pedestrian re-identification and are described through the following embodiments:

FIG. 1 is an example diagram of a pedestrian re-identification scenario in an embodiment of the present application. The camera device may include, but is not limited to, a camera, a monitoring device, or a video recorder, which is not limited in the embodiments of the present application. The camera device is connected to the server through a network, so that the camera device can transmit captured images or videos to the server. After receiving these images or videos, the server can identify the pedestrians in them using pedestrian re-identification technology (hereinafter, ReID technology). The process is roughly as follows:

The server acquires image or video data from the camera device. If the server receives video data, it may extract several frames from the video by sampling. The server then clusters the acquired images so that all images of the same pedestrian fall into the same category. For example, if camera device A captures three images of pedestrian A and camera device B also captures three images of pedestrian A, the server may assign these six images to the same category. Finally, the server identifies the pedestrian corresponding to each category of images using a recognition algorithm such as face recognition.

In the above process, the clustering algorithm that the server uses to cluster the images plays an important role. With the hierarchical clustering and rank-order clustering algorithms currently in use, it is difficult to obtain suitable samples, that is, accurately classified samples. These two algorithms are briefly introduced below:

Hierarchical clustering works as follows: each object starts as its own cluster, and these atomic clusters are then merged into larger and larger clusters until all objects are in one cluster or a termination condition is met. Because ReID model features have weak expressive power, similarity is low for the same person seen across cameras or from the front and the back, and high for different people in similar clothing. Hierarchical clustering therefore produces a large number of bad archives (images of different people assigned to the same category) and multiple archives per person (images of the same person spread across several categories). In addition, the termination condition is not easy to set.
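The bottom-up procedure described above can be sketched in a few lines. This is a generic illustration, not the method of the present application: it assumes single linkage (the distance between two clusters is that of their closest members) and a distance threshold as the termination condition, and the names `agglomerative_cluster` and `stop_distance` are hypothetical.

```python
from typing import Callable, List, Sequence


def agglomerative_cluster(points: Sequence[float],
                          dist: Callable[[float, float], float],
                          stop_distance: float) -> List[List[int]]:
    """Bottom-up clustering; returns clusters as lists of point indices."""
    # Every object starts as its own atomic cluster.
    clusters = [[i] for i in range(len(points))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Single linkage (assumed): distance between the closest members.
        return min(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        d, p, q = min((linkage(clusters[p], clusters[q]), p, q)
                      for p in range(len(clusters))
                      for q in range(p + 1, len(clusters)))
        if d > stop_distance:  # termination condition (assumed form)
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


result = agglomerative_cluster([0, 1, 2, 10, 11],
                               dist=lambda a, b: abs(a - b),
                               stop_distance=3)
```

The outcome depends entirely on `stop_distance`: set it too low and one identity splits into several clusters, set it too high and different identities merge, which is the difficulty with termination conditions noted above.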

Clustering based on the rank-order distance is widely used for clustering mobile phone photo albums. Rank order is based on a common observation: two faces of the same person share many neighbors, whereas faces of different people usually have very different neighbors. When rank-order clustering is applied to ReID, outliers inflate the ranking distance, so that images of the same person are not clustered together.

FIG. 2 is an example diagram of an outlier in an embodiment of the present application. Consider samples a, b, and c, where a and b belong to the same person and c belongs to a different person. If c is very close to one of a and b but far from the other, c is called an outlier. For example, c is the 1st neighbor of a but the 155th neighbor of b.
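The asymmetry that makes c an outlier can be seen by comparing its position in the neighbor lists of a and b. The following is a small sketch with invented numbers: the distance table, the filler samples p and q, and the helper `neighbor_rank` are hypothetical and serve only to illustrate the idea of a neighbor rank.

```python
from typing import Dict


def neighbor_rank(query: str, target: str,
                  distances: Dict[str, Dict[str, float]]) -> int:
    """1-based position of `target` in `query`'s neighbor list, nearest first."""
    row = distances[query]
    ordered = sorted(row, key=row.get)  # sample names by ascending distance
    return ordered.index(target) + 1


# Hypothetical distances from a and from b to the other samples.
distances = {
    "a": {"c": 0.1, "b": 0.4, "p": 0.5, "q": 0.6},
    "b": {"p": 0.2, "q": 0.3, "a": 0.4, "c": 0.9},
}

rank_in_a = neighbor_rank("a", "c", distances)  # c is a's nearest neighbor
rank_in_b = neighbor_rank("b", "c", distances)  # c is b's farthest neighbor
```

A large gap between the two ranks is what inflates a ranking-based distance between a and b, even though both belong to the same person.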

The above clustering algorithms therefore fail to meet requirements and cannot produce reasonably classified samples.

In view of this, an embodiment of the present application provides a sample classification method to solve the above technical problem. FIG. 3 is a schematic diagram of a sample classification method provided by an embodiment of the present application. The method includes:

301. Acquire a first set to be processed and a second set to be processed, where the first set to be processed includes at least one first sample image and the second set to be processed includes at least one second sample image;

In this embodiment of the present application, the server first acquires the first set to be processed and the second set to be processed. These sets may also be called classes in the clustering algorithm; for ease of description, they are referred to below as class A and class B. The server may acquire other classes as well, such as class C and class D; only classes A and B are described here for convenience, and this should not be understood to mean that the server acquires only classes A and B. For example, the sets to be processed acquired by the server include class A (the first set to be processed), class B (the second set to be processed), class C, class D, and so on, and the server can determine whether class A and class B can be merged. If they can, the server assigns the sample images in class A and class B to one larger class. By analogy, the server can merge the classes to obtain several larger classes, each containing several sample images. This aggregates the sample images and classifies them reasonably, so that the server can identify pedestrians from the reasonably classified sample images. For example, if the sample images in one larger class obtained by the server are all images of pedestrian A, then pedestrian recognition on the sample images in that class can be more accurate, and the server can recognize that all images in that class are images of pedestrian A.

Each class acquired by the server includes at least one sample image. In some embodiments, after the server acquires images transmitted by the camera device or images determined from video, it processes each image as a sample image. In some embodiments, the server may treat each sample image as its own class, so that a class may include only one sample image. For example, if the server acquires sample image A and sample image B, it initially assigns sample image A to class A and sample image B to class B. In the embodiments of the present application, the sample images in class A are first sample images, and the sample images in class B are second sample images.

302. Acquire a first image feature vector and a second image feature vector, where the first image feature vector has a first correspondence with the first sample image and the second image feature vector has a second correspondence with the second sample image;

In this embodiment of the present application, the server may extract features from the first sample image to obtain the first image feature vector. The algorithm or model used for feature extraction may include, but is not limited to, a convolutional neural network (CNN) model, a human pose model, or a skeleton keypoint model, which is not limited here. Similarly, the server may extract features from the second sample image to obtain the second image feature vector.

The first image feature vector and the second image feature vector represent information about the pedestrians in the sample images.

303. Obtain distances between the first set to be processed and the second set to be processed according to the first image feature vector and the second image feature vector, where the distances include an absolute distance, a mutual neighbor distance, and a neighbor ranking distance;

In this embodiment of the present application, the server needs to calculate several distances: the absolute distance (also called the nearest distance), the mutual neighbor distance, and the neighbor ranking distance. Each of the three is described in detail below.

In some embodiments, the server calculates the absolute distance as follows. The server first determines the cosine distance between the first sample image and the second sample image according to the first image feature vector and the second image feature vector. The server then determines the absolute distance between the first set to be processed and the second set to be processed according to the cosine distance, where the absolute distance is the minimum value of the cosine distance. The method by which the server calculates the cosine distance is not specifically limited in the embodiments of the present application. The absolute distance can be calculated with the following formula:

$$d(C_i, C_j) = \min_{x_m \in C_i,\; x_n \in C_j} f(x_m, x_n);$$

where $d(C_i, C_j)$ is the absolute distance between class $C_i$ and class $C_j$, $x_m$ is a sample image in class $C_i$, $x_n$ is a sample image in class $C_j$, and $f(x_m, x_n)$ is the function that calculates the cosine distance between sample images.

示例性的，A类包括样本图像a和样本图像b，B类包括样本图像c和样本图像d，计算得到样本图像a和样本图像c的余弦距离为1，样本图像a和样本图像d的余弦距离为2，样本图像b和样本图像c的余弦距离为3，样本图像b和样本图像d的余弦距离为4，则服务器确定A类与B类的绝对距离为1。Exemplarily, class A includes sample image a and sample image b, class B includes sample image c and sample image d, the calculated cosine distance between sample image a and sample image c is 1, and the cosine distance between sample image a and sample image d is If the distance is 2, the cosine distance between sample image b and sample image c is 3, and the cosine distance between sample image b and sample image d is 4, then the server determines that the absolute distance between class A and class B is 1.

在一些实施例中，A类仅有一个样本图像，B类也仅有一个样本图像，则A类与B类的绝对距离可以是两个样本图像之间的余弦距离。In some embodiments, class A has only one sample image, and class B has only one sample image, then the absolute distance between class A and class B may be the cosine distance between the two sample images.

In some embodiments, the server may also determine the first absolute sample image and the second absolute sample image associated with the absolute distance. Exemplarily, if the cosine distance that the server calculates from sample image a and sample image c is the smallest, the server takes that cosine distance as the absolute distance; sample image a, which is associated with the absolute distance, may serve as the first absolute sample image, and sample image c, which is associated with the same cosine distance, may serve as the second absolute sample image.
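As a non-limiting illustration, the absolute distance and its associated pair of sample images may be sketched in Python as follows. The sketch assumes that the cosine distance is taken as 1 minus the cosine similarity and that feature vectors are non-zero; the present application does not limit how the cosine distance is calculated. The names `cosine_distance` and `absolute_distance` are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """Cosine distance, assumed here to be 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def absolute_distance(
    class_i: Dict[str, Vector], class_j: Dict[str, Vector]
) -> Tuple[float, str, str]:
    """Return the minimum pairwise cosine distance between two classes,
    together with the names of the two sample images that attain it."""
    if not class_i or not class_j:
        raise ValueError("each class needs at least one sample image")
    best = None
    for name_m, x_m in class_i.items():
        for name_n, x_n in class_j.items():
            d = cosine_distance(x_m, x_n)
            if best is None or d < best[0]:
                best = (d, name_m, name_n)
    return best


# Hypothetical two-dimensional feature vectors for illustration only.
class_a = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
class_b = {"c": [1.0, 0.1], "d": [-1.0, 0.0]}
distance, first_image, second_image = absolute_distance(class_a, class_b)
print(first_image, second_image)  # the pair associated with the absolute distance
```

When each class contains a single sample image, the double loop reduces to one evaluation of the cosine distance, which matches the single-image case described above.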

In some embodiments, the server calculates the mutual neighbor distance as follows. The server first obtains the first absolute distances corresponding to the first set to be processed, and sorts them by magnitude to obtain the first nearest neighbor sequence corresponding to the first set to be processed. The server then obtains the second absolute distances corresponding to the second set to be processed, and sorts them by magnitude to obtain the second nearest neighbor sequence corresponding to the second set to be processed. Finally, the server determines the mutual neighbor distance between the first set to be processed and the second set to be processed according to the sequence number of the first set to be processed in the second nearest neighbor sequence and the sequence number of the second set to be processed in the first nearest neighbor sequence.

Exemplarily, the sets to be processed acquired by the server may include class A (the first set to be processed), class B (the second set to be processed), class C, class D, and so on. The server may calculate the absolute distance between class A and every class (also called the first absolute distances) and sort all classes by that distance to obtain the first nearest neighbor sequence, as shown in FIG. 4. FIG. 4 is an example diagram of the first nearest neighbor sequence and the second nearest neighbor sequence in an embodiment of the present application, where O_A is the first nearest neighbor sequence, O_B is the second nearest neighbor sequence, and the classes in FIG. 4 may include class A, class B, class C, class D, and so on. Exemplarily, suppose the server calculates that the absolute distance between class A and class A is 1, that between class A and class B is 0.1, that between class A and class C is 0.8, and that between class A and class D is 0.3. The server may then sort these absolute distances from largest to smallest to obtain the first nearest neighbor sequence, which places class A first, followed by class C, class D, and class B, as shown in FIG. 4. In other embodiments, the server may also sort in other ways, which is not limited here.
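The sorting in the example above may be sketched in Python as follows. The sketch uses the four values given in the example and orders them from largest to smallest, as the example does; the function name `nearest_neighbor_sequence` is hypothetical.

```python
from typing import Dict, List


def nearest_neighbor_sequence(values: Dict[str, float]) -> List[str]:
    """Order class names from the largest value to the smallest,
    following the ordering used in the example."""
    return sorted(values, key=lambda name: values[name], reverse=True)


# Values between class A and each class, taken from the example.
values_from_a = {"A": 1.0, "B": 0.1, "C": 0.8, "D": 0.3}
sequence_a = nearest_neighbor_sequence(values_from_a)
print(sequence_a)             # ['A', 'C', 'D', 'B']
print(sequence_a.index("B"))  # 3, the sequence number of class B
```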

In some embodiments, the server may calculate the mutual neighbor distance between classes using the following formula:

$$\mathrm{MN}(C_i, C_j) = m + n;$$

where MN is the mutual neighbor distance, m is the sequence number of class C_i in the nearest neighbor sequence corresponding to class C_j, and n is the sequence number of class C_j in the nearest neighbor sequence corresponding to class C_i.

Exemplarily, as shown in FIG. 4, the first nearest neighbor sequence is ordered class A, class C, class D, class B, so, counting sequence numbers from 0, the sequence number of class B in the first nearest neighbor sequence is 3. Similarly, if the sequence number of class A in the second nearest neighbor sequence is 5, the mutual neighbor distance between class A and class B is 3+5=8.
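The calculation of 3+5=8 may be reproduced with the following Python sketch. The first four entries of `sequence_a` and the positions of classes A, B, C, and D in `sequence_b` follow the examples in this description; classes E and F and their positions are hypothetical fillers added only so that class A can occupy sequence number 5. The function name `mutual_neighbor_distance` is also hypothetical.

```python
from typing import List


def mutual_neighbor_distance(
    seq_i: List[str], seq_j: List[str], name_i: str, name_j: str
) -> int:
    """MN(C_i, C_j) = m + n with sequence numbers counted from 0."""
    m = seq_j.index(name_i)  # sequence number of C_i in the sequence of C_j
    n = seq_i.index(name_j)  # sequence number of C_j in the sequence of C_i
    return m + n


# Nearest neighbor sequences of class A and class B (E and F are hypothetical).
sequence_a = ["A", "C", "D", "B", "E", "F"]
sequence_b = ["B", "E", "C", "F", "D", "A"]

print(mutual_neighbor_distance(sequence_a, sequence_b, "A", "B"))  # 8
```

Because the formula adds the two sequence numbers, the result is the same whichever of the two classes is passed first.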

In this embodiment of the present application, the server may calculate the neighbor ranking distance as follows: determine the neighbor ranking distance between the first set to be processed and the second set to be processed according to the first nearest neighbor sequence and the second nearest neighbor sequence.

In some embodiments, the server may first calculate, over all classes that are as close to class A as class B is or closer, the sum of their sequence numbers in the second nearest neighbor sequence (also called the asymmetric rank order distance). The server may calculate this sum using the following formula:

$$D(A, B) = \sum_{i=0}^{O_A(B)} O_B(f_A(i));$$

where D(A,B) is the asymmetric rank order distance between class A and class B, O_A(B) is the sequence number of class B in the first nearest neighbor sequence, f_A(i) is the class at sequence number i in the first nearest neighbor sequence, and O_B(f_A(i)) is the sequence number of that class in the second nearest neighbor sequence.

Specifically, the server may first calculate the absolute distance between each class and class A. If the absolute distance between a class and class A is greater than or equal to the absolute distance between class A and class B, that class is at least as close to class A as class B is, and the server may record the sequence number of that class in the second nearest neighbor sequence. After traversing all classes, the server adds up the recorded sequence numbers to obtain the sum. Exemplarily, FIG. 5 is an example diagram of the server calculating the neighbor ranking distance in an embodiment of the present application. In FIG. 5, class A, class C, and class D are all ranked before class B, which indicates that each of them is closer to class A than class B is. In other words, the absolute distance between class A and class A, that between class C and class A, and that between class D and class A are each greater than the absolute distance between class A and class B, while the absolute distance between class B and class A is equal to the absolute distance between class A and class B. The server therefore records the sequence numbers of class A, class C, class D, and class B in the second nearest neighbor sequence, which are 5, 2, 4, and 0 respectively, and obtains 5+2+4+0=11. This process can be expressed as:

$$D(A, B) = O_B(A) + O_B(C) + O_B(D) + O_B(B) = 5 + 2 + 4 + 0 = 11;$$

In this embodiment of the present application, the server may likewise calculate the asymmetric rank order distance D(B,A) between class B and class A. The calculation is similar to that of D(A,B) described above and is not repeated here.

After calculating D(A,B) and D(B,A), the server may add them to obtain the neighbor ranking distance, which is calculated as:

$$RO(A, B) = D(A, B) + D(B, A);$$

where RO(A,B) is the neighbor ranking distance between class A and class B.
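As a non-limiting illustration, the asymmetric rank order distance and the neighbor ranking distance may be sketched in Python as follows. The sequence numbers of classes A, C, D, and B reproduce the worked example (5+2+4+0=11); classes E and F, their positions, and therefore the value obtained for D(B,A) are hypothetical. The function names are also hypothetical.

```python
from typing import List


def asymmetric_rank_order(seq_a: List[str], seq_b: List[str], name_b: str) -> int:
    """D(A, B): for every class ranked at or before B in the sequence of A,
    add up its sequence number in the sequence of B (numbers start at 0)."""
    position_in_b = {name: index for index, name in enumerate(seq_b)}
    last = seq_a.index(name_b)
    return sum(position_in_b[name] for name in seq_a[: last + 1])


def neighbor_ranking_distance(
    seq_a: List[str], seq_b: List[str], name_a: str, name_b: str
) -> int:
    """RO(A, B) = D(A, B) + D(B, A)."""
    return asymmetric_rank_order(seq_a, seq_b, name_b) + asymmetric_rank_order(
        seq_b, seq_a, name_a
    )


# Nearest neighbor sequences of class A and class B (E and F are hypothetical).
sequence_a = ["A", "C", "D", "B", "E", "F"]
sequence_b = ["B", "E", "C", "F", "D", "A"]

print(asymmetric_rank_order(sequence_a, sequence_b, "B"))           # 11
print(neighbor_ranking_distance(sequence_a, sequence_b, "A", "B"))  # 26
```

In this hypothetical data, D(B,A) covers all six classes of the second sequence and equals 3+4+1+5+2+0=15, so RO(A,B)=11+15=26.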

304. If the absolute distance satisfies a first set condition, the mutual neighbor distance satisfies a second set condition, and the neighbor ranking distance satisfies a third set condition, generate a sample classification result corresponding to the first set to be processed and the second set to be processed.

In this embodiment of the present application, the server may preset the first set condition, the second set condition, and the third set condition, so that when the absolute distance satisfies the first set condition, the mutual neighbor distance satisfies the second set condition, and the neighbor ranking distance satisfies the third set condition, the server generates the sample classification result corresponding to the first set to be processed and the second set to be processed. In some embodiments, the sample classification result may be that the first set to be processed and the second set to be processed are merged into one larger target set, to which the first sample images and the second sample images all belong. In this embodiment of the present application, a target set may also be referred to as a large class.

Exemplarily, if the mutual neighbor distance between class A and class B is less than or equal to 3, the absolute distance is greater than t, and the neighbor ranking distance is less than or equal to 30, the server merges class A and class B into a new large class, that is, the sample images in class A and class B all belong to the new large class. The conditions to be satisfied can be expressed as:

$$\mathrm{MN}(C_i, C_j) \le 3 \;\land\; d(C_i, C_j) > t \;\land\; RO(C_i, C_j) \le 30;$$

where t may be set according to actual needs and is not specifically limited in this embodiment of the present application. In some embodiments, the server sets t such that the accuracy of sample classification reaches 97%.
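The merging decision may be sketched in Python as follows. The sketch follows the inequality as written (mutual neighbor distance at most 3, absolute distance greater than t, neighbor ranking distance at most 30) and assumes that all three conditions must hold together, as step 304 states. The function name `should_merge`, the parameter names, and the sample values of d and t are hypothetical.

```python
def should_merge(
    d: float, mn: int, ro: int, t: float, mn_max: int = 3, ro_max: int = 30
) -> bool:
    """Return True when all three set conditions hold for a pair of classes."""
    return mn <= mn_max and d > t and ro <= ro_max


# Hypothetical values: a pair that satisfies all three conditions.
print(should_merge(d=0.9, mn=2, ro=12, t=0.5))  # True

# A pair with a mutual neighbor distance of 8 fails the second condition.
print(should_merge(d=0.9, mn=8, ro=26, t=0.5))  # False
```

Keeping the bounds as parameters reflects that the set conditions are preset by the server and may differ between clustering passes.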

Optionally, on the basis of the embodiment corresponding to FIG. 3 above, in an optional embodiment of the present invention, after the sample classification result corresponding to the first set to be processed and the second set to be processed is generated, the method further includes:

obtaining target sets, where a target set includes the first set to be processed and the second set to be processed;

determining a single feature vector according to the first image feature vector and the second image feature vector;

calculating the similarity between target sets according to the single feature vectors;

sending the target sets whose similarity is less than a set threshold to a terminal device, so that the terminal device displays the target sets.

In this embodiment of the present application, the server may obtain a target set, which is formed by merging the first set to be processed and the second set to be processed. It can be understood that the server may form multiple target sets. Exemplarily, the server first acquires class A, class B, class C, and class D; if class A and class B are merged into one target set and class C and class D are merged into another, the server obtains two target sets. By analogy, the server may obtain any number of target sets in practical applications.

In some embodiments, after the server obtains a target set that includes the first sample image and the second sample image, the server may determine a single feature vector according to the first image feature vector and the second image feature vector. In some embodiments, the single feature vector is the average of the first image feature vector and the second image feature vector. Exemplarily, if the first image feature vector is [1,2,3] and the second image feature vector is [7,8,9], the single feature vector is [(1+7)/2,(2+8)/2,(3+9)/2]=[4,5,6]. In some embodiments, averaging all image feature vectors in a target set yields the single feature vector corresponding to that target set.
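The averaging in the example above may be sketched in Python as follows. The function name `mean_feature` is hypothetical, and the sketch assumes that all feature vectors in a target set have the same length.

```python
from typing import List, Sequence


def mean_feature(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise average of the image feature vectors of one target set."""
    if not vectors:
        raise ValueError("a target set needs at least one feature vector")
    count = len(vectors)
    dimension = len(vectors[0])
    return [sum(v[k] for v in vectors) / count for k in range(dimension)]


print(mean_feature([[1, 2, 3], [7, 8, 9]]))  # [4.0, 5.0, 6.0]
```

The same function applies unchanged when a target set holds more than two image feature vectors.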

The server may then calculate the similarity between target sets according to their single feature vectors. In some embodiments, the server may calculate the cosine similarity between target sets according to the single feature vectors; the calculation method is not specifically limited here.

In some embodiments, the server sends the target sets whose similarity is less than the set threshold to the terminal device for display. In other embodiments, the server sorts the similarities and displays the 5 groups of target sets with the highest similarity. In some embodiments, the server performs the display by sending the content to be displayed to a terminal device with a display screen, so that the terminal device shows the content on the display screen. The displayed content may be the vector of a target set or a sample image in the target set.
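As a non-limiting illustration, the threshold-based selection may be sketched in Python as follows. The sketch computes the cosine similarity between the single feature vectors of every pair of target sets and keeps the pairs whose similarity is less than the set threshold, as described above. The function names, the target set identifiers, the feature values, and the threshold of 0.5 are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairs_below_threshold(
    features: Dict[str, Sequence[float]], threshold: float
) -> List[Tuple[str, str, float]]:
    """Return the pairs of target sets whose similarity is below the threshold."""
    selected = []
    for (name_p, f_p), (name_q, f_q) in combinations(features.items(), 2):
        similarity = cosine_similarity(f_p, f_q)
        if similarity < threshold:
            selected.append((name_p, name_q, similarity))
    return selected


# Hypothetical single feature vectors of three target sets.
single_features = {"T1": [1.0, 0.0], "T2": [1.0, 0.1], "T3": [0.0, 1.0]}
for name_p, name_q, similarity in pairs_below_threshold(single_features, 0.5):
    print(name_p, name_q, round(similarity, 3))
```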

In some embodiments, the server selects one sample image from a target set by means of a pose selection algorithm and sends it to a terminal device with a display screen for display. The pose selection algorithm is not specifically limited in this embodiment of the present application. FIG. 6 is an example diagram of the display of target sets in an embodiment of the present application. The server may send the selected sample image to a terminal device with a display screen, so that the terminal device displays the sample image on the display screen. As can be seen from FIG. 6, the interface includes a title bar, a function panel, and a main interface, where the title bar displays the title of the program and the function panel displays the optional functions. The main interface displays multiple groups of sample images, and each sample image may represent one target set. Owing to limited space, FIG. 6 shows only 3 groups of sample images; in practical applications, the number of displayed sample images is not limited, and this embodiment of the present application does not specifically limit it.

Optionally, on the basis of the embodiments corresponding to FIG. 3 above, in an optional embodiment of the present invention, after the target sets whose similarity is less than the set threshold are displayed, the method further includes:

obtaining annotation information, where the annotation information is associated with target sets;

determining a secondary sample classification result corresponding to the target sets according to the annotation information.

In this embodiment of the present application, the server may obtain annotation information that is associated with target sets. The server may then merge the target sets associated with the annotation information into a new large class to obtain the secondary sample classification result. Exemplarily, as shown in FIG. 6, both sample images in the first group on the main interface show an elderly woman. If a staff member observes that the pedestrians in the two sample images are in fact the same person, the staff member may click the "Yes" virtual button at the corresponding position on the main interface. In response to the click operation, the terminal device corresponding to the display screen may generate annotation information, which includes the target set identifiers corresponding to the first group of sample images. The terminal device may then send the annotation information to the server, so that the server obtains it.

The server may then merge the corresponding target sets into a new large class according to the target set identifiers in the annotation information. Exemplarily, as shown in FIG. 6, the server may merge the target sets corresponding to the elderly woman into a new large class according to the target set identifiers in the annotation information; the large class includes the two target sets, and all sample images in the large class are sample images of the elderly woman.
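The merging of target sets according to annotation information may be sketched in Python as follows. The sketch assumes, purely for illustration, that each piece of annotation information carries a pair of target set identifiers confirmed as the same pedestrian, and it groups the identifiers with a union-find structure. The function name `merge_by_annotations` and the identifiers are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def merge_by_annotations(
    target_ids: Iterable[str], confirmed_pairs: Iterable[Tuple[str, str]]
) -> List[List[str]]:
    """Group target set identifiers that annotations mark as the same person."""
    ids = list(target_ids)
    parent: Dict[str, str] = {t: t for t in ids}

    def find(x: str) -> str:
        # Follow parent links to the representative, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p, q in confirmed_pairs:
        root_p, root_q = find(p), find(q)
        if root_p != root_q:
            parent[root_q] = root_p

    groups: Dict[str, List[str]] = {}
    for t in ids:
        groups.setdefault(find(t), []).append(t)
    return list(groups.values())


# One annotation confirms that target sets T1 and T2 show the same pedestrian.
print(merge_by_annotations(["T1", "T2", "T3", "T4"], [("T1", "T2")]))
# [['T1', 'T2'], ['T3'], ['T4']]
```

A union-find structure is used so that chained confirmations (T1 with T2, T2 with T3) end up in a single large class without further bookkeeping.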

FIG. 7 is a schematic flowchart of an optional embodiment of the present application. The flow can be described as follows:

701. Acquire pedestrian images;

In this embodiment, the server may acquire images of pedestrians (hereinafter referred to as pedestrian images). In some embodiments, the server may acquire the pedestrian images from a camera device. In other embodiments, the server may read them from a database. For details, refer to the description of step 301 above, which is not repeated here.

702. Extract features;

In this embodiment of the present application, the server may extract features of the pedestrian from a pedestrian image, or extract features of the entire image from the pedestrian image, which is not specifically limited in this embodiment of the present application. For details, refer to the description of step 302 above, which is not repeated here.

703. MN&RO fine-grained clustering;

In this embodiment of the present application, the server may cluster the pedestrian images according to the extracted features. For details, refer to the descriptions of step 303 and step 304 above, which are not repeated here. It can be understood that, in this embodiment of the present application, a single pedestrian image may be treated as one class for clustering.

704. Fuse the multiple features within each class into a single feature;

In this embodiment of the present application, by clustering the pedestrian images in step 703 above, the server may obtain N classes, each of which includes n_i pedestrian images. Each pedestrian image has a corresponding feature, so each class contains n_i features. The server may then fuse the n_i features within each class into a single feature. In some embodiments, the server performs the fusion by means of a feature fusion algorithm. In other embodiments, the server takes the average of the features of each class as the feature representation of that class. The n_i pedestrian images of each class are thus fused into 1 feature, forming a new feature set F={f_1,f_2,f_3,...,f_N}, where N is an integer greater than or equal to 1 and n_i is an integer greater than or equal to 1.

705. Pose selection;

In this embodiment of the present application, each class obtained by the server includes n_i pedestrian images. The server may therefore select one of the n_i pedestrian images to represent the class. In some embodiments, the server makes the selection by means of a pose selection algorithm. In other embodiments, the selection may be made manually, which is not specifically limited here.

706. Self-retrieval;

In this embodiment of the present application, the server may calculate the similarity between the N classes according to the new feature set F={f_1,f_2,f_3,...,f_N}. Exemplarily, the server may calculate the cosine similarity between f_1 and f_2 to obtain the similarity between class 1 and class 2. By analogy, the server may obtain the similarity between all pairs of classes, sort these similarities, and select the 5 groups of classes with the highest similarity. Exemplarily, if the cosine similarity between f_1 and f_2 ranks first, the server may select the classes corresponding to f_1 and f_2 (that is, class 1 and class 2).
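The self-retrieval step may be sketched in Python as follows. The sketch scores every pair of classes by the cosine similarity of their fused features and keeps the k most similar pairs, with k=5 as in the description. The function names, class names, and feature values are hypothetical.

```python
import heapq
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_similar_pairs(
    features: Dict[str, Sequence[float]], k: int = 5
) -> List[Tuple[float, str, str]]:
    """Return up to k class pairs, ordered from most to least similar."""
    scored = [
        (cosine_similarity(f_p, f_q), name_p, name_q)
        for (name_p, f_p), (name_q, f_q) in combinations(features.items(), 2)
    ]
    return heapq.nlargest(k, scored)


# Hypothetical fused features f_1, f_2, f_3 of three classes.
feature_set = {
    "class 1": [1.0, 0.0],
    "class 2": [0.9, 0.1],
    "class 3": [0.0, 1.0],
}
for similarity, name_p, name_q in top_similar_pairs(feature_set):
    print(name_p, name_q, round(similarity, 3))
```

With N classes the sketch evaluates N(N-1)/2 pairs, which is adequate for illustration; a large N would call for an approximate nearest neighbor search, which is outside the scope of this sketch.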

707. Send for annotation;

In this embodiment of the present application, the server may send the representative images corresponding to the classes selected in step 706 above to a terminal device with a display screen, so that the terminal device displays the representative images, as shown in FIG. 6. In other embodiments, the server may instead send the classes selected in step 706 directly to the terminal device, so that the terminal device selects a representative image from each class by an algorithm similar to that of step 705 and then displays it. This embodiment of the present application does not specifically limit this.

708. Obtain samples.

In this embodiment of the present application, after the classes are sent for annotation in step 707, a staff member selects the classes showing the same pedestrian in the interface shown in FIG. 6, so that the terminal device generates annotation information. For details, refer to the description of the foregoing embodiments, which is not repeated here. The server then obtains the annotation information and further merges the N classes on the server according to it. After the merging is complete, the server obtains the classified samples.

FIG. 8 is a schematic flowchart of another optional embodiment of the present application. The flow includes:

801. Acquire pedestrian images;

Step 801 is similar to step 701 above and is not repeated here.

802. Extract features;

Step 802 is similar to step 702 above and is not repeated here.

803. MN&RO fine-grained clustering;

Step 803 is similar to step 703 above and is not repeated here. It should be noted that, in this embodiment of the present application, the server obtains N classes through step 803.

804. Fuse the multiple features within each class into a single feature;

Step 804 is similar to step 704 above and is not repeated here.

805. MN&RO coarse-grained clustering;

In this embodiment of the present application, the server may further cluster the aforementioned N classes. Step 805 is similar to step 803 above, but the set conditions may differ. Exemplarily, the set condition in step 803 is:

$$\mathrm{MN}(C_i, C_j) \le 3 \;\land\; d(C_i, C_j) > t_1 \;\land\; RO(C_i, C_j) \le 30;$$

The set condition in step 805 may be:

$$\mathrm{MN}(C_i, C_j) \le 3 \;\land\; d(C_i, C_j) > t_2 \;\land\; RO(C_i, C_j) \le 30;$$

where t_1 and t_2 may be modified according to actual needs, so that step 805 yields P classes, P being an integer greater than or equal to 0 and less than N. Exemplarily, the server obtains 10000 sub-classes by clustering in step 803 and 5000 large classes by clustering in step 805, and each large class may include at least one sub-class.

806. Retain only one sub-class for each large class;

In this embodiment of the present application, the server may select one sub-class from each large class to retain (the server may delete the sub-classes that are not retained). In some embodiments, the server sends the representative images corresponding to the sub-classes to a terminal device with a display screen, and a staff member selects one of the sub-classes to retain. In other embodiments, the server counts the pedestrian images in each sub-class and retains the sub-class with the most pedestrian images. In still other embodiments, the server first counts the pedestrian images in each sub-class and selects the 5 sub-classes with the most pedestrian images, and then retains the one of those 5 sub-classes with the largest variance. This embodiment of the present application does not specifically limit the method of selecting the sub-class to retain.
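The two automatic retention strategies may be sketched in Python as follows. The description does not state what the variance is computed over; the sketch assumes, as one possible reading, the mean squared Euclidean distance of the image features of a sub-class from their centroid. The function names, sub-class names, and feature values are hypothetical.

```python
from typing import Dict, List, Sequence

Features = List[Sequence[float]]


def spread(features: Features) -> float:
    """Assumed variance measure: mean squared distance to the centroid."""
    count = len(features)
    dimension = len(features[0])
    centroid = [sum(f[k] for f in features) / count for k in range(dimension)]
    return sum(
        sum((f[k] - centroid[k]) ** 2 for k in range(dimension)) for f in features
    ) / count


def keep_largest(sub_classes: Dict[str, Features]) -> str:
    """Retain the sub-class with the most pedestrian images."""
    return max(sub_classes, key=lambda name: len(sub_classes[name]))


def keep_by_count_then_variance(sub_classes: Dict[str, Features], top_n: int = 5) -> str:
    """Take the top_n sub-classes by image count, then retain the one
    with the largest variance among them."""
    candidates = sorted(
        sub_classes, key=lambda name: len(sub_classes[name]), reverse=True
    )[:top_n]
    return max(candidates, key=lambda name: spread(sub_classes[name]))


# Hypothetical sub-classes of one large class, with two-dimensional features.
large_class = {
    "s1": [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
    "s2": [[0.0, 0.0], [2.0, 0.0]],
    "s3": [[1.0, 1.0]],
}
print(keep_largest(large_class))                          # s1
print(keep_by_count_then_variance(large_class, top_n=2))  # s2
```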

807. Obtain samples.

In this embodiment of the present application, after obtaining the P classes in step 805, the server may remove the redundant sub-classes from the P classes to obtain the pruned P classes {c_1,c_2,c_3,...,c_P}. Each class includes several pedestrian images.

The pedestrian images obtained by the server through this embodiment of the present application have corresponding classifications, so when performing pedestrian recognition, the server may select one of the classes for recognition. Exemplarily, the pedestrian images in class c_1 are very likely images of the elderly woman, so the server can achieve a good recognition rate when recognizing the pedestrian images of class c_1.

FIG. 9 is an example diagram of a sample classification device in an embodiment of the present application. The sample classification device 900 includes:

an acquiring unit 901, configured to acquire a first set to be processed and a second set to be processed, where the first set to be processed includes at least one first sample image and the second set to be processed includes at least one second sample image;

the acquiring unit 901 being further configured to acquire a first image feature vector and a second image feature vector, where the first image feature vector has a first correspondence with the first sample image and the second image feature vector has a second correspondence with the second sample image;

a processing unit 902, configured to obtain the distance between the first set to be processed and the second set to be processed according to the first image feature vector and the second image feature vector, where the distance includes an absolute distance, a mutual neighbor distance, and a neighbor ranking distance;

the processing unit 902 being further configured to generate a sample classification result corresponding to the first set to be processed and the second set to be processed if the absolute distance satisfies a first set condition, the mutual neighbor distance satisfies a second set condition, and the neighbor ranking distance satisfies a third set condition.

Optionally, on the basis of the embodiments corresponding to FIG. 9 above, in an optional embodiment of the present application, the processing unit 902 is further configured to:

Determine a cosine distance between the first sample image and the second sample image according to the first image feature vector and the second image feature vector;

Determine the absolute distance between the first set to be processed and the second set to be processed according to the cosine distance, where the absolute distance is the minimum value of the cosine distance.
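The two steps above can be illustrated with a minimal sketch. It assumes that the cosine distance is defined as one minus the cosine similarity and that each set is represented as a list of non-zero feature vectors; the function names `cosine_distance` and `absolute_distance` are hypothetical and do not come from the application.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two non-zero feature vectors.

    Assumed definition: 1 - cosine similarity.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def absolute_distance(first_set, second_set):
    """Minimum cosine distance over every cross-set pair of feature vectors."""
    return min(cosine_distance(u, v) for u in first_set for v in second_set)


# Hypothetical 2-D feature vectors, used only for illustration.
first_set = [[1.0, 0.0], [0.0, 1.0]]
second_set = [[1.0, 1.0], [0.0, 2.0]]
d = absolute_distance(first_set, second_set)
```

Taking the minimum means that two sets count as close as soon as one pair of images, one from each set, is close, which is why this distance alone may be too permissive and is combined with the two rank-based distances.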

Obtain a first absolute distance corresponding to the first set to be processed;

Sort by the magnitude of the first absolute distance to obtain a first nearest neighbor sequence corresponding to the first set to be processed;

Obtain a second absolute distance corresponding to the second set to be processed;

Sort by the magnitude of the second absolute distance to obtain a second nearest neighbor sequence corresponding to the second set to be processed;

Determine the mutual neighbor distance between the first set to be processed and the second set to be processed according to the sequence number of the first set to be processed in the second nearest neighbor sequence and the sequence number of the second set to be processed in the first nearest neighbor sequence.

Determine the neighbor ranking distance between the first set to be processed and the second set to be processed according to the first nearest neighbor sequence and the second nearest neighbor sequence.
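As a rough sketch of the rank-based steps above, the following assumes that the absolute distances among all sets are already available as a symmetric matrix `dist`. The exact formulas are not given in this passage, so both are stand-ins chosen for illustration: the mutual neighbor distance is taken as the sum of the two sequence numbers, and the neighbor ranking distance is taken as a Jaccard distance between the top-`k` neighbor lists. All function names and the parameter `k` are hypothetical.

```python
def nearest_neighbor_sequence(dist, i):
    """Indices of all other sets, sorted by ascending absolute distance to set i."""
    others = [j for j in range(len(dist)) if j != i]
    return sorted(others, key=lambda j: dist[i][j])


def mutual_neighbor_distance(dist, i, j):
    """Assumed form: sequence number of j in i's nearest neighbor sequence
    plus sequence number of i in j's nearest neighbor sequence (0-based)."""
    seq_i = nearest_neighbor_sequence(dist, i)
    seq_j = nearest_neighbor_sequence(dist, j)
    return seq_i.index(j) + seq_j.index(i)


def neighbor_ranking_distance(dist, i, j, k=2):
    """Assumed form: Jaccard distance between the top-k neighbor lists,
    where each list also contains the set itself."""
    top_i = set(nearest_neighbor_sequence(dist, i)[:k]) | {i}
    top_j = set(nearest_neighbor_sequence(dist, j)[:k]) | {j}
    return 1.0 - len(top_i & top_j) / len(top_i | top_j)


# Hypothetical symmetric matrix of absolute distances among four sets.
dist = [
    [0.0, 0.1, 0.2, 0.9],
    [0.1, 0.0, 0.3, 0.8],
    [0.2, 0.3, 0.0, 0.7],
    [0.9, 0.8, 0.7, 0.0],
]
```

With this matrix, sets 0 and 1 are each other's nearest neighbor and share the same neighborhood, so both rank-based values are small, whereas sets 0 and 3 rank each other last. Unlike the absolute distance, these values depend on how the other sets are distributed around the pair.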

Select a sample image in the target set;

Send the sample image to the terminal device, so that the terminal device displays the sample image.

FIG. 10 is a schematic structural diagram of a server provided by an embodiment of the present application. The server 1000 may vary considerably with configuration or performance, and may include one or more central processing units (CPU) 1022 (for example, one or more processors), a memory 1032, and one or more storage media 1030 (for example, one or more mass storage devices) storing application programs 1042 or data 1044. The memory 1032 and the storage medium 1030 may provide transient or persistent storage. The program stored in the storage medium 1030 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the server. Furthermore, the central processing unit 1022 may be configured to communicate with the storage medium 1030 and to execute, on the server 1000, the series of instruction operations in the storage medium 1030.

The server 1000 may also include one or more power supplies 1026, one or more wired or wireless network interfaces 1050, one or more input/output interfaces 1058, and/or one or more operating systems 1041, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.

The steps performed by the server in the foregoing embodiments may be based on the server structure shown in FIG. 10.

In the embodiment of the present application, the CPU 1022 is specifically configured to perform the following steps:

Acquiring a first set to be processed and a second set to be processed, where the first set to be processed includes at least one first sample image and the second set to be processed includes at least one second sample image;

Acquiring a first image feature vector and a second image feature vector, where the first image feature vector has a first correspondence with the first sample image and the second image feature vector has a second correspondence with the second sample image;

Obtaining the distance between the first set to be processed and the second set to be processed according to the first image feature vector and the second image feature vector, where the distance includes an absolute distance, a mutual neighbor distance, and a neighbor ranking distance;

If the absolute distance satisfies the first set condition, the mutual neighbor distance satisfies the second set condition, and the neighbor ranking distance satisfies the third set condition, generating the sample classification results corresponding to the first set to be processed and the second set to be processed.
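The final step can be sketched as a conjunction of three checks. This is only an illustration: the passage speaks of "set conditions" without stating them, so the sketch assumes each condition is an upper bound, and the threshold values, the `Thresholds` class, and the functions `should_group` and `classify` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical upper bounds; the source does not state concrete values.
    max_absolute: float = 0.3
    max_mutual_neighbor: int = 2
    max_neighbor_ranking: float = 0.5


def should_group(absolute, mutual_neighbor, neighbor_ranking, t=Thresholds()):
    """True only when all three distances satisfy their assumed conditions."""
    return (
        absolute <= t.max_absolute
        and mutual_neighbor <= t.max_mutual_neighbor
        and neighbor_ranking <= t.max_neighbor_ranking
    )


def classify(first_set, second_set, distances, label):
    """Give every image in both sets the same label if the conditions hold.

    `first_set` and `second_set` are lists of image identifiers, and
    `distances` is an (absolute, mutual_neighbor, neighbor_ranking) tuple.
    Returns None when any condition fails.
    """
    if not should_group(*distances):
        return None
    return {image_id: label for image_id in first_set + second_set}


result = classify(["a.jpg"], ["b.jpg", "c.jpg"], (0.1, 1, 0.2), "person_0")
rejected = classify(["a.jpg"], ["b.jpg"], (0.1, 5, 0.2), "person_0")
```

Requiring all three conditions at once is the conservative choice: a pair of sets that is close in absolute distance but ranks poorly in each other's neighbor sequences, as in `rejected`, is left unmerged.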

In the embodiment of the present application, the CPU 1022 is further configured to perform the following steps:

Determining, according to the absolute distance, the first nearest neighbor sequence corresponding to the first set to be processed and the second nearest neighbor sequence corresponding to the second set to be processed, where the first nearest neighbor sequence is the queue of second sample images sorted by cosine distance, and the second nearest neighbor sequence is the queue of first sample images sorted by cosine distance;

Determining the mutual neighbor distance between the first set to be processed and the second set to be processed according to the sequence number of the first absolute sample image in the second nearest neighbor sequence and the sequence number of the second absolute sample image in the first nearest neighbor sequence.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Claims

1. A method of sample classification, comprising:

acquiring a first to-be-processed set and a second to-be-processed set, wherein the first to-be-processed set comprises at least one first sample image, and the second to-be-processed set comprises at least one second sample image;

acquiring a first image feature vector and a second image feature vector, wherein the first image feature vector has a first correspondence with the first sample image, and the second image feature vector has a second correspondence with the second sample image;

acquiring the distance between the first to-be-processed set and the second to-be-processed set according to the first image feature vector and the second image feature vector, wherein the distance comprises an absolute distance, a mutual neighbor distance and a neighbor ranking distance;

and, if the absolute distance meets a first set condition, the mutual neighbor distance meets a second set condition and the neighbor ranking distance meets a third set condition, generating sample classification results corresponding to the first to-be-processed set and the second to-be-processed set.

2. The method according to claim 1, wherein said acquiring the distance between the first to-be-processed set and the second to-be-processed set according to the first image feature vector and the second image feature vector comprises:

determining a cosine distance between the first sample image and the second sample image according to the first image feature vector and the second image feature vector;

and determining an absolute distance between the first to-be-processed set and the second to-be-processed set according to the cosine distance, wherein the absolute distance is the minimum value of the cosine distance.

3. The method according to claim 2, wherein the absolute distance is determined from a first absolute sample image and a second absolute sample image, and wherein said acquiring the distance between the first to-be-processed set and the second to-be-processed set according to the first image feature vector and the second image feature vector comprises:

acquiring a first absolute distance corresponding to the first to-be-processed set;

sorting by the magnitude of the first absolute distance to obtain a first nearest neighbor sequence corresponding to the first to-be-processed set;

acquiring a second absolute distance corresponding to the second to-be-processed set;

sorting by the magnitude of the second absolute distance to obtain a second nearest neighbor sequence corresponding to the second to-be-processed set;

and determining the mutual neighbor distance between the first to-be-processed set and the second to-be-processed set according to the sequence number of the first to-be-processed set in the second nearest neighbor sequence and the sequence number of the second to-be-processed set in the first nearest neighbor sequence.

4. The method according to claim 3, wherein said acquiring the distance between the first to-be-processed set and the second to-be-processed set according to the first image feature vector and the second image feature vector comprises:

determining a neighbor ranking distance between the first to-be-processed set and the second to-be-processed set according to the first nearest neighbor sequence and the second nearest neighbor sequence.

5. The method according to claim 1, wherein after the generating of the sample classification results corresponding to the first to-be-processed set and the second to-be-processed set, the method further comprises:

acquiring a target set, wherein the target set comprises the first to-be-processed set and the second to-be-processed set;

determining a single feature vector according to the first image feature vector and the second image feature vector;

calculating the similarity between the target sets according to the single feature vectors;

and sending the target set with the similarity smaller than a set threshold value to a terminal device, so that the terminal device displays the target set.

6. The method of claim 5, wherein the sending the target set with the similarity smaller than a set threshold value to a terminal device, so that the terminal device displays the target set, comprises:

selecting a sample image in the target set;

and sending the sample image to the terminal device, so that the terminal device displays the sample image.

7. The method of claim 5, wherein after the target set with the similarity smaller than a set threshold value is displayed, the method further comprises:

acquiring labeling information, wherein the labeling information is associated with the target set;

and determining a secondary sample classification result corresponding to the target set according to the labeling information.

8. An apparatus for sample classification, comprising:

an acquiring unit, configured to acquire a first to-be-processed set and a second to-be-processed set, wherein the first to-be-processed set comprises at least one first sample image, and the second to-be-processed set comprises at least one second sample image;

the acquiring unit is further configured to acquire a first image feature vector and a second image feature vector, wherein the first image feature vector has a first correspondence with the first sample image, and the second image feature vector has a second correspondence with the second sample image;

a processing unit, configured to acquire, according to the first image feature vector and the second image feature vector, a distance between the first to-be-processed set and the second to-be-processed set, wherein the distance comprises an absolute distance, a mutual neighbor distance, and a neighbor ranking distance;

the processing unit is further configured to generate sample classification results corresponding to the first to-be-processed set and the second to-be-processed set if the absolute distance meets a first set condition, the mutual neighbor distance meets a second set condition, and the neighbor ranking distance meets a third set condition.

9. A server, comprising:

one or more central processing units, a memory, an input/output interface, a wired or wireless network interface and a power supply;

the memory is a transient memory or a persistent memory;

the central processing unit is configured to communicate with the memory and to execute, on the server, instructions in the memory to perform the method of any one of claims 1 to 7.

10. A computer-readable storage medium having stored therein instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 7.