CN114443875A

CN114443875A - File merging method and device, electronic equipment and storage medium

Info

Publication number: CN114443875A
Application number: CN202111678208.5A
Authority: CN
Inventors: 刘国伟
Original assignee: Shenzhen Intellifusion Technologies Co Ltd
Current assignee: Shenzhen Intellifusion Technologies Co Ltd
Priority date: 2021-12-31
Filing date: 2021-12-31
Publication date: 2022-05-06
Anticipated expiration: 2041-12-31
Also published as: CN114443875B

Abstract

The invention discloses a file merging method and device, electronic equipment, and a storage medium. The file merging method comprises the following steps: performing similarity calculation on any two archived data in an archived data set to obtain a similarity set, where the archived data set consists of multiple archived data of the same person; determining a density value set among the multiple archived data according to the similarity set; calculating a distance value set of the multiple archived data according to the density value set and the similarity set; determining the multiple archived data whose density values and distance values meet preset conditions as multiple cluster center points; and using the archived data corresponding to the cluster center points as cover images. The method reduces the randomness of the generated cover images and improves their accuracy and representativeness.

Description

File merging method, device, electronic device and storage medium

Technical field

The present invention relates to the technical field of data processing, and in particular to a file merging method, device, electronic device and storage medium.

Background

At present, surveillance cameras are combined with face recognition technology to monitor people in public places, which improves city security. While people in public places are monitored, a personnel file can be established for each person to facilitate personnel management. Current approaches mainly stratify the snapshot data by quality and compare the better-quality snapshots with the existing file covers; a snapshot that is not similar to any existing cover is used as the cover of a new file. However, covers generated this way are highly random, so their accuracy and representativeness are relatively low.

Summary of the invention

In a first aspect, the main purpose of the present invention is to provide a file merging method, including:

performing similarity calculation on any two archived data in an archived data set to obtain a similarity set, where the archived data set consists of multiple archived data of the same person;

determining, according to the similarity set, a density value set among the multiple archived data;

calculating a distance value set of the multiple archived data according to the density value set and the similarity set;

determining the multiple archived data corresponding to density values and distance values that satisfy preset conditions as multiple cluster center points;

using the archived data corresponding to the multiple cluster center points as cover images.

Optionally, determining the density value set among the multiple archived data according to the similarity set includes:

determining a similarity threshold for each archived data in the similarity set;

screening the similarity set according to the similarity threshold of each archived data;

calculating the screened similarity set according to a predetermined algorithm to obtain the density value set.

Optionally, calculating the screened similarity set according to the predetermined algorithm to obtain the density value set includes:

determining the corresponding similarities according to the screened similarity set;

summing the similarities to obtain the density values corresponding to each archived data;

sorting the density values corresponding to the multiple archived data to obtain the density value set.

Optionally, calculating the distance value set of the multiple archived data according to the density value set and the similarity set includes:

sorting the archived data set according to the density value set of each archived data, and screening the sorted archived data set;

performing distance calculation on the screened archived data set to obtain the distance value set.

Optionally, sorting the archived data set according to the density value set of each archived data and screening the sorted archived data set includes:

sorting the archived data set in descending order of the density value of each archived data;

for each archived data,

determining multiple reference archived data whose density values are greater than the density value of that archived data;

determining, among the multiple reference archived data, the target archived data with the smallest similarity to that archived data, so that the distance is calculated from the target archived data and that archived data.

Optionally, the method further includes:

merging the archived data set according to the multiple cluster center points to obtain a set of clusters, where each cluster includes a corresponding cover image;

determining the cover image of each cluster according to the set of clusters;

merging multiple clusters when the clusters and the cover images satisfy a preset relationship.

Optionally, merging multiple clusters when the set of clusters and the cover images satisfy the preset relationship includes:

judging whether the number of clusters is greater than the number of cover images;

when the number of clusters is greater than the number of cover images, determining the similarity between the cluster center points of each pair of clusters;

merging the corresponding pair of clusters when the similarity of their cluster center points is greater than a preset similarity.

In a second aspect, an embodiment of the present invention provides a file merging device, including:

a first calculation module, configured to perform similarity calculation on any two archived data in an archived data set to obtain a similarity set, where the archived data set consists of multiple archived data of the same person;

a first determination module, configured to determine, according to the similarity set, a density value set among the multiple archived data;

a second calculation module, configured to calculate a distance value set of the multiple archived data according to the density value set and the similarity set;

a second determination module, configured to determine the multiple archived data corresponding to density values and distance values that satisfy preset conditions as multiple cluster center points;

a third determination module, configured to use the archived data corresponding to the multiple cluster center points as cover images.

In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the file merging method described above are implemented.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the steps of the file merging method described above are implemented.

The above solution of the present invention has at least the following beneficial effects:

The file merging method provided by the present invention first performs similarity calculation on any two archived data in an archived data set to obtain a similarity set, where the archived data set consists of multiple archived data of the same person; determines a density value set among the multiple archived data according to the similarity set; calculates a distance value set of the multiple archived data according to the density value set and the similarity set; determines the multiple archived data corresponding to density values and distance values that satisfy preset conditions as multiple cluster center points; and uses the archived data corresponding to the cluster center points as cover images. This reduces the randomness of the generated cover images and improves their accuracy and representativeness.

Description of drawings

In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from the structures shown in these drawings without creative effort.

FIG. 1 is a schematic overall flowchart of the file merging method provided by an embodiment of the present invention;

FIG. 2 is a schematic flowchart of step S20 provided by an embodiment of the present invention;

FIG. 3 is a detailed schematic flowchart of step S23 provided by an embodiment of the present invention;

FIG. 4 is a detailed schematic flowchart of step S30 provided by an embodiment of the present invention;

FIG. 5 is another schematic flowchart of step S32 provided by an embodiment of the present invention;

FIG. 6 is another schematic flowchart of the file merging method provided by an embodiment of the present invention;

FIG. 7 is another schematic flowchart of step 32 provided by an embodiment of the present invention;

FIG. 8 is a structural block diagram of the file merging device provided by an embodiment of the present invention;

FIG. 9 is a structural block diagram of the electronic device provided by an embodiment of the present invention.

The realization of the objectives, the functional features and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

The terms "first", "second" and "third" in the description, the claims and the above drawings are used to distinguish different objects rather than to describe a specific order. Furthermore, the term "comprising" and any variations of it are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device comprising a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product or device.

First, the solutions of the embodiments of the present application are introduced by way of example with reference to the relevant drawings.

As shown in FIG. 1, a specific embodiment of the present invention provides a file merging method, including:

S10. Perform similarity calculation on any two archived data in the archived data set to obtain a similarity set; the archived data set consists of multiple archived data of the same person.

In this embodiment, a file data collection is stored in a database. The file data collection includes archived data sets, and an archived data set may include multiple image files; the file data may also be file data with other attributes. The file data may be image files collected in public places such as city streets, high-speed railway stations, bus stations and airports, and the database may be the database of a police platform. Collecting the image files of various groups of people and analysing them by clustering makes the management of those groups more convenient.

When the similarity calculation is performed on the archived data set, cosine similarity can be used to calculate the similarity between every two archived data. Cosine similarity uses the cosine of the angle between two vectors in a vector space as a measure of the difference between two individuals: the closer the cosine is to 1, the closer the angle is to 0 degrees and the more similar the two vectors are. In other words, the closer the cosine value between two archived data is to 1, the greater their similarity.
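As a minimal sketch of the cosine measure described above, assuming each archived datum is already represented by a numeric feature vector (the source does not say how such vectors are produced), the computation could look like this. The function name `cosine_similarity` is illustrative only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Vectors pointing in the same direction score close to 1,
# orthogonal vectors score 0.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])
```

A value near 1 corresponds to an angle near 0 degrees, which is the case the text treats as "most similar".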

The file data collection may be aidN{aid1, aid2, aid3, ..., aidN}, and the archived data set in each file data aidN may be a_n{a₁, a₂, a₃, ..., a_n}. The similarity between every two archived data is calculated by an n×n computation over the archived data set a_n in aidN, so the calculated similarity set may be sim_ij{sim₁₂, sim₁₃, ..., sim_ij}, where i and j denote the data a_i and a_j. For example, if the archived data set includes picture a, picture b, picture c and picture d, then after the similarities of pictures a, b, c and d are calculated with cosine similarity, the resulting similarity set is sim_ab, sim_ac, sim_ad, sim_bc, sim_bd, sim_cd. From this set, the most similar pictures in the archived data set can be determined.
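The pairwise set for pictures a, b, c and d can be sketched as follows. The 2-D feature vectors are made-up values for illustration, and `pairwise_similarity_set` is a hypothetical helper name; only unordered pairs are stored, which matches the six entries sim_ab ... sim_cd listed in the text.

```python
from itertools import combinations
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_similarity_set(features):
    """Map every unordered pair (i, j) of archived data to sim_ij."""
    return {
        (i, j): cosine(features[i], features[j])
        for i, j in combinations(sorted(features), 2)
    }


# Hypothetical feature vectors for pictures a, b, c, d.
features = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.2, 0.8],
}
sims = pairwise_similarity_set(features)

# The pair with the largest similarity in the set.
most_similar_pair = max(sims, key=sims.get)
```

With these illustrative vectors the set has six entries and the most similar pair is `("a", "b")`.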

S20. Determine, according to the similarity set, a density value set among the multiple archived data.

In this embodiment, the density value represents the similarity density of the archived data set, and the density value set contains multiple density values. Each archived data may correspond to multiple pieces of image data, and each piece of image data can be treated as a data point; the density value set is determined by calculating the density values among the data points corresponding to each cluster. In general, the similarity between different archived data is relatively small, while the similarity between the data points within one archived data is relatively large.

For example, after A's image file is archived it contains picture a, picture b, picture c and picture d, and after B's image file is archived it contains picture e, picture f, picture g and picture h. After clustering and merging, A's image file can be treated as one cluster and B's image file as another. The archived data corresponding to A can therefore yield multiple density values from the similarities of pictures a, b, c and d, and the archived data corresponding to B can likewise yield multiple density values from pictures e, f, g and h.

As shown in FIG. 2, determining the density value set among the multiple archived data according to the similarity set includes:

S21. Determine a similarity threshold for each archived data in the similarity set;

S22. Screen the similarity set according to the similarity threshold of each archived data;

S23. Calculate the screened similarity set according to a predetermined algorithm to obtain the density value set.

In this embodiment, the similarity threshold may be preset. The similarities of each archived data are screened with that archived data's similarity threshold, and the similarities that remain after screening are then used in the calculation, which determines the density value set of each archived data.

In the similarity set sim_ij{sim₁₂, sim₁₃, ..., sim_ij} of the file data aidN calculated above, a similarity threshold can be determined from the similarity set [the expression defining the threshold and the screening condition are formula images that are not reproduced in the source text]. The similarities are screened with this threshold: similarities less than or equal to the threshold are filtered out, and similarities greater than the threshold are retained. The screened similarity set is then used to calculate the density value set.
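The screening rule (drop similarities less than or equal to the threshold, keep those greater than it) can be sketched as below. Because the source's formula for the threshold is not available, `threshold` is simply a caller-supplied parameter here, and the value 0.75 and the similarity numbers are made up for illustration.

```python
def filter_similarities(sims, threshold):
    """Keep only the pairs whose similarity is strictly greater than the threshold."""
    return {pair: s for pair, s in sims.items() if s > threshold}


# Hypothetical similarity set and threshold.
example_sims = {("a", "b"): 0.92, ("a", "c"): 0.40, ("b", "c"): 0.75}
kept = filter_similarities(example_sims, threshold=0.75)
```

Note that the pair whose similarity equals the threshold is removed, since only values greater than the threshold are retained.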

As shown in FIG. 3, calculating the screened similarity set according to the predetermined algorithm to obtain the density value set includes:

S231. Determine the corresponding similarities according to the screened similarity set;

S232. Sum the similarities to obtain the density values corresponding to each archived data;

S233. Sort the density values corresponding to the multiple archived data to obtain the density value set.

In this embodiment, the determined similarities are summed and the result is used as the density value of the archived data. After the density value of each archived data has been determined, the density values are sorted from largest to smallest to obtain the density value set, which can be expressed as β(sim_ij)∈{β₁, β₂, ..., β_n}.

Specifically, the predetermined algorithm can be computed with the following formula [formula image not reproduced in the source text]:

where β_i is the density value, sim_ij is the similarity, the threshold term is the minimum similarity, and x represents the similarity count. The formula states that x = 1 when its condition [not reproduced in the source text] holds and x = 0 when the value is less than 0. Summing the similarities determined in this way gives the density value set corresponding to each archived data.
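Since the formula itself is missing from the text, the following is only one possible reading of the surrounding definitions, not the patent's formula: for each item, an indicator x is 1 when sim_ij minus the threshold is positive and 0 otherwise, and the retained similarities are summed. Counting each pair toward both of its endpoints is an assumption of this sketch, as are the names `density_values` and `threshold` and all numeric values.

```python
from collections import defaultdict


def density_values(sims, threshold):
    """Sum, per item, the similarities to other items that exceed the threshold.

    sims maps unordered pairs (i, j) to sim_ij. The result is ordered from
    the largest to the smallest density, mirroring step S233.
    """
    density = defaultdict(float)
    for (i, j), s in sims.items():
        x = 1 if s - threshold > 0 else 0  # indicator described in the text
        density[i] += x * s  # adding 0.0 still registers the item
        density[j] += x * s
    return dict(sorted(density.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical similarities and threshold.
toy_sims = {("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "c"): 0.3}
toy_density = density_values(toy_sims, threshold=0.5)
```

Replacing `x * s` with `x` would turn the sum into a count of retained neighbours, which is another plausible reading of "x represents the similarity count".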

For example, the archived data set includes the image files of three persons A, B and C. A's image file contains picture a, picture b, picture c and picture d, and B's image file contains picture e, picture f, picture g, picture h and picture i. After calculation, the similarity set corresponding to A's image file is sim_ab, sim_ac, sim_ad, sim_bc, sim_bd, sim_cd, and the similarity set corresponding to B's image file is sim_ef, sim_eg, sim_eh, sim_ei, sim_fg, sim_fh, sim_fi, sim_gh, sim_gi, sim_hi. Suppose that summing A's similarities gives a density value of 4 and summing B's similarities gives a density value of 9. Once the density values of A and B have been determined, 9 and 4 are sorted from largest to smallest, and the resulting density value set is β(sim_AB)∈{9, 4}.

S30. Calculate a distance value set of the multiple archived data according to the density value set and the similarity set.

As shown in FIG. 4, a specific implementation of step S30 includes:

S31. Sort the archived data set according to the density value set of each archived data, and screen the sorted archived data set;

S32. Perform distance calculation on the screened archived data set to obtain the distance value set.

In this embodiment, the distance value denotes the distance between archived data. The archived data set can be sorted using the density value set, which is ordered from largest to smallest, so the archived data are arranged in order of density. After sorting, the distance is calculated for each archived data by recursively comparing the density values of the archived data and then determining the distances between them. For example, if the four archived data A, B, C and D yield the density value set (600, 500, 400, 300), then B is compared with A, C is compared with A and B, and D is compared with A, B and C, and the distances between the archived data are determined in this way.

As shown in FIG. 5, sorting the archived data set according to the density value set of each archived data and screening the sorted archived data set includes:

S311. Sort the archived data set in descending order of the density value of each archived data;

S312. For each archived data, determine multiple reference archived data whose density values are greater than the density value of that archived data;

S313. Determine, among the multiple reference archived data, the target archived data with the smallest similarity to that archived data, so that the distance is calculated from the target archived data and that archived data.

When the archived data are compared recursively, each archived data is compared against the reference archived data whose density values are greater than its own. Optionally, for the archived data with the largest density value in the density value set, the distance to the target archived data farthest from it is calculated. After the reference archived data have been determined, the reference with the smallest similarity to the archived data is taken as the target archived data, and the distance between the target archived data and the archived data is calculated to give the distance value. Once the distance values of all archived data have been calculated, the distance value set is determined. Specifically, the distance value can be calculated with the following formula [formula image not reproduced in the source text]:

where γ_i is the distance value. After the density value set and the distance value set have been determined, they can be represented in two-dimensional coordinates, for example with the density value on the x-axis and the distance value on the y-axis. In these coordinates it can be judged whether both the density value and the distance value of each archived data are relatively large, and the corresponding cluster center points are determined accordingly; that is, an archived data whose density value and distance value are both relatively large can be determined to be a cluster center point.
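Steps S311 to S313 can be sketched as follows. The source's distance formula is not available, so this sketch assumes the distance is `1 - similarity`; that conversion, the tie handling, the helper names and all similarity values are assumptions for illustration. The choice of the least similar higher-density reference follows the text as written.

```python
def distance_values(density, sim):
    """Compute a distance value per item from densities and a similarity function.

    density: {item: density value}; sim(i, j) returns a similarity in [0, 1].
    Assumption: distance = 1 - similarity.
    """
    order = sorted(density, key=density.get, reverse=True)  # S311
    distances = {}
    for item in order:
        # S312: reference items with strictly greater density.
        refs = [o for o in order if density[o] > density[item]]
        if not refs:
            # Highest-density item: compare against all other items.
            refs = [o for o in order if o != item]
        if not refs:
            distances[item] = 0.0  # a single item has nothing to compare to
            continue
        # S313: target is the reference least similar to the item.
        target = min(refs, key=lambda o: sim(item, o))
        distances[item] = 1.0 - sim(item, target)
    return distances


# Densities from the text's example; the similarities are hypothetical.
dens = {"A": 600, "B": 500, "C": 400, "D": 300}
pair_sims = {
    ("A", "B"): 0.6, ("A", "C"): 0.3, ("A", "D"): 0.8,
    ("B", "C"): 0.5, ("B", "D"): 0.9, ("C", "D"): 0.85,
}


def lookup(i, j):
    """Symmetric lookup into the hypothetical similarity table."""
    return pair_sims[tuple(sorted((i, j)))]


gamma = distance_values(dens, lookup)
```

The resulting (density, distance) pairs are what would be plotted on the two axes described above. These made-up similarities are not intended to reproduce the numeric distance values quoted in the source.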

For example, the four archived data A, B, C and D yield the density value set (600, 500, 400, 300), so B is compared with A, C is compared with A and B, and D is compared with A, B and C. The calculated distance value set is (0.9, 0.4, 0.7, 0.2): the largest distance from A is 0.9, the distance between B and A is 0.4, the distance for C, taken at the smallest similarity among A and B, is 0.7, and the distance for D, taken at the smallest similarity among A, B and C, is 0.2.

S40. Determine the multiple archived data corresponding to density values and distance values that satisfy the preset conditions as multiple cluster center points.

In this embodiment, the preset condition is that both the density value and the distance value are relatively large; that is, the density value of the file data is relatively large and its distance from other file data is also relatively large. The corresponding archived data is then determined to be a cluster center point. When the archived data are later re-clustered and archived, archiving through these cluster center points is more accurate.
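The source only says that both values must be "relatively large" and gives no cut-offs, so the sketch below uses two hypothetical parameters, `density_min` and `distance_min`, whose values are chosen purely for illustration. The densities and distances are the example values quoted in the text.

```python
def select_cluster_centers(density, distance, density_min, distance_min):
    """Return the items whose density and distance both reach the given minimums."""
    return [
        item for item in density
        if density[item] >= density_min and distance[item] >= distance_min
    ]


# Example values from the text; the two minimums are assumptions.
example_density = {"A": 600, "B": 500, "C": 400, "D": 300}
example_distance = {"A": 0.9, "B": 0.4, "C": 0.7, "D": 0.2}
centers = select_cluster_centers(
    example_density, example_distance, density_min=400, distance_min=0.5
)
```

With these illustrative cut-offs, A and C are selected: B has a high density but a small distance, and D is low on both.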

S50. Use the archived data corresponding to the multiple cluster center points as cover images.

In this embodiment, the cover image may be a face image, and each archived data may contain multiple image files. After clustering and merging around the cluster centers obtained by the above calculation, each person's image files can be merged into one or more clusters. Because the image file at each cluster center point has a high density and is far from the other cluster center points, it can be used as a representative cover image. In subsequent archiving, these cover images allow different image files to be clustered and merged more accurately.

As shown in FIG. 6, the file merging method provided by the embodiment of the present invention further includes:

30. Re-merge the archived data set according to the multiple cluster center points to obtain a set of clusters, where each cluster includes a corresponding cover image;

31. Determine the cover image of each cluster according to the set of clusters;

32. Merge multiple clusters when the clusters and the cover images satisfy a preset relationship.

In this embodiment, the preset relationship is that the number of clusters is greater than the number of cover images. Each cluster may represent the image file of one person, and one person's image file may also form multiple clusters, from which multiple cover images can be determined. To reduce the number of clusters corresponding to the same file, multiple clusters can be merged and the corresponding cover images re-determined after merging. For example, if A's image file forms 10 clusters but contains only 5 cover images, the 10 clusters are merged into 5, so that A's image file forms fewer clusters.
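The source does not say how archived data are attached to the cluster center points in step 30. One simple possibility, shown here purely as an assumption, is to assign each item to the center it is most similar to and to treat the center itself as the cluster's cover image. The names `build_clusters` and `line_similarity` and the toy one-dimensional features are hypothetical.

```python
def build_clusters(items, centers, sim):
    """Assign every item to its most similar center.

    Returns {center: [members]}; each center doubles as the cluster's cover.
    """
    clusters = {c: [c] for c in centers}
    for item in items:
        if item in clusters:  # centers are already placed
            continue
        best = max(centers, key=lambda c: sim(item, c))
        clusters[best].append(item)
    return clusters


# Toy one-dimensional features; similarity shrinks as the values move apart.
positions = {"a": 0.1, "b": 0.2, "c": 0.8, "d": 0.9, "e": 0.3}


def line_similarity(i, j):
    return 1.0 - abs(positions[i] - positions[j])


clusters = build_clusters(list(positions), ["a", "c"], line_similarity)
```

Here items b and e join the cluster whose cover is a, and d joins the cluster whose cover is c.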

As shown in FIG. 7, merging the multiple clusters to determine the corresponding cover images when the set of clusters and the cover images satisfy the preset relationship includes:

321. Judging whether the number of clusters is greater than the number of cover images;

322. When the number of clusters is greater than the number of cover images, determining the similarity between the cluster center points of every two clusters;

323. When the similarity between two cluster center points is greater than the preset similarity, merging the corresponding two clusters.

In this embodiment, the preset similarity may be a similarity set in advance by the user. When merging multiple clusters, the similarity between every two clusters is judged, and two clusters can be merged when their similarity is greater than the preset similarity. It can be understood that, using the cluster center points determined above, the similarity between two clusters can be computed from their corresponding cluster center points. The preset similarity may be denoted o_sim and the similarity between the center points of clusters i and j may be denoted sim_ij. Whether the similarity between two clusters exceeds the preset similarity is then decided by checking whether sim_ij - o_sim > 0, which determines whether the two clusters are merged. It can be understood that, when the number of clusters is less than the number of cover images, no merging operation needs to be performed between the clusters.
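The check in steps 321 to 323 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `cosine_similarity` and `merge_clusters`, the use of cosine similarity between feature vectors of the center points, and the transitive grouping via union-find are all assumptions introduced here. Only the count comparison and the test sim_ij - o_sim > 0 come from the description.

```python
from itertools import combinations

import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def merge_clusters(centers, num_covers, o_sim):
    """Group cluster indices whose center points satisfy sim_ij - o_sim > 0.

    centers    : one feature vector per cluster center point (hypothetical input)
    num_covers : number of cover images of the same person
    o_sim      : preset similarity
    Returns a sorted list of index groups; each group is one merged cluster.
    """
    n = len(centers)
    # Step 321: merge only when there are more clusters than cover images.
    if n <= num_covers:
        return [[i] for i in range(n)]

    parent = list(range(n))  # union-find forest (assumption: merges are transitive)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Steps 322 and 323: compare every pair of cluster center points.
    for i, j in combinations(range(n), 2):
        sim_ij = cosine_similarity(centers[i], centers[j])
        if sim_ij - o_sim > 0:
            parent[find(j)] = find(i)

    merged = {}
    for i in range(n):
        merged.setdefault(find(i), []).append(i)
    return sorted(merged.values())
```

With four center vectors that fall into two tight pairs, two cover images, and o_sim = 0.9, the sketch returns two merged groups; with four cover images it leaves all four clusters separate.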

For example, the image file of a first person includes cluster A, cluster B, cluster C, and cluster D, and the cover images corresponding to the first person include face picture e and face picture f. The image file of a second person includes cluster G, cluster H, and cluster I, and the cover images corresponding to the second person include face picture 1, face picture 2, face picture 3, and face picture 4. The first person therefore has more clusters (4) than cover images (2), so cluster A, cluster B, cluster C, and cluster D can be considered for merging. Computing the pairwise similarities between the cluster center points of these clusters yields 6 similarities, each of which is compared with the preset similarity to determine which clusters can be merged, so that the merged clusters are more representative. Because the second person has fewer clusters (3) than cover images (4), the second person's clusters need not be merged; when more clusters are formed later, cluster merging can be performed again.
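The count of 6 similarities in this example follows from the number of unordered pairs among k = 4 clusters, as the short derivation below shows (the symbol k is introduced here for illustration only):

```latex
\binom{k}{2} = \frac{k(k-1)}{2}, \qquad k = 4 \;\Rightarrow\; \binom{4}{2} = \frac{4 \cdot 3}{2} = 6
```

The six pairs are (A, B), (A, C), (A, D), (B, C), (B, D), and (C, D).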

As shown in FIG. 8, an embodiment of the present invention provides a file merging apparatus 10, including:

a first calculation module 11, configured to perform similarity calculation on any two archived data in the archived data set to obtain a similarity set, the archived data set being composed of multiple archived data of the same person;

a first determining module 12, configured to determine a density value set among the multiple archived data according to the similarity set;

a second calculation module 13, configured to calculate a distance value set of the multiple archived data according to the density value set and the similarity set;

a second determining module 14, configured to determine the multiple archived data corresponding to the density values and distance values that satisfy a preset condition as multiple cluster center points;

a third determining module 15, configured to use the archived data corresponding to the multiple cluster center points as cover images.

The file merging apparatus 10 provided by the present invention first performs similarity calculation on any two archived data in the archived data set to obtain a similarity set, the archived data set being composed of multiple archived data of the same person; determines a density value set among the multiple archived data according to the similarity set; calculates a distance value set of the multiple archived data according to the density value set and the similarity set; determines the multiple archived data corresponding to the density values and distance values that satisfy a preset condition as multiple cluster center points; and uses the archived data corresponding to the multiple cluster center points as cover images. This avoids the high randomness of generated cover images and improves their accuracy and representativeness.
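The center-selection step performed by the second determining module can be sketched as follows. This passage does not spell out the preset condition, so the sketch assumes the simplest form: a lower bound on the density value and a lower bound on the distance value. The function name `select_cluster_centers` and the parameters `min_density` and `min_distance` are hypothetical.

```python
import numpy as np


def select_cluster_centers(density, distance, min_density, min_distance):
    """Return indices of archived data whose density and distance both exceed the bounds.

    density, distance : one value per archived data item
    min_density, min_distance : assumed form of the "preset condition"
    """
    density = np.asarray(density, dtype=float)
    distance = np.asarray(distance, dtype=float)
    # An item qualifies as a cluster center point only if both values are large.
    mask = (density > min_density) & (distance > min_distance)
    return np.flatnonzero(mask).tolist()
```

The archived data at the returned indices would then be used as cover images. Other forms of the condition, such as taking the items with the largest product of density and distance, would fit the same interface.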

It should be noted that the file merging apparatus 10 provided by the specific embodiment of the present invention corresponds to the file merging method described above. All embodiments of the file merging method apply to the file merging apparatus 10, and each embodiment of the file merging apparatus 10 has modules corresponding to the steps of the file merging method, achieving the same or similar beneficial effects. To avoid repetition, the individual modules of the file merging apparatus 10 are not described further here.

As shown in FIG. 9, a specific embodiment of the present invention further provides an electronic device 20, including a memory 202, a processor 201, and a computer program stored in the memory 202 and executable on the processor 201. The processor 201 implements the steps of the file merging method described above when executing the computer program.

Specifically, the processor 201 is configured to call the computer program stored in the memory 202 and perform the following steps:

performing similarity calculation on any two archived data in the archived data set to obtain a similarity set, the archived data set being composed of multiple archived data of the same person;

determining a density value set among the multiple archived data according to the similarity set;

calculating a distance value set of the multiple archived data according to the density value set and the similarity set;

using the archived data corresponding to the multiple cluster center points as cover images.

Optionally, the determining, performed by the processor 201, of the density value set among the multiple archived data according to the similarity set includes:

determining a similarity threshold for each archived data in the similarity set;

filtering the similarity set according to the similarity threshold of each archived data;

calculating the filtered similarity set according to a predetermined algorithm to obtain the density value set.

Optionally, the calculating, performed by the processor 201, of the filtered similarity set according to a predetermined algorithm to obtain the density value set includes:

determining the corresponding similarities from the filtered similarity set;

summing the similarities to obtain the density value corresponding to each archived data;

sorting the density values corresponding to the multiple archived data to obtain the density value set.
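The density steps above (threshold filtering, summation, sorting) can be sketched as follows. This is an illustration under stated assumptions: the similarity set is represented as a symmetric matrix `sim`, the per-item thresholds are supplied by the caller because this passage does not say how they are determined, and an item's similarity to itself is excluded from the sum. The function name `density_values` is hypothetical.

```python
import numpy as np


def density_values(sim, thresholds):
    """Compute a density value per archived data item and a descending ordering.

    sim        : (n, n) matrix, sim[i, j] = similarity between items i and j
    thresholds : length-n array, thresholds[i] = similarity threshold of item i
    Returns (density, order), where order lists item indices by decreasing density.
    """
    sim = np.asarray(sim, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    n = sim.shape[0]

    # Filter: for item i keep only similarities above item i's own threshold.
    keep = sim > thresholds[:, None]
    keep[np.arange(n), np.arange(n)] = False  # assumption: ignore self-similarity

    # Sum the retained similarities to get each item's density value.
    density = np.where(keep, sim, 0.0).sum(axis=1)

    # Sort from largest to smallest density.
    order = np.argsort(-density, kind="stable")
    return density, order
```

An item that is highly similar to many other captures of the same person receives a large density value, which is what makes it a candidate cluster center point.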

Optionally, the calculating, performed by the processor 201, of the distance value set of the multiple archived data according to the density value set and the similarity set includes:

sorting the archived data set according to the density value of each archived data, and filtering the sorted archived data set;

performing distance calculation on the filtered archived data set to obtain the distance value set.

Optionally, the sorting, performed by the processor 201, of the archived data set according to the density value of each archived data and the filtering of the sorted archived data set includes:

sorting the archived data set from largest to smallest according to the density value of each archived data;

for each archived data, determining multiple reference archived data whose density values are greater than the density value of that archived data;

determining, among the multiple reference archived data, the target archived data having the smallest similarity with the archived data, so that the distance is calculated from the target archived data and the archived data.
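The distance steps above can be sketched as follows, following the wording of this passage literally: among the reference items with higher density, the one with the smallest similarity is taken as the target. Several details are assumptions made for illustration: the distance is taken as 1 minus the similarity, the item with the highest density (which has no reference items) falls back to its least similar other item, and the function name `distance_values` is hypothetical. For comparison, the classic density-peaks formulation uses the nearest higher-density point instead.

```python
import numpy as np


def distance_values(sim, density):
    """Compute a distance value per archived data item.

    sim     : (n, n) similarity matrix
    density : length-n density values
    """
    sim = np.asarray(sim, dtype=float)
    density = np.asarray(density, dtype=float)
    n = len(density)
    distance = np.zeros(n)

    for i in range(n):
        # Reference items: those whose density exceeds item i's own density.
        candidates = np.flatnonzero(density > density[i])
        if candidates.size == 0:
            # Assumption: the densest item compares against all other items.
            candidates = np.delete(np.arange(n), i)
        if candidates.size == 0:
            continue  # a single-item data set keeps distance 0

        # Target item: the candidate with the smallest similarity to item i.
        target = candidates[np.argmin(sim[i, candidates])]
        distance[i] = 1.0 - sim[i, target]  # assumption: distance = 1 - similarity
    return distance
```

The explicit descending sort described above is replaced here by a density comparison mask, which selects the same reference items without reordering the data.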

Optionally, the method performed by the processor 201 further includes:

re-merging the archived data set according to the multiple cluster center points to obtain a set of clusters, wherein each cluster includes a corresponding cover image;

determining the cover image of each cluster according to the set of clusters;

merging the multiple clusters when the multiple clusters and the cover images satisfy a preset relationship.

Optionally, the merging, performed by the processor 201, of the multiple clusters when the set of clusters and the cover images satisfy the preset relationship includes:

judging whether the number of clusters is greater than the number of cover images;

when the number of clusters is greater than the number of cover images, determining the similarity between the cluster center points of every two clusters;

when the similarity between two cluster center points is greater than the preset similarity, merging the corresponding two clusters.

That is, in the specific embodiment of the present invention, the processor 201 of the electronic device 20 implements the steps of the file merging method described above when executing the computer program, which avoids the high randomness of generated cover images and improves their accuracy and representativeness.

It should be noted that, because the processor 201 of the electronic device 20 implements the steps of the file merging method when executing the computer program, all embodiments of the file merging method apply to the electronic device 20 and achieve the same or similar beneficial effects.

An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements each process of the file merging method or the application-side file merging method provided by the embodiments of the present invention and achieves the same technical effects. To avoid repetition, details are not repeated here.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.

In the description of this specification, a reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

The above are only preferred embodiments of the present invention and do not limit the patent scope of the present invention. Any equivalent structural transformation made using the contents of the description and drawings under the concept of the present invention, or any direct or indirect application in other related technical fields, is included within the scope of patent protection of the present invention.

Claims

1. A file merging method is characterized by comprising the following steps:

carrying out similarity calculation on any two archived data in the archived data sets to obtain a similarity set, wherein the archived data set is an archived data set consisting of a plurality of archived data of the same person;

determining a density value set among the plurality of archived data according to the similarity set;

calculating a distance value set of the plurality of archived data according to the density value set and the similarity set;

determining a plurality of archived data corresponding to the density values and the distance values meeting preset conditions as a plurality of clustering center points;

and taking the archived data corresponding to the plurality of clustering center points as cover images.

2. The archive merging method according to claim 1, wherein the determining a set of density values among the plurality of archived data according to the set of similarities comprises:

determining a similarity threshold value of each archived data in the similarity set;

screening the similarity set according to the similarity threshold value of each archived data;

and calculating the screened similarity set according to a preset algorithm to obtain the density value set.

3. The archive merging method according to claim 2, wherein the calculating the filtered similarity sets according to a predetermined algorithm to obtain the density value sets comprises:

determining corresponding similarity according to the screened similarity set;

summing the similarity to obtain a plurality of density values corresponding to each archived data;

and sorting the density values corresponding to the plurality of archived data to obtain the density value set.

4. The archive merging method of claim 1, wherein said calculating a set of distance values for a plurality of the archived data from the set of density values and the set of similarities comprises:

sorting the archived data set according to the density value of each archived data, and screening the sorted archived data set;

and performing distance calculation on the screened archived data set to obtain the distance value set.

5. The archive merging method of claim 4, wherein the sorting the archived data set according to the density value of each archived data and the screening the sorted archived data set comprises:

sorting the archived data set from large to small according to the density value of each archived data;

determining, for each archived data, a plurality of reference archived data whose density values are greater than the density value of that archived data;

and determining, from the plurality of reference archived data, target archived data having the minimum similarity with the archived data, and performing distance calculation according to the target archived data and the archived data.

6. The archive merging method according to claim 1, further comprising:

re-merging the archived data set according to the plurality of clustering center points to obtain a cluster set; wherein each cluster comprises a corresponding cover image;

determining a cover image of each cluster according to the cluster set;

and when the plurality of clusters and the cover images meet a preset relationship, merging the plurality of clusters.

7. The archive merging method according to claim 6, wherein the merging the plurality of clusters when the cluster set and the cover image satisfy a preset relationship comprises:

judging whether the number of the plurality of clusters is greater than the number of the cover images;

when the number of the plurality of clusters is greater than the number of the cover images, determining the similarity between the clustering center points of every two clusters;

and merging the corresponding two clusters when the similarity between their clustering center points is greater than the preset similarity.

8. An archive merging apparatus, comprising:

a first calculation module, configured to perform similarity calculation on any two archived data in an archived data set to obtain a similarity set, wherein the archived data set consists of a plurality of archived data of the same person;

a first determining module, configured to determine a density value set among the plurality of archived data according to the similarity set;

a second calculation module, configured to calculate a distance value set of the plurality of archived data according to the density value set and the similarity set;

a second determining module, configured to determine a plurality of archived data corresponding to the density values and the distance values meeting preset conditions as a plurality of clustering center points;

and a third determining module, configured to take the archived data corresponding to the plurality of clustering center points as cover images.

9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the archive merging method according to any of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the archive merging method according to any one of claims 1 to 7.