CN110210603A

CN110210603A - Counter model construction method, method of counting and the device of crowd

Info

Publication number: CN110210603A
Application number: CN201910497050.8A
Authority: CN
Inventors: 彭玉旭; 彭贤; 黄园媛; 罗元盛
Original assignee: Changsha University of Science and Technology
Current assignee: Changsha University of Science and Technology
Priority date: 2019-06-10
Filing date: 2019-06-10
Publication date: 2019-09-06

Abstract

The present invention relates to a crowd counting model construction method, counting method and device. Firstly, the crowd images in the pre-stored data set are marked with heads to obtain a crowd density map, and the crowd images and the crowd density map corresponding to the crowd images are combined. Combining to obtain the target data set; then performing data amplification processing on the training set obtained by dividing the target data set according to the first preset ratio to obtain the target training set; finally using the training crowd image and the training crowd density map in the target training set, The multi-scale feature aggregation convolutional neural network model after parameter initialization is trained to obtain the counting model of the crowd. By adopting the technical solution of the present invention, when the model is trained and applied, it can not only solve the problem of the scale change of the characters in the image, but also reduce the amount of computation, improve the recognition efficiency, and do not need to use perspective images in the training process, which improves the accuracy of the model. applicability.

Description

Crowd counting model construction method, counting method and device

技术领域technical field

本发明涉及计算机视觉领域，具体涉及一种人群的计数模型构建方法、计数方法和装置。The invention relates to the field of computer vision, in particular to a method for constructing a crowd counting model, a counting method and a device.

背景技术Background technique

人群计数是从图像或者视频帧中计算出其中的人头数目，人群密度是人群在一定时间一定空间内的分布情况。准确地估计人群数目和人群密度是衡量一个安防系统好坏的重要指标之一。准确地估计人群数目和人群密度在公共安全和交通管制上有着极其重要的作用，同时通过分析大型商场的人群密度分布，可以获得顾客的购买喜好与倾向并发掘潜在的商业价值。Crowd counting is to calculate the number of heads from images or video frames, and crowd density is the distribution of crowds in a certain time and space. Accurately estimating the number of crowds and crowd density is one of the important indicators to measure the quality of a security system. Accurately estimating the number and density of crowds plays an extremely important role in public safety and traffic control. At the same time, by analyzing the distribution of crowd density in large shopping malls, customers' purchasing preferences and tendencies can be obtained and potential commercial value can be explored.

在对人群图像处理过程中，图像中人物的尺度变化问题传统的基于单列卷积神经网络的算法难以解决，通常采用多列网络架构处理，但是多列网络比单列网络的运算量大，任务训练繁重，识别效率较低。并且现有的模型训练过程中需要在训练场景和测试场景上使用透视图，但是在实际应用中透视图难以获得，使得模型的适用性较差。In the process of image processing of crowds, the scale change of the characters in the image is difficult to solve by the traditional algorithm based on single-column convolutional neural network, and multi-column network architecture is usually used for processing, but multi-column network is more computationally intensive than single-column network. Heavy and low recognition efficiency. And the existing model training process needs to use the perspective view on the training scene and the test scene, but it is difficult to obtain the perspective view in the actual application, which makes the applicability of the model poor.

因此，如何降低运算量，提高识别效率和模型的适用性是本领域技术人员亟待解决的技术问题。Therefore, how to reduce the amount of computation, improve the recognition efficiency and the applicability of the model is a technical problem to be solved urgently by those skilled in the art.

发明内容Contents of the invention

有鉴于此，本发明的目的在于提供一种人群的计数模型构建方法、计数方法和装置，以解决现有技术中运算量大、识别效率较低、模型的适用性较差的问题。In view of this, the object of the present invention is to provide a crowd counting model building method, counting method and device to solve the problems of large amount of computation, low recognition efficiency and poor model applicability in the prior art.

为实现以上目的，本发明采用如下技术方案：To achieve the above object, the present invention adopts the following technical solutions:

一种人群的计数模型构建方法，包括：A method for constructing a population counting model, comprising:

对预存储的数据集中的人群图像进行人头标注处理，得到人群密度图；Perform head annotation processing on the crowd images in the pre-stored data set to obtain a crowd density map;

将所述人群图像和与所述人群图像对应的人群密度图进行组合，得到目标数据集；combining the crowd image with a crowd density map corresponding to the crowd image to obtain a target data set;

对所述目标数据集中按照第一预设比例划分得到的训练集进行数据扩增处理，得到目标训练集；performing data amplification processing on the training set obtained by dividing the target data set according to the first preset ratio to obtain the target training set;

利用所述目标训练集中的训练人群图像和训练人群密度图，对经参数初始化处理后的多尺度特征聚合卷积神经网络模型进行训练，得到人群的计数模型。Using the images of the training crowd and the density map of the training crowd in the target training set, the multi-scale feature aggregation convolutional neural network model after parameter initialization is trained to obtain the counting model of the crowd.

进一步地，上述所述的方法中，所述对所述目标数据集中按照第一预设比例划分得到的训练集进行数据扩增处理，得到目标训练集，包括：Further, in the method described above, performing data amplification processing on the training set obtained by dividing the target data set according to the first preset ratio to obtain the target training set includes:

通过预设裁剪方式，分别对所述训练集中的每个所述原始训练人群图像及所述原始训练人群图像对应的所述原始训练人群密度图进行裁剪，得到至少两个裁剪图像和至少两个裁剪密度图；Each of the original training crowd images in the training set and the original training crowd density map corresponding to the original training crowd images are respectively cropped by a preset cropping method to obtain at least two cropped images and at least two crop density map;

将所述裁剪图像进行翻转处理，得到翻转图像，以及，将所述裁剪密度图进行翻转处理，得到翻转密度图；performing flipping processing on the cropped image to obtain a flipped image, and flipping the cropped density map to obtain a flipped density map;

按照所述原始训练人群图像的第二预设比例，将所述裁剪图像和所述翻转图像扩大，得到扩大图像，以及，将所述裁剪密度图和所述翻转密度图扩大，得到扩大密度图；According to the second preset ratio of the original training crowd image, the cropped image and the flipped image are enlarged to obtain an enlarged image, and the cropped density map and the flipped density map are enlarged to obtain an enlarged density map ;

将所述扩大图像和所述扩大密度图存储到所述训练集中，得到目标训练集；storing the enlarged image and the enlarged density map in the training set to obtain a target training set;

其中，所述训练人群图像包括所述原始训练人群图像和所述扩大图像；所述训练人群密度图包括所述原始训练人群密度图和所述扩大密度图。Wherein, the training crowd image includes the original training crowd image and the enlarged image; the training crowd density map includes the original training crowd density map and the enlarged density map.

进一步地，上述所述的方法中，所述利用所述目标训练集中的训练人群图像和训练人群密度图，对经参数初始化处理后的多尺度特征聚合卷积神经网络模型进行训练，得到人群的计数模型，包括：Further, in the above-mentioned method, the multi-scale feature aggregation convolutional neural network model after parameter initialization is trained by using the training crowd image and the training crowd density map in the target training set to obtain the crowd's Counting models, including:

将所述目标训练集中的所述训练人群图像和所述训练人群密度图输入到经参数初始化处理后的所述多尺度特征聚合卷积神经网络模型中进行第一级别的训练，得到第一训练模型；Input the training crowd image and the training crowd density map in the target training set into the multi-scale feature aggregation convolutional neural network model after parameter initialization to perform first-level training to obtain the first training Model;

利用欧氏距离损失公式对所述第一训练模型的模型参数进行第一级别的优化，得到第一优化模型；Using the Euclidean distance loss formula to optimize the model parameters of the first training model at the first level to obtain the first optimization model;

按照预设的第一迭代次数，对所述第一级别的训练和所述第一级别的优化的过程进行迭代执行，得到第一目标模型；Iteratively executing the first-level training and the first-level optimization processes according to a preset first number of iterations to obtain a first target model;

将所述训练人群图像和所述训练人群密度图输入到所述第一目标模型中进行第二级别的训练，得到第二训练模型；Inputting the training crowd image and the training crowd density map into the first target model to perform second-level training to obtain a second training model;

利用所述欧氏距离损失公式和相对人数损失公式对所述第二训练模型的模型参数进行第二级别的优化，得到第二优化模型；Using the Euclidean distance loss formula and the relative number of people loss formula to perform second-level optimization on the model parameters of the second training model to obtain a second optimization model;

按照预设的第二迭代次数，对所述第二级别的训练和所述第二级别的优化的过程进行迭代执行，得到第二目标模型；performing iterative execution on the second-level training and the second-level optimization process according to the preset second number of iterations to obtain a second target model;

将所述第二目标模型作为所述人群的计数模型。The second target model is used as the population count model.

进一步地，上述所述的方法中，所述将所述第二目标模型作为所述人群的计数模型之前，还包括：Further, in the above-mentioned method, before using the second target model as the counting model of the crowd, it also includes:

将所述目标数据集中按照所述第一预设比例划分得到的测试集中的测试人群图像输入到所述第二目标模型，得到目标人群密度图；inputting the test crowd images in the test set obtained by dividing the target data set according to the first preset ratio into the second target model to obtain a target crowd density map;

根据所述目标人群密度图和所述测试集中的所述测试人群图像对应的测试人群密度图，确定模型的测试准确率；Determine the test accuracy of the model according to the target crowd density map and the test crowd density map corresponding to the test crowd images in the test set;

判断所述测试准确率是否大于预设准确率；Judging whether the test accuracy rate is greater than the preset accuracy rate;

对应地，所述将所述第二目标模型作为所述人群的计数模型，包括：Correspondingly, said using the second target model as the population counting model includes:

若所述测试准确率大于所述预设准确率，将所述第二目标模型作为所述人群的计数模型。If the test accuracy rate is greater than the preset accuracy rate, the second target model is used as the population count model.

进一步地，上述所述的方法中，所述欧氏距离损失公式为：Further, in the method described above, the Euclidean distance loss formula is:

其中，Θ表示参数模型；F(X_i；Θ)表示输出模型；X_i表示第i个输入的所述训练人群图像；F_i表示第i个输入的所述训练人群图像对应的所述训练人群密度图；N表示输入数据的个数；Wherein, Θ represents a parameter model; F(X _i ; Θ) represents an output model; Xi represents the training crowd image of the _{i-th input; F i} _represents the training corresponding to the training crowd image of the i-th input Crowd density map; N represents the number of input data;

所述相对人数损失公式为：The relative headcount loss formula is:

其中，F_D(X_i；Θ)表示预测的人数；D_i表示真值人数；N表示输入数据的个数；分母为D_i+1是为了防止分母为零；所述相对人数损失的训练权重为L＝1.0*L(Θ)+0.1*L_D(Θ)。Wherein, F _D (X _i ; Θ) represents the number of predictions; D _i represents the number of true value; N represents the number of input data; the denominator is D _i +1 to prevent the denominator from being zero; the training of the relative number of people loss The weight is L=1.0*L(Θ)+0.1*L _D (Θ).

本发明还提供一种人群的计数方法，包括：The present invention also provides a method for counting people, including:

获取待计数人群图像；Obtain the image of the crowd to be counted;

基于预先构建的人群的计数模型，输入所述待计数人群图像，得到待计数人群密度图；所述人群的计数模型通过上述人群的计数模型构建方法构建；Based on the pre-built counting model of the crowd, the image of the crowd to be counted is input to obtain a density map of the crowd to be counted; the counting model of the crowd is constructed by the counting model construction method of the above crowd;

利用累加计算法，累加所述待计数人群密度图中各像素点中的数值，得到像素点总数值，将所述像素点总数值作为待计数人群图像中的人数。Using an accumulative calculation method, the values in each pixel in the population density map to be counted are accumulated to obtain the total value of the pixel points, and the total value of the pixel points is used as the number of people in the image of the population to be counted.

本发明还提供一种人群的计数模型构建装置，包括：The present invention also provides a device for constructing a population counting model, including:

第一处理模块，用于对预存储的数据集中的人群图像进行人头标注处理，得到人群密度图；The first processing module is used to perform head labeling processing on the crowd images in the pre-stored data set to obtain a crowd density map;

数据组合模块，用于将所述人群图像和与所述人群图像对应的人群密度图进行组合，得到目标数据集；A data combination module, configured to combine the crowd image and the crowd density map corresponding to the crowd image to obtain a target data set;

数据扩增模块，用于对所述目标数据集中按照第一预设比例划分得到的训练集进行数据扩增处理，得到目标训练集；A data amplification module, configured to perform data amplification processing on the training set obtained by dividing the target data set according to a first preset ratio to obtain a target training set;

第二处理模块，用于利用所述目标训练集中的训练人群图像和训练人群密度图，对经参数初始化处理后的多尺度特征聚合卷积神经网络模型进行训练，得到人群的计数模型。The second processing module is used to use the training crowd image and training crowd density map in the target training set to train the multi-scale feature aggregation convolutional neural network model after parameter initialization to obtain a crowd counting model.

进一步地，上述所述的装置中，所述第二处理模块包括：第一训练单元、第一优化单元、迭代处理单元、第二训练单元、第二优化单元和模型确定单元；Further, in the above-mentioned device, the second processing module includes: a first training unit, a first optimization unit, an iterative processing unit, a second training unit, a second optimization unit, and a model determination unit;

所述第一训练单元，用于将所述目标训练集中的所述训练人群图像和所述训练人群密度图输入到经参数初始化处理后的所述多尺度特征聚合卷积神经网络模型中进行第一级别的训练，得到第一训练模型；The first training unit is configured to input the training crowd image and the training crowd density map in the target training set into the multi-scale feature aggregation convolutional neural network model after parameter initialization processing to perform the second One level of training to obtain the first training model;

所述第一优化单元，用于利用欧氏距离损失公式对所述第一训练模型的模型参数进行第一级别的优化，得到第一优化模型；The first optimization unit is configured to use the Euclidean distance loss formula to perform first-level optimization on the model parameters of the first training model to obtain a first optimization model;

所述迭代处理单元，用于按照预设的第一迭代次数，对所述第一级别的训练和所述第一级别的优化的过程进行迭代执行，得到第一目标模型；The iterative processing unit is configured to iteratively execute the first-level training and first-level optimization processes according to a preset first number of iterations to obtain a first target model;

所述第二训练单元，用于将所述训练人群图像和所述训练人群密度图输入到所述第一目标模型中进行第二级别的训练，得到第二训练模型；The second training unit is configured to input the training crowd image and the training crowd density map into the first target model to perform second-level training to obtain a second training model;

所述第二优化单元，用于利用所述欧氏距离损失公式和相对人数损失公式对所述第二训练模型的模型参数进行第二级别的优化，得到第二优化模型；The second optimization unit is configured to use the Euclidean distance loss formula and the relative number of people loss formula to optimize the model parameters of the second training model at a second level to obtain a second optimization model;

所述迭代处理单元，还用于按照预设的第二迭代次数，对所述第二级别的训练和所述第二级别的优化的过程进行迭代执行，得到第二目标模型；The iterative processing unit is further configured to iteratively execute the process of the second-level training and the second-level optimization according to a preset second number of iterations to obtain a second target model;

模型确定单元，用于将所述第二目标模型作为所述人群的计数模型。A model determining unit, configured to use the second target model as a counting model of the population.

进一步地，上述所述的装置中，所述多尺度特征聚合卷积神经网络模型包括特征映射模块、多尺度特征聚合模块和密度图回归模块；Further, in the above-mentioned device, the multi-scale feature aggregation convolutional neural network model includes a feature mapping module, a multi-scale feature aggregation module and a density map regression module;

所述特征映射模块采用VGG-16结构的前X个卷积层，其中X为4或6；The feature mapping module adopts the first X convolutional layers of the VGG-16 structure, where X is 4 or 6;

所述卷积层为3*3的卷积核；The convolution layer is a 3*3 convolution kernel;

所述多尺度特征聚合模块采用至少两个尺度分支；The multi-scale feature aggregation module adopts at least two scale branches;

所述密度图回归模块采用A列B层的空洞卷积，其中A和B均为正整数。The density map regression module adopts atrous convolution of column A and layer B, wherein both A and B are positive integers.

本发明还提供一种人群的计数装置，包括：图像获取模块、密度图确定模块和人数统计模块；The present invention also provides a crowd counting device, including: an image acquisition module, a density map determination module, and a people counting module;

所述图像获取模块，用于获取待计数人群图像；The image acquisition module is used to acquire images of people to be counted;

所述密度图确定模块，用于基于预先构建的人群的计数模型，输入所述待计数人群图像，得到待计数人群密度图；所述人群的计数模型通过上述人群的计数模型构建方法构建；The density map determination module is used to input the image of the crowd to be counted to obtain a density map of the crowd to be counted based on the pre-built counting model of the crowd; the counting model of the crowd is constructed by the above-mentioned crowd counting model construction method;

所述人数统计模块，用于利用累加计算法，累加所述待计数人群密度图中各像素点中的数值，得到像素点总数值，将所述像素点总数值作为待计数人群图像中的人数。The population counting module is used to accumulate the values of each pixel in the population density map to be counted by using an accumulation calculation method to obtain the total value of the pixels, and use the total value of the pixels as the number of people in the image of the population to be counted .

本发明的人群的计数模型构建方法、计数方法和装置，首先对预存储的数据集中的人群图像进行人头标注处理，得到人群密度图，并将人群图像和与人群图像对应的人群密度图进行组合，得到目标数据集，其中，本方案是对人群图像中的人头进行标注的，对身体的遮挡不影响结果，所以不需要在训练场景和测试场景上使用透视图；再对目标数据集中按照第一预设比例划分得到的训练集进行数据扩增处理，得到目标训练集；最后利用目标训练集中的训练人群图像和训练人群密度图，对经参数初始化处理后的多尺度特征聚合卷积神经网络模型进行训练，得到人群的计数模型。其中，多尺度特征聚合卷积神经网络模型为具有多尺度特征聚合功能的单列网络架构。采用本发明的技术方案，在模型进行训练和应用时，既能解决图像中人物的尺度变化问题，也能降低运算量，提高识别效率，并且训练过程中不需要使用透视图，提高了模型的适用性。The crowd counting model construction method, counting method and device of the present invention, first, carry out head labeling processing on the crowd images in the pre-stored data set, obtain the crowd density map, and combine the crowd image and the crowd density map corresponding to the crowd image , to get the target data set, in which, this scheme is to label the heads in the crowd images, and the occlusion of the body does not affect the result, so there is no need to use perspective images on the training scene and test scene; then the target data set according to the first The training set obtained by a preset ratio is subjected to data amplification processing to obtain the target training set; finally, the multi-scale feature aggregation convolutional neural network after parameter initialization is used to use the training crowd images and training crowd density maps in the target training set. The model is trained to obtain the counting model of the crowd. Among them, the multi-scale feature aggregation convolutional neural network model is a single-column network architecture with multi-scale feature aggregation function. By adopting the technical solution of the present invention, when the model is trained and applied, it can not only solve the problem of the scale change of the characters in the image, but also reduce the amount of computation, improve the recognition efficiency, and do not need to use perspective images in the training process, which improves the accuracy of the model. applicability.

应当理解的是，以上的一般描述和后文的细节描述仅是示例性和解释性的，并不能限制本发明。It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention.

附图说明Description of drawings

为了更清楚地说明本发明实施例或现有技术中的技术方案，下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。In order to more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the following will briefly introduce the drawings that need to be used in the description of the embodiments or the prior art. Obviously, the accompanying drawings in the following description are only These are some embodiments of the present invention. Those skilled in the art can also obtain other drawings based on these drawings without creative work.

图1是本发明的人群的计数模型构建方法实施例一的流程图；Fig. 1 is the flow chart of embodiment 1 of the counting model construction method of the crowd of the present invention;

图2是多尺度特征聚合卷积神经网络模型中多尺度特征聚合模块的结构示意图；Figure 2 is a schematic structural diagram of the multi-scale feature aggregation module in the multi-scale feature aggregation convolutional neural network model;

图3是堆叠池的结构示意图；Fig. 3 is the structural representation of stacking pool;

图4是本发明的人群的计数模型构建方法实施例二的流程图；Fig. 4 is the flow chart of embodiment 2 of the counting model construction method of the crowd of the present invention;

图5是本发明的人群的计数方法实施例的流程图；Fig. 5 is a flow chart of an embodiment of the crowd counting method of the present invention;

图6是本发明的人群的计数模型构建装置实施例一的结构示意图；Fig. 6 is a structural schematic diagram of Embodiment 1 of the crowd counting model construction device of the present invention;

图7是本发明的人群的计数模型构建装置实施例二的结构示意图；Fig. 7 is a schematic structural diagram of Embodiment 2 of the crowd counting model construction device of the present invention;

图8是本发明的人群的计数装置实施例的结构示意图。Fig. 8 is a structural schematic diagram of an embodiment of the crowd counting device of the present invention.

具体实施方式Detailed ways

为使本发明的目的、技术方案和优点更加清楚，下面将对本发明的技术方案进行详细的描述。显然，所描述的实施例仅仅是本发明一部分实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有做出创造性劳动的前提下所得到的所有其它实施方式，都属于本发明所保护的范围。In order to make the purpose, technical solution and advantages of the present invention clearer, the technical solution of the present invention will be described in detail below. Apparently, the described embodiments are only some of the embodiments of the present invention, but not all of them. Based on the embodiments of the present invention, all other implementations obtained by persons of ordinary skill in the art without making creative efforts fall within the protection scope of the present invention.

图1是本发明的人群的计数模型构建方法实施例一的流程图。如图1所示，本实施例的人群的计数模型构建方法具体可以包括如下步骤：FIG. 1 is a flow chart of Embodiment 1 of the method for constructing a population counting model in the present invention. As shown in Figure 1, the crowd counting model construction method of the present embodiment may specifically include the following steps:

S101、对预存储的数据集中的人群图像进行人头标注处理，得到人群密度图；S101. Perform head labeling processing on the crowd images in the pre-stored data set to obtain a crowd density map;

本实施例，首先对预存储的数据集中的人群图像进行人头标注处理，其中，数据集为预先存储的包含诸多人群图像的集合。再通过以下公式处理得到人群图像对应的人群密度图：In this embodiment, head labeling processing is first performed on crowd images in a pre-stored data set, wherein the data set is a pre-stored collection containing many crowd images. Then process the crowd density map corresponding to the crowd image by the following formula:

其中，F(x)表示人群密度图；M表示人群图像中头部标注的总数；x_i表示像素点；δ(x-x_i)表示第i个人头部位置；表示高斯核；表示第i个人的头部位置到k个最近邻的人的平均距离。Among them, F(x) represents the crowd density map; M represents the total number of head annotations in the crowd image; x _i represents the pixel point; δ(xx _i ) represents the head position of the ith person; Represents the Gaussian kernel; Indicates the average distance from the i-th person's head position to the k nearest neighbors.

S102、将人群图像和与人群图像对应的人群密度图进行组合，得到目标数据集；S102. Combine the crowd image and the crowd density map corresponding to the crowd image to obtain a target data set;

通过上述步骤，得到人群图像对应的人群密度图之后，将数据集中的所有人群图像和人群图像对应的人群密度图进行组合并存储，得到新的数据集，即目标数据集。Through the above steps, after the crowd density map corresponding to the crowd image is obtained, all the crowd images in the data set and the crowd density map corresponding to the crowd image are combined and stored to obtain a new data set, namely the target data set.

S103、对目标数据集中按照第一预设比例划分得到的训练集进行数据扩增处理，得到目标训练集；S103. Perform data amplification processing on the training set obtained by dividing the target data set according to the first preset ratio to obtain the target training set;

通过上述步骤，得到目标数据集之后，按照第一预设比例将目标数据集划分成训练集和测试集，再将训练集进行数据扩增处理，从而得到目标训练集。本实施例中，第一预设比例优选为7:3，即目标数据集划分成的训练集和测试集中数据的比例为7:3，另外，一般情况下，目标数据集划分成的训练集中的数据多于测试集中的数据。Through the above steps, after the target data set is obtained, the target data set is divided into a training set and a test set according to a first preset ratio, and then the training set is subjected to data amplification processing, thereby obtaining the target training set. In this embodiment, the first preset ratio is preferably 7:3, that is, the ratio of the data in the training set and the test set into which the target data set is divided is 7:3. In addition, in general, the training set into which the target data set is divided has more data than the test set.

S104、利用目标训练集中的训练人群图像和训练人群密度图，对经参数初始化处理后的多尺度特征聚合卷积神经网络模型进行训练，得到人群的计数模型。S104. Using the images of the training crowd and the density map of the training crowd in the target training set, train the multi-scale feature aggregation convolutional neural network model after parameter initialization to obtain a counting model of the crowd.

通过上述步骤，得到目标训练集后，利用目标训练集中的训练人群图像和该训练人群图像对应的训练人群密度图，对经参数初始化处理后的多尺度特征聚合卷积神经网络模型进行训练。其中，参数初始化处理是对模型的参数进行初始化设置，本实施例采用tensorflow实现模型代码，对模型参数初始化设置中，初始学习率优选为10^-4，衰减系数优选为0.9，衰减速度优选为20。对多尺度特征聚合卷积神经网络模型训练后，便能得到人群的计数模型。Through the above steps, after obtaining the target training set, use the training crowd image in the target training set and the training crowd density map corresponding to the training crowd image to train the multi-scale feature aggregation convolutional neural network model after parameter initialization. Among them, the parameter initialization process is to initialize and set the parameters of the model. In this embodiment, tensorflow is used to implement the model code. In the initial setting of the model parameters, the initial learning rate is preferably 10 ^-4 , the attenuation coefficient is preferably 0.9, and the attenuation speed is preferably 20 . After the multi-scale feature aggregation convolutional neural network model is trained, the counting model of the crowd can be obtained.

经参数初始化处理后的多尺度特征聚合卷积神经网络模型包括前端、中端和后端三个部分，其中前端为特征映射模块，中端为多尺度特征聚合模块FME，后端为密度图回归模块。After parameter initialization, the multi-scale feature aggregation convolutional neural network model includes three parts: the front end, the middle end and the back end. The front end is the feature mapping module, the middle end is the multi-scale feature aggregation module FME, and the back end is the density map regression module.

本实施例中，特征映射模块采用VGG-16结构的前4个卷积层或者VGG-16结构的前6个卷积层，其中使用堆叠池代替最大池化层，在不引入额外参数的前提下，能够增加模型的尺度不变性。本模块中的卷积层采用3*3的卷积核堆叠，能够加深网络深度，增加模型的非线性性。多个3*3的卷积层比一个大尺寸的过滤器filte有更少的参数，假设卷积层的输入和输出的特征图大小相同为C，那么3个3*3的卷积层参数个数为27C²，一个7*7的卷积层参数为49C²，所以可以把3个3*3卷积核看成是一个7*7的卷积核的分解(中间层有非线性的分解，并且起到隐式正则化的作用)，这种的替换可以使得模型的参数更少。并且网络单层神经元个数足够多，生成的特征抽象层度足够高，提高了网络的精确度与运行速度，同时拥有更少的网络参数。本实施例的多尺度特征聚合卷积神经网络模型中特征映射模块的卷积层是3*3的卷积核，通过其对图像特征进行重映射。为了保证模型的非线性，每个卷积层之后都跟着一个非线性激活函数层，本实施例使用的是修正线性单元(Rectified linearunits，ReLU)，ReLU可以加速网络的收敛。In this embodiment, the feature mapping module uses the first 4 convolutional layers of the VGG-16 structure or the first 6 convolutional layers of the VGG-16 structure, where the stacked pool is used instead of the maximum pooling layer, without introducing additional parameters , which can increase the scale invariance of the model. The convolution layer in this module adopts 3*3 convolution kernel stacking, which can deepen the network depth and increase the nonlinearity of the model. Multiple 3*3 convolutional layers have fewer parameters than a large-size filter filter. Assuming that the input and output feature maps of the convolutional layer are the same size as C, then three 3*3 convolutional layer parameters The number is 27C ² , and the parameter of a 7*7 convolution layer is 49C ² , so three 3*3 convolution kernels can be regarded as a decomposition of a 7*7 convolution kernel (the middle layer has nonlinear decomposition, and play the role of implicit regularization), this replacement can make the model have fewer parameters. Moreover, the number of single-layer neurons in the network is large enough, and the generated feature abstraction level is high enough, which improves the accuracy and speed of the network, and has fewer network parameters. The convolutional layer of the feature mapping module in the multi-scale feature aggregation convolutional neural network model of this embodiment is a 3*3 convolution kernel, through which image features are remapped. In order to ensure the nonlinearity of the model, each convolution layer is followed by a nonlinear activation function layer. This embodiment uses Rectified linear units (ReLU), which can accelerate the convergence of the network.

本实施例中，多尺度特征聚合卷积神经网络模型优选采用四层多尺度特征聚合模块FME，多尺度特征聚合模块FME采用至少两个尺度分支，本实施例中的多尺度特征聚合模块优选采用四个分支，图2是多尺度特征聚合卷积神经网络模型中多尺度特征聚合模块的结构示意图，如图2所示，多尺度特征聚合模块的每个分支只采用1*1与3*3的卷积核，此模块第一个分支只采用1*1的卷积核，是为了保留上一层的特征尺度来覆盖小目标。其余三个分支都采用了3*3卷积核的堆叠来模仿大卷核的感受野，模仿的分别是3*3、5*5、7*7卷积核，不过它们前面增加了一层1*1的卷积，将特征维度减少一半。同时为了简略起见，每一个分支的通道数都设置为相等的，并在每个卷积核后跟了一个ReLU函数层。直观的来说，多尺度特征聚合模块FME就是一个不同大小感受野的集成，此模块可以捕捉到密集人群中人群的多尺度外观，有利于人群计数。In this embodiment, the multi-scale feature aggregation convolutional neural network model preferably adopts a four-layer multi-scale feature aggregation module FME, and the multi-scale feature aggregation module FME uses at least two scale branches. The multi-scale feature aggregation module in this embodiment preferably adopts Four branches, Figure 2 is a schematic structural diagram of the multi-scale feature aggregation module in the multi-scale feature aggregation convolutional neural network model, as shown in Figure 2, each branch of the multi-scale feature aggregation module only uses 1*1 and 3*3 The convolution kernel, the first branch of this module only uses a 1*1 convolution kernel, in order to preserve the feature scale of the previous layer to cover small targets. The remaining three branches all use the stack of 3*3 convolution kernels to imitate the receptive field of large convolution kernels. The imitations are 3*3, 5*5, and 7*7 convolution kernels, but they add a layer in front of them. 1*1 convolution reduces the feature dimension by half. At the same time, for the sake of simplicity, the number of channels of each branch is set to be equal, and a ReLU function layer is followed by each convolution kernel. Intuitively speaking, the multi-scale feature aggregation module FME is an integration of receptive fields of different sizes. This module can capture the multi-scale appearance of crowds in dense crowds, which is conducive to crowd counting.

本实施例中，密度图回归模块采用A列B层的空洞卷积，其中A和B均为正整数。经实验验证，本实施例的密度图回归模块优选为3列5层的空洞卷积，多层空洞卷积可以在不影响分辨率的情况下增加感受野，通过空洞卷积可以得到质量更高的人群密度图。空洞卷积的定义如下：In this embodiment, the density map regression module uses atrous convolution of column A and layer B, where both A and B are positive integers. It has been verified by experiments that the density map regression module of this embodiment is preferably 3 columns and 5 layers of atrous convolution. Multi-layer atrous convolution can increase the receptive field without affecting the resolution, and higher quality can be obtained through atrous convolution. population density map. The dilated convolution is defined as follows:

其中，w(i,j)表示过滤器；y(m,n)表示x(m,n)作为输入加上一个过滤器w(i,j)分别以M和N为长和宽进行空洞卷积时的输出；r表示扩张率。Among them, w(i,j) represents the filter; y(m,n) represents x(m,n) as input plus a filter w(i,j) for hollow volume with M and N as the length and width respectively The output of the product; r represents the expansion rate.

如果扩张率r＝1，一个空洞的卷积就是普通的卷积。空洞卷积是池化层的一个很好的替代，在分割任务中能够使准确率有显著的提高。虽然池化层(如最大池化层)被广泛用于保持不变性和控制过拟合，但它们也极大地降低了空间分辨率，意味着特征图的空间信息丢失。反卷层可以减少信息的丢失，但是额外的复杂性和执行延迟可能并不适用于所有情况。空洞卷积是更好的选择，它使用稀疏内核来交替池化和卷积层。在保持特征图分辨率方面，与卷积+池化+反卷积的方案相比，空洞卷积具有明显的优势。另经过实证研究，当扩张率为2时取得的效果最好，因此本实施例中扩张率r优选为2。If the dilation rate r=1, a dilated convolution is an ordinary convolution. Dilated convolutions are a good alternative to pooling layers and can lead to significant improvements in accuracy in segmentation tasks. Although pooling layers (such as max pooling layers) are widely used to maintain invariance and control overfitting, they also greatly reduce the spatial resolution, meaning that the spatial information of the feature maps is lost. Unwrapping layers can reduce the loss of information, but the additional complexity and execution latency may not be appropriate in all cases. Atrous convolution is a better choice, which uses sparse kernels to alternate pooling and convolutional layers. In terms of maintaining the resolution of the feature map, compared with the scheme of convolution + pooling + deconvolution, dilated convolution has obvious advantages. In addition, through empirical research, the best effect is obtained when the expansion rate is 2, so the expansion rate r is preferably 2 in this embodiment.

表1是多尺度特征聚合卷积神经网络模型的架构表格，如表1所示：Table 1 is the architecture table of the multi-scale feature aggregation convolutional neural network model, as shown in Table 1:

表1Table 1

其中，由于密度图的值始终为正，需要在最后一层1*1的卷积层后添加ReLU激活函数加强密度图的回归。Among them, since the value of the density map is always positive, it is necessary to add a ReLU activation function after the last 1*1 convolutional layer to strengthen the regression of the density map.

表1中列出了详细的参数设置，其中所有卷积层都使用填充来保持原来的大小。表格中卷积层的参数表示为“conv(kernel size)-(filter number)-(dilation rate)”其中，kernel size表示卷积核大小，filter number表示通道数，dilation rate表示扩张率；stacked-pooling为堆叠池，堆叠池的参数表示为内核集。The detailed parameter settings are listed in Table 1, where all convolutional layers use padding to keep the original size. The parameters of the convolutional layer in the table are expressed as "conv(kernel size)-(filter number)-(dilation rate)", where kernel size represents the size of the convolution kernel, filter number represents the number of channels, and dilation rate represents the expansion rate; stacked- Pooling is a stacking pool, and the parameters of the stacking pool are expressed as a kernel set.

堆叠池为池层的堆栈，除一个池内核外，它的池操作是在向下采样的特征映射上计算的，其中中间的特征映射连续计算公式为：Stacked pooling is a stack of pooling layers. Except for a pooling kernel, its pooling operation is calculated on the downsampled feature map, where the continuous calculation formula of the feature map in the middle is:

其中，箭头s是代表向下采样率s；ρ表示普通最大池化层；k′_i表示核大小；s′_i表示步长；Y′₀＝X是输入特征映射；核大小k′_i对应于k_i的某个变换；步长s′_i＝1＝s，s′_i＞1＝1。Among them, the arrow s represents the downsampling rate s; ρ represents the ordinary maximum pooling layer; k' _i represents the kernel size; s' _i represents the step size; Y' ₀ = X is the input feature map; the kernel size k' _i corresponds to A certain transformation on ki; step size s' _i ₌₁ =s, s'_i>1 =1.

堆叠池的输出连接中间特征映射的计算公式为：The output connection intermediate feature map of the stacked pool is calculated as:

图3是堆叠池的结构示意图，如图3所示，堆叠池中的内核集K＝{2，2，3}，步长S＝{2，1，1}，通过实验显示，图3中的堆叠池配置在本实施例模型中起到很好的效果。其中，M、N分别表示长和宽；Map pool表示最大池化；2ⅹ2、3ⅹ3分别为堆叠池中的内核大小，stride表示步长；channel average表示信道平均值。Figure 3 is a schematic diagram of the structure of the stacked pool, as shown in Figure 3, the kernel set K in the stacked pool = {2, 2, 3}, the step size S = {2, 1, 1}, as shown by experiments, in Figure 3 The stacked pool configuration works well in this example model. Among them, M and N represent the length and width respectively; Map pool represents the maximum pooling; 2ⅹ2 and 3ⅹ3 represent the kernel size in the stacking pool respectively, stride represents the step size; channel average represents the average value of the channel.

本实施例的人群的计数模型构建方法，首先对预存储的数据集中的人群图像进行人头标注处理，得到人群密度图，并将人群图像和与人群图像对应的人群密度图进行组合，得到目标数据集；再对目标数据集中按照第一预设比例划分得到的训练集进行数据扩增处理，得到目标训练集；最后利用目标训练集中的训练人群图像和训练人群密度图，对经参数初始化处理后的多尺度特征聚合卷积神经网络模型进行训练，得到人群的计数模型。其中，本实施例是对人群图像中的人头进行标注的，对身体的遮挡不影响结果，所以不需要再训练场景和测试场景上使用透视图，提高了模型的适用性。多尺度特征聚合卷积神经网络模型为具有多尺度特征聚合功能的单列网络架构，在模型进行训练和应用时，既能解决图像中人物的尺度变化问题，也能降低运算量，提高识别效率。In the method for constructing a crowd counting model in this embodiment, first, the crowd images in the pre-stored data set are marked with heads to obtain a crowd density map, and the crowd image and the crowd density map corresponding to the crowd image are combined to obtain the target data set; and then perform data amplification processing on the training set obtained by dividing the target data set according to the first preset ratio to obtain the target training set; finally, use the training crowd images and training crowd density maps in the target training set to process the parameter initialization The multi-scale feature aggregation convolutional neural network model is trained to obtain the counting model of the crowd. Among them, in this embodiment, the head in the crowd image is marked, and the occlusion of the body does not affect the result, so there is no need to use perspective images on the training scene and test scene, which improves the applicability of the model. The multi-scale feature aggregation convolutional neural network model is a single-column network architecture with multi-scale feature aggregation function. When the model is trained and applied, it can not only solve the problem of scale changes of people in the image, but also reduce the amount of calculation and improve the recognition efficiency.

图4是本发明的人群的计数模型构建方法实施例二的流程图，如图4所示，本实施例的人群的计数模型构建方法是在图1所述实施例的基础上，进一步更加详细地对本发明的技术方案进行描述。Fig. 4 is a flow chart of the second embodiment of the crowd counting model construction method of the present invention. As shown in Fig. 4, the crowd counting model construction method of this embodiment is based on the embodiment described in Fig. 1 and further detailed The technical solution of the present invention is described in detail.

如图4所示，本实施例的人群的计数模型构建方法具体可以包括如下步骤：As shown in Figure 4, the crowd counting model construction method of this embodiment may specifically include the following steps:

S201、对预存储的数据集中的人群图像进行人头标注处理，得到人群密度图；S201. Perform head labeling processing on the crowd images in the pre-stored data set to obtain a crowd density map;

该步骤的执行过程与图1所示的S101的执行过程相同，此处不再赘述。The execution process of this step is the same as the execution process of S101 shown in FIG. 1 , and will not be repeated here.

S202、将人群图像和与人群图像对应的人群密度图进行组合，得到目标数据集；S202. Combine the crowd image and the crowd density map corresponding to the crowd image to obtain a target data set;

该步骤的执行过程与图1所示的S102的执行过程相同，此处不再赘述。The execution process of this step is the same as the execution process of S102 shown in FIG. 1 , and will not be repeated here.

S203、通过预设裁剪方式，分别对目标数据集中按照第一预设比例划分得到的训练集中的每个原始训练人群图像及原始训练人群图像对应的原始训练人群密度图进行裁剪，得到至少两个裁剪图像和至少两个裁剪密度图；S203. Using a preset clipping method, respectively clip each original training crowd image and the original training crowd density map corresponding to the original training crowd image in the training set obtained by dividing the target data set according to the first preset ratio, to obtain at least two a cropped image and at least two cropped density maps;

通过上述步骤，得到目标数据集之后，按照第一预设比例对目标数据集进行划分，具体的划分方式在上述实施例中已经说明，此处不再阐述。划分后得到训练集，训练集中包括原始训练人群图像和原始人群图像对应的原始训练人群密度图，通过预设裁剪方式，分别对每个原始训练人群图像和原始训练人群密度图进行裁剪，得到至少两个裁剪图像和至少两个裁剪密度图，裁剪图像与裁剪密度图是相对应的，如何对原始训练人群图像进行裁剪，就如何对原始训练人群图像对应的原始训练人群密度图进行裁剪。本实施例中，预设裁剪方式优选为九宫格裁剪方式，即将原始训练人群图像和原始训练人群密度图裁剪成九宫格的样子，每个原始训练人群图像均裁剪出9个裁剪图像，每个原始训练人群密度图也均裁剪出9个裁剪密度图。Through the above steps, after the target data set is obtained, the target data set is divided according to the first preset ratio. The specific division method has been described in the above embodiments and will not be elaborated here again. After division, the training set is obtained. The training set includes the original training crowd image and the original training crowd density map corresponding to the original crowd image. Through the preset cutting method, each original training crowd image and the original training crowd density map are respectively cut to obtain at least Two cropped images and at least two cropped density maps, the cropped images correspond to the cropped density maps, and how to crop the original training crowd image is how to crop the original training crowd density map corresponding to the original training crowd image. In this embodiment, the preset cropping method is preferably a nine-square grid cutting method, that is, the original training crowd image and the original training crowd density map are cut into nine square grids, and each original training crowd image is cut out to 9 cropped images, and each original training crowd image The crowd density maps are also cropped to produce 9 cropped density maps.

S204、将裁剪图像进行翻转处理，得到翻转图像，以及，将裁剪密度图进行翻转处理，得到翻转密度图；S204, flipping the cropped image to obtain a flipped image, and flipping the cropped density map to obtain a flipped density map;

通过上述步骤，得到裁剪图像和裁剪图像对应的裁剪密度图之后，将裁剪图像进行翻转处理，得到翻转图像；将裁剪密度图进行翻转处理，得到翻转密度图。Through the above steps, after the cropped image and the cropped density map corresponding to the cropped image are obtained, the cropped image is flipped to obtain a flipped image; the cropped density map is flipped to obtain a flipped density map.

S205、按照原始训练人群图像的第二预设比例，将裁剪图像和翻转图像扩大，得到扩大图像，以及，将裁剪密度图和翻转密度图扩大，得到扩大密度图；S205. Expand the cropped image and flipped image according to the second preset ratio of the original training crowd image to obtain an enlarged image, and expand the cropped density map and flipped density map to obtain an enlarged density map;

通过上述步骤，得到裁剪图像、裁剪密度图、翻转图像和翻转密度图之后，按照原始训练人群图像的第二预设比例，将裁剪图像和翻转图像扩大，得到扩大图像；按照原始训练人群图像的第二预设比例，将裁剪密度图和翻转密度图扩大，得到扩大密度图。本实施例中，原始训练人群图像的第二预设比例优选为原始训练人群图像的90％，即将裁剪图像、裁剪密度图、翻转图像和翻转密度图均扩大为原始训练人群图像的90％。Through the above steps, after the cropped image, cropped density map, flipped image and flipped density map are obtained, the cropped image and flipped image are enlarged according to the second preset ratio of the original training crowd image to obtain the enlarged image; according to the ratio of the original training crowd image The second preset ratio expands the cropped density map and flipped density map to obtain the enlarged density map. In this embodiment, the second preset ratio of the original training crowd image is preferably 90% of the original training crowd image, that is, the cropped image, cropped density map, flipped image, and flipped density map are all enlarged to 90% of the original training crowd image.

S206、将扩大图像和扩大密度图存储到训练集中，得到目标训练集；S206. Store the enlarged image and the enlarged density map in the training set to obtain the target training set;

通过上述步骤，得到扩大图像和扩大图像对应的扩大密度图之后，将扩大图像和扩大密度图存储到训练集中，得到新的训练集，即目标训练集。其中，目标训练集中的训练人群图像包括原始训练人群图像和扩大图像；目标训练集中的训练人群密度图包括原始训练人群密度图和扩大密度图。Through the above steps, after the enlarged image and the enlarged density map corresponding to the enlarged image are obtained, the enlarged image and the enlarged density map are stored in the training set to obtain a new training set, that is, the target training set. Wherein, the training crowd image in the target training set includes the original training crowd image and the enlarged image; the training crowd density map in the target training set includes the original training crowd density map and the enlarged density map.

S207、将目标训练集中的训练人群图像和训练人群密度图输入到经参数初始化处理后的多尺度特征聚合卷积神经网络模型中进行第一级别的训练，得到第一训练模型；S207. Input the images of the training crowd and the density map of the training crowd in the target training set into the multi-scale feature aggregation convolutional neural network model after parameter initialization processing to perform first-level training to obtain the first training model;

通过上述步骤，得到目标训练集之后，将目标训练集中的训练人群图像和训练人群密度图输入到经参数初始化处理后的多尺度特征聚合卷积神经网络模型中，对该模型进行第一级别的训练，从而得到第一训练模型。其中，每次训练均是从训练集中选择一张训练人群图像和该训练人群图像对应的一张训练人群密度图输入到模型中。Through the above steps, after the target training set is obtained, the training crowd images and training crowd density maps in the target training set are input into the multi-scale feature aggregation convolutional neural network model after parameter initialization processing, and the first level of training is performed on the model. training to obtain the first training model. Wherein, for each training, a training crowd image and a training crowd density map corresponding to the training crowd image are selected from the training set and input into the model.

S208、利用欧氏距离损失公式对第一训练模型的模型参数进行第一级别的优化，得到第一优化模型；S208. Using the Euclidean distance loss formula to optimize the model parameters of the first training model at the first level to obtain the first optimization model;

通过上述步骤，得到第一训练模型后，利用欧氏距离损失公式对第一训练模型的模型参数进行第一级别的优化，从而得到第一优化模型。Through the above steps, after the first training model is obtained, the model parameters of the first training model are optimized at the first level by using the Euclidean distance loss formula, so as to obtain the first optimization model.

欧氏距离损失公式为：The Euclidean distance loss formula is:

其中，Θ表示参数模型；F(X_i；Θ)表示输出模型；X_i表示第i个输入的训练人群图像；F_i表示第i个输入的训练人群图像对应的训练人群密度图；N表示输入数据的个数。Wherein, Θ represents a parameter model; F(X _i ; Θ) represents an output model; Xi represents the _i -th input training crowd image; F _i represents the training crowd density map corresponding to the i-th input training crowd image; N represents The number of input data.

S209、按照预设的第一迭代次数，对第一级别的训练和第一级别的优化的过程进行迭代执行，得到第一目标模型；S209. Perform iterative execution of the first-level training and first-level optimization processes according to the preset number of first iterations to obtain the first target model;

通过上述步骤，进行了第一级别的训练和第一级别的优化之后，按照预设的第一迭代次数，对第一级别的训练和第一级别的优化的过程进行迭代执行，从而得到第一目标模型。本实施例中预设的第一迭代次数优选为20万次。Through the above steps, after the first-level training and the first-level optimization are performed, the process of the first-level training and the first-level optimization is iteratively executed according to the preset number of iterations to obtain the first target model. The preset first number of iterations in this embodiment is preferably 200,000 times.

迭代是指重复反馈过程的活动，其目的通常是为了逼近所需目标或结果。每一次对过程的重复称为一次“迭代”，而每一次迭代得到的结果会作为下一次迭代的初始值。对计算机特定程序中需要反复执行的子程序，进行一次重复，即重复执行程序中的循环，直到满足某条件为止，亦称为迭代。本实施例中，是不断进行第一级别的训练和第一级别的优化，直到迭代执行的次数满足预设的第一迭代次数为止。Iteration is the activity of repeating a feedback process, usually with the aim of approaching a desired goal or result. Each repetition of the process is called an "iteration", and the result of each iteration is used as the initial value for the next iteration. For a subroutine that needs to be executed repeatedly in a specific computer program, perform a repetition, that is, repeat the loop in the program until a certain condition is met, which is also called iteration. In this embodiment, the first-level training and the first-level optimization are continuously performed until the number of iterative executions satisfies the preset first number of iterations.

S210、将训练人群图像和训练人群密度图输入到第一目标模型中进行第二级别的训练，得到第二训练模型；S210. Input the training crowd image and the training crowd density map into the first target model to perform second-level training to obtain a second training model;

通过上述步骤，得到第一目标模型后，将训练人群图像和训练人群密度图输入到第一目标模型中，对该模型进行第二级别的训练，从而得到第二训练模型。其中，每次训练均是从训练集中选择一张训练人群图像和该训练人群图像对应的一张训练人群密度图输入到模型中。Through the above steps, after the first target model is obtained, the training crowd image and the training crowd density map are input into the first target model, and the second level of training is performed on the model, thereby obtaining the second training model. Wherein, for each training, a training crowd image and a training crowd density map corresponding to the training crowd image are selected from the training set and input into the model.

S211、利用欧氏距离损失公式和相对人数损失公式对第二训练模型的模型参数进行第二级别的优化，得到第二优化模型；S211. Using the Euclidean distance loss formula and the relative number of people loss formula to perform second-level optimization on the model parameters of the second training model to obtain a second optimization model;

通过上述步骤，得到第二训练模型后，利用欧氏距离损失公式和相对人数损失公式共同对第二训练模型的模型参数进行第二级别的优化，得到第二优化模型。利用相对人数损失公式对模型参数进行优化，能够集中学习预测误差较大的样本，在绝对人数非常稀疏的场景下，通过在网络中使用相对人数损失，能够使准确率显著提升。本实施例中不仅采用欧氏距离损失公式对模型参数进行优化，还采用相对人数损失公式对模型参数进行优化，能够提高在绝对人数稀疏场景下的预测精度。Through the above steps, after the second training model is obtained, the model parameters of the second training model are optimized at the second level by using the Euclidean distance loss formula and the relative number loss formula to obtain the second optimization model. Using the relative number loss formula to optimize the model parameters can focus on learning samples with large prediction errors. In the scenario where the absolute number of people is very sparse, the accuracy rate can be significantly improved by using the relative number loss in the network. In this embodiment, not only the Euclidean distance loss formula is used to optimize the model parameters, but also the relative number of people loss formula is used to optimize the model parameters, which can improve the prediction accuracy in the scene of absolute sparse number of people.

相对人数损失公式为：The relative headcount loss formula is:

其中，F_D(X_i；Θ)表示预测的人数；D_i表示真值人数；N表示输入数据的个数；分母为D_i+1是为了防止分母为零；相对人数损失的训练权重为L＝1.0*L(Θ)+0.1*L_D(Θ)。Among them, F _D (X _i ; Θ) represents the number of people predicted; D _i represents the number of people of true value; N represents the number of input data; the denominator is D _i +1 in order to prevent the denominator from being zero; the training weight of the relative number of people loss is L=1.0*L(Θ)+0.1*L _D (Θ).

S212、按照预设的第二迭代次数，对第二级别的训练和第二级别的优化的过程进行迭代执行，得到第二目标模型；S212. Iteratively execute the second-level training and second-level optimization processes according to the second preset number of iterations to obtain a second target model;

通过上述步骤，进行了第二级别的训练和第二级别的优化之后，按照预设的第二迭代次数，对第二级别的训练和第二级别的优化的过程进行迭代执行，从而得到第二目标模型。本实施例中预设的第二迭代次数优选为10万次。Through the above steps, after the second-level training and the second-level optimization are performed, the process of the second-level training and the second-level optimization is iteratively executed according to the second preset number of iterations, so as to obtain the second target model. The second preset number of iterations in this embodiment is preferably 100,000 times.

本实施例中，除了采用先训练和采用欧式距离损失公式优化迭代，再训练和采用欧式距离损失公式与相对人数损失公式优化迭代，还可以直接采用训练和采用欧式距离损失公式与相对人数损失公式优化迭代，不过迭代次数优选为30万次以上，这样才能达到模型的收敛。In this embodiment, in addition to training first and using the Euclidean distance loss formula to optimize iterations, then training and using the Euclidean distance loss formula and the relative number loss formula to optimize iterations, it is also possible to directly use the training and use the Euclidean distance loss formula and the relative number loss formula Optimization iterations, but the number of iterations is preferably more than 300,000 times, so as to achieve the convergence of the model.

S213、将目标数据集中按照第一预设比例划分得到的测试集中的测试人群图像输入到第二目标模型，得到目标人群密度图；S213. Input the test crowd images in the test set obtained by dividing the target data set according to the first preset ratio into the second target model to obtain a target crowd density map;

通过上述步骤，得到第二目标模型后，将目标数据集中按照第一预设比例划分得到的测试集中的测试人群图像输入到第二目标模型，经第二目标模型对测试人群图像处理后，得到目标人群密度图。Through the above steps, after the second target model is obtained, the test population images in the test set obtained by dividing the target data set according to the first preset ratio are input into the second target model, and after the second target model processes the test crowd images, the obtained Target population density map.

S214、根据目标人群密度图和测试集中测试人群图像对应的测试人群密度图，确定模型的测试准确率；S214. Determine the test accuracy of the model according to the target crowd density map and the test crowd density map corresponding to the test crowd images in the test set;

通过上述步骤，得到目标人群密度图后，将目标人群密度图和测试集中测试人群图像对应的测试人群密度图进行比较分析，确定第二目标模型的测试准确率。After the target population density map is obtained through the above steps, compare and analyze the target population density map and the test population density map corresponding to the test population images in the test set to determine the test accuracy of the second target model.

S215、判断测试准确率是否大于预设准确率；若是，执行步骤S216，若否，执行步骤S210；S215. Determine whether the test accuracy rate is greater than the preset accuracy rate; if yes, execute step S216; if not, execute step S210;

通过上述步骤，得到测试准确率之后，将测试准确率与预设准确率进行比较，判断测试准确率是否大于预设准确率，如果测试准确率大于预设准确率，则执行步骤S216；如果测试准确率小于或者等于预设准确率，则表示第二目标模型的准确率不达标，则将该第二目标模型作为第一目标模型，来执行步骤S210，重新获取数据并重新对模型进行训练。Through the above steps, after obtaining the test accuracy rate, compare the test accuracy rate with the preset accuracy rate, and judge whether the test accuracy rate is greater than the preset accuracy rate, if the test accuracy rate is greater than the preset accuracy rate, then perform step S216; If the accuracy rate is less than or equal to the preset accuracy rate, it means that the accuracy rate of the second target model is not up to the standard, and the second target model is used as the first target model to perform step S210 to reacquire data and retrain the model.

S216、将第二目标模型作为人群的计数模型。S216. Use the second target model as a population counting model.

通过上述步骤，如果判断出测试准确率大于预设准确率，则说明第二目标模型的准确率满足要求，将第二目标模型作为人群的计数模型。Through the above steps, if it is judged that the test accuracy rate is greater than the preset accuracy rate, it means that the accuracy rate of the second target model meets the requirements, and the second target model is used as the population counting model.

本实施例的人群的计数模型构建方法，首先对预存储的数据集中的人群图像进行人头标注处理，得到人群密度图，这样对人群图像中的人头进行标注，对身体的遮挡不影响结果，所以不需要再训练场景和测试场景上使用透视图，提高了模型的适用性。对目标数据集划分的训练集中的原始训练人群图像和原始训练人群密度图进行裁剪、翻转和扩大，得到扩大图像和扩大密度图，并将扩大图像和扩大密度图存储到训练集中，得到目标训练集，从而实现训练集的数据扩增，保证模型的训练数据充足。对多尺度特征聚合卷积神经网络模型迭代执行第一级别的训练和第一级别的优化，得到第一目标模型，再对第一目标模型迭代执行第二级别的训练和第二级别的优化，得到第二目标模型，优化过程中实现欧氏距离损失公式和相对人数损失公式同时优化模型，从而提高人数的预测精度。将目标数据集划分的测试集中的测试人群图像输入到第二目标模型中，得到目标人群密度图，根据目标人群密度图和测试集中的测试人群密度图，确定模型的测试准确率，如果判断出测试准确率大于预设准确率，则将第二目标模型作为人群的计数模型，这样能够检验模型的准确率，从而保证计数结果的准确度。本实施例采用的多尺度特征聚合卷积神经网络模型为具有多尺度特征聚合功能的单列网络架构，在模型进行训练和应用时，既能解决图像中人物的尺度变化问题，也能降低运算量，提高识别效率。In the method for constructing a crowd counting model in this embodiment, first, the crowd images in the pre-stored data set are marked with heads to obtain a crowd density map. In this way, marking the heads in the crowd images does not affect the result due to body occlusion, so There is no need to use perspective images on the training scene and test scene, which improves the applicability of the model. Crop, flip and expand the original training population image and the original training population density map in the training set divided by the target data set to obtain the enlarged image and the enlarged density map, and store the enlarged image and the enlarged density map in the training set to obtain the target training population Set, so as to realize the data amplification of the training set and ensure that the training data of the model is sufficient. Iteratively perform first-level training and first-level optimization on the multi-scale feature aggregation convolutional neural network model to obtain the first target model, and then iteratively perform second-level training and second-level optimization on the first target model, The second objective model is obtained. During the optimization process, the Euclidean distance loss formula and the relative number loss formula are implemented to optimize the model at the same time, thereby improving the prediction accuracy of the number of people. Input the test population images in the test set divided by the target data set into the second target model to obtain the target population density map, and determine the test accuracy of the model according to the target population density map and the test population density map in the test set. If the test accuracy rate is greater than the preset accuracy rate, the second target model is used as the counting model of the crowd, so that the accuracy of the model can be tested, thereby ensuring the accuracy of the counting results. The multi-scale feature aggregation convolutional neural network model used in this embodiment is a single-column network architecture with multi-scale feature aggregation function. When the model is trained and applied, it can not only solve the problem of the scale change of the characters in the image, but also reduce the amount of calculation. , to improve the recognition efficiency.

图5是本发明的人群的计数方法实施例的流程图，如图5所示，本实施例的人群的计数方法具体可以包括如下步骤：Fig. 5 is a flow chart of an embodiment of the crowd counting method of the present invention. As shown in Fig. 5, the crowd counting method of this embodiment may specifically include the following steps:

S301、获取待计数人群图像；S301. Obtain images of people to be counted;

本实施例，首先需要获取待计数人群图像。其中，待计数人群图像为需要计算人数的图像。In this embodiment, it is first necessary to obtain images of people to be counted. Wherein, the image of the crowd to be counted is an image that needs to be counted.

S302、基于预先构建的人群的计数模型，输入待计数人群图像，得到待计数人群密度图；S302. Based on the pre-built counting model of the crowd, input the image of the crowd to be counted, and obtain a density map of the crowd to be counted;

通过上述步骤，获取到待计数人群图像后，将该待计数人群图像输入到通过上述实施例中人群的计数模型构建方法构建的人群的计数模型中，通过该人群的计数模型的处理，得到待计数人群图像对应的待计数人群密度图。其中，该待计数人群密度图是由许多点组成的，每个点代表一个人，点的位置为该人在待计数人群图像中的头部位置。Through the above steps, after the image of the crowd to be counted is obtained, the image of the crowd to be counted is input into the counting model of the crowd constructed by the counting model construction method of the crowd in the above embodiment, and through the processing of the counting model of the crowd, the counting model of the crowd to be counted is obtained. The density map of the crowd to be counted corresponding to the counted crowd image. Wherein, the population density map to be counted is composed of many points, each point represents a person, and the position of the point is the head position of the person in the image of the population to be counted.

S303、利用累加计算法，累加待计数人群密度图中各像素点中的数值，得到像素点总数值，将像素点总数值作为待计数人群图像中的人数。S303. Use an accumulation calculation method to accumulate the values of each pixel in the population density map to be counted to obtain the total value of the pixels, and use the total value of the pixels as the number of people in the image of the population to be counted.

通过上述步骤，得到待计数人群密度图后，利用累加计算法，累加待计数人群密度图中各像素点中的数值，得到像素点总数值，将像素点总数值作为待计数人群图像中的人数。Through the above steps, after obtaining the population density map to be counted, use the cumulative calculation method to accumulate the values in each pixel in the population density map to be counted to obtain the total value of the pixels, and use the total value of the pixels as the number of people in the image of the population to be counted .

本实施例的人群的计数方法，首先获取待计数人群图像，并将该待计数人群图像输入到通过人群的计数模型构建方法预先构建的人群的计数模型中，得到待计数人群密度图，最后，利用累加计算法，累加待计数人群密度图中各像素点中的数值，得到像素点总数值，将像素点总数值作为待计数人群图像中的人数。本实施例是对人群图像中的人头进行标注的，对身体的遮挡不影响结果，所以构建人群的计数模型时不需要在训练场景和测试场景上使用透视图，提高了模型的适用性，并且人群的计数模型中采用的多尺度特征聚合卷积神经网络模型为具有多尺度特征聚合功能的单列网络架构，在模型进行训练和应用时，既能解决图像中人物的尺度变化问题，也能降低运算量，提高识别效率。The counting method of the crowd in this embodiment first obtains the image of the crowd to be counted, and inputs the image of the crowd to be counted into the counting model of the crowd pre-built by the counting model construction method of the crowd to obtain the density map of the crowd to be counted, and finally, Using the cumulative calculation method, the values in each pixel in the population density map to be counted are accumulated to obtain the total value of the pixel points, and the total value of the pixel points is used as the number of people in the image of the population to be counted. In this embodiment, the head in the crowd image is marked, and the occlusion of the body does not affect the result, so when constructing the counting model of the crowd, it is not necessary to use perspective images on the training scene and the test scene, which improves the applicability of the model, and The multi-scale feature aggregation convolutional neural network model used in the crowd counting model is a single-column network architecture with multi-scale feature aggregation function. When the model is trained and applied, it can not only solve the problem of scale changes of people in the image, but also reduce reduce the amount of computation and improve the recognition efficiency.

为了更全面，对应于本发明实施例提供的人群的计数模型构建方法，本申请还提供了人群的计数模型构建装置。To be more comprehensive, corresponding to the method for constructing a crowd counting model provided in the embodiment of the present invention, the present application also provides a crowd counting model constructing device.

图6是本发明的人群的计数模型构建装置实施例一的结构示意图，如图6所示，本实施例的人群的计数模型构建装置包括第一处理模块11、数据组合模块12、数据扩增模块13和第二处理模块14。Figure 6 is a schematic structural view of Embodiment 1 of the crowd counting model construction device of the present invention, as shown in Figure 6, the crowd counting model construction device of this embodiment includes a first processing module 11, a data combination module 12, a data amplification module 13 and a second processing module 14.

第一处理模块11，用于对预存储的数据集中的人群图像进行人头标注处理，得到人群密度图；The first processing module 11 is used to perform head labeling processing on the crowd images in the pre-stored data set to obtain a crowd density map;

数据组合模块12，用于将人群图像和与人群图像对应的人群密度图进行组合，得到目标数据集；The data combination module 12 is used to combine the crowd image and the crowd density map corresponding to the crowd image to obtain the target data set;

数据扩增模块13，用于对目标数据集中按照第一预设比例划分得到的训练集进行数据扩增处理，得到目标训练集；The data amplification module 13 is used to perform data amplification processing on the training set obtained by dividing the target data set according to the first preset ratio to obtain the target training set;

第二处理模块14，用于利用目标训练集中的训练人群图像和训练人群密度图，对经参数初始化处理后的多尺度特征聚合卷积神经网络模型进行训练，得到人群的计数模型。The second processing module 14 is configured to use the training crowd image and training crowd density map in the target training set to train the multi-scale feature aggregation convolutional neural network model after parameter initialization to obtain a crowd counting model.

本实施例的人群的计数模型构建装置，首先通过第一处理模块11对预存储的数据集中的人群图像进行人头标注处理，得到人群密度图，并通过数据组合模块12将人群图像和与人群图像对应的人群密度图进行组合，得到目标数据集；再通过数据扩增模块13对目标数据集中按照第一预设比例划分得到的训练集进行数据扩增处理，得到目标训练集；最后通过第二处理模块14利用目标训练集中的训练人群图像和训练人群密度图，对经参数初始化处理后的多尺度特征聚合卷积神经网络模型进行训练，得到人群的计数模型。其中，本实施例是对人群图像中的人头进行标注的，对身体的遮挡不影响结果，所以不需要再训练场景和测试场景上使用透视图，提高了模型的适用性。多尺度特征聚合卷积神经网络模型为具有多尺度特征聚合功能的单列网络架构，在模型进行训练和应用时，既能解决图像中人物的尺度变化问题，也能降低运算量，提高识别效率。In the device for constructing a crowd counting model in this embodiment, first, the first processing module 11 performs head labeling processing on the crowd images in the pre-stored data set to obtain a crowd density map, and the crowd images are combined with the crowd images by the data combination module 12. The corresponding crowd density maps are combined to obtain the target data set; then the training set obtained by dividing the target data set according to the first preset ratio is subjected to data amplification processing through the data amplification module 13 to obtain the target training set; finally through the second The processing module 14 uses the images of the training crowd and the density map of the training crowd in the target training set to train the multi-scale feature aggregation convolutional neural network model after parameter initialization to obtain a counting model of the crowd. Among them, in this embodiment, the head in the crowd image is marked, and the occlusion of the body does not affect the result, so there is no need to use perspective images on the training scene and test scene, which improves the applicability of the model. The multi-scale feature aggregation convolutional neural network model is a single-column network architecture with multi-scale feature aggregation function. When the model is trained and applied, it can not only solve the problem of scale changes of people in the image, but also reduce the amount of calculation and improve the recognition efficiency.

图7是本发明的人群的计数模型构建装置实施例二的结构示意图，如图7所示，本实施例的人群的计数模型构建装置在图6所述实施例的基础上，数据扩增模块13包括剪裁单元131、翻转单元132、扩大单元133和存储单元134。Fig. 7 is a schematic structural diagram of the second embodiment of the crowd counting model construction device of the present invention. As shown in Fig. 7, the crowd counting model construction device of this embodiment is based on the embodiment described in Fig. 6 and has a data amplification module 13 includes a cropping unit 131 , a flipping unit 132 , an expanding unit 133 and a storage unit 134 .

裁剪单元131，用于通过预设裁剪方式，分别对训练集中的每个原始训练人群图像及原始训练人群图像对应的原始训练人群密度图进行裁剪，得到至少两个裁剪图像和至少两个裁剪密度图；The clipping unit 131 is configured to clip each original training crowd image in the training set and the original training crowd density map corresponding to the original training crowd image through a preset clipping method to obtain at least two cropped images and at least two cropped densities picture;

翻转单元132，用于将裁剪图像进行翻转处理，得到翻转图像，以及，将裁剪密度图进行翻转处理，得到翻转密度图；The flipping unit 132 is configured to flip the cropped image to obtain a flipped image, and flip the cropped density map to obtain a flipped density map;

扩大单元133，用于按照原始训练人群图像的第二预设比例，将裁剪图像和翻转图像扩大，得到扩大图像，以及，将裁剪密度图和翻转密度图扩大，得到扩大密度图；The expansion unit 133 is configured to expand the cropped image and flipped image according to the second preset ratio of the original training crowd image to obtain an enlarged image, and expand the cropped density map and flipped density map to obtain an enlarged density map;

存储单元134，用于将扩大图像和扩大密度图存储到训练集中，得到目标训练集。The storage unit 134 is configured to store the enlarged image and the enlarged density map in the training set to obtain a target training set.

进一步地，本实施例的人群的计数模型构建装置中，第二处理模块14包括第一训练单元141、第一优化单元142、迭代处理单元143、第二训练单元144、第二优化单元145和模型确定单元146；本实施例的人群的计数模型构建装置还包括模型测试模块15、确定模块16和判断模块17。Further, in the device for constructing a crowd counting model in this embodiment, the second processing module 14 includes a first training unit 141, a first optimization unit 142, an iterative processing unit 143, a second training unit 144, a second optimization unit 145 and The model determination unit 146 ; the device for constructing a population counting model in this embodiment also includes a model testing module 15 , a determination module 16 and a judgment module 17 .

第一训练单元141，用于将目标训练集中的训练人群图像和训练人群密度图输入到经参数初始化处理后的多尺度特征聚合卷积神经网络模型中进行第一级别的训练，得到第一训练模型；The first training unit 141 is used to input the images of the training crowd and the density map of the training crowd in the target training set into the multi-scale feature aggregation convolutional neural network model after parameter initialization for the first level of training to obtain the first training Model;

其中，训练人群图像包括原始训练人群图像和扩大图像；训练人群密度图包括原始训练人群密度图和扩大密度图。Wherein, the training crowd image includes an original training crowd image and an enlarged image; the training crowd density map includes an original training crowd density map and an enlarged density map.

第一优化单元142，用于利用欧氏距离损失公式对第一训练模型的模型参数进行第一级别的优化，得到第一优化模型；The first optimization unit 142 is configured to use the Euclidean distance loss formula to perform first-level optimization on the model parameters of the first training model to obtain the first optimization model;

迭代处理单元143，用于按照预设的第一迭代次数，对第一级别的训练和第一级别的优化的过程进行迭代执行，得到第一目标模型；The iterative processing unit 143 is configured to iteratively execute the first-level training and first-level optimization processes according to the preset first number of iterations to obtain the first target model;

第二训练单元144，用于将训练人群图像和训练人群密度图输入到第一目标模型中进行第二级别的训练，得到第二训练模型；The second training unit 144 is configured to input the training crowd image and the training crowd density map into the first target model for second-level training to obtain a second training model;

第二优化单元145，用于利用欧氏距离损失公式和相对人数损失公式对第二训练模型的模型参数进行第二级别的优化，得到第二优化模型；The second optimization unit 145 is configured to use the Euclidean distance loss formula and the relative number of people loss formula to perform second-level optimization on the model parameters of the second training model to obtain a second optimization model;

迭代处理单元143，还用于按照预设的第二迭代次数，对第二级别的训练和第二级别的优化的过程进行迭代执行，得到第二目标模型；The iterative processing unit 143 is further configured to iteratively execute the second-level training and second-level optimization processes according to a preset second number of iterations to obtain a second target model;

模型测试模块15，用于将目标数据集中按照第一预设比例划分得到的测试集中的测试人群图像输入到第二目标模型，得到目标人群密度图；The model testing module 15 is used to input the test crowd images in the test set obtained by dividing the target data set according to the first preset ratio into the second target model to obtain the target crowd density map;

确定模块16，用于根据目标人群密度图和测试集中测试人群图像对应的测试人群密度图，确定模型的测试准确率；Determining module 16, for determining the test accuracy of the model according to the target crowd density map and the test crowd density map corresponding to the test crowd image in the test set;

判断模块17，用于判断测试准确率是否大于预设准确率；Judging module 17, for judging whether the test accuracy rate is greater than the preset accuracy rate;

模型确定单元146，用于若测试准确率大于预设准确率，将第二目标模型作为人群的计数模型。The model determining unit 146 is configured to use the second target model as the population counting model if the test accuracy rate is greater than the preset accuracy rate.

本实施例的人群的计数模型构建装置，首先第一处理模块11对预存储的数据集中的人群图像进行人头标注处理，得到人群密度图，这样对人群图像中的人头进行标注，对于身体的遮挡是无所谓的，所以不需要再训练场景和测试场景上使用透视图，提高了模型的适用性。裁剪单元131、翻转单元132和扩大单元133对目标数据集划分的训练集中的原始训练人群图像和原始训练人群密度图进行裁剪、翻转和扩大，得到扩大图像和扩大密度图，存储单元134将扩大图像和扩大密度图存储到训练集中，得到目标训练集，从而实现训练集的数据扩增，保证模型的训练数据充足。迭代处理单元143对多尺度特征聚合卷积神经网络模型迭代执行第一级别的训练和第一级别的优化，得到第一目标模型，再对第一目标模型迭代执行第二级别的训练和第二级别的优化，得到第二目标模型，优化过程中实现欧氏距离损失公式和相对人数损失公式同时优化模型，从而提高人数的预测精度。模型测试模块15将目标数据集划分的测试集中的测试人群图像输入到第二目标模型中，确定模块16根据得到的目标人群密度图和测试集中的测试人群密度图，确定模型的测试准确率，如果判断模块17判断出测试准确率大于预设准确率，则模型确定单元146将第二目标模型作为人群的计数模型，这样能够检验模型的准确率，从而保证计数结果的准确度。本实施例采用的多尺度特征聚合卷积神经网络模型为具有多尺度特征聚合功能的单列网络架构，在模型进行训练和应用时，既能解决图像中人物的尺度变化问题，也能降低运算量，提高识别效率。In the device for constructing a crowd counting model in this embodiment, first, the first processing module 11 performs head labeling processing on the crowd images in the pre-stored data set to obtain a crowd density map. It doesn't matter, so there is no need to use the perspective view on the training scene and test scene, which improves the applicability of the model. The cutting unit 131, the flipping unit 132 and the expanding unit 133 cut, flip and expand the original training crowd image and the original training crowd density map in the training set divided by the target data set to obtain the enlarged image and the enlarged density map, and the storage unit 134 will expand The image and the enlarged density map are stored in the training set to obtain the target training set, so as to realize the data amplification of the training set and ensure that the training data of the model is sufficient. The iterative processing unit 143 iteratively performs first-level training and first-level optimization on the multi-scale feature aggregation convolutional neural network model to obtain the first target model, and then iteratively performs second-level training and second-level optimization on the first target model. Level optimization, get the second target model, realize the Euclidean distance loss formula and the relative number loss formula to optimize the model at the same time during the optimization process, so as to improve the prediction accuracy of the number of people. The model testing module 15 inputs the test crowd images in the test set divided by the target data set into the second target model, and the determination module 16 determines the test accuracy of the model according to the obtained target crowd density map and the test crowd density map in the test set, If the judging module 17 judges that the test accuracy rate is greater than the preset accuracy rate, the model determination unit 146 uses the second target model as the counting model of the crowd, so that the accuracy of the model can be checked, thereby ensuring the accuracy of the counting result. The multi-scale feature aggregation convolutional neural network model used in this embodiment is a single-column network architecture with multi-scale feature aggregation function. When the model is trained and applied, it can not only solve the problem of the scale change of the characters in the image, but also reduce the amount of calculation. , to improve the recognition efficiency.

为了更全面，对应于本发明实施例提供的人群的计数方法，本申请还提供了人群的计数装置。To be more comprehensive, corresponding to the crowd counting method provided in the embodiment of the present invention, the present application also provides a crowd counting device.

图8是本发明的人群的计数装置实施例的结构示意图，如图8所示，本实施例的人群的计数装置包括图像获取模块21、密度图确定模块22和人数统计模块23。FIG. 8 is a schematic structural diagram of an embodiment of the crowd counting device of the present invention. As shown in FIG.

图像获取模块21，用于获取待计数人群图像；An image acquisition module 21, configured to acquire an image of a crowd to be counted;

密度图确定模块22，用于基于预先构建的人群的计数模型，输入待计数人群图像，得到待计数人群密度图；人群的计数模型通过上述实施例中人群的计数模型构建方法构建；The density map determination module 22 is used to input the image of the crowd to be counted based on the pre-built counting model of the crowd to obtain the density map of the crowd to be counted; the counting model of the crowd is constructed by the counting model construction method of the crowd in the above embodiment;

人数统计模块23，用于利用累加计算法，累加待计数人群密度图中各像素点中的数值，得到像素点总数值，将像素点总数值作为待计数人群图像中的人数。The people counting module 23 is used for accumulating the values of each pixel in the density map of the people to be counted by using the cumulative calculation method to obtain the total value of the pixels, and using the total value of the pixels as the number of people in the image of the people to be counted.

本实施例的人群的计数装置，图像获取模块21获取待计数人群图像，密度图确定模块22将该待计数人群图像输入到通过人群的计数模型构建方法预先构建的人群的计数模型中，得到待计数人群密度图，最后，人数统计模块23利用累加计算法，累加待计数人群密度图中各像素点中的数值，得到像素点总数值，将像素点总数值作为待计数人群图像中的人数。本实施例是对人群图像中的人头进行标注的，对身体的遮挡不影响结果，所以构建人群的计数模型时不需要在训练场景和测试场景上使用透视图，提高了模型的适用性，并且人群的计数模型中采用的多尺度特征聚合卷积神经网络模型为具有多尺度特征聚合功能的单列网络架构，在模型进行训练和应用时，既能解决图像中人物的尺度变化问题，也能降低运算量，提高识别效率。In the crowd counting device of this embodiment, the image acquisition module 21 acquires the images of the crowds to be counted, and the density map determination module 22 inputs the images of the crowds to be counted into the counting model of the crowds pre-built by the counting model construction method of the crowds, and obtains For counting the crowd density map, finally, the people counting module 23 uses the cumulative calculation method to accumulate the values in each pixel in the crowd density map to be counted to obtain the total value of the pixels, and use the total value of the pixels as the number of people in the crowd image to be counted. In this embodiment, the head in the crowd image is marked, and the occlusion of the body does not affect the result, so when constructing the counting model of the crowd, it is not necessary to use perspective images on the training scene and the test scene, which improves the applicability of the model, and The multi-scale feature aggregation convolutional neural network model used in the crowd counting model is a single-column network architecture with multi-scale feature aggregation function. When the model is trained and applied, it can not only solve the problem of scale changes of people in the image, but also reduce reduce the amount of computation and improve the recognition efficiency.

关于上述实施例中的装置，其中各个模块执行操作的具体方式已经在有关该方法的实施例中进行了详细描述，此处将不做详细阐述说明。Regarding the apparatus in the foregoing embodiments, the specific manner in which each module executes operations has been described in detail in the embodiments related to the method, and will not be described in detail here.

可以理解的是，上述各实施例中相同或相似部分可以相互参考，在一些实施例中未详细说明的内容可以参见其他实施例中相同或相似的内容。It can be understood that, the same or similar parts in the above embodiments can be referred to each other, and the content that is not described in detail in some embodiments can be referred to the same or similar content in other embodiments.

需要说明的是，在本发明的描述中，术语“第一”、“第二”等仅用于描述目的，而不能理解为指示或暗示相对重要性。此外，在本发明的描述中，除非另有说明，“多个”的含义是指至少两个。It should be noted that, in the description of the present invention, the terms "first", "second" and so on are only used for description purposes, and should not be understood as indicating or implying relative importance. In addition, in the description of the present invention, unless otherwise specified, the meaning of "plurality" means at least two.

流程图中或在此以其他方式描述的任何过程或方法描述可以被理解为，表示包括一个或更多个用于实现特定逻辑功能或过程的步骤的可执行指令的代码的模块、片段或部分，并且本发明的优选实施方式的范围包括另外的实现，其中可以不按所示出或讨论的顺序，包括根据所涉及的功能按基本同时的方式或按相反的顺序，来执行功能，这应被本发明的实施例所属技术领域的技术人员所理解。Any process or method descriptions in flowcharts or otherwise described herein may be understood to represent modules, segments or portions of code comprising one or more executable instructions for implementing specific logical functions or steps of the process , and the scope of preferred embodiments of the invention includes alternative implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, which shall It is understood by those skilled in the art to which the embodiments of the present invention pertain.

应当理解，本发明的各部分可以用硬件、软件、固件或它们的组合来实现。在上述实施方式中，多个步骤或方法可以用存储在存储器中且由合适的指令执行系统执行的软件或固件来实现。例如，如果用硬件来实现，和在另一实施方式中一样，可用本领域公知的下列技术中的任一项或他们的组合来实现：具有用于对数据信号实现逻辑功能的逻辑门电路的离散逻辑电路，具有合适的组合逻辑门电路的专用集成电路，可编程门阵列(PGA)，现场可编程门阵列(FPGA)等。It should be understood that various parts of the present invention can be realized by hardware, software, firmware or their combination. In the embodiments described above, various steps or methods may be implemented by software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, it can be implemented by any one or combination of the following techniques known in the art: Discrete logic circuits, ASICs with suitable combinational logic gates, programmable gate arrays (PGAs), field programmable gate arrays (FPGAs), etc.

本技术领域的普通技术人员可以理解实现上述实施例方法携带的全部或部分步骤是可以通过程序来指令相关的硬件完成，所述程序可以存储于一种计算机可读存储介质中，该程序在执行时，包括方法实施例的步骤之一或其组合。Those of ordinary skill in the art can understand that all or part of the steps carried by the methods of the above embodiments can be completed by instructing related hardware through a program, and the program can be stored in a computer-readable storage medium, and the program can be executed when the program is executed. When, one or a combination of the steps of the method embodiment is included.

此外，在本发明各个实施例中的各功能单元可以集成在一个处理模块中，也可以是各个单元单独物理存在，也可以两个或两个以上单元集成在一个模块中。上述集成的模块既可以采用硬件的形式实现，也可以采用软件功能模块的形式实现。所述集成的模块如果以软件功能模块的形式实现并作为独立的产品销售或使用时，也可以存储在一个计算机可读取存储介质中。In addition, each functional unit in each embodiment of the present invention may be integrated into one processing module, each unit may exist separately physically, or two or more units may be integrated into one module. The above-mentioned integrated modules can be implemented in the form of hardware or in the form of software function modules. If the integrated modules are realized in the form of software function modules and sold or used as independent products, they can also be stored in a computer-readable storage medium.

上述提到的存储介质可以是只读存储器，磁盘或光盘等。The storage medium mentioned above may be a read-only memory, a magnetic disk or an optical disk, and the like.

在本说明书的描述中，参考术语“一个实施例”、“一些实施例”、“示例”、“具体示例”、或“一些示例”等的描述意指结合该实施例或示例描述的具体特征、结构、材料或者特点包含于本发明的至少一个实施例或示例中。在本说明书中，对上述术语的示意性表述不一定指的是相同的实施例或示例。而且，描述的具体特征、结构、材料或者特点可以在任何的一个或多个实施例或示例中以合适的方式结合。In the description of this specification, descriptions with reference to the terms "one embodiment", "some embodiments", "example", "specific examples", or "some examples" mean that specific features described in connection with the embodiment or example , structure, material or characteristic is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

尽管上面已经示出和描述了本发明的实施例，可以理解的是，上述实施例是示例性的，不能理解为对本发明的限制，本领域的普通技术人员在本发明的范围内可以对上述实施例进行变化、修改、替换和变型。Although the embodiments of the present invention have been shown and described above, it can be understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and those skilled in the art can make the above-mentioned The embodiments are subject to changes, modifications, substitutions and variations.

Claims

1. A counting model building method of a crowd, is characterized in that, comprises:

Perform head annotation processing on the crowd images in the pre-stored data set to obtain a crowd density map;

combining the crowd image with a crowd density map corresponding to the crowd image to obtain a target data set;

performing data amplification processing on the training set obtained by dividing the target data set according to the first preset ratio to obtain the target training set;

Using the images of the training crowd and the density map of the training crowd in the target training set, the multi-scale feature aggregation convolutional neural network model after parameter initialization is trained to obtain the counting model of the crowd.

2. The method according to claim 1, characterized in that, performing data amplification processing on the training set obtained by dividing the target data set according to the first preset ratio to obtain the target training set, comprising:

Each of the original training crowd images in the training set and the original training crowd density map corresponding to the original training crowd images are respectively cropped by a preset cropping method to obtain at least two cropped images and at least two crop density map;

performing flipping processing on the cropped image to obtain a flipped image, and flipping the cropped density map to obtain a flipped density map;

According to the second preset ratio of the original training crowd image, the cropped image and the flipped image are enlarged to obtain an enlarged image, and the cropped density map and the flipped density map are enlarged to obtain an enlarged density map ;

storing the enlarged image and the enlarged density map in the training set to obtain a target training set;

Wherein, the training crowd image includes the original training crowd image and the enlarged image; the training crowd density map includes the original training crowd density map and the enlarged density map.

3. The method according to claim 1, wherein, using the training crowd image and the training crowd density map in the target training set, the multi-scale feature aggregation convolutional neural network model after parameter initialization is processed Train to get the counting model of the crowd, including:

Input the training crowd image and the training crowd density map in the target training set into the multi-scale feature aggregation convolutional neural network model after parameter initialization to perform first-level training to obtain the first training Model;

Using the Euclidean distance loss formula to optimize the model parameters of the first training model at the first level to obtain the first optimization model;

Iteratively executing the first-level training and the first-level optimization processes according to a preset first number of iterations to obtain a first target model;

Inputting the training crowd image and the training crowd density map into the first target model to perform second-level training to obtain a second training model;

Using the Euclidean distance loss formula and the relative number of people loss formula to perform second-level optimization on the model parameters of the second training model to obtain a second optimization model;

performing iterative execution on the second-level training and the second-level optimization process according to the preset second number of iterations to obtain a second target model;

The second target model is used as the population count model.

4. The method according to claim 3, wherein, before using the second target model as the counting model of the crowd, further comprising:

inputting the test crowd images in the test set obtained by dividing the target data set according to the first preset ratio into the second target model to obtain a target crowd density map;

Determine the test accuracy of the model according to the target crowd density map and the test crowd density map corresponding to the test crowd images in the test set;

Judging whether the test accuracy rate is greater than the preset accuracy rate;

Correspondingly, said using the second target model as the population counting model includes:

If the test accuracy rate is greater than the preset accuracy rate, the second target model is used as the population count model.

5. method according to claim 3, is characterized in that, described Euclidean distance loss formula is:

Wherein, Θ represents a parameter model; F(X _i ; Θ) represents an output model; Xi represents the training crowd image of the _{i-th input; F i} _represents the training corresponding to the training crowd image of the i-th input Crowd density map; N represents the number of input data;

The relative headcount loss formula is:

Wherein, F _D (X _i ; Θ) represents the number of predictions; D _i represents the number of true value; N represents the number of input data; the denominator is D _i +1 to prevent the denominator from being zero; the training of the relative number of people loss The weight is L=1.0*L(Θ)+0.1*L _D (Θ).

6. A crowd counting method, characterized in that, comprising:

Obtain the image of the crowd to be counted;

Based on the pre-built counting model of the crowd, input the image of the crowd to be counted to obtain a density map of the crowd to be counted; the counting model of the crowd is constructed by the crowd counting model construction method described in any one of claims 1-5;

Using an accumulative calculation method, the values in each pixel in the population density map to be counted are accumulated to obtain the total value of the pixel points, and the total value of the pixel points is used as the number of people in the image of the population to be counted.

7. A crowd counting model building device, characterized in that it comprises:

The first processing module is used to perform head labeling processing on the crowd images in the pre-stored data set to obtain a crowd density map;

A data combination module, configured to combine the crowd image and the crowd density map corresponding to the crowd image to obtain a target data set;

A data amplification module, configured to perform data amplification processing on the training set obtained by dividing the target data set according to a first preset ratio to obtain a target training set;

The second processing module is used to use the training crowd image and training crowd density map in the target training set to train the multi-scale feature aggregation convolutional neural network model after parameter initialization to obtain a crowd counting model.

8. The device according to claim 7, wherein the second processing module comprises: a first training unit, a first optimization unit, an iterative processing unit, a second training unit, a second optimization unit and a model determination unit ;

The first training unit is configured to input the training crowd image and the training crowd density map in the target training set into the multi-scale feature aggregation convolutional neural network model after parameter initialization processing to perform the second One level of training to obtain the first training model;

The first optimization unit is configured to use the Euclidean distance loss formula to perform first-level optimization on the model parameters of the first training model to obtain a first optimization model;

The iterative processing unit is configured to iteratively execute the first-level training and first-level optimization processes according to a preset first number of iterations to obtain a first target model;

The second training unit is configured to input the training crowd image and the training crowd density map into the first target model to perform second-level training to obtain a second training model;

The second optimization unit is configured to use the Euclidean distance loss formula and the relative number of people loss formula to optimize the model parameters of the second training model at a second level to obtain a second optimization model;

The iterative processing unit is further configured to iteratively execute the process of the second-level training and the second-level optimization according to a preset second number of iterations to obtain a second target model;

A model determining unit, configured to use the second target model as a counting model of the population.

9. The device according to claim 8, wherein the multi-scale feature aggregation convolutional neural network model includes a feature mapping module, a multi-scale feature aggregation module and a density map regression module;

The feature mapping module adopts the first X convolutional layers of the VGG-16 structure, where X is 4 or 6;

The convolution layer is a 3*3 convolution kernel;

The multi-scale feature aggregation module adopts at least two scale branches;

The density map regression module adopts atrous convolution of column A and layer B, wherein both A and B are positive integers.

10. A crowd counting device, comprising: an image acquisition module, a density map determination module and a people counting module;

The image acquisition module is used to acquire images of people to be counted;

The density map determination module is used to input the image of the crowd to be counted based on the pre-built counting model of the crowd to obtain a density map of the crowd to be counted; the counting model of the crowd is as described in any one of claims 1-5. The counting model construction method construction of the crowd;

The population counting module is used to accumulate the values of each pixel in the population density map to be counted by using an accumulation calculation method to obtain the total value of the pixels, and use the total value of the pixels as the number of people in the image of the population to be counted .