
CN110837888A - Traffic missing data completion method based on bidirectional cyclic neural network - Google Patents


Info

Publication number
CN110837888A
CN110837888A CN201911106967.7A CN201911106967A CN110837888A CN 110837888 A CN110837888 A CN 110837888A CN 201911106967 A CN201911106967 A CN 201911106967A CN 110837888 A CN110837888 A CN 110837888A
Authority
CN
China
Prior art keywords
data
traffic flow
missing
completion
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201911106967.7A
Other languages
Chinese (zh)
Inventor
申彦明
徐文权
齐恒
尹宝才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN201911106967.7A
Publication of CN110837888A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention provides a traffic missing data completion method based on a bidirectional recurrent neural network and belongs to the field of traffic. First, the method exploits the temporal ordering of the data and considers the influence of data both before and after the completion time point on the current time point, which greatly improves data utilization and completion accuracy. Second, it takes into account the influence of external features and of adjacent sensor data on the current sensor data and adds them to the completion model, which greatly improves completion accuracy. The method not only greatly improves completion accuracy when the data missing rate is low, but also improves completion accuracy when the data missing rate is high.

Figure 201911106967

Description

A Traffic Missing Data Completion Method Based on a Bidirectional Recurrent Neural Network

Technical Field

The invention belongs to the field of traffic, and in particular relates to a traffic missing data completion method based on a bidirectional recurrent neural network.

Background

Traffic flow data from road loop detectors is periodic, time-ordered and trending. At present, methods for completing traffic flow data rely mainly on its temporal ordering.

Completion based on temporal ordering takes the data from a period before the current missing point and feeds it through a neural network to complete the missing point. For example, to complete today's 16:00 traffic flow value, the data from 8:00 to 15:00 of the same day is used as input, and a recurrent neural network produces the value at the next time point, 16:00. This history-based approach makes good use of the temporal ordering of the data and gives relatively good results, but it has a limitation. When a special event occurs, the current missing point may itself be preceded by a run of missing points. A power outage, for example, causes the loss of a continuous segment of data. When the last missing point of that segment is completed, the input data is itself largely missing, and the completion quality is very poor.

Neural networks were originally inspired by biological nervous systems and designed to simulate them; they consist of a large number of interconnected nodes (neurons). A neural network adjusts its weights in response to its inputs, improves the behavior of the system, and automatically learns a model that solves the problem. LSTM (long short-term memory network) is a special form of RNN (recurrent neural network) that effectively mitigates the vanishing-gradient and exploding-gradient problems in training multi-layer neural networks and can handle sequences with long-term temporal dependencies. LSTM can capture the time-series characteristics of the data, and using an LSTM model can effectively improve completion accuracy.

An LSTM network consists of LSTM units; each unit consists of a cell, an input gate, an output gate and a forget gate.

Forget gate: decides how much information to discard from the output state of the previous unit. The formula is:

$f_t = \sigma_g(W_f x_t + U_f h_{t-1} + b_f)$

where $f_t$ is the output of the forget gate, $x_t$ is the input sequence, $h_{t-1}$ is the output of the previous unit, $\sigma_g$ denotes the sigmoid function, $W_f$ is the weight matrix applied to the input, $U_f$ is the weight matrix applied to the previous unit's output, and $b_f$ is the bias vector.

Input gate: decides how much new information is added to the cell state, and updates the cell state C. The formulas are:

$i_t = \sigma_g(W_i x_t + U_i h_{t-1} + b_i)$

Figure BDA0002271602260000021

where $c_t$ is the cell state of the current unit, $\sigma_g$ and $\sigma_c$ denote the sigmoid function, the operator shown in Figure BDA0002271602260000022 denotes the matrix product, $W_i$ is the weight matrix applied to the input, $U_i$ is the weight matrix applied to the previous unit's output, $b_i$ is the bias vector, $f_t$ is the output of the forget gate, $c_{t-1}$ is the cell state of the previous unit, $W_c$ is the weight matrix applied to the input, $U_c$ is the weight matrix applied to the previous unit's output, and $b_c$ is the bias vector.

Output gate: outputs the result based on the current cell state.

$o_t = \sigma_g(W_o x_t + U_o h_{t-1} + b_o)$

Figure BDA0002271602260000023

where $h_t$ is the output of the current unit, $\sigma_g$ and $\sigma_h$ denote the sigmoid function, the operator shown in Figure BDA0002271602260000024 denotes the matrix product, $W_o$ is the weight matrix applied to the input, $U_o$ is the weight matrix applied to the previous unit's output, and $b_o$ is the bias vector.
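The three gates described above can be sketched as a single LSTM step in NumPy. This is a minimal sketch under stated assumptions, not the patent's implementation: the equations for $c_t$ and $h_t$ survive here only as image references, so the code assumes the conventional LSTM update with an element-wise product between gate and state. The names `lstm_step`, `params` and `cell_act` are illustrative. The text describes $\sigma_c$ and $\sigma_h$ as sigmoid; the conventional choice is tanh, which is the default below, and `sigmoid` can be passed as `cell_act` instead.

```python
import numpy as np

def sigmoid(z):
    """Logistic function used for the gates (sigma_g in the text)."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params, cell_act=np.tanh):
    """One LSTM step.

    params maps 'f', 'i', 'o', 'c' to a tuple (W, U, b), matching
    W_f/U_f/b_f, W_i/U_i/b_i, W_o/U_o/b_o and W_c/U_c/b_c in the text.
    cell_act is an assumption (tanh by default).
    """
    def affine(name):
        W, U, b = params[name]
        return W @ x_t + U @ h_prev + b

    f_t = sigmoid(affine("f"))   # forget gate
    i_t = sigmoid(affine("i"))   # input gate
    o_t = sigmoid(affine("o"))   # output gate
    # Assumed conventional cell update (element-wise products).
    c_t = f_t * c_prev + i_t * cell_act(affine("c"))
    h_t = o_t * cell_act(c_t)
    return h_t, c_t
```

With all weights set to zero every gate evaluates to 0.5, which gives a quick sanity check of the update rule.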

Summary of the Invention

The invention proposes a traffic missing data completion method based on a bidirectional recurrent neural network. It is a deep learning completion method that draws on the temporal, periodic and spatial properties of the data, and aims to improve the completion accuracy of road traffic flow data.

Technical solution of the invention:

A traffic missing data completion method based on a bidirectional recurrent neural network, with the following steps:

Step 1: preprocess the traffic flow data.

The preprocessing comprises time granularity division and data standardization.

Step 2: randomly drop data points from the preprocessed data to build a data set with missing points, then record the positions of the missing points for use as verification values, so that the completion effect of the method can be verified.
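The random dropping in Step 2 can be sketched as follows. This is a minimal sketch, assuming a one-dimensional series for a single sensor and NaN as the missing marker; the function name `drop_random_points` and its return layout are illustrative and not taken from the patent.

```python
import numpy as np

def drop_random_points(flow, missing_rate, seed=0):
    """Mark a random fraction of a 1-D series as missing.

    Returns the corrupted series (NaN at missing points), an observation
    mask (True = observed), the missing positions, and the held-out true
    values used later for verification.
    """
    rng = np.random.default_rng(seed)
    flow = np.asarray(flow, dtype=float)
    n_missing = int(round(missing_rate * flow.size))
    missing_idx = np.sort(rng.choice(flow.size, size=n_missing, replace=False))
    mask = np.ones(flow.size, dtype=bool)
    mask[missing_idx] = False
    held_out = flow[missing_idx].copy()
    corrupted = np.where(mask, flow, np.nan)
    return corrupted, mask, missing_idx, held_out
```

Keeping `missing_idx` and `held_out` separate from the corrupted series means the completed values can later be compared against the truth at exactly the dropped positions.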

At the same time, a time-dimension influence decay matrix is constructed. Missing data often occurs in consecutive runs; for example, damage to a sensor's power supply component causes data loss for some time afterwards. As the gap grows, historical data has less and less influence on the data at the missing point, which affects completion accuracy, so the decay of this influence along the time dimension must be recorded. The time-dimension influence decay matrix is defined as follows:

Figure BDA0002271602260000031

where $n_t$ denotes the current moment; the remaining term and its definition are given as equation images that are not reproduced here.

Step 3: divide the traffic flow data, after the dropping process, into a training set, a validation set and a test set. In each data set, the different modules use the following types of data:

Data used by the forward time series deep learning module: Figure BDA0002271602260000034

Data used by the reverse time series deep learning module: Figure BDA0002271602260000035

External feature data used by the external feature module: $F_n$;

Periodic sequence data used by the periodic feature module: Figure BDA0002271602260000036

where n denotes the current moment, t the step length of the time series, and p the step length of the periodic sequence. S denotes the traffic flow data, and T denotes the reverse sequence of S along the time dimension. $s_i$ denotes the traffic flow data at time n; a symbol not reproduced here denotes the traffic flow data at the same time of day i days before the n-th moment; Figure BDA0002271602260000042 denotes the set of traffic flow data for the t moments preceding the n-th moment; Figure BDA0002271602260000043 denotes the set of traffic flow data at the same time of day over the p days preceding the day of the n-th moment; and $F_n$ denotes the external features at the n-th moment, including holidays, location area, weather and temperature.
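The three sequence inputs listed above can be sketched as slices of a single sensor's series. This is a minimal sketch under assumptions: the exact index ranges are only available as equation images, so the code takes the t samples before moment n, the t samples after it (reversed in time), and the samples at the same time of day on each of the previous p days. The name `build_inputs` and the `steps_per_day` parameter are illustrative.

```python
import numpy as np

def build_inputs(flow, n, t, p, steps_per_day):
    """Slice forward, reverse and periodic inputs for target moment n."""
    flow = np.asarray(flow, dtype=float)
    # t samples immediately before n, oldest first.
    forward = flow[n - t:n]
    # t samples immediately after n, reversed so the sample closest to n comes last.
    backward = flow[n + 1:n + t + 1][::-1]
    # Same time of day on each of the previous p days, oldest first.
    periodic = np.array([flow[n - i * steps_per_day] for i in range(p, 0, -1)])
    return forward, backward, periodic
```

For 5-minute data `steps_per_day` would be 288. The caller is responsible for choosing n so that all three slices stay inside the series.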

Step 4: build the completion model. The completion model comprises a forward time series deep learning module, a reverse time series deep learning module, a periodic feature module and an external feature module. The structure and training mechanism of each module are as follows:

(1) Forward time series deep learning module: a model that combines a linear regression network with a multi-layer long short-term memory (LSTM) network. A single linear regression layer adds information about the temporal continuity of the current missing point, which handles long runs of missing data and improves completion accuracy.

Implementation details of the forward time series deep learning module: the time-dimension decay matrix is first fed into the linear regression network; the output of the linear regression network and the forward time series data (Figure BDA0002271602260000044) are then fed into the LSTM network. For the input value $x_t$ at the current moment, if the data point is not missing it is input directly; if the data point is missing, the hidden state of the previous moment is used as the input at the current moment. After the inputs are processed, the deep learning network is trained, and the final output of the forward time series deep learning module is obtained through repeated iterative updates.
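The substitution rule for missing inputs can be illustrated with a small recurrent loop. This is a simplified sketch, not the patent's model: it uses a plain tanh recurrent cell in place of the multi-layer LSTM, omits the linear regression path for the decay matrix, and assumes a linear readout (`W_out`, `b_out`) to turn the previous hidden state into an input-sized value. All names are illustrative.

```python
import numpy as np

def run_with_substitution(x, observed, W_in, W_h, b, W_out, b_out):
    """Run a recurrent cell over x, replacing missing inputs.

    x: (T,) values; observed: (T,) booleans (True = value present).
    Where a value is missing, an estimate read out from the previous
    hidden state is fed in instead of x[step].
    """
    h = np.zeros(W_h.shape[0])
    estimates = np.empty(len(x))
    for step in range(len(x)):
        # Estimate of the current value from the previous hidden state.
        x_hat = float(W_out @ h + b_out)
        estimates[step] = x_hat
        x_in = x[step] if observed[step] else x_hat
        h = np.tanh(W_in * x_in + W_h @ h + b)
    return estimates, h
```

Because the value stored at a missing position is never read, the series can carry NaN there without contaminating the hidden state.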

(2) Reverse time series deep learning module: its network structure is the same as that of the forward time series deep learning module. The difference is that the input of the forward module is reversed along the time dimension before being used as this module's input.

(3) Periodic feature learning module: a module made up of a three-layer fully connected network. By extracting features from the periodic data, it captures how traffic flow varies for the same sensor in the same time period across the historical data, and outputs the extracted features. Implementation details: the periodic sequence data is fed through three fully connected layers, which extract the temporal features of the periodic data, and the result is output.

(4) External feature module: consists of two parts. The first part handles holiday and weather features and is a single feature encoding layer. Implementation details: the external feature data is fed into the feature encoding layer and converted into vector form, and the resulting vector is then merged with the outputs of the other three modules.

The second part handles spatial features. To take road-space information into account, all sensors on the road segment are fed into the second part simultaneously. The hidden states of the other sensors at the same moment as the current sensor's missing point are then taken as input, weights are computed through a Softmax network to produce an output, and that output is fed into the forward and reverse time series deep learning modules.

Finally, the outputs of the four modules are merged into a one-dimensional vector and passed through a single fully connected layer to obtain the final completion result.

Step 5: use the training set to pre-train the pre-training part of the forward and reverse time series deep learning modules, optimizing the parameters of the time series deep learning models in advance so that overall training does not drive the parameters into a local optimum.

Step 6: use the training set and the validation set to train the four modules built in Step 4 as a whole:

The preprocessed data is fed into the corresponding modules, and all modules are trained together. After each training pass, the loss function between the completed values and the true traffic flow values is computed, and the model parameters are trained toward the target. Based on the model's performance on the training and validation sets, the hyperparameters are tuned repeatedly to improve completion accuracy while reducing overfitting.

The input data comprises: forward time series data Figure BDA0002271602260000051 (traffic flow data for the preceding $t_1$ hours), reverse time series data Figure BDA0002271602260000052 (traffic flow data for the following $t_2$ hours), periodic sequence data Figure BDA0002271602260000053 (traffic flow data at the same time of day over the preceding $t_3$ days), the time-dimension influence decay matrix Figure BDA0002271602260000054, external feature data $F_n$ (holiday, area, weather and temperature features at the n-th moment), and the true traffic flow value Figure BDA0002271602260000055 (the traffic flow data at the current moment).

After one iteration, the result is the traffic flow data after one completion pass. This data is used as the input to the next iteration. Although the previously missing points now hold completed values, their labels still mark them as missing, so later iterations still target those points for completion. Because values relatively close to the truth are already present, they provide prior knowledge that can improve both the convergence speed and the completion accuracy of the model.

Step 7: use the test set and the model trained in Step 6 to complete the traffic flow data.

The input data is: forward time series data Figure BDA0002271602260000061, reverse time series data, periodic sequence data, the time-dimension influence decay matrix Figure BDA0002271602260000064, external feature data Figure BDA0002271602260000065, and the true traffic flow value Figure BDA0002271602260000066.

The model from Step 6 yields the completed values of the missing traffic flow data. These are compared with the verification values obtained from the dropping process in Step 2 to verify the completion effect of the model.

In Step 1, the specific preprocessing procedure is:

(1) Time granularity division: all traffic flow data is aggregated at a granularity of k minutes, giving the traffic flow per k-minute interval;

(2) Data standardization: the traffic flow data is standardized using its minimum and maximum values, with the formula:

Figure BDA0002271602260000067

where x is the original value, $x_{min}$ is the minimum of the original values, $x_{max}$ is the maximum of the original values, max is the upper limit of the normalized range, min is the lower limit of the normalized range, [min, max] is the normalized interval, and $x^*$ is the standardized result.
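The two preprocessing operations can be sketched as below. This is a minimal sketch under assumptions: the formula image is not reproduced here, so the code uses the standard min-max scaling $x^* = \frac{x - x_{min}}{x_{max} - x_{min}}(max - min) + min$, which is consistent with the symbol definitions above. Aggregation by summing raw counts is also an assumption, since the text does not say how samples within an interval are combined. The names `aggregate` and `min_max_scale` are illustrative.

```python
import numpy as np

def aggregate(counts, factor):
    """Sum consecutive groups of `factor` raw samples.

    For 1-minute counts, factor=k gives k-minute counts. A trailing
    incomplete group is dropped.
    """
    counts = np.asarray(counts, dtype=float)
    usable = (counts.size // factor) * factor
    return counts[:usable].reshape(-1, factor).sum(axis=1)

def min_max_scale(x, lo=0.0, hi=1.0):
    """Scale x linearly so its minimum maps to lo and its maximum to hi."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min) * (hi - lo) + lo
```

In practice the minimum and maximum would be taken from the training data only and reused for validation and test data, and a constant series would need a guard against division by zero.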

In Step 4, the road-space information part (Softmax processing) works as follows: let the hidden states of all sensors at the current moment be $h = \langle h_1, h_2, h_3, \ldots, h_i, \ldots, h_t \rangle$, where $h_i$ is the hidden state of the i-th sensor at the current moment. A weight is then computed for each $h_i$, giving the new hidden state $h'_i$ of the current sensor.

Figure BDA0002271602260000071

After Softmax processing, the weights sum to 1. Here l denotes the number of sensors, and $h_{ij}$ denotes the hidden state of the j-th sensor at moment i.
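The Softmax weighting over neighboring sensors can be sketched as follows. This is an illustrative sketch only: the patent's formula is available only as an image, so the scoring function here (a dot product between the current sensor's hidden state and each other sensor's hidden state) is an assumption, as are the names `softmax` and `spatial_hidden_state`.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax; the result sums to 1."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def spatial_hidden_state(hidden, current):
    """Combine other sensors' hidden states into one for the current sensor.

    hidden: (l, d) array, one hidden state per sensor at the same moment.
    current: index of the sensor whose value is missing.
    """
    others = np.delete(np.arange(hidden.shape[0]), current)
    # Assumed score: similarity between the current and each other sensor.
    scores = hidden[others] @ hidden[current]
    weights = softmax(scores)
    return weights @ hidden[others], weights
```

The returned weights always sum to 1, matching the property stated in the text, whatever scoring function is substituted.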

In Step 6, the mean square error, denoted MAE, between the completed data obtained in each iteration and the true traffic flow values is computed, and the Adam method is used to minimize MAE.

Figure BDA0002271602260000072

where $x'_i$ is the true sensor value at moment i, and $x_i$ is the completed sensor value at moment i.
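The error between completed and true values can be computed as below. The text names the quantity a mean square error but abbreviates it MAE, which conventionally stands for mean absolute error, and the formula itself is only an image. This sketch therefore returns both quantities rather than choosing one; the function name `completion_errors` is illustrative.

```python
import numpy as np

def completion_errors(true_values, completed_values):
    """Return mean squared error and mean absolute error over the given points."""
    diff = np.asarray(true_values, dtype=float) - np.asarray(completed_values, dtype=float)
    return {
        "mse": float(np.mean(diff ** 2)),
        "mae": float(np.mean(np.abs(diff))),
    }
```

For verification, the inputs would be restricted to the positions that were dropped in Step 2, so that observed points do not dilute the error.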

Beneficial effects of the invention: the invention differs from existing methods in three respects. First, it improves how the temporal ordering of the data is used. Earlier methods typically consider only the influence of historical data on the current time point, but when completing traffic flow data, information from later time points also bears on the current time point. The invention considers both the forward and the reverse time series, which greatly improves completion accuracy. Second, it takes into account the influence on traffic flow data of external features such as holidays and of the sensor's adjacent area, and adds them to the completion model, which greatly improves completion accuracy and the completion of unusual values. Finally, it accounts for the decay over time of the influence of missing data, which improves completion accuracy. The method not only greatly improves the completion accuracy of traffic flow data with a low missing rate, but also achieves good completion results when the data missing rate is high.

Description of the Drawings

FIG. 1 is a structural diagram of the completion model of the invention.

FIG. 2 compares the completion results with the true values at a low data missing rate of 20%.

FIG. 3 compares the completion results with the true values at a high data missing rate of 50%.

Detailed Description

The technical solution of the invention is further described below with reference to specific embodiments and the accompanying drawings.

A traffic missing data completion method based on a bidirectional recurrent neural network, with the following steps:

Step 1: preprocess the traffic flow data.

(1) Time granularity division: all traffic flow data is aggregated at a granularity of 5 minutes, giving the traffic flow per 5-minute interval;

(2) Data standardization: the traffic flow data is standardized using its minimum and maximum values, with the formula:

Figure BDA0002271602260000081

where x is the original value, $x_{min}$ is the minimum of the original values, $x_{max}$ is the maximum of the original values, max is the upper limit of the normalized range, min is the lower limit of the normalized range, [min, max] is the normalized interval, and $x^*$ is the standardized result.

Step 2: randomly drop data points from the preprocessed data. Using random numbers, a set proportion of the data (chosen according to the experimental requirements) is labeled as missing to serve as missing points. The values of these points are then recorded as ground truth for verifying the model's final completion effect.

At the same time, the time-dimension influence decay matrix is built. Missing data often occurs in consecutive runs; for example, a power outage may leave a sensor unable to collect data for several hours. As the gap grows, historical data has less and less influence on the data at the missing point, which affects completion accuracy, so the decay of this influence along the time dimension must be recorded. The time-dimension influence decay matrix is defined as follows:

Figure BDA0002271602260000082

where $n_t$ denotes the current moment, and the term shown in Figure BDA0002271602260000083 is defined as follows:

Figure BDA0002271602260000084

Step 3: divide the preprocessed traffic flow data into a training set, a validation set and a test set in the ratio 8:1:1. In each data set, the different modules use the following types of data:

Data used by the forward time series deep learning module: Figure BDA0002271602260000091

Data used by the reverse time series deep learning module: Figure BDA0002271602260000092

External feature data used by the external feature module: $F_n$;

Periodic sequence data used by the periodic feature module (the equation image is missing from the source).

where n denotes the current moment, t the step length of the time series, and p the step length of the periodic sequence. S denotes the traffic flow data, and T denotes the reverse sequence of S along the time dimension. $s_i$ denotes the traffic flow data at time n; Figure BDA0002271602260000094 denotes the traffic flow data at the same time of day i days before the n-th moment; a symbol not reproduced here denotes the set of traffic flow data for the t moments preceding the n-th moment; Figure BDA0002271602260000096 denotes the set of traffic flow data at the same time of day over the p days preceding the day of the n-th moment; and $F_n$ denotes the external features at the n-th moment, including holidays, location area, weather and temperature.
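The 8:1:1 division in Step 3 can be sketched as index ranges over the sample sequence. This is a minimal sketch, assuming the split is contiguous and in time order, which the text does not state; the function name `split_811` is illustrative.

```python
def split_811(n_samples):
    """Return slices for training, validation and test sets in the ratio 8:1:1."""
    train_end = n_samples * 8 // 10
    val_end = n_samples * 9 // 10
    return (
        slice(0, train_end),
        slice(train_end, val_end),
        slice(val_end, n_samples),
    )
```

A time-ordered split keeps future samples out of the training set; a shuffled split would be an alternative if the windows are treated as independent samples.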

第四步、构建补全模型,补全模型包括前向序列深度学习模块、反向时间序列深度学习模块、周期性特征模块和外部特征模块,各个模块的结构及训练机制如下:The fourth step is to build a completion model. The completion model includes a forward sequence deep learning module, a reverse time series deep learning module, a periodic feature module and an external feature module. The structure and training mechanism of each module are as follows:

(1)前向序列深度学习模块:是一个线性回归网络和多层长短记忆网络组合LSTM模型,通过一层线性回归网络,添加当前缺失点在时间上的延续性信息,用来应对长时间序列缺失的情况,提升补全精度。(1) Forward sequence deep learning module: It is a combination of a linear regression network and a multi-layer long-short-term memory network LSTM model. Through a layer of linear regression network, the temporal continuity information of the current missing point is added to deal with long-term sequences. In the case of missing, improve the completion accuracy.

Implementation details of the forward sequence deep learning module: the time-dimension decay matrix is first fed into the linear regression network; the output of the linear regression network and the forward time series data
Figure BDA0002271602260000097
are then fed into the LSTM network. For the input value xt at the current time step, if the data point is not missing it is input directly; if it is missing, the hidden state of the previous time step is used as the input for the current time step. Once the input has been processed, the deep learning network is trained, and the final output of the forward sequence deep learning module is obtained through repeated iterative updates.
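The missing-input rule described above can be sketched as follows. Two simplifications are assumptions of this sketch rather than statements from the text: a plain tanh recurrent cell stands in for the LSTM to keep the example short, and because the hidden state is a vector while the input is a scalar, a hypothetical projection `w_out` maps the previous hidden state to a substitute input.

```python
import math
from typing import List, Optional, Sequence


def run_recurrent(xs: Sequence[Optional[float]],
                  w_in: List[float],
                  w_rec: List[List[float]],
                  w_out: List[float]) -> List[float]:
    """Run a tanh recurrent cell over xs, filling gaps from the hidden state.

    Returns the sequence actually fed to the cell (observed or substituted).
    """
    size = len(w_in)
    hidden = [0.0] * size
    fed = []
    for x in xs:
        if x is None:
            # Missing point: derive the input from the previous hidden state
            # (assumed linear projection w_out).
            x = sum(w * h for w, h in zip(w_out, hidden))
        fed.append(x)
        # Standard recurrent update on the observed or substituted input.
        hidden = [
            math.tanh(w_in[k] * x + sum(w_rec[k][j] * hidden[j] for j in range(size)))
            for k in range(size)
        ]
    return fed
```

Observed values pass through unchanged; only the positions holding `None` are replaced by a value computed from the state accumulated so far.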

(2) Reverse sequence deep learning module: its network structure is identical to that of the forward sequence deep learning module. The difference is that the input of the forward sequence deep learning module is reversed along the time dimension and used as this module's input.

(3) Periodic feature module: a module built from a three-layer fully connected network. By extracting features from the periodic data, it captures how traffic flow varies for the same sensor over the same time period in the historical data, and outputs the extracted features. Implementation details: the periodic sequence data is passed through three fully connected layers, which extract the temporal features of the periodic data, and the result is output.
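A forward pass through three fully connected layers might look like the sketch below. The layer sizes, the ReLU activation, and the names `dense` and `periodic_features` are assumptions for illustration; the text specifies only that there are three fully connected layers.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Layer = Tuple[Matrix, List[float]]  # (weights, bias); one weight row per output unit


def dense(x: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    """One fully connected layer followed by ReLU (activation is assumed)."""
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, v) for v in out]


def periodic_features(periodic: List[float], layers: List[Layer]) -> List[float]:
    """Pass the periodic sequence through three fully connected layers."""
    if len(layers) != 3:
        raise ValueError("the text describes a three-layer network")
    h = periodic
    for weights, bias in layers:
        h = dense(h, weights, bias)
    return h
```

The input here is the periodic sequence for one target time step, for example the readings at the same time of day on the preceding p days.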

(4) External feature module: a single feature encoding layer. Implementation details: the external feature data is fed into the feature encoding layer, which converts textually described external features such as weather and holidays into graded values. For example, a holiday is represented by 1 and a non-holiday by 0. The data is thereby converted into vector form, and the resulting vector is passed to the next step.
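The grading scheme can be illustrated with a small encoder. Only the holiday rule (1 for a holiday, 0 otherwise) comes from the text; the weather scale in `WEATHER_LEVELS`, the integer area code, and the function name are hypothetical choices made for this sketch.

```python
from typing import List

# Hypothetical grading of weather descriptions; the text does not list the levels.
WEATHER_LEVELS = {"sunny": 0, "cloudy": 1, "rain": 2, "snow": 3}


def encode_external(is_holiday: bool, weather: str,
                    temperature_c: float, area_id: int) -> List[float]:
    """Turn the external features of one time step into a numeric vector."""
    return [
        1.0 if is_holiday else 0.0,      # holiday -> 1, non-holiday -> 0 (from the text)
        float(WEATHER_LEVELS[weather]),  # graded weather level (assumed scale)
        float(temperature_c),            # temperature kept as a number
        float(area_id),                  # location area as an integer code (assumed)
    ]
```

For a rainy holiday at 18.5 degrees in area 3, the encoder returns `[1.0, 2.0, 18.5, 3.0]`.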

To take spatial information about the road into account, a spatial feature learning module is also added. All sensors on the road segment are fed into the model simultaneously; the hidden states of the other sensors at the same time step as the current sensor's missing point are taken as input; a Softmax network computes the weights; and the resulting output is fed into the forward sequence module and the reverse sequence module.
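One way to picture the Softmax weighting is a weighted sum of the other sensors' hidden states. The scoring rule below (a dot product with a learned vector `score_w`) is an assumption of this sketch; the text states only that a Softmax network computes the weights.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_context(hidden_states: List[List[float]], current: int,
                    score_w: List[float]) -> List[float]:
    """Weighted sum of the other sensors' hidden states at one time step."""
    # Exclude the sensor whose reading is missing.
    others = [h for j, h in enumerate(hidden_states) if j != current]
    # Assumed scoring: dot product of each hidden state with score_w.
    scores = [sum(w * v for w, v in zip(score_w, h)) for h in others]
    weights = softmax(scores)
    dim = len(score_w)
    return [sum(a * h[k] for a, h in zip(weights, others)) for k in range(dim)]
```

The returned vector would then accompany the current sensor's input to the forward and reverse sequence modules.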

Finally, the outputs of all modules are concatenated into a one-dimensional vector, which is passed through one fully connected layer to obtain the final completion result.

Step 5: Pre-train the pre-training part of the time series deep learning model on the training set, optimizing its parameters in advance so that the overall training does not drive them into a local optimum.

Step 6: Train the four modules built in Step 4 jointly on the training set and validation set (points with missing data are replaced by their completed values; points without missing data keep their original values):

The preprocessed data is fed into the corresponding modules, and all modules are trained jointly. After each training pass, the loss between the completed values and the true traffic flow values is computed, and the model parameters are trained toward the target value. Based on the model's performance on the training and validation sets, the hyperparameters are tuned repeatedly to improve completion accuracy while reducing overfitting. During training, the MAE between the completed data from each iteration and the true traffic flow values is computed and minimized with the Adam method. (The original text glosses MAE as "mean square error", although MAE conventionally abbreviates mean absolute error.)

Figure BDA0002271602260000111

where x′i denotes the true sensor value at time i, and xi denotes the completed sensor value at time i.
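The error computation can be sketched as follows, assuming MAE means mean absolute error, as the abbreviation conventionally does; the formula itself is an image not reproduced in this text. Restricting the average to the artificially dropped points is also an assumption of this sketch, and the name `masked_mae` is hypothetical.

```python
from typing import Sequence


def masked_mae(truth: Sequence[float], completed: Sequence[float],
               dropped: Sequence[bool]) -> float:
    """Mean absolute error over the points that were artificially dropped."""
    errors = [abs(t - c) for t, c, d in zip(truth, completed, dropped) if d]
    if not errors:
        raise ValueError("no dropped points to evaluate")
    return sum(errors) / len(errors)
```

For true values `[10, 20, 30, 40]`, completed values `[10, 23, 30, 36]` and dropped positions 1 and 3, the absolute errors are 3 and 4, giving an MAE of 3.5.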

The input data comprises: forward time series data
Figure BDA0002271602260000112
(traffic flow data for the preceding t1 hours), reverse time series data
Figure BDA0002271602260000113
(traffic flow data for the following t2 hours), the time-dimension influence decay matrix
Figure BDA0002271602260000114
periodic sequence data
Figure BDA0002271602260000115
(traffic flow data at the same time of day over the preceding t3 days), external feature data Fn (holiday, area, weather and temperature features at time n) and the true traffic flow value
Figure BDA0002271602260000116
(the traffic flow data at the current time).
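The definition of the time-dimension influence decay matrix is given only as formula images that are not reproduced in this text. As a loosely related illustration, the sketch below computes a BRITS-style time gap (the BRITS paper appears among the non-patent citations): for each step, the number of steps elapsed since the last observed reading. This is a swapped-in, assumed construction and not necessarily the patent's matrix; `time_gaps` is a hypothetical name, and a uniform step of 1 between readings is assumed.

```python
from typing import List


def time_gaps(mask: List[int]) -> List[int]:
    """Steps elapsed since the last observed reading.

    mask[i] is 1 if reading i is observed and 0 if it is missing.
    """
    gaps = [0] * len(mask)
    for i in range(1, len(mask)):
        if mask[i - 1] == 1:
            gaps[i] = 1                # previous reading observed: gap restarts
        else:
            gaps[i] = 1 + gaps[i - 1]  # previous reading missing: gap keeps growing
    return gaps
```

A longer run of missing readings produces a larger gap, which a downstream layer could map to a stronger decay of older information.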

Step 7: Use the model trained in Step 6 to complete the traffic flow data of the test set.

The input data is: forward time series data
Figure BDA0002271602260000117
reverse time series data
Figure BDA0002271602260000118
periodic sequence data
Figure BDA0002271602260000119
external feature data
Figure BDA00022716022600001110
the true traffic flow value
Figure BDA00022716022600001111
and the time-dimension influence decay matrix
Figure BDA00022716022600001112

Figure 2 compares the completion results with the true values at a data missing rate of 20%; the MAE between the model's completion results and the true traffic flow values is 29.18. (The first 100 missing points are shown.)

Figure 3 compares the completion results with the true values at a data missing rate of 50%; the MAE between the model's completion results and the true traffic flow values is 31.94. (The first 100 missing points are shown.)

Claims (5)

1. A traffic missing data completion method based on a bidirectional recurrent neural network, characterized in that the steps are as follows:
Step 1: preprocess the traffic flow data; the preprocessing comprises time granularity division and data standardization;
Step 2: apply random data point dropping to the preprocessed data to construct a data set with missing points, and record the positions of the missing points for use as verification values; at the same time, construct the time-dimension influence decay matrix:
Figure FDA0002271602250000011
where nt denotes the current time, and
Figure FDA0002271602250000012
is defined as follows:
Figure FDA0002271602250000013
Step 3: divide the traffic flow data after the dropping process into a training set, a validation set and a test set; within each data set, the different modules use the following types of data:
data used by the forward time series deep learning module:
Figure FDA0002271602250000014
data used by the reverse time series deep learning module:
Figure FDA0002271602250000015
external feature data used by the external feature module: Fn;
periodic sequence data used by the periodic feature module:
Figure FDA0002271602250000016
where n denotes the current time, t the step size of the time series, and p the step size of the periodic sequence; S denotes the traffic flow data, and T denotes S reversed along the time dimension; si denotes the traffic flow data at time n,
Figure FDA0002271602250000017
denotes the traffic flow data at the same time of day i days before time n,
Figure FDA0002271602250000018
denotes the set of traffic flow data for the t time steps preceding time n,
Figure FDA0002271602250000019
denotes the set of traffic flow data at the same time of day over the p days preceding time n, and Fn denotes the external features at time n, including holidays, location area, weather and temperature;
Step 4: build the completion model; the completion model comprises a forward time series deep learning module, a reverse time series deep learning module, a periodic feature module and an external feature module, and the structure and training mechanism of each module are as follows:
(1) forward time series deep learning module: an LSTM model that combines a linear regression network with a multi-layer long short-term memory (LSTM) network; a single linear regression layer adds information about the temporal continuity of the current missing point, which handles long runs of missing values in the time series and improves completion accuracy;
implementation details of the forward time series deep learning module: the time-dimension decay matrix is first fed into the linear regression network, and the output of the linear regression network and the forward time series data
Figure FDA0002271602250000021
are then fed into the LSTM network; for the input value xt at the current time step, if the data point is not missing it is input directly, and if it is missing the hidden state of the previous time step is used as the input for the current time step; once the input has been processed, the deep learning network is trained, and the final output of the forward time series deep learning module is obtained through repeated iterative updates;
(2) reverse time series deep learning module: its network structure is identical to that of the forward time series deep learning module, the difference being that the input of the forward time series deep learning module is reversed along the time dimension and used as this module's input;
(3) periodic feature learning module: a module built from a three-layer fully connected network; by extracting features from the periodic data, it captures how traffic flow varies for the same sensor over the same time period in the historical data, and outputs the extracted features; implementation details: the periodic sequence data is passed through three fully connected layers, which extract the temporal features of the periodic data, and the result is output;
(4) external feature module: this module consists of two parts; the first part handles the holiday and weather features and is a single feature encoding layer; implementation details: the external feature data is fed into the feature encoding layer and converted into vector form, and the resulting vector is merged with the outputs of the three modules above;
the second part handles the spatial features: all sensors on the road segment are fed into the second part simultaneously, the hidden states of the other sensors at the same time step as the current sensor's missing point are taken as input, a Softmax network computes the weights, and the resulting output is fed into the forward and reverse time series deep learning modules;
finally, the outputs of the four modules above are concatenated into a one-dimensional vector and passed through one fully connected layer to obtain the final completion result;
Step 5: pre-train the pre-training part of the forward and reverse time series deep learning modules on the training set, optimizing the parameters of the time series deep learning model in advance so that the overall training does not drive them into a local optimum;
Step 6: train the four modules built in Step 4 jointly on the training set and validation set:
the preprocessed data is fed into the corresponding modules, and all modules are trained jointly; after each training pass, the loss between the completed values and the true traffic flow values is computed, and the model parameters are trained toward the target value; based on the model's performance on the training and validation sets, the hyperparameters are tuned repeatedly to improve completion accuracy while reducing overfitting;
the input data comprises:
forward time series data: traffic flow data for the preceding t1 hours
Figure FDA0002271602250000031
reverse time series data: traffic flow data for the following t2 hours
Figure FDA0002271602250000032
periodic sequence data: traffic flow data at the same time of day over the preceding t3 days
time-dimension influence decay matrix:
Figure FDA0002271602250000034
external feature data: holiday, area, weather and temperature features Fn at time n;
true traffic flow value: the traffic flow data at the current time
Figure FDA0002271602250000035
after one iteration, the result is the traffic flow data after one completion pass; this data is used as the input to the next iteration; although the previously missing points now hold completed values, their labels still mark them as missing, so subsequent iterations continue to complete the data at these points;
Step 7: use the model trained in Step 6 to complete the traffic flow data of the test set;
the input data is: forward time series data, reverse time series data
Figure FDA0002271602250000042
periodic sequence data
Figure FDA0002271602250000043
the time-dimension influence decay matrix
Figure FDA0002271602250000044
external feature data
Figure FDA0002271602250000045
and the true traffic flow value
Figure FDA0002271602250000046
the completed values of the missing traffic flow data are obtained from the model of Step 6 and compared with the verification values recorded during the dropping process in Step 2 to verify the completion performance of the model.
2. The traffic missing data completion method based on a bidirectional recurrent neural network according to claim 1, characterized in that, in Step 1, the preprocessing proceeds as follows:
(1) time granularity division: all traffic flow data is aggregated at a granularity of k minutes into traffic flow data per k minutes;
(2) data standardization: the traffic flow data is standardized using its minimum and maximum values, with the following formula:
Figure FDA0002271602250000047
where x denotes the original value, xmin the minimum of the original values, xmax the maximum of the original values, max the upper bound of the normalization, min the lower bound of the normalization, [min, max] the normalized interval, and x* the standardized result.
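The standardization in claim 2 reads like ordinary min-max scaling to a target interval. The formula image is not reproduced here, so the sketch below assumes the standard form x* = (x - xmin) / (xmax - xmin) * (max - min) + min, which matches the symbol definitions given; the function name and the default interval [0, 1] are hypothetical.

```python
from typing import List


def min_max_scale(values: List[float], lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Scale values linearly so that their minimum maps to lo and maximum to hi."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("all values are equal; the scale is undefined")
    return [(v - x_min) / (x_max - x_min) * (hi - lo) + lo for v in values]
```

For example, `min_max_scale([10.0, 20.0, 30.0])` returns `[0.0, 0.5, 1.0]`, and passing `lo=-1.0, hi=1.0` maps the same values to `[-1.0, 0.0, 1.0]`.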
3. The traffic missing data completion method based on a bidirectional recurrent neural network according to claim 1 or 2, characterized in that, in Step 4, the spatial features are processed as follows: let the hidden states of all sensors at the current time be h = <h1, h2, h3, …, hi, …, ht>, where hi is the hidden state of the i-th sensor at the current time; a weight is then computed for each hi to obtain the new hidden state h′i of the current sensor;
Figure FDA0002271602250000048
where l denotes the number of sensors, and hij denotes the hidden state of the j-th sensor at time i.
4. The traffic missing data completion method based on a bidirectional recurrent neural network according to claim 1 or 2, characterized in that, in Step 6, the MAE between the completed data from each iteration and the true traffic flow values is computed and minimized with the Adam method;
where x′i denotes the true sensor value at time i, and xi denotes the completed sensor value at time i.
5. The traffic missing data completion method based on a bidirectional recurrent neural network according to claim 3, characterized in that, in Step 6, the MAE between the completed data from each iteration and the true traffic flow values is computed and minimized with the Adam method;
Figure FDA0002271602250000052
where x′i denotes the true sensor value at time i, and xi denotes the completed sensor value at time i.
CN201911106967.7A 2019-11-13 2019-11-13 Traffic missing data completion method based on bidirectional cyclic neural network Withdrawn CN110837888A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911106967.7A CN110837888A (en) 2019-11-13 2019-11-13 Traffic missing data completion method based on bidirectional cyclic neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911106967.7A CN110837888A (en) 2019-11-13 2019-11-13 Traffic missing data completion method based on bidirectional cyclic neural network

Publications (1)

Publication Number Publication Date
CN110837888A true CN110837888A (en) 2020-02-25

Family

ID=69576320

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911106967.7A Withdrawn CN110837888A (en) 2019-11-13 2019-11-13 Traffic missing data completion method based on bidirectional cyclic neural network

Country Status (1)

Country Link
CN (1) CN110837888A (en)

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5822712A (en) * 1992-11-19 1998-10-13 Olsson; Kjell Prediction method of traffic parameters
US20150120174A1 (en) * 2013-10-31 2015-04-30 Here Global B.V. Traffic Volume Estimation
CN107154150A (en) * 2017-07-25 2017-09-12 北京航空航天大学 A kind of traffic flow forecasting method clustered based on road with double-layer double-direction LSTM
CN107610469A (en) * 2017-10-13 2018-01-19 北京工业大学 A kind of day dimension regional traffic index forecasting method for considering multifactor impact
CN107680377A (en) * 2017-11-06 2018-02-09 浙江工商大学 Traffic flow data based on trend fitting intersects complementing method
CN107992536A (en) * 2017-11-23 2018-05-04 中山大学 Urban transportation missing data complementing method based on tensor resolution
CN108010320A (en) * 2017-12-21 2018-05-08 北京工业大学 A kind of complementing method of the road grid traffic data based on adaptive space-time constraint low-rank algorithm
CN108205889A (en) * 2017-12-29 2018-06-26 长春理工大学 Freeway traffic flow Forecasting Methodology based on convolutional neural networks
CN108090558A (en) * 2018-01-03 2018-05-29 华南理工大学 A kind of automatic complementing method of time series missing values based on shot and long term memory network
US20190286990A1 (en) * 2018-03-19 2019-09-19 AI Certain, Inc. Deep Learning Apparatus and Method for Predictive Analysis, Classification, and Feature Detection
CN109146156A (en) * 2018-08-03 2019-01-04 大连理工大学 A method of for predicting charging pile system charge volume
CN109598935A (en) * 2018-12-14 2019-04-09 银江股份有限公司 A kind of traffic data prediction technique based on ultra-long time sequence
CN110070713A (en) * 2019-04-15 2019-07-30 浙江工业大学 A kind of traffic flow forecasting method based on two-way nested-grid ocean LSTM neural network
CN110223510A (en) * 2019-04-24 2019-09-10 长安大学 A kind of multifactor short-term vehicle flowrate prediction technique based on neural network LSTM
CN110162744A (en) * 2019-05-21 2019-08-23 天津理工大学 A kind of multiple estimation new method of car networking shortage of data based on tensor
CN110322695A (en) * 2019-07-23 2019-10-11 内蒙古工业大学 A kind of Short-time Traffic Flow Forecasting Methods based on deep learning

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
DONALD B. RUBIN: "Inference and missing data", 《BIOMETRIKA》 *
FILIPE RODRIGUES ET AL.: "Multi-Output Gaussian Processes for Crowdsourced Traffic Data Imputation", 《IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS》 *
HAN-GYU KIM ET AL.: "Medical examination data prediction with missing information imputation based on recurrent neural networks", 《INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS》 *
LABLACK MOURAD ET AL.: "ASTIR: Spatio-Temporal Data Mining for Crowd Flow Prediction", 《IEEE ACCESS》 *
WEI CAO ET AL.: "BRITS: Bidirectional Recurrent Imputation for Time Series", 《ARXIV》 *
YI-FAN ZHANG ET AL.: "SSIM—A Deep Learning Approach for Recovering Missing Time Series Sensor Data", 《IEEE INTERNET OF THINGS JOURNAL》 *
任艺柯: "基于改进的LSTM网络的交通流预测", 《万方》 *
朱勇: "基于时空关联混合模型的交通流预测方法研究", 《中国优秀博硕士学位论文全文数据库(硕士)工程科技Ⅱ辑》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112417000A (en) * 2020-11-18 2021-02-26 杭州电子科技大学 A Time Series Missing Value Filling Method Based on Bidirectional Recurrent Codec Neural Network
CN113094357A (en) * 2021-04-23 2021-07-09 大连理工大学 Traffic missing data completion method based on space-time attention mechanism
CN113239029A (en) * 2021-05-18 2021-08-10 国网江苏省电力有限公司镇江供电分公司 Completion method for missing daily freezing data of electric energy meter
CN113392139A (en) * 2021-06-04 2021-09-14 中国科学院计算技术研究所 Multi-element time sequence completion method and system based on association fusion
CN113392139B (en) * 2021-06-04 2023-10-20 中国科学院计算技术研究所 Environment monitoring data completion method and system based on association fusion
CN113554105A (en) * 2021-07-28 2021-10-26 桂林电子科技大学 A spatiotemporal fusion-based method for missing data completion in the Internet of Things
CN113554105B (en) * 2021-07-28 2023-04-18 桂林电子科技大学 Missing data completion method for Internet of things based on space-time fusion
CN114611396A (en) * 2022-03-15 2022-06-10 国网安徽省电力有限公司蚌埠供电公司 A method of analyzing line loss based on big data
CN114936206A (en) * 2022-06-07 2022-08-23 大连理工大学 Pretreatment system and method for multi-source heterogeneous data of agricultural Internet of things
CN116595806A (en) * 2023-07-14 2023-08-15 江西师范大学 Self-adaptive temperature data complement method
CN116595806B (en) * 2023-07-14 2023-10-10 江西师范大学 Self-adaptive temperature data complement method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200225