
CN111524348A - A long-term and short-term traffic flow prediction model and method - Google Patents


Info

Publication number
CN111524348A
CN111524348A
Authority
CN
China
Prior art keywords
traffic flow
term
layer
model
short
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010289673.9A
Other languages
Chinese (zh)
Inventor
屈立成
吕娇
王海飞
屈艺华
张明皓
李翔
李昭璐
张壮壮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changan University
Original Assignee
Changan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changan University
Priority to CN202010289673.9A
Publication of CN111524348A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G08 - SIGNALLING
    • G08G - TRAFFIC CONTROL SYSTEMS
    • G08G 1/00 - Traffic control systems for road vehicles
    • G08G 1/01 - Detecting movement of traffic to be counted or controlled
    • G08G 1/0104 - Measuring and analyzing of parameters relative to traffic conditions
    • G08G 1/0125 - Traffic data processing
    • G08G 1/0129 - Traffic data processing for creating historical data or processing based on historical data
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24 - Querying
    • G06F 16/245 - Query processing
    • G06F 16/2458 - Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F 16/2465 - Query processing support for facilitating data mining operations in structured databases
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/044 - Recurrent networks, e.g. Hopfield networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2216/00 - Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F 2216/03 - Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract



The present application belongs to the field of traffic technology and, in particular, relates to a long-term and short-term traffic flow prediction model and method. Using external environmental data that contains random forecast errors to predict future traffic flow only amplifies those errors again, inevitably causing large impacts and fluctuations in model prediction accuracy. The present application provides a long-term and short-term traffic flow prediction model comprising a context factor input layer, a feature learning and pattern recognition layer, and a traffic flow data output layer. It provides a multi-scale prediction model with high accuracy, good prediction performance, short training time, and strong robustness that is unaffected by missing historical data. Using contextual factors as the prediction input improves the accuracy of the traffic flow prediction model, which is vital for advanced traffic management and traveler route planning.


Description

A long-term and short-term traffic flow prediction model and method

Technical Field

The present application belongs to the field of traffic technology and, in particular, relates to a long-term and short-term traffic flow prediction model and method.

Background Art

Traffic flow refers to the stream of vehicles formed by cars driving continuously on a road; in a broad sense it also includes flows of other vehicles and pedestrians. Over a given period, traffic flow on a road section free of lateral-crossing interference is in a continuous-flow state; under intersection signal control it is in an interrupted-flow state. Traffic flow prediction plays an important role in forecasting road congestion and accidents as well as in traffic signal control, and is crucial in particular for fixed-time signal control strategies. The prediction mode, the suitability of the historical data, and other factors all strongly influence prediction accuracy.

Many internal and external factors affect traffic flow. Some traditional traffic flow prediction methods consider only the historical traffic flow data itself, ignoring these internal and external influencing factors. Others combine traffic flow data with external factors, but data on external factors are difficult to obtain, and at prediction time the values of those external factors are just as unknown as the traffic flow to be predicted; the only usable inputs are forecasts of the external factors. Using forecast-derived external environmental data that contains random errors to predict future traffic flow only amplifies those errors again, inevitably causing large impacts and fluctuations in the model's prediction accuracy.

Summary of the Invention

1. Technical Problem to Be Solved

Many internal and external factors affect traffic flow. Some traditional traffic flow prediction methods consider only the historical traffic flow data itself, ignoring these internal and external influencing factors. Others combine traffic flow data with external factors, but data on external factors are difficult to obtain, and at prediction time the values of those external factors are just as unknown as the traffic flow to be predicted; the only usable inputs are forecasts of the external factors. Using forecast-derived external environmental data that contains random errors to predict future traffic flow only amplifies those errors again, inevitably causing large impacts and fluctuations in prediction accuracy. To address this problem, the present application provides a long-term and short-term traffic flow prediction model and method based on contextual factors and historical traffic flow data.

2. Technical Solution

To achieve the above purpose, the present application provides a long-term and short-term traffic flow prediction model that comprises, in order, a context factor input layer, a feature learning and pattern recognition layer, and a traffic flow data output layer.

The context factor input layer preprocesses the contextual factors and feeds them into the neural network.

The feature learning and pattern recognition layer transforms the input contextual factors layer by layer to extract hidden patterns and features.

The traffic flow data output layer aggregates and summarizes the patterns and features learned by the preceding hidden layers and applies a nonlinear weighted transformation to obtain the corresponding traffic flow prediction data.

In another embodiment provided by the present application, the context factor input layer uses future contextual factors as input.

In another embodiment, the traffic flow prediction output by the traffic flow data output layer includes both long-term and short-term traffic flow prediction.

In another embodiment, the prediction horizon output by the traffic flow data output layer is 5, 10, 15, 20, or 30 minutes, or 1 hour.

The present application also provides a long-term and short-term traffic flow prediction method comprising the following steps:

Step 1): input the historical traffic flow data and contextual factors into the feature learning and pattern recognition layer for training;

Step 2): use the neural network backpropagation algorithm to iteratively update and test the model;

Step 3): generate a deep belief network model that can express the characteristics of the traffic flow contextual factors;

Step 4): load the trained deep belief network model;

Step 5): feed the prepared future contextual factors into the prediction model in order;

Step 6): output the prediction result through the traffic flow data output layer.
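The six steps above can be sketched as a minimal train/predict workflow. The `DBNStub` class and its trivial mean-flow rule are assumptions standing in for the deep belief network trained by backpropagation that the application actually describes:

```python
# Minimal sketch of the two stages: steps 1-3 produce a trained model from
# historical flows and context factors; steps 4-6 feed prepared future
# context factors in order and output predictions. "DBNStub" is a placeholder.
class DBNStub:
    def fit(self, contexts, flows, epochs=10):
        # Steps 1-3: iterate (here trivially) until a trained model exists.
        self.mean_flow = sum(flows) / len(flows)
        return self

    def predict(self, future_contexts):
        # Steps 5-6: one prediction per prepared future context vector.
        return [self.mean_flow for _ in future_contexts]

model = DBNStub().fit([[1, 8], [1, 9]], [120.0, 180.0])  # training stage
preds = model.predict([[2, 8], [2, 9]])                  # prediction stage
```

Note that the prediction stage consumes only future contextual factors, matching the application's claim of independence from external forecast data.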

In another embodiment, the contextual factors include year, month, day, day of week, festival, holiday, hour, minute, and daily data time point.
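For illustration, every factor in this list can be derived from a timestamp alone. The sketch below assumes a 5-minute slot size and a toy holiday set, and collapses the festival/holiday pair into a single flag; none of these specifics come from the application:

```python
from datetime import datetime

# Hypothetical encoding of the listed contextual factors: year, month, day,
# day of week, holiday flag, hour, minute, and daily data time point.
# HOLIDAYS and the 5-minute slot size are illustrative assumptions.
HOLIDAYS = {(1, 1), (10, 1)}  # e.g. New Year's Day, National Day

def context_vector(ts: datetime, slot_minutes: int = 5) -> list:
    """Turn a timestamp into the context vector fed to the input layer."""
    slot = (ts.hour * 60 + ts.minute) // slot_minutes   # daily time point
    is_holiday = 1 if (ts.month, ts.day) in HOLIDAYS else 0
    return [ts.year, ts.month, ts.day, ts.isoweekday(),
            is_holiday, ts.hour, ts.minute, slot]

vec = context_vector(datetime(2020, 1, 1, 8, 35))
```

Because such a vector is computable for any future date, the model's inputs are always available at prediction time, unlike weather or other external forecasts.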

In another embodiment, the prediction method mines, from historical data, the relationship between traffic flow data within a given time interval and the contextual factors.

In another embodiment, the relationship between the traffic flow data and the contextual factors is mined using a deep belief network model.

In another embodiment, the relationship between the traffic flow data and the contextual factors is mined using a multi-layer supervised learning algorithm.

3. Beneficial Effects

Compared with the prior art, the long-term and short-term traffic flow prediction model and method provided by the present application have the following beneficial effects:

The model improves the accuracy of traffic flow prediction. It is a multi-scale prediction model with high accuracy, good prediction performance, short training time, and strong robustness, and it is unaffected by missing historical data. This is vital for advanced traffic management and traveler route planning.

The model provided by the present application studies training methods for different unit models and optimizes the model structure and parameters to reduce training time.

The prediction method provided by the present application is a long-term and short-term traffic prediction algorithm based on contextual factors and historical traffic flow data.

To address the low accuracy of existing traffic flow prediction, the method proposes a long-term and short-term traffic flow prediction algorithm based on a deep belief network.

The method studies the relationship between traffic flow within a given time interval and combinations of contextual factors.

The method resists the impact of missing historical data on prediction accuracy.

The method studies the structure and optimization of the factor feature extraction network.

The method studies how to implement a multi-time-scale model for long-term and short-term traffic flow prediction.

In the method, the model algorithm parses the traffic flow data to extract the contextual factors that affect traffic flow patterns, builds a contextual-factor traffic flow prediction model, and mines from historical data the complex relationship between contextual factors and traffic flow patterns. Once the model is trained, future contextual factors are fed into it, enabling accurate estimation and prediction of future traffic flow without relying on any external data and resisting the impact of missing historical data on prediction accuracy. Finally, the collected traffic flow data is fed into the model at a given time interval (e.g., 5, 10, or 15 minutes, or 1 hour) to achieve long-term and short-term neural network traffic flow prediction.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of the long-term and short-term traffic flow prediction model of the present application.

In the figure: 1 - context factor input layer; 2 - feature learning and pattern recognition layer; 3 - traffic flow data output layer.

Detailed Description of Embodiments

Specific embodiments of the present application are described in detail below with reference to the accompanying drawings, so that those skilled in the art can clearly understand and implement the present application. Without departing from the principles of the present application, features of different embodiments may be combined to obtain new embodiments, or certain features of certain embodiments may be replaced to obtain other preferred embodiments.

Referring to FIG. 1, the present application provides a long-term and short-term traffic flow prediction model that includes a context factor input layer 1, a feature learning and pattern recognition layer 2, and a traffic flow data output layer 3.

The model is divided into three parts. The leftmost part is the context factor input layer, which preprocesses the contextual factors and feeds them into the neural network. The middle part is the feature transformation and pattern recognition layer, the core of the model, also called the hidden layers; here the input contextual factors are transformed layer by layer to extract hidden patterns and features. The last part is the predictor, also called the output layer: a simple artificial neural network layer that aggregates and summarizes the patterns and features learned by the preceding hidden layers and applies a nonlinear weighted transformation to obtain the corresponding traffic flow prediction data. It is the output part of the model.
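This three-part structure can be caricatured as a stack of fully connected tanh layers between the context input and a single flow output. The layer sizes and random weights below are purely illustrative assumptions, an untrained stand-in rather than the patent's trained network:

```python
import math
import random

# Toy stand-in for the three parts: a context vector enters, is transformed
# layer by layer by nonlinear hidden layers (feature learning and pattern
# recognition), and a single output node yields the flow prediction.
# Layer sizes and random weights are illustrative assumptions only.
random.seed(0)

def dense(x, n_out):
    """One fully connected layer with a tanh nonlinearity."""
    return [math.tanh(sum(random.uniform(-1, 1) * xi for xi in x))
            for _ in range(n_out)]

def forward(context, hidden_sizes=(8, 4)):
    h = context
    for n in hidden_sizes:        # feature learning / pattern recognition
        h = dense(h, n)
    return dense(h, 1)[0]         # traffic flow data output node

y = forward([2020, 1, 1, 3, 1, 8, 35, 103])
```

In the real model the weights would be learned by backpropagation rather than drawn at random, but the data path (input layer, stacked hidden layers, single output node) is the same.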

Further, the context factor input layer uses future contextual factors as input. This enables accurate estimation and prediction of future traffic flow without relying on any external data and resists the impact of missing historical data on prediction accuracy.

Further, the traffic flow prediction output by the traffic flow data output layer includes both long-term and short-term traffic flow prediction.

Further, the prediction horizon output by the traffic flow data output layer is 5, 10, 15, 20, or 30 minutes, or 1 hour. Multi-scale prediction is used: prediction models at different scales can be selected for long-term or short-term traffic flow prediction according to the prediction requirements.
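One way to serve these horizons from a single 5-minute base series is to sum consecutive counts into coarser windows. This aggregation step is an assumed data-preparation detail, not something the application prescribes:

```python
# Illustrative aggregation of 5-minute flow counts into the coarser horizons
# the output layer supports (10, 15, 20, 30, or 60 minutes). The choice of
# non-overlapping windows is an assumption for this sketch.
def aggregate(counts_5min, horizon_minutes):
    step = horizon_minutes // 5
    return [sum(counts_5min[i:i + step])
            for i in range(0, len(counts_5min) - step + 1, step)]

hourly = aggregate(list(range(12)), 60)   # twelve 5-min counts -> one hour
```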

The present application also provides a long-term and short-term traffic flow prediction method comprising the following steps:

(1) Model training stage

Step 1): input the historical traffic flow data and contextual factors into the feature learning and pattern recognition layer 2 for training;

Step 2): use the neural network backpropagation algorithm to iteratively update and test the model;

Step 3): generate a deep belief network model that can express the characteristics of the traffic flow contextual factors;

(2) Model prediction stage

Step 4): load the trained deep belief network model;

Step 5): feed the prepared future contextual factors into the prediction model in order;

Step 6): output the prediction result through the traffic flow data output layer.

Further, the contextual factors include year, month, day, day of week, festival, holiday, hour, minute, and daily data time point.

Further, the prediction method mines, from historical data, the relationship between traffic flow data within a given time interval and the contextual factors.

Further, the relationship between the traffic flow data and the contextual factors is mined using a deep belief network model.

Further, the relationship between the traffic flow data and the contextual factors is mined using a multi-layer supervised learning algorithm.

1. High accuracy

The deep-belief-network-based long-term and short-term traffic flow prediction algorithm considers the influence of contextual factors on traffic flow values, studies the traffic flow data thoroughly, and uses a deep belief network model to mine the latent relationship between traffic flow values and contextual factors. The method proposed in this application outperforms traditional traffic flow prediction methods in prediction accuracy.

2. Unaffected by missing values in historical data

The deep-belief-network-based long-term and short-term traffic flow prediction algorithm feeds future contextual factors into the model, achieving accurate estimation and prediction of future traffic flow without relying on any external data.

3. Multi-scale prediction

The deep-belief-network-based long-term and short-term traffic flow prediction algorithm uses multi-scale model prediction. Unlike previous prediction models, which can only be trained and used with a single time interval, this model, once trained, allows the time interval to be chosen as needed, giving multi-scale predictions of future traffic flow.

This application mainly considers the influence of contextual factors on traffic flow values and uses a multi-layer supervised learning algorithm to mine the relationship between traffic flow values within a given time interval and the contextual factors. Its distinguishing feature is that, for the first time, future contextual factors are fed into the model to achieve accurate estimation and prediction of future traffic flow without relying on any external data. Combined with the unique strengths of deep belief networks in nonlinear transformation, the features of the contextual factors are abstracted at a higher level, producing the predicted output of future traffic flow.

On the basis of analyzing the long-term trends, periodic characteristics, and external factors of traffic flow data, this application focuses on the correlation between traffic flow time series data and their inherent data attributes. Using time series analysis and nonlinear feature-fitting models, it establishes a multi-time-scale long-term and short-term traffic flow prediction model and mines the latent relationship between traffic flow and contextual factors from historical traffic data, so as to make accurate and reliable predictions of future traffic conditions and trends.

Historical traffic flow data combined with contextual factors are used for multi-source, multi-time-scale evaluation to verify the correctness and robustness of the proposed model and its adaptability to long-term and short-term traffic flow prediction. Multi-scale, multi-fold cross-validation results are also compared against those of common, typical traffic flow prediction methods to verify the accuracy, generality, and advancement of the proposed model.

This application proposes a deep-belief-network long-term and short-term traffic prediction method based on contextual factors and historical traffic flow data. Its main idea is that, by mining the correspondence between traffic flow data and contextual factors, it outperforms traditional traffic prediction in accuracy. The overall framework of the proposed contextual-feature traffic flow prediction model is shown in FIG. 1.

Deep neural networks are often recommended for mining latent relationships in massive multidimensional historical datasets. The deep belief network, a type of deep neural network (DNN), is a deep learning architecture in which the computational model is composed of multiple processing layers so that it can learn data representations with multiple levels of abstraction. To mine the complex nonlinear relationship between contextual factors and traffic flow patterns, this application uses the classical deep belief network as the modeling tool.

A deep belief network is a single-layer artificial neural network extended in depth; "deep" simply means more than one layer. To extract and transform complex features, the deep belief network cascades multiple single-layer neural network layers, each with a nonlinear processing function. The nodes of each layer are trained on a different set of patterns based on the output of the previous layer, and each subsequent layer uses the transformed output of the previous layer as its input; the contextual factors undergo different nonlinear transformations and recombinations at each layer and are passed on layer by layer. As the deep belief network advances to deeper levels, the patterns and features it can recognize become more complex, uncovering deeper hidden abstract relationships. Like a single-layer neural network, the deep belief network is a fully connected feedforward network. Its model can be viewed as a simple mathematical function mapping f: Z → Ŷ, or as a distribution of Z over Ŷ. Here Z is the independent variable, the set of contextual factors extracted from the traffic flow data; Ŷ is the dependent variable, the future traffic flow parameters obtained through the function mapping, i.e., the predicted values the model should output. Let z be a contextual factor vector, z ∈ Z. The network function f(z) of a neuron is defined as a composition of transformation functions g_i(z); the functions in these compositions can themselves be decomposed further, making complex network structures easier to express. The arrows in the formula describe the dependencies between functions, which can conveniently be written as composite functions. The composite function most widely used in neural networks is the nonlinear weighted sum, as shown in formula (1):

f(z) = σ( Σ_j ( w_j · g_j(z) + b_j ) )    (1)

where w_j is a weight; g_j is one element of a function vector g = (g_1, g_2, ..., g_n); b_j is a bias; and σ denotes a predetermined transformation function, also called the activation function. An important property of an activation function is that it provides a smooth transition as the input value changes. The activation function used here is tanh, which produces a nonlinear value compressed between -1 and 1.
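Formula (1) can be transcribed directly as code. The component functions g_j below are illustrative coordinate picks, an assumption for the sketch rather than anything specified by the application:

```python
import math

# Formula (1): sigma( sum_j ( w_j * g_j(z) + b_j ) ) with sigma = tanh.
# The g_j used here (simple coordinate picks) are illustrative assumptions.
def neuron(z, weights, biases, gs):
    s = sum(w * g(z) + b for w, g, b in zip(weights, gs, biases))
    return math.tanh(s)

gs = [lambda z: z[0], lambda z: z[1]]                   # g_1, g_2
out = neuron([0.5, -0.2], [1.0, 2.0], [0.0, 0.0], gs)   # tanh(0.5 - 0.4)
```

The tanh output stays strictly between -1 and 1, giving the smooth transition described above.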

The structure of the deep belief network prediction model is divided into three parts. The leftmost part is the context factor input layer, which preprocesses the contextual factors and feeds them into the neural network. The middle part is the feature transformation and pattern recognition layer, the core of the model, also called the hidden layers; here the input contextual factors are transformed layer by layer to extract hidden patterns and features. The last part is the predictor, also called the output layer: a simple artificial neural network layer that aggregates and summarizes the patterns and features learned by the preceding hidden layers and applies a nonlinear weighted transformation to obtain the corresponding traffic flow prediction data. It is the output part of the model.

Each input node of the deep belief network prediction model corresponds to one contextual factor z_i (z_i ∈ Z) in the contextual factor vector, and the output node of the predictor corresponds to the traffic flow data ŷ. As with the mapping above, the entire deep belief network model can be expressed as formula (2):

ŷ = σ(W·z + b)    (2)

where z is the contextual factor vector, W the weight parameters, and b the bias parameters. In this function the deep belief network model is treated as a single unit that performs a nonlinear transformation, with W and b regarded as the weight and bias parameters of the entire model, which greatly simplifies the description and understanding of the model's function. Since a deep belief network is itself a stack of artificial neural network layers, the same function can also represent a single layer, or even a single neuron, if W and b are given the corresponding meanings and scopes. After the contextual factor vector is fed into the network, the internal states of the neurons and layers change according to their weight and bias parameters, and feature outputs are produced through the activation functions. The deep belief network is realized by connecting the outputs of these neurons to the inputs of other neurons, passing values forward layer by layer and thus forming a directed weighted graph.

The weight parameters of the network cannot be fixed when the model is built; because of their complex relationships and large number, they must be continuously updated and corrected through learning, a process also known as model training. Since the weight parameters of the deep belief network model cannot be determined one by one at construction time, the best approach is to start from random values and then iterate, optimizing against the error between the model output and the expected value until that error reaches a minimum or no longer changes. When the training error falls below a preset minimum (for example 10^-5 or smaller), the iteration terminates, the model's parameters are determined, and training is complete. Of course, the training error may also stop decreasing without ever reaching the preset value; in that case the cause should be sought in both the model structure and the data set, and training should be restarted after correcting the model or updating the data. Seen this way, once the structure of a deep belief network is fixed, learning (or training) the network simply means using an iterative algorithm to keep reducing the model error, optimizing and determining the values of its weight parameters.
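The layer-by-layer forward pass described above can be sketched in a few lines. The layer sizes, random initial weights, and function names below are illustrative assumptions, not values from the patent:

```python
import math, random

random.seed(0)

def layer(z, W, b):
    # one artificial neural network layer: tanh of (weighted sum + bias) per neuron
    return [math.tanh(sum(w * x for w, x in zip(row, z)) + bi)
            for row, bi in zip(W, b)]

# hypothetical stack: 8 contextual factors -> two hidden layers -> 1 flow value
sizes = [8, 16, 16, 1]
params = [([[0.1 * random.uniform(-1, 1) for _ in range(n)] for _ in range(m)],
           [0.0] * m)
          for n, m in zip(sizes[:-1], sizes[1:])]

def forward(z, params):
    # each layer's outputs feed the next layer's inputs, front to back
    for W, b in params:
        z = layer(z, W, b)
    return z

y_hat = forward([random.uniform(-1, 1) for _ in range(8)], params)
```

The random weights here are exactly the kind of initial values that training, as described next, would iteratively correct.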

(1) The loss function of the model

In supervised learning, after the contextual data enters the deep belief network model and passes through a series of abstraction, transformation, and extraction steps in the hidden layers, a predicted value is finally emitted from the output layer. The contextual factors exist in association with the traffic flow data, so an error inevitably exists between the predicted value output by the model and the real traffic flow data. Expressing this error as a function gives the loss function used in model training.

The loss function expresses the degree of inconsistency between the model's predicted values and the true values; it reflects how well the prediction model fits the real data. The worse the fit, the larger the loss value should be, so that the loss can act effectively during optimization. The loss function must also be differentiable everywhere over its range and have a reasonable gradient: large when the loss value is large, noticeably smaller when the loss value is small, and zero when the value is zero. In nonlinear regression applications, the most commonly used loss function is the mean squared error:

E = (1/2) Σi (yi − ŷi)²    (3)

where y is the true value and ŷ is the predicted value. With the loss function in place, the overall loss of each contextual-factor prediction model can be computed conveniently, and learning the model becomes an optimization problem whose goal is to minimize the loss function.
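A minimal sketch of this loss (assuming the common 1/2 convention, which cancels the factor of 2 under differentiation):

```python
def mse_loss(y_true, y_pred):
    # E = 1/2 * sum_i (y_i - y_hat_i)^2
    return 0.5 * sum((y - p) ** 2 for y, p in zip(y_true, y_pred))

assert mse_loss([1.0, 2.0], [1.0, 2.0]) == 0.0   # perfect fit -> zero loss
assert mse_loss([1.0], [0.0]) == 0.5             # error of 1 -> loss 1/2
```

The loss grows with the distance between prediction and truth, which is the monotonicity property the text requires of a usable loss function.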

(2) Back propagation of the response error and parameter update

The loss is caused by the propagation of errors, so the overall loss of the model may also be called the response error, or simply the error. The error produced by the deep belief network model propagates forward, layer by layer, to the output layer, and the overall loss can be computed from the loss function. However, this loss is the accumulation of errors contributed by the adjustable parameters of many neurons, combined and transformed many times along the way, so at this point it is not yet known how much error each individual parameter has contributed.

Back propagation (BP) is a common method for training artificial neural networks, used together with gradient descent. In each iteration, the method performs a forward computation of the model's response and a backward propagation of its error, computing the loss gradient of every weight parameter from back to front, and then uses the computed gradients to update the corresponding weight parameters, thereby adjusting and optimizing the model's parameters.

The error back-propagation algorithm generally consists of three stages. The first is the excitation propagation stage: the forward propagation of the network's excitation response, in which the weighted sums and excitation responses are computed layer by layer according to the model's structure and initial parameters, and the model's response error is computed from the loss function. The second is the gradient computation stage: by differentiating the loss function, the descent gradient of every parameter is computed in the backward direction. The third is the weight update stage, in which the corresponding model parameters are updated according to the computed gradient values.

In the gradient computation stage, since the error arises from changes in the parameters, the model parameters are treated as the independent variables of the loss function when computing its gradient. Let oi be the excitation response of the i-th neuron in some layer of the network; from formula 4:

oi = φ(neti) = φ( Σj wji·oj + bi )

where φ is the activation function, neti is the weighted sum of neuron i, bi is the bias parameter of neuron i, and wji is the weight parameter of neuron i connected to neuron j in the previous layer. In the formula, i indexes the neurons of the current layer and j indexes the neurons of the previous layer, so oj denotes the excitation response of previous-layer neuron j; in the first layer of the model oj = zj, and in the last layer oi = ŷ. Applying the chain rule layer by layer from back to front, the derivative with respect to the weight parameter wji is:

∂E/∂wji = (∂E/∂oi) · (∂oi/∂neti) · (∂neti/∂wji)

where E is the response error of neuron i, and the other symbols are defined as in the formula above. Here,

∂neti/∂wji = oj

∂oi/∂neti = φ'(neti)

where φ' is the derivative of the activation function. For the output layer, the response error is the overall loss of the model and can be computed directly from the loss function; the derivative of the output layer's response error with respect to its excitation response can be computed directly as:

∂E/∂oi = oi − yi

The neurons in the hidden layers, however, cannot obtain their response errors directly; these must be computed by passing the output layer's response error backward, layer by layer. Let L be the set of (indices of) all neurons that receive input from neuron i. The response error of the current neuron is then back-propagated from these connected neurons and is numerically equal to the sum of their errors. Taking the total differential of the excitation response, the derivative of neuron i's response error with respect to its excitation response can be written as:

∂E/∂oi = Σ(l∈L) (∂E/∂ol) · φ'(netl) · wil

Comparing the two sides of this formula reveals an obvious recursion: once the derivatives with respect to the excitation responses of all neurons in the next layer are available, the corresponding derivatives for the neurons of the current hidden layer can be computed. Substituting into the equations above yields the gradient formula for the weight parameters W:

∂E/∂wji = δi · oj

where:

δi = φ'(neti) · (oi − yi) for a neuron of the output layer;
δi = φ'(neti) · Σ(l∈L) wil · δl for a neuron of a hidden layer.
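The chain-rule gradients above can be verified numerically on a toy network. The sketch below assumes a 2-3-1 layout with random weights and zero biases (all names and sizes are illustrative, not from the patent), computes the output-layer and hidden-layer deltas exactly as derived, and checks one analytic gradient against a central finite difference:

```python
import math, random

def tanh_prime(x):
    # derivative of tanh: 1 - tanh(x)^2
    return 1.0 - math.tanh(x) ** 2

random.seed(1)
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]  # hidden, 2 -> 3
W2 = [[random.uniform(-1, 1) for _ in range(3)]]                    # output, 3 -> 1

def forward(z):
    net1 = [sum(w * x for w, x in zip(row, z)) for row in W1]
    o1 = [math.tanh(n) for n in net1]
    net2 = [sum(w * x for w, x in zip(row, o1)) for row in W2]
    o2 = [math.tanh(n) for n in net2]
    return net1, o1, net2, o2

def loss(z, y):
    # E = 1/2 * sum (y - y_hat)^2
    return 0.5 * sum((yi - oi) ** 2 for yi, oi in zip(y, forward(z)[3]))

def gradients(z, y):
    net1, o1, net2, o2 = forward(z)
    d2 = [tanh_prime(n) * (o - t) for n, o, t in zip(net2, o2, y)]   # output deltas
    d1 = [tanh_prime(net1[i]) * sum(W2[l][i] * d2[l] for l in range(len(d2)))
          for i in range(len(net1))]                                  # hidden deltas
    gW2 = [[d * o for o in o1] for d in d2]   # dE/dW2[l][i] = delta_l * o1_i
    gW1 = [[d * x for x in z] for d in d1]    # dE/dW1[i][j] = delta_i * z_j
    return gW2, gW1

z, y = [0.3, -0.7], [0.5]
gW2, gW1 = gradients(z, y)

# central finite difference on one weight agrees with the analytic gradient
eps = 1e-6
W1[0][0] += eps; E_plus = loss(z, y)
W1[0][0] -= 2 * eps; E_minus = loss(z, y)
W1[0][0] += eps
assert abs((E_plus - E_minus) / (2 * eps) - gW1[0][0]) < 1e-8
```

The agreement between the finite-difference estimate and the delta-form gradient is precisely the recursion the text derives.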

Now that the gradient of every neuron's weight parameter can be computed, gradient descent can be used to update the weight values. The update formula for the weight parameters W is:

wij(t+1) = wij(t) + Δwij + ξ(t)

Δwij = −η · ∂E/∂wij

where t is the iteration number, η is the learning rate (usually a value less than 1), and ξ(t) is a random variable.
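The update rule w ← w + Δw with Δw = −η·∂E/∂w can be demonstrated on a one-parameter toy problem; the objective and learning rate are illustrative, and the ξ(t) noise term is omitted in this sketch:

```python
# minimize E(w) = (w - 3)^2, whose gradient is dE/dw = 2 * (w - 3)
eta = 0.1                      # learning rate, a value less than 1
w = 0.0
for t in range(100):
    grad = 2.0 * (w - 3.0)
    w += -eta * grad           # delta_w = -eta * dE/dw
assert abs(w - 3.0) < 1e-6     # converges to the minimizer w = 3
```

Each step moves w against the gradient, so the error shrinks geometrically until it effectively stops changing, as the training description above requires.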

The model has one more parameter to learn: the bias parameter b. As with the weight parameters W, the chain rule is applied layer by layer from back to front. The derivative with respect to the bias parameter bi is:

∂E/∂bi = (∂E/∂oi) · (∂oi/∂neti) · (∂neti/∂bi)

where:

∂neti/∂bi = 1

∂oi/∂neti = φ'(neti)

Following the preceding derivation and formula 10, taking the total differential of the excitation response, the derivative of neuron i's response error with respect to its excitation response can again be written as:

∂E/∂oi = Σ(l∈L) (∂E/∂ol) · φ'(netl) · wil

Substituting into the equations above yields the gradient formula for the model bias b:

∂E/∂bi = δi

where:

δi = φ'(neti) · (oi − yi) for a neuron of the output layer;
δi = φ'(neti) · Σ(l∈L) wil · δl for a neuron of a hidden layer.

Now that the gradient of every neuron's bias parameter has been computed, gradient descent can be used to update the bias values. The update formula is:

bij(t+1) = bij(t) + Δbij + ξ(t)

Δbij = −η · ∂E/∂bij

where t is the iteration number, η is the learning rate, and ξ(t) is a random variable.

Since different activation functions may be chosen in practice, the derivative of the activation function φ is denoted φ' in the formulas above and should be worked out for the specific function used. For example, the derivative of the hyperbolic tangent used in this model is tanh'(x) = 1 − tanh²(x).
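The stated derivative tanh'(x) = 1 − tanh²(x) is easy to confirm against a finite difference:

```python
import math

def tanh_prime(x):
    # derivative of the hyperbolic tangent: 1 - tanh(x)^2
    return 1.0 - math.tanh(x) ** 2

x, h = 0.8, 1e-6
numeric = (math.tanh(x + h) - math.tanh(x - h)) / (2 * h)
assert abs(numeric - tanh_prime(x)) < 1e-9
```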

Gradient descent is an optimization algorithm for finding the optimal model parameters: by computing the response error of the training set on the model, it drives the model error downward along the direction of steepest descent until the error stops decreasing or the parameters that minimize the error on the training set are found. Mini-batch gradient descent is a gradient descent algorithm positioned between online learning and offline learning. Combining the advantages of batch gradient descent and stochastic gradient descent, it splits the training data set into smaller batches and applies the offline learning method to each batch, which at a macro level resembles online learning.

With mini-batch gradient descent, the model is updated frequently, which speeds up training, reduces the noise contributed by individual samples, shrinks the variance of the gradient, helps the model avoid local minima, and strengthens convergence. Mini-batch gradient descent strikes a balance between the robustness of stochastic gradient descent and the efficiency of batch gradient descent, provides a more efficient computation process, and improves the utilization of storage space; it is the most commonly used gradient descent method in deep learning.

Mini-batch stochastic gradient descent introduces a hyperparameter, batch_size, into model training; it can be assigned empirically or tuned during the actual training process. A small value gives a process similar to online learning, converging quickly at the cost of introducing noise into training; a large value gives a process similar to offline learning, estimating the error gradient accurately but converging slowly. Formula (3) can then be written as the following matrix expression:

Ŷ[n×o] = σ( Z[n×m] · W[m×o] + B )

where n is the number of samples in each training batch (256 is chosen empirically as the batch size in this work), m is the dimension of the contextual factors, and o is the dimension of the predicted values. Although a somewhat larger batch size costs more memory, feeding the data to the model batch by batch makes effective use of the computer's parallel processing capability, improving computational parallelism and machine throughput and shortening training time while maintaining good generalization performance.
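Splitting a training set into mini-batches of 256 (the batch size chosen in the text) can be sketched as follows; the data and helper names here are synthetic and illustrative:

```python
import random

def minibatches(samples, batch_size, seed=0):
    """Shuffle the training set, then yield it in consecutive batches."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]

# synthetic training set: (contextual factor vector, flow value) pairs
data = [([float(i)], 2.0 * i) for i in range(1000)]
batches = list(minibatches(data, batch_size=256))

# 1000 samples with batch_size 256 -> three full batches plus a remainder of 232
assert [len(b) for b in batches] == [256, 256, 256, 232]
```

One gradient update per batch gives the frequent-update behavior the text credits for faster, more stable convergence.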

Although the present application has been described above with reference to specific embodiments, those skilled in the art will understand that many modifications may be made to the configurations and details disclosed herein within the principles and scope of the present disclosure. The scope of protection of the present application is determined by the appended claims, which are intended to cover all modifications encompassed by the literal meaning or the scope of equivalents of the technical features in the claims.

Claims (9)

1. A long-short term traffic flow prediction model is characterized in that: the model sequentially comprises a context factor input layer, a feature learning and pattern recognition layer and a traffic flow data output layer;
the context factor input layer is used for inputting the preprocessed context factors into the neural network;
the characteristic learning and pattern recognition layer is used for transforming the input context factors layer by layer and extracting implicit patterns and characteristics;
and the traffic flow data output layer is used for aggregating and summarizing the modes and characteristics obtained by learning of the previous hidden layer, and obtaining corresponding traffic flow prediction data after nonlinear weighted transformation.
2. The long and short term traffic flow prediction model according to claim 1, characterized in that: the contextual factor input layer takes future contextual factors as input.
3. The long and short term traffic flow prediction model according to claim 1, characterized in that: the traffic flow prediction output by the traffic flow data output layer comprises long-time traffic flow prediction and short-time traffic flow prediction.
4. The long and short term traffic flow prediction model according to claim 3, characterized in that: the prediction horizon of the traffic flow output by the traffic flow data output layer comprises 5 minutes, 10 minutes, 15 minutes, 20 minutes, 30 minutes or 1 hour.
5. A long-term and short-term traffic flow prediction method is characterized in that: the method comprises the following steps:
step 1): inputting historical traffic flow data and context factors into a feature learning and pattern recognition layer for training;
step 2): updating iteration and testing of the model are continuously carried out by utilizing a neural network back propagation algorithm;
step 3): generating a deep belief network model capable of expressing the characteristics of the traffic flow context factors;
step 4): loading a trained deep belief network model;
step 5): sending the prepared context factors into a prediction model in sequence;
step 6): and outputting the predicted result through the traffic flow data output layer.
6. The long-and-short-term traffic flow prediction method according to claim 5, characterized in that: the contextual factors include year, month, day, week, holiday, hour, minute, and daily data time points.
7. The long-and-short-term traffic flow prediction method according to claim 5, characterized in that: the prediction method mines, from historical data, the relationship between traffic flow data and contextual factors over a given time interval.
8. The long-and-short-term traffic flow prediction method according to claim 7, characterized in that: and mining the relation between the traffic flow data and the contextual factors by adopting a deep belief network model.
9. The long-and-short-term traffic flow prediction method according to claim 7, characterized in that: and mining the relation between the traffic flow data and the contextual factors by adopting a multi-layer supervised learning algorithm.
CN202010289673.9A 2020-04-14 2020-04-14 A long-term and short-term traffic flow prediction model and method Pending CN111524348A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010289673.9A CN111524348A (en) 2020-04-14 2020-04-14 A long-term and short-term traffic flow prediction model and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010289673.9A CN111524348A (en) 2020-04-14 2020-04-14 A long-term and short-term traffic flow prediction model and method

Publications (1)

Publication Number Publication Date
CN111524348A true CN111524348A (en) 2020-08-11

Family

ID=71902178

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010289673.9A Pending CN111524348A (en) 2020-04-14 2020-04-14 A long-term and short-term traffic flow prediction model and method

Country Status (1)

Country Link
CN (1) CN111524348A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762578A (en) * 2020-12-28 2021-12-07 京东城市(北京)数字科技有限公司 Training method and device of flow prediction model and electronic equipment
CN114023074A (en) * 2022-01-10 2022-02-08 佛山市达衍数据科技有限公司 Traffic jam prediction method, device and medium based on multiple signal sources
CN118587891A (en) * 2024-07-30 2024-09-03 山东中创软件商用中间件股份有限公司 Vehicle flow prediction method, device, equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7948400B2 (en) * 2007-06-29 2011-05-24 Microsoft Corporation Predictive models of road reliability for traffic sensor configuration and routing
JP4790864B2 (en) * 2007-06-28 2011-10-12 マイクロソフト コーポレーション Learning and reasoning about the situation-dependent reliability of sensors
CN105761488A (en) * 2016-03-30 2016-07-13 湖南大学 Real-time limit learning machine short-time traffic flow prediction method based on fusion
CN106935034A (en) * 2017-05-08 2017-07-07 西安电子科技大学 Towards the regional traffic flow forecasting system and method for car networking
CN107103754A (en) * 2017-05-10 2017-08-29 华南师范大学 A kind of road traffic condition Forecasting Methodology and system
CN107730887A (en) * 2017-10-17 2018-02-23 海信集团有限公司 Realize method and device, the readable storage medium storing program for executing of traffic flow forecasting
CN108960496A (en) * 2018-06-26 2018-12-07 浙江工业大学 A kind of deep learning traffic flow forecasting method based on improvement learning rate
CN109242140A (en) * 2018-07-24 2019-01-18 浙江工业大学 A kind of traffic flow forecasting method based on LSTM_Attention network
CN109460855A (en) * 2018-09-29 2019-03-12 中山大学 A kind of throughput of crowded groups prediction model and method based on focus mechanism
CN110223510A (en) * 2019-04-24 2019-09-10 长安大学 A kind of multifactor short-term vehicle flowrate prediction technique based on neural network LSTM

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LICHENG QU等: "Daily long-term traffic flow forecasting based on a deep neural network", 《EXPERT SYSTEMS WITH APPLICATIONS》 *
YOUSEF-AWWAD DARAGHMI 等: "Negative Binomial Additive Models for Short-Term Traffic Flow Forecasting in Urban Areas", 《IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS》 *
赵庶旭 等: "一种改进的深度置信网络在交通流预测中的应用", 《计算机应用研究》 *

Similar Documents

Publication Publication Date Title
CN109685252B (en) Building energy consumption prediction method based on cyclic neural network and multi-task learning model
CN110647980A (en) Time sequence prediction method based on GRU neural network
CN112990556A (en) User power consumption prediction method based on Prophet-LSTM model
CN111798991B (en) LSTM-based method for predicting population situation of new coronary pneumonia epidemic situation
Dash et al. MDHS–LPNN: a hybrid FOREX predictor model using a Legendre polynomial neural network with a modified differential harmony search technique
Yu et al. Error correction method based on data transformational GM (1, 1) and application on tax forecasting
CN111027772A (en) Multi-factor short-term load forecasting method based on PCA-DBILSTM
CN111626785A (en) A CNN-LSTM network fund price prediction method based on combined attention
CN116721537A (en) Urban short-time traffic flow prediction method based on GCN-IPSO-LSTM combination model
Wang et al. The influence of international oil prices on the exchange rates of oil exporting countries: Based on the hybrid copula function
CN103559537B (en) Based on the template matching method of error back propagation in a kind of out of order data stream
Das et al. An optimized feature reduction based currency forecasting model exploring the online sequential extreme learning machine and krill herd strategies
CN111524348A (en) A long-term and short-term traffic flow prediction model and method
CN111860989A (en) A short-term traffic flow prediction method based on LSTM neural network optimization based on ant colony algorithm
Stergiou et al. Application of deep learning and chaos theory for load forecasting in Greece
CN115018193A (en) Time series wind energy data prediction method based on LSTM-GA model
Lv et al. An improved long short-term memory neural network for stock forecast
Erdogan et al. Forecasting Euro and Turkish Lira Exchange Rates with Artificial Neural Networks (ANN)
CN116227180A (en) Data-driven-based intelligent decision-making method for unit combination
Karny et al. Dealing with complexity: a neural networks approach
Espinosa et al. Surrogate-assisted multi-objective evolutionary feature selection of generation-based fixed evolution control for time series forecasting with LSTM networks
Musakulova et al. Synthesis of the backpropagation error algorithm for a multilayer neural network with nonlinear synaptic inputs
CN113705932B (en) Short-term load prediction method and device
Wang et al. A new interactive model for improving the learning performance of back propagation neural network
Rajkumar et al. A novel method for rainfall prediction and classification using neural networks

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200811

WD01 Invention patent application deemed withdrawn after publication