CN105206040B - A bus bunching prediction method based on IC card data - Google Patents

Info

Publication number
CN105206040B
Authority
CN
China
Prior art keywords
bus
train
data
target
station
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201510483302.3A
Other languages
Chinese (zh)
Other versions
CN105206040A (en
Inventor
马晓磊
陈栋伟
于海洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN201510483302.3A priority Critical patent/CN105206040B/en
Publication of CN105206040A publication Critical patent/CN105206040A/en
Application granted granted Critical
Publication of CN105206040B publication Critical patent/CN105206040B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Traffic Control Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a bus bunching prediction method based on IC card data, belonging to the technical field of public transport information processing. The method comprises bus IC card data collection, data processing, detection of actual bus bunching, preparation of training data, and bus bunching prediction, where the prediction uses the least squares support vector machine (LS-SVM) algorithm. By working from bus IC card data, the invention extracts a large amount of passenger information across many trips without requiring an on-board GPS system, which is convenient and reduces data processing cost. The LS-SVM approach predicts bus bunching quickly and effectively, so that passengers can better understand how buses are running and plan their travel time, and operators can adjust departure intervals in time to improve the level of bus service. The data processing is simple, the cost is low, and the prediction accuracy is relatively high.

Description

A method for predicting bus bunching based on IC card data

Technical field

The invention relates to the technical field of public transport information processing, and in particular to a bus bunching prediction method based on IC card data.

Background

In actual bus operation, buses do not arrive at stops regularly because of traffic congestion, dwell time at stops, and variation in the number of boarding and alighting passengers. Especially during peak hours, passengers often wait ten minutes or longer at a stop without seeing a bus, and when one finally comes, several arrive at the same time, usually with unevenly distributed passenger loads. This lowers the level of bus service and creates safety hazards.

In fact, a delay of one bus at one stop may increase the time it takes to reach the next stop, which in turn increases the number of waiting passengers and their waiting time at that stop and further lengthens the bus's delay. Meanwhile, the following bus carries fewer passengers, loses less time at stops, and closes the gap to the bus ahead. This works like a snowball effect: further along the same route, the two buses are very likely to meet at some stop. This phenomenon is called bus bunching. Predicting bus bunching can therefore reduce passenger waiting time, improve the level of bus service, and raise the public transport mode share.

In recent years, bus arrival predictions have appeared at bus stops in some large cities (such as Nanjing in Jiangsu and Hangzhou in Zhejiang), but very little literature addresses the prediction of bus bunching. Moreover, current bus arrival predictions all rely on on-board GPS systems and consider only a single vehicle, reporting its distance from the stop and its estimated arrival time. This gives passengers some reference, but during peak hours severe road congestion leads to bus bunching, and a following bus may enter the stop before the one ahead, so the predicted arrival time does not match the passenger's actual waiting time. In addition, on-board GPS systems require large storage space and have low positioning accuracy. A better method is needed to solve these problems.

Summary of the invention

In view of the above problems, the invention provides a high-accuracy, real-time, dynamic bus bunching prediction method based on IC card data that fully considers the factors relevant to a trip's arrival at the downstream stop. Working from bus IC card data and taking the passenger's perspective, the invention predicts the arrival interval between two adjacent buses and whether they bunch, so that passengers can better understand how buses are running, plan their travel time, and travel more efficiently. Operators can likewise adjust departure intervals in time, avoid bus bunching, and improve the level of bus service.

In the described method, the trip ID, route ID, stop ID, arrival time, and boarding and alighting passenger counts are extracted for many trips at two adjacent stops of the same bus route. First, outliers whose trip IDs do not correspond between the two stops are removed, leaving data whose trip IDs match one to one. The headways at the second stop are then computed with the trips arranged in the trip order of the first stop, and the sign of each headway indicates whether bunching actually occurred at the second stop: a positive headway means no bunching occurred, and a negative headway means bunching occurred.

To predict whether a given trip will bunch on arrival at the second stop, a small sample is extracted for each day of the training period from the route IDs, trip IDs, arrival times, and passenger counts above. Each sample includes the travel time between the two stops, the trip's headway at the first stop, the boarding and alighting counts at the first stop for the trip and for the adjacent preceding trip, and the boarding and alighting counts of the adjacent preceding trip at the second stop. The daily small samples are combined into one large sample, a prediction model is built from it, and the least squares support vector machine algorithm is used to predict whether the trip will bunch on arrival at the second stop.

The advantages of the invention are:

1. The invention works from bus IC card data and extracts a large amount of passenger information across many trips. It needs no on-board GPS system, is convenient, and reduces data processing cost.

2. The least squares support vector machine method predicts bus bunching quickly and effectively, so that passengers can better understand how buses are running and plan their travel time, and operators can adjust departure intervals in time to improve the level of bus service.

3. The invention considers multiple factors, including boarding and alighting counts, arrival times, the travel time between the two stops, and the headway between two adjacent trips. The data processing is simple, the cost is low, and the prediction accuracy is relatively high.

Description of the drawings

Fig. 1 is a schematic diagram of the principle of the bus bunching prediction method based on IC card data according to the invention;

Fig. 2 is a flow chart of the bus bunching prediction method based on IC card data according to the invention.

Detailed description

The invention is described in further detail below with reference to the drawings and an embodiment, so that those skilled in the art can implement it by following the description.

The invention provides a bus bunching prediction method based on IC card data, comprising the following steps:

Step 1, bus IC card data collection: Bus IC card swipe records are obtained in real time over a 3G transmission network, and a database of bus routes and vehicle operation information is built. The bus IC card data include the trip ID, route ID, stop ID, arrival time, date, and boarding and alighting passenger counts. From the network-wide IC card data collected in this way, one bus route that is prone to bunching is selected, two adjacent target stops are chosen on that route, and the trips passing the two target stops each day are extracted together with each trip's arrival time and boarding and alighting counts at the two stops.

Because traffic conditions differ from day to day, the number of trips dispatched by the bus company also varies; the trip IDs for each day can be extracted from the vehicle numbers or the intervals between trips.

Step 2, data processing: The number of passengers boarding and alighting at a stop each day is random and uneven, and passenger card records also contain anomalies, so trip IDs must be matched between the stops. Records whose trip ID does not correspond between the two target stops are removed as erroneous data, together with the associated stop ID, arrival time, and boarding and alighting counts, so that only records with the same trip ID at both target stops are kept.
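
The matching in Step 2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the record layout `(trip_id, arrival_time_s, boardings, alightings)` and the function name `match_trips` are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (trip_id, arrival_time_s, boardings, alightings)
Record = Tuple[str, int, int, int]


def match_trips(stop_a: List[Record], stop_b: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Keep only trips recorded at both stops.

    Records at the second stop are returned in the trip order of the first stop.
    """
    b_by_trip: Dict[str, Record] = {rec[0]: rec for rec in stop_b}
    kept_a = [rec for rec in stop_a if rec[0] in b_by_trip]
    kept_b = [b_by_trip[rec[0]] for rec in kept_a]
    return kept_a, kept_b


# Trip T2 is missing at the second stop and T9 is missing at the first stop,
# so both are dropped as unmatched.
stop_a = [("T1", 100, 5, 2), ("T2", 400, 3, 1), ("T3", 700, 4, 0)]
stop_b = [("T1", 300, 2, 4), ("T3", 880, 1, 3), ("T9", 950, 0, 0)]
kept_a, kept_b = match_trips(stop_a, stop_b)
```

Returning the second-stop records in first-stop order prepares the data for the headway calculation in Step 3.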

Step 3, detection of actual bus bunching: The invention treats bunching at a stop as a binary state: 1 if bunching occurs and 0 if it does not. Following Step 2, with the same trip IDs present at both target stops, the trips at the second target stop are arranged in the trip order of the first target stop, and the headway between each pair of adjacent trips is computed. If the headway is positive, no bunching occurred and the state is recorded as 0; if the headway is negative or zero, bunching occurred and the state is recorded as 1.

The first target stop is the one of the two adjacent target stops that a bus reaches first, and the second target stop is the one it reaches later.

The headway between two adjacent trips is the arrival time of the current trip at a given target stop minus the arrival time of the adjacent preceding trip at the same target stop.
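
The labelling rule of Step 3 can be written in a few lines. The sketch below assumes arrival times in seconds and a list already arranged in first-stop trip order; the function name `bunching_labels` is illustrative.

```python
from typing import List


def bunching_labels(arrivals_second_stop: List[int]) -> List[int]:
    """Label each trip (from the second one on) as bunched (1) or not (0).

    arrivals_second_stop holds arrival times at the second target stop,
    arranged in the trip order observed at the first target stop.
    """
    labels = []
    for prev, cur in zip(arrivals_second_stop, arrivals_second_stop[1:]):
        headway = cur - prev  # current trip minus adjacent preceding trip
        labels.append(1 if headway <= 0 else 0)
    return labels


# The third trip reaches the second stop 10 s before the trip ahead of it,
# so its headway is negative and it is labelled as bunched.
labels = bunching_labels([300, 620, 610, 900])
```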

Step 4, training data: To predict in real time whether the current trip will bunch on arrival at the second target stop, the relevant factors are the arrival time and boarding and alighting counts of the adjacent preceding trip at the first target stop, its arrival time and boarding and alighting counts at the second target stop, and the arrival time and boarding and alighting counts of the current trip at the first target stop. The inputs used for training are the travel time between the two target stops, the headway between the two adjacent trips at the first target stop, the boarding and alighting counts of the preceding trip and of the current trip at the first target stop, and the boarding and alighting counts of the preceding trip at the second target stop. There is a single output variable, the bunching state. The invention first extracts a small sample for each day, then combines the daily samples in chronological order into one large sample, and selects the training set so that the training and test sets contain samples in a 3:1 ratio.
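
A 3:1 split of the chronologically ordered samples might look like the sketch below. It assumes the split itself is chronological (earlier samples for training, later ones for testing), which the text suggests but does not state outright; `chronological_split` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def chronological_split(samples: Sequence, train_parts: int = 3,
                        test_parts: int = 1) -> Tuple[List, List]:
    """Split time-ordered samples so that train:test is about train_parts:test_parts."""
    cut = len(samples) * train_parts // (train_parts + test_parts)
    return list(samples[:cut]), list(samples[cut:])


# Eight samples in time order: the first six train the model, the last two test it.
train, test = chronological_split(list(range(8)))
```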

For matched trip IDs, the travel time between the two target stops is the arrival time of the current trip at the second target stop minus its arrival time at the first target stop. Because the two target stops are a fixed distance apart and the state sets a maximum speed for buses, the travel time must be positive and greater than a certain fixed value. Travel time records that violate this condition are removed, together with the associated trip ID, stop ID, arrival time, and boarding and alighting counts.
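
The lower bound on travel time follows from the stop spacing divided by the maximum speed. The sketch below illustrates the filter; the 600 m spacing and 60 km/h speed limit are made-up values for the example, and the function names are hypothetical.

```python
from typing import List, Tuple

# (trip_id, arrival at first stop, arrival at second stop), times in seconds
Trip = Tuple[str, float, float]


def min_travel_time_s(distance_m: float, max_speed_kmh: float) -> float:
    """Shortest physically possible travel time between the two stops."""
    return distance_m / (max_speed_kmh / 3.6)


def filter_by_travel_time(trips: List[Trip], distance_m: float,
                          max_speed_kmh: float) -> List[Trip]:
    """Drop trips whose travel time is not greater than the physical minimum."""
    floor_s = min_travel_time_s(distance_m, max_speed_kmh)
    return [t for t in trips if (t[2] - t[1]) > floor_s]


# Assumed values: stops 600 m apart, 60 km/h limit, so the floor is about 36 s.
trips = [("T1", 0.0, 120.0), ("T2", 300.0, 310.0), ("T3", 600.0, 580.0)]
kept = filter_by_travel_time(trips, distance_m=600.0, max_speed_kmh=60.0)
```

Here T2 (10 s) is faster than physically possible and T3 has a negative travel time, so only T1 survives.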

Step 5, bus bunching prediction: The invention uses the least squares support vector machine algorithm to predict bus bunching. A prediction model is built from the training set selected in Step 4 and used to predict whether the current trip will bunch on arrival at the second target stop.

The least squares support vector machine (LS-SVM) is a kernel-based learning machine that follows the principle of structural risk minimization. Bus bunching prediction mainly uses its regression form. An LS-SVM model is built from adjacent historical data; once trained, it yields a regression function, and substituting the input vector to be predicted into that function gives the predicted output.

Applying LS-SVM to bus bunching prediction involves two processes: training (model building) and prediction (evaluation).

During training, the following linear system is solved for the training samples:

$$\begin{bmatrix} 0 & \mathbf{1}^{T} \\ \mathbf{1} & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix} \qquad (1)$$

Here $y$ is a column vector formed from the training outputs $y_i$ ($i = 1, \dots, l$); $\mathbf{1}$ is a column vector of $l$ ones; $\gamma$ is a hyperparameter fixed in advance; $b$ and $\alpha$ are the unknowns, where $b$ is a real number and $\alpha$ is a column vector (the Lagrange multipliers). Solving for $b$ and $\alpha$ is the model-building process. $\Omega$ is the kernel matrix, computed from the training inputs $x_i$ with the kernel function:

$$\Omega_{ij} = K(x_i, x_j),$$

The radial basis function (RBF) is chosen as the kernel. The center of each radial basis function corresponds to a support vector, and the resulting support vector machine is a radial basis function classifier.

The key to solving equation (1) is inverting the matrix $A = \Omega + \gamma^{-1} I$. With $A^{-1}$ available, the parameter $b$ is

$$b = \frac{\mathbf{1}^{T} A^{-1} y}{\mathbf{1}^{T} A^{-1} \mathbf{1}},$$

and the parameter $\alpha$ is

$$\alpha = A^{-1} (y - b\,\mathbf{1}). \qquad (2)$$

Once $b$ and $\alpha$ are known, training is complete and the model is

$$f(x) = \sum_{i=1}^{l} \alpha_i K(x, x_i) + b. \qquad (3)$$

Given the model in equation (3), computing the output $f(X)$ for a new input $X$ is the prediction process.
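
The training and prediction processes can be sketched with NumPy as below. This is an illustration under stated assumptions rather than the patent's MATLAB code: it uses the standard LS-SVM regression solution (b from the ratio of two solves with A = Ω + I/γ, then α), a Gaussian kernel with a 2σ² normalization, and a 0.5 threshold to turn the regression output into a 0/1 label. The kernel scaling, the threshold, the toy data, and all function names are assumptions.

```python
import numpy as np


def rbf_kernel(X1, X2, sigma):
    """Gaussian kernel between every row of X1 and every row of X2."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))  # 2*sigma^2 scaling is an assumption


def lssvm_train(X, y, gamma, sigma):
    """Solve the LS-SVM linear system for the bias b and multipliers alpha."""
    l = len(y)
    A = rbf_kernel(X, X, sigma) + np.eye(l) / gamma
    ones = np.ones(l)
    A_inv_y = np.linalg.solve(A, y)      # A^{-1} y without forming the inverse
    A_inv_1 = np.linalg.solve(A, ones)   # A^{-1} 1
    b = (ones @ A_inv_y) / (ones @ A_inv_1)
    alpha = A_inv_y - b * A_inv_1        # equals A^{-1} (y - b * 1)
    return b, alpha


def lssvm_predict(X_train, b, alpha, sigma, X_new):
    """Evaluate f(x) = sum_i alpha_i K(x, x_i) + b for each row of X_new."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


# Toy one-dimensional data; gamma and sigma are arbitrary illustrative values.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
gamma, sigma = 10.0, 1.0
b, alpha = lssvm_train(X, y, gamma, sigma)
f_train = lssvm_predict(X, b, alpha, sigma, X)
labels = (f_train >= 0.5).astype(int)  # thresholding at 0.5 is an assumption
```

Two linear solves replace the explicit inverse here; the result is the same b and α, with less numerical error for poorly conditioned A.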

In practice, training is the more computationally expensive process. Breaking the calculation above into finer steps gives the following procedure.

Forming the kernel matrix:

Forming the kernel matrix mainly consists of evaluating the kernel function for pairs of input vectors. The kernel function is the RBF function.
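
A common parameterization of the RBF kernel is the Gaussian form below. The factor of 2 in the denominator is an assumption, since references differ on how σ is normalized; the 2-norm and the exponential match the operations the text names for evaluating the kernel.

$$K(x_i, x_j) = \exp\!\left(-\frac{\lVert x_i - x_j \rVert_2^{2}}{2\sigma^{2}}\right)$$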

The parameter σ is a hyperparameter fixed before training and is determined by k-fold cross-validation, as follows:

Step a: choose an initial value for σ, σ = 0.01.

Step b: divide the training set into k equal subsets. In each round, k-1 subsets serve as training data and the remaining subset serves as test data. This is repeated k times, and the expected generalization error is estimated from the mean of the MSE values over the k rounds. The best-performing parameter value is finally selected as the parameter σ of the kernel function K(x, x_i).
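
The cross-validation loop can be sketched as follows. The text gives only the initial value σ = 0.01 and does not say how further candidates are generated, so the candidate list here is hypothetical, as are the interleaved fold assignment, the 2σ² kernel scaling, and the function names.

```python
import numpy as np


def rbf(X1, X2, sigma):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))  # 2*sigma^2 scaling is an assumption


def fit(X, y, gamma, sigma):
    """LS-SVM training: returns bias b and multipliers alpha."""
    A = rbf(X, X, sigma) + np.eye(len(y)) / gamma
    u = np.linalg.solve(A, y)
    v = np.linalg.solve(A, np.ones(len(y)))
    b = u.sum() / v.sum()
    return b, u - b * v


def kfold_mse(X, y, gamma, sigma, k):
    """Mean of the k per-fold MSE values for one candidate sigma."""
    n = len(y)
    fold_mses = []
    for i in range(k):
        test_idx = np.arange(i, n, k)  # interleaved folds (an assumed scheme)
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        b, alpha = fit(X[train_idx], y[train_idx], gamma, sigma)
        pred = rbf(X[test_idx], X[train_idx], sigma) @ alpha + b
        fold_mses.append(np.mean((pred - y[test_idx]) ** 2))
    return float(np.mean(fold_mses))


def select_sigma(X, y, gamma, candidates, k):
    """Return the candidate with the lowest cross-validated MSE, plus all scores."""
    scores = {s: kfold_mse(X, y, gamma, s, k) for s in candidates}
    best = min(scores, key=scores.get)
    return best, scores


# Toy data and a hypothetical candidate grid starting from sigma = 0.01.
X = np.linspace(0.0, 1.0, 12).reshape(-1, 1)
y = (X[:, 0] > 0.5).astype(float)
candidates = [0.01, 0.1, 1.0]
best_sigma, scores = select_sigma(X, y, gamma=10.0, candidates=candidates, k=4)
```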

Evaluating the RBF function involves computing the 2-norm of a vector and an exponential. By the definition Ω_ij = K(x_i, x_j), for l training samples Ω is an l×l matrix: the kernel function is evaluated for every pair of the l samples to obtain the kernel matrix.
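
Written out pair by pair, the kernel matrix computation looks like the sketch below. The explicit double loop mirrors the definition rather than aiming for speed, and the 2σ² scaling of the Gaussian is an assumption.

```python
import numpy as np


def kernel_matrix(X, sigma):
    """Build the l x l matrix with entries K(x_i, x_j) for all sample pairs."""
    l = X.shape[0]
    omega = np.empty((l, l))
    for i in range(l):
        for j in range(l):
            diff = X[i] - X[j]
            sq_norm = np.dot(diff, diff)  # squared 2-norm of x_i - x_j
            omega[i, j] = np.exp(-sq_norm / (2.0 * sigma ** 2))
    return omega


# Three samples with two features each (illustrative values).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
omega = kernel_matrix(X, sigma=1.0)
```

The result is symmetric with ones on the diagonal, since K(x_i, x_i) = exp(0) = 1.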

Inverting the kernel matrix:

Once Ω is available, the matrix A can be formed. From the way Ω is computed, A is a symmetric positive definite matrix. If the inverse A^{-1} is obtained, b and α follow from equation (2). Computing the inverse of A is therefore the key step of the training process.
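
Because A is symmetric positive definite, a Cholesky factorization can stand in for the explicit inverse. This is a substituted technique shown for illustration, not the procedure described in the text; the small matrix `omega` below is a made-up example and the function names are hypothetical.

```python
import numpy as np


def solve_spd(A, rhs):
    """Solve A x = rhs for symmetric positive definite A via Cholesky."""
    L = np.linalg.cholesky(A)        # A = L @ L.T, L lower triangular
    z = np.linalg.solve(L, rhs)      # forward step: L z = rhs
    return np.linalg.solve(L.T, z)   # backward step: L.T x = z
    # np.linalg.solve is a general solver; scipy.linalg.solve_triangular
    # would exploit the triangular structure if SciPy is available.


def lssvm_solve(omega, y, gamma):
    """Compute b and alpha from the kernel matrix without forming A^{-1}."""
    l = len(y)
    A = omega + np.eye(l) / gamma
    u = solve_spd(A, y)              # A^{-1} y
    v = solve_spd(A, np.ones(l))     # A^{-1} 1
    b = u.sum() / v.sum()
    alpha = u - b * v
    return b, alpha


# Made-up 3 x 3 kernel matrix and targets for illustration.
omega = np.array([[1.0, 0.5, 0.1],
                  [0.5, 1.0, 0.5],
                  [0.1, 0.5, 1.0]])
y = np.array([0.0, 1.0, 0.0])
b, alpha = lssvm_solve(omega, y, gamma=10.0)
```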

Embodiment

As shown in Fig. 1, two adjacent buses V1 and V2 running on the same route fall into one of two cases.

Case 1, the ideal state without bunching: At 9:02, buses V1 and V2 are at stop 1 and stop 4 respectively. At 9:18, V1 and V2 have reached stop 5 and stop 8 respectively. At 9:24, V1 has reached stop 7 and V2 has reached stop 10. The two buses stay about three stops apart throughout, the numbers of passengers waiting at the stops are roughly even, the buses never meet, and no bunching occurs.

Case 2, bunching: At 9:02, V1 and V2 are at stop 1 and stop 4 respectively, and the numbers of passengers waiting at the stops are roughly even. At 9:10, V1 reaches stop 3 and V2 is between stop 5 and stop 6; the gap between the two buses starts to shrink, V2 is running slowly and reaching stops late, and the number of passengers waiting at the stops after stop 6 increases. At 9:13, V1 reaches stop 4 and V2 is about to reach stop 6. At 9:19, V1 reaches stop 6 and V2 has just left stop 7, so the two buses are very close. Finally, at 9:34, V1 and V2 meet at stop 10, a very large number of passengers are waiting at the next stop, stop 11, and bunching has occurred. This illustrates the snowball effect: the delay of bus V2 increases the number of passengers at the next stop and lengthens its dwell time, which adds further to its delay. Meanwhile, the following bus V1 carries fewer passengers, loses less time at stops, and is not delayed.

The evaluation metrics for the prediction are defined as follows.

Bus bunching is a two-class classification problem: each sample is either bunching, denoted 1, or no bunching, denoted 0. If a sample is bunching (1) and is also predicted as bunching (1), it counts as a correct bunching sample; likewise, if a sample is no bunching (0) and is predicted as no bunching (0), it counts as a correct non-bunching sample.

The following metrics are commonly used to evaluate the performance of classification algorithms:

(1) Accuracy: the proportion of all samples that are predicted correctly (correct bunching samples plus correct non-bunching samples).

(2) Correct bunching rate: the proportion of all bunching samples that are correctly predicted as bunching.
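
Both metrics are short to compute from paired true and predicted labels. The sketch below uses made-up labels; the correct bunching rate as defined here is what is usually called recall on the bunching class.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of all samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def correct_bunching_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of truly bunched samples (label 1) that are predicted as bunched."""
    bunched_preds = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not bunched_preds:
        raise ValueError("no bunching samples in y_true")
    return sum(1 for p in bunched_preds if p == 1) / len(bunched_preds)


# Made-up labels: 6 of 8 predictions are right, and 3 of 4 bunched trips are caught.
y_true = [1, 0, 1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 0, 1]
acc = accuracy(y_true, y_pred)
rate = correct_bunching_rate(y_true, y_pred)
```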

In this embodiment, to make the parameters and the algorithm of the invention easier to follow, the specific basic data used in the six steps are described in detail.

The bus IC card data are provided by XX Company in Beijing. They include the trip ID, route ID, stop ID, arrival time, boarding and alighting passenger counts, record number, transaction type, transaction serial number, transaction date, transaction time, SAM card number, city number, card issue number, card type, route number, vehicle number, boarding stop, alighting stop, driver number, and card number. Taking the XX route buses of XX Company in Beijing arriving at two target stops SA and SB as an example, four months of basic data on trip ID, stop ID, arrival time, and boarding and alighting counts are given in Table 1 and Table 2:

Table 1: Basic data for XX route buses of XX Company, Beijing, arriving at the first target stop SA

Table 2: Basic data for XX route buses of XX Company, Beijing, arriving at the second target stop SB

Data processing mainly consists of the following steps:

1. Remove outliers.

The data are first screened one day at a time. For example, the basic data for one day are extracted, and the trip IDs, arrival times, and boarding and alighting counts at the second stop are arranged in the trip order of the first stop to obtain the travel time between the two stops. Given the distance between the two stops and the maximum bus speed set by the state, the travel time must be positive and greater than a certain fixed value, so records that do not meet this requirement are removed, including their trip ID, arrival time, and boarding and alighting counts.

The headway between each pair of adjacent trips is then computed. Because nonconforming records were removed in the previous step, some of the remaining trips are not adjacent, and for these only the preceding headway and the following headway can be computed.

2. Input variables.

To predict in real time whether the current trip will bunch on arrival at the next target stop SB, the relevant factors are the arrival time and boarding and alighting counts of the adjacent preceding trip at stop SA, its arrival time and boarding and alighting counts at the next stop SB, and the arrival time and boarding and alighting counts of the current trip at the previous stop SA.

There are 8 input variables: the travel time between the two target stops, the headway between the current trip and the adjacent preceding trip at the first target stop, the boarding and alighting counts of the preceding trip and of the current trip at the first target stop, and the boarding and alighting counts of the preceding trip at the second target stop.

3. Output variable.

The invention predicts bus bunching from IC card data, and there is 1 output variable, the bunching state: 1 if the current trip bunches at the next stop SB, and 0 if it does not.
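
One way to assemble the 8 inputs and the 1 output for a single sample is sketched below. The text does not say which trip's travel time is used as the input; this sketch assumes the preceding trip's travel time, since the current trip has not yet reached SB when the prediction is made. The `StopVisit` type, the feature order, and `build_sample` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StopVisit:
    arrival: int     # arrival time in seconds since midnight
    boardings: int
    alightings: int


def build_sample(prev_sa: StopVisit, prev_sb: StopVisit,
                 cur_sa: StopVisit, cur_sb: StopVisit) -> Tuple[List[int], int]:
    """Return (8 input features, bunching label) for the current trip.

    cur_sb is known only for historical data; it is used for the label alone.
    """
    features = [
        prev_sb.arrival - prev_sa.arrival,  # travel time SA -> SB (preceding trip, assumed)
        cur_sa.arrival - prev_sa.arrival,   # headway at SA
        prev_sa.boardings, prev_sa.alightings,
        cur_sa.boardings, cur_sa.alightings,
        prev_sb.boardings, prev_sb.alightings,
    ]
    headway_sb = cur_sb.arrival - prev_sb.arrival
    label = 1 if headway_sb <= 0 else 0     # non-positive headway at SB means bunching
    return features, label


# Illustrative values: the current trip reaches SB 10 s before the trip ahead.
features, label = build_sample(
    prev_sa=StopVisit(0, 10, 2), prev_sb=StopVisit(300, 4, 6),
    cur_sa=StopVisit(120, 3, 1), cur_sb=StopVisit(290, 1, 2),
)
```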

Taking the XX route buses of XX Company in Beijing arriving at two target stops SA and SB as an example, the four months of processed data, comprising the 8 input variables, the 1 output variable, and the date, are shown in Table 3:

Table 3

The invention predicts bus bunching from bus IC card data using the least squares support vector machine (LS-SVM) algorithm, implemented in MATLAB 2013b. The data from the first three months (20120702-20120930) are used as training data and the data from the following month (20121001-20121029) as test data. The computed prediction results are shown in Table 4:

Table 4: Prediction results

Bus bunching prediction results    Accuracy (%)    Correct bunching rate (%)
Prediction accuracy                93.77           85.54

The bus bunching prediction method based on IC card data achieves high prediction accuracy, reaching 93.77%. It can predict in real time whether the current trip will bunch at the next target stop, so that passengers learn the current state of bus operation in time, plan their travel time sensibly, and travel more efficiently. Operators can also use the predictions to dispatch buses appropriately and adjust departure intervals, avoiding bus bunching and improving the level and quality of bus service.

Claims (6)

1. A bus bunching prediction method based on IC card data, characterized in that it comprises the following steps:

Step 1, bus IC card data collection: bus IC card swipe records are obtained in real time over a 3G transmission network; the bus IC card data comprise the trip ID, route ID, station ID, arrival time, date, and boarding and alighting passenger counts. From the collected IC card data, one bus route that is prone to bunching is selected; two adjacent target stations are identified on that route; and the bus trips passing the two target stations each day are extracted, together with each trip's arrival time and boarding and alighting counts at the two target stations.

Step 2, data processing: the trip IDs are matched; data whose trip ID does not correspond between the two target stations are removed as erroneous data, together with the station ID, arrival time, and boarding and alighting counts associated with that trip ID, so that only data with the same trip ID at both target stations are retained.

Step 3, detection of actual bunching: given the same trip IDs at both target stations, the trips at the second target station are arranged in the trip order of the first target station, and the headway between each two adjacent trips is obtained. If the headway is positive, no bunching has occurred and the state is recorded as 0; if the headway is negative or zero, bunching has occurred and the state is recorded as 1.

Step 4, training data: to predict in real time the bunching state of the current trip on arrival at the second target station, the relevant factors include the arrival time and boarding and alighting counts of the adjacent preceding trip at the first target station, its arrival time and boarding and alighting counts at the second target station, and the arrival time and boarding and alighting counts of the current trip at the first target station. The input factors for training comprise the travel time between the two target stations, the headway between the two adjacent trips at the first target station, the boarding and alighting counts of the preceding trip and of the current trip at the first target station, and the boarding and alighting counts of the preceding trip at the second target station. There is only one output variable, the bunching state. Small daily samples are extracted first and then combined in chronological order into one large sample, from which the training set is selected at a training-to-test sample ratio of 3:1.

Step 5, bus bunching prediction: the least squares support vector machine algorithm is used to predict bunching; a prediction model is built from the training set selected in step 4 and used to predict the bunching state of the current trip on arrival at the second target station, yielding the predicted value.

2. The bus bunching prediction method based on IC card data according to claim 1, characterized in that the first target station in step 3 is defined as the station reached first of the two adjacent target stations, and the second target station is defined as the station reached later of the two adjacent target stations.

3. The bus bunching prediction method based on IC card data according to claim 1, characterized in that the headway between two adjacent trips in step 3 is the difference between the arrival time of the current trip at a given target station and the arrival time of the adjacent preceding trip at the same target station.

4. The bus bunching prediction method based on IC card data according to claim 1, characterized in that the travel time between the two target stations in step 4 is, for matching trip IDs, the difference between the arrival time of the current trip at the second target station and its arrival time at the first target station.

5. The bus bunching prediction method based on IC card data according to claim 1, characterized in that the prediction model in step 5 is expressed as [formula not reproduced in the source], where the radial basis function is selected as the kernel function, expressed as [formula not reproduced in the source]; α_i is an element of the Lagrange multiplier array α, [formula not reproduced in the source]; and the offset b is [formula not reproduced in the source], where the matrix is [formula not reproduced in the source]; the kernel matrix is Ω = K(x_i, x_j); [symbol not reproduced in the source] is a one-dimensional column vector; σ is a hyperparameter determined before training; γ is a hyperparameter that has already been determined; and x_i and x are input samples.

6. The bus bunching prediction method based on IC card data according to claim 5, characterized in that the parameter σ of the radial basis kernel function K(x, x_i) is determined by K-fold cross-validation, as follows:

Step a, select the initial value of σ, σ = 0.01;

Step b, build the LS-SVM model;

Step c, divide the selected training set into k equal subsets; each time, use k-1 of the subsets as training data and the remaining subset as test data. Repeat this k times, estimate the expected generalization error from the mean MSE over the k iterations, and finally select the optimal parameter value as the parameter σ of the kernel function K(x, x_i).
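The formulas of claim 5 did not survive in this copy of the text. For orientation only, the standard LS-SVM function-estimation formulation that matches the symbols named in the claim (α, b, Ω, σ, γ, and a column vector of ones) is given below. This is the textbook form, written under the assumption that the claim follows it; in particular, the 2σ² denominator in the kernel and the symbol A for the regularised kernel matrix are assumptions, and the claim's own notation may differ.

```latex
\begin{align}
y(x) &= \sum_{i=1}^{N} \alpha_i \, K(x, x_i) + b \\
K(x, x_i) &= \exp\!\left( -\frac{\lVert x - x_i \rVert^{2}}{2\sigma^{2}} \right) \\
\begin{bmatrix} 0 & \mathbf{1}^{T} \\ \mathbf{1} & \Omega + \gamma^{-1} I \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix}
&= \begin{bmatrix} 0 \\ \mathbf{y} \end{bmatrix},
\qquad \Omega_{ij} = K(x_i, x_j) \\
A &= \Omega + \gamma^{-1} I \\
b &= \frac{\mathbf{1}^{T} A^{-1} \mathbf{y}}{\mathbf{1}^{T} A^{-1} \mathbf{1}},
\qquad
\alpha = A^{-1} \left( \mathbf{y} - b \, \mathbf{1} \right)
\end{align}
```

Here N is the number of training samples, **y** is the vector of 0/1 training labels, and **1** is the N-element column vector of ones. The expressions for b and α follow from the two block rows of the linear system: the second row gives α = A⁻¹(**y** − b**1**), and substituting this into the first row, **1**ᵀα = 0, gives b.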

Publications (2)

Publication Number Publication Date
CN105206040A CN105206040A (en) 2015-12-30
CN105206040B true CN105206040B (en) 2017-06-23



Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170623