CN110458244B - Traffic accident severity prediction method applied to regional road network - Google Patents
- Publication number
- CN110458244B (CN201910770584.3A)
- Authority
- CN
- China
- Prior art keywords
- accident
- model
- formula
- value
- probability
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/40—Business processes related to the transportation industry
Description
Technical field
The invention relates to a traffic accident severity prediction method applied to a regional road network, and belongs to the technical field of road traffic safety analysis.
Background
According to the Global Status Report on Road Safety, road traffic accidents are the eighth leading cause of death worldwide and kill more than 1.35 million people every year, so road traffic safety has become a major focus of global attention. Analyzing traffic accident data to identify the factors that affect accident severity, and to propose countermeasures that reduce the risk of fatal accidents, is currently one of the most practical ways to improve traffic safety. However, a road traffic accident is a complex event involving drivers' responses to the external environment and the interactions among vehicles, road conditions, traffic factors and environmental factors, and some influencing factors may be unobserved. As a result, traffic accident data are highly heterogeneous, and accident severity may be affected by interactions among the factors.
For the analysis of accident severity (fatal versus non-fatal accidents), the binary logistic regression model is the most widely used. However, this method ignores the effect of accident data heterogeneity and of interactions among the independent variables on the analysis results, which may lead to inaccurate parameter estimates or cause important hidden relationships to be missed. Yu Rongjie et al. used latent class analysis to divide accident data into several homogeneous latent classes, reducing the effect of data heterogeneity on the analysis results (Yu R, Wang X, Abdel-Aty M. A Hybrid Latent Class Analysis Modeling Approach to Analyze Urban Expressway Crash Risk [J]. Accident Analysis and Prevention, 2017, 101: 37-43.). Rusli et al. used a decision tree to screen high-order interactions among the independent variables and included the high-order interaction terms together with the main effects in the accident severity model, quantifying the effect of these interactions on accident severity. That method, however, considers only the high-order interactions and ignores the interactions of other orders among the independent variables (Rusdi Rusli, Md. Mazharul Haque, Mohammad Saifuzzaman, Mark King. Crash severity along rural mountainous highways in Malaysia: An application of a combined decision tree and logistic regression model [J]. Traffic Injury Prevention, 2018, 19(7): 741-748.). In addition, the traditional binary logistic regression model considers only the overall prediction accuracy and uses 0.5 as the classification threshold. Fatal accidents usually make up a small share of traffic accident data (that is, the data are unbalanced), so a threshold of 0.5 gives a high overall accuracy but a sensitivity so low that the model loses its predictive value.
Summary of the invention
To overcome the shortcomings of the prior art, the present invention proposes a traffic accident severity prediction method applied to a regional road network. The method aims to reduce the adverse effect of accident data heterogeneity on the analysis results, to identify the interaction terms among the independent variables, and to adjust the classification threshold of the prediction model. It thereby addresses two weaknesses of traditional accident severity models, namely that they ignore interaction terms and perform poorly overall on unbalanced data, and it improves the prediction accuracy and goodness of fit of the accident severity model.
To achieve the above object, the present invention adopts the following technical solution:
The traffic accident severity prediction method applied to a regional road network according to the present invention is characterized in that it is carried out according to the following steps:
Step 1: Collect and preprocess the road traffic accident data of the regional road network;
Obtain $N$ accident records from the road traffic accident database as the accident data set $D$, and select $K$ categorical variables from the $i$-th accident record to form the set $X=\{x_1,x_2,\dots,x_k,\dots,x_K\}$ that characterizes the $i$-th accident. Here $x_k$ is the $k$-th categorical variable, which has $C_k$ categories, and the value of $x_k$ among its $C_k$ categories is denoted $s_k$. Let $s_{ik}$ be the value of the $k$-th categorical variable in the $i$-th accident; the values of all $K$ categorical variables in the $i$-th accident then form the value set $S_i=\{s_{i1},s_{i2},\dots,s_{ik},\dots,s_{iK}\}$. Let $\{s_1,s_2,\dots,s_K\}$ denote any one of the possible value sets of the $K$ categorical variables of the $i$-th accident; $k=1,2,3,\dots,K$; $i=1,2,3,\dots,N$;
Take the severity of the $i$-th accident as the response variable (the variable to be predicted), denoted $y_i$, where $y_i=0$ indicates a non-fatal accident and $y_i=1$ indicates a fatal accident;
Step 2: Build a latent class analysis model from the road traffic accident data of the regional road network;
Step 2.1: Define a latent class variable $V$ in the latent class analysis model. $V$ has $T$ classes, any one of which is denoted $t$, $t=1,2,\dots,T$; let $V_i$ be the value of the latent class variable $V$ for the $i$-th accident;
Step 2.1.1: Define the outer loop counter as $\tau$ and the maximum number of outer loop iterations as $\tau_{max}$; let the number of classes set in the $\tau$-th iteration be $T_\tau$; initialize $\tau=1$;
Step 2.1.2: Initialize $t=1$;
Step 2.1.3: Use formula (1) to obtain the conditional probability that the $i$-th accident takes the value set $\{s_1,\dots,s_K\}$ on the $K$ categorical variables, given that $V_i=t$, that is, given that the accident belongs to the $t$-th latent class:
$$P(S_i=\{s_1,\dots,s_K\}\mid V_i=t)=\prod_{k=1}^{K}P(s_{ik}=s_k\mid V_i=t) \tag{1}$$
In formula (1), $P(s_{ik}=s_k\mid V_i=t)$ is the conditional probability that the $k$-th categorical variable takes the value $s_k$ when the $i$-th accident belongs to the $t$-th latent class;
Step 2.1.4: Use formula (2) to obtain the unconditional probability that the $K$ categorical variables of the $i$-th accident take the value set $\{s_1,\dots,s_K\}$, that is, the joint probability of the latent class analysis model:
$$P(S_i=\{s_1,\dots,s_K\})=\sum_{t=1}^{T}P(V_i=t)\prod_{k=1}^{K}P(s_{ik}=s_k\mid V_i=t) \tag{2}$$
In formula (2), $P(V_i=t)$ is the probability that the $i$-th accident belongs to the $t$-th latent class, that is, the share of latent class $t$ in the population;
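Formulas (1) and (2) can be illustrated with a minimal sketch. The class probabilities and item probabilities below are hypothetical numbers chosen for illustration, and the function names `conditional_prob` and `joint_prob` are not from the patent.

```python
from math import prod

def conditional_prob(values, item_probs_t):
    """Formula (1): product over the K variables of P(s_ik = s_k | V_i = t)."""
    return prod(item_probs_t[k][s] for k, s in enumerate(values))

def joint_prob(values, class_probs, item_probs):
    """Formula (2): sum over latent classes of P(V_i = t) times formula (1)."""
    return sum(
        class_probs[t] * conditional_prob(values, item_probs[t])
        for t in range(len(class_probs))
    )

# Hypothetical model: T = 2 latent classes, K = 2 variables with categories 1 and 2.
class_probs = [0.6, 0.4]
item_probs = [
    [{1: 0.8, 2: 0.2}, {1: 0.5, 2: 0.5}],  # class 1
    [{1: 0.3, 2: 0.7}, {1: 0.1, 2: 0.9}],  # class 2
]

# Probability of observing the value set (s_1, s_2) = (1, 2).
p = joint_prob((1, 2), class_probs, item_probs)
```

The product in `conditional_prob` reflects the local independence assumption of latent class analysis: within a class, the categorical variables are treated as independent.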
Step 2.2: Estimate the model parameters by the maximum likelihood method to obtain the estimates of the latent class probabilities and of the conditional probabilities of the categorical variables, together with the $\tau$-th maximum likelihood function value $L_\tau$ of the latent class analysis model;
Step 2.3: Use formula (3) to calculate the posterior probability that the $i$-th accident is classified into the $t$-th latent class:
$$P(V_i=t\mid S_i=\{s_1,\dots,s_K\})=\frac{P(V_i=t)\prod_{k=1}^{K}P(s_{ik}=s_k\mid V_i=t)}{\sum_{t'=1}^{T}P(V_i=t')\prod_{k=1}^{K}P(s_{ik}=s_k\mid V_i=t')} \tag{3}$$
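The posterior in formula (3) is Bayes' rule applied to the class probabilities and conditional probabilities. The sketch below uses hypothetical parameter values, and assigning each accident to the class with the largest posterior is an assumed (though common) assignment rule rather than something stated in the text.

```python
from math import prod

def posterior(values, class_probs, item_probs):
    """Formula (3): posterior probability of each latent class for one accident."""
    weighted = [
        class_probs[t] * prod(item_probs[t][k][s] for k, s in enumerate(values))
        for t in range(len(class_probs))
    ]
    total = sum(weighted)  # denominator: the joint probability of formula (2)
    return [w / total for w in weighted]

# Hypothetical parameter estimates for T = 2 classes and K = 2 variables.
class_probs = [0.6, 0.4]
item_probs = [
    [{1: 0.8, 2: 0.2}, {1: 0.5, 2: 0.5}],
    [{1: 0.3, 2: 0.7}, {1: 0.1, 2: 0.9}],
]

post = posterior((1, 2), class_probs, item_probs)
# Assumed assignment rule: pick the class with the largest posterior (0-based index).
assigned_class = max(range(len(post)), key=post.__getitem__)
```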
Step 2.4: Assign $t+1$ to $t$ and check whether $t>T_\tau$ holds; if so, go to step 2.5; otherwise, return to step 2.1.3;
Step 2.5: Use formulas (4), (5), (6) and (7) to obtain the model-fit evaluation indices: the $\tau$-th Akaike information criterion $AIC_\tau$, the $\tau$-th Bayesian information criterion $BIC_\tau$, the $\tau$-th sample-size-adjusted Bayesian information criterion $aBIC_\tau$, and the $\tau$-th entropy value $R^2_\tau$:
$$AIC_\tau=-2\ln(L_\tau)+2M \tag{4}$$
$$BIC_\tau=-2\ln(L_\tau)+\ln(N)\times M \tag{5}$$
$$aBIC_\tau=-2\ln(L_\tau)+\ln(n^*)\times M \tag{6}$$
In formulas (4), (5), (6) and (7), $M$ is the number of unknown parameters in the latent class analysis model; $n^*$ is the adjusted sample size, with $n^*=(N+2)/24$;
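The fit indices in formulas (4) to (6) follow directly from the log-likelihood, the parameter count and the sample size. The sketch below also includes a `relative_entropy` helper; since the expression for formula (7) does not appear in the text, that helper uses the relative entropy commonly reported for latent class models, $1-\sum_i\sum_t(-p_{it}\ln p_{it})/(N\ln T)$, which is an assumption here. All input numbers are hypothetical.

```python
from math import log

def fit_indices(log_lik, n_params, n_obs):
    """Formulas (4)-(6): AIC, BIC and sample-size-adjusted BIC."""
    aic = -2.0 * log_lik + 2.0 * n_params
    bic = -2.0 * log_lik + log(n_obs) * n_params
    n_star = (n_obs + 2.0) / 24.0  # adjusted sample size n*
    abic = -2.0 * log_lik + log(n_star) * n_params
    return aic, bic, abic

def relative_entropy(posteriors):
    """ASSUMED form of the entropy index; not taken from the patent text.

    posteriors[i][t] is the posterior probability of class t for accident i.
    Values near 1 indicate clear class separation.
    """
    n_classes = len(posteriors[0])
    if n_classes == 1:
        return 1.0  # a single class is trivially separated
    uncertainty = sum(-p * log(p) for row in posteriors for p in row if p > 0.0)
    return 1.0 - uncertainty / (len(posteriors) * log(n_classes))

# Hypothetical values: ln(L) = -100, M = 5 parameters, N = 2595 accidents.
aic, bic, abic = fit_indices(-100.0, 5, 2595)
```

Lower AIC, BIC and aBIC indicate a better trade-off between fit and complexity, so candidate models with different numbers of classes can be compared on these values.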
Step 2.6: Assign $\tau+1$ to $\tau$ and check whether $\tau>\tau_{max}$ holds; if so, go to step 2.7; otherwise, return to step 2.1.3;
Step 2.7: From the $\tau_{max}$ sets of values of the Akaike information criterion AIC, the Bayesian information criterion BIC, the sample-size-adjusted Bayesian information criterion aBIC and the entropy value $R^2$, select the number of latent classes for which the model-fit evaluation indices all reach their optimum, denoted $T^*$; divide the accident data set $D$ into $T^*$ accident subcategories, denoted $\{D_1,\dots,D_{t^*},\dots,D_{T^*}\}$, where $D_{t^*}$ is the accident data in the $t^*$-th accident subcategory, $t^*=1,2,\dots,T^*$;
Step 3: Based on the results of the latent class analysis model, build a CART decision tree model for each of the $T^*$ accident subcategories;
Step 3.1: Take the accident data $D_{t^*}$ in the $t^*$-th accident subcategory as the training sample set, and take the set $X$ of $K$ categorical variables as the feature set of the CART decision tree model; let the node sample threshold be $\sigma$, the feature-value split point be $\alpha$, and the Gini index threshold be $\varepsilon$;
Step 3.2: Initialize $t^*=1$;
Step 3.3: Input the training sample set $D_{t^*}$, the feature set $X$, the node sample threshold $\sigma$ and the Gini index threshold $\varepsilon$ into the CART decision tree model;
Step 3.4: Assign $t^*+1$ to $t^*$ and check whether $t^*>T^*$ holds; if so, $T^*$ binary decision trees have been obtained, and step 3.5 is executed; otherwise, return to step 3.3;
Step 3.5: From the tree diagrams of the $T^*$ binary decision trees, determine the interaction terms among the categorical variables, including the interaction terms determined by the binary decision tree corresponding to the $t^*$-th accident subcategory;
Step 4: Build an accident severity model based on binary logistic regression for each of the $T^*$ accident subcategories;
Step 4.1: Take the accident data $D_{t^*}$ in the $t^*$-th accident subcategory as the fitting data of the accident severity model, and take the set $X$ of $K$ categorical variables together with the interaction terms of the $t^*$-th accident subcategory as the independent variables $X^*$ of the accident severity model; define the $t^*$-th accident subcategory as containing $J$ accident records, where $J=|D_{t^*}|$, and denote the response variable of the $j$-th accident as $y_j$;
Step 4.2: Initialize $t^*=1$;
Step 4.3: Use formula (11) to obtain, based on binary logistic regression, the probability $P(y=1\mid X^*)$ that a fatal accident occurs, that is, $y_j=1$, given the independent variables $X^*$:
$$P(y=1\mid X^*)=\frac{\exp(w^*X^*)}{1+\exp(w^*X^*)} \tag{11}$$
In formula (11), $w^*$ is the vector of regression coefficients of the independent variables $X^*$;
Step 4.4: Estimate the parameters $w^*$ of the binary logistic regression accident severity model by the maximum likelihood method:
For the $j$-th accident, let $P_j=P(y_j=1\mid X^*_j)$ be the probability that $y_j=1$ given its independent variables $X^*_j$; the probability that $y_j=0$ given $X^*_j$ is then $1-P_j$. Use formula (12) to obtain the likelihood function $L(w^*)$:
$$L(w^*)=\prod_{j=1}^{J}P_j^{\,y_j}(1-P_j)^{1-y_j} \tag{12}$$
By maximum likelihood estimation, find the estimated parameters $w'$ that maximize $L(w^*)$;
From the estimated parameters $w'$, obtain the predicted probability $\hat{P}_j$ that $y_j=1$ for the $j$-th accident given its independent variables $X^*_j$, and hence the predicted probabilities $\{\hat{P}_1,\dots,\hat{P}_J\}$ of the $J$ accidents; sort them in ascending order and denote the sorted set of predicted probabilities as $\{P'_1,\dots,P'_j,\dots,P'_J\}$;
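A minimal sketch of formulas (11) and (12) follows, with an interaction term entered as the product of two dummy-coded variables. The toy records are hypothetical, and plain gradient ascent on the log-likelihood is used as an illustrative stand-in for whatever maximum likelihood solver is actually applied.

```python
from math import exp, log

def fatal_probability(w, x):
    """Formula (11): P(y = 1 | x) for coefficient vector w."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + exp(-z))  # algebraically equal to exp(z) / (1 + exp(z))

def log_likelihood(w, rows, y):
    """Natural logarithm of the likelihood in formula (12)."""
    total = 0.0
    for x, yj in zip(rows, y):
        p = fatal_probability(w, x)
        total += yj * log(p) + (1 - yj) * log(1.0 - p)
    return total

def fit(rows, y, rate=0.5, steps=3000):
    """Maximize the log-likelihood by gradient ascent (illustrative solver only)."""
    w = [0.0] * len(rows[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, yj in zip(rows, y):
            err = yj - fatal_probability(w, x)
            for d, xd in enumerate(x):
                grad[d] += err * xd
        w = [wd + rate * gd / len(rows) for wd, gd in zip(w, grad)]
    return w

# Hypothetical records (a, b, fatal) with two dummy-coded variables a and b.
records = [(0, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 0, 1),
           (0, 1, 0), (0, 1, 1), (1, 1, 1), (1, 1, 1), (1, 1, 0)]
# Columns: intercept, main effects a and b, interaction term a*b.
rows = [[1.0, a, b, a * b] for a, b, _ in records]
y = [r[2] for r in records]

w_hat = fit(rows, y)
# Predicted probabilities sorted in ascending order, as at the end of step 4.4.
sorted_probs = sorted(fatal_probability(w_hat, x) for x in rows)
```

The sorted probabilities are the candidate thresholds that the threshold-adjustment step then scans.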
Step 4.5: Adjust the prediction classification threshold of the accident severity model;
Step 4.6: Assign $t^*+1$ to $t^*$ and check whether $t^*>T^*$ holds; if so, $T^*$ accident severity prediction models have been obtained; otherwise, return to step 4.3.
The traffic accident severity prediction method of the present invention is further characterized in that step 3.3 is carried out as follows:
Step 3.3.1: The CART decision tree uses the Gini index as the criterion for deciding whether to branch, and a binary decision tree model is built. According to the feature-value split point $\alpha$, divide the training sample set $D_{t^*}$ into a first subset $D_{\alpha 1}$ and a second subset $D_{\alpha 2}$, and use formula (8) to obtain the Gini index $Gini(D_\alpha)$ of the split point $\alpha$:
$$Gini(D_\alpha)=\frac{|D_{\alpha 1}|}{|D_{t^*}|}Gini(D_{\alpha 1})+\frac{|D_{\alpha 2}|}{|D_{t^*}|}Gini(D_{\alpha 2}) \tag{8}$$
In formula (8), $|D_{t^*}|$, $|D_{\alpha 1}|$ and $|D_{\alpha 2}|$ are the total numbers of accidents in the training sample set $D_{t^*}$, the first subset $D_{\alpha 1}$ and the second subset $D_{\alpha 2}$, respectively;
$Gini(D_{\alpha 1})$ is the Gini index of the first subset $D_{\alpha 1}$, with:
$$Gini(D_{\alpha 1})=1-\left(p_{\alpha 1,0}\right)^2-\left(p_{\alpha 1,1}\right)^2 \tag{9}$$
In formula (9), $p_{\alpha 1,0}$ and $p_{\alpha 1,1}$ are the probabilities of non-fatal and fatal accidents in the first subset $D_{\alpha 1}$, respectively;
In formula (8), $Gini(D_{\alpha 2})$ is the Gini index of the second subset $D_{\alpha 2}$, with:
$$Gini(D_{\alpha 2})=1-\left(p_{\alpha 2,0}\right)^2-\left(p_{\alpha 2,1}\right)^2 \tag{10}$$
In formula (10), $p_{\alpha 2,0}$ and $p_{\alpha 2,1}$ are the probabilities of non-fatal and fatal accidents in the second subset $D_{\alpha 2}$, respectively;
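The Gini calculations in formulas (8) to (10) can be checked with a small worked example. The label lists are hypothetical (0 for non-fatal, 1 for fatal), and the function names are chosen for illustration.

```python
def gini(labels):
    """Formulas (9)/(10): Gini index of a subset of 0/1 severity labels."""
    if not labels:
        return 0.0
    p_fatal = sum(labels) / len(labels)
    p_nonfatal = 1.0 - p_fatal
    return 1.0 - p_nonfatal ** 2 - p_fatal ** 2

def split_gini(first, second):
    """Formula (8): size-weighted Gini index of a binary split."""
    n = len(first) + len(second)
    return len(first) / n * gini(first) + len(second) / n * gini(second)

# Hypothetical split: the first subset is mostly non-fatal, the second mostly fatal.
first_subset = [0, 0, 0, 1]
second_subset = [1, 1, 0, 1]
score = split_gini(first_subset, second_subset)
```

Each subset has a Gini index of $1-0.75^2-0.25^2=0.375$, so the weighted index of the split is also 0.375; a perfectly pure split would score 0.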
Step 3.3.2: Traverse the split points of every feature value in the feature set $X$ and calculate the Gini index of each split point; if the Gini index of the split point of every feature value in $X$ is smaller than the threshold $\varepsilon$, the CART decision tree model is a single-node tree, and this single-node tree is output; otherwise, go to step 3.3.3;
Step 3.3.3: Select the feature value $X_{min}$ whose split point has the smallest Gini index in the feature set $X$, together with its split point $\alpha_{min}$, and divide the training sample set $D_{t^*}$ into two subsets $D_{min1}$ and $D_{min2}$ according to $\alpha_{min}$; then assign $D_{min1}$ and $D_{min2}$ to the two child nodes whose parent node is the training sample set $D_{t^*}$;
If the sample sizes of $D_{min1}$ and $D_{min2}$ are both smaller than the given node sample threshold $\sigma$, the child nodes holding $D_{min1}$ and $D_{min2}$ are both leaf nodes, and the binary decision tree is output; if the sample size of $D_{min1}$ and/or $D_{min2}$ is larger than $\sigma$, the child node holding that subset is a non-leaf node that can be divided further, and step 3.3.4 is executed;
Step 3.3.4: For a non-leaf node, set the training sample set $D_{t^*}$ equal to the subset held by that node, delete from the feature set $X$ the feature value $X_{min}$ whose split point had the smallest Gini index, and return to step 3.3.1; when the sample sizes of all child nodes are smaller than the node sample threshold $\sigma$ or the feature set $X$ is empty, output the final binary decision tree.
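The recursion in steps 3.3.1 to 3.3.4 can be sketched roughly as below. This is a simplified illustration, not the patented procedure: it stops on nearly pure nodes (node Gini index below $\varepsilon$), which is an assumed reading of the threshold test, and it treats variables that appear on the same root-to-leaf path as candidate interaction terms, which is also an assumption. The variable names `weather` and `road` and the data are hypothetical.

```python
def gini(labels):
    """Gini index of a list of 0/1 severity labels."""
    p1 = sum(labels) / len(labels)
    return 1.0 - (1.0 - p1) ** 2 - p1 ** 2

def best_split(rows, labels, features):
    """Return (score, feature, value) with the smallest weighted Gini index, or None."""
    best = None
    for f in features:
        for value in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] == value]
            right = [y for r, y in zip(rows, labels) if r[f] != value]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, value)
    return best

def build_tree(rows, labels, features, sigma, epsilon):
    leaf = {"n": len(labels), "fatal_rate": sum(labels) / len(labels)}
    # Stop on small nodes, exhausted features, or (assumed rule) nearly pure nodes.
    if len(labels) < sigma or not features or gini(labels) < epsilon:
        return leaf
    split = best_split(rows, labels, features)
    if split is None:
        return leaf
    _, f, value = split
    rest = [g for g in features if g != f]  # step 3.3.4: drop the used feature
    branches = {}
    for name, keep in (("left", True), ("right", False)):
        idx = [i for i, r in enumerate(rows) if (r[f] == value) == keep]
        branches[name] = build_tree([rows[i] for i in idx],
                                    [labels[i] for i in idx], rest, sigma, epsilon)
    return {"feature": f, "value": value, **branches}

def interaction_candidates(tree, path=()):
    """Assumed reading: features sharing a root-to-leaf path may interact."""
    if "feature" not in tree:
        return {path} if len(path) >= 2 else set()
    path = path + (tree["feature"],)
    return (interaction_candidates(tree["left"], path)
            | interaction_candidates(tree["right"], path))

# Hypothetical data: accidents are fatal only when weather == 2 and road == 2.
rows = [{"weather": w, "road": r} for w in (1, 2) for r in (1, 2) for _ in range(3)]
labels = [1 if row["weather"] == 2 and row["road"] == 2 else 0 for row in rows]
tree = build_tree(rows, labels, ["weather", "road"], sigma=2, epsilon=0.01)
candidates = interaction_candidates(tree)
```

In this toy data neither variable alone separates the fatal accidents, so the tree needs both splits, and the pair of variables is returned as a candidate interaction.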
Step 4.5 is carried out as follows:
Step 4.5.1: Define $\theta$ as the prediction classification threshold of the model, with $0<\theta<1$; $\hat{y}_j=1$ means that the accident severity model predicts the $j$-th accident to be a fatal accident; $\hat{y}_j=0$ means that the accident severity model predicts the $j$-th accident to be a non-fatal accident;
Step 4.5.2: Initialize $j'=1$;
Step 4.5.3: Set the $j'$-th classification threshold $\theta_{j'}$ of the model equal to $P'_{j'}$, and use formula (13) to obtain the $j'$-th sensitivity $Se(\theta_{j'})$ of the accident severity model, that is, the probability that a fatal accident in the accident data set is predicted to be a fatal accident:
$$Se(\theta_{j'})=P(\hat{y}_s=1\mid y_s=1) \tag{13}$$
In formula (13), $P(\hat{y}_s=1)$ is the probability that the $s$-th accident is predicted to be a fatal accident, where $\hat{y}_s$ is obtained by comparing the predicted probability of the $s$-th accident with $\theta_{j'}$; $y_s=1$ indicates that the $s$-th accident is a fatal accident, $1\le s\le J$;
Use formula (14) to obtain the $j'$-th specificity $Sp(\theta_{j'})$ of the accident severity model, that is, the probability that a non-fatal accident in the accident data set is predicted to be a non-fatal accident:
$$Sp(\theta_{j'})=P(\hat{y}_s=0\mid y_s=0) \tag{14}$$
In formula (14), $P(\hat{y}_s=0)$ is the probability that the $s$-th accident is predicted to be a non-fatal accident; $y_s=0$ indicates that the $s$-th accident is a non-fatal accident, $1\le s\le J$;
Step 4.5.4: Assign $j'+1$ to $j'$ and check whether $j'>J$ holds; if so, $J$ pairs of sensitivity and specificity values have been obtained, and step 4.5.5 is executed; otherwise, return to step 4.5.3;
Step 4.5.5: With the $j'$-th classification threshold $\theta_{j'}$ as the abscissa and the corresponding sensitivity $Se(\theta_{j'})$ and specificity $Sp(\theta_{j'})$ as the ordinates, plot the sensitivity curve and the specificity curve, and take the threshold at the intersection of the two curves as the optimal prediction classification threshold $\theta'$ of the model.
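The threshold scan in steps 4.5.1 to 4.5.5 can be sketched as follows. Two choices here are assumptions rather than statements from the text: an accident is predicted fatal when its predicted probability is at least the threshold, and because the empirical curves rarely cross exactly at a candidate threshold, the candidate with the smallest gap between sensitivity and specificity stands in for the intersection. The probabilities and labels are hypothetical.

```python
def sensitivity(probs, y, theta):
    """Share of fatal accidents (y = 1) predicted as fatal at threshold theta."""
    fatal = [p for p, yi in zip(probs, y) if yi == 1]
    return sum(p >= theta for p in fatal) / len(fatal)

def specificity(probs, y, theta):
    """Share of non-fatal accidents (y = 0) predicted as non-fatal at threshold theta."""
    nonfatal = [p for p, yi in zip(probs, y) if yi == 0]
    return sum(p < theta for p in nonfatal) / len(nonfatal)

def crossing_threshold(probs, y):
    """Scan the sorted predicted probabilities and return the candidate threshold
    where sensitivity and specificity are closest (assumed stand-in for the
    intersection of the two curves)."""
    return min(
        sorted(probs),
        key=lambda t: abs(sensitivity(probs, y, t) - specificity(probs, y, t)),
    )

# Hypothetical unbalanced data: 3 fatal accidents among 8.
probs = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.60]
y = [0, 0, 0, 0, 1, 0, 1, 1]

theta_best = crossing_threshold(probs, y)
se_default = sensitivity(probs, y, 0.5)        # sensitivity at the usual 0.5 cut-off
se_adjusted = sensitivity(probs, y, theta_best)
```

In this toy data the default threshold of 0.5 detects only one of the three fatal accidents, while the adjusted threshold detects two, at the cost of some specificity.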
Compared with the prior art, the beneficial effects of the present invention are:
1. Based on the traffic accident data of a regional road network, the method first builds a latent class analysis model that divides the accident data into several homogeneous subcategories; second, it builds a CART decision tree model for each subcategory to identify the interaction terms among the independent variables; third, it builds an accident severity model with interaction terms for each subcategory based on binary logistic regression, and sets the intersection of the sensitivity and specificity curves as the prediction classification threshold of the accident severity model. The method reduces the adverse effect of accident data heterogeneity on the analysis results, addresses the problems that traditional accident severity models ignore interaction terms and perform poorly overall on unbalanced data, and improves the prediction accuracy and goodness of fit of the accident severity model.
2. The method divides the traffic accident data into several homogeneous subcategories through latent class analysis, which both reflects the heterogeneity of the accident data and allows the underlying patterns and mechanisms of road traffic accidents to be identified and analyzed accurately;
3. The method identifies interaction terms of every order among the independent variables through the CART decision tree model and includes them in the binary logistic regression model. This improves the goodness of fit of the model and identifies the important independent variables and interaction terms that affect traffic accident severity on the regional road network, which helps to improve road traffic safety on the regional road network;
4. The method uses the threshold at the intersection of the sensitivity and specificity curves as the classification threshold of the binary logistic regression model, which addresses the unbalanced data classification problem and improves the prediction accuracy of the accident severity model.
Description of drawings
Fig. 1 is the CART decision tree diagram for category 1 of the present invention;
Fig. 2 is the sensitivity and specificity curve diagram for category 1 of the present invention;
Fig. 3 is the ROC curve diagram for category 1 of the present invention;
Fig. 4 is a flow chart of the method of the present invention.
Detailed description
本实施例中,如图4所示,一种应用于区域路网的交通事故严重度预测方法是按如下步骤进行:In this embodiment, as shown in FIG. 4 , a traffic accident severity prediction method applied to a regional road network is performed according to the following steps:
步骤一、区域路网道路交通事故数据的采集与预处理;
步骤1.1、从道路交通事故平台中采集某区域路网的交通事故数据,删除交通事故数据库中记录不全(具有空白项)或记录不合理的事故数据,共获取2595(N=2595)起事故数据作为分析事故数据集D,从人、车、事故特征、路和环境五个方面选取26个分类变量组成集合X={x1,x2,...,x26}来表征第i起事故,并将他们作为预测模型的自变量,自变量具体取值见表1;其中,xk表示第k个分类变量,且第k个分类变量xk包含Ck种类别,xk在Ck种类别中的取值记为sk(例如:x1表示第一个分类变量包括两种类别即C1的值为2,则s1为1女性或2男性),每起事故都可以表示为26个分类变量取值的集合Si={si1,si2,...,sik,...,si26};令表示第i起事故的K个分类变量的所有可能取值中的任意一种取值集合;k=1,2,3,...,K;i=1,2,3,...,N;Step 1.1. Collect the traffic accident data of a certain regional road network from the road traffic accident platform, delete the incomplete records (with blank items) or unreasonable accident data in the traffic accident database, and obtain a total of 2595 (N=2595) accident data As the analysis accident data set D, 26 categorical variables are selected from five aspects of people, vehicles, accident characteristics, road and environment to form a set X={x 1 ,x 2 ,...,x 26 } to represent the ith accident , and use them as independent variables of the prediction model. The specific values of the independent variables are shown in Table 1; among them, x k represents the k-th categorical variable, and the k-th categorical variable x k includes C k categories, and x k is in C k The value in each category is recorded as sk (for example: x 1 indicates that the first categorical variable includes two categories, that is, the value of C 1 is 2, then s 1 is 1 female or 2 male), and each accident can be represented by Set S i = {s i1 ,s i2 ,...,s ik ,...,s i26 } for 26 categorical variables; let Represents any set of possible values of the K categorical variables of the ith accident; k=1,2,3,...,K; i=1,2,3,..., N;
The severity of each accident is the response variable, denoted $y_i$; $y_i = 0$ indicates a non-fatal accident and $y_i = 1$ a fatal accident;
Step 1.2: Run a multicollinearity test in SPSS and delete any collinear categorical variables. The test shows that all variance inflation factors (VIF) are below 5 and all corresponding tolerances (TOL) are above 0.1 (see Table 1), which indicates that there is no collinearity among the 26 categorical variables, so all of them can enter the model analysis.
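The VIF and tolerance thresholds quoted in step 1.2 can be illustrated with a short Python sketch. It assumes that the coefficient of determination R² from regressing each variable on the remaining ones is already known; the function names and the R² values are hypothetical and are not taken from SPSS or from Table 1.

```python
def vif_and_tolerance(r_squared):
    """Return (VIF, TOL) for a predictor whose regression on the
    remaining predictors has coefficient of determination r_squared."""
    tol = 1.0 - r_squared      # tolerance
    vif = 1.0 / tol            # variance inflation factor
    return vif, tol


def is_collinear(r_squared, vif_limit=5.0, tol_limit=0.1):
    """Flag a predictor using the thresholds quoted in step 1.2."""
    vif, tol = vif_and_tolerance(r_squared)
    return vif >= vif_limit or tol <= tol_limit


# Hypothetical R^2 values for three variables (illustration only).
for name, r2 in {"x1": 0.20, "x2": 0.50, "x3": 0.85}.items():
    vif, tol = vif_and_tolerance(r2)
    print(name, round(vif, 3), round(tol, 3), is_collinear(r2))
```

A variable is kept only when both conditions (VIF below 5, TOL above 0.1) hold, which mirrors the check reported for the 26 variables.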
Table 1. Definition and coding of the independent variables and collinearity test results
Step 2: Build a latent class analysis model from the road traffic accident data of the regional road network;
Step 2.1: Define a latent class variable $V$ in the latent class analysis model; $V$ has $T$ classes, any one of which is denoted $t$, $t = 1, 2, \ldots, T$; let $V_i$ denote the value of $V$ for the $i$-th accident;
Step 2.1.1: Define the outer loop counter as $\tau$ and the maximum number of outer loop iterations as $\tau_{\max} = 5$; let the number of classes set in the $\tau$-th iteration be $T_\tau$ with $T_\tau = \tau$; initialize $\tau = 1$;
Step 2.1.2: Initialize $t = 1$;
Step 2.1.3: Use formula (1) to obtain the conditional probability $P(S_i = S \mid V_i = t)$ that the $i$-th accident takes the value set $S = \{s_1, \ldots, s_K\}$ on the $K$ categorical variables given that $V_i = t$, i.e. that the accident belongs to the $t$-th latent class:
$$P(S_i = S \mid V_i = t) = \prod_{k=1}^{K} P(s_{ik} = s_k \mid V_i = t) \tag{1}$$
In formula (1), $P(s_{ik} = s_k \mid V_i = t)$ is the conditional probability that the $k$-th categorical variable takes the value $s_k$ when the $i$-th accident belongs to the $t$-th latent class;
Step 2.1.4: Use formula (2) to obtain the unconditional probability $P(S_i = S)$ that the $K$ categorical variables of the $i$-th accident take the value set $S$, i.e. the joint probability of the latent class analysis model:
$$P(S_i = S) = \sum_{t=1}^{T} P(V_i = t) \prod_{k=1}^{K} P(s_{ik} = s_k \mid V_i = t) \tag{2}$$
In formula (2), $P(V_i = t)$ is the probability that the $i$-th accident belongs to the $t$-th latent class, i.e. the share of latent class $t$ in the population;
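Steps 2.1.3 and 2.1.4 can be sketched in Python as follows. This is a minimal illustration that assumes the class probabilities and conditional probabilities are already known; the function names, the zero-based category indices and all numbers are hypothetical.

```python
from math import prod


def conditional_prob(values, cond_probs_t):
    """P(S_i = S | V_i = t): product over the K variables.

    values[k] is the observed category index of variable k and
    cond_probs_t[k][c] is P(s_ik = c | V_i = t)."""
    return prod(cond_probs_t[k][c] for k, c in enumerate(values))


def joint_prob(values, class_probs, cond_probs):
    """P(S_i = S): class-weighted sum of the conditional probabilities."""
    return sum(
        class_probs[t] * conditional_prob(values, cond_probs[t])
        for t in range(len(class_probs))
    )


# Toy model with T = 2 classes and K = 2 binary variables.
class_probs = [0.6, 0.4]
cond_probs = [
    [[0.9, 0.1], [0.8, 0.2]],  # class 1: one row per variable
    [[0.3, 0.7], [0.5, 0.5]],  # class 2
]
print(joint_prob([0, 1], class_probs, cond_probs))
```

Because every row of conditional probabilities and the class probabilities each sum to 1, the joint probabilities of all possible value sets also sum to 1.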
In addition, the basic constraints of the latent class analysis model are that the latent class probabilities sum to 1 and that the conditional probabilities of each categorical variable sum to 1, as shown in formulas (3) and (4):
$$\sum_{t=1}^{T} P(V_i = t) = 1 \tag{3}$$
$$\sum_{s_k=1}^{C_k} P(s_{ik} = s_k \mid V_i = t) = 1 \tag{4}$$
Step 2.2: Estimate the model parameters by maximum likelihood to obtain the estimates of the latent class probabilities and of the conditional probabilities of the categorical variables, together with the $\tau$-th maximized likelihood value $L_\tau$ of the latent class analysis model;
Step 2.3: Following Bayes' theorem, use formula (5) to compute the posterior probability $\hat{P}(V_i = t \mid S_i = S)$ that the $i$-th accident is classified into the $t$-th latent class, where a hat denotes an estimate from step 2.2:
$$\hat{P}(V_i = t \mid S_i = S) = \frac{\hat{P}(V_i = t)\,\hat{P}(S_i = S \mid V_i = t)}{\hat{P}(S_i = S)} \tag{5}$$
where $\hat{P}(S_i = S)$ is given by formula (6):
$$\hat{P}(S_i = S) = \sum_{t=1}^{T} \hat{P}(V_i = t) \prod_{k=1}^{K} \hat{P}(s_{ik} = s_k \mid V_i = t) \tag{6}$$
The $i$-th accident is assigned to the class for which its posterior probability is largest; computing and comparing the posterior probabilities of all $N$ accidents thus clusters the data;
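The posterior-based assignment of step 2.3 can be sketched as follows, again in Python and under the assumption that the estimated class and conditional probabilities are given. The names `posterior` and `assign_class` and the toy parameters are hypothetical.

```python
from math import prod


def posterior(values, class_probs, cond_probs):
    """Posterior probability of every latent class for one accident."""
    weighted = [
        class_probs[t] * prod(cond_probs[t][k][c] for k, c in enumerate(values))
        for t in range(len(class_probs))
    ]
    total = sum(weighted)  # joint probability of the observed value set
    return [w / total for w in weighted]


def assign_class(values, class_probs, cond_probs):
    """Return the 1-based index of the class with the largest posterior."""
    post = posterior(values, class_probs, cond_probs)
    return max(range(len(post)), key=post.__getitem__) + 1


# Toy model with T = 2 classes and K = 2 binary variables.
class_probs = [0.6, 0.4]
cond_probs = [
    [[0.9, 0.1], [0.8, 0.2]],  # class 1
    [[0.3, 0.7], [0.5, 0.5]],  # class 2
]
for accident in ([0, 1], [1, 0]):
    print(accident, assign_class(accident, class_probs, cond_probs))
```

Applying `assign_class` to every accident record yields the sub-category data sets used in the later steps.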
Step 2.4: Assign $t + 1$ to $t$ and check whether $t > T_\tau$; if so, go to step 2.5; otherwise, return to step 2.1.3;
Step 2.5: Use formulas (7), (8), (9) and (10) to obtain the model fit indices: the $\tau$-th Akaike information criterion $AIC_\tau$, the $\tau$-th Bayesian information criterion $BIC_\tau$, the $\tau$-th sample-size-adjusted Bayesian information criterion $aBIC_\tau$ and the $\tau$-th entropy value $E_\tau$:
$$AIC_\tau = -2\ln(L_\tau) + 2M \tag{7}$$
$$BIC_\tau = -2\ln(L_\tau) + \ln(N) \times M \tag{8}$$
$$aBIC_\tau = -2\ln(L_\tau) + \ln(n^*) \times M \tag{9}$$
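$$E_\tau = 1 - \frac{\sum_{i=1}^{N} \sum_{t=1}^{T_\tau} \left(-\hat{p}_{it} \ln \hat{p}_{it}\right)}{N \ln T_\tau} \tag{10}$$
In formulas (7), (8), (9) and (10), $M$ is the number of unknown parameters in the latent class analysis model; $n^*$ is the adjusted sample size, with $n^* = (N + 2)/24$; $\hat{p}_{it}$ is the posterior probability from formula (5) that the $i$-th accident belongs to the $t$-th latent class;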
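The fit indices of step 2.5 can be computed with a few lines of Python. The sketch below follows formulas (7) to (9) directly; the entropy function assumes the relative entropy definition commonly reported by Mplus, and the log-likelihood and parameter count in the example are hypothetical, not values from Table 2.

```python
from math import log


def fit_indices(log_likelihood, n_params, n_samples):
    """Return (AIC, BIC, aBIC) following formulas (7) to (9)."""
    deviance = -2.0 * log_likelihood
    aic = deviance + 2.0 * n_params
    bic = deviance + log(n_samples) * n_params
    n_star = (n_samples + 2) / 24.0  # adjusted sample size n*
    abic = deviance + log(n_star) * n_params
    return aic, bic, abic


def relative_entropy(posteriors):
    """1 - sum_i sum_t(-p_it ln p_it) / (N ln T); requires T >= 2."""
    n, t = len(posteriors), len(posteriors[0])
    uncertainty = sum(-p * log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - uncertainty / (n * log(t))


# Hypothetical log-likelihood and parameter count for N = 2595 accidents.
aic, bic, abic = fit_indices(-30000.0, 80, 2595)
print(round(aic, 1), round(bic, 1), round(abic, 1))
```

An entropy close to 1 means that most accidents have a posterior concentrated on one class; the one-class model is excluded because ln(1) = 0.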
Step 2.6: Assign $\tau + 1$ to $\tau$ and check whether $\tau > \tau_{\max}$; if so, go to step 2.7; otherwise, return to step 2.1.3;
Step 2.7: The latent class analysis model is built and its parameters are estimated with Mplus version 7.4 while fixing the number of latent classes $T$. Increasing the number of latent classes step by step from $T = 1$ to $T = 5$ yields the estimated log-likelihood $\ln(L)$ of 5 different latent class analysis models, i.e. $\tau$ runs up to 5. The fit indices of the 5 models are then computed: the $\tau$-th Akaike information criterion $AIC_\tau$, the $\tau$-th Bayesian information criterion $BIC_\tau$, the $\tau$-th sample-size-adjusted Bayesian information criterion $aBIC_\tau$ and the $\tau$-th entropy value. The resulting fit indices are listed in Table 2.
Table 2. Summary of model fit indices
In Table 2, smaller AIC, BIC and aBIC values indicate a better model fit, and an entropy above 0.8 indicates a classification accuracy of more than 90%. LMR and BLRT are relative fit indices: a significant P value means that the model with $T$ classes fits significantly better than the model with $T - 1$ classes. On this basis the accident data are divided into 3 classes for analysis, i.e. $T^* = 3$. The estimation results of the latent class analysis model for $T^* = 3$ are shown in Table 3. The conditional probability distributions identify the characteristics of each accident sub-category: category 1 is named passenger car accidents on county roads, category 2 motor vehicle accidents on rural roads, and category 3 non-motor-vehicle accidents involving elderly people. This reveals the latent patterns of road traffic accident occurrence.
Following Bayes' theorem, formula (5) is used to compute the posterior probability that the $i$-th observed accident belongs to each of the 3 latent classes. Computing and comparing the posterior probabilities of all accidents divides the 2595 accidents into 3 accident sub-categories, denoted $\{D_1, D_2, D_3\}$, which contain 1104, 485 and 1006 accidents, respectively;
Table 3. Latent class probabilities and conditional probabilities of the independent variables for $T^* = 3$ (excerpt)
Step 3: Based on the results of the latent class analysis model, build a CART decision tree model for each of the 3 accident sub-categories;
Step 3.1: Let the accident data $D_{t^*}$ of the $t^*$-th accident sub-category be the training sample set, $t^* = 1, 2, 3$; let the set $X$ of 26 categorical variables be the feature set of the CART decision tree model; let the node sample threshold be $\sigma$, the feature split point be $\alpha$ and the Gini index threshold be $\varepsilon$;
Step 3.2: Initialize $t^* = 1$;
Step 3.3: Build the CART decision tree model in SPSS: input the accident data set $D_{t^*}$, set the feature set $X$ to the variables identified as significant in step 3.1, set the node sample threshold $\sigma$ to 50 and the Gini index threshold $\varepsilon$ to 0.001;
Step 3.3.1: The CART decision tree uses the Gini index as the criterion for deciding whether to branch and builds a binary decision tree model. A feature split point $\alpha$ divides the training sample set $D_{t^*}$ into a first subset $D_{\alpha 1}$ and a second subset $D_{\alpha 2}$; that is, taking one category of the categorical variable $x_k$ as the split point $\alpha$ divides the sample set into the two subsets $D_{\alpha 1}$ and $D_{\alpha 2}$. Formula (11) gives the Gini index $Gini(D_\alpha)$ of the split point $\alpha$:
$$Gini(D_\alpha) = \frac{|D_{\alpha 1}|}{|D_{t^*}|}\,Gini(D_{\alpha 1}) + \frac{|D_{\alpha 2}|}{|D_{t^*}|}\,Gini(D_{\alpha 2}) \tag{11}$$
In formula (11), $|D_{\alpha 1}|$ and $|D_{\alpha 2}|$ are the numbers of accidents in the first subset $D_{\alpha 1}$ and the second subset $D_{\alpha 2}$ of the training sample set $D_{t^*}$, and $|D_{t^*}|$ is the number of accidents in $D_{t^*}$;
$Gini(D_{\alpha 1})$ is the Gini index of the first subset $D_{\alpha 1}$, with:
$$Gini(D_{\alpha 1}) = 1 - \left(p_{\alpha 1}^{0}\right)^2 - \left(p_{\alpha 1}^{1}\right)^2 \tag{12}$$
In formula (12), $p_{\alpha 1}^{0}$ and $p_{\alpha 1}^{1}$ are the probabilities of non-fatal and fatal accidents in the first subset $D_{\alpha 1}$, respectively;
In formula (11), $Gini(D_{\alpha 2})$ is the Gini index of the second subset $D_{\alpha 2}$, with:
$$Gini(D_{\alpha 2}) = 1 - \left(p_{\alpha 2}^{0}\right)^2 - \left(p_{\alpha 2}^{1}\right)^2 \tag{13}$$
In formula (13), $p_{\alpha 2}^{0}$ and $p_{\alpha 2}^{1}$ are the probabilities of non-fatal and fatal accidents in the second subset $D_{\alpha 2}$, respectively;
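The Gini index of a split point described in step 3.3.1 can be sketched in Python as follows. The sketch assumes 0/1 severity labels and a split of the form "feature equals category versus all other categories"; the function names and the toy data are hypothetical.

```python
def gini(labels):
    """Gini index of a node with 0/1 labels: 1 - p0^2 - p1^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - (1.0 - p1) ** 2 - p1 ** 2


def gini_split(feature, labels, alpha):
    """Size-weighted Gini index of splitting on feature == alpha."""
    left = [y for x, y in zip(feature, labels) if x == alpha]
    right = [y for x, y in zip(feature, labels) if x != alpha]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Toy data: one categorical feature and fatal (1) / non-fatal (0) labels.
feature = [1, 1, 2, 2]
labels = [0, 0, 1, 1]
print(gini(labels), gini_split(feature, labels, 1))
```

A split that separates fatal from non-fatal accidents perfectly has a Gini index of 0, while an uninformative split keeps the Gini index of the parent node.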
Step 3.3.2: Traverse the split points of every feature in the feature set $X$ and compute the Gini index of each split point; if the Gini index of every split point in $X$ is below the threshold 0.001, the CART decision tree model is a single-node tree, which is output, and there is no interaction term; otherwise go to step 3.3.3;
Step 3.3.3: Select the feature $X_{\min}$ in $X$ whose split point has the smallest Gini index, together with its split point $\alpha_{\min}$; split the training sample set $D_{t^*}$ at $\alpha_{\min}$ into two subsets $D_{\min 1}$ and $D_{\min 2}$, and assign them to the two child nodes whose parent node is the training sample set $D_{t^*}$;
If both $D_{\min 1}$ and $D_{\min 2}$ contain fewer samples than the node sample threshold 50, both child nodes are leaf nodes and the binary decision tree is output; in this case only a second-order interaction term exists. If $D_{\min 1}$ and/or $D_{\min 2}$ contains more samples than the node sample threshold 50, the corresponding child node is a non-leaf node that can be split further, and step 3.3.4 is executed;
Step 3.3.4: For each non-leaf node, set the training sample set $D_{t^*}$ equal to the subset of that node, delete the feature $X_{\min}$ from the feature set $X$ and return to step 3.3.1; once all child nodes contain fewer samples than the node sample threshold 50 or the feature set $X$ is empty, output the final binary decision tree;
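The recursive procedure of steps 3.3.1 to 3.3.4 can be sketched in Python as follows. This is a simplified illustration, not the SPSS implementation: records are dictionaries mapping a feature name to a category, and the stopping rule (node smaller than the sample threshold, node Gini index below the Gini threshold, or no feature left) is an interpretation of the text. All names and the toy data are hypothetical.

```python
def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - (1.0 - p1) ** 2 - p1 ** 2


def best_split(rows, labels, features):
    """Return (weighted Gini, feature, category) of the best binary split."""
    best = None
    for f in features:
        for alpha in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] == alpha]
            right = [y for r, y in zip(rows, labels) if r[f] != alpha]
            if not left or not right:
                continue  # a split must produce two non-empty subsets
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, alpha)
    return best


def build_tree(rows, labels, features, min_samples=50, min_gini=0.001):
    node = {"n": len(labels), "fatal": sum(labels)}
    # Stop on small, (nearly) pure or feature-exhausted nodes.
    if len(labels) < min_samples or gini(labels) < min_gini or not features:
        return node
    split = best_split(rows, labels, features)
    if split is None:
        return node
    _, f, alpha = split
    rest = [g for g in features if g != f]  # the used feature is removed
    node["feature"], node["alpha"] = f, alpha
    for key, keep in (("left", True), ("right", False)):
        idx = [i for i, r in enumerate(rows) if (r[f] == alpha) == keep]
        node[key] = build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                               rest, min_samples, min_gini)
    return node


# Toy data: only trucks on curves are fatal (illustration only).
rows = ([{"vehicle": "truck", "road": "curve"}] * 3
        + [{"vehicle": "truck", "road": "straight"}] * 3
        + [{"vehicle": "car", "road": "curve"}] * 3
        + [{"vehicle": "car", "road": "straight"}] * 3)
labels = [1, 1, 1] + [0] * 9
tree = build_tree(rows, labels, ["vehicle", "road"], min_samples=2)
print(tree["feature"], tree["right"]["feature"])
```

In the toy tree the split on road type sits below the split on vehicle type, which is the kind of parent-child path from which an interaction between the two variables would be read off.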
Step 3.4: Assign $t^* + 1$ to $t^*$ and check whether $t^* > 3$; if so, 3 binary decision tree models have been obtained and step 3.5 is executed; otherwise, return to step 3.3;
Step 3.5: From the tree diagrams of the 3 binary decision trees, determine the interaction terms between the categorical variables, i.e. the interaction terms determined by the binary decision tree of the $t^*$-th accident sub-category;
Fig. 1 shows the binary decision tree of category 1. The tree takes all data in category 1 as its root node, has a height of 4 levels and contains 5 leaf nodes. The rectangle of each node states the total number of accidents in that node, the numbers of fatal and non-fatal accidents and their proportions. The tree diagram (Fig. 1) shows second-order interactions between vehicle type and passenger, between vehicle type and road technical grade, and between road technical grade and road alignment, and a third-order interaction among vehicle type, road technical grade and road alignment;
Similarly, the second-order interaction terms in category 2 are accident type with lighting condition and accident type with vehicle type, and the second-order interaction term in category 3 is vehicle type with driver age.
Step 4: Build an accident severity model based on binary logistic regression for each of the 3 accident sub-categories;
Step 4.1: Use the accident data $D_{t^*}$ of the $t^*$-th accident sub-category as the fitting data of the accident severity model; the set $X$ of $K$ categorical variables together with the interaction terms of the $t^*$-th accident sub-category forms the independent variables $X^*$ of the accident severity model; let the $t^*$-th accident sub-category contain $J$ accidents, with $J = |D_{t^*}|$, and denote the response variable of the $j$-th accident by $y_j$;
A univariate chi-square test is run in SPSS for each accident sub-category, where a P value below 0.05 indicates a significant association between the independent variable and the dependent variable. The results are shown in Table 4; in category 1, 16 variables are significantly associated with accident severity.
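The univariate chi-square test can be illustrated for the simplest case of a binary variable, where the test has one degree of freedom and the P value has a closed form. The Python sketch below is an assumption-laden simplification (variables with more than two categories need the general chi-square distribution, which SPSS handles); the function name and the counts are hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square statistic and P value (1 degree of freedom).

    table[r][c]: r is the category of a binary variable,
    c is the accident severity (0 = non-fatal, 1 = fatal)."""
    row = [sum(r) for r in table]
    col = [table[0][c] + table[1][c] for c in range(2)]
    n = sum(row)
    stat = 0.0
    for r in range(2):
        for c in range(2):
            expected = row[r] * col[c] / n
            stat += (table[r][c] - expected) ** 2 / expected
    p_value = erfc(sqrt(stat / 2.0))  # upper tail of chi-square with 1 df
    return stat, p_value


# Hypothetical counts (illustration only).
stat, p = chi_square_2x2([[30, 10], [10, 30]])
print(stat, p < 0.05)
```

A variable would be treated as significantly associated with severity when the returned P value is below 0.05.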
Table 4. Univariate chi-square test results for each accident sub-category
Step 4.2: Initialize $t^* = 1$;
Step 4.3: Use formula (14) to obtain, based on binary logistic regression, the probability $P(y = 1 \mid X^*)$ of a fatal accident, i.e. $y_j = 1$, given the independent variables $X^*$:
$$P(y = 1 \mid X^*) = \frac{\exp(w^* \cdot X^*)}{1 + \exp(w^* \cdot X^*)} \tag{14}$$
In formula (14), $w^*$ is the vector of regression coefficients of the independent variables $X^*$;
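The logistic probability of step 4.3 can be sketched in Python as follows. The sketch assumes that the first weight is the constant term and that an interaction enters as the product of two dummy variables; the function name and the coefficient values are hypothetical.

```python
from math import exp, log


def fatal_probability(weights, features):
    """Probability of a fatal accident under a binary logistic model.

    weights[0] is the constant term; features excludes the constant."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return 1.0 / (1.0 + exp(-z))  # equals exp(z) / (1 + exp(z))


# Two dummy variables and their product as an interaction term.
x1, x2 = 1, 1
weights = [-1.0, 0.8, 0.5, -0.3]  # hypothetical coefficients
print(round(fatal_probability(weights, [x1, x2, x1 * x2]), 4))
```

With all coefficients equal to zero the model returns 0.5, i.e. it expresses no preference between fatal and non-fatal outcomes.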
Step 4.4: Estimate the parameters $w^*$ of the binary logistic accident severity model by maximum likelihood:
For the $j$-th accident, let $P_j = P(y_j = 1 \mid X_j^*)$ be the probability that $y_j = 1$ given the independent variables $X_j^*$; the probability that $y_j = 0$ given $X_j^*$ is then $1 - P_j$; formula (15) gives the likelihood function $L(w^*)$:
$$L(w^*) = \prod_{j=1}^{J} P_j^{\,y_j} \left(1 - P_j\right)^{1 - y_j} \tag{15}$$
Maximum likelihood estimation yields the estimate $w'$ at which $L(w^*)$ attains its maximum. The parameters of the accident severity model are estimated in SPSS. Interaction terms of the categorical variables enter the model as products of the categorical variables, and dummy variables are set for each independent variable to ease interpretation of the results. Independent variables enter or leave the model according to the Wald test, with entry and removal criteria of P < 0.05 and P > 0.1 respectively, and the number of iterations is set to 20;
From the estimate $w'$, compute the predicted probability $\hat{P}_j$ that $y_j = 1$ for the $j$-th accident given $X_j^*$. This gives the predicted probabilities $\{\hat{P}_1, \ldots, \hat{P}_J\}$ of the $J$ accidents, which are sorted in ascending order to give the set $\{P'_1, \ldots, P'_j, \ldots, P'_J\}$;
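The maximum likelihood estimation of step 4.4 can be sketched in Python with plain gradient ascent on the log-likelihood. This is only an illustration of the principle: SPSS uses its own iterative solver and Wald-based variable selection, neither of which is reproduced here, and the function names, learning rate and toy data are hypothetical.

```python
from math import exp, log


def predict(w, x):
    z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1.0 / (1.0 + exp(-z))


def log_likelihood(w, X, y):
    """Logarithm of the likelihood of a binary logistic model."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict(w, xi)
        total += yi * log(p) + (1 - yi) * log(1.0 - p)
    return total


def fit(X, y, steps=2000, lr=0.5):
    """Gradient ascent on the average log-likelihood, starting from zero."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - predict(w, xi)
            grad[0] += err
            for k, xk in enumerate(xi):
                grad[k + 1] += err * xk
        w = [wk + lr * g / len(y) for wk, g in zip(w, grad)]
    return w


# Toy data: one dummy variable; 1 of 4 fatal when x = 0, 3 of 4 when x = 1.
X = [[0], [0], [0], [0], [1], [1], [1], [1]]
y = [0, 0, 0, 1, 0, 1, 1, 1]
w_hat = fit(X, y)
probs = sorted(predict(w_hat, xi) for xi in X)  # ascending predicted probabilities
print([round(v, 3) for v in w_hat])
```

For this toy data the fitted model reproduces the observed fatal shares of 0.25 and 0.75, so the constant term converges to ln(1/3) and the slope to 2 ln 3.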
Step 4.5: Adjust the classification threshold of the accident severity model;
Step 4.5.1: Define $\theta$ as the classification threshold of the model prediction, with $0 < \theta < 1$; $\hat{P}_j \ge \theta$ means that the accident severity model predicts the $j$-th accident to be a fatal accident; $\hat{P}_j < \theta$ means that the accident severity model predicts the $j$-th accident to be a non-fatal accident;
Step 4.5.2: Initialize $j' = 1$;
Step 4.5.3: Set the $j'$-th classification threshold $\theta_{j'}$ of the model equal to $P'_{j'}$ and use formula (16) to obtain the $j'$-th sensitivity $Se(\theta_{j'})$ of the accident severity model, i.e. the proportion of fatal accidents in the accident data set that are predicted as fatal:
$$Se(\theta_{j'}) = \frac{\sum_{s=1}^{J} I\left(\hat{P}_s \ge \theta_{j'},\ y_s = 1\right)}{\sum_{s=1}^{J} I\left(y_s = 1\right)} \tag{16}$$
In formula (16), $I(\cdot)$ equals 1 when its condition holds and 0 otherwise; $\hat{P}_s \ge \theta_{j'}$ means that the $s$-th accident is predicted as fatal, and $y_s = 1$ means that the $s$-th accident is a fatal accident, $1 \le s \le J$;
Use formula (17) to obtain the $j'$-th specificity $Sp(\theta_{j'})$ of the accident severity model, i.e. the proportion of non-fatal accidents in the accident data set that are predicted as non-fatal:
$$Sp(\theta_{j'}) = \frac{\sum_{s=1}^{J} I\left(\hat{P}_s < \theta_{j'},\ y_s = 0\right)}{\sum_{s=1}^{J} I\left(y_s = 0\right)} \tag{17}$$
In formula (17), $\hat{P}_s < \theta_{j'}$ means that the $s$-th accident is predicted as non-fatal, and $y_s = 0$ means that the $s$-th accident is a non-fatal accident, $1 \le s \le J$;
Step 4.5.4: Assign $j' + 1$ to $j'$ and check whether $j' > J$; if so, $J$ pairs of sensitivity and specificity values have been obtained and step 4.5.5 is executed; otherwise, return to step 4.5.3;
Step 4.5.5: With the classification thresholds $\theta_{j'}$ on the abscissa and the corresponding sensitivity $Se(\theta_{j'})$ and specificity $Sp(\theta_{j'})$ on the ordinate, plot the sensitivity and specificity curves, and take the threshold at the intersection of the two curves as the optimal classification threshold $\theta'$;
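The threshold search of step 4.5 can be sketched in Python as follows. Since the curves are only known at the J candidate thresholds, the sketch approximates their intersection by the candidate at which sensitivity and specificity differ least; this approximation, the function names and the toy data are assumptions made for illustration.

```python
def sensitivity(probs, labels, theta):
    """Share of fatal accidents (label 1) predicted as fatal."""
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= theta)
    return tp / sum(labels)


def specificity(probs, labels, theta):
    """Share of non-fatal accidents (label 0) predicted as non-fatal."""
    tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < theta)
    return tn / (len(labels) - sum(labels))


def crossing_threshold(probs, labels):
    """Candidate threshold where |Se - Sp| is smallest."""
    return min(
        sorted(probs),
        key=lambda t: abs(sensitivity(probs, labels, t) - specificity(probs, labels, t)),
    )


# Hypothetical predicted probabilities and observed severities.
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
theta = crossing_threshold(probs, labels)
print(theta, sensitivity(probs, labels, theta), specificity(probs, labels, theta))
```

Choosing the crossing point balances the two error types, which matters when fatal accidents are the minority class and the default threshold of 0.5 would rarely predict them.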
Step 4.6: Assign $t^* + 1$ to $t^*$ and check whether $t^* > 3$; if so, 3 accident severity prediction models have been obtained; otherwise, return to step 4.3.
The parameter estimates $w'$ of the 3 binary logistic regression models are shown in Table 6. The regression coefficient $w^*$ is a vector consisting of the constant term $\beta_0$ and the regression coefficients $B$ of the independent variables. A positive $B$ indicates a positive effect on the occurrence of fatal accidents and a negative $B$ a negative effect; $OR = \exp(B)$ is the factor by which the presence of a given independent variable multiplies the odds of a fatal accident.
Table 6. Estimation results of the accident severity models
Note: B is the regression coefficient of the model; OR is the odds ratio, OR = exp(B);
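The relation between a regression coefficient and its odds ratio can be made concrete with a short Python sketch. The coefficient below is a hypothetical value chosen for illustration and is not taken from Table 6; the function names are likewise illustrative.

```python
from math import exp, log


def odds_ratio(b):
    """OR = exp(B) for a regression coefficient B."""
    return exp(b)


def percent_change_in_odds(b):
    """Relative change of the odds of a fatal accident, in percent."""
    return (exp(b) - 1.0) * 100.0


b = log(2.32)  # hypothetical coefficient with OR = 2.32
print(round(odds_ratio(b), 2), round(percent_change_in_odds(b), 1))
print(round(percent_change_in_odds(-b), 1))  # a negative B lowers the odds
```

An odds ratio above 1 corresponds to a positive coefficient and an increase in the odds, while an odds ratio below 1 corresponds to a negative coefficient and a decrease.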
Meanwhile, the predicted probability that $y_j = 1$ for the $j$-th accident is computed from the estimate $w'$, and the classification threshold at the intersection of the sensitivity and specificity curves is determined; Fig. 2 shows the sensitivity and specificity curves of category 1. The resulting classification thresholds of the 3 accident sub-categories are 0.2930, 0.3928 and 0.4133, and the model prediction accuracies at these thresholds are 68.8%, 75.5% and 66.3%, respectively;
Step 4.6.1: Analysis of the accident severity model results:
Table 6 shows that the factors affecting accident severity differ significantly between the accident sub-categories. Unlicensed driving, drunk driving, speeding, central separation facilities, terrain, the second-order interaction between motorcycle and passenger, and the third-order interaction among truck, class-4 highway and road alignment are significant only in category 1. Agricultural vehicle, collision with a fixed object, off-peak period, road alignment and visibility are significant only in category 2. Vehicle falls, substandard (unclassified) roads, traffic control facilities and the interaction between age and non-motor vehicle are significant only in category 3.
Taking category 1 as an example, the regression coefficients of unlicensed driving, speeding and drunk driving are all positive; in these three cases the odds of a fatal accident increase by about 132%, 140% and 124%, respectively. Regarding accident type, a collision with a non-fixed object increases the odds of a fatal accident by 96%; carrying passengers increases them by 165%; the absence of a central separation facility increases them by 120%; and at night they rise by about 44%.
Regarding variable interactions, a motorcycle carrying a passenger reduces the odds of a fatal accident by about 60%. When trucks travel on class-4 highways, accident severity is strongly affected by road alignment: combined curve-and-slope sections have the greatest effect (OR = 12.036), followed by curved sections (OR = 5.57).
Step 4.6.2: Model comparison:
To compare the method of the present invention with the conventional binary logistic regression model for accident severity analysis, prediction accuracy and the ROC curve are used to measure predictive performance, and the Hosmer-Lemeshow (HL) statistic is used to measure goodness of fit.
The prediction accuracy is obtained with the intersection of the sensitivity and specificity curves as the classification threshold; a higher value indicates better model performance. The ROC curve is plotted with 1 - specificity on the abscissa and sensitivity on the ordinate, and the area under the ROC curve (AUC) evaluates the classification performance of the model: an AUC above 0.5 means that the model is better than random guessing and has predictive value, and the closer the AUC is to 1, the better the classification ability of the model. Taking category 1 as an example, Fig. 2 shows the threshold at the intersection of the sensitivity and specificity curves, which is used as the classification threshold, and Fig. 3 shows the ROC curve. In addition, goodness of fit is assessed with the Hosmer-Lemeshow (HL) statistic, which follows a chi-square distribution; a non-significant P value (> 0.05) indicates that the model fits the data well.
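The ROC curve and its AUC can be sketched in Python as follows. The sketch sweeps the distinct predicted probabilities as thresholds and integrates the curve with the trapezoidal rule; the function names and the toy data are hypothetical, and the Hosmer-Lemeshow test is not covered.

```python
def roc_points(probs, labels):
    """(1 - specificity, sensitivity) pairs for thresholds from high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for theta in sorted(set(probs), reverse=True):
        tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= theta)
        fp = sum(1 for p, y in zip(probs, labels) if y == 0 and p >= theta)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical predicted probabilities and observed severities.
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
print(auc(roc_points(probs, labels)))
```

The AUC equals the share of fatal and non-fatal accident pairs in which the fatal accident receives the higher predicted probability, which is why 0.5 corresponds to random guessing.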
Table 7. Summary of model test indices
Table 7 shows that the traffic accident severity prediction method for regional road networks proposed by the present invention outperforms the conventional binary logistic regression model in both prediction accuracy and goodness of fit.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910770584.3A CN110458244B (en) | 2019-08-20 | 2019-08-20 | Traffic accident severity prediction method applied to regional road network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110458244A (en) | 2019-11-15 |
CN110458244B (en) | 2021-03-30 |
Family
ID=68488078
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110458244B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||