CN111832225B

CN111832225B - Method for constructing driving condition of automobile

Info

Publication number: CN111832225B
Application number: CN202010644339.0A
Authority: CN
Inventors: 白明泽; 邓川; 覃春园; 葛丝雨
Original assignee: Chongqing University of Posts and Telecommunications
Current assignee: Guangzhou Dayu Chuangfu Technology Co ltd
Priority date: 2020-07-07
Filing date: 2020-07-07
Publication date: 2023-01-31
Anticipated expiration: 2040-07-07
Also published as: CN111832225A

Abstract

The invention relates to the field of automobile driving-cycle data construction, and in particular to a method for constructing an automobile driving cycle. The method comprises: acquiring raw GPS data of automobile driving and preprocessing it; dividing the preprocessed data into kinematic segments; computing features of the kinematic segments to obtain their characteristic parameters; dividing the kinematic segments into four segment libraries by K-Means clustering; constructing a training data set; inputting the training data set into a long short-term memory (LSTM) neural network model for training to obtain a trained model; predicting with the trained LSTM model to obtain the time-speed prediction curve of each of the four segment libraries; and merging the curves of the four speed ranges into one driving cycle curve. Through the LSTM network, the method identifies the implicit features in the driving data of a particular region and thereby constructs a driving cycle curve that matches the driving characteristics of that region.

Description

A method for constructing a vehicle driving cycle

Technical Field

The invention relates to the field of automobile driving-cycle data construction, and in particular to a method for constructing an automobile driving cycle.

Background

A driving cycle, also called a vehicle test cycle, is a speed-time curve that describes how a vehicle is driven. Its total duration is generally within 1800 seconds, although no standard fixes this limit. It reflects the kinematic characteristics of vehicles driving on the road. It is an important, common basic technology of the automobile industry: it underlies the test methods and limit standards for vehicle energy consumption and emissions, and it is the main benchmark when the performance indicators of a vehicle are calibrated and optimized. At present, regions with developed automobile industries, such as Europe, the United States and Japan, each use driving cycle standards adapted to their own conditions for vehicle performance calibration and optimization and for energy consumption/emission certification.

At the beginning of this century, China directly adopted the European NEDC driving cycle to certify the energy consumption and emissions of automobile products. However, the two most important characteristics of that cycle, the idle time ratio and the average speed, differ greatly from actual driving in China. Because the driving cycle is the most basic reference for vehicle development and evaluation, it is increasingly important to carry out in-depth research and formulate test cycles that reflect actual road driving in China. China is also geographically vast, and cities differ in their level of development, climate and traffic, so the driving cycle characteristics of different cities differ markedly. Research on constructing a city's driving cycle from that city's own driving data is therefore increasingly urgent. The constructed cycle should match the city's actual driving as closely as possible and ideally represent it fully, so it has become necessary to construct a driving cycle for each city according to its own road conditions.

Traditional construction methods currently use fuzzy cluster analysis, Markov methods and the like. Fuzzy clustering methods essentially search the kinematic segment library for representative segments and combine them into a cycle curve close to actual conditions; how representative the curve is depends heavily on finding the best segments. Moreover, such methods cannot effectively estimate and extract the information hidden in the full data set, so what they select is merely an ordinary segment taken to represent all routes. The Markov method derives the final driving cycle from state transition probabilities, which model the influence of current data on future data. Although it can randomly generate a representative cycle of a specified duration, the result depends heavily on the accuracy of the state transition probabilities, and a new state retains little information from earlier data, so the newly predicted data summarizes future conditions less well than the earlier data does.

Summary of the Invention

To solve the above problems, the invention provides a method for constructing a vehicle driving cycle. From driving data collected in a particular region, the method uses deep learning to learn the features implicit in the data and so extract effective information, predicts the cycle curve of each given speed range separately, and finally constructs a complete driving cycle curve that matches the regional characteristics.

A method for constructing a vehicle driving cycle comprises the following steps:

Acquire the raw GPS data of vehicle driving and preprocess it;

Divide the preprocessed data into kinematic segments using the short-trip division method;

Compute features of the kinematic segments to obtain their characteristic parameters, and filter out irrelevant features by principal component analysis to obtain the effective characteristic parameters;

Divide the kinematic segments into four segment libraries by K-Means clustering: a low-speed segment library, a medium-speed segment library, a high-speed segment library and an extremely-high-speed segment library;

Construct the training data set: splice all kinematic segments in each segment library to obtain four long segments, and use the four long segments as the training data set;

Input the training data set into a long short-term memory (LSTM) neural network model for training to obtain a trained LSTM model;

Predict with the trained LSTM model to obtain the time-speed prediction curve of each of the four segment libraries. Specifically: the last sample of the training data set is taken as the first input element and fed into the trained LSTM model, which outputs the first prediction sequence; the first input element is then deleted, the first predicted value is used as the second input element, and the model outputs the second prediction sequence; continuing in this way yields the prediction sequence of one segment library, and repeating the procedure for each library yields the four time-speed prediction curves;

After the four time-speed prediction curves are obtained, determine the time that each of the four segment libraries occupies in the final synthesized cycle according to its share of the total duration of all kinematic segments, and merge the curves of the four speed ranges into one driving cycle curve;

Send the driving cycle curve to a control device, which uses it to evaluate vehicle exhaust emissions and to assign an environmental protection rating.
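The recursive prediction and the time allocation described in the steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `predict_next` is a hypothetical stand-in for the trained LSTM model, and the function names, the library names and the 1200 s total are chosen only for the example.

```python
from typing import Callable, Dict, List, Sequence


def rolling_forecast(
    predict_next: Callable[[Sequence[float]], float],
    seed_window: Sequence[float],
    horizon: int,
) -> List[float]:
    """Generate `horizon` speeds by feeding each prediction back as input."""
    window = list(seed_window)  # last sample of the training data set
    curve: List[float] = []
    for _ in range(horizon):
        value = predict_next(window)
        curve.append(value)
        # Delete the first input element and append the new prediction.
        window = window[1:] + [value]
    return curve


def allocate_durations(library_seconds: Dict[str, float], total: int) -> Dict[str, int]:
    """Split `total` seconds among the libraries in proportion to their duration."""
    overall = sum(library_seconds.values())
    shares = {name: int(total * sec / overall) for name, sec in library_seconds.items()}
    # Hand any rounding remainder to the largest library so the sum equals `total`.
    largest = max(library_seconds, key=library_seconds.get)
    shares[largest] += total - sum(shares.values())
    return shares


def mean_model(window: Sequence[float]) -> float:
    """Toy predictor used only to exercise the loop (assumption, not an LSTM)."""
    return sum(window) / len(window)


curve = rolling_forecast(mean_model, [0.0, 3.0], horizon=3)
shares = allocate_durations(
    {"low": 100, "medium": 300, "high": 400, "extra_high": 200}, total=1200
)
```

Each library's predicted curve would then be cut to its allocated number of seconds before the four pieces are joined into one curve.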

Further, preprocessing the raw GPS data of vehicle driving includes:

Traverse the raw GPS data from the beginning, look for first time breakpoints, and split the raw GPS data into separate driving segments at each first time breakpoint, a first time breakpoint being a region of the raw GPS data where the time interval is greater than 55 seconds;

Determine whether a second time breakpoint exists inside each resulting driving segment; if so, fit a series of new speed data points from the speed data before and after the second time breakpoint using an improved polynomial fitting method, and use them to fill the second time breakpoint inside the driving segment, a second time breakpoint being a region of the raw GPS data where the time interval is greater than 2 seconds and less than or equal to 55 seconds;

After the data has been fitted and filled, compute the acceleration at every time point of each driving segment and, according to the acceleration anomaly screening rule, remove the driving segments with abnormal acceleration from the data;

For abnormal long-idle data lasting more than 180 seconds, slide a window of size 180 over the time and speed of each segment with a step of 1 s. While the window slides, if all data in the window is idle data, the first record in the window is removed; when the tail of the window reaches the tail of the driving segment, if all data in the window is idle data, all data in the window is removed. All driving segments are screened in this way to obtain the preprocessed data.
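As an illustration of the first preprocessing step, the following sketch splits a record into driving segments wherever consecutive timestamps are more than 55 seconds apart. The `(timestamp, speed)` tuple layout and the function name are assumptions made for the example.

```python
from typing import List, Tuple

Sample = Tuple[int, float]  # (timestamp in seconds, GPS speed in km/h), assumed layout


def split_at_long_gaps(samples: List[Sample], max_gap: int = 55) -> List[List[Sample]]:
    """Split time-ordered samples at every gap longer than `max_gap` seconds."""
    segments: List[List[Sample]] = []
    current: List[Sample] = []
    for sample in samples:
        # A gap strictly greater than max_gap is a first time breakpoint.
        if current and sample[0] - current[-1][0] > max_gap:
            segments.append(current)
            current = []
        current.append(sample)
    if current:
        segments.append(current)
    return segments


record = [(0, 0.0), (1, 5.0), (2, 6.0), (60, 0.0), (61, 3.0), (116, 4.0)]
driving_segments = split_at_long_gaps(record)
```

In this example the 58 s gap between t = 2 and t = 60 splits the record, while the 55 s gap between t = 61 and t = 116 does not, because it falls in the range handled as a second time breakpoint.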

Further, dividing the preprocessed data into kinematic segments using the short-trip division method includes: first determining whether the duration of each driving segment is greater than 20 s; if it is less than 20 s, the driving segment is discarded; if it is greater than 20 s, kinematic segments are searched for within the driving segment according to the kinematic segment search rules, which include:

(1) Starting from the beginning of the driving segment, search forward for the first point where the GPS speed is 0, namely the idle start point; if it is found, record its position. Then continue forward to the first point where the GPS speed is not 0, namely the intermediate point, and record its position;

(2) Compute the time difference between the intermediate point and the idle start point. If the time difference is greater than 20 s, move the idle start point forward by 20 s and check the time difference again, repeating until it is less than 20 s. Then find the next point where the GPS speed is 0, namely the idle end point of the kinematic segment, and record its position;

(3) Screen the kinematic segment against the kinematic segment screening rules; if it satisfies them, extract the kinematic segment from the driving segment according to the recorded positions of the idle start point and the idle end point;

The kinematic segment screening rules include:

(1) The duration of a kinematic segment is not less than 20 seconds, that is, the time from one idle start point to the next idle start point is at least 20 seconds;

(2) A kinematic segment contains at least one acceleration state and one deceleration state; it must therefore contain at least one continuous portion where the vehicle acceleration is greater than 0.1 m/s² and one where the deceleration is less than -0.1 m/s²;

(3) The idle duration of a kinematic segment does not exceed 20 seconds.
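The three screening rules can be checked with a short function such as the sketch below. It assumes speeds sampled once per second in km/h and treats a single sample above or below the 0.1 m/s² threshold as sufficient evidence of an acceleration or deceleration state; the function names are illustrative.

```python
from typing import List


def accelerations(speeds_kmh: List[float]) -> List[float]:
    """Per-second acceleration in m/s^2 from 1 Hz speeds given in km/h."""
    return [(b - a) / 3.6 for a, b in zip(speeds_kmh, speeds_kmh[1:])]


def passes_screening(
    speeds_kmh: List[float],
    min_duration: int = 20,
    max_idle: int = 20,
    threshold: float = 0.1,
) -> bool:
    """Apply the duration, acceleration/deceleration and idle-time rules."""
    duration = len(speeds_kmh)                   # one sample per second (assumed)
    idle = sum(1 for v in speeds_kmh if v == 0)  # seconds spent at zero speed
    acc = accelerations(speeds_kmh)
    has_accel = any(a > threshold for a in acc)
    has_decel = any(a < -threshold for a in acc)
    return duration >= min_duration and idle <= max_idle and has_accel and has_decel


# 5 s of idling, a ramp up to 30 km/h, then a ramp back down to 0.
trip = [0] * 5 + list(range(0, 30, 3)) + list(range(30, -1, -3))
```

With these inputs `passes_screening(trip)` holds, while a three-sample segment or a 30 s run of pure idling is rejected.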

Further, the characteristic parameters of a kinematic segment include time, speed and acceleration characteristic parameters. The time characteristic parameters include: running time t (s), constant-speed time t_i (s), idle time t_c (s), acceleration time t_a (s) and deceleration time t_d (s). The speed characteristic parameters include: average speed v_m (km/h), average running speed v_mr (km/h), maximum speed v_max (km/h) and speed standard deviation v_std (km/h). The acceleration characteristic parameters include: average acceleration a_ma (m/s²), average deceleration a_md (m/s²), acceleration standard deviation a_std (m/s²), constant-speed time ratio P_c (%), idle time ratio P_i (%), acceleration time ratio P_a (%) and deceleration time ratio P_d (%).

Further, dividing the kinematic segments into four segment libraries by K-Means clustering includes the following steps. First, 4 kinematic segments are selected at random from all kinematic segments as the initial cluster centers. Then the cluster assignment is performed: the Euclidean distance from each kinematic segment to each of the 4 cluster centers is computed, and each kinematic segment is assigned to the cluster center nearest to it in Euclidean distance, forming 4 clusters. Once the 4 clusters are obtained, the center of each cluster is recalculated and the cluster assignment step (step S42) is performed again, until the membership of every cluster no longer changes. The result is the four segment libraries of kinematic segments.

Further, the Euclidean distance from a kinematic segment to a cluster center is calculated as:

$$d_{ij} = \sqrt{\sum_{m} \left(x'_{im} - c_{jm}\right)^{2}}$$

where d_ij is the Euclidean distance from the i-th kinematic segment to cluster center j, x'_im is the m-th feature element of the i-th kinematic segment, and c_jm (the source symbol is not recoverable and is written here as c_jm) is the m-th feature element of cluster center j. The sum runs over all feature elements m.
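A plain-Python sketch of K-Means with the Euclidean distance is given below. It is illustrative only: the optional `init` argument (indices of the points used as initial centers) and the `seed` are additions made so that the example is reproducible, and the two-dimensional toy data stands in for the feature vectors of the kinematic segments.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = Sequence[float]


def euclidean(x: Vector, c: Vector) -> float:
    """Euclidean distance between a feature vector and a cluster center."""
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))


def k_means(
    points: List[Vector],
    k: int = 4,
    init: Optional[List[int]] = None,
    seed: int = 0,
    max_iter: int = 100,
) -> List[int]:
    """Return the cluster index of each point once assignments stop changing."""
    if init is None:
        init = random.Random(seed).sample(range(len(points)), k)
    centers = [list(points[i]) for i in init]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # cluster membership no longer changes
        labels = new_labels
        # Update step: recompute each center as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


toy = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
toy_labels = k_means(toy, k=2, init=[0, 3])
```

For the method described here, `k` would be 4 and each point would be the vector of effective characteristic parameters of one kinematic segment.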

Further, the structure of the long short-term memory (LSTM) neural network model includes an input layer, an LSTM layer, a fully connected layer and an output layer.

Further, the training data set must be preprocessed before it is input into the model. This preprocessing includes: discarding the time dimension of each long segment and keeping its speed dimension; setting a sliding window whose length equals the time-step size; and sliding the window along the long segment from its start, 1 second per step. The speed-time sequence covered by the window at the current moment is taken as the speed-time sequence of the current moment, and the initial speed value of the region covered by the window at the next moment is taken as the label of the region covered by the current window. The four long segments are each preprocessed in this way to obtain the preprocessed training data sets D_L, D_M, D_H and D_EH.
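The windowing can be sketched as below. The label definition in the text can be read in more than one way; this sketch assumes the common next-step convention, in which the label of a window is the speed immediately following it, and that choice is an assumption rather than something the text confirms. The function name is illustrative.

```python
from typing import List, Tuple


def make_windows(speeds: List[float], time_steps: int) -> List[Tuple[List[float], float]]:
    """Build (input window, label) pairs by sliding one sample at a time."""
    samples: List[Tuple[List[float], float]] = []
    for start in range(len(speeds) - time_steps):
        window = speeds[start:start + time_steps]
        label = speeds[start + time_steps]  # assumed: the value right after the window
        samples.append((window, label))
    return samples


pairs = make_windows([0.0, 1.0, 2.0, 3.0, 4.0], time_steps=3)
```

Applying the function to each of the four long segments would give four sets of pairs corresponding to D_L, D_M, D_H and D_EH.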

Beneficial effects of the invention:

The invention provides a convenient and fast method for constructing a vehicle driving cycle. Through the long short-term memory (LSTM) neural network model, the method effectively identifies the implicit features in the driving data of a particular region and thereby constructs a driving cycle curve that matches those driving characteristics.

Brief Description of the Drawings

The invention is described in further detail below with reference to the drawings and specific embodiments. The drawings only illustrate preferred embodiments and are not to be regarded as limiting the invention.

Fig. 1 is a flow chart of a method for constructing a vehicle driving cycle according to an embodiment of the invention;

Fig. 2 is a schematic diagram of the network structure of the LSTM prediction model;

Fig. 3 is a flow chart of data preprocessing;

Fig. 4 is a comparison of the data before and after processing;

Fig. 5 is a schematic diagram of the two short-trip division methods;

Fig. 6 is a flow chart of kinematic segment screening;

Fig. 7 is a schematic diagram of the input data generation strategy of the LSTM model;

Fig. 8 is a schematic diagram of the training and prediction results for kinematic segments of the medium-speed range;

Fig. 9 is a schematic diagram of the training and prediction results for kinematic segments of the high-speed range;

Fig. 10 is an example result of a constructed vehicle driving cycle curve.

Detailed Description

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings of the embodiments. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.

As shown in Fig. 1, a method for constructing a vehicle driving cycle includes, but is not limited to, the following steps:

The raw GPS data of vehicle driving is acquired; it contains the acquisition time, the GPS speed and the GPS acceleration. Because of external factors and the vehicle itself, the raw data often contains bad values, and the reliability and accuracy of the raw data directly determine the validity of the cycle constructed from it. The raw GPS data is therefore preprocessed: data screening principles are established according to the types of bad data in the raw data, invalid data is deleted and valid, reasonable data is retained. This facilitates subsequent analysis and improves the quality of the constructed cycle through optimized characteristic parameters.

In one embodiment, the types of bad data in the raw data and the data screening principles include:

(1) Screening of time-discontinuous segments: high-rise buildings or tunnels can cause the GPS signal to be lost while the vehicle is driving, which leaves time discontinuities in the raw GPS data. To keep the raw data valid for prediction, these discontinuities must be handled first. The handling is as follows: look for time discontinuities; if no values are missing, time is continuous and nothing is done; if values are missing, time is discontinuous. If the gap is less than or equal to 55 seconds, the data is filled in; if the gap is greater than 55 seconds, no filling is performed. Gaps greater than 55 seconds do not greatly affect the overall data, and leaving them unfilled also preserves the idle time ratio.

(2) Acceleration anomaly screening rule: in general, an ordinary car takes more than 7 seconds to accelerate from 0 to 100 km/h, so the maximum acceleration of the car is 3.968 m/s²; the maximum deceleration under emergency braking is 7.5-8 m/s². In the raw driving data, if an acceleration falls outside the above maximum acceleration range and/or maximum deceleration range, the data is regarded as having abnormal acceleration or deceleration. Because such abnormal data strongly affects the prediction of the whole driving cycle, the entire kinematic segment containing the abnormal acceleration or deceleration, together with one adjacent idle segment, is removed, and the motion segments before and after the removed portion are then joined.
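The acceleration limit follows from simple arithmetic: 100 km/h is 100 / 3.6 ≈ 27.78 m/s, and reaching it in 7 s gives 27.78 / 7 ≈ 3.968 m/s². The sketch below applies both limits to a 1 Hz speed series in km/h. Using 8 m/s², the upper end of the stated 7.5-8 m/s² range, as the deceleration bound is an assumption, as are the function and constant names.

```python
from typing import List

MAX_ACCEL = 100 / 3.6 / 7  # 0 to 100 km/h in 7 s, about 3.968 m/s^2
MAX_DECEL = 8.0            # assumed bound: upper end of the 7.5-8 m/s^2 range


def has_abnormal_acceleration(
    speeds_kmh: List[float],
    max_accel: float = MAX_ACCEL,
    max_decel: float = MAX_DECEL,
) -> bool:
    """Return True if any 1 s speed change exceeds the physical limits."""
    for prev, cur in zip(speeds_kmh, speeds_kmh[1:]):
        a = (cur - prev) / 3.6  # km/h per second converted to m/s^2
        if a > max_accel or a < -max_decel:
            return True
    return False
```

For example, a jump from 0 to 20 km/h within one second corresponds to about 5.56 m/s² and is flagged, while 0 to 10 km/h (about 2.78 m/s²) is not.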

(3) Maximum idle duration rule: the raw GPS data can contain long stops or long traffic jams, for example waiting for someone with the engine running, or parking with the engine off while the data acquisition device stays on. In these cases the vehicle is stationary but the acquisition device keeps working and records long stretches with a GPS speed of 0. These do not reflect actual road driving and are therefore handled as idling. In addition, vehicles sometimes drive intermittently at low speed, with a maximum speed below 10 km/h; such data is glitch data, and these data points are usually changed to idle points. Data sections with excessively long idle time do not meet the requirements of a kinematic segment and should be deleted to reduce their influence on the cycle curve. Idle time exceeding 180 seconds is regarded as too long: the idle data points within 180 seconds are kept as the idle period and the remainder is deleted.

In one embodiment, the order of preprocessing includes, but is not limited to, the following:

As shown in Fig. 3, the three raw data files collected from the same vehicle in different time periods are first traversed. All data in a file is scanned from the beginning and divided into driving segments at the first time breakpoints (regions of the raw GPS data where the time interval is greater than 55 seconds).

Then every resulting driving segment is checked for second time breakpoints (a region of the raw GPS data where the time interval is greater than 2 seconds and less than or equal to 55 seconds is regarded as a second time breakpoint). An improved polynomial fitting method is used to fill each second time breakpoint inside a driving segment. The length of the filled portion equals the length of the time gap: a series of new speed data points is fitted from the speed data before and after the second time breakpoint and inserted at the breakpoint. Because the actual GPS speed is never less than 0, all negative values among the data points produced by the polynomial fit are replaced with 0.
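The text does not specify the improved polynomial fitting method, so the sketch below substitutes ordinary Lagrange interpolation through the two samples on each side of a gap. That substitution, the `(timestamp, speed)` layout and the function names are assumptions. Fitted negative speeds are replaced with 0, as the text requires.

```python
from typing import List, Tuple

Sample = Tuple[int, float]  # (timestamp in seconds, GPS speed in km/h), assumed layout


def lagrange(points: List[Sample], t: float) -> float:
    """Evaluate the interpolating polynomial through `points` at time `t`."""
    total = 0.0
    for i, (ti, vi) in enumerate(points):
        term = vi
        for j, (tj, _) in enumerate(points):
            if j != i:
                term *= (t - tj) / (ti - tj)
        total += term
    return total


def fill_short_gaps(
    samples: List[Sample], min_gap: int = 2, max_gap: int = 55, context: int = 2
) -> List[Sample]:
    """Insert one fitted sample per missing second for gaps in (min_gap, max_gap]."""
    filled: List[Sample] = []
    for idx, sample in enumerate(samples):
        if idx > 0:
            prev_t = samples[idx - 1][0]
            if min_gap < sample[0] - prev_t <= max_gap:
                # Use up to `context` samples before and after the gap.
                support = samples[max(0, idx - context): idx + context]
                for t in range(prev_t + 1, sample[0]):
                    filled.append((t, max(0.0, lagrange(support, t))))
        filled.append(sample)
    return filled


ramp = fill_short_gaps([(0, 0.0), (1, 10.0), (5, 50.0), (6, 60.0)])
dip = fill_short_gaps([(0, 1.0), (1, 0.0), (5, 0.0), (6, 1.0)])
```

In the first example the missing seconds 2 to 4 are filled along the straight ramp; in the second the fitted curve dips below zero between the samples, so the filled speeds are clamped to 0.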

After the data has been fitted and filled, the acceleration at every time point of each driving segment is computed. According to the acceleration anomaly screening rule, each driving segment with abnormal acceleration, together with one idle segment adjacent to it, is removed from the data, and the motion segments before and after the removed portion are then joined.

For abnormal long-idle data lasting more than 180 seconds, a sliding window of size 180 is slid over the time and speed of each segment with a step of 1 s. While the window slides, if all data in the window is idle data, the first record in the window is removed; when the tail of the window reaches the tail of the driving segment, if all data in the window is idle data, all data in the window is removed. All driving segments are screened in this way to obtain the preprocessed data.
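The sliding-window rule can be written directly from the description. The sketch assumes a 1 Hz series of speeds in which a value of 0 marks idling; the function name is illustrative, and the examples use a window of 3 instead of 180 only to keep them short.

```python
from typing import List


def trim_long_idle(speeds: List[float], window: int = 180) -> List[float]:
    """Remove samples according to the sliding-window long-idle rule."""
    n = len(speeds)
    if n < window:
        return list(speeds)
    drop = set()
    for start in range(n - window + 1):  # slide one sample (1 s) at a time
        if all(v == 0 for v in speeds[start:start + window]):
            if start + window == n:
                # Window tail has reached the segment tail: drop the whole window.
                drop.update(range(start, n))
            else:
                # Otherwise drop only the first record in the window.
                drop.add(start)
    return [v for i, v in enumerate(speeds) if i not in drop]


middle = trim_long_idle([5, 0, 0, 0, 0, 0, 7], window=3)
tail = trim_long_idle([5, 0, 0, 0, 0], window=3)
```

An idle run in the middle of a segment is shortened to one sample less than the window size, while an all-idle window at the end of the segment is removed completely.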

The preprocessed data is divided into kinematic segments using the short-trip division method.

As shown in Fig. 5, a kinematic segment consists of an idle period and a driving period. The duration M of each kinematic segment does not exceed 600 s, its idle period does not exceed 20 s, and any idle time beyond that is deleted. Each kinematic segment starts at zero speed and ends at zero speed, and the interval between its start and end may contain idle portions.

In an optional embodiment, the idle period and the driving period of a kinematic segment are delimited by method one: the kinematic segment starts at zero speed and ends at zero speed, and the idle period precedes the driving period.

In an optional embodiment, the idle period and the driving period of a kinematic segment are delimited by method two: the kinematic segment starts at zero speed and ends at zero speed, and the idle period follows the driving period.

In one embodiment, the kinematic segment screening rules are established first:

(1) The duration of a kinematic segment is not less than 20 seconds, that is, the time from one idle start point to the next idle start point is at least 20 seconds.

(2) A kinematic segment contains at least one acceleration state and one deceleration state; it must therefore contain at least one continuous portion where the vehicle acceleration is greater than 0.1 m/s² and one where the deceleration is less than -0.1 m/s².

As shown in Fig. 6, kinematic segments are located in the preprocessed driving segment data by the short-trip division method, subject to the kinematic segment screening rules. Specifically, the duration of each driving segment is first checked: if it is less than 20 s, the driving segment is discarded; if it is greater than 20 s, kinematic segments are searched for within the driving segment according to the kinematic segment search rules.

In one embodiment, the search rules for kinematic segments include:

(1) Starting from the beginning of the driving segment, search forward for an idle start point (a point where the GPS speed is 0). If one is found, record its position. Then continue forward to the first point where the GPS speed is not 0, record it as the intermediate point, record its position, and compute the time difference between the intermediate point and the idle start point;

(2) If the time difference is greater than 20 s, move the idle start point forward by 20 positions and recompute the time difference between the intermediate point and the idle start point, repeating until the time difference is less than 20 s. Then search for the next idle point, which is the idle end point of the kinematic segment (a point where the GPS speed is 0). If it is found, record its position.

(3) Screen the kinematic segment against the kinematic segment screening rules. If it satisfies them, extract it from the driving segment according to the recorded positions of the idle start point and the idle end point.
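The screening and search rules above can be sketched in Python as follows. This is a simplified illustration, not the patented procedure itself: it cuts the record at every idle start point and applies the two screening rules, but it omits the idle-trimming of step (2). The function names, the km/h input unit and the finite-difference acceleration are assumptions made for the example.

```python
from typing import List

ACC_THRESHOLD = 0.1  # m/s^2, from screening rule (2)
MIN_DURATION = 20    # seconds, from screening rule (1); 1 Hz data, so also samples


def accelerations(speeds_kmh: List[float]) -> List[float]:
    """Assumed acceleration estimate: difference of consecutive 1 Hz samples, in m/s^2."""
    return [(b - a) / 3.6 for a, b in zip(speeds_kmh, speeds_kmh[1:])]


def passes_screening(segment: List[float]) -> bool:
    """Apply screening rules (1) and (2) to one candidate segment."""
    acc = accelerations(segment)
    return (len(segment) >= MIN_DURATION
            and any(a > ACC_THRESHOLD for a in acc)
            and any(a < -ACC_THRESHOLD for a in acc))


def split_kinematic_segments(speeds_kmh: List[float]) -> List[List[float]]:
    """Cut a driving record at idle start points and keep the segments that pass screening."""
    # An idle start point is a zero-speed sample that opens the record or follows motion.
    idle_starts = [i for i, v in enumerate(speeds_kmh)
                   if v == 0 and (i == 0 or speeds_kmh[i - 1] > 0)]
    segments = []
    for start, end in zip(idle_starts, idle_starts[1:]):
        candidate = speeds_kmh[start:end + 1]  # starts and ends at zero speed
        if passes_screening(candidate):
            segments.append(candidate)
    return segments
```

Each returned segment begins with its idle period and ends at the next idle start point, which corresponds to the "idle period before driving period" layout of method one.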

Feature calculation is performed on each kinematic segment according to the characteristic parameter formulas, yielding the characteristic parameters of the segment. There are 16 characteristic parameters, divided into three categories: time, speed and acceleration characteristic parameters. The time characteristic parameters are: running time t (s), constant-speed time t_c (s), idle time t_i (s), acceleration time t_a (s) and deceleration time t_d (s). The speed characteristic parameters are: average speed v_m (km/h), average driving speed v_mr (km/h), maximum speed v_max (km/h) and speed standard deviation v_std (km/h). The acceleration characteristic parameters are: average acceleration a_ma (m/s²), average deceleration a_md (m/s²), acceleration standard deviation a_std (m/s²), constant-speed time ratio P_c (%), idle time ratio P_i (%), acceleration time ratio P_a (%) and deceleration time ratio P_d (%).

In one embodiment, the formulas for calculating the characteristic parameters of a kinematic segment include:

(1) Running time t (s): since the sampling frequency is 1 Hz, the running time is calculated as:

t = n

where n is the number of driving data points collected.

(2) Idle time t_i: the number of data points in the kinematic segment whose value is 0.

(3) Acceleration time t_a: the total number of points whose acceleration is greater than 0.1 m/s².

(4) Deceleration time t_d: the total number of points whose acceleration is less than -0.1 m/s².

(5) Constant-speed time t_c is calculated as: t_c = t - t_i - t_a - t_d

(6) Average speed v_m is calculated as:

v_m = (1/q) Σ_{p=1}^{q} v_p

In formula (6), q is the total number of data points in a kinematic segment and v_p is the speed value at data point p.

(7) Average driving speed v_mr is calculated as:

where q is the total number of data points in a kinematic segment.

(8) Maximum speed v_max is calculated as:

v_max = max{v_p, p = 1, 2, 3, ..., q}

(9) Speed standard deviation v_std is calculated as:

(10) Average acceleration a_ma is calculated as:

(11) Average deceleration a_md is calculated as:

where a_p is the acceleration value at data point p as measured by GPS.

(12) Acceleration standard deviation a_std is calculated as:

(13) Idle time ratio P_i is calculated as: P_i = t_i / t

(14) Constant-speed time ratio P_c is calculated as: P_c = t_c / t

(15) Acceleration time ratio P_a is calculated as: P_a = t_a / t

(16) Deceleration time ratio P_d is calculated as: P_d = t_d / t
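As a rough illustration, the count-based parameters and ratios defined above can be computed as below. The helper name `segment_features` is hypothetical, and speeds are assumed to be 1 Hz samples in km/h, with acceleration taken as the difference of consecutive samples. The entries marked as assumptions (average driving speed, the standard deviations, average acceleration and deceleration) use common definitions, because their formulas are not reproduced in the text here.

```python
from statistics import mean, pstdev
from typing import Dict, List

ACC_THRESHOLD = 0.1  # m/s^2, threshold used for t_a and t_d


def segment_features(speeds_kmh: List[float]) -> Dict[str, float]:
    """Characteristic parameters of one non-empty kinematic segment (1 Hz samples)."""
    # Assumption: acceleration in m/s^2 from consecutive speed samples in km/h.
    acc = [(b - a) / 3.6 for a, b in zip(speeds_kmh, speeds_kmh[1:])]
    pos = [a for a in acc if a > ACC_THRESHOLD]
    neg = [a for a in acc if a < -ACC_THRESHOLD]
    moving = [v for v in speeds_kmh if v > 0]

    t = len(speeds_kmh)                         # (1) running time, t = n
    t_i = sum(1 for v in speeds_kmh if v == 0)  # (2) idle time
    t_a = len(pos)                              # (3) acceleration time
    t_d = len(neg)                              # (4) deceleration time
    t_c = t - t_i - t_a - t_d                   # (5) constant-speed time

    return {
        "t": t, "t_i": t_i, "t_a": t_a, "t_d": t_d, "t_c": t_c,
        "v_m": mean(speeds_kmh),                  # (6) mean over all points
        "v_mr": mean(moving) if moving else 0.0,  # assumption: mean over moving points
        "v_max": max(speeds_kmh),                 # (8)
        "v_std": pstdev(speeds_kmh),              # assumption: population std
        "a_ma": mean(pos) if pos else 0.0,        # assumption: mean of accelerating points
        "a_md": mean(neg) if neg else 0.0,        # assumption: mean of decelerating points
        "a_std": pstdev(acc) if acc else 0.0,     # assumption: population std
        "P_i": t_i / t, "P_c": t_c / t,           # (13), (14)
        "P_a": t_a / t, "P_d": t_d / t,           # (15), (16)
    }
```

Because t_c is defined as the remainder of t, the four time ratios always sum to 1.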

Further, in a preferred embodiment, after the above features of the kinematic segments are obtained, principal component analysis is used to filter out irrelevant features, leaving the characteristic parameters that are most strongly related.

Because filtering irrelevant features by principal component analysis is neither the innovation nor the focus of the present invention, this specification does not describe it in detail. For the specific procedure, refer to the corresponding principal component analysis steps in the prior art "Research on the Construction Method of Vehicle Driving Conditions on Urban Roads Based on K-Means Cluster Analysis".

According to the features of the kinematic segments, K-Means clustering is used to divide the kinematic segments into four segment libraries: the low-speed interval segment library L, the medium-speed interval segment library M, the high-speed interval segment library H and the extremely-high-speed interval segment library EH. The maximum speed of the kinematic segments does not exceed 60 km/h in the low-speed library, 80 km/h in the medium-speed library, 100 km/h in the high-speed library and 130 km/h in the extremely-high-speed library.

In one embodiment, the K-Means clustering algorithm is used to divide the kinematic segments, with the Euclidean distance as the measure of how far apart categories are. Specifically:

(1) First, randomly select 4 kinematic segments from all kinematic segments as the initial cluster centers;

(2) Perform cluster assignment: compute the Euclidean distance from each kinematic segment to each of the 4 cluster centers and assign each kinematic segment to the cluster center nearest to it, forming 4 clusters;

(3) After the 4 clusters are obtained, recompute the cluster center of each cluster and repeat step (2) until the membership of every cluster no longer changes, which yields the four segment libraries of kinematic segments.

The new cluster center of each cluster is calculated as:

C_i = mean(x₁, x₂, x₃, …, x_n)

where C_i is the new cluster center of cluster i, x is a kinematic segment assigned to cluster i, n is the number of kinematic segments in cluster i, and mean is the averaging operation.
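Steps (1) to (3) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: each kinematic segment is represented by a feature vector of equal length, and the names `k_means`, `seed` and `max_iter` are introduced for the example and do not appear in the source.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(points: List[Vector]) -> List[float]:
    """C_i = mean(x_1, ..., x_n), taken feature by feature."""
    return [sum(col) / len(points) for col in zip(*points)]


def k_means(points: List[Vector], k: int = 4, seed: int = 0,
            max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Return (cluster label per point, cluster centers)."""
    rng = random.Random(seed)
    # Step (1): pick k of the segments at random as initial centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Step (2): assign every point to its nearest center.
        new_labels = [min(range(k), key=lambda j: euclidean(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # membership unchanged, so stop
            break
        labels = new_labels
        # Step (3): recompute each center as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = mean_vector(members)
    return labels, centers
```

The result depends on the random initial centers, so different seeds can give different partitions; in practice the run with the lowest sum of squared errors is usually kept.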

Further, in one embodiment, the criterion function that measures clustering quality in the K-Means algorithm is the sum of squared errors: the smaller the sum of squared errors, the higher the clustering quality. The criterion function of the K-Means clustering algorithm is:

SSE = Σ_{i=1}^{k} Σ_{x in cluster i} dist(x, C_i)²

where SSE is the sum of the squared error, k is the number of cluster centers, that is, the final clustering result has k classes (k = 4 in this specification), x is a data point to be clustered, C_i is a cluster center, and dist is the Euclidean metric.
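For illustration, the sum of squared errors can be evaluated for a given assignment as sketched below. The function name `sse` and the representation of the clustering as a list of labels plus a list of centers are assumptions made for the example.

```python
from typing import List, Sequence


def sse(points: List[Sequence[float]], labels: List[int],
        centers: List[Sequence[float]]) -> float:
    """Sum over all points of the squared Euclidean distance to the assigned center."""
    total = 0.0
    for point, label in zip(points, labels):
        center = centers[label]
        total += sum((x - c) ** 2 for x, c in zip(point, center))
    return total
```

A lower value indicates tighter clusters, so it can be used to compare clusterings obtained from different initial centers.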

Further, in one embodiment, the K-Means clustering algorithm uses the Euclidean metric to measure similarity. The Euclidean distance between a kinematic segment i and a cluster center j is:

dist(x_i, C_j) = sqrt( Σ_m (x_im - C_jm)² )

where x_im is the m-th feature element of kinematic segment i and C_jm is the m-th feature element of cluster center j.

It should be noted that, besides the Euclidean metric, other similarity measures may optionally be used in this specification, for example the squared Euclidean distance, the Minkowski distance, the Chebyshev distance or the block (city-block) distance; any similarity measure available in the prior art may be used.

Further, in one embodiment, if the minimum Euclidean distances from a kinematic segment to several cluster centers are equal, so that similarity cannot be decided, the closeness criterion is used instead. The closeness criterion formula takes the speed-time vector of any kinematic segment i that requires a closeness decision and the speed-time vector of cluster center m, and yields the closeness value between kinematic segment i and cluster center m.

The closeness values between the kinematic segment and all cluster centers are computed with the closeness criterion formula, the maximum among them is found, and the kinematic segment is assigned to the cluster center with that maximum closeness value; that cluster center is the category to which the kinematic segment belongs.

All the kinematic segments in each segment library are spliced together, giving four long segments, which serve as the training data set.

The training data set is constructed as follows: for each segment library, all of its kinematic segments s are spliced to obtain a speed-time sequence S_i (a long segment). The four segment libraries thus yield four long segments: S_L, S_M, S_H and S_EH.

The training data set is input into the long short-term memory (LSTM) neural network model for training. During training, the LSTM model continually learns the variation characteristics of the speed-time sequences contained in the segment libraries of the different categories. Four corresponding time-speed sequence prediction models are trained, giving the trained long short-term memory neural network model. Note that in this specification the time-speed sequence prediction model and the long short-term memory neural network model are the same model.

The long short-term memory neural network model can reconstruct the time series according to the spatio-temporal correlation of the different segment libraries, identify and reinforce temporal correlation through training, and predict representative kinematic segments for each category. Because the network structure of the model is fixed, its input must be a vector of a specific dimension, whereas the data in the training data set vary in length and do not satisfy this condition. The training data set must therefore be preprocessed before being input to the model, turning it into supervised-learning data that the time-speed sequence prediction model can use.

Preprocessing the training data set includes:

1. Discard the time dimension of the long segment and keep its speed dimension;

2. Set a sliding window whose length equals the time step, and slide it along the long segment from the start, 1 second per slide. The speed-time sequence segment covered by the window at the current moment is taken as the speed-time sequence of the current moment, and the speed value that immediately follows the window (the newest value covered by the window at the next moment) is taken as the label of the region covered by the window at the current moment;

Specifically, the sliding window W has length n and slides over a speed-time sequence S (one long segment) of length L, starting at time t₀. The speed-time sequence segment covered by the window at time t₀ is taken as the speed-time sequence s₀ for time t₀, and the speed value at time t_{n+1} is taken as the label of s₀; together they form the data segment d₀ for time t₀. The window then continues to slide along the sequence in the same way, producing the data segment d₁ for time t₁, and so on until the data segment for time t_{L-n} has been generated.

In total L-n data segments are obtained, and from them the preprocessed training data set of the long segment is constructed as D = {d_m | 0 < m < L-n}. Here L is the data length of the long segment S, d_m is the m-th data segment, s₁ is the speed-time sequence at time t₁, whose label is the speed value at time t_{n+2}, and the label of the last data segment is the speed value at time t_L.

3. In the same way, each of the four long segments is preprocessed, giving the four preprocessed training data sets D_L, D_M, D_H and D_EH.
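The sliding-window preprocessing can be sketched in a few lines. The function name `make_supervised` is hypothetical; the input is assumed to be the speed values of one long segment with the time dimension already discarded.

```python
from typing import List, Tuple


def make_supervised(speeds: List[float], n: int) -> List[Tuple[List[float], float]]:
    """Slide a window of length n over the speeds, one sample per step.

    Each item pairs the n speeds covered by the window with the speed
    that immediately follows the window, which serves as the label.
    """
    return [(speeds[m:m + n], speeds[m + n]) for m in range(len(speeds) - n)]
```

For a long segment of length L this yields L - n (window, label) pairs, matching the count given above; applying it to S_L, S_M, S_H and S_EH in turn would produce the four training sets.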

The present invention uses an LSTM recurrent neural network as the prediction core to generate the vehicle driving condition curve, and builds a time-speed sequence prediction model. The time-speed sequence prediction model is shown in Figure 2. It comprises an input layer, an LSTM layer, fully connected layers and an output layer: the input layer is followed by one LSTM layer, then two fully connected layers, and finally one output layer. Because the input layer receives the four categories of kinematic segment libraries produced by K-Means clustering, the input data have latent periodicity, which helps training. The LSTM layer contains 3 hidden layers and uses tanh as the activation function to learn the latent temporal relationships and abstract features in the input sequence. The fully connected layers reduce the dimension of the LSTM layer's output and map the abstract features it has learned; the output dimension of the fully connected layers is 1. Finally, the output layer outputs the predicted speed data, and the predictions are spliced in order to obtain the vehicle driving condition curve.

After the input training data set has been preprocessed, the preprocessed training data sets D_L, D_M, D_H and D_EH are input into the long short-term memory neural network model to train the time-speed sequence prediction model. In the LSTM layer, the input dimension of the first hidden layer is (1, n), that of the second hidden layer is (1, 6), and that of the third hidden layer is (1, 8). Data flow through the LSTM layer as follows: the input first passes through the LSTM gate structure to give a processed intermediate output, which then passes through the tanh activation function to give the output of the layer. The output of the LSTM layer is then fed into a fully connected layer with input dimension 4 and output dimension 1, whose output is the predicted speed at the point immediately following the current input vector.

The model is trained for 100 epochs, and training consists of forward propagation and error backpropagation. In forward propagation, the speed vector data D_i, i = (L, M, H, EH), from the preprocessed training set X are input to the time-speed sequence prediction model and pass through the LSTM layer and fully connected layer described above, with the weights of all layers randomly deactivated with probability 0.2, producing the output. The mean squared error is used as the loss function to obtain the error, which is then backpropagated. The Adam optimizer is used to update the weights of each layer, with a learning rate of 0.001 and a batch size of 1. All input data in the four segment libraries D_L, D_M, D_H and D_EH of training set X are trained in this way, giving four corresponding time-speed sequence prediction models M_L, M_M, M_H and M_EH. The four models have the same internal structure but are trained on different data sets, so their internal weights differ; they respectively predict segments for the low, medium, high and extremely high speed parts of the vehicle driving condition.

Preferably, the basic gate structure of the LSTM used is:

Forget gate: f_t = σ(W_f × [h_{t-1}, x_t] + b_f)

Input gate: i_t = σ(W_i × [h_{t-1}, x_t] + b_i)

Candidate cell state: c̃_t = tanh(W_c × [h_{t-1}, x_t] + b_c)

Cell state: c_t = f_t ∘ c_{t-1} + i_t ∘ c̃_t

Output gate: o_t = σ(W_o × [h_{t-1}, x_t] + b_o)

h_t = o_t × tanh(c_t)

where the symbol "∘" denotes multiplication of the elements at corresponding positions of two vectors; the symbol "σ" denotes the sigmoid function, used as the activation function; c denotes the cell state and c̃_t the cell state for the current input; x_t is the cell input and h_t the cell output; W and b are the weights and biases of the individual gates. Specifically: f_t is the forget vector, which determines which information from the previous moment is forgotten from the cell state; W_f is the weight of the forget gate, h_{t-1} is the output of the previous gate structure, x_t is the input at time t, and b_f is the bias of the forget gate. i_t is the input gate vector, which determines the information retained in the cell state; W_i is the weight of the input gate and b_i its bias. o_t is the output vector of the output gate; W_o is the weight of the output gate and b_o its bias. c̃_t is the cell state update value; W_c is the weight of the cell unit network, b_c is its bias, and c_{t-1} is the cell state at the previous moment (t-1).
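To make the gate structure concrete, one LSTM time step can be traced in plain Python as below. This is a sketch under assumptions: the candidate-state and cell-update lines follow the standard LSTM formulation, the parameter dictionary keys mirror the symbols in the text, and the helper names `affine` and `lstm_step` are introduced only for the example. A real model would use a deep-learning library rather than lists.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Matrix, v: List[float], b: List[float]) -> List[float]:
    """Compute W x v + b for a matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t: List[float], h_prev: List[float], c_prev: List[float],
              params: Dict[str, list]) -> Tuple[List[float], List[float]]:
    """One LSTM step; returns (h_t, c_t)."""
    z = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(u) for u in affine(params["W_f"], z, params["b_f"])]      # forget gate
    i_t = [sigmoid(u) for u in affine(params["W_i"], z, params["b_i"])]      # input gate
    o_t = [sigmoid(u) for u in affine(params["W_o"], z, params["b_o"])]      # output gate
    c_cand = [math.tanh(u) for u in affine(params["W_c"], z, params["b_c"])]  # candidate state
    # Element-wise update: keep part of the old state, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_cand)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

With hidden size H and input size D, each weight matrix has H rows and H + D columns. The additive form of the cell update is what lets information persist across many time steps.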

The trained long short-term memory neural network model is used for prediction, giving the time-speed prediction curve corresponding to each of the four segment libraries.

For a given segment library, the speed at a future time is predicted from the kinematic data in that library, giving the vehicle driving condition curve.

The prediction process is as follows. Using the model M_i, i = (L, M, H, EH), obtained above, the last data segment of the corresponding training data D_i in training set X is taken as input to predict the speed value at time t_{L+1}. That prediction is in turn used to predict the speed value at time t_{L+2}, and so on until the speed value at time t_{L+M} has been predicted. The predicted speed values are taken as the time-speed prediction curve P_i, which is extracted for that interval and can represent the characteristics of all the speed variation curves in it. In this way, the time-speed prediction curves corresponding to the low-speed interval segment library L, the medium-speed interval segment library M, the high-speed interval segment library H and the extremely-high-speed interval segment library EH are obtained as P_L, P_M, P_H and P_EH.
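The step-by-step prediction described above amounts to feeding each prediction back into the input window. The sketch below assumes a callable `predict_next` that stands in for a trained model M_i and maps a window of speeds to the next speed; that callable and the name `roll_forward` are hypothetical.

```python
from typing import Callable, List, Sequence


def roll_forward(predict_next: Callable[[Sequence[float]], float],
                 last_window: Sequence[float], steps: int) -> List[float]:
    """Predict `steps` future speeds, reusing each prediction as new input."""
    window = list(last_window)
    curve: List[float] = []
    for _ in range(steps):
        v_next = predict_next(window)
        curve.append(v_next)
        # Drop the oldest speed and append the prediction, keeping the window length fixed.
        window = window[1:] + [v_next]
    return curve
```

Because every step consumes earlier predictions, errors can accumulate over long horizons, which is worth keeping in mind when choosing the number of steps.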

After the time-speed prediction curves of the four segment libraries are obtained, the time that each of the four segment libraries occupies in the final driving condition is determined from the proportion of time that library accounts for among all the kinematic segments. The time-speed prediction curves P_L, P_M, P_H and P_EH of the four speed ranges are then spliced in order into a single curve, giving the final vehicle driving condition curve P.
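One possible way to carry out this proportional splicing is sketched below. The function name `combine_curves`, the dictionary layout and the choice to truncate each predicted curve to its share are assumptions made for illustration; the source does not specify how each curve is cut to length.

```python
from typing import Dict, List, Sequence


def combine_curves(curves: Dict[str, List[float]],
                   library_durations: Dict[str, int],
                   total_length: int,
                   order: Sequence[str] = ("L", "M", "H", "EH")) -> List[float]:
    """Splice the per-library curves, each trimmed to its share of total_length.

    library_durations holds the total time (in seconds) of the kinematic
    segments in each library; shares are proportional to these durations.
    """
    total = sum(library_durations[key] for key in order)
    combined: List[float] = []
    for key in order:
        share = round(total_length * library_durations[key] / total)
        combined.extend(curves[key][:share])  # assumes each curve is long enough
    return combined
```

Rounding can make the spliced curve differ from `total_length` by a few seconds, so the last share may need adjusting if an exact length is required.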

As the technical solution provided by the above method for constructing the vehicle driving condition curve shows, speed changes while a vehicle is driving are complex. The time-speed curve produced by a running vehicle contains all the information about its driving: acceleration, maximum acceleration, maximum deceleration, average acceleration and deceleration, average speed and idle intervals. This information follows no strict rule; it is uncertain, unstable, nonlinear in time and space, and large in volume. The long short-term memory (LSTM) recurrent neural network has a complex structure: each output depends not only on the input but also on the memory content of the cell unit and on the previous output. The LSTM model also remedies problems of typical machine learning models such as recurrent neural networks (RNN), namely vanishing and exploding gradients and insufficient long-term memory, so that, compared with traditional fitting methods, it can make truly effective use of long-range temporal information. The vehicle driving condition curve is a typical time series, so predicting it with an LSTM model can give good results. In the LSTM model, the new cell state is accumulated from the previous state, which allows nonlinear fitting of the vehicle's kinematic curve segments, takes the temporal ordering of the input data into account, and realises encoding and decoding of the time series.

To make this specification clearer and more complete, it is further explained below using specific data.

In one embodiment, the raw data collected by a vehicle-mounted GPS device in Problem D of the 2019 "China Graduate Mathematical Modeling Competition" are selected for analysis. The data were collected from a light-duty vehicle driving on actual roads in a certain city at a sampling frequency of 1 Hz, and include the acquisition time (seconds) and the GPS speed.

The raw data collected by the vehicle-mounted GPS device are preprocessed: bad and invalid data values are removed and valid data are retained.

The raw data collected by the vehicle-mounted GPS device comprise 3 files. They are processed according to the raw-data preprocessing method described above, with the final results shown in Table 1: after processing, file 1 contains 186255 motion segment records, file 2 contains 149032, and file 3 contains 170808.

Table 1. Number of records in each file before and after preprocessing

To better illustrate the preprocessing results, some processed data segments from file 2, obtained under the three screening principles, are compared with the original data segments, as shown in Figure 4. The upper three plots are the original data segments, which exhibit data loss, abnormal acceleration/deceleration and excessively long idling respectively. The lower three plots show the same abnormal segments after preprocessing. The preprocessed speed-time curves are clearly closer to the driving condition curve of real vehicle use.

The preprocessed data are divided into kinematic segments by short-trip division, subject to the kinematic segment screening rules established above.

The short-trip division method, shown in Figure 6, is applied to find kinematic segments in the driving segment data obtained by preprocessing.

Feature calculation is performed on each kinematic segment according to the characteristic parameter formulas, yielding the characteristic parameters of the segment.

Examples of the calculated characteristic parameter values for some kinematic segments are shown in Table 2.

Table 2. Characteristic values of kinematic segments

According to the features of the kinematic segments, K-Means clustering is used to divide the kinematic segments into four segment libraries. By clustering the feature matrix of the segments, the original 386 kinematic segments are divided into four segment libraries: the low-speed interval segment library L, the medium-speed interval segment library M, the high-speed interval segment library H and the extremely-high-speed interval segment library EH. To further analyse the driving characteristics represented by the kinematic segments in each library, the characteristic parameters related to speed and acceleration are selected from the 16 characteristic parameters and computed for each of the four classes, giving the comprehensive characteristic values of all kinematic segments in each library; see Table 3.

As Table 3 shows, the medium-speed interval segment library M (class 1) contains 203 kinematic segments, with an average speed of 12.15 km/h and a maximum speed not exceeding 106.74 km/h. The low-speed interval segment library L (class 2) contains 39 kinematic segments, with an average speed of 6.02 km/h and a maximum speed not exceeding 149.88 km/h. The extremely-high-speed interval segment library EH (class 3) contains 10 kinematic segments, with an average speed of 37.54 km/h and a maximum speed not exceeding 76.60 km/h. The high-speed interval segment library H (class 4) contains 134 kinematic segments, with an average speed of 23.88 km/h and a maximum speed not exceeding 139.61 km/h.

Table 3. Comprehensive characteristic parameter values of each segment library

| Characteristic parameter | First category | Second category | Third category | Fourth category |
| --- | --- | --- | --- | --- |
| Total number of segments | 203 | 39 | 10 | 134 |
| v_m (km/h) | 12.15 | 6.02 | 37.54 | 23.88 |
| v_max (km/h) | 106.74 | 149.88 | 76.60 | 139.61 |
| v_std (km/h) | 9.657 | 14.786 | 14.027 | 14.672 |
| a_ma (m/s²) | 0.5970 | 0.4356 | 0.3387 | 0.4856 |
| a_md (m/s²) | -0.7155 | -0.5479 | -0.4549 | -0.6179 |
| a_std (m/s²) | 0.6023 | 0.5509 | 0.4236 | 0.5810 |

The input of a typical LSTM model has three dimensions: sample, time step (timestep), and feature (the features supplied by the preceding network). The input of an LSTM must be a sequence, which matches the requirement here: given an input speed sequence, predict the speed at the next moment.

(2) Preprocessing of the training data set: each long segment obtained is a time-speed sequence. The time dimension is discarded, so the actual dimension of the long segment becomes dim = (s, 1), where s is the length of the segment. Before model training, however, the data must be converted into supervised-learning data that the model can use.

The procedure is as follows: for the speed sequence of a long segment, set a sliding window whose length equals the time step (timestep). Slide the window from the initial speed value toward the end of the sequence with a step of 1 second. At each position, the value immediately following the window is taken as the label h_t of the segment x_t covered by the window. Repeating this yields an input data set of length s - timestep. The input data generation strategy for the long short-term memory neural network model is shown in Figure 7.
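The sliding-window procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `make_supervised` and the toy speed values are hypothetical, and the data are assumed to be sampled once per second so that a step of one sample equals a step of 1 second.

```python
from typing import List, Tuple


def make_supervised(speeds: List[float], timestep: int) -> Tuple[List[List[float]], List[float]]:
    """Turn one speed sequence into (window, next-value) training pairs."""
    if timestep <= 0 or timestep >= len(speeds):
        raise ValueError("timestep must be in the range 1 .. len(speeds) - 1")
    windows, labels = [], []
    # Slide the window one sample at a time (1 s per sample is assumed).
    for start in range(len(speeds) - timestep):
        windows.append(speeds[start:start + timestep])  # x_t: values covered by the window
        labels.append(speeds[start + timestep])         # h_t: value right after the window
    return windows, labels


# Toy speed sequence in km/h (made-up values), s = 6 and timestep = 3.
speeds = [0.0, 2.0, 5.0, 9.0, 12.0, 10.0]
x, h = make_supervised(speeds, timestep=3)
print(len(x))        # number of pairs, s - timestep
print(x[0], h[0])    # first window and its label
```

Each long segment would be processed this way on its own, so that no window straddles two segment libraries.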

(3) After preprocessing, the training data set is input into the long short-term memory (LSTM) neural network model for training:

The data set X is input into the model for training with the following hyperparameters: a batch size (batch-size) of 5 and a time step of 300 seconds. The model structure is: one LSTM layer containing 3 hidden layers, with tanh as the activation function, followed by one fully connected layer that processes all of the LSTM layer outputs. The output dimension of the fully connected layer is 1: given the speed sequence currently input to the model, it predicts the speed at the next moment (second).
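To make the data flow through this structure concrete, the following is a rough forward-pass sketch of one LSTM layer followed by a fully connected layer with output dimension 1, written in plain Python. Several things here are assumptions and not taken from the source: the "3 hidden layers" are read as a hidden state of size 3, the standard LSTM gate equations are used (sigmoid gates, tanh candidate and output), the weights are random and untrained, and all names (`init_params`, `predict_next`) are hypothetical.

```python
import math
import random
from typing import List

HIDDEN = 3  # assumption: "3 hidden layers" is read as a hidden state of size 3


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_params(seed: int = 0) -> dict:
    """Random, untrained weights; a real model would learn these."""
    rng = random.Random(seed)

    def r() -> float:
        return rng.uniform(-0.5, 0.5)

    gates = {}
    for name in ("i", "f", "o", "g"):  # input, forget, output gates and candidate
        gates[name] = {
            "w_x": [r() for _ in range(HIDDEN)],                           # one input feature
            "w_h": [[r() for _ in range(HIDDEN)] for _ in range(HIDDEN)],  # recurrent weights
            "b": [0.0] * HIDDEN,
        }
    dense = {"w": [r() for _ in range(HIDDEN)], "b": 0.0}  # fully connected layer, output size 1
    return {"gates": gates, "dense": dense}


def predict_next(window: List[float], params: dict) -> float:
    """Run the window through the LSTM layer, then the fully connected layer."""
    h = [0.0] * HIDDEN
    c = [0.0] * HIDDEN
    for x in window:
        pre = {
            name: [
                g["w_x"][k] * x + sum(g["w_h"][k][j] * h[j] for j in range(HIDDEN)) + g["b"][k]
                for k in range(HIDDEN)
            ]
            for name, g in params["gates"].items()
        }
        i = [sigmoid(z) for z in pre["i"]]
        f = [sigmoid(z) for z in pre["f"]]
        o = [sigmoid(z) for z in pre["o"]]
        cand = [math.tanh(z) for z in pre["g"]]
        c = [f[k] * c[k] + i[k] * cand[k] for k in range(HIDDEN)]
        h = [o[k] * math.tanh(c[k]) for k in range(HIDDEN)]
    dense = params["dense"]
    return sum(dense["w"][k] * h[k] for k in range(HIDDEN)) + dense["b"]


params = init_params(seed=0)
window = [v / 150.0 for v in [0.0, 5.0, 12.0, 20.0]]  # speeds scaled by an arbitrary 150 km/h
print(predict_next(window, params))
```

In practice the weights would be fitted by a deep-learning framework using the batch size and time step given above; the sketch only shows how a window of speeds is reduced to a single next-second prediction.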

(4) Predicting segments with the model:

The input of the model is a time-period sequence of length len, for which the corresponding speed sequence Y_len is predicted. The last sample of the training data set is used as the seed input to produce the first predicted value. The first element of the input is then deleted and the first predicted value is appended as its last element, and the model outputs the second predicted value. Repeating this gives the complete predicted sequence Y. After the four segment libraries are predicted separately, the time-speed prediction curve of each library is obtained, and the four curves are spliced together to obtain the complete working condition data. Table 4 gives the loss-function data of the model training process.
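The feed-back prediction loop described above can be sketched as follows. The trained model is represented by an arbitrary callable; `mean_model` is a hypothetical stand-in used only so the example runs, and `rollout` is an illustrative name, not one from the source.

```python
from typing import Callable, List


def rollout(seed_window: List[float],
            predict_next: Callable[[List[float]], float],
            n_steps: int) -> List[float]:
    """Generate n_steps predictions by feeding each prediction back in."""
    window = list(seed_window)       # copy so the caller's seed is untouched
    predictions = []
    for _ in range(n_steps):
        y = predict_next(window)
        predictions.append(y)
        window = window[1:] + [y]    # drop the oldest value, append the prediction
    return predictions


# Stand-in for a trained model (hypothetical): predicts the mean of the window.
def mean_model(window: List[float]) -> float:
    return sum(window) / len(window)


print(rollout([10.0, 20.0, 30.0], mean_model, n_steps=3))
```

Because every prediction becomes part of the next input, errors can accumulate over long rollouts, which is one reason the seed is taken from real training data.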

Table 4. Loss-function data of the model training process

The trained long short-term memory neural network model is used for prediction, finally giving the time-speed prediction curves corresponding to the four kinematic segment libraries. Figure 8 shows the training and prediction results for the medium-speed kinematic segments: the "+" data points (the "True" curve in Figure 8) are the training data; the solid curve (the "Train" curve in Figure 8) is the model's prediction on the training data set after training, which coincides closely with the training data; and the "▽" data points (the "Predict" curve in Figure 8) are the predicted working condition curve. Figure 9 shows the training and prediction results for the high-speed kinematic segments with the same notation: the "+" data points are the training data, the solid curve is the prediction on the training data set after training, which again coincides closely with the training data, and the "▽" data points are the predicted working condition curve. Figures 8 and 9 show that the predictions of the LSTM model coincide well with the data, so using the model to construct the working condition curve gives relatively accurate results.

After the time-speed prediction curves of the four kinematic segment libraries are obtained, the time that each class contributes to the final synthesized working condition is determined from the proportion of time that class occupies in all the kinematic segments. Based on the predictions of the trained LSTM model, the curves of the four speed ranges are then merged into a single working condition curve, as shown in Figure 10.
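One way to realize this time-proportional merging is sketched below. The source does not specify the total cycle length, the rounding scheme, the order of the speed ranges, or which part of each predicted curve is used; the sketch assumes a caller-chosen cycle length, largest-remainder rounding, the order L, M, H, EH, and the leading samples of each curve. All durations and curves in the example are made up.

```python
from typing import Dict, List


def allocate_seconds(library_seconds: Dict[str, int], cycle_seconds: int) -> Dict[str, int]:
    """Split cycle_seconds across libraries in proportion to their total time."""
    total = sum(library_seconds.values())
    if total <= 0:
        raise ValueError("total library time must be positive")
    exact = {k: cycle_seconds * v / total for k, v in library_seconds.items()}
    alloc = {k: int(e) for k, e in exact.items()}  # floor of each share
    leftover = cycle_seconds - sum(alloc.values())
    # Hand the remaining seconds to the largest fractional parts.
    by_fraction = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_fraction[:leftover]:
        alloc[k] += 1
    return alloc


def merge_curves(curves: Dict[str, List[float]], alloc: Dict[str, int],
                 order: List[str]) -> List[float]:
    """Concatenate the first alloc[k] samples of each predicted curve."""
    merged: List[float] = []
    for k in order:
        merged.extend(curves[k][:alloc[k]])
    return merged


library_seconds = {"L": 300, "M": 500, "H": 150, "EH": 50}  # made-up durations
alloc = allocate_seconds(library_seconds, cycle_seconds=100)
curves = {k: [float(i) for i in range(100)] for k in library_seconds}  # dummy curves
cycle = merge_curves(curves, alloc, order=["L", "M", "H", "EH"])
print(alloc, len(cycle))
```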

After the working condition curve is obtained, it is sent to a control device. The control device drives the vehicle under test through a simulated run that follows the working condition curve, yielding data such as fuel consumption and exhaust emissions during the run. These data are then used to evaluate the vehicle's emissions and to assess its environmental protection level.

Those skilled in the art will understand that, unless specifically stated otherwise, the singular forms "a", "an", "said", and "the" used herein may also include the plural forms. It should be further understood that the words "comprising", "including", and "having" used in the description of the present invention refer to the presence of the stated features, integers, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It should be understood that the term "and/or" used herein includes any and all combinations of one or more of the associated listed items.

It should be noted that a person of ordinary skill in the art will understand that all or part of the processes in the above method embodiments can be carried out by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like. The above are only specific embodiments of the present application. It should be pointed out that those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principle and spirit of the present invention; the scope of the present invention is defined by the appended claims and their equivalents.

Claims

1. A method for constructing automobile driving conditions, characterized by comprising the following steps:

acquiring original GPS data of automobile driving and preprocessing the original GPS data, wherein the preprocessing comprises:

traversing the original GPS data from the beginning, searching for a first time breakpoint, and dividing the original GPS data into different driving segments at the first time breakpoint;

judging whether a second time breakpoint exists in an obtained driving segment and, if so, fitting a series of new speed data points by an improved polynomial fitting method based on the speed data before and after the second time breakpoint, so as to fill in the second time breakpoint inside the driving segment;

after the data fitting is completed, calculating the acceleration at each time point of each driving segment, and removing driving segments with abnormal acceleration from the data according to an abnormal-acceleration screening rule;

for abnormal data with long-term idling of more than 180 seconds, sliding a window of size 180 over the time and vehicle speed of each segment with a sliding step of 1 s; during sliding, if all the data in the window are idle data, filtering out the first data point in the window; when the end of the window reaches the end of the driving segment, if all the data in the window are idle data, filtering out all the data in the window; and so on for all driving segments, to obtain the preprocessed data;

dividing the preprocessed data into kinematic segments by a short-trip division method, comprising: first judging whether the driving time of each driving segment is greater than 20 s; if it is less than 20 s, removing the driving segment; if it is greater than 20 s, searching the driving segment for kinematic segments according to the kinematic segment search rules;

the kinematic segment search rules comprise:

(1) starting from the beginning of the driving segment, searching for the first point at which the GPS vehicle speed is 0, namely the idle start point, and recording its position if found; then continuing to search onward for the first point at which the GPS vehicle speed is not 0, namely the middle point, and recording its position;

(2) calculating the time difference from the middle point to the idle start point; if the time difference is greater than 20 s, moving the idle start point later by 20 s and judging the time difference again, until the time difference is less than 20 s; then searching for the next point at which the GPS vehicle speed is 0, namely the idle end point of the kinematic segment, and recording its position;

(3) screening the kinematic segment according to the kinematic segment screening rules and, if the rules are met, extracting the kinematic segment from the driving segment according to the recorded positions of the idle start point and the idle end point;

the kinematic segment screening rules comprise:

(1) the duration of the kinematic segment is not less than 20 seconds, that is, the time from one idle start point to the next idle start point is at least 20 seconds;

(2) the kinematic segment contains at least one acceleration state and one deceleration state, that is, it must contain continuous stretches in which the vehicle acceleration is greater than 0.1 m/s² and in which the deceleration is less than -0.1 m/s²;

(3) the idling duration of the kinematic segment does not exceed 20 seconds;

performing feature calculation on the kinematic segments to obtain their characteristic parameters, and filtering out irrelevant features by principal component analysis to obtain effective characteristic parameters;

dividing the kinematic segments into four segment libraries by K-Means clustering, namely: a low-speed interval segment library, a medium-speed interval segment library, a high-speed interval segment library, and an extremely-high-speed interval segment library;

constructing a training data set: splicing all the kinematic segments in each segment library to obtain four long segments, and using the four long segments as the training data set;

inputting the training data set into a long short-term memory neural network model for training to obtain a trained long short-term memory neural network model;

predicting with the trained long short-term memory neural network model to obtain the time-speed prediction curves corresponding to the four segment libraries, specifically comprising: taking the last sample of the training data set as the first input element and inputting it into the trained long short-term memory neural network model to output a first prediction sequence; deleting the first input element, taking the first predicted value as the second input element, and inputting it into the model to obtain a second prediction sequence; and so on, finally obtaining the predicted sequence of one segment library; the predicted sequences of the four segment libraries being their respective time-speed prediction curves;

after obtaining the time-speed prediction curves corresponding to the four segment libraries, determining the time that each of the four segment libraries occupies in the final working condition synthesis according to the proportion of time each occupies in all the kinematic segments, and merging the curves of the four speed ranges into one working condition curve;

sending the working condition curve to a control device, the control device evaluating the exhaust emissions of the vehicle and assessing its environmental protection level according to the working condition curve.
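The long-idle filtering step of claim 1 can be sketched as follows, as a minimal illustration under stated assumptions: the data are sampled once per second (so a window of 180 samples spans 180 s), an idle sample is one whose speed is exactly 0, and the function name `filter_long_idle` is hypothetical. A prefix count is used so that each window is checked in constant time.

```python
from typing import List


def filter_long_idle(speeds: List[float], window: int = 180) -> List[float]:
    """Apply the sliding-window idle rule to the speeds of one driving segment."""
    n = len(speeds)
    if n < window:
        return list(speeds)
    # idle_before[i] = number of idle samples among speeds[:i]
    idle_before = [0]
    for v in speeds:
        idle_before.append(idle_before[-1] + (1 if v == 0 else 0))
    drop = [False] * n
    for start in range(n - window + 1):
        all_idle = idle_before[start + window] - idle_before[start] == window
        if not all_idle:
            continue
        if start + window == n:
            # The window has reached the end of the segment: drop everything in it.
            for i in range(start, n):
                drop[i] = True
        else:
            drop[start] = True  # drop only the first sample of the window
    return [v for v, d in zip(speeds, drop) if not d]


# Small window so the effect is easy to see (the claim uses 180).
print(filter_long_idle([5, 0, 0, 0, 0, 5], window=3))
print(filter_long_idle([5, 0, 0, 0], window=3))
```

Under these assumptions, an idle run inside a segment is shortened to at most one sample less than the window size, while an idle run that fills the final window of the segment is removed entirely.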

2. The method for constructing automobile driving conditions according to claim 1, characterized in that the characteristic parameters of a kinematic segment comprise time characteristic parameters, speed characteristic parameters, and acceleration characteristic parameters, wherein the time characteristic parameters comprise: running time t (s), constant-speed time t_i (s), idle time t_c (s), acceleration time t_a (s), and deceleration time t_d (s); the speed characteristic parameters comprise: average speed v_m (km/h), average driving speed v_mr (km/h), maximum speed v_max (km/h), and speed standard deviation v_std (km/h); and the acceleration characteristic parameters comprise: average acceleration a_ma (m/s²), average deceleration a_md (m/s²), acceleration standard deviation a_std (m/s²), constant-speed time ratio P_c (%), idle time ratio P_i (%), acceleration time ratio P_a (%), and deceleration time ratio P_d (%).
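As an illustration of how several of these parameters could be computed for one kinematic segment, consider the sketch below. It makes a number of assumptions that the claim does not state: 1 Hz sampling, speeds in km/h, acceleration taken as the difference of consecutive samples, the 0.1 m/s² threshold from the screening rule reused to label acceleration and deceleration states, population standard deviations, and a running time equal to the number of samples. The function name and dictionary keys are hypothetical, and only a subset of the listed parameters is computed.

```python
import math
from typing import Dict, List

ACC_THRESHOLD = 0.1  # m/s^2; assumed state threshold, borrowed from the screening rule


def segment_features(speeds_kmh: List[float]) -> Dict[str, float]:
    """Compute a subset of the listed parameters for one 1 Hz kinematic segment."""
    n = len(speeds_kmh)
    if n < 2:
        raise ValueError("need at least two samples")
    # Acceleration in m/s^2 from consecutive 1 s samples (km/h to m/s is a division by 3.6).
    acc = [(speeds_kmh[i + 1] - speeds_kmh[i]) / 3.6 for i in range(n - 1)]
    pos = [a for a in acc if a > ACC_THRESHOLD]
    neg = [a for a in acc if a < -ACC_THRESHOLD]
    idle = sum(1 for v in speeds_kmh if v == 0)
    moving = [v for v in speeds_kmh if v > 0]
    v_m = sum(speeds_kmh) / n
    a_mean = sum(acc) / len(acc)
    return {
        "t": float(n),                                         # running time (s), assumed = samples
        "t_idle": float(idle),                                 # idle time (s)
        "t_a": float(len(pos)),                                # acceleration time (s)
        "t_d": float(len(neg)),                                # deceleration time (s)
        "v_m": v_m,                                            # average speed (km/h)
        "v_mr": sum(moving) / len(moving) if moving else 0.0,  # average driving speed (km/h)
        "v_max": max(speeds_kmh),                              # maximum speed (km/h)
        "v_std": math.sqrt(sum((v - v_m) ** 2 for v in speeds_kmh) / n),
        "a_ma": sum(pos) / len(pos) if pos else 0.0,           # average acceleration (m/s^2)
        "a_md": sum(neg) / len(neg) if neg else 0.0,           # average deceleration (m/s^2)
        "a_std": math.sqrt(sum((a - a_mean) ** 2 for a in acc) / len(acc)),
        "P_a": 100.0 * len(pos) / n,                           # acceleration time ratio (%)
        "P_d": 100.0 * len(neg) / n,                           # deceleration time ratio (%)
    }


features = segment_features([0.0, 0.0, 3.6, 7.2, 7.2, 3.6, 0.0])
print(features["v_max"], features["a_ma"], features["a_md"])
```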

3. The method for constructing automobile driving conditions according to claim 1, characterized in that dividing the kinematic segments into four segment libraries by K-Means clustering specifically comprises the following steps:

S41. First, randomly selecting 4 kinematic segments from all the kinematic segments as the initial cluster centers;

S42. Performing the cluster assignment operation: calculating the Euclidean distance from each kinematic segment to each of the four cluster centers, and assigning each kinematic segment to the cluster center with the smallest Euclidean distance, forming 4 clusters;

S43. After the 4 clusters are obtained, recalculating the cluster center of each cluster and repeating step S42 until the kinematic segments making up each cluster no longer change, finally obtaining the four segment libraries of kinematic segments.
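Steps S41 to S43 correspond to the standard K-Means iteration, which can be sketched in plain Python as below. The function names, the fixed random seed, the iteration cap, and the handling of empty clusters (the old center is kept) are assumptions made for illustration; the two-element feature vectors in the example are made up.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance; its ordering matches the true distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int = 4, seed: int = 0,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    rng = random.Random(seed)
    # S41: pick k segments at random as the initial cluster centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # S42: assign every segment to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # S43 stop condition: cluster membership no longer changes
        labels = new_labels
        # S43: recompute each center as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Made-up feature vectors, e.g. (average speed, acceleration standard deviation).
points = [(6.0, 0.55), (6.5, 0.50), (12.0, 0.60), (12.5, 0.62),
          (24.0, 0.58), (23.5, 0.60), (37.0, 0.42), (38.0, 0.43)]
labels, centers = kmeans(points, k=4, seed=0)
print(labels)
```

Because the initial centers are chosen at random, different seeds can lead to different final clusters.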

4. The method for constructing automobile driving conditions according to claim 3, wherein the Euclidean distance from a kinematic segment to a cluster center is calculated such that d_ij is the Euclidean distance from the i-th kinematic segment to cluster center j, x′_im is the m-th feature element of the i-th kinematic segment, and the corresponding term for the cluster center is the m-th feature element of cluster center j.
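Written out, a Euclidean distance consistent with these definitions takes the standard form below. The symbol $c_{jm}$ for the m-th feature element of cluster center j and the symbol $n$ for the number of feature elements are placeholders assumed here for illustration; they are not taken from the claim.

```latex
d_{ij} = \sqrt{\sum_{m=1}^{n} \left( x'_{im} - c_{jm} \right)^{2}}
```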

5. The method for constructing automobile driving conditions according to claim 1, characterized in that the structure of the long short-term memory neural network model comprises: an input layer, an LSTM layer, a fully connected layer, and an output layer.

6. The method for constructing automobile driving conditions according to claim 1, characterized in that the training data set is preprocessed before being input into the model, the preprocessing of the training data set comprising:

discarding the time dimension of the long segments and retaining their speed dimension;

setting a sliding window whose length equals the time step, and sliding the window from the starting position of the long segment toward its end with a step of 1 second; taking the speed-time series of the area covered by the window at the current moment as the speed-time series of the current moment, and taking the initial speed value of the area covered by the window at the next moment as the label of the area covered by the window at the current moment;

and so on, preprocessing the four long segments separately to obtain the four preprocessed training data sets D_L, D_M, D_H, and D_EH.