CN112590774B

CN112590774B - A deep reinforcement learning-based drift storage control method for smart electric vehicles

Info

Publication number: CN112590774B
Application number: CN202011530836.4A
Authority: CN
Inventors: 冷搏; 刘铭; 熊璐; 余卓平
Original assignee: Tongji University
Current assignee: Tongji University
Priority date: 2020-12-22
Filing date: 2020-12-22
Publication date: 2022-02-18
Anticipated expiration: 2040-12-22
Also published as: CN112590774A

Abstract

The invention relates to a deep reinforcement learning-based drift parking control method for an intelligent electric vehicle, comprising the following steps: 1) constructing a vehicle dynamics model for deep reinforcement learning and a tire model for tire-force saturation conditions; 2) using a TD3 algorithm designed for drift parking control to make the intelligent electric vehicle drift into a parking space. Compared with the prior art, the invention offers high control precision and good robustness and lets the vehicle complete the drift-parking maneuver accurately: during the drift, the vehicle reaches the parking space by continuously adjusting the steering wheel angle, and if the center of the parking space is changed while the vehicle is drifting, the vehicle drifts to the updated location.

Description

A deep reinforcement learning-based drift storage control method for smart electric vehicles

Technical field

The invention relates to the field of vehicle parking control, in particular to a deep reinforcement learning-based method for controlling an intelligent electric vehicle as it drifts into a parking space.

Background

A vehicle that keeps driving with its rear tire forces saturated and its rear axle sliding sideways is said to be drifting. There are two distinct drift states:

(1) Rear-axle drive with the rear wheels spinning. By controlling the rear-axle driving force and the front-wheel steering angle, the sideslip angle at the center of mass and the vehicle speed can be held constant, so the vehicle stays in a steady state. Since the vast majority of cars on the market are front-axle driven, drifting in this state is of relatively limited research value.

(2) Reproducing the drift maneuver with an open-loop control law is vulnerable to disturbances from the environment and from the vehicle's own state, so the vehicle may fail to drift into the parking space. For example, because lateral displacement and heading errors arise while approaching the parking space, the vehicle does not exactly reach the preset trigger pose when the drift is initiated, and an open-loop controller carries this deviation through to the end of the drift. In addition, because of response limits of the low-level actuators, open-loop control cannot guarantee that the actuators respond identically each time, and any deviation in response pushes the vehicle off the preset drift trajectory. Finally, an uneven road surface causes abrupt changes in tire force during the drift, which alters the drift path.

Summary of the invention

The purpose of the invention is to overcome the above defects of the prior art by providing a deep reinforcement learning-based drift parking control method for an intelligent electric vehicle. Building on the study and implementation of drift-parking maneuvers for driverless vehicles with deep reinforcement learning, the invention designs a drift controller that adjusts the steering wheel angle according to the position of the vehicle relative to the parking space and the vehicle state parameters, so that the vehicle drifts into the parking space.

The purpose of the invention is achieved through the following technical solution:

1. A deep reinforcement learning-based drift parking control method for an intelligent electric vehicle, characterized by comprising the following steps:

1) Build a vehicle dynamics model for deep reinforcement learning and a tire model for tire-force saturation conditions;

2) Use a TD3 algorithm designed for drift parking control to make the intelligent electric vehicle drift into the parking space.

In step 1), the vehicle dynamics model is a four-wheel, three-degree-of-freedom (3-DOF) model that accounts for front-rear and left-right load transfer; the three degrees of freedom are the velocity v_m at the center of mass, the sideslip angle β at the center of mass, and the yaw rate ω.

In the four-wheel 3-DOF model, the vertical forces on the four wheels, accounting for longitudinal and lateral acceleration, are given by:

where h_m is the height of the center of mass, b_f and b_r are the front and rear track widths, a_x and a_y are the longitudinal and lateral accelerations at the center of mass without the effect of body rotation, F_zFL, F_zFR, F_zRL and F_zRR are the vertical forces on the left-front, right-front, left-rear and right-rear wheels, m is the vehicle mass, g is the gravitational acceleration, l is the wheelbase, l_f and l_r are the distances from the front and rear axles to the center of mass, F_xFL, F_xFR, F_xRL and F_xRR are the longitudinal forces and F_yFL, F_yFR, F_yRL and F_yRR the lateral forces on the left-front, right-front, left-rear and right-rear wheels, and δ is the front-wheel steering angle.

During a drift, the load transfer can become large enough to lift a wheel off the ground, at which point that wheel's vertical load drops to 0 and the load transfer reaches its upper limit. When the steering wheel is turned left to drift, the load shifts to the right; if the left rear wheel lifts off, its vertical force is 0, and the excess transferred load is redistributed to the left front and right rear wheels according to the longitudinal and lateral accelerations, the wheelbase and the track widths:

$\Delta F_{trans} = |F_{zRL}|$

$F'_{zRL} = 0$

where ΔF_trans is the excess transferred load, and F′_zRL, F′_zRR and F′_zFL are the vertical forces on the left rear, right rear and left front wheels after redistribution.
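The vertical-load expression itself (Eq. (1)) is not reproduced in this text. As a hedged illustration only, the sketch below uses the common quasi-static rigid-body load-transfer model with the symbols defined above, plus the lift-off check described for the left rear wheel. The sign convention, the parameter values and the `VehicleParams` container are assumptions, and the patent's redistribution of ΔF_trans to the FL and RR wheels is deliberately left out because its formula is not given here.

```python
from dataclasses import dataclass


@dataclass
class VehicleParams:
    # Hypothetical example values; the patent does not list vehicle parameters.
    m: float = 1500.0    # vehicle mass [kg]
    h_m: float = 0.5     # center-of-mass height [m]
    l_f: float = 1.2     # CoM to front axle [m]
    l_r: float = 1.4     # CoM to rear axle [m]
    b_f: float = 1.5     # front track width [m]
    b_r: float = 1.5     # rear track width [m]
    g: float = 9.81      # gravitational acceleration [m/s^2]


def vertical_loads(p: VehicleParams, a_x: float, a_y: float):
    """Quasi-static wheel loads (FL, FR, RL, RR) with load transfer.

    Assumed sign convention: a_x > 0 forward, a_y > 0 to the left, so
    braking (a_x < 0) shifts load forward and a left turn (a_y > 0)
    shifts load onto the right-hand wheels.
    """
    l = p.l_f + p.l_r
    long_shift = p.m * a_x * p.h_m / (2.0 * l)            # longitudinal transfer per wheel
    lat_front = p.m * a_y * p.h_m * p.l_r / (l * p.b_f)   # lateral transfer per front wheel
    lat_rear = p.m * a_y * p.h_m * p.l_f / (l * p.b_r)    # lateral transfer per rear wheel
    static_f = p.m * p.g * p.l_r / (2.0 * l)
    static_r = p.m * p.g * p.l_f / (2.0 * l)
    f_fl = static_f - long_shift - lat_front
    f_fr = static_f - long_shift + lat_front
    f_rl = static_r + long_shift - lat_rear
    f_rr = static_r + long_shift + lat_rear
    return f_fl, f_fr, f_rl, f_rr


def clamp_left_rear_liftoff(loads):
    """Set F_zRL to 0 if it is negative and report Delta F_trans = |F_zRL|.

    The redistribution of Delta F_trans to the FL and RR wheels is not
    implemented, because its formula is not given in the text.
    """
    f_fl, f_fr, f_rl, f_rr = loads
    if f_rl < 0.0:
        return (f_fl, f_fr, 0.0, f_rr), abs(f_rl)
    return loads, 0.0
```

Because the model is a rigid-body moment balance, the four loads always sum to m·g; only the clamped case breaks that sum, which is why the patent adds a redistribution step.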

A force analysis of the four-wheel 3-DOF model with front-rear and left-right load transfer gives the vehicle dynamics balance equations:

$\varphi = \beta + \psi$

From these, the longitudinal speed v_mx and lateral speed v_my of the vehicle are:

$v_{mx} = v_m \cos\beta$

$v_{my} = v_m \sin\beta$

where $\dot{v}_m$ is the rate of change of the velocity at the center of mass, $\dot{\beta}$ is the rate of change of the sideslip angle, φ is the global azimuth of the velocity at the center of mass, $\dot{\varphi}$ is the rate of change of that azimuth, $\dot{\omega}$ is the rate of change of the yaw rate, ψ is the global heading angle of the vehicle, I_z is the yaw moment of inertia, v_x is the longitudinal speed of the vehicle, and v_y is the lateral speed of the vehicle.
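The balance equations themselves are also missing from this text. The following sketch is a generic planar 3-DOF formulation consistent with the variables listed above (v_m, β, ω, φ = β + ψ); it is not the patent's own equation set. Tire forces are resolved into the body frame and projected onto the velocity direction to get $\dot{v}_m$ and $\dot{\varphi}$, and $\dot{\beta} = \dot{\varphi} - \omega$ follows from φ = β + ψ. The body-axis orientation (x forward, y left) and the function name are assumptions.

```python
import math


def three_dof_derivatives(m, I_z, l_f, l_r, b_f, b_r, v_m, beta, omega, delta,
                          F_yFL, F_yFR, F_xRL, F_yRL, F_xRR, F_yRR):
    """Return (dv_m, dbeta, domega) for a planar 3-DOF rigid body.

    Assumed conventions: body x forward, y left, yaw positive counterclockwise;
    the front wheels carry only lateral force (F_xFL = F_xFR = 0), acting
    perpendicular to the wheel plane at steering angle delta.
    """
    # Resolve all tire forces into the body frame.
    fx = -(F_yFL + F_yFR) * math.sin(delta) + F_xRL + F_xRR
    fy = (F_yFL + F_yFR) * math.cos(delta) + F_yRL + F_yRR
    # Yaw moment M_z = x * F_y - y * F_x summed over the four contact points.
    mz = (l_f * (F_yFL + F_yFR) * math.cos(delta)
          + 0.5 * b_f * (F_yFL - F_yFR) * math.sin(delta)
          - l_r * (F_yRL + F_yRR)
          + 0.5 * b_r * (F_xRR - F_xRL))
    # Project onto the velocity direction (tangential) and its left normal.
    f_t = fx * math.cos(beta) + fy * math.sin(beta)
    f_n = -fx * math.sin(beta) + fy * math.cos(beta)
    dv_m = f_t / m
    dphi = f_n / (m * v_m)        # rate of change of the velocity direction
    dbeta = dphi - omega          # from phi = beta + psi and dpsi/dt = omega
    domega = mz / I_z
    return dv_m, dbeta, domega
```

With equal braking forces on both locked rear wheels and β = 0, the sketch gives pure deceleration and no yaw moment, which is a quick sanity check on the signs.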

In step 1), the tire model used for deep reinforcement learning training consists of a front tire force model and a rear tire force model.

For the rear tire force model: during the drift the rear wheels are locked by braking and slide on the road in pure friction, so the rear tire force acts opposite to the instantaneous velocity of the wheel center. A force analysis of the rear wheels gives the longitudinal and lateral tire force components:

For the left rear wheel:

For the right rear wheel:

$F_{r\_sat} = \mu_1 F_z$

where v_xRL and v_yRL are the longitudinal and lateral speeds at the left rear wheel center, v_xRR and v_yRR are those at the right rear wheel center, λ_L and λ_R are the sideslip angles at the left and right rear wheel centers, F_xRL and F_yRL are the longitudinal and lateral forces on the left rear wheel, F_xRR and F_yRR are those on the right rear wheel, F_rRL_sat and F_rRR_sat are the saturated horizontal tire forces of the left and right rear wheels, F_r_sat denotes the saturated horizontal tire force of a given wheel, μ₁ is the road adhesion coefficient utilized when the wheel is locked, and F_z is the vertical force on that wheel.
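The rear-wheel force expressions can be sketched directly from the rule stated above: the saturated force F_r_sat = μ₁F_z acts opposite to the wheel-center velocity. In this sketch the wheel-center velocities come from assumed rigid-body kinematics (x forward, y left, rear wheels l_r behind the center of mass and ±b_r/2 to the side), λ is taken as the atan2 of the wheel-center lateral and longitudinal speeds, and μ₁ = 0.9μ_max with μ_max = 1 as stated in the detailed description. The function name and argument order are hypothetical.

```python
import math


def rear_wheel_forces(v_x, v_y, omega, l_r, b_r, F_zRL, F_zRR, mu_max=1.0):
    """Saturated tire forces of locked rear wheels (sketch).

    Returns [(F_xRL, F_yRL), (F_xRR, F_yRR)] in the body frame
    (assumed x forward, y left).
    """
    mu_1 = 0.9 * mu_max                      # locked-wheel adhesion, 0.9 * peak
    forces = []
    for y_pos, f_z in ((0.5 * b_r, F_zRL), (-0.5 * b_r, F_zRR)):
        # Wheel-center velocity = CoM velocity + omega x r, with r = (-l_r, y_pos).
        v_xw = v_x - omega * y_pos
        v_yw = v_y - omega * l_r
        lam = math.atan2(v_yw, v_xw)         # wheel-center sideslip angle lambda
        f_sat = mu_1 * f_z                   # F_r_sat = mu_1 * F_z
        # Force points opposite to the wheel-center velocity.
        forces.append((-f_sat * math.cos(lam), -f_sat * math.sin(lam)))
    return forces
```

The magnitude of each returned force equals μ₁F_z regardless of λ, matching the statement that the slip-angle change can be ignored when computing the saturated rear force.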

For the front tire force model: during the drift the front tire forces are not yet saturated, so a modified Burckhardt tire model is fitted to describe the relation between lateral force and slip angle:

where θ₁ to θ₅ are fitting parameters and α is the front-wheel slip angle;

the left-wheel slip angle α_L and right-wheel slip angle α_R are obtained from:

Because no braking or driving force is applied, the front wheels roll freely, so F_xFL = 0 and F_xFR = 0. Only the lateral force is considered when determining the direction of the front tire force, which is therefore perpendicular to the wheel plane and set by the front-wheel steering angle.
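The five-parameter modified Burckhardt curve and the slip-angle formulas are not reproduced here, so the sketch below swaps in the classic three-parameter Burckhardt form, μ(α) = c₁(1 - e^(-c₂|α|)) - c₃|α|, applied to the slip angle, together with an assumed kinematic definition of the front slip angles (x forward, y left). The coefficients are placeholders, not fitted values; the sketch only shows where a fitted curve plugs into the front-wheel lateral force.

```python
import math


def front_slip_angles(v_x, v_y, omega, delta, l_f, b_f):
    """Left/right front slip angles (assumed definition: steer angle minus
    the direction of the wheel-center velocity; x forward, y left)."""
    v_yw = v_y + omega * l_f                   # lateral speed at the front axle
    alpha_L = delta - math.atan2(v_yw, v_x - omega * 0.5 * b_f)
    alpha_R = delta - math.atan2(v_yw, v_x + omega * 0.5 * b_f)
    return alpha_L, alpha_R


def burckhardt_lateral_force(alpha, F_z, c1=1.0, c2=10.0, c3=0.1):
    """Lateral force from the classic 3-parameter Burckhardt curve.

    Stand-in for the patent's 5-parameter variant; c1..c3 are placeholders.
    The force is odd in alpha and acts perpendicular to the wheel plane.
    """
    a = abs(alpha)
    mu = c1 * (1.0 - math.exp(-c2 * a)) - c3 * a
    return math.copysign(mu * F_z, alpha)
```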

Step 2) specifically comprises the following steps:

21) Design a TD3 algorithm for drift parking control and build the Actor and Critic networks, as follows:

Both the Critic network and the Actor network are BP neural networks made of fully connected layers. The Critic network takes the vehicle state and the action as input and outputs the Q value; the Actor network takes the vehicle state as input and outputs the action. The vehicle state consists of the parameters that characterize the vehicle during the drift: the parking-space coordinates (e_x, e_y) and the parking-space orientation in a relative coordinate system whose origin is the vehicle's center of mass and whose positive y-axis points along the vehicle heading, together with the velocity v_m at the center of mass, the sideslip angle β and the yaw rate ω. The action is the steering wheel angle;

22) Construct the reward function r(k):

where w_x, w_y and the orientation weight are the weights of e_x, e_y and the parking-space orientation, respectively, and k is the time step;

23) Train the Actor and Critic networks, and use them to make the intelligent electric vehicle drift into the parking space.

In step 23), before the Actor and Critic networks are trained, the boundary of the drift parking controller is determined first, and the target parking-space position for each drift is sampled at random within this boundary. During iterative training, the vehicle state is computed from the randomly selected target position and orientation, and the Critic and Actor networks are trained on it. Randomly updating the target position during training expands the training data set and improves generalization.

Compared with the prior art, the invention has the following advantages:

1. A drift parking control method for intelligent electric vehicles is designed on the basis of the deep reinforcement learning algorithm TD3. It improves control precision, overcomes drift-parking errors caused by uneven road surfaces, and allows the center of the parking space to be changed so that the vehicle moves toward the updated location, which improves the robustness of the control system.

2. During the drift, the vehicle corrects its pose by continuously adjusting the steering wheel angle, so that it drifts into the parking space accurately.

Description of drawings

Fig. 1 is a flow chart of the method of the invention.

Fig. 2 is a schematic diagram defining some of the state parameters of the drift process.

Fig. 3 shows the flow of the deep reinforcement learning-based drift control algorithm.

Detailed description

The invention is described in detail below with reference to the drawings and specific embodiments.

As shown in Fig. 1, the invention provides a deep reinforcement learning-based drift parking control method for an intelligent electric vehicle, comprising the following steps:

1) Build the vehicle dynamics model and tire model for deep reinforcement learning training, as follows:

11) Vehicle dynamics model for deep reinforcement learning

The model is a four-wheel 3-DOF vehicle dynamics model with front-rear and left-right load transfer; the three degrees of freedom are the speed v_m at the center of mass, the sideslip angle β at the center of mass and the yaw rate ω.

Because the longitudinal and lateral accelerations are large during a drift, the effect of front-rear and left-right load transfer on the tire vertical forces must be considered. The four wheel vertical forces, accounting for longitudinal and lateral acceleration, are given by Eq. (1):

where h_m is the height of the center of mass, b_f and b_r are the front and rear track widths, and a_x and a_y are the longitudinal and lateral accelerations at the center of mass without the effect of body rotation, obtained from Eq. (2):

During a drift, the load transfer may become large enough to lift a wheel off the ground, so that its vertical load drops to 0 and the load transfer reaches its upper limit. Because the maneuver is a braking tail-flick, load moves toward the front axle, so only rear-wheel lift-off is considered. If the steering wheel is turned left to drift, the load shifts to the right and the left rear wheel may lift off. When the computed F_zRL < 0, the vertical force on that wheel is set to 0 and the excess transferred load is redistributed to the left front and right rear wheels according to the longitudinal and lateral accelerations, the wheelbase and the track widths:

A force analysis of the vehicle model gives the vehicle dynamics balance equations:

where δ is the front-wheel steering angle; φ is the global azimuth of the velocity at the center of mass, and $\dot{\varphi}$ is the rate of change of the velocity direction; ψ is the global heading angle of the vehicle, and $\dot{\psi}$ is the yaw rate of the vehicle, i.e. the rate of change of the heading. Using φ = β + ψ and integrating the above equations gives the velocity v_m at the center of mass, the sideslip angle β and the yaw rate ω at each instant; the longitudinal and lateral speeds then follow from Eq. (8): $v_{mx} = v_m \cos\beta$, $v_{my} = v_m \sin\beta$.
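The integration step described above can be illustrated with a single explicit-Euler update. This is a sketch under stated assumptions: the derivatives are supplied by the caller (for example from a 3-DOF force balance), ψ is measured counterclockwise from the global X axis, and the global position (X, Y) of the center of mass is advanced along the velocity direction φ = β + ψ so that the parking-space errors can be computed later. The fixed-step Euler scheme is a simplification, not the patent's stated integrator.

```python
import math


def integrate_pose(X, Y, psi, v_m, beta, omega, dv_m, dbeta, domega, dt):
    """One explicit-Euler step of the 3-DOF states and the global pose.

    Position moves along phi = beta + psi using the speed at the start of the
    step; returns the updated states plus the body-frame speeds v_mx, v_my
    computed from the updated v_m and beta.
    """
    phi = beta + psi                      # global direction of the CoM velocity
    X += v_m * math.cos(phi) * dt
    Y += v_m * math.sin(phi) * dt
    psi += omega * dt                     # heading changes at the yaw rate
    v_m += dv_m * dt
    beta += dbeta * dt
    omega += domega * dt
    v_mx, v_my = v_m * math.cos(beta), v_m * math.sin(beta)
    return X, Y, psi, v_m, beta, omega, v_mx, v_my
```

A vehicle heading along X but sliding fully sideways (β = π/2) moves along +Y, which shows why the position must follow φ rather than ψ during a drift.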

12) Tire model for deep reinforcement learning under tire-force saturation

Unlike normal driving, during a drift the rear tire forces are saturated, the lateral body speed and the sideslip angle are large, and both the longitudinal and lateral speeds change rapidly. The vehicle is therefore a strongly nonlinear, time-varying system with tightly coupled longitudinal and lateral dynamics, and the real-time saturated tire force is obtained from Eq. (9):

$F_{r\_sat} = \mu_1 F_z \qquad (9)$

where F_r_sat is the resultant horizontal tire force and μ₁ is the road adhesion coefficient utilized when the wheel is locked, taken as μ₁ = 0.9μ_max, i.e. 0.9 times the peak adhesion coefficient, with the peak adhesion coefficient equal to 1.

121) Rear tire force model

While the rear axle is locked under braking, the tire force is saturated: however the slip angle changes, the magnitude of the resultant of the longitudinal and lateral forces stays the same. The change in slip angle can therefore be ignored when computing the horizontal rear tire force during the drift, and the rear-axle tire force in the drift state can be computed directly.

Because the rear wheels are locked, they slide on the road in pure friction, so the direction of the tire force is set by the direction of the wheel-center velocity: the tire force acts opposite to the instantaneous wheel-center velocity. A force analysis of the rear wheels during the drift gives the longitudinal and lateral tire force components:

Left rear wheel:

Right rear wheel:

where v_xRL and v_yRL are the longitudinal and lateral speeds at the left rear wheel center, and v_xRR and v_yRR those at the right rear wheel center; λ_L and λ_R are the sideslip angles at the left and right rear wheel centers; F_xRL and F_yRL are the longitudinal and lateral forces on the left rear wheel, and F_xRR and F_yRR those on the right rear wheel; F_rRL_sat and F_rRR_sat are the saturated horizontal tire forces of the left and right rear wheels.

122) Front tire force model

The front tire forces are not saturated, so their longitudinal and lateral components are decoupled and the lateral tire force is computed with a tire model suited to quasi-static conditions. The tire force is fitted with a modified Burckhardt tire model, expressing lateral force as a function of slip angle:

where θ₁ to θ₅ are fitting parameters and α is the front-wheel slip angle; the left and right front slip angles are obtained from Eqs. (13) and (14).

Since no braking or driving force is applied, the front wheels are treated as rolling freely and their longitudinal forces are approximately 0, i.e. F_xFL = 0 and F_xFR = 0. Only the lateral force is considered when determining the direction of the front tire force, so it is perpendicular to the wheel plane and set by the front-wheel steering angle.

2) TD3 algorithm design for drift parking control.

For the drift, a deep reinforcement learning algorithm built on the drift vehicle dynamics model above trains an end-to-end drift controller that drives the vehicle accurately into the parking space, as follows:

In the TD3 algorithm, the Critic network takes the vehicle state and the action as input and outputs the Q value; the Actor network takes the vehicle state as input and outputs the action, i.e. the steering wheel angle.
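A minimal NumPy sketch of the TD3 pieces named here: an Actor mapping the 6-element state to a bounded steering wheel angle, twin Critics taking state and action, and the TD3 target that combines target-policy smoothing with the minimum of the two target Critics. Layer sizes, the steering limit `MAX_STEER`, the noise settings (common TD3 defaults) and all function names are assumptions; gradient updates, delayed actor updates and soft target updates are omitted.

```python
import numpy as np

STATE_DIM, ACTION_DIM = 6, 1   # (e_x, e_y, orientation, v_m, beta, omega) -> steering
MAX_STEER = 8.0                # hypothetical steering-wheel angle limit [rad]


def init_mlp(rng, sizes):
    """Fully connected (BP) network as a list of (weight, bias) pairs."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(i), (i, o)), np.zeros(o))
            for i, o in zip(sizes[:-1], sizes[1:])]


def mlp(params, x, out_act=None):
    for k, (w, b) in enumerate(params):
        x = x @ w + b
        if k < len(params) - 1:
            x = np.maximum(x, 0.0)          # ReLU on hidden layers
    return out_act(x) if out_act else x


def actor(params, s):
    return MAX_STEER * mlp(params, s, np.tanh)   # bounded steering command


def critic(params, s, a):
    return mlp(params, np.concatenate([s, a], axis=-1))


def td3_target(r, s2, done, actor_t, q1_t, q2_t, rng,
               gamma=0.99, noise_std=0.2, noise_clip=0.5):
    """Clipped double-Q target with target-policy smoothing."""
    noise = np.clip(rng.normal(0.0, noise_std * MAX_STEER, (len(s2), ACTION_DIM)),
                    -noise_clip * MAX_STEER, noise_clip * MAX_STEER)
    a2 = np.clip(actor(actor_t, s2) + noise, -MAX_STEER, MAX_STEER)
    q_min = np.minimum(critic(q1_t, s2, a2), critic(q2_t, s2, a2))
    return r[:, None] + gamma * (1.0 - done[:, None]) * q_min
```

Taking the minimum of two Critics counters the Q-value overestimation that a single Critic tends to show, which is the main reason TD3 is preferred over DDPG for continuous actions such as a steering angle.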

The parameters that characterize the vehicle state during the drift are chosen as the inputs of the Critic and Actor networks. This set of parameters should uniquely represent the vehicle state at any instant of the drift and should be dynamically related to the steering wheel angle input. The six state parameters are: the parking-space coordinates e_x and e_y and the parking-space orientation in a relative coordinate system whose origin is the vehicle's center of mass and whose positive y-axis points along the vehicle heading; the resultant v_m of the longitudinal and lateral vehicle speeds; the sideslip angle β at the center of mass; and the yaw rate ω. As shown in Fig. 2, e_x, e_y and the orientation capture the difference between the vehicle's current position and heading and the desired state during the drift, while v_m, β and ω characterize how quickly those three quantities change.
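The relative-coordinate state can be computed from global poses as sketched below. The frame follows the description (origin at the center of mass, +y along the heading); taking +x to the vehicle's right, measuring ψ counterclockwise from the global X axis, and wrapping the orientation error to [-π, π) are assumptions, as is the function name.

```python
import math


def relative_state(X, Y, psi, X_aim, Y_aim, psi_aim, v_m, beta, omega):
    """Six-element state: parking-space pose in the vehicle frame plus v_m, beta, omega.

    Assumed frame: origin at the center of mass, +y along the vehicle heading,
    +x to the vehicle's right; psi is measured counterclockwise from global X.
    """
    dX, dY = X_aim - X, Y_aim - Y
    e_y = dX * math.cos(psi) + dY * math.sin(psi)    # distance ahead of the vehicle
    e_x = dX * math.sin(psi) - dY * math.cos(psi)    # distance to the vehicle's right
    # Parking-space orientation relative to the heading, wrapped to [-pi, pi).
    e_psi = (psi_aim - psi + math.pi) % (2.0 * math.pi) - math.pi
    return (e_x, e_y, e_psi, v_m, beta, omega)
```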

Once the deep neural networks to be trained by the reinforcement learning algorithm are defined, the reward function is designed to compute the reward value for each vehicle state during the drift. The reward function is:

where w_x, w_y and the orientation weight are the weights of e_x, e_y and the parking-space orientation, respectively. What matters is the displacement error from the parking-space center and the heading error relative to the parking-space orientation once the vehicle has come to rest, so the cube of the vehicle speed is placed in the denominator: the lower the speed and the closer the vehicle is to stopping, the larger the absolute value of the reward computed from the longitudinal and lateral displacement errors and the heading error. By the principle of the algorithm, if the vehicle finally stops far from the parking space, the computed reward is very small (strongly negative); if it stops near the parking-space center, the reward is close to 0, which raises the target Q values of the preceding states and actions. When computing the steering wheel angle from the vehicle state, the Actor network tries to maximize the Q value, so that the vehicle eventually stops in the parking space. Normally the controlled quantity would also appear in the reward function, but during drift parking the steering wheel angle is adjusted continuously, and the effect of any single steering input on the final parking result cannot be isolated, so its weight coefficient is set to 0.
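The reward formula itself is not reproduced in this text. The sketch below only mirrors the shape described above: weighted pose errors divided by the cube of the speed, never positive, and close to 0 when the vehicle stops at the parking-space center, with no steering term. Using absolute errors, unit default weights and the small constant `eps` (to avoid dividing by zero at standstill) are assumptions.

```python
def reward(e_x, e_y, e_psi, v_m, w_x=1.0, w_y=1.0, w_psi=1.0, eps=1e-3):
    """Hypothetical reward of the described shape (not the patent's exact formula).

    The weighted pose error is divided by v_m**3, so the same error is
    penalized more heavily as the vehicle slows down; eps keeps the value
    finite at standstill. The steering weight is 0, so steering is absent.
    """
    error = w_x * abs(e_x) + w_y * abs(e_y) + w_psi * abs(e_psi)
    return -error / (v_m ** 3 + eps)
```

For the same 1 m error, this sketch returns about -0.125 at 2 m/s but about -7.9 at 0.5 m/s, so the final resting pose dominates the return.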

Before training, the "boundary" of the on-board drift parking controller is determined first: whatever steering wheel angle is applied, the final position and final heading of the vehicle are assumed not to exceed this boundary.

The target parking-space position for each drift is then sampled at random within the controller boundary: after each complete drift, a random target position (X_aim, Y_aim) and orientation ψ_aim satisfying the boundary constraints are set.

During iterative training, the vehicle computes its state (e_x, e_y and the parking-space orientation) from this target position and orientation, and the Critic and Actor networks are trained on it. Randomly updating the target position during training expands the training data set and can improve the generalization ability of the networks.
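Per-episode target randomization can be sketched as follows. The rectangular position limits and heading interval in `ControllerBoundary` are hypothetical placeholders for the controller boundary determined before training; each new episode would draw a fresh (X_aim, Y_aim, ψ_aim) and compute the state from it.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ControllerBoundary:
    # Hypothetical limits; the real boundary is determined before training.
    x_range: tuple = (-5.0, 5.0)       # X_aim limits [m]
    y_range: tuple = (10.0, 25.0)      # Y_aim limits [m]
    psi_range: tuple = (0.0, math.pi)  # psi_aim limits [rad]


def sample_target(boundary, rng):
    """Draw a random parking-space pose (X_aim, Y_aim, psi_aim) inside the boundary."""
    return (rng.uniform(*boundary.x_range),
            rng.uniform(*boundary.y_range),
            rng.uniform(*boundary.psi_range))
```

In a training loop, `sample_target` would be called once after each complete drift, before the next episode's states are computed; seeding `random.Random` keeps runs reproducible.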

Embodiment

In this embodiment, the drift parking control method implemented according to the above method is as follows:

Step 1: Build the four-wheel 3-DOF vehicle dynamics model with front-rear and left-right load transfer for deep reinforcement learning-based drift parking, and the tire model for tire-force saturation conditions. The three degrees of freedom are the speed v_m at the center of mass, the sideslip angle β at the center of mass and the yaw rate ω.

Step 2: On the basis of the vehicle dynamics model, design the Critic and Actor networks and the reward function for deep reinforcement learning. The Critic network takes the vehicle state and the action as input and outputs the Q value; the Actor network takes the vehicle state as input and outputs the action. Because there are few inputs and outputs and the mapping between them is relatively simple, both networks are built as BP neural networks of fully connected layers. The flow of the deep reinforcement learning-based drift control algorithm is shown in Fig. 3.

Claims

1. A deep reinforcement learning-based drift parking control method for an intelligent electric vehicle, characterized by comprising the following steps:

1) Build a vehicle dynamics model for deep reinforcement learning and a tire model for tire-force saturation conditions. The vehicle dynamics model is a four-wheel, three-degree-of-freedom model that accounts for front-rear and left-right load transfer; the three degrees of freedom are the velocity v_m at the center of mass, the sideslip angle β at the center of mass and the yaw rate ω. In this model, the vertical forces on the four wheels, accounting for longitudinal and lateral acceleration, are given by:

where h_m is the height of the center of mass, b_f and b_r are the front and rear track widths, a_x and a_y are the longitudinal and lateral accelerations at the center of mass without the effect of body rotation, F_zFL, F_zFR, F_zRL and F_zRR are the vertical forces on the left-front, right-front, left-rear and right-rear wheels, m is the vehicle mass, g is the gravitational acceleration, l is the wheelbase, l_f and l_r are the distances from the front and rear axles to the center of mass, F_xFL, F_xFR, F_xRL and F_xRR are the longitudinal forces and F_yFL, F_yFR, F_yRL and F_yRR the lateral forces on the left-front, right-front, left-rear and right-rear wheels, and δ is the front-wheel steering angle;

during a drift, the load transfer can become large enough to lift a wheel off the ground, at which point that wheel's vertical load drops to 0 and the load transfer reaches its upper limit. When the steering wheel is turned left to drift, the load shifts to the right; if the left rear wheel lifts off, its vertical force is 0, and the excess transferred load is redistributed to the left front and right rear wheels according to the longitudinal and lateral accelerations, the wheelbase and the track widths:

$\Delta F_{trans} = |F_{zRL}|$

F' _zRL = 0

where $\Delta F_{trans}$ is the excessively transferred load, and $F'_{zRL}$, $F'_{zRR}$ and $F'_{zFL}$ are the redistributed vertical forces on the left rear, right rear and left front wheels;
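The lift-off handling can be sketched as follows. The sketch sets $F'_{zRL} = 0$, takes $\Delta F_{trans} = |F_{zRL}|$, and removes that amount from the left front and right rear wheels so that the total vertical load stays equal to the vehicle weight. The split between the two wheels is a hypothetical constant `w_fl`; the method itself derives it from the accelerations, wheelbase and track width, which are not modelled here.

```python
def redistribute_left_rear_lift(loads, w_fl=0.5):
    """Handle left-rear lift-off when the computed vertical load is negative.

    loads: dict of vertical forces keyed 'FL', 'FR', 'RL', 'RR'.
    w_fl:  hypothetical share of the excess taken from the left front wheel;
           the remainder is taken from the right rear wheel.
    Returns (new_loads, delta_f_trans).
    """
    new = dict(loads)
    if loads['RL'] >= 0.0:
        return new, 0.0                  # wheel still on the ground
    delta = abs(loads['RL'])             # excessively transferred load
    new['RL'] = 0.0                      # lifted wheel carries no load
    # Removing delta from FL and RR keeps the sum of vertical forces unchanged.
    new['FL'] -= w_fl * delta
    new['RR'] -= (1.0 - w_fl) * delta
    return new, delta
```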

Analyzing the forces on the four-wheel 3-DOF model with front-rear and left-right load transfer gives the vehicle dynamic equilibrium equations:

$$\varphi = \beta + \psi$$

From these, the longitudinal speed $v_{mx}$ and the lateral speed $v_{my}$ at the center of mass are:

$$v_{mx} = v_m \cos\beta$$

$$v_{my} = v_m \sin\beta$$

where $\dot v_m$ is the rate of change of the velocity at the center of mass, $\dot\varphi$ is the rate of change of the global azimuth $\varphi$ of the center-of-mass velocity, $\dot\omega$ is the rate of change of the yaw rate, $\psi$ is the global azimuth (heading) of the vehicle front, $I_z$ is the yaw moment of inertia, $v_x$ is the longitudinal speed of the vehicle and $v_y$ is the lateral speed of the vehicle;
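For simulation, the 3-DOF states can be advanced with the standard planar rigid-body equations, assumed here in the form $m\dot v_m = F_X\cos\beta + F_Y\sin\beta$, $m v_m(\dot\beta + \omega) = -F_X\sin\beta + F_Y\cos\beta$ and $I_z\dot\omega = M_z$, where $F_X$, $F_Y$ and $M_z$ are the resultant body-frame tire force and yaw moment. Since $\varphi = \beta + \psi$ and $\dot\psi = \omega$, the term $\dot\beta + \omega$ is exactly $\dot\varphi$. The method's own balance equations may be arranged differently; the function names below are hypothetical.

```python
import math

def three_dof_derivatives(state, F_X, F_Y, M_z, m, I_z):
    """Derivatives of (v_m, beta, omega, psi) for a planar 3-DOF model.

    F_X, F_Y: resultant tire force in body axes (x forward, y left), assumed given.
    M_z:      resultant yaw moment about the center of mass.
    Requires v_m > 0 (the sideslip equation divides by it).
    """
    v, beta, omega, psi = state
    v_dot = (F_X * math.cos(beta) + F_Y * math.sin(beta)) / m
    phi_dot = (-F_X * math.sin(beta) + F_Y * math.cos(beta)) / (m * v)  # course-angle rate
    beta_dot = phi_dot - omega           # from phi = beta + psi and psi_dot = omega
    omega_dot = M_z / I_z
    return v_dot, beta_dot, omega_dot, omega, phi_dot

def euler_step(state, forces, m, I_z, dt):
    """One explicit Euler step; forces = (F_X, F_Y, M_z)."""
    v_dot, beta_dot, omega_dot, psi_dot, _ = three_dof_derivatives(state, *forces, m, I_z)
    v, beta, omega, psi = state
    return (v + dt * v_dot, beta + dt * beta_dot, omega + dt * omega_dot, psi + dt * psi_dot)

def body_velocity(v, beta):
    """v_mx = v_m cos(beta), v_my = v_m sin(beta)."""
    return v * math.cos(beta), v * math.sin(beta)
```

With all forces zero, the sketch gives $\dot\varphi = 0$: the velocity keeps its global direction while the body yaws, so $\beta$ changes at the rate $-\omega$.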

The tire models used for deep reinforcement learning training comprise a front tire force model and a rear tire force model. For the rear tire force model: during drifting, the rear wheels are locked by the brakes and slide on the road surface under pure friction, so the tire force on each rear wheel points opposite to the instantaneous velocity of the wheel center. Analyzing the forces on the rear wheels gives the longitudinal and lateral tire-force components:

For the left rear wheel:

For the right rear wheel:

$$F_{r\_sat} = \mu_1 F_z$$

where $v_{xRL}$ and $v_{yRL}$ are the longitudinal and lateral velocities at the center of the left rear wheel, $v_{xRR}$ and $v_{yRR}$ are the longitudinal and lateral velocities at the center of the right rear wheel, $\lambda_L$ and $\lambda_R$ are the slip angles at the centers of the left and right rear wheels, $F_{xRL}$ and $F_{yRL}$ are the longitudinal and lateral forces of the left rear wheel, $F_{xRR}$ and $F_{yRR}$ are the longitudinal and lateral forces of the right rear wheel, $F_{rRL\_sat}$ and $F_{rRR\_sat}$ are the saturated horizontal tire forces of the left and right rear wheels, $F_{r\_sat}$ denotes the saturated horizontal tire force of the corresponding wheel, $\mu_1$ is the road adhesion coefficient with the wheel locked, and $F_z$ is the vertical force of the corresponding wheel;
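The locked-rear-wheel rule can be sketched directly from this description: the force has magnitude $F_{r\_sat} = \mu_1 F_z$ and points opposite to the wheel-center velocity. The sketch assumes $\lambda = \operatorname{atan2}(v_y, v_x)$ at the wheel center, a body frame with x forward and y to the left, and wheel centers at $x = -l_r$, $y = \pm b_r/2$; these conventions and the function names are assumptions.

```python
import math

def rear_wheel_center_velocity(v_x, v_y, omega, l_r, b_r, side):
    """Rear wheel-center velocity from CM velocity and yaw rate.

    Assumed geometry: rear axle at x = -l_r, left wheel at y = +b_r/2,
    right wheel at y = -b_r/2; v_point = v_cm + omega x r.
    """
    y = b_r / 2 if side == 'L' else -b_r / 2
    return v_x - omega * y, v_y - omega * l_r

def locked_rear_tire_force(v_xw, v_yw, mu_1, F_z):
    """Saturated sliding force on a locked rear wheel.

    Magnitude F_r_sat = mu_1 * F_z, direction opposite to the wheel-center
    velocity, whose angle is the slip angle lambda.
    Returns (F_x, F_y, lambda).
    """
    lam = math.atan2(v_yw, v_xw)
    f_sat = mu_1 * F_z
    return -f_sat * math.cos(lam), -f_sat * math.sin(lam), lam
```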

For the front tire force model: during drifting, the front tire force is not saturated, so an improved Burckhardt tire model is fitted to express the relationship between lateral force and slip angle:

where $\theta_1$ to $\theta_5$ are fitting parameters and $\alpha$ is the front-wheel slip angle;

The left front-wheel slip angle $\alpha_L$ and right front-wheel slip angle $\alpha_R$ are obtained from:

Because the front wheels carry neither braking nor driving force, they roll freely with $F_{xFL} = 0$ and $F_{xFR} = 0$. Only the lateral force is therefore considered when determining the direction of the front tire force: it is perpendicular to the tire plane, whose orientation is set by the front-wheel steering angle;
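A sketch of the front-wheel geometry: each slip angle is taken as the steering angle minus the direction of the wheel-center velocity, and the purely lateral tire force is rotated by $\delta$ into body axes. The sign convention, the wheel positions at $x = l_f$, $y = \pm b_f/2$, and the function names are assumptions; the fitted Burckhardt relation itself is not reproduced, so the lateral force is passed in as a number.

```python
import math

def front_slip_angles(v_x, v_y, omega, delta, l_f, b_f):
    """Front slip angles (alpha_L, alpha_R), assumed alpha = delta - velocity angle."""
    def velocity_angle(y):
        # wheel-center velocity = (v_x - omega*y, v_y + omega*l_f) for a wheel at (l_f, y)
        return math.atan2(v_y + omega * l_f, v_x - omega * y)
    return delta - velocity_angle(b_f / 2), delta - velocity_angle(-b_f / 2)

def front_force_body_frame(F_y_tire, delta):
    """Free-rolling front wheel (F_x = 0): the force is normal to the tire plane,
    so its body-frame components follow from rotating (0, F_y) by delta."""
    return -F_y_tire * math.sin(delta), F_y_tire * math.cos(delta)
```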

2) Use the TD3 algorithm for drift parking (drift storage) control to make the intelligent electric vehicle drift into the parking space, through the following steps:

21) Design the TD3 algorithm for drift parking control and build the Actor network and the Critic network, as follows:

Both the Critic network and the Actor network are BP neural networks composed of fully connected layers. The Critic network takes the vehicle state and the action as input and outputs the Q value; the Actor network takes the vehicle state as input and outputs the action. The vehicle state consists of the parameters that characterize the vehicle during drifting, including the parking-space coordinates $(e_x, e_y)$ and the parking-space orientation, expressed in a relative coordinate system whose origin is the vehicle's center of mass and whose positive y-axis points along the vehicle heading.
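The network layout and the TD3 critic target can be sketched in NumPy. TD3 uses two critics and takes the smaller of their target values (clipped double-Q), adds clipped noise to the target action (target policy smoothing), updates the actor less often than the critics, and tracks target networks by Polyak averaging. The sketch covers the forward passes, the target and the soft update only; gradient steps are omitted because NumPy has no automatic differentiation. The state and action sizes, layer widths and hyperparameters are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: state = (e_x, e_y, orientation error), action = steering command.
STATE_DIM, ACTION_DIM, MAX_ACTION = 3, 1, 1.0

def init_mlp(sizes):
    """Random weights for a fully connected (BP) network with the given layer sizes."""
    return [(rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)          # ReLU on hidden layers
    return x

def actor(params, s):
    """Actor: state batch -> bounded action batch (tanh scaled to MAX_ACTION)."""
    return MAX_ACTION * np.tanh(mlp(params, s))

def critic(params, s, a):
    """Critic: (state, action) batch -> Q value per sample."""
    return mlp(params, np.concatenate([s, a], axis=-1))[:, 0]

def td3_target(r, s_next, done, actor_t, critic1_t, critic2_t,
               gamma=0.99, noise_std=0.2, noise_clip=0.5):
    """Clipped double-Q target with target policy smoothing."""
    noise = np.clip(rng.normal(0.0, noise_std, (s_next.shape[0], ACTION_DIM)),
                    -noise_clip, noise_clip)
    a_next = np.clip(actor(actor_t, s_next) + noise, -MAX_ACTION, MAX_ACTION)
    q_next = np.minimum(critic(critic1_t, s_next, a_next),
                        critic(critic2_t, s_next, a_next))
    return r + gamma * (1.0 - done) * q_next

def soft_update(target, source, tau=0.005):
    """Polyak averaging: target <- tau * source + (1 - tau) * target."""
    return [(tau * W + (1 - tau) * Wt, tau * b + (1 - tau) * bt)
            for (Wt, bt), (W, b) in zip(target, source)]
```

Both critics would be regressed toward `td3_target`, while the actor would be updated only every few critic steps by maximizing the first critic's output.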

22) Construct the reward function $r(k)$:

where $w_x$, $w_y$ and the orientation weight are the weights of $e_x$, $e_y$ and the parking-space orientation term, respectively, and $k$ is the time step;
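One plausible reward consistent with these definitions, assumed here purely for illustration, penalizes the weighted absolute position and orientation errors at each step $k$, with the orientation error wrapped to $[-\pi, \pi)$. The absolute-value form, the weight values and the function names are hypothetical and are not taken from the method.

```python
import math

def wrap_angle(a):
    """Map an angle to [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi

def reward(e_x, e_y, e_theta, w_x=1.0, w_y=1.0, w_theta=0.5):
    """Hypothetical r(k): negative weighted sum of parking errors (placeholder weights)."""
    return -(w_x * abs(e_x) + w_y * abs(e_y) + w_theta * abs(wrap_angle(e_theta)))
```

Wrapping matters because an orientation error of $2\pi$ describes the same pose as zero error and should not be penalized.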

23) Train the Actor network and the Critic network, and use them to complete the drift parking of the intelligent electric vehicle. Before training, determine the boundary of the drift parking controller. In each training iteration, a target parking-space location and orientation are selected at random within this boundary, the vehicle computes its state relative to that target, and the Critic and Actor networks are trained on the result. The target parking space is also updated randomly during training, which enlarges the training data set and improves the controller's ability to adapt when the target changes.
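The target sampling and relative-state computation described in this step can be sketched as follows. The vehicle state is expressed in the frame defined earlier (origin at the center of mass, +y along the heading), with +x assumed to point to the vehicle's right. The boundary values, the mid-episode update probability and all function names are hypothetical.

```python
import math
import random

# Hypothetical controller boundary for target parking spaces (global frame, m and rad).
BOUNDS = {'x': (-5.0, 5.0), 'y': (8.0, 15.0), 'theta': (-math.pi / 2, math.pi / 2)}

def sample_target(rng):
    """Random target (X_t, Y_t, theta_t) inside the controller boundary."""
    return tuple(rng.uniform(*BOUNDS[k]) for k in ('x', 'y', 'theta'))

def relative_state(vehicle, target):
    """Target pose in the vehicle frame: origin at the CM, +y along the heading psi.

    vehicle = (X, Y, psi) and target = (X_t, Y_t, theta_t) in the global frame;
    +x is assumed to point to the vehicle's right.
    """
    X, Y, psi = vehicle
    X_t, Y_t, theta_t = target
    dx, dy = X_t - X, Y_t - Y
    e_y = dx * math.cos(psi) + dy * math.sin(psi)     # distance ahead
    e_x = dx * math.sin(psi) - dy * math.cos(psi)     # distance to the right
    e_theta = (theta_t - psi + math.pi) % (2 * math.pi) - math.pi
    return e_x, e_y, e_theta

def maybe_move_target(target, rng, p_update=0.01):
    """With probability p_update, re-sample the target mid-episode."""
    return sample_target(rng) if rng.random() < p_update else target
```

Re-sampling the target inside an episode exposes the networks to states where the goal jumps, which matches the requirement that the vehicle keep drifting toward an updated parking space.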