
CN112026744B - Energy management method for a series-parallel hybrid power system based on a DQN variant - Google Patents


Info

Publication number
CN112026744B
CN112026744B (application CN202010845021.9A)
Authority
CN
China
Prior art keywords
hybrid
energy management
vehicle
dqn
battery
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010845021.9A
Other languages
Chinese (zh)
Other versions
CN112026744A (en)
Inventor
周健豪
薛四伍
廖宇晖
薛源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202010845021.9A
Publication of CN112026744A
Application granted
Publication of CN112026744B
Legal status: Active
Anticipated expiration

Classifications

    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 20/00 - Control systems specially adapted for hybrid vehicles
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 50/00 - Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 50/00 - Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W 2050/0001 - Details of the control system
    • B60W 2050/0019 - Control system elements or transfer functions
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 50/00 - Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W 2050/0001 - Details of the control system
    • B60W 2050/0019 - Control system elements or transfer functions
    • B60W 2050/0028 - Mathematical models, e.g. for simulation
    • B60W 2050/0031 - Mathematical model of the vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Automation & Control Theory (AREA)
  • Mechanical Engineering (AREA)
  • Transportation (AREA)
  • Human Computer Interaction (AREA)
  • Hybrid Electric Vehicles (AREA)
  • Electric Propulsion And Braking For Vehicles (AREA)

Abstract

The invention discloses an energy management method for a series-parallel hybrid power system based on a DQN variant, which belongs to the technical field of series-parallel hybrid electric vehicles and can improve training convergence speed and vehicle fuel economy. The invention includes: establishing a series-parallel hybrid electric vehicle model and obtaining the environmental parameters that affect the energy management strategy, including road gradient and onboard mass; solving for the optimal energy management strategy with the dynamic programming (DP) algorithm and saving the resulting experience into an optimal experience buffer (OEB); and, combined with hybrid experience replay (HER), training the model with the Dueling DQN strategy to obtain a trained deep reinforcement learning agent that performs energy management of the series-parallel hybrid vehicle under different working conditions. The HER technique and the Dueling architecture of the DQN variant constructed by the invention can effectively improve training convergence speed, vehicle fuel economy, and algorithm robustness.

Description

Series-parallel hybrid power system energy management method based on DQN variants
Technical Field
The invention belongs to the technical field of series-parallel hybrid electric vehicles, and particularly relates to a series-parallel hybrid power system energy management method based on DQN variants.
Background
At present, the energy crisis is deepening and automobile emission standards are becoming more stringent, so the use of pure fuel vehicles is being challenged. Hybrid electric vehicles combine the long driving range of fuel vehicles with the zero local emissions of electric vehicles, alleviating the problem of fossil fuel combustion; the energy management problem of hybrid power has therefore always been a key research topic.
At present, most energy management for hybrid electric vehicles is rule-based: a set of energy management thresholds is defined, and the most common rule for plug-in hybrid electric vehicles is to first deplete the battery energy and then maintain the battery charge. Among optimization-based strategies, the representative benchmark is dynamic programming (DP): with the global driving cycle known in advance, a relatively optimal energy management solution is obtained offline by using the known speed profile to distribute the optimal energy demand between the engine and the battery of the hybrid vehicle. In the prior art, engineers either hand-tune rules for rule-based energy management or apply optimized model predictive control based on known or predicted speeds, thereby adjusting the equivalent fuel consumption of the hybrid vehicle.
However, the prior-art methods have several disadvantages. Rule-based energy management is often not effective enough and requires substantial expert knowledge for each individual driving condition. Optimization-based DP requires the global driving cycle to be known, and its computation time is too long for real-time online application. Model predictive control can run optimization in real time, but the prediction horizon cannot be chosen too large, so its results still fall well short of the DP optimum. Moreover, many optimization methods are not comprehensive: they ignore the road gradient information and the changes in the vehicle's onboard mass.
Disclosure of Invention
The invention provides a series-parallel hybrid power system energy management method based on a DQN variant, which combines hybrid experience replay built from the OEB and PER techniques to improve training convergence speed and vehicle fuel economy.
In order to achieve the above purpose, the invention adopts the following technical scheme:
a method for the energy management of a series-parallel hybrid power system based on DQN variants comprises the following steps:
Step one: establishing a model of a passive series-parallel vehicle;
Step two: acquiring the parameters influencing energy management of the experimental vehicle under a fixed-route working condition, then solving with DP to obtain the optimal solution, and storing the optimal-solution experience in an OEB;
Step three: based on the parameters and observed quantities influencing energy management, training a Dueling DQN neural network model using HER combined with PER to obtain a trained deep reinforcement learning agent;
Step four: acquiring the parameters and observed quantities influencing energy management during actual driving, and performing energy management of the hybrid electric vehicle under different working conditions based on those quantities and the trained deep reinforcement learning agent.
In the above steps, the objective function of energy management is:
J = Σ_t [ ṁ_fuel(t) + γ·(SOC(t) − SOC_ref(t))² ]
wherein γ is a positive weighting factor, indicating a balanced equivalence between fuel consumption and battery power consumption; SOC_ref represents the SOC reference value, and ṁ_fuel is the fuel consumption at each sampling time;
The passive series-parallel vehicle model comprises a vehicle dynamics model, a planetary gear transmission, a motor, and a battery.
The vehicle dynamics model is as follows:
T_out = R·(F_a + F_r + F_g + F_f),  F_a = m·a,  F_r = (1/2)·ρ·A·C_D·v²,  F_g = m·g·sin α,  F_f = μ_r·m·g·cos α
wherein T_out is the drive shaft torque, R is the wheel radius, F_a is the inertial resistance of the vehicle, F_r is the aerodynamic resistance, F_g is the grade (ramp) resistance, F_f is the rolling resistance, m is the vehicle mass, v is the vehicle speed, a is the vehicle acceleration, ρ is the air density, A is the frontal area of the vehicle, C_D is the air drag coefficient, α is the road gradient, and μ_r is the rolling resistance coefficient;
The planetary gear transmission model is as follows:
(1 + β)·n_e = n_m + β·n_out,  T_m = T_e / (1 + β)
wherein n_m is the motor speed; β is the planetary gear parameter; n_out is the drive shaft speed; n_e is the engine speed; T_m is the motor torque and T_e is the engine torque;
The battery model is:
P_batt(t) = V_oc·I_b(t) − I_b(t)²·r_int,  I_b(t) = [V_oc − √(V_oc² − 4·r_int·P_batt(t))] / (2·r_int)
wherein P_batt(t) is the battery power, V_oc is the battery voltage, I_b(t) is the battery current, r_int is the internal battery resistance, and P_m(t) is the motor power, whose magnitude satisfies
P_batt(t) = P_m(t)/η_m when discharging and P_batt(t) = P_m(t)·η_m when charging, with SOC dynamics d(SOC)/dt = −I_b(t)/Q_max
wherein η_m is the motor efficiency, SOC is the battery state of charge, and Q_max is the maximum battery capacity;
The parameters influencing energy management comprise the road conditions of the hybrid electric vehicle under different working conditions, namely the road gradient, and the onboard mass changes caused by passenger or cargo changes;
The observed quantities influencing energy management comprise the vehicle speed, vehicle acceleration, engine speed, engine torque, motor speed, motor torque, battery state of charge, fuel consumption at the current moment, the difference between the SOC and the reference SOC, the vehicle displacement, and the measurable disturbances, namely the road gradient and the onboard mass variation;
Steps two to four specifically comprise the following:
Under the condition that the working-condition information is known in advance, the optimal energy management experience is obtained by solving with the DP algorithm and is stored in the OEB. Then, under real-time working conditions, a Dueling DQN in deep reinforcement learning is trained. In each training step, the SOC at the current moment, the difference between the SOC and the reference SOC, the vehicle speed, the vehicle displacement, the vehicle acceleration, the fuel consumption, the road gradient, and the onboard mass variation are used as the observation input data of the Dueling DQN agent, and the reward value at the current moment is used as its reward input data. The experience obtained at each step is stored in the PEB; PER is then used to sample from the PEB while samples are drawn at random from the OEB, and the two sets of experience are combined, with the proportion of PEB experience decreasing continuously over time. The neural network of the HER-based Dueling DQN is thereby trained to obtain a converged agent, and the equivalent fuel consumption of the experimental vehicle under different working conditions and the parameters influencing energy management are obtained. The observed quantities are input into the deep reinforcement learning agent, and the output is the control quantity of the series-parallel hybrid electric vehicle, namely the engine torque demand and engine speed at the moment following the current moment, where the current moment is the moment of the current observation;
The parameters of the experimental vehicle that influence energy management under different working conditions are obtained as follows: a plurality of samples are obtained, each sample including parameters collected on the experimental vehicle at different moments that may affect energy management;
After the equivalent fuel consumption of the experimental vehicle under different working conditions and the parameters influencing energy management have been obtained, the equivalent fuel consumption of the experimental vehicle under the fixed-route working condition and the parameters influencing energy management are collected at a preset sampling frequency, and the collected data are smoothed and normalized.
Beneficial effects: the invention provides a series-parallel hybrid power system energy management method based on a DQN variant. The method takes into account the environmental parameters affecting energy management, namely road gradient and onboard mass change; it solves for optimal experience in advance with DP and combines it with PER to form HER, so that the Dueling-architecture DQN agent can be trained more effectively, and the equivalent fuel consumption of the experimental vehicle under different working conditions, together with the environmental parameters affecting energy management, is obtained. A deep reinforcement learning agent model is trained on the parameters and observed quantities influencing energy management to obtain a trained agent. The environmental parameters influencing energy management during actual driving are then acquired, and the energy management of the hybrid electric vehicle is performed based on these parameters and the trained agent. In this way, energy optimization can be controlled effectively, real-time online application is possible, the energy management of the hybrid vehicle is controlled more effectively, and energy consumption is reduced.
Drawings
FIG. 1 is a schematic illustration of the training and application process of a series-parallel hybrid power system energy management method based on DQN variants in an embodiment of the present invention;
FIG. 2 is a flow chart of a specific application of a series-parallel hybrid power system energy management method based on DQN variants in an embodiment of the present invention;
FIG. 3 is a schematic diagram of the Dueling DQN network structure in the series-parallel hybrid power system energy management method based on DQN variants in the embodiment of the invention;
FIG. 4 is a graph of reference SOC versus time for an embodiment of the present invention.
Detailed Description
The invention is described in detail below with reference to the following figures and specific examples:
a method for the energy management of a series-parallel hybrid power system based on DQN variants comprises the following steps:
Step one: establishing a model of a passive series-parallel vehicle;
Step two: acquiring the parameters influencing energy management of the experimental vehicle under a fixed-route working condition, then solving with DP to obtain the optimal solution, and storing the optimal-solution experience in an OEB;
Step three: based on the parameters and observed quantities influencing energy management, training a Dueling DQN neural network model using HER combined with PER to obtain a trained deep reinforcement learning agent;
Step four: acquiring the parameters and observed quantities influencing energy management during actual driving, and performing energy management of the hybrid electric vehicle under different working conditions based on those quantities and the trained deep reinforcement learning agent.
In the above steps, the objective function of energy management is:
J = Σ_t [ ṁ_fuel(t) + γ·(SOC(t) − SOC_ref(t))² ]
wherein γ is a positive weighting factor, indicating a balanced equivalence between fuel consumption and battery power consumption, and SOC_ref represents the SOC reference value. As shown in FIG. 4, the reference SOC in the objective function is obtained mainly from the trip duration of the vehicle's working condition, known from historical travel information; a simple SOC reference can thus be used, namely a reference SOC that decreases uniformly as a linear function of time, under which the battery works better in this condition;
The passive series-parallel vehicle model comprises a vehicle dynamics model, a planetary gear transmission, a motor, and a battery.
The vehicle dynamics model is as follows:
T_out = R·(F_a + F_r + F_g + F_f),  F_a = m·a,  F_r = (1/2)·ρ·A·C_D·v²,  F_g = m·g·sin α,  F_f = μ_r·m·g·cos α
wherein T_out is the drive shaft torque, R is the wheel radius, F_a is the inertial resistance of the vehicle, F_r is the aerodynamic resistance, F_g is the grade (ramp) resistance, F_f is the rolling resistance, m is the vehicle mass, v is the vehicle speed, a is the vehicle acceleration, ρ is the air density, A is the frontal area of the vehicle, C_D is the air drag coefficient, α is the road gradient, and μ_r is the rolling resistance coefficient;
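As an illustration of the force balance above, a minimal sketch follows; the default parameter values are assumptions for the example, not figures from the embodiment:

```python
import math

def drive_shaft_torque(v, a, alpha, m,
                       rho=1.2, A=2.2, C_D=0.3, mu_r=0.015, R=0.3, g=9.81):
    """Drive-shaft torque demand from the longitudinal force balance.

    A sketch of the vehicle dynamics model above; all default parameter
    values are illustrative assumptions.
    """
    F_a = m * a                                   # inertial resistance
    F_r = 0.5 * rho * A * C_D * v ** 2            # aerodynamic resistance
    F_g = m * g * math.sin(alpha)                 # grade (ramp) resistance
    F_f = mu_r * m * g * math.cos(alpha)          # rolling resistance
    return R * (F_a + F_r + F_g + F_f)            # T_out

# Example: 1.5 t vehicle at 15 m/s, mild acceleration, 2 % grade
print(drive_shaft_torque(v=15.0, a=0.5, alpha=math.atan(0.02), m=1500.0))
```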
The planetary gear transmission model is as follows:
(1 + β)·n_e = n_m + β·n_out,  T_m = T_e / (1 + β)
wherein n_m is the motor speed; β is the planetary gear parameter; n_out is the drive shaft speed; n_e is the engine speed; T_m is the motor torque and T_e is the engine torque;
The battery model is:
P_batt(t) = V_oc·I_b(t) − I_b(t)²·r_int,  I_b(t) = [V_oc − √(V_oc² − 4·r_int·P_batt(t))] / (2·r_int)
wherein P_batt(t) is the battery power, V_oc is the battery voltage, I_b(t) is the battery current, r_int is the internal battery resistance, and P_m(t) is the motor power, whose magnitude satisfies
P_batt(t) = P_m(t)/η_m when discharging and P_batt(t) = P_m(t)·η_m when charging, with SOC dynamics d(SOC)/dt = −I_b(t)/Q_max
wherein η_m is the motor efficiency, SOC is the battery state of charge, and Q_max is the maximum battery capacity;
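A one-step SOC update under the battery and motor-power relations above can be sketched as follows; all numeric parameter values (V_oc, r_int, η_m, Q_max) are illustrative assumptions rather than values given in the patent:

```python
import math

def battery_step(P_m, soc, dt, eta_m=0.92, V_oc=350.0, r_int=0.1,
                 Q_max=6.5 * 3600):
    """One SOC update step for the internal-resistance battery model.

    Motor power is mapped to battery power through the motor efficiency,
    the current follows from P_batt = V_oc*I_b - I_b^2*r_int, and the SOC
    integrates -I_b/Q_max. Parameter defaults are illustrative assumptions.
    """
    # Discharging draws more than the motor delivers; charging stores less.
    P_batt = P_m / eta_m if P_m >= 0 else P_m * eta_m
    I_b = (V_oc - math.sqrt(V_oc ** 2 - 4.0 * r_int * P_batt)) / (2.0 * r_int)
    return soc - I_b * dt / Q_max

print(battery_step(P_m=20e3, soc=0.7, dt=1.0))  # slightly below 0.7
```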
The parameters influencing energy management comprise the road conditions of the hybrid electric vehicle under different working conditions, namely the road gradient, and the onboard mass changes caused by passenger or cargo changes;
The observed quantities influencing energy management comprise the vehicle speed, vehicle acceleration, engine speed, engine torque, motor speed, motor torque, battery state of charge, fuel consumption at the current moment, the difference between the SOC and the reference SOC, the vehicle displacement, and the measurable disturbances, namely the road gradient and the onboard mass variation;
As shown in FIG. 1, the offline training and online real-time application process of the series-parallel hybrid power system energy management method based on the DQN variant comprises the following steps:
Under the condition that the working-condition information is known in advance, the optimal energy management experience is obtained by solving with the DP algorithm and is stored in the OEB. Then, under real-time working conditions, a Dueling DQN in deep reinforcement learning is trained. In each training step, the SOC at the current moment, the difference between the SOC and the reference SOC, the vehicle speed, the vehicle displacement, the vehicle acceleration, the fuel consumption, the road gradient, and the onboard mass variation are used as the observation input data of the Dueling DQN agent, and the reward value at the current moment is used as its reward input data. The experience obtained at each step is stored in the PEB; PER is then used to sample from the PEB while samples are drawn at random from the OEB, and the two sets of experience are combined, with the proportion of PEB experience decreasing continuously over time. The neural network of the HER-based Dueling DQN is thereby trained to obtain a converged agent, and the equivalent fuel consumption of the experimental vehicle under different working conditions and the parameters influencing energy management are obtained. The observed quantities are input into the deep reinforcement learning agent, and the output is the control quantity of the series-parallel hybrid electric vehicle, namely the engine torque demand and engine speed at the moment following the current moment, where the current moment is the moment of the current observation;
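The hybrid experience replay sampling described above can be sketched as follows; the linear decay schedule and the sample_prioritized helper are assumptions for illustration, since the text only states that the PEB share decreases over time:

```python
import random

def sample_hybrid_batch(peb, oeb, batch_size, step, total_steps,
                        peb_start_ratio=0.8, peb_end_ratio=0.2):
    """Hybrid experience replay (HER) batch assembly.

    Mixes PER samples from the online buffer (PEB) with uniform random
    samples from the DP-optimal buffer (OEB). The linear decay of the PEB
    share and the sample_prioritized() helper are illustrative assumptions.
    """
    frac = min(step / float(total_steps), 1.0)
    peb_ratio = peb_start_ratio + (peb_end_ratio - peb_start_ratio) * frac
    n_peb = int(round(batch_size * peb_ratio))
    # Prioritized draw from the online experience (hypothetical buffer API)
    batch = list(peb.sample_prioritized(n_peb))
    # Uniform random draw of DP-optimal experience from the OEB
    n_oeb = batch_size - n_peb
    batch += random.sample(oeb, min(n_oeb, len(oeb)))
    return batch
```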
The parameters of the experimental vehicle that influence energy management under different working conditions are obtained as follows: a plurality of samples are obtained, each sample including parameters collected on the experimental vehicle at different moments that may affect energy management;
After the equivalent fuel consumption of the experimental vehicle under different working conditions and the parameters influencing energy management have been obtained, the equivalent fuel consumption of the experimental vehicle under the fixed-route working condition and the parameters influencing energy management are collected at a preset sampling frequency, and the collected data are smoothed and normalized.
The method can be applied to series-parallel hybrid electric vehicles under different working conditions. For example, when the method is used for vehicle energy management, the vehicle can adjust its online driving behavior according to the pre-trained agent, thereby reducing equivalent fuel consumption and achieving more precise control. As shown in FIG. 2, the specific application process of the method is as follows:
Step 201: obtain the environmental parameters of the experimental vehicle that influence energy management under different working conditions.
A working condition represents the change of the experimental vehicle's driving speed over time; for example, the 1024 s trip of a vehicle from a starting station to a terminal station, together with its speed profile, can be regarded as one working condition. In this embodiment, the NEDC is used as the experimental training condition, and data from at least three cycles of the experimental vehicle under this condition are collected to ensure the reliability of the training data.
The environmental parameters affecting energy management may be at least one of the road conditions of the hybrid vehicle under different working conditions, i.e. the road gradient, and the onboard mass changes due to passenger and cargo changes.
In implementation, at least one environmental parameter that may influence energy management is obtained for the experimental vehicle under each working condition, and the environmental parameters influencing energy management are selected from these. The equivalent fuel consumption of the experimental vehicle under each working condition is obtained by reading the fuel consumption and battery-charge sensors installed on the vehicle.
Optionally, the method includes acquiring the equivalent fuel consumption of the experimental vehicle and the environmental parameters and observed quantities influencing energy management under the various working conditions at a preset sampling frequency, and smoothing and normalizing the acquired parameters.
The higher the sampling frequency, the smaller the interval between sampling points, the more data are obtained, and the stronger the correlation between the data, so the output of the finally trained agent model is more accurate. A technician can preset a sampling time interval and sample the running experimental vehicle accordingly; for example, the sampling frequency may be set to 1 Hz.
Smoothing the collected data suppresses inaccurate measurements.
Normalizing the acquired parameters makes the values of different parameters comparable, improves the accuracy of the agent network model, and greatly helps in setting the reward value.
Specifically, since multiple groups of parameter items have been obtained in the above process, the normalization in this embodiment is computed by determining the maximum and minimum parameter values in each group of parameter items and applying the following formula:
x_norm = (x − x_min) / (x_max − x_min)
wherein x_norm is the normalized value of a parameter in each group of parameter items, x_min is the minimum parameter value in the group, and x_max is the maximum parameter value in the group.
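A direct implementation of this normalization formula might look like the following sketch (the guard against a constant parameter group is an added assumption):

```python
def min_max_normalize(values):
    """Min-max normalization of one parameter group to [0, 1].

    Direct implementation of the formula above; returning zeros for a
    constant group avoids division by zero (an added assumption).
    """
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        return [0.0 for _ in values]
    return [(x - x_min) / (x_max - x_min) for x in values]

print(min_max_normalize([10.0, 25.0, 40.0]))  # [0.0, 0.5, 1.0]
```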
Step 202: based on the environmental parameters and observed values influencing energy management, train the agent model using deep reinforcement learning to obtain a trained, converged agent.
The deep reinforcement learning agent is built from a combination of neural network models and can determine the action for the next moment from the observation data at the current moment. Observation data are fed into the input layer of the agent model, and Q-value estimates are produced at its output layer.
The neural network computes the Q function to obtain the Q value Q(s, a | θ^Q): the input is the state s and the action a, and the output is the Q function value Q(s, a | θ^Q). The network is divided into an Online evaluation network and a Target evaluation network of identical structure. The parameters θ^Q of the Online evaluation network are initialized randomly, and the parameters θ^Q′ of the Target evaluation network are initialized from them; at the same time, a buffer space (the PEB) is opened up as storage for experience replay, alongside the OEB that already holds the DP-optimal experience.
After initialization is completed, the iterative solution begins. Action exploration uses the ε-greedy algorithm: action a_t is executed in the current state, the corresponding reward and next state are obtained, and the transition is formed into the tuple (s_t, a_t, r_t, s_{t+1}) and stored into the PEB. A small batch of data is then selected from the PEB using the PER technique and combined with a small batch randomly selected from the OEB to perform HER; the combined batch serves as the training data of the Online evaluation network, which is then updated.
The Loss function of the Online evaluation network is defined as
L = [ r + γ·max_a′ Q′(s′, a′ | θ^Q′) − Q(s, a | θ^Q) ]²
and the Online evaluation network is updated by minimizing this Loss function. The updated Online network parameters θ^Q are then used to update the Target network parameters θ^Q′ softly:
θ^Q′ ← τ·θ^Q + (1 − τ)·θ^Q′
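One Online-network update with a soft Target-network update can be sketched in PyTorch as follows; the hyperparameter values, the tensor batch layout, and the done-flag handling are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def dqn_update(online_net, target_net, optimizer, batch, gamma=0.99, tau=0.01):
    """One update of the Online evaluation network plus a soft Target update.

    `batch` is assumed to be a tuple of tensors (s, a, r, s_next, done);
    gamma and tau are illustrative hyperparameters, not patent values.
    """
    s, a, r, s_next, done = batch
    with torch.no_grad():
        # Standard DQN target: r + gamma * max_a' Q'(s', a' | theta^Q')
        q_next = target_net(s_next).max(dim=1).values
        target = r + gamma * (1.0 - done) * q_next
    q = online_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q, target)          # the Loss function defined above
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Soft update of the Target network toward the Online network
    for p_t, p_o in zip(target_net.parameters(), online_net.parameters()):
        p_t.data.mul_(1.0 - tau).add_(tau * p_o.data)
    return loss.item()
```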
In the Dueling DQN neural network framework of the deep reinforcement learning agent shown in FIG. 3, the Q value of a standard DQN represents the value of a state-action pair. In states where the Q value hardly depends on the action, this makes the evaluation of an action's value inaccurate and lacking in robustness and stability.
The Dueling DQN divides the abstract features extracted by the convolutional layer into two streams in the fully connected layer. One stream represents the state-value function V(s), which the original DQN represented only jointly with the action; this makes the state-value estimate independent of the action and of environmental noise and gives each state a relative value with respect to the other, unselected actions. The other stream represents the state-action advantage function A(s, a) in a given state. Finally, a special aggregation layer combines the two streams to produce the estimate of the state-action value function. The advantage is that learning generalizes across actions without any modification to the underlying RL algorithm. In Dueling DQN, the Q-value function is constructed as follows:
Q(s,a;θ,α,β)=V(s;θ,α)+A(s,a;θ,β) (7)
where α and β are the parameters of the two streams in the fully connected layer, and θ is the parameter of the convolutional layer. However, for a given Q, the values of V and A are not unique; in other words, different combinations of V and A can produce the same Q value, which makes the algorithm less stable. The mean of the advantage function is therefore used to improve the stability of the proposed algorithm:
Q(s, a; θ, α, β) = V(s; θ, α) + ( A(s, a; θ, β) − (1/|A|)·Σ_{a′} A(s, a′; θ, β) )   (8)
only more layers are needed as compared with standard DQN training, but when there are many behaviors of similar value, dulling DQN can better perform strategy evaluation and improve stability and robustness.
In the actual process, the number of neurons in the hidden layer of the agent model is set to 40. To evaluate the effect of deep-reinforcement-learning energy management accurately, the control effect of the deep reinforcement learning can be assessed through the equivalent fuel consumption ratio R.
The equivalent fuel consumption ratio reflects the comparison between the actual control effect and the DP reference, with good results as the R value approaches 0. The ratio R is calculated by the following formula:
R = (S_RL − S_DP) / S_DP
wherein R is the ratio between the actual data and the DP reference data, S_RL is the equivalent fuel consumption obtained by deep reinforcement learning training, and S_DP is the equivalent fuel consumption reference obtained under the DP benchmark.
It should be noted that the control performance of the trained agent model is evaluated by calculating the ratio and the root mean square between the reference data and the actual data; for example, a ratio close to 1 together with a root mean square close to 0 indicates that the agent model trained by deep reinforcement learning has good control performance.
Optionally, the training condition is the NEDC, and the detection conditions are the WLTP, FTP75, UDDS, or JN1015 cycles. To make the control data of the deep reinforcement learning algorithm more reliable, the trained agent models can be tested separately under these different working conditions; the R value is calculated during the training process under each working condition and used as an index to compare the control performance of each back-propagation training method, thereby checking the effectiveness and robustness of the deep reinforcement learning. The results are shown in Table 1:
TABLE 1
[Table 1: equivalent fuel consumption ratio R of the trained agent under the NEDC, WLTP, FTP75, UDDS and JN1015 cycles; the table is rendered as an image in the original and its values are not recoverable]
As can be seen from Table 1, the ratio of the reference data to the actual data obtained during the deep reinforcement learning training process is close to 90%, which demonstrates the effectiveness of real-time application.
Step 203: acquire the environmental parameters and observed quantities influencing energy management during the actual driving of the vehicle, and control the vehicle's energy management based on these quantities and the trained agent model.
In the above steps, since the environmental parameter items influencing energy management and the trained agent model have been obtained, the parameters and observed quantities influencing energy can be input into the trained agent model in real time to control the driving and energy management of the vehicle.
Specifically, to control the vehicle's action at the estimated moment, the environmental parameters and observed quantities affecting energy management must first be obtained, such as at least one of the road gradient and the onboard mass change of the hybrid vehicle under different working conditions, together with the vehicle speed, vehicle acceleration, engine speed, engine torque, motor speed, motor torque, battery state of charge, current fuel consumption, difference between the SOC and the reference SOC, and vehicle displacement. These parameters are input into the trained agent model, which outputs the vehicle's control action at the estimated moment, namely the engine torque and engine speed at the next moment; the estimated moment is the moment of the sampling point immediately following the one corresponding to the current moment.
The above description is only a preferred embodiment of the present invention, in which the purpose, technical solution, and advantages of the invention are described in further detail; it is not intended to limit the invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (9)

1. A method for energy management of a series-parallel hybrid power system based on a DQN variant, characterized by comprising the following steps: establishing a model of a passive series-parallel vehicle; with the working-condition information known in advance, solving with the DP algorithm to obtain the optimal energy management experience and storing it in an OEB; then, under real-time working conditions, training a Dueling DQN in deep reinforcement learning, wherein in each training step the SOC at the current moment, the difference between the SOC and the reference SOC, the vehicle speed, the vehicle displacement, the vehicle acceleration, the fuel consumption, the road gradient and the onboard mass variation are used as the observation input data of the Dueling DQN agent, and the reward value at the current moment is used as the reward input data of the Dueling DQN agent; storing the experience obtained at this time in a PEB, then sampling from the PEB using PER and sampling randomly from the OEB, and combining the two sets of experience, wherein the proportion of experience from the PEB decreases continuously as time progresses, so that the neural network of the HER-based Dueling DQN is trained to obtain a converged agent, and the equivalent fuel consumption of the experimental vehicle under different working conditions and the parameters influencing energy management are obtained; and inputting the observed quantities into the deep reinforcement learning agent to perform energy management of the hybrid electric vehicle under different working conditions, the output being the control quantities of the series-parallel hybrid electric vehicle at the moment following the current moment, namely the engine torque demand and engine speed.

2. The method for energy management of a series-parallel hybrid power system based on a DQN variant according to claim 1, characterized in that the model of the passive series-parallel vehicle comprises a vehicle dynamics model, a planetary gear transmission, a motor and a battery.

3. The method for energy management of a series-parallel hybrid power system based on a DQN variant according to claim 2, characterized in that the vehicle dynamics model is as follows:
T_out = R·(F_a + F_r + F_g + F_f),  F_a = m·a,  F_r = (1/2)·ρ·A·C_D·v²,  F_g = m·g·sin α,  F_f = μ_r·m·g·cos α
wherein T_out is the drive shaft torque, R is the wheel radius, F_a is the inertial resistance of the vehicle, F_r is the aerodynamic resistance, F_g is the grade resistance, F_f is the rolling resistance, m is the vehicle mass, v is the vehicle speed, a is the vehicle acceleration, ρ is the air density, A is the frontal area of the vehicle, C_D is the air drag coefficient, α is the road gradient, and μ_r is the rolling resistance coefficient.

4. The method for energy management of a series-parallel hybrid power system based on a DQN variant according to claim 2, characterized in that the planetary gear transmission model is as follows:
(1 + β)·n_e = n_m + β·n_out,  T_m = T_e / (1 + β)
wherein n_m is the motor speed; β is the planetary gear parameter; n_out is the drive shaft speed; n_e is the engine speed; T_m is the motor torque and T_e is the engine torque.

5. The method for energy management of a series-parallel hybrid power system based on a DQN variant according to claim 2, characterized in that the battery model is:
P_batt(t) = V_oc·I_b(t) − I_b(t)²·r_int,  I_b(t) = [V_oc − √(V_oc² − 4·r_int·P_batt(t))] / (2·r_int)
wherein P_batt(t) is the battery power, V_oc is the battery voltage, I_b(t) is the battery current, r_int is the internal battery resistance, and P_m(t) is the motor power, whose magnitude satisfies
P_batt(t) = P_m(t)/η_m when discharging and P_batt(t) = P_m(t)·η_m when charging, with d(SOC)/dt = −I_b(t)/Q_max
wherein η_m is the motor efficiency, SOC is the battery state of charge, and Q_max is the maximum battery capacity.

6. The method for energy management of a series-parallel hybrid power system based on a DQN variant according to claim 1, characterized in that the parameters influencing energy management comprise the road conditions of the hybrid vehicle under different working conditions, namely the road gradient, and the onboard mass changes caused by passenger or cargo changes; and the observed quantities influencing energy management comprise the vehicle speed, vehicle acceleration, engine speed, engine torque, motor speed, motor torque, battery state of charge, fuel consumption at the current moment, the difference between the SOC and the reference SOC, the vehicle displacement, and the measurable disturbances, namely the road gradient and the onboard mass variation.

7. The method for energy management of a series-parallel hybrid power system based on a DQN variant according to claim 1, characterized in that the objective function of energy management is:
J = Σ_t [ ṁ_fuel(t) + γ·(SOC(t) − SOC_ref(t))² ]
wherein γ is a positive weighting factor, indicating a balanced equivalence between fuel consumption and battery power consumption; SOC_ref represents the SOC reference value, and ṁ_fuel is the fuel consumption at each sampling time.

8. The method for energy management of a series-parallel hybrid power system based on a DQN variant according to claim 1, characterized in that obtaining the parameters of the experimental vehicle that influence energy management under different working conditions comprises: obtaining a plurality of samples, each sample including parameters collected on the experimental vehicle at different moments that may affect energy management.

9. The method for energy management of a series-parallel hybrid power system based on a DQN variant according to claim 1, characterized in that after the equivalent fuel consumption of the experimental vehicle under different working conditions and the parameters influencing energy management are obtained, the equivalent fuel consumption of the experimental vehicle under a fixed-route working condition and the parameters influencing energy management are collected at a preset sampling frequency, and the collected data are smoothed and normalized.
CN202010845021.9A 2020-08-20 2020-08-20 Energy management method for a series-parallel hybrid power system based on a DQN variant Active CN112026744B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010845021.9A CN112026744B (en) 2020-08-20 2020-08-20 Energy management method for a series-parallel hybrid power system based on a DQN variant

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010845021.9A CN112026744B (en) 2020-08-20 2020-08-20 Energy management method for a series-parallel hybrid power system based on a DQN variant

Publications (2)

Publication Number Publication Date
CN112026744A CN112026744A (en) 2020-12-04
CN112026744B true CN112026744B (en) 2022-01-04

Family

ID=73581036

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010845021.9A Active CN112026744B (en) 2020-08-20 2020-08-20 Energy management method for a series-parallel hybrid power system based on a DQN variant

Country Status (1)

Country Link
CN (1) CN112026744B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112498334B (en) * 2020-12-15 2022-03-11 清华大学 Robust energy management method and system for intelligent networked hybrid vehicle
CN113492827A (en) * 2021-06-23 2021-10-12 东风柳州汽车有限公司 Energy management method and device for hybrid electric vehicle
CN113997926A (en) * 2021-11-30 2022-02-01 江苏浩峰汽车附件有限公司 Parallel hybrid electric vehicle energy management method based on layered reinforcement learning
CN115284973B (en) * 2022-09-05 2024-07-19 湖南大学 Fuel cell automobile energy management method based on improved multi-objective Double DQN
CN115476841B (en) * 2022-10-10 2025-01-07 湖南大学重庆研究院 A plug-in hybrid vehicle energy management method based on improved multi-objective DDPG

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1252036B1 (en) * 2000-01-31 2006-03-15 Azure Dynamics Inc. Method and apparatus for adaptive hybrid vehicle control
CN102717797A (en) * 2012-06-14 2012-10-10 北京理工大学 Energy management method and system of hybrid vehicle
CN110682905A (en) * 2019-10-12 2020-01-14 重庆大学 A method for obtaining the reference variation of battery state of charge in time domain based on mileage
CN111267831A (en) * 2020-02-28 2020-06-12 南京航空航天大学 An intelligent variable time domain model prediction energy management method for hybrid electric vehicles
CN111267830A (en) * 2020-02-10 2020-06-12 南京航空航天大学 Hybrid power bus energy management method, device and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8543272B2 (en) * 2010-08-05 2013-09-24 Ford Global Technologies, Llc Distance oriented energy management strategy for a hybrid electric vehicle

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1252036B1 (en) * 2000-01-31 2006-03-15 Azure Dynamics Inc. Method and apparatus for adaptive hybrid vehicle control
CN102717797A (en) * 2012-06-14 2012-10-10 北京理工大学 Energy management method and system of hybrid vehicle
CN110682905A (en) * 2019-10-12 2020-01-14 重庆大学 A method for obtaining the reference variation of battery state of charge in time domain based on mileage
CN111267830A (en) * 2020-02-10 2020-06-12 南京航空航天大学 Hybrid power bus energy management method, device and storage medium
CN111267831A (en) * 2020-02-28 2020-06-12 南京航空航天大学 An intelligent variable time domain model prediction energy management method for hybrid electric vehicles

Also Published As

Publication number Publication date
CN112026744A (en) 2020-12-04

Similar Documents

Publication Publication Date Title
CN112026744B (en) Energy management method for a series-parallel hybrid power system based on a DQN variant
WO2021114742A1 (en) Comprehensive energy prediction and management method for hybrid electric vehicle
CN112249002B (en) A heuristic series-parallel hybrid energy management method based on TD3
CN110696815B (en) A predictive energy management method for connected hybrid electric vehicles
Liessner et al. Deep reinforcement learning for advanced energy management of hybrid electric vehicles.
CN111267830B (en) A hybrid electric bus energy management method, device and storage medium
CN107688343B (en) Energy control method for a hybrid electric vehicle
CN112668799A (en) Intelligent energy management method and storage medium for PHEV (Power electric vehicle) based on big driving data
CN108647836B (en) Driver energy-saving evaluation method and system
CN107516107A (en) A method for classification and prediction of driving conditions of hybrid electric vehicles
CN112084700B (en) A hybrid power system energy management method based on A3C algorithm
CN110962837A (en) Plug-in hybrid electric vehicle energy management method considering driving style
CN110682905B (en) Method for acquiring battery charge state reference variable quantity in time domain based on driving mileage
CN101519073A (en) Method for forecasting running load of hybrid electric vehicle
CN115805840B (en) Energy consumption control method and system for range-extending motor loader
CN112406875A (en) Vehicle energy consumption analysis method and device
CN105527110B (en) The appraisal procedure and device of automobile fuel ecomomy
CN113554337A (en) Construction method of energy management strategy for plug-in hybrid vehicle integrating traffic information
CN116946107A (en) Hybrid system mode decision and power distribution method under energy track following
Chen et al. On the relationship between energy consumption and driving behavior of electric vehicles based on statistical features
CN111198501A (en) Method for determining fuel equivalent factor by RBF neural network
CN114154729A (en) A hybrid electric vehicle composite energy storage system energy management system and method
Fechert et al. Using deep reinforcement learning for hybrid electric vehicle energy management under consideration of dynamic emission models
CN114435378A (en) Pure electric vehicle whole vehicle mass estimation method based on neural network
CN115257694A (en) An energy management control method for hybrid electric vehicles based on stochastic dynamic programming

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant