Reinforcement-Learning-Based Vibration Control for a Vehicle Semi-Active Suspension System via the PPO Approach
Figures:
- Figure 1. The 2-DOF quarter semi-active suspension system model.
- Figure 2. The control structure of semi-active suspension based on PPO.
- Figure 3. The changing of the damping force of the semi-active suspension.
- Figure 4. The PPO algorithm’s structure.
- Figure 5. The road grade distribution.
- Figure 6. Simulation results of fuzzy, PPO, and passive suspension systems for a light vehicle on a C-class road.
- Figure 7. Simulation results of fuzzy, PPO, and passive suspension systems for a light vehicle on a D-class road.
- Figure 8. Body acceleration of fuzzy, PPO, and passive suspension systems for a heavy vehicle on class-B and -C roads.
- Figure 9. Continuously changing road.
- Figure 10. Body acceleration under a continuously changing road of PPO, passive, and fuzzy suspension systems for a light vehicle.
- Figure 11. Number of iterations.
Abstract
1. Introduction
2. System Modeling of Road Disturbance and Vehicle Suspension
2.1. Road Disturbance Model
2.2. Two-DOF Quarter Semi-Active Vehicle Suspension
3. Semi-Active Suspension Control Based on the PPO Algorithm
3.1. State Space and Action Space
3.2. PPO Algorithm Network Model
3.3. Design of the Reward Function with Road Disturbances and Policy Update
3.4. Semi-Active Suspension Control Based on the PPO Architecture
4. Simulation Results and Analysis
4.1. Simulation Experiments for Different Road Levels for a Light Vehicle
4.2. Simulation Experiments for Different Road Levels for a Heavy Vehicle
4.3. Simulation Experiments for a Continuously Changing Road Level
4.4. Number of Iterations and Computation Time
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Du, H.-P.; Sze, K.Y.; Lam, J. Semi-active H-infinity control of vehicle suspension with magnetorheological dampers. J. Sound Vib. 2005, 283, 981–996.
- Rao, L.; Narayanan, S. Sky-hook control of nonlinear quarter car model traversing rough road matching performance of LQR control. J. Sound Vib. 2009, 323, 515–529.
- Zhao, Y.; Sun, W.; Gao, H. Robust control synthesis for seat suspension systems with actuator saturation and time-varying input delay. J. Sound Vib. 2010, 329, 4335–4353.
- Pepe, G.A. VFC—Variational feedback controller and its application to semi-active suspensions. Mech. Syst. Signal Process. 2016, 76, 72–92.
- Wu, W.; Chen, X.; Shan, Y. Analysis and experiment of a vibration isolator using a novel magnetic spring with negative stiffness. J. Sound Vib. 2014, 333, 2958–2970.
- Zhang, M.-H.; Jing, X.-J. A bioinspired dynamics-based adaptive fuzzy SMC method for half-car active suspension systems with input dead zones and saturations. IEEE Trans. Cybern. 2021, 51, 1743–1755.
- Bui, Q.-D.; Nguyen, Q.H. A new approach for dynamic modeling of magnetorheological dampers based on quasi-static model and hysteresis multiplication factor. In Proceedings of the IFToMM Asian Conference on Mechanism and Machine Science, Hanoi, Vietnam, 15–18 December 2021; pp. 733–743.
- Bui, Q.; Hoang, L.; Mai, D.; Nguyen, Q.H. Design and testing of a new shear-mode magnetorheological damper with self-power component for front-loaded washing machines. In Proceedings of the 2nd Annual International Conference on Material, Machines and Methods for Sustainable Development (MMMS2020), Trang, Vietnam, 12–15 November 2020; pp. 860–866.
- Phu, D.X.; Mien, V. Robust control for vibration control systems with dead-zone band and time delay under severe disturbance using adaptive fuzzy neural network. J. Frankl. Inst. 2020, 357, 12281–12307.
- Phu, D.X.; Mien, V.; Tu, P.; Nguyen, N.P.; Choi, S.B. A new optimal sliding mode controller with adjustable gains based on Bolza–Meyer criterion for vibration control. J. Sound Vib. 2020, 485, 115542.
- Mnih, V.; Kavukcuoglu, K.; Silver, D. Playing Atari with deep reinforcement learning. arXiv 2013, arXiv:1312.5602.
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.
- Schulman, J.; Levine, S.; Moritz, P. Trust region policy optimization. In Proceedings of the 32nd International Conference on Machine Learning (ICML), Lille, France, 6–11 July 2015; pp. 1889–1897.
- Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016, 529, 484–489.
- Ye, D.; Liu, Z.; Sun, M. Mastering complex control in MOBA games with deep reinforcement learning. Proc. AAAI Conf. Artif. Intell. 2020, 34, 6672–6679.
- Schulman, J.; Wolski, F. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
- Le, Z.; Yza, B.; Xin, Z.A. Image captioning via proximal policy optimization. Image Vis. Comput. 2021, 108, 104126.
- Sadhukhan, P.; Selmic, R.R. Multi-agent formation control with obstacle avoidance using proximal policy optimization. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (SMC), Melbourne, Australia, 17–20 October 2021; pp. 2694–2699.
- Tognetti, S.; Savaresi, S.M.; Spelta, C. Batch reinforcement learning for semi-active suspension control. In Proceedings of the 2009 IEEE Control Applications (CCA) and Intelligent Control, St. Petersburg, Russia, 8–10 July 2009; pp. 582–587.
- Bucak, I.O.; Oez, H.R. Vibration control of a nonlinear quarter-car active suspension system by reinforcement learning. Int. J. Syst. Sci. 2012, 43, 1177–1190.
- Li, Z.; Chu, T.; Kalabic, U. Dynamics-enabled safe deep reinforcement learning: Case study on active suspension control. In Proceedings of the IEEE Conference on Control Technology and Applications (CCTA), Hong Kong, China, 19–21 August 2019; pp. 585–591.
- Zhao, F.; Jiang, P.; You, K.; Song, S.; Zhang, W.; Tong, L. Setpoint tracking for the suspension system of medium-speed maglev trains via reinforcement learning. In Proceedings of the 2019 IEEE 15th International Conference on Control and Automation (ICCA), Edinburgh, UK, 16–19 July 2019; pp. 1620–1625.
- Liu, M.; Li, Y.; Rong, X. Semi-active suspension control based on deep reinforcement learning. IEEE Access 2020, 8, 9978–9986.
- ISO 8608:1995; Mechanical Vibration-Road Surface Profiles-Reporting of Measured Data. International Organization for Standardization: Geneva, Switzerland, 1995. Available online: https://kns.cnki.net/kcms/detail/detail.aspx?FileName=SCSF00013909&DbName=SCSF (accessed on 1 January 2006).
- Zhang, Y.; Zhang, J. Numerical simulation of stochastic road process using white noise filtration. Mech. Syst. Signal Process. 2006, 20, 363–372.
- Nawathe, P.R.; Shire, H.; Wable, V. Simulation of passive suspension system for improving ride comfort of vehicle. Int. J. Manag. Technol. Eng. 2018, 8, 401–411.
- Du, W.; Ding, S. A survey on multi-agent deep reinforcement learning: From the perspective of challenges and applications. Artif. Intell. Rev. 2021, 54, 3215–3238.
- Obando-Ceron, J.S.; Castro, P.S. Revisiting Rainbow: Promoting more insightful and inclusive deep reinforcement learning research. In Proceedings of the 2021 International Conference on Machine Learning (ICML), Online, 18–24 July 2021; pp. 1373–1383.
- Yoo, H.; Kim, B.; Kim, J.W. Reinforcement learning based optimal control of batch processes using Monte-Carlo deep deterministic policy gradient with phase segmentation. Comput. Chem. Eng. 2021, 144, 107133.
- Zimmer, M.; Weng, P. Exploiting the sign of the advantage function to learn deterministic policies in continuous domains. In Proceedings of the 2019 International Joint Conference on Artificial Intelligence (IJCAI), Macao, China, 10–16 August 2019; pp. 4496–4502.
- Zhang, H.; Bai, S.; Lan, X. Hindsight trust region policy optimization. In Proceedings of the 2021 International Joint Conference on Artificial Intelligence (IJCAI), Montreal, QC, Canada, 19–27 August 2021; pp. 3335–3341.
- Qu, Q.; Qi, M.; Gong, R. An improved enhanced fireworks algorithm based on adaptive explosion amplitude and Lévy flight. Eng. Lett. 2020, 28, 1348–1357.
Road Level | Road Unevenness Coefficient Gq(n0) (10⁻⁶ m³) | |
---|---|---|---
 | Lower Limit | Geometric Mean | Upper Limit
A | 8 | 16 | 32
B | 32 | 64 | 128
C | 128 | 256 | 512
D | 512 | 1024 | 2048
E | 2048 | 4096 | 8192
F | 8192 | 16,384 | 32,768
G | 32,768 | 65,536 | 131,072
H | 131,072 | 262,144 | 524,288
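For illustration, coefficients like these are typically fed into the filtered-white-noise road model associated with the ISO 8608 classes (see the white-noise-filtration reference in the References). The sketch below is a minimal version of that standard model; the equation form, the cutoff spatial frequency n00 = 0.011 m⁻¹, the reference spatial frequency n0 = 0.1 m⁻¹, and the function name are assumptions, not taken from this paper.

```python
import numpy as np

def road_profile(gq_e6, speed, duration, dt=0.001, n00=0.011, n0=0.1, seed=0):
    """Filtered-white-noise road profile for a given ISO 8608 road class.

    gq_e6    : road unevenness coefficient Gq(n0) in 1e-6 m^3
               (geometric-mean column above, e.g. 256 for a class-C road)
    speed    : vehicle speed (m/s)
    duration : simulated time (s)
    """
    rng = np.random.default_rng(seed)
    gq = gq_e6 * 1e-6                              # Gq(n0) in m^3
    n_steps = int(duration / dt)
    zr = np.zeros(n_steps)
    for k in range(1, n_steps):
        w = rng.standard_normal() / np.sqrt(dt)    # discrete approximation of unit white noise
        dzr = (-2.0 * np.pi * n00 * speed * zr[k - 1]
               + 2.0 * np.pi * n0 * np.sqrt(gq * speed) * w)
        zr[k] = zr[k - 1] + dzr * dt               # forward-Euler integration
    return zr

# e.g., 10 s of a class-C road (geometric mean 256) at 20 m/s
zr = road_profile(256, speed=20.0, duration=10.0)
```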
Procedure: PPO-based semi-active suspension control

The on-board sensor collects road information in real time so that the semi-active suspension system can obtain the suspension response performance.
Randomly initialize the actor network π_θ and the critic network V_φ with weights θ and φ.
Initialize the actor target network and critic target network: θ′ ← θ, φ′ ← φ.
for each training iteration do
    Generate a set of trajectories by running the policy π_θ in the environment.
    Sample transitions from the memory buffer and evaluate the quality of each state-action pair according to the reward function R.
    Compute the advantage function based on the current value function (generalized advantage estimation): Â_t = Σ_{l≥0} (γλ)^l δ_{t+l}, with δ_t = r_t + γ V_φ(s_{t+1}) − V_φ(s_t).
    Update the policy parameter θ by maximizing the PPO-clip objective: L_CLIP(θ) = E_t[min(r_t(θ) Â_t, clip(r_t(θ), 1 − ε, 1 + ε) Â_t)], where r_t(θ) = π_θ(a_t|s_t) / π_θold(a_t|s_t).
    Update the value parameter φ by minimizing the squared advantage (temporal-difference) loss: L(φ) = E_t[(Â_t)²].
    Soft-update the target network parameters: θ′ ← τθ + (1 − τ)θ′, φ′ ← τφ + (1 − τ)φ′.
end for
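For concreteness, the following PyTorch sketch shows the GAE computation and the clipped-surrogate update described in the procedure above. The network sizes, the Gaussian policy parameterization, and the function names are illustrative assumptions rather than the paper's implementation, and the target-network soft update is omitted for brevity.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Gaussian policy over the damping-force action (sizes are illustrative)."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, action_dim))
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def dist(self, state):
        return torch.distributions.Normal(self.net(state), self.log_std.exp())

class Critic(nn.Module):
    """State-value function V(s)."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 1))

    def forward(self, state):
        return self.net(state).squeeze(-1)

def gae(rewards, values, last_value, gamma=0.98, lam=0.9):
    """Generalized advantage estimation over one trajectory."""
    advantages = torch.zeros_like(rewards)
    values_ext = torch.cat([values, last_value.reshape(1)])
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values_ext[t + 1] - values_ext[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

def ppo_update(actor, critic, opt_actor, opt_critic, states, actions,
               old_log_probs, advantages, returns, clip_eps=0.2):
    """One clipped-surrogate policy update and one value-function update."""
    log_probs = actor.dist(states).log_prob(actions).sum(-1)
    ratio = torch.exp(log_probs - old_log_probs)   # pi_theta / pi_theta_old
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    actor_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()

    critic_loss = ((critic(states) - returns) ** 2).mean()
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()
```

The minus sign on the surrogate turns the maximization of the PPO-clip objective into a loss for a standard gradient-based optimizer; in this setting the sampled batch would come from the semi-active suspension environment.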
Parameters | Symbols | Unit | Light | Heavy
---|---|---|---|---
Sprung mass | | kg | 167.8 | 1465.4
Unsprung mass | | kg | 22.3 | 122.3
Sprung stiffness | | N/m | 11,530 | 11,530
Tire stiffness | | N/m | 115,300 | 315,300
Damping coefficient of suspension | | N·s/m | 278.4 | 4670.4
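To show how these parameters enter the 2-DOF quarter-car model of Section 2.2, here is a minimal sketch of the standard quarter-car dynamics with an adjustable damper force u as the control input. The state ordering, sign conventions, and the way u enters both equations of motion are assumptions based on the usual quarter-car formulation and may differ in detail from the paper's equations.

```python
import numpy as np

# Light-vehicle parameters from the table above (the heavy-vehicle case is analogous).
ms, mu = 167.8, 22.3        # sprung / unsprung mass (kg)
ks, kt = 11_530, 115_300    # suspension / tire stiffness (N/m)
cs = 278.4                  # base damping coefficient (N*s/m)

def quarter_car_step(x, u, zr, dt=0.001):
    """One forward-Euler step of the standard 2-DOF quarter-car model.

    x  : state [xs, vs, xu, vu] = sprung/unsprung displacement and velocity
    u  : adjustable damper force from the controller (N), i.e., the RL action
    zr : road displacement input (m)
    """
    xs, vs, xu, vu = x
    f_spring = ks * (xs - xu)             # suspension spring force
    f_damper = cs * (vs - vu)             # base (passive) damping force
    a_s = (-f_spring - f_damper + u) / ms                  # body acceleration
    a_u = (f_spring + f_damper - kt * (xu - zr) - u) / mu  # wheel acceleration
    return np.array([xs + vs * dt, vs + a_s * dt, xu + vu * dt, vu + a_u * dt])

# e.g., one step from rest with no control force and a 1 cm road displacement
x_next = quarter_car_step(np.zeros(4), u=0.0, zr=0.01)
```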
Hyperparameters | Symbols | Values
---|---|---
Actor learning rate | | 0.0002
Critic learning rate | | 0.001
Discount factor | | 0.98
Clip parameter | | 0.2
GAE parameter | | 0.9
Batch size | | 64
Time steps per batch | | 600
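Collected as plain values, these hyperparameters would be passed into the gae and ppo_update functions of the earlier sketch; the dictionary keys below are illustrative only, and only the numeric values come from the table.

```python
# Hyperparameters from the table above, grouped for the PPO agent (keys are hypothetical).
ppo_hyperparams = {
    "actor_lr": 2e-4,        # actor learning rate
    "critic_lr": 1e-3,       # critic learning rate
    "gamma": 0.98,           # discount factor
    "clip_eps": 0.2,         # clip parameter
    "gae_lambda": 0.9,       # GAE parameter
    "batch_size": 64,
    "steps_per_batch": 600,
}
```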
Road Level | Body Acceleration (Fuzzy) | Body Acceleration (PPO) | Suspension Deflection (Fuzzy) | Suspension Deflection (PPO) | Tire Dynamic Load (Fuzzy) | Tire Dynamic Load (PPO)
---|---|---|---|---|---|---
C | 26.4% | 61.8% | 6.1% | 47.8% | −18% | 28.8%
D | 26.3% | 59.1% | 27% | 45.5% | 13.3% | 28%
E | 26.4% | 59.3% | 27.1% | 46.5% | 13.3% | 28.8%
Road Level | Body Acceleration (Passive) | Body Acceleration (Fuzzy) | Body Acceleration (PPO) | Suspension Deflection (Passive) | Suspension Deflection (Fuzzy) | Suspension Deflection (PPO) | Tire Dynamic Load (Passive) | Tire Dynamic Load (Fuzzy) | Tire Dynamic Load (PPO)
---|---|---|---|---|---|---|---|---|---
C | 0.5896 | 0.4735 | 0.3492 | 0.1275 | 0.06218 | 0.05513 | 12.71 | 9.624 | 11.93
D | 0.8338 | 0.6844 | 0.6614 | 0.1424 | 0.08794 | 0.07839 | 18.77 | 16.86 | 13.61
E | 1.179 | 1.058 | 0.9787 | 0.1671 | 0.1244 | 0.1109 | 25.45 | 24.23 | 19.25
Percentage of Body Acceleration | B | C | D |
---|---|---|---|
Fuzzy | 18.1% | 3.8% | 5% |
PPO | 44.4% | 31.6% | 10% |
Body Acceleration | B | C | D |
---|---|---|---|
Passive | 0.218 | 0.3083 | 0.436 |
Fuzzy | 0.1973 | 0.3023 | 0.425 |
PPO | 0.1625 | 0.255 | 0.4147 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Han, S.-Y.; Liang, T. Reinforcement-Learning-Based Vibration Control for a Vehicle Semi-Active Suspension System via the PPO Approach. Appl. Sci. 2022, 12, 3078. https://doi.org/10.3390/app12063078
Han S-Y, Liang T. Reinforcement-Learning-Based Vibration Control for a Vehicle Semi-Active Suspension System via the PPO Approach. Applied Sciences. 2022; 12(6):3078. https://doi.org/10.3390/app12063078
Chicago/Turabian Style: Han, Shi-Yuan, and Tong Liang. 2022. "Reinforcement-Learning-Based Vibration Control for a Vehicle Semi-Active Suspension System via the PPO Approach" Applied Sciences 12, no. 6: 3078. https://doi.org/10.3390/app12063078