TY - JOUR
T1 - Path planning and dynamic collision avoidance algorithm under COLREGs via deep reinforcement learning
AU - Xu, Xinli
AU - Cai, Peng
AU - Ahmed, Zahoor
AU - Yellapu, Vidya Sagar
AU - Zhang, Weidong
N1 - Author biography:
Weidong Zhang received his BS, MS, and PhD degrees from Zhejiang University, China, in 1990, 1993, and 1996, respectively. He joined Shanghai Jiao Tong University in 1998 as an Associate Professor and has been a Full Professor since 1999. From 2003 to 2004 he worked at the University of Stuttgart, Germany, as an Alexander von Humboldt Fellow. He is a recipient of the National Science Fund for Distinguished Young Scholars of China and the Cheung Kong Scholars Program. He is currently Director of the Engineering Research Center of Marine Automation, Shanghai Municipal Education Commission. His research interests include control theory and pattern recognition theory and their applications in several fields, including power/chemical processes and USV/UAV/AUV. He is the author of one book and more than 300 refereed papers, and holds 61 patents. The Quantitative Control Theory he proposed has been widely applied in 35 different settings by more than 30 groups.
Funding Information:
This paper is partly supported by the Key R&D Program of Hainan (ZDYF2021GXJS041) and the Key R&D Program of Guangdong (2020B1111010002).
Publisher Copyright:
© 2021 Elsevier B.V.
PY - 2022/1/11
Y1 - 2022/1/11
N2 - As one of the core technologies of the automatic control system for unmanned surface vehicles (USVs), the autonomous collision avoidance algorithm is key to ensuring the safe navigation of USVs. This paper proposes a path planning and dynamic collision avoidance (PPDC) algorithm for USVs that obeys COLREGs. To avoid unnecessary collision avoidance actions, a risk assessment model is developed to determine when to switch between path planning and dynamic collision avoidance. To train an algorithm that complies with COLREGs, the encounter situation is divided quantitatively and used as the input state of the system, which avoids a high-dimensional input. The state space of the USV is defined by relative parameters to improve the generalization ability of the algorithm, and a network structure based on DDPG is designed to produce continuous outputs of thrust and rudder angle. Four kinds of reward functions are designed, covering path planning, collision avoidance, compliance with COLREGs, and smooth arrival. To address the low training efficiency of the experience replay mechanism in DDPG, a cumulative priority sampling mechanism is proposed. Simulations in a variety of scenarios show that the PPDC algorithm performs path planning and dynamic collision avoidance in compliance with COLREGs, with good real-time performance and security.
AB - As one of the core technologies of the automatic control system for unmanned surface vehicles (USVs), the autonomous collision avoidance algorithm is key to ensuring the safe navigation of USVs. This paper proposes a path planning and dynamic collision avoidance (PPDC) algorithm for USVs that obeys COLREGs. To avoid unnecessary collision avoidance actions, a risk assessment model is developed to determine when to switch between path planning and dynamic collision avoidance. To train an algorithm that complies with COLREGs, the encounter situation is divided quantitatively and used as the input state of the system, which avoids a high-dimensional input. The state space of the USV is defined by relative parameters to improve the generalization ability of the algorithm, and a network structure based on DDPG is designed to produce continuous outputs of thrust and rudder angle. Four kinds of reward functions are designed, covering path planning, collision avoidance, compliance with COLREGs, and smooth arrival. To address the low training efficiency of the experience replay mechanism in DDPG, a cumulative priority sampling mechanism is proposed. Simulations in a variety of scenarios show that the PPDC algorithm performs path planning and dynamic collision avoidance in compliance with COLREGs, with good real-time performance and security.
KW - COLREGs
KW - Cumulative priority sampling mechanism
KW - DDPG
KW - Dynamic collision avoidance
KW - Unmanned surface vehicle
UR - http://www.scopus.com/inward/record.url?scp=85118789415&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2021.09.071
DO - 10.1016/j.neucom.2021.09.071
M3 - Article
AN - SCOPUS:85118789415
VL - 468
SP - 181
EP - 197
JO - Neurocomputing
JF - Neurocomputing
SN - 0925-2312
ER -