Path planning and dynamic collision avoidance algorithm under COLREGs via deep reinforcement learning

Xinli Xu, Peng Cai, Zahoor Ahmed, Vidya Sagar Yellapu, Weidong Zhang

Research output: Contribution to journal › Article › peer-review

Abstract

As one of the core technologies of the automatic control system for unmanned surface vehicles (USVs), the autonomous collision avoidance algorithm is key to ensuring their safe navigation. In this paper, a path planning and dynamic collision avoidance (PPDC) algorithm that obeys the COLREGs is proposed for USVs. To avoid unnecessary collision avoidance actions, a risk assessment model is developed and used to determine when to switch between path planning and dynamic collision avoidance. To train an algorithm that complies with the COLREGs, encounter situations are divided quantitatively and used as the input state of the system, which avoids a high-dimensional input. The state space of the USV is defined by relative parameters to improve the generalization ability of the algorithm, and a network structure based on deep deterministic policy gradient (DDPG) is designed to produce continuous thrust and rudder angle outputs. Four kinds of reward functions are designed, covering path planning, collision avoidance, compliance with the COLREGs, and smooth arrival. To address the low training efficiency of the experience replay mechanism in DDPG, a cumulative priority sampling mechanism is proposed. Simulations in a variety of scenarios show that the PPDC algorithm performs path planning and dynamic collision avoidance in compliance with the COLREGs, with good real-time performance and safety.
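
The abstract does not say how the risk assessment model is built. As an illustration only, the sketch below uses the distance and time to the closest point of approach (DCPA and TCPA), a common choice in the collision avoidance literature, to decide when a planner might switch from path planning to collision avoidance. The function names `cpa` and `needs_avoidance` and both threshold values are hypothetical and are not taken from the paper.

```python
import math


def cpa(own_pos, own_vel, tgt_pos, tgt_vel):
    """Return (dcpa, tcpa) for two vessels assumed to hold constant velocity.

    Positions are (x, y) in metres, velocities are (vx, vy) in metres/second.
    """
    # Position and velocity of the target relative to the own ship.
    rx, ry = tgt_pos[0] - own_pos[0], tgt_pos[1] - own_pos[1]
    vx, vy = tgt_vel[0] - own_vel[0], tgt_vel[1] - own_vel[1]
    v2 = vx * vx + vy * vy
    if v2 == 0.0:
        # No relative motion: the separation never changes.
        return math.hypot(rx, ry), 0.0
    # Time at which the relative distance is minimal (negative means past).
    tcpa = -(rx * vx + ry * vy) / v2
    dcpa = math.hypot(rx + vx * tcpa, ry + vy * tcpa)
    return dcpa, tcpa


def needs_avoidance(dcpa, tcpa, dcpa_limit=50.0, tcpa_limit=60.0):
    """Hypothetical switching rule; the limits are illustrative assumptions."""
    return 0.0 <= tcpa <= tcpa_limit and dcpa <= dcpa_limit


# Head-on example: the two vessels close at 10 m/s from 200 m apart.
dcpa, tcpa = cpa((0.0, 0.0), (0.0, 5.0), (0.0, 200.0), (0.0, -5.0))
print(dcpa, tcpa, needs_avoidance(dcpa, tcpa))
```

A rule of this kind keeps the USV on its planned path while the closest approach is either far away or already in the past, which matches the stated aim of avoiding unnecessary avoidance actions.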

Original language: English
Pages (from-to): 181-197
Number of pages: 17
Journal: Neurocomputing
Volume: 468
Early online date: 8 Oct 2021
Publication status: Published - 11 Jan 2022
Externally published: Yes
