Reinforcement learning-driven continuous maneuvering decision system for maritime collision prevention using proximal deterministic policy gradient
Volume
77
Issue number
3
Article number
77316
Received
30 October 2025
Received in revised form
25 February 2026
Accepted
6 March 2026
Available online
20 March 2026
Authors
Xiao Yang1, Chunlei Wang1,*, Lei Zhou2, Haiyan Wang2, Fengying Wang2
1School of Information and Engineering, Suqian University, Huanghe Road, 23800, Suqian City, Jiangsu Province, China
2Jiangsu Province Engineering Research Center of Smart Poultry Farming and Intelligent Equipment, Suqian University, Huanghe Road, 23800, Suqian City, Jiangsu Province, China
Corresponding author email
Abstract
Continuous ship steering control is a highly nonlinear task that is subject to wind and wave disturbances, and it is crucial for timely obstacle avoidance and effective vessel maneuvering. Reinforcement learning (RL) combined with deep neural networks (DNNs) has shown significant potential for controlling systems with nonlinear dynamics, which makes it well suited to decision-making and planning in such scenarios. However, existing approaches struggle to ensure optimal control performance. To address this limitation, this paper proposes an improved deep reinforcement learning approach based on the Pathwise Derivative Policy Gradient (PDPG) algorithm for intelligent collision avoidance in continuous ship steering. The proposed method uses the MMG (Maneuvering Modeling Group) ship model as the basis for learning a steering control strategy with DNNs, accounts for a range of control actions, and evaluates steering performance through a dedicated evaluation network. The structure of the PDPG policy network is optimized to enhance its representational capacity and to balance exploration and exploitation. In addition, an adaptive exploration rate and a dynamic balancing algorithm for random strategies are introduced to fine-tune the exploration-exploitation trade-off. The improved method is verified through simulations of continuous ship steering control.
Keywords
Continuous ship steering control, Deep reinforcement learning, Pathwise Derivative Policy Gradient, MMG model, Policy network