Vol 77 No 3 77316 – Brodogradnja

Reinforcement learning-driven continuous maneuvering decision system for maritime collision prevention using proximal deterministic policy gradient

DOI

10.21278/brod77316

Volume

77

Issue number

3

Article number

77316

Received

30 October 2025

Received in revised form

25 February 2026

Accepted

6 March 2026

Available online

20 March 2026

Authors

Xiao Yang¹, Chunlei Wang¹,*, Lei Zhou², Haiyan Wang², Fengying Wang²

¹School of Information and Engineering, Suqian University, Huanghe Road, 23800, Suqian City, Jiangsu Province, China

²Jiangsu Province Engineering Research Center of Smart Poultry Farming and Intelligent Equipment, Suqian University, Huanghe Road, 23800, Suqian City, Jiangsu Province, China

Corresponding author email

chunleiwang2022@163.com

Abstract

Continuous ship steering control is a highly nonlinear task that is subject to wind and wave disturbances, and it is crucial for timely obstacle avoidance and effective vessel maneuvering. Reinforcement learning (RL) combined with deep neural networks (DNNs) has demonstrated significant potential for controlling systems with nonlinear dynamics, making it well suited to decision-making and planning in such scenarios. However, existing approaches struggle to ensure optimal control performance. To address this limitation, this paper proposes an improved deep reinforcement learning approach based on the Pathwise Derivative Policy Gradient (PDPG) algorithm to enable intelligent collision avoidance through continuous ship steering. The proposed method uses the Maneuvering Modeling Group (MMG) model as the foundation for learning a steering control strategy with DNNs, comprehensively considers various control actions, and evaluates steering performance through a dedicated evaluation network. To enhance the policy network’s representational capacity and balance exploration and exploitation, the structure of the PDPG policy network is optimized. Additionally, an adaptive exploration rate and a dynamic balancing algorithm for random strategies are introduced to fine-tune the exploration-exploitation trade-off. The improved method’s performance is verified through simulations of continuous ship steering control.
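The abstract does not specify how the exploration rate is adapted, so the following is only a minimal illustrative sketch of the general idea: noise added to a deterministic rudder command shrinks as training progresses. The step-based linear schedule, the function names `exploration_scale` and `explore`, the 35 degree rudder limit, and the 0.2 noise factor are all assumptions made for illustration and are not taken from the paper.

```python
import random

MAX_RUDDER_DEG = 35.0  # assumed rudder limit (hypothetical, not from the paper)
NOISE_FRACTION = 0.2   # assumed fraction of the rudder range used as base noise


def exploration_scale(step, start=1.0, end=0.05, decay_steps=10_000):
    """Linearly anneal the exploration scale from `start` to `end`.

    This simple schedule stands in for the paper's adaptive rule,
    which the abstract does not describe.
    """
    frac = min(max(step / decay_steps, 0.0), 1.0)
    return start + frac * (end - start)


def explore(policy_rudder_deg, step, rng):
    """Perturb a deterministic rudder command with annealed Gaussian noise.

    The result is clipped to the assumed physical rudder limits.
    """
    sigma = exploration_scale(step) * MAX_RUDDER_DEG * NOISE_FRACTION
    noisy = policy_rudder_deg + rng.gauss(0.0, sigma)
    return max(-MAX_RUDDER_DEG, min(MAX_RUDDER_DEG, noisy))


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 5_000, 10_000):
        print(step, round(exploration_scale(step), 3), explore(10.0, step, rng))
```

Early in training the command is perturbed strongly, which favors exploration; later the noise shrinks and the agent mostly follows the learned policy. A performance-driven rule (for example, widening the noise when returns stagnate) would be a closer match to the word "adaptive", but that design is not confirmed by the source.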

Keywords

Continuous ship steering control, Deep reinforcement learning, Pathwise Derivative Policy Gradient, MMG model, Policy network