
CN119270864B - Automatic collision avoidance methods, devices, equipment and storage media for ships - Google Patents

Automatic collision avoidance methods, devices, equipment and storage media for ships

Info

Publication number
CN119270864B
CN119270864B (application CN202411391893.7A)
Authority
CN
China
Prior art keywords
ship
reward
collision
avoidance
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411391893.7A
Other languages
Chinese (zh)
Other versions
CN119270864A (en)
Inventor
钟升贵
华勋钦
方太猛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yunchuang Youyi Technology Co ltd
Original Assignee
Shenzhen Yunchuang Youyi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Yunchuang Youyi Technology Co ltd filed Critical Shenzhen Yunchuang Youyi Technology Co ltd
Priority to CN202411391893.7A
Publication of CN119270864A
Application granted
Publication of CN119270864B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/40 Control within particular dimensions
    • G05D1/43 Control of position or course in two dimensions
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/60 Intended control result
    • G05D1/617 Safety or protection, e.g. defining protection zones around obstacles or avoiding hazards
    • G05D1/622 Obstacle avoidance

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Traffic Control Systems (AREA)

Abstract


This invention belongs to the field of ship automation control and provides an automatic collision avoidance method, device, equipment, and storage medium for ships. The method includes: constructing a collision avoidance model based on deep reinforcement learning training to formulate avoidance decisions; when a grid sensor detects other ships, acquiring the state information of the other ships and calculating the collision risk occurrence area based on that state information; when the collision risk occurrence area overlaps the sensing area of the grid sensor, determining whether the own ship is the avoidance ship or the holding ship; if it is the holding ship, maintaining its course and speed; if it is the avoidance ship, calculating an avoidance decision; after executing the avoidance decision, updating the state information of the own ship and the other ships into the environment information, and repeating the process until the distance between the own ship and the other ships is greater than a threshold, or is less than the threshold with no collision risk. This invention aims to solve the problems that traditional ship collision avoidance technology is poorly suited to complex sea conditions and has high computational complexity.

Description

Ship automatic avoidance anti-collision method, device, equipment and storage medium
Technical Field
The invention relates to the field of automatic control of ships, and in particular to an automatic ship collision avoidance method, device, equipment, and storage medium.
Background
With the continued growth of global maritime transport, shipping lanes are becoming increasingly congested. Statistics show that 89% to 95% of maritime accidents are attributable to human factors, which highlights the risk of crews making erroneous decisions in complex navigational environments. Reducing marine accidents, in particular by realizing ship collision avoidance through automatic means, has therefore become a key research direction in the field of ocean engineering. Traditional model-based collision avoidance methods, such as the artificial potential field method, the velocity obstacle method, the A* algorithm, and particle swarm optimization, perform well in specific scenarios but have limitations in coping with complex and variable actual sea conditions: their models are complex, their computational cost is high, they are difficult to adapt to dynamically changing environments, and they lack self-learning capability.
Disclosure of Invention
In view of the technical problems, the invention provides an automatic ship collision avoidance method, device, equipment and storage medium, and aims to solve the problems that a traditional ship collision avoidance technology is difficult to adapt under complex sea conditions and has high calculation complexity.
Other features and advantages of the invention will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.
According to one aspect of the invention, a ship automatic avoidance anti-collision method is disclosed, and the method comprises the following steps:
Establishing an anti-collision model based on deep reinforcement learning training to make an avoidance decision, wherein the anti-collision model comprises an input layer, a decision layer, an action layer, and a reward layer; the input layer comprises a grid sensor used for detecting the own ship and surrounding environment information to serve as input of the anti-collision model; the decision layer comprises a Q-Learning algorithm used for determining the optimal avoidance decision according to the input of the input layer; the action layer is used for adjusting the speed and heading of the ship according to the avoidance decision; the reward layer is used for updating the output of the decision layer and comprises a target driving reward, a heading deviation reward, a collision avoidance reward, and a rule reward, wherein the target driving reward is obtained when the ship navigates toward the target point, the heading deviation reward is obtained when the ship's heading is consistent with the direction of the target point, the collision avoidance reward is obtained when the ship successfully avoids collision with other ships, and the rule reward is obtained when the ship turns to starboard for avoidance in accordance with the COLREGs;
acquiring state information of other ships when the grid sensor senses the other ships, wherein the state information comprises speed and heading, and calculating a collision risk occurrence area based on the state information;
When the collision risk occurrence area overlaps with the sensing area of the grid sensor, judging whether the own ship is an avoidance ship or a holding ship; if it is the holding ship, keeping the heading and speed unchanged; if it is the avoidance ship, inputting the state information of the other ships into the anti-collision model to calculate the avoidance decision; after the decision layer executes the avoidance decision, updating the state information of the own ship and the other ships into the environment information, and repeatedly executing until the distance between the own ship and the other ships is greater than a threshold value, or the distance is smaller than the threshold value and no collision risk exists.
Further, in calculating the collision risk occurrence region, the following is calculated:
Where OS is the own ship, TS is the target ship, α = arcsin(r_S/d) is the half-angle subtended at the own ship by the safe passing distance r_S, d is the distance between the own ship and the target ship, V_O is the speed of the own ship, V_T is the speed of the target ship, A_Z is the azimuth from the own ship to the target ship's position, C_T is the heading of the target ship, and C_O is the heading of the own ship. When the ships are on a collision course, the relative motion is as follows:
V_R = √((V_T·sin C_T − V_O·sin C_O)² + (V_T·cos C_T − V_O·cos C_O)²);
C_R = atan2(V_T·sin C_T − V_O·sin C_O, V_T·cos C_T − V_O·cos C_O);
V_R and C_R are the relative speed and heading of the target ship with respect to the own ship. The predicted approach time and the distance at closest approach of the two ships are then calculated:
DCPA = d·|sin(C_R − A_Z + π)|;
TCPA = d·cos(C_R − A_Z + π)/V_R;
where DCPA is the distance at the closest point of approach and TCPA is the time to the closest point of approach.
Further, when the collision risk occurrence area overlaps with the sensing area of the grid sensor, the component of the state vector corresponding to the grid cell with the largest overlapping area is set to 1 and non-overlapping components are set to 0; when the collision risk occurrence area overlaps several grid cells, the component corresponding to the cell closest to the own ship is set to 1.
Further, the target driving reward and the heading deviation reward work in a target driving mode; when the collision risk occurrence area does not overlap with the sensing area of the grid sensor, the ship keeps its heading and sails toward the target point.
Further, the collision avoidance reward and the rule reward work in a collision avoidance mode; when the collision risk occurrence area overlaps with the sensing area of the grid sensor, the Q value of the decision layer is updated based on the collision avoidance reward and the rule reward.
Based on a second aspect of the present invention, there is provided an automatic collision avoidance device for a ship, the device comprising:
A model generation module, used for constructing an anti-collision model based on deep reinforcement learning training to make an avoidance decision, wherein the anti-collision model comprises an input layer, a decision layer, an action layer, and a reward layer; the input layer comprises a grid sensor used for detecting the own ship and surrounding environment information to serve as input of the anti-collision model; the decision layer comprises a Q-Learning algorithm used for determining the optimal avoidance decision according to the input of the input layer; the action layer is used for adjusting the speed and heading of the ship according to the avoidance decision; the reward layer is used for updating the output of the decision layer and comprises a target driving reward, a heading deviation reward, a collision avoidance reward, and a rule reward, wherein the target driving reward is obtained when the ship navigates toward the target point, the heading deviation reward is obtained when the ship's heading is consistent with the direction of the target point, the collision avoidance reward is obtained when the ship successfully avoids collision with other ships, and the rule reward is obtained when the ship turns to starboard for avoidance in accordance with the COLREGs;
the risk calculation module is used for acquiring state information of other ships when the grid sensor senses the other ships, wherein the state information comprises speed and heading, and calculating a collision risk occurrence area based on the state information;
An avoidance decision module, used for judging, when the collision risk occurrence area overlaps with the sensing area of the grid sensor, whether the own ship is an avoidance ship or a holding ship; if it is the holding ship, keeping the heading and speed unchanged; if it is the avoidance ship, inputting the state information of the other ships into the anti-collision model to calculate the avoidance decision; after the decision layer executes the avoidance decision, updating the state information of the own ship and the other ships into the environment information, and repeatedly executing until the distance between the own ship and the other ships is greater than a threshold value, or the distance is smaller than the threshold value and no collision risk exists.
According to a third aspect of the present disclosure, there is provided a ship automatic avoidance anti-collision apparatus comprising a processor and a memory arranged to store computer-executable instructions which, when executed, cause the processor to implement the ship automatic avoidance anti-collision method described above.
According to a fourth aspect of the present disclosure, there is provided a computer readable storage medium storing a computer program which when executed by a processor implements a ship automatic avoidance collision avoidance method as described above.
The technical scheme of the present disclosure has the following beneficial effects:
The method quantifies the dimensions of the observed state, avoids an excessively high-dimensional state space, reduces computational complexity, and improves the execution efficiency of the model; the model has a simple structure and low input dimensionality, giving it fast real-time execution capability in complex environments. In particular, by defining the collision risk occurrence region, the complexity of the state space is reduced and the convergence speed of the reinforcement learning algorithm is improved.
By designing a simplified navigation situation judgment method, the model can distinguish between avoidance ships and holding ships without adding extra model complexity, which improves the generalization capability and practicality of the model.
Drawings
FIG. 1 is a flow chart of a method for automatic ship collision avoidance in an embodiment of the present disclosure;
FIG. 2 is a block diagram of a ship automatic avoidance anti-collision device in an embodiment of the present disclosure;
FIG. 3 is an apparatus for performing the ship automatic avoidance anti-collision method in an embodiment of the present disclosure;
FIG. 4 is a computer-readable storage medium storing the ship automatic avoidance anti-collision method in an embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein, but rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the exemplary embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present disclosure. One skilled in the relevant art will recognize, however, that the aspects of the disclosure may be practiced without one or more of the specific details, or with other methods, components, systems, steps, etc. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are only schematic illustrations of the present disclosure. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor systems and/or microcontroller systems.
As shown in fig. 1, an embodiment of the present disclosure provides a method for preventing collision of a ship during automatic avoidance, where the method is applied to a ship communication system. The method specifically comprises the following steps S101-S103:
In step S101, an anti-collision model is built based on deep reinforcement learning training to make an avoidance decision. The anti-collision model includes an input layer, a decision layer, an action layer, and a reward layer. The input layer includes a grid sensor used for detecting information about the own ship and its surroundings to serve as input of the anti-collision model. The decision layer includes a Q-Learning algorithm used for determining the optimal avoidance decision according to the input of the input layer. The action layer is used for adjusting the speed and heading of the ship according to the avoidance decision. The reward layer is used for updating the output of the decision layer and includes a target driving reward, a heading deviation reward, a collision avoidance reward, and a rule reward: the target driving reward is obtained when the ship navigates toward the target point, the heading deviation reward is obtained when the ship's heading is consistent with the direction of the target point, the collision avoidance reward is obtained when the ship successfully avoids collision with other ships, and the rule reward is obtained when the ship turns to starboard for avoidance in accordance with the COLREGs.
The input layer uses a grid sensor as its input tool. The grid sensor is a virtual sensor that can detect the state of the own ship and information about the surrounding environment and feed this information to the model as input. The Q-Learning algorithm of the decision layer processes the information from the input layer and determines the optimal avoidance decision; Q-Learning is a model-free reinforcement learning algorithm that selects the optimal action by learning an action-value function (Q-function). The action layer is responsible for adjusting the speed and heading of the ship according to the avoidance decision of the decision layer, i.e., executing the avoidance action. The role of the reward layer is to provide feedback to the decision layer to update and optimize its output; specifically, the reward layer consists of the following parts:
The target drive reward (Goal Reward) is obtained when the vessel is sailing towards the target point, encouraging the vessel to advance towards the predetermined target.
Heading deviation rewards (Heading-Error Reward) are obtained when the heading of the vessel is consistent with the direction of the target point, encouraging the vessel to maintain the correct heading.
Collision avoidance rewards (Collision Avoidance Reward) are obtained when a vessel successfully avoids collisions with other vessels, encouraging the vessel to take avoidance actions.
Rule rewards (COLREGs Reward) are obtained, in accordance with the International Regulations for Preventing Collisions at Sea (COLREGs), when the ship predominantly turns to starboard when avoidance is required.
In the automatic ship avoidance process, the model receives input from the grid sensor in real time, and the Q-Learning algorithm evaluates various possible avoidance strategies according to the input and selects the strategy with the maximum expected return. The action layer then adjusts the heading and speed of the vessel according to this strategy to avoid collisions. Meanwhile, the reward layer can provide corresponding reward signals according to the behavior of the ship and the avoidance result, and the signals can be fed back to the decision layer for adjusting and improving future avoidance decisions.
In the above step, the model is designed to prefer starboard-turn actions when avoidance is needed, in compliance with the International Regulations for Preventing Collisions at Sea (COLREGs). The model is trained in a variety of complex multi-ship encounter scenarios to improve its generalization and self-learning capability.
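As an illustration, the four reward terms could be combined as in the following Python sketch; the weights and the shaping of each term are assumptions made here for illustration and are not specified by the patent.

```python
import math

def total_reward(d_goal, heading_dev, avoided_collision, turned_starboard,
                 w=(1.0, 0.5, 2.0, 0.5)):
    """Combine the four reward terms of the reward layer.

    d_goal:            distance from the own ship to the target point
    heading_dev:       heading deviation from the target direction, in radians
    avoided_collision: True if the last avoidance action removed the risk
    turned_starboard:  True if the avoidance action was a starboard turn
    The weights w and the shaping of each term are illustrative assumptions.
    """
    r_goal = w[0] / (1.0 + d_goal)                      # target driving reward
    r_head = w[1] * (1.0 - abs(heading_dev) / math.pi)  # heading deviation reward
    r_avoid = w[2] if avoided_collision else 0.0        # collision avoidance reward
    r_rule = w[3] if turned_starboard else 0.0          # rule (COLREGs) reward
    return r_goal + r_head + r_avoid + r_rule
```

With these weights, a ship at the target point with zero heading deviation that just completed a COLREGs-compliant avoidance would collect all four terms.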
In step S102, when the grid sensor senses other ships, status information of the other ships is acquired, wherein the status information includes speed and heading, and a collision risk occurrence area is calculated based on the status information.
As part of the input layer, the grid sensor senses and acquires the state information of other nearby ships. This state information mainly includes Speed, the current speed of the other ships, which helps to predict their future positions, and Heading, the current heading of the other ships, which is likewise crucial for predicting their sailing paths.
With the state information of other vessels available, the collision risk occurrence area is calculated from these data. It is determined mainly by the current position and heading of the own ship, the speed of each vessel and its expected path of travel, and the safe passing distance, i.e., the minimum distance that should be maintained between vessels in order to avoid collision. By calculating the risk occurrence area, the possibility of the own ship colliding with the target ship at some point in the future can be evaluated, in particular by analyzing the relative motion of the two ships, including their relative speed and relative heading. If the risk occurrence area intersects the detection range of the grid sensor, indicating that a collision risk exists, the severity and urgency of the collision are evaluated according to the specific location of the risk occurrence area and its position relative to the other vessels. During the calculation, the computed collision risk area is provided as input to the decision layer, in particular to the Q-Learning algorithm, so that a corresponding avoidance decision can be made. These decisions may include changing heading or speed to avoid entering the risk occurrence area and reduce the likelihood of collision. Throughout the voyage, the grid sensor continuously monitors the surrounding environment and updates the state information of other ships in real time. This means that the calculation of the collision risk area is itself a dynamic process, continuously adjusted as the environment changes.
In step S103, when the collision risk occurrence area overlaps with the sensing area of the grid sensor, it is determined whether the own ship is an avoidance ship or a holding ship. If it is the holding ship, the heading and speed are kept unchanged; if it is the avoidance ship, the state information of the other ships is input into the anti-collision model to calculate the avoidance decision. After the decision layer executes the avoidance decision, the state information of the own ship and the other ships is updated into the environment information, and the process is repeated until the distance between the own ship and the other ships is greater than a threshold value, or is less than the threshold value with no collision risk.
When the grid sensor detects that the collision risk area overlaps its sensing area, the system first needs to judge whether the own ship is the avoidance ship or the holding ship. This role is determined based on the international maritime collision avoidance rules and other relevant factors, analogous to road traffic rules where turning traffic yields to traffic going straight. If the own ship is judged to be the holding ship, the current heading and speed are kept unchanged, meaning that in the current situation the own ship does not need to take evasive action. If the own ship is judged to be the avoidance ship, the state information of the other ships (such as speed and heading) is input into the anti-collision model and used to calculate an avoidance decision. The decision layer of the anti-collision model, in particular the Q-Learning algorithm, then computes the optimal avoidance decision based on the entered state information, which may include changing heading, adjusting speed, or other avoidance actions; the action layer performs these actions, adjusting the heading and/or speed of the ship to avoid collision. After the avoidance maneuver is performed, the state information of the own ship and surrounding vessels, including their positions, speeds, and headings, is updated and fed back into the environment information. The relative positions and states of the vessels are continuously monitored and the process is repeated, i.e., the process is a dynamic cycle, until either of the following conditions is met:
The distance between the own ship and the other ships is greater than a preset safety threshold, in which case no collision risk is considered to exist;
the distance between the own ship and the other ships is less than the threshold, but the avoidance action has ensured that no collision risk exists.
The purpose of step S103 is to ensure that in a multi-ship environment, the ship can effectively identify the collision risk and make a corresponding avoidance decision, so as to ensure the safety of the ship. Through the continuous risk assessment and decision process, the automatic ship avoidance system is ensured to make timely and effective reactions in the continuously-changing marine environment, and the occurrence of marine traffic accidents is reduced.
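The dynamic avoid-and-update cycle of step S103 can be sketched as follows; the `env` and `model` interfaces are hypothetical stand-ins for the grid-sensor environment and the Q-Learning decision layer, not APIs defined by the patent.

```python
def avoidance_loop(env, model, safe_dist):
    """Repeat sense -> role check -> decide -> act -> update until the own ship
    is farther than safe_dist from all other ships, or closer but risk-free."""
    while True:
        obs = env.sense()  # grid-sensor state of the own ship and nearby ships
        # Exit: distance above threshold, or below threshold with no risk left.
        if env.min_distance() > safe_dist or not env.collision_risk():
            return
        if env.own_ship_is_holding():
            env.hold_course_and_speed()      # holding ship: keep heading and speed
        else:
            action = model.best_action(obs)  # Q-Learning picks the avoidance action
            env.apply(action)                # adjust heading and/or speed
        env.update_states()  # feed the new ship states back into the environment
```

The loop terminates only through the two exit conditions listed above, mirroring the dynamic-cycle description of step S103.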
In one embodiment, in calculating the collision risk occurrence region, the following is calculated:
Where OS is the own ship, TS is the target ship, α = arcsin(r_S/d) is the half-angle subtended at the own ship by the safe passing distance r_S, d is the distance between the own ship and the target ship, V_O is the speed of the own ship, V_T is the speed of the target ship, A_Z is the azimuth from the own ship to the target ship's position, C_T is the heading of the target ship, and C_O is the heading of the own ship. When the ships are on a collision course, the relative motion is as follows:
V_R = √((V_T·sin C_T − V_O·sin C_O)² + (V_T·cos C_T − V_O·cos C_O)²);
C_R = atan2(V_T·sin C_T − V_O·sin C_O, V_T·cos C_T − V_O·cos C_O);
V_R and C_R are the relative speed and heading of the target ship with respect to the own ship. The predicted approach time and the distance at closest approach of the two ships are then calculated:
DCPA = d·|sin(C_R − A_Z + π)|;
TCPA = d·cos(C_R − A_Z + π)/V_R;
where DCPA is the distance at the closest point of approach and TCPA is the time to the closest point of approach.
The calculation results are used to evaluate whether the two ships are on a potential collision course and to help the decision layer determine whether avoidance actions need to be taken. If the DCPA is less than the safe distance, or the TCPA indicates that the vessels will reach their closest point of approach within a short time, avoidance measures need to be taken. These calculations provide a quantitative risk assessment for automatic avoidance, enabling more accurate and timely avoidance decisions.
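Under the definitions above, the DCPA/TCPA computation might be implemented as below; the decomposition of the relative velocity into north and east components is a standard formulation assumed here, not quoted from the patent.

```python
import math

def cpa(v_o, c_o, v_t, c_t, d, a_z):
    """Return (DCPA, TCPA) for the own ship (speed v_o, heading c_o) and a
    target ship (speed v_t, heading c_t) at range d and azimuth a_z from the
    own ship. Headings and azimuth are in radians, clockwise from north."""
    # Relative velocity of the target with respect to the own ship.
    vx = v_t * math.sin(c_t) - v_o * math.sin(c_o)  # east component
    vy = v_t * math.cos(c_t) - v_o * math.cos(c_o)  # north component
    v_r = math.hypot(vx, vy)   # relative speed V_R
    c_r = math.atan2(vx, vy)   # relative heading C_R
    dcpa = d * abs(math.sin(c_r - a_z + math.pi))
    tcpa = d * math.cos(c_r - a_z + math.pi) / v_r
    return dcpa, tcpa
```

For a head-on encounter (target dead ahead on a reciprocal course at equal speed) this gives DCPA ≈ 0 and TCPA = d/(V_O + V_T), as expected.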
In one embodiment, when the collision risk occurrence area overlaps with the sensing area of the grid sensor, the component of the state vector corresponding to the grid cell with the largest overlapping area is set to 1 and non-overlapping components are set to 0; when the collision risk occurrence area overlaps several grid cells, the component corresponding to the cell closest to the own ship is set to 1.
When the collision risk occurrence area overlaps with the sensing area of the grid sensor, a specific method is used to update the state vector used by the anti-collision model. The state vector is a fixed-dimension vector representing the ship's surroundings; all of its components are initialized to 0 when no collision risk is detected. When the grid sensor senses a collision risk occurrence area, the system checks whether that area overlaps each sensing cell of the grid sensor. If the collision risk occurrence area overlaps a grid cell and that cell's overlapping area is the largest among all overlapping cells, the state vector component corresponding to that cell is set to 1, indicating that this cell detects the greatest collision risk. If the collision risk occurrence area does not overlap any grid cell, the state vector remains all zeros, indicating that no collision risk is currently detected. When the collision risk occurrence area overlaps multiple grid cells, the cell closest to the own ship is selected: its state vector component is set to 1, while the components corresponding to the other overlapping cells remain 0. The updated state vector reflects the collision risk situation in the current environment and is then used as input to the anti-collision model, in particular to the Q-Learning algorithm of the decision layer, to calculate the avoidance decision. Throughout the voyage, the grid sensor continuously monitors the surroundings and repeats this detection and update process.
By the method, the anti-collision model can accurately identify and evaluate collision risks around the ship, and accordingly corresponding avoidance decisions are made, the arrangement of the state vector ensures that the model can focus on the most urgent collision risks, and the effectiveness and instantaneity of the avoidance decisions are improved.
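A minimal sketch of this state-vector update, assuming the per-cell overlap areas and cell-to-ship distances have already been computed from the grid-sensor geometry:

```python
def update_state_vector(overlap_areas, cell_distances):
    """Return a 0/1 state vector: 1 at the overlapping grid cell with the
    largest overlap area, ties broken by the cell closest to the own ship;
    all zeros when the risk area overlaps no cell."""
    state = [0] * len(overlap_areas)
    overlapping = [i for i, a in enumerate(overlap_areas) if a > 0]
    if overlapping:
        max_area = max(overlap_areas[i] for i in overlapping)
        tied = [i for i in overlapping if overlap_areas[i] == max_area]
        state[min(tied, key=lambda i: cell_distances[i])] = 1
    return state
```

Keeping a single hot component per update is what bounds the state-space dimension, which the patent credits for the reduced computational complexity.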
In an embodiment, the target driving reward and the heading deviation reward work in a target driving mode; when the collision risk occurrence area does not overlap with the sensing area of the grid sensor, the ship keeps heading toward the target point.
In the target driving mode, the vessel's main task is to navigate along a predetermined course to the target point, and the target driving reward and the heading deviation reward together ensure that it maintains the correct course. Specifically, the target driving reward encourages the vessel to sail towards the target point: as the vessel moves along the course towards the target, it receives a positive reward that reinforces the current sailing behaviour, and the magnitude of the reward is generally inversely proportional to the distance from the vessel to the target point, i.e. the closer to the target point, the larger the reward. The heading deviation reward reduces the deviation between the vessel's current heading and the bearing of the target point: when the two are aligned the vessel receives a positive reward, and the reward is inversely proportional to the deviation angle, i.e. the smaller the heading deviation, the larger the reward. The target driving mode applies when the collision risk occurrence area does not overlap the sensing area of the grid sensor, meaning no collision risk is detected and the vessel can safely continue towards the target point; in this case the vessel maintains its current heading and speed, and the two rewards act together to steer it along the most direct and effective path. In the target driving mode the vessel also continuously monitors its surroundings so that it can switch rapidly to the avoidance mode when a new collision risk is detected: if the collision risk occurrence area overlaps the sensing area of the grid sensor, the vessel switches from the target driving mode to the avoidance mode, i.e.
steps S101-S103 described above are performed. Through this embodiment, the ship navigates effectively to the target point while remaining safe; the target driving reward and the heading deviation reward play the key role in the target driving mode, keeping the ship on the correct heading and course.
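As an illustration of how these two reward terms could be computed, the sketch below implements a distance-based target driving reward and an angle-based heading deviation reward. The function shape and the gain constants `k_dist` and `k_heading` are assumptions for illustration; the disclosure only states that each reward is inversely proportional to its respective quantity.

```python
import math

def target_driving_rewards(ship_pos, ship_heading, target_pos,
                           k_dist=1.0, k_heading=1.0):
    """Reward terms active in target-driving mode (illustrative sketch).

    The target driving reward grows as the ship nears the target point;
    the heading deviation reward grows as the heading error shrinks.
    Headings/bearings are in radians measured from the +x axis.
    """
    dx, dy = target_pos[0] - ship_pos[0], target_pos[1] - ship_pos[1]
    distance = math.hypot(dx, dy)
    bearing_to_target = math.atan2(dy, dx)

    # Target driving reward: inversely proportional to distance to target.
    r_target = k_dist / (1.0 + distance)

    # Heading deviation reward: inversely proportional to the heading
    # error, with the error wrapped to [-pi, pi].
    dev = (ship_heading - bearing_to_target + math.pi) % (2 * math.pi) - math.pi
    r_heading = k_heading / (1.0 + abs(dev))

    return r_target + r_heading
```

With this shape, a ship close to and aligned with the target earns a larger combined reward than one far away and pointing the wrong way, which is the reinforcement the embodiment describes.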
In an embodiment, the collision avoidance reward and the rule reward operate in a collision avoidance mode: when the collision risk occurrence area overlaps the sensing area of the grid sensor, the Q value of the decision layer is updated based on the collision avoidance reward and the rule reward.
When the collision risk occurrence area overlaps the sensing area of the grid sensor, i.e. a possible collision risk is detected, the automatic ship avoidance system switches to the collision avoidance mode. In this mode, the collision avoidance reward and the rule reward are used to update the Q value in the decision layer and thereby guide avoidance decisions. Specifically, the collision avoidance reward is activated when the vessel takes an avoidance action that reduces the risk of collision: an effective avoidance action earns a positive reward that reinforces it, and the magnitude of the reward may depend on the action's effectiveness, its timeliness, and the relative positions of the other vessels. The rule reward encourages the vessel to comply with the international maritime collision avoidance rules; for example, under these rules a vessel in a meeting situation should properly alter course to starboard to avoid collision, and avoidance decisions and actions that meet the rules' requirements earn an additional positive reward. The Q value, i.e. the action-value function, is updated by the Q-Learning algorithm and represents the expected return of taking a particular action in a given state; in collision avoidance mode, each avoidance action updates the Q value according to the collision avoidance reward and the rule reward, thereby influencing future avoidance decisions. The decision layer selects the best avoidance maneuver based on the updated Q value, such as changing heading, decelerating, or another avoidance strategy, and the action layer then executes these decisions, adjusting the speed and heading of the vessel to avoid collision with the target vessel.
In the avoidance mode, the vessel's state and surroundings are continuously monitored to ensure that the avoidance action remains effective, and the action is adjusted as needed. If the collision risk occurrence area no longer overlaps the sensing area of the grid sensor, or the collision risk falls to an acceptable level, the system switches back to the target driving mode. With this method and device, the vessel automatically adjusts its behaviour through reinforcement learning when a collision risk is detected, takes appropriate avoidance measures, and at the same time complies with the maritime collision avoidance rules, improving sailing safety.
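The Q-value update described above can be sketched as a tabular Q-Learning rule in which the collision avoidance reward and the rule reward are summed into the reward signal. The class interface, the action names, and the hyperparameters below are illustrative assumptions; the disclosure specifies only that Q-Learning is used and that both rewards drive the update.

```python
import random
from collections import defaultdict

class AvoidanceQLearner:
    """Tabular Q-Learning for the collision avoidance mode (a minimal
    sketch; state/action encodings and hyperparameters are assumptions)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)   # (state, action) -> Q value
        self.actions = actions        # e.g. ("starboard", "port", "slow", "hold")
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state):
        # Epsilon-greedy selection of the avoidance maneuver.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, r_avoid, r_rule, next_state):
        # In collision avoidance mode the reward signal combines the
        # collision avoidance reward and the rule (COLREGs-compliance) reward.
        reward = r_avoid + r_rule
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

In use, the decision layer would call `choose` to pick the maneuver the action layer executes, then call `update` once the two reward terms for that step are known.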
Based on the same thought, as shown in fig. 2, the embodiment of the present disclosure further provides an automatic ship collision avoidance device, where the device includes:
a model generation module 201, configured to construct an anti-collision model based on deep reinforcement learning training to make avoidance decisions, where the anti-collision model includes an input layer, a decision layer, an action layer, and a reward layer; the input layer includes a grid sensor configured to detect information on the ship itself and its surroundings as input to the anti-collision model; the decision layer includes a Q-Learning algorithm configured to determine the optimal avoidance decision from the input of the input layer; the action layer is configured to adjust the speed and heading of the ship according to the avoidance decision; and the reward layer is configured to update the output of the decision layer and includes a target driving reward, a heading deviation reward, a collision avoidance reward, and a rule reward, where the target driving reward is awarded when the ship sails towards the target point, the heading deviation reward is awarded when the ship's heading points towards the target point, the collision avoidance reward is awarded when the ship avoids a collision, and the rule reward is awarded when the ship avoids a collision by turning to starboard;
a risk calculation module 202, configured to acquire state information of other vessels when the grid sensor senses them, where the state information includes speed and heading, and to calculate a collision risk occurrence area based on the state information;
and an avoidance module 203, configured to, when the collision risk occurrence area overlaps the sensing area of the grid sensor, judge whether the own ship is the give-way vessel or the stand-on vessel; if the stand-on vessel, keep the heading and speed unchanged; if the give-way vessel, input the state information of the other vessels into the anti-collision model to calculate the avoidance decision, and after the decision layer executes the avoidance decision, update the state information of the own ship and the other vessels into the environment information; this is repeated until the distance between the own ship and the other vessels is greater than a threshold, or is less than the threshold with no collision risk.
The device has the following advantages:
The method quantifies the dimension of the observed state, avoiding an excessively high state-space dimension, which reduces computational complexity and improves the execution efficiency of the model; the simple structure and low input dimension give the model fast real-time execution capability in complex environments. In particular, introducing the collision risk occurrence region reduces the complexity of the state space and improves the convergence speed of the reinforcement learning algorithm. A simplified navigation-situation judgment lets the model distinguish between the give-way vessel and the stand-on vessel without adding model complexity, improving its generalization capability and practicality.
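The low-dimensional quantized state described here can be illustrated with a simple encoding in which each grid-sensor cell contributes one binary component and only the cell with the largest overlap with the collision risk occurrence region is set to 1, following the encoding described in the embodiments; the list-based interface is an assumption.

```python
def grid_state_vector(cell_overlaps):
    """Quantized observation: one binary component per grid-sensor cell.

    `cell_overlaps` lists, per cell, the overlap area with the collision
    risk occurrence region (0 for no overlap). Only the cell with the
    largest overlap is set to 1; all others are 0, which keeps the state
    space small and discrete.
    """
    n = len(cell_overlaps)
    state = [0] * n
    if any(a > 0 for a in cell_overlaps):
        state[max(range(n), key=lambda i: cell_overlaps[i])] = 1
    return tuple(state)  # hashable, so usable as a tabular Q-Learning key
```

Because the vector contains at most a single 1, the number of distinct states grows only linearly with the number of sensor cells, which is what keeps the Q table tractable.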
Based on the same thought, the embodiment of the specification also provides an automatic ship collision avoidance device, as shown in fig. 3.
The ship automatic collision avoidance device may be the terminal device or the server provided in the above embodiments.
The ship automatic collision avoidance device may vary considerably depending on configuration or performance, and may include one or more processors 301 and a memory 302, where the memory 302 may store one or more applications or data. The memory 302 may include readable media in the form of volatile memory units, such as random-access memory (RAM) units and/or cache memory units, and may further include read-only memory units. The applications stored in the memory 302 may include one or more program modules (not shown), including, but not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. Further, the processor 301 may be configured to communicate with the memory 302 to execute a series of computer-executable instructions in the memory 302 on the device. The device may also include one or more power sources 303, one or more wired or wireless network interfaces 304, one or more I/O (input/output) interfaces 305, and one or more external devices 306 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.), and may communicate with one or more devices that enable a user to interact with it, and/or with any device (e.g., a router, a modem, etc.) that enables it to communicate with one or more other computing devices. Such communication may occur through the I/O interface 305. Also, the device may communicate with one or more networks, such as a local area network (LAN), via the wired or wireless network interface 304.
In particular, in this embodiment, the ship automatic collision avoidance device includes a memory and one or more programs, where the one or more programs are stored in the memory and may include one or more modules, each module including a series of computer-executable instructions for the device; execution of the one or more programs by the one or more processors includes computer-executable instructions for:
constructing an anti-collision model based on deep reinforcement learning training to make avoidance decisions, where the anti-collision model includes an input layer, a decision layer, an action layer, and a reward layer; the input layer includes a grid sensor used to detect information on the ship itself and its surroundings as input to the anti-collision model; the decision layer includes a Q-Learning algorithm used to determine the optimal avoidance decision from the input of the input layer; the action layer is used to adjust the speed and heading of the ship according to the avoidance decision; and the reward layer is used to update the output of the decision layer and includes a target driving reward, a heading deviation reward, a collision avoidance reward, and a rule reward, where the target driving reward is awarded when the ship sails towards the target point, the heading deviation reward is awarded when the ship's heading points towards the target point, the collision avoidance reward is awarded when the ship avoids a collision, and the rule reward is awarded when the ship avoids a collision by turning to starboard;
acquiring state information of other vessels when the grid sensor senses them, where the state information includes speed and heading, and calculating a collision risk occurrence area based on the state information;
when the collision risk occurrence area overlaps the sensing area of the grid sensor, judging whether the own ship is the give-way vessel or the stand-on vessel; if the stand-on vessel, keeping the heading and speed unchanged; if the give-way vessel, inputting the state information of the other vessels into the anti-collision model to calculate the avoidance decision, and after the decision layer executes the avoidance decision, updating the state information of the own ship and the other vessels into the environment information; repeating until the distance between the own ship and the other vessels is greater than a threshold, or is less than the threshold with no collision risk.
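The closest-approach quantities used in the risk calculation above (the predicted time of closest approach and the distance at that time) can be sketched with a standard closest-point-of-approach computation. The disclosure's exact formulas are not reproduced here, so the vector formulation and the threshold test below are a common, assumed variant for illustration.

```python
import math

def cpa(own_pos, own_speed, own_heading, tgt_pos, tgt_speed, tgt_heading):
    """Closest point of approach between the own ship and a target ship.

    Headings are in radians measured from the +x axis; positions and
    speeds share consistent units. Returns (dcpa, tcpa): the distance at
    closest approach and the time until it.
    """
    # Relative position and relative velocity of the target w.r.t. the own ship.
    rx, ry = tgt_pos[0] - own_pos[0], tgt_pos[1] - own_pos[1]
    vx = tgt_speed * math.cos(tgt_heading) - own_speed * math.cos(own_heading)
    vy = tgt_speed * math.sin(tgt_heading) - own_speed * math.sin(own_heading)

    v2 = vx * vx + vy * vy
    if v2 == 0:                       # identical velocities: range never changes
        return math.hypot(rx, ry), float("inf")

    tcpa = -(rx * vx + ry * vy) / v2  # time of closest approach
    tcpa = max(tcpa, 0.0)             # already diverging: closest approach is now
    dcpa = math.hypot(rx + vx * tcpa, ry + vy * tcpa)
    return dcpa, tcpa

def collision_risk(dcpa, tcpa, d_safe, t_horizon):
    """Risk flag: closest approach inside the safe passing distance within
    the look-ahead horizon (both thresholds are illustrative assumptions)."""
    return dcpa < d_safe and tcpa < t_horizon
```

For a head-on encounter the relative velocity points straight along the line of sight, so the distance at closest approach is essentially zero and the risk flag is raised well before the meeting time.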
Based on the same idea, exemplary embodiments of the present disclosure further provide a computer-readable storage medium on which is stored a program product capable of implementing the method described in this specification. In some possible embodiments, the various aspects of the present disclosure may also be implemented in the form of a program product comprising program code which, when the program product is run on a terminal device, causes the terminal device to carry out the steps according to the various exemplary embodiments of the present disclosure described in the "ship automatic collision avoidance method" section of this specification.
Referring to fig. 4, a program product 400 for implementing the above method according to an exemplary embodiment of the present disclosure is described; it may employ a portable compact disc read-only memory (CD-ROM), include program code, and be run on a terminal device such as a personal computer. However, the program product of the present disclosure is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of a readable storage medium include an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a terminal system, or a network device, etc.) to perform the method according to the exemplary embodiments of the present disclosure.
Furthermore, the above-described figures are only schematic illustrations of processes included in the method according to the exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with exemplary embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (7)

1. A ship automatic collision avoidance method, characterized in that the method comprises: constructing an anti-collision model based on deep reinforcement learning training to make avoidance decisions, the anti-collision model comprising an input layer, a decision layer, an action layer, and a reward layer, wherein the input layer comprises a grid sensor for detecting information on the ship itself and its surroundings as input to the anti-collision model, the decision layer comprises a Q-Learning algorithm for determining the optimal avoidance decision from the input of the input layer, the action layer adjusts the speed and heading of the ship according to the avoidance decision, and the reward layer updates the output of the decision layer and comprises a target driving reward, a heading deviation reward, a collision avoidance reward, and a rule reward, the target driving reward being awarded when the ship sails towards the target point, the heading deviation reward being awarded when the ship's heading points towards the target point, the collision avoidance reward being awarded when the ship avoids a collision, and the rule reward being awarded when the ship avoids a collision by turning to starboard;
when the grid sensor senses other vessels, acquiring state information of the other vessels, the state information comprising speed and heading, and calculating a collision risk occurrence area based on the state information, the calculation using the safe passing distance, the distance between the own ship (OS) and the target ship (TS), the speed of the own ship, the speed of the target ship, the bearing from the own ship to the target ship, and the heading of the target ship; when the vessels are on a collision course, computing the relative speed and relative heading of the target ship with respect to the own ship, and then computing the predicted time of closest approach of the two vessels and their distance at that time;
and when the collision risk occurrence area overlaps the sensing area of the grid sensor, judging whether the own ship is the give-way vessel or the stand-on vessel; if the stand-on vessel, keeping the heading and speed unchanged; if the give-way vessel, inputting the state information of the other vessels into the anti-collision model to calculate the avoidance decision, and after the decision layer executes the avoidance decision, updating the state information of the own ship and the other vessels into the environment information, repeating until the distance between the own ship and the other vessels is greater than a threshold, or is less than the threshold with no collision risk.
2. The ship automatic collision avoidance method according to claim 1, characterized in that, when the collision risk occurrence area overlaps the sensing area of the grid sensor, the component of the state vector with the largest overlap area is set to 1 and the components without overlap are set to 0, and if the sensing area of the grid sensor overlaps the collision risk occurrence area, the state-vector component closest to the own ship is set to 1.
3. The ship automatic collision avoidance method according to claim 1, characterized in that the target driving reward and the heading deviation reward operate in a target driving mode, and when the collision risk occurrence area does not overlap the sensing area of the grid sensor, the ship keeps its heading towards the target point.
4. The ship automatic collision avoidance method according to claim 1, characterized in that the collision avoidance reward and the rule reward operate in a collision avoidance mode, and when the collision risk occurrence area overlaps the sensing area of the grid sensor, the Q value of the decision layer is updated based on the collision avoidance reward and the rule reward.
5. A ship automatic collision avoidance device, characterized in that the device comprises:
a model generation module, configured to construct an anti-collision model based on deep reinforcement learning training to make avoidance decisions, the anti-collision model comprising an input layer, a decision layer, an action layer, and a reward layer, wherein the input layer comprises a grid sensor for detecting information on the ship itself and its surroundings as input to the anti-collision model, the decision layer comprises a Q-Learning algorithm for determining the optimal avoidance decision from the input of the input layer, the action layer adjusts the speed and heading of the ship according to the avoidance decision, and the reward layer updates the output of the decision layer and comprises a target driving reward, a heading deviation reward, a collision avoidance reward, and a rule reward, the target driving reward being awarded when the ship sails towards the target point, the heading deviation reward being awarded when the ship's heading points towards the target point, the collision avoidance reward being awarded when the ship avoids a collision, and the rule reward being awarded when the ship avoids a collision by turning to starboard;
a risk calculation module, configured to acquire state information of other vessels when the grid sensor senses them, the state information comprising speed and heading, and to calculate a collision risk occurrence area based on the state information, the calculation using the safe passing distance, the distance between the own ship (OS) and the target ship (TS), the speed of the own ship, the speed of the target ship, the bearing from the own ship to the target ship, and the heading of the target ship; when the vessels are on a collision course, the relative speed and relative heading of the target ship with respect to the own ship are computed, followed by the predicted time of closest approach of the two vessels and their distance at that time;
and an avoidance module, configured to, when the collision risk occurrence area overlaps the sensing area of the grid sensor, judge whether the own ship is the give-way vessel or the stand-on vessel; if the stand-on vessel, keep the heading and speed unchanged; if the give-way vessel, input the state information of the other vessels into the anti-collision model to calculate the avoidance decision, and after the decision layer executes the avoidance decision, update the state information of the own ship and the other vessels into the environment information, repeating until the distance between the own ship and the other vessels is greater than a threshold, or is less than the threshold with no collision risk.
6. A ship automatic collision avoidance apparatus, characterized by comprising: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to: construct the anti-collision model, acquire the state information of other vessels and calculate the collision risk occurrence area, and perform the give-way/stand-on judgment and avoidance decision steps as set out in claim 1.
7. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the ship automatic collision avoidance method according to any one of claims 1 to 4.
CN202411391893.7A 2024-10-08 2024-10-08 Automatic collision avoidance methods, devices, equipment and storage media for ships Active CN119270864B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411391893.7A CN119270864B (en) 2024-10-08 2024-10-08 Automatic collision avoidance methods, devices, equipment and storage media for ships


Publications (2)

Publication Number Publication Date
CN119270864A CN119270864A (en) 2025-01-07
CN119270864B true CN119270864B (en) 2026-01-30

Family

ID=94106710

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411391893.7A Active CN119270864B (en) 2024-10-08 2024-10-08 Automatic collision avoidance methods, devices, equipment and storage media for ships

Country Status (1)

Country Link
CN (1) CN119270864B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120927624A * 2025-06-25 2025-11-11 China Academy of Civil Aviation Science and Technology Atmospheric transmittance instrument capable of active avoidance

Citations (2)

Publication number Priority date Publication date Assignee Title
CN108820157A (en) * 2018-04-25 2018-11-16 武汉理工大学 A kind of Ship Intelligent Collision Avoidance method based on intensified learning
CN117870691A (en) * 2024-01-12 2024-04-12 武汉科技大学 Tugboat escort path planning method, device and equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2786809B2 (en) * 1994-03-08 1998-08-13 株式会社トキメック Ship navigation support device
CN112542064B (en) * 2019-09-23 2024-03-26 廖秉军 Dynamic collision prevention method for fast moving object and slow moving object
CN117280396A (en) * 2021-05-26 2023-12-22 古野电气株式会社 Ship monitoring device, ship monitoring method and program
CN116954232A (en) * 2023-08-16 2023-10-27 大连海事大学 Unmanned ship multi-ship collision avoidance decision method and system based on reinforcement learning


Also Published As

Publication number Publication date
CN119270864A (en) 2025-01-07

Similar Documents

Publication Publication Date Title
Utne et al. Towards supervisory risk control of autonomous ships
Wang et al. A COLREGs-based obstacle avoidance approach for unmanned surface vehicles
Rongcai et al. Autonomous collision avoidance system in a multi-ship environment based on proximal policy optimization method
CN119270864B (en) Automatic collision avoidance methods, devices, equipment and storage media for ships
Zhen et al. A novel deterministic search-based algorithm for multi-ship collaborative collision avoidance decision-making
CN119028179B (en) Ship overtaking collision warning method and system in dangerous areas
CN117970786B (en) A method, device, ship and storage medium for autonomous navigation decision-making of a ship
Tengesdal et al. Obstacle intention awareness in automatic ship collision avoidance: Full-scale experiments in confined waters
Yuan et al. Model Predictive Control‐Based Collision Avoidance for Autonomous Surface Vehicles in Congested Inland Waters
Xingfei et al. A Review of Research on Autonomous Collision Avoidance Performance Testing and an Evaluation of Intelligent Vessels
CN120560276B (en) Ship motion auxiliary driving method and system considering navigation environment constraint
Liu et al. Path planning method for USVs based on improved DWA and COLREGs
Nunes et al. Trajectory planning with obstacle avoidance using APF and COLREGS for USVs
Ma et al. Navigation-gpt: A robust and adaptive framework utilizing large language models for navigation applications
CN118295422A (en) Ship collision avoidance method based on real-time risk assessment and switching control
Depalo et al. Protocol-driven A* algorithm for fast ASV motion planning in dynamic scenarios
Hwang et al. Efficacy evaluation of adaptive collision avoidance systems for autonomous maritime surface ships based on target ships’ maneuvering behaviors
Clement et al. Hybrid navigation acceptability and safety
Sun et al. A parallel guidance and control method for path planning and collision avoidance of unmanned surface vehicles
Söffker et al. Progress towards multidimensionally scalable assisted and/or automated ship navigation and control–part II: human in the interaction loop
Uğurlu et al. A Conceptual COLREGs-based Obstacle Avoidance Algorithm Implementing Dynamic Path Planning and Collision Risk Assessment.
Lu Research on Intelligent Collision Avoidance Decision-Making Algorithm for Multi-Unmanned Surface Vehicles (USVs) Based on COLREGs (March 2025)
Wan et al. A hybrid path planning strategy unified with COLREGS for the unmanned surface vehicle
Zhou et al. Hybrid fuzzy-geometric control for ASV under multi-ships collision avoidance
CN117132012B (en) Multi-ship collision prevention method for predicting collision danger area, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant