
CN118982176B - Ship task self-adaptive planning method and device based on deep reinforcement learning - Google Patents

Ship task self-adaptive planning method and device based on deep reinforcement learning

Info

Publication number
CN118982176B
CN118982176B
Authority
CN
China
Prior art keywords
data information
planning
ship
information
reinforcement learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410991536.8A
Other languages
Chinese (zh)
Other versions
CN118982176A (en)
Inventor
唐琳
李迅
应文威
周万宁
苏思
李勇
岳丽军
张宁燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unit 91977 Of Pla
Original Assignee
Unit 91977 Of Pla
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unit 91977 Of Pla filed Critical Unit 91977 Of Pla
Priority to CN202410991536.8A priority Critical patent/CN118982176B/en
Publication of CN118982176A publication Critical patent/CN118982176A/en
Application granted granted Critical
Publication of CN118982176B publication Critical patent/CN118982176B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/096Transfer learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06311Scheduling, planning or task assignment for a person or group
    • G06Q10/063112Skill-based matching of a person or a group to a task
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06313Resource planning in a project environment
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • G06Q10/103Workflow collaboration or project management

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Educational Administration (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract


The present invention discloses a method and device for adaptive planning of ship tasks based on deep reinforcement learning, the method comprising: obtaining ship task planning data information; the ship task planning data information comprises historical data information and personnel planning data information; preprocessing the ship task planning data information to obtain preprocessed data information; using the preprocessed data information, training a preset deep reinforcement learning planning model to obtain an optimized deep reinforcement learning planning model; using the optimized deep reinforcement learning planning model, processing the ship task data information to be processed to obtain a ship task planning scheme. By introducing a deep reinforcement learning algorithm, the present invention can more effectively process these complex data, extract valuable features and patterns therefrom, and thus provide more accurate and comprehensive data support for task planning.

Description

Ship task self-adaptive planning method and device based on deep reinforcement learning
Technical Field
The invention relates to the technical field of ship task planning, in particular to a ship task self-adaptive planning method and device based on deep reinforcement learning.
Background
With the increasing complexity and variability of modern environments, ships face unprecedented challenges in performing tasks. Traditional task planning methods, such as methods based on rules or simple machine learning, have been difficult to meet the adaptive requirements in complex environments. These methods often lack flexibility, are difficult to accommodate in rapidly changing environments and task requirements, and have limitations in processing high-dimensional, non-linear data. Therefore, it is important to develop an intelligent mission planning system that can fully utilize historical empirical data, adapt to environmental changes in real time, and provide individualized decision support for ship personnel.
The prior art has several significant drawbacks in marine mission planning that limit the effectiveness and adaptability of the system in complex and dynamic environments. Specifically:
Limited data processing capability: traditional methods struggle to process high-dimensional, nonlinear and time-series data, so valuable information cannot be fully extracted from the abundant historical experience.
Lack of real-time adaptability: when facing rapidly changing environmental and task conditions, existing systems often fail to adjust strategies in real time, resulting in delayed and inaccurate decisions.
Insufficient personalized support: the prior art rarely considers the individual differences and skill levels of ship personnel and cannot provide targeted decision support, which affects overall operational effectiveness.
Difficult model updating: when new data or new conditions arise, traditional models may require retraining or extensive adjustment, which is time-consuming and laborious in actual practice.
Limited collaboration capability: in tasks requiring multi-vessel or multi-department collaboration, the prior art often lacks efficient coordination and communication mechanisms.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a ship task self-adaptive planning method and device based on deep reinforcement learning, which train and optimize a decision model through a deep reinforcement learning algorithm that combines historical experience data and personnel data. In this way, not only can efficient task planning be realized, but personalized action suggestions can also be provided according to individual differences among personnel, improving the execution efficiency and success rate of the overall task.
In order to solve the technical problems, a first aspect of the embodiments of the present invention discloses a method for adaptive planning of a ship task based on deep reinforcement learning, the method comprising:
S1, acquiring ship task planning data information, wherein the ship task planning data information comprises historical data information and personnel planning data information;
s2, preprocessing the ship task planning data information to obtain preprocessed data information;
s3, training a preset deep reinforcement learning planning model by utilizing the preprocessing data information to obtain an optimized deep reinforcement learning planning model;
And S4, processing the ship task data information to be processed by using the optimized deep reinforcement learning planning model to obtain a ship task planning scheme.
In a first aspect of the embodiment of the present invention, the preprocessing the ship mission planning data information to obtain preprocessed data information includes:
S21, carrying out data cleaning on the ship task planning data information to obtain cleaning data information;
s22, carrying out data conversion on the cleaning data information to obtain standard data information;
s23, extracting the characteristics of the standard data information to obtain the preprocessed data information.
In a first aspect of the embodiment of the present invention, the feature extracting the standard data information to obtain the preprocessed data information includes:
S231, extracting features of the standard data information to obtain first feature information;
S232, carrying out feature extraction on the standard data information to obtain second feature information;
S233, performing mapping interpolation processing on the first characteristic information to obtain first mapping characteristic information;
S234, performing mapping interpolation processing on the second characteristic information to obtain second mapping characteristic information;
S235, the first mapping characteristic information and the second mapping characteristic information are processed to obtain preprocessed data information.
In a first aspect of the embodiment of the present invention, the feature extracting the standard data information to obtain first feature information includes:
Performing feature extraction on the standard data information by using a preset first feature extraction model to obtain first feature information;
the first feature extraction model expression is:
where C_s(t, f) is the first feature information, f is frequency, t is time, f(ξ, τ) denotes the kernel function, s(t) is the input standard data information, τ is the time shift, ξ is the frequency shift, u is the integration variable, exp is the exponential function, σ is a constant, and * denotes the complex conjugate.
In a first aspect of the embodiment of the present invention, the performing a mapping interpolation process on the first feature information to obtain first mapping feature information includes:
Carrying out mapping interpolation processing on the first characteristic information by using a preset mapping interpolation model to obtain first mapping characteristic information;
the preset mapping interpolation model expression is as follows:
where M is the first feature information, M′ is the first mapping feature information, M′_ij is an element of M′, M_min is the minimum value in M, M_max is the maximum value in M, 1 ≤ i ≤ m, 1 ≤ j ≤ n, and m and n are the numbers of rows and columns of the matrix M.
In a first aspect of the embodiment of the present invention, the processing the first mapping feature information and the second mapping feature information to obtain preprocessed data information includes:
S2351, processing the first mapping characteristic information to obtain an autocovariance matrix Sigma 11;
S2352, processing the second mapping characteristic information to obtain an autocovariance matrix Sigma 22;
S2353, processing the first mapping characteristic information and the second mapping characteristic information to obtain a first cross covariance matrix Sigma 12 and a second cross covariance matrix Sigma 21;
S2354, solving a preset optimization objective function to obtain transformation matrices W x and W y;
The preset optimization objective function is as follows:
where Q_1 = Σ_11, Q_2 = Σ_22, β, λ and θ are coefficient constants, and T denotes the matrix transpose;
S2355, the transformation matrices W x and W y are processed to obtain preprocessed data information.
In a first aspect of the embodiment of the present invention, the processing the ship task data information to be processed to obtain a ship task planning scheme by using the optimized deep reinforcement learning planning model includes:
s41, processing ship task data information to be processed by using the optimized deep reinforcement learning planning model to obtain decision information;
and S42, carrying out data processing on the decision information and the personnel planning data information to obtain a ship task planning scheme.
The second aspect of the embodiment of the invention discloses a ship task self-adaptive planning device based on deep reinforcement learning, which comprises the following components:
The information acquisition module is used for acquiring ship task planning data information, wherein the ship task planning data information comprises historical data information and personnel planning data information;
The preprocessing module is used for preprocessing the ship task planning data information to obtain preprocessed data information;
The model training module is used for training a preset deep reinforcement learning planning model by utilizing the preprocessing data information to obtain an optimized deep reinforcement learning planning model;
And the ship task planning module is used for processing the ship task data information to be processed by utilizing the optimized deep reinforcement learning planning model to obtain a ship task planning scheme.
In a second aspect of the embodiment of the present invention, the preprocessing the ship mission planning data information to obtain preprocessed data information includes:
S21, carrying out data cleaning on the ship task planning data information to obtain cleaning data information;
s22, carrying out data conversion on the cleaning data information to obtain standard data information;
s23, extracting the characteristics of the standard data information to obtain the preprocessed data information.
In a second aspect of the embodiment of the present invention, the feature extracting the standard data information to obtain the preprocessed data information includes:
S231, extracting features of the standard data information to obtain first feature information;
S232, carrying out feature extraction on the standard data information to obtain second feature information;
S233, performing mapping interpolation processing on the first characteristic information to obtain first mapping characteristic information;
S234, performing mapping interpolation processing on the second characteristic information to obtain second mapping characteristic information;
S235, the first mapping characteristic information and the second mapping characteristic information are processed to obtain preprocessed data information.
In a second aspect of the embodiment of the present invention, the feature extracting the standard data information to obtain first feature information includes:
Performing feature extraction on the standard data information by using a preset first feature extraction model to obtain first feature information;
the first feature extraction model expression is:
where C_s(t, f) is the first feature information, f is frequency, t is time, f(ξ, τ) denotes the kernel function, s(t) is the input standard data information, τ is the time shift, ξ is the frequency shift, u is the integration variable, exp is the exponential function, σ is a constant, and * denotes the complex conjugate.
In a second aspect of the embodiment of the present invention, the performing a mapping interpolation process on the first feature information to obtain first mapping feature information includes:
Carrying out mapping interpolation processing on the first characteristic information by using a preset mapping interpolation model to obtain first mapping characteristic information;
the preset mapping interpolation model expression is as follows:
where M is the first feature information, M′ is the first mapping feature information, M′_ij is an element of M′, M_min is the minimum value in M, M_max is the maximum value in M, 1 ≤ i ≤ m, 1 ≤ j ≤ n, and m and n are the numbers of rows and columns of the matrix M.
In a second aspect of the embodiment of the present invention, the processing the first mapping feature information and the second mapping feature information to obtain preprocessed data information includes:
S2351, processing the first mapping characteristic information to obtain an autocovariance matrix Sigma 11;
S2352, processing the second mapping characteristic information to obtain an autocovariance matrix Sigma 22;
S2353, processing the first mapping characteristic information and the second mapping characteristic information to obtain a first cross covariance matrix Sigma 12 and a second cross covariance matrix Sigma 21;
S2354, solving a preset optimization objective function to obtain transformation matrices W x and W y;
The preset optimization objective function is as follows:
where Q_1 = Σ_11, Q_2 = Σ_22, β, λ and θ are coefficient constants, and T denotes the matrix transpose;
S2355, the transformation matrices W x and W y are processed to obtain preprocessed data information.
In a second aspect of the embodiment of the present invention, the processing the ship task data information to be processed to obtain a ship task planning scheme by using the optimized deep reinforcement learning planning model includes:
s41, processing ship task data information to be processed by using the optimized deep reinforcement learning planning model to obtain decision information;
and S42, carrying out data processing on the decision information and the personnel planning data information to obtain a ship task planning scheme.
The third aspect of the invention discloses another adaptive planning device for ship tasks based on deep reinforcement learning, which comprises:
a memory storing executable program code;
A processor coupled to the memory;
The processor invokes the executable program codes stored in the memory to execute part or all of the steps in the adaptive planning method for the ship task based on deep reinforcement learning disclosed in the first aspect of the embodiment of the invention.
A fourth aspect of the present invention discloses a computer-readable storage medium storing computer instructions for performing part or all of the steps in the deep reinforcement learning-based adaptive planning method for a marine task disclosed in the first aspect of the present invention when the computer instructions are called.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
(1) In the prior art, a ship task planning system often has difficulty in processing high-dimensional, nonlinear and time sequence data, so that abundant historical experience data cannot be fully utilized to optimize decisions. The invention can more effectively process the complex data by introducing a deep reinforcement learning algorithm, and extract valuable features and modes from the complex data, thereby providing more accurate and comprehensive data support for task planning.
(2) The real-time adaptability is enhanced, namely, the current system generally lacks the capability of adjusting strategies in real time when facing rapidly-changing environment and task conditions, so that decision delay and inaccuracy are caused. The invention enables the system to dynamically adjust the strategy according to the real-time situation in the task execution process by a real-time feedback mechanism and online learning capability, and ensures the timeliness and accuracy of decision making.
(3) The personalized decision support is realized, the individual difference and skill level of ship personnel are rarely considered in the prior art, and the targeted decision support cannot be provided. According to the invention, by combining personnel data, including individual differences of personnel skills, experiences and the like, personalized action suggestions can be generated for each person, so that the potential of the personnel is brought into full play to the maximum extent, and the overall combat efficiency is improved.
(4) Simplifying the model update process-traditional models often require retraining or extensive adjustments in the face of new data or new conditions, which is time consuming and laborious and may affect the stability of the model. The model design of the invention considers the requirements of continuous learning and incremental updating, and can quickly and effectively update the model when new data arrives without retraining the whole model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow diagram of a method for adaptive planning of a marine mission based on deep reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of another adaptive planning method for ship tasks based on deep reinforcement learning according to an embodiment of the present invention;
FIG. 3 is a flow chart of deep reinforcement learning disclosed in an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a ship task adaptive planning device based on deep reinforcement learning according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of another adaptive planning device for ship tasks based on deep reinforcement learning according to an embodiment of the present invention.
Detailed Description
In order to make the present invention better understood by those skilled in the art, the following description will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terms first, second and the like in the description and in the claims and in the above-described figures are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, article, or device that comprises a list of steps or elements is not limited to the list of steps or elements but may, in the alternative, include other steps or elements not expressly listed or inherent to such process, method, article, or device.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
The invention discloses a self-adaptive planning method and device for ship tasks based on deep reinforcement learning. The method comprises: obtaining ship task planning data information, where the ship task planning data information comprises historical data information and personnel planning data information; preprocessing the ship task planning data information to obtain preprocessed data information; training a preset deep reinforcement learning planning model with the preprocessed data information to obtain an optimized deep reinforcement learning planning model; and processing the ship task data information to be processed with the optimized deep reinforcement learning planning model to obtain a ship task planning scheme. The following will describe this in detail.
Example 1
Referring to fig. 1, fig. 1 is a schematic flow chart of a ship task adaptive planning method based on deep reinforcement learning according to an embodiment of the present invention. The adaptive planning method for the ship task based on the deep reinforcement learning described in fig. 1 is applied to the technical field of ship task planning, and the embodiment of the invention is not limited. As shown in fig. 1, the adaptive planning method for the ship task based on the deep reinforcement learning may include the following operations:
S1, acquiring ship task planning data information, wherein the ship task planning data information comprises historical data information and personnel planning data information;
s2, preprocessing the ship task planning data information to obtain preprocessed data information;
s3, training a preset deep reinforcement learning planning model by utilizing the preprocessing data information to obtain an optimized deep reinforcement learning planning model;
And S4, processing the ship task data information to be processed by using the optimized deep reinforcement learning planning model to obtain a ship task planning scheme.
Optionally, the preprocessing the ship mission planning data information to obtain preprocessed data information includes:
S21, carrying out data cleaning on the ship task planning data information to obtain cleaning data information;
removing abnormal values from the ship task planning data information and interpolating missing values to obtain the cleaned data information;
an abnormal value is a data item whose value exceeds the data mean by more than a preset threshold;
a missing value is information absent from the ship task planning data information, and it is identified and filled in by comparison with empirical information;
S22, carrying out data conversion on the cleaned data information to obtain standard data information;
the cleaned data information is normalized by dividing it by its maximum value, yielding standard data information in the range [0, 1]; an illustrative code sketch of these cleaning and normalization steps is given after step S23 below.
S23, extracting the characteristics of the standard data information to obtain the preprocessed data information.
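As a purely illustrative sketch (not the patent's exact procedure), the following Python snippet shows one plausible realization of steps S21 and S22; the pandas-based workflow, the standard-deviation outlier rule, and the column-wise maximum scaling are assumptions made for the example.

```python
import numpy as np
import pandas as pd

def clean_and_normalize(df: pd.DataFrame, threshold: float = 3.0) -> pd.DataFrame:
    """Illustrative S21-S22 preprocessing: outlier removal, interpolation, scaling."""
    cleaned = df.copy()
    for col in cleaned.select_dtypes(include=[np.number]).columns:
        # S21: treat points far from the column mean as abnormal values
        deviation = (cleaned[col] - cleaned[col].mean()).abs()
        cleaned.loc[deviation > threshold * cleaned[col].std(), col] = np.nan
        # S21: fill missing values by interpolation
        cleaned[col] = cleaned[col].interpolate(limit_direction="both")
        # S22: divide by the maximum so values fall in [0, 1]
        max_val = cleaned[col].abs().max()
        if max_val > 0:
            cleaned[col] = cleaned[col] / max_val
    return cleaned
```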
Optionally, the feature extracting the standard data information to obtain preprocessed data information includes:
S231, extracting features of the standard data information to obtain first feature information;
S232, carrying out feature extraction on the standard data information to obtain second feature information;
S233, performing mapping interpolation processing on the first characteristic information to obtain first mapping characteristic information;
S234, performing mapping interpolation processing on the second characteristic information to obtain second mapping characteristic information;
S235, the first mapping characteristic information and the second mapping characteristic information are processed to obtain preprocessed data information.
Optionally, the feature extracting the standard data information to obtain first feature information includes:
Performing feature extraction on the standard data information by using a preset first feature extraction model to obtain first feature information;
the first feature extraction model expression is:
where C_s(t, f) is the first feature information, f is frequency, t is time, f(ξ, τ) denotes the kernel function, s(t) is the input standard data information, τ is the time shift, ξ is the frequency shift, u is the integration variable, exp is the exponential function, σ is a constant, and * denotes the complex conjugate.
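The kernel-based expression above appears as an image in the original and is not reproduced here. Purely as a hedged stand-in, the sketch below produces a comparable time-frequency feature map C_s(t, f) with SciPy's short-time Fourier transform; the window length and sampling rate are illustrative assumptions, not values from the patent.

```python
import numpy as np
from scipy.signal import stft

def time_frequency_features(s: np.ndarray, fs: float = 1.0, nperseg: int = 64) -> np.ndarray:
    """Stand-in for the first feature extraction model: an STFT-based energy map over (t, f)."""
    freqs, times, Z = stft(s, fs=fs, nperseg=nperseg)
    # |Z|^2 is the signal energy at each time-frequency cell, playing the role of C_s(t, f)
    return np.abs(Z) ** 2
```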
Optionally, the performing a mapping interpolation process on the first feature information to obtain first mapping feature information includes:
Carrying out mapping interpolation processing on the first characteristic information by using a preset mapping interpolation model to obtain first mapping characteristic information;
the preset mapping interpolation model expression is as follows:
where M is the first feature information, M′ is the first mapping feature information, M′_ij is an element of M′, M_min is the minimum value in M, M_max is the maximum value in M, 1 ≤ i ≤ m, 1 ≤ j ≤ n, and m and n are the numbers of rows and columns of the matrix M.
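The interpolation model's formula is an image in the original; judging from the variable definitions it resembles element-wise min-max scaling, and the sketch below implements that reading as an assumption rather than the patent's confirmed formula.

```python
import numpy as np

def map_interpolate(M: np.ndarray) -> np.ndarray:
    """Assumed reading of the mapping interpolation model: element-wise min-max scaling."""
    M_min, M_max = M.min(), M.max()
    if M_max == M_min:
        return np.zeros_like(M, dtype=float)       # degenerate case: constant matrix
    return (M - M_min) / (M_max - M_min)           # M'_ij in [0, 1]
```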
Optionally, the processing the first mapping feature information and the second mapping feature information to obtain preprocessed data information includes:
S2351, processing the first mapping characteristic information to obtain an autocovariance matrix Sigma 11;
S2352, processing the second mapping characteristic information to obtain an autocovariance matrix Sigma 22;
S2353, processing the first mapping characteristic information and the second mapping characteristic information to obtain a first cross covariance matrix Sigma 12 and a second cross covariance matrix Sigma 21;
S2354, solving a preset optimization objective function to obtain transformation matrices W x and W y;
The preset optimization objective function is as follows:
where Q_1 = Σ_11, Q_2 = Σ_22, β, λ and θ are coefficient constants, and T denotes the matrix transpose;
S2355, the transformation matrices W x and W y are processed to obtain preprocessed data information.
The preprocessed data information is obtained as follows:
where Z is the preprocessed data information, X_1 is the first mapping feature information, and X_2 is the second mapping feature information.
Extracting features of the standard data information to obtain second feature information, wherein the feature extraction comprises the following steps:
EMD decomposition is performed on the standard data information, decomposing it into IMFs of different time scales that are simple in composition and relatively stationary. All local maxima and minima of x′(t) are determined, and polynomial interpolation is used to obtain the corresponding upper envelope e_max(t) and lower envelope e_min(t). The mean envelope m(t) = (e_min(t) + e_max(t))/2 is computed from the upper and lower envelopes, and the detail h(t) = x′(t) − m(t) is extracted from x′(t) and the mean m(t). If h(t) satisfies the IMF conditions, it is taken as the first modal component IMF1 (denoted c_1(t)); otherwise, h(t) replaces x′(t) and the decomposition continues. Finally the standard data information is decomposed as x′(t) = Σ_{i=1}^{n} c_i(t) + r(t), where r(t) is the residue. The normalization function is x′(t) = (x(t) − x_min)/(x_max − x_min), where x_max is the maximum value of the data information, x_min is the minimum value, x(t) is the data information, and x′(t) is the normalized data information.
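A minimal, self-contained sketch of the sifting procedure just described is given below; it substitutes cubic-spline envelopes for the polynomial interpolation mentioned in the text and uses a fixed iteration cap instead of a formal IMF stopping criterion, so it is illustrative rather than a faithful reproduction.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_one_imf(x: np.ndarray, max_iter: int = 50) -> np.ndarray:
    """One round of EMD sifting: subtract the mean envelope until h(t) behaves like an IMF."""
    h = x.astype(float).copy()
    t = np.arange(len(x))
    for _ in range(max_iter):
        maxima = argrelextrema(h, np.greater)[0]
        minima = argrelextrema(h, np.less)[0]
        if len(maxima) < 4 or len(minima) < 4:
            break
        e_max = CubicSpline(maxima, h[maxima])(t)   # upper envelope e_max(t)
        e_min = CubicSpline(minima, h[minima])(t)   # lower envelope e_min(t)
        m = (e_max + e_min) / 2.0                   # mean envelope m(t)
        h = h - m                                   # detail h(t) = x'(t) - m(t)
    return h

def emd(x: np.ndarray, n_imfs: int = 4):
    """Decompose x'(t) into modal components c_i(t) plus a residue r(t)."""
    residue, imfs = x.astype(float).copy(), []
    for _ in range(n_imfs):
        imf = sift_one_imf(residue)
        imfs.append(imf)
        residue = residue - imf
    return imfs, residue                            # x'(t) = sum(imfs) + residue
```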
Optionally, the processing the ship task data information to be processed by using the optimized deep reinforcement learning planning model to obtain a ship task planning scheme includes:
s41, processing ship task data information to be processed by using the optimized deep reinforcement learning planning model to obtain decision information;
and S42, carrying out data processing on the decision information and the personnel planning data information to obtain a ship task planning scheme.
The deep reinforcement learning planning model uses multi-reward reinforcement learning (MRRL), a generalization of standard single-reward reinforcement learning, so the planning model is likewise framed as a Markov decision process defined by the tuple (S, A, P, R, γ), where S is the state space, A is the action space, P is the state-transition probability, R is the reward function, and γ ∈ (0, 1) is the discount factor.
The strategy pi is:
π(a|s) = P[A_t = a | S_t = s]
As the formula shows, the policy specifies how action a is selected when the state is s.
MRRL differs from single-reward reinforcement learning in that its reward function returns not a scalar value but a vector containing each of the m reward values:
R(s, a, s′) = (R_1(s, a, s′), …, R_m(s, a, s′))
The strategy in this case is also determined by a set of vectors of expected value:
The goal of multi-reward reinforcement learning is to find, or approach, the optimal decision for the whole system. When the Q-learning method is used to solve the multi-reward reinforcement learning problem, the Q value for each objective can be learned in parallel, and these Q values can likewise be stored in vector form:
The most common way to derive a policy from these estimates is to compute a linear scalarization, i.e., a weighted sum of the Q vector with the weight vector w:
The weight vector w assigns a weight to each objective function: rewards that should be satisfied preferentially receive larger weights. However, trading off multiple rewards by setting these weights a priori is difficult and often unscientific, and setting the weights usually requires adjusting a large number of parameters.
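To make the vector-valued Q formulation and the linear scalarization concrete, here is a small tabular sketch; the update rule, the epsilon-greedy action choice and all hyperparameters are illustrative assumptions rather than the model actually used by the invention.

```python
import numpy as np

class MultiRewardQ:
    """Tabular Q-learning with an m-dimensional reward vector and linear scalarization."""

    def __init__(self, n_states, n_actions, n_rewards, w, alpha=0.1, gamma=0.95):
        self.Q = np.zeros((n_states, n_actions, n_rewards))  # vector-valued Q(s, a)
        self.w = np.asarray(w)                                # weight vector over rewards
        self.alpha, self.gamma = alpha, gamma

    def scalarize(self, s):
        # weighted sum w^T Q(s, a) for every action in state s
        return self.Q[s] @ self.w

    def act(self, s, eps=0.1):
        if np.random.rand() < eps:
            return np.random.randint(self.Q.shape[1])
        return int(np.argmax(self.scalarize(s)))

    def update(self, s, a, r_vec, s_next):
        # each reward dimension is learned in parallel, as in the description above
        a_next = int(np.argmax(self.scalarize(s_next)))
        target = np.asarray(r_vec) + self.gamma * self.Q[s_next, a_next]
        self.Q[s, a] += self.alpha * (target - self.Q[s, a])
```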
Example two
Referring to fig. 2, fig. 2 is a flow chart of another adaptive planning method for a ship task based on deep reinforcement learning according to an embodiment of the present invention. The adaptive planning method for the ship task based on the deep reinforcement learning described in fig. 2 is applied to the technical field of ship task planning, and the embodiment of the invention is not limited. As shown in fig. 2, the adaptive planning method for the ship task based on the deep reinforcement learning may include the following operations:
the invention aims to provide a ship task self-adaptive planning system based on deep reinforcement learning, which solves the defects that the data processing capacity is limited, the real-time adaptability is lacking, personalized decisions cannot be provided, and the multi-agent cooperation capability is insufficient in the prior art.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
The system structure comprises a data preprocessing module, a deep reinforcement learning module, a personalized decision support module and a multi-person intelligent cooperation feedback module. All modules are connected with each other through data flow and control flow, so that information transmission and processing are realized. And the data preprocessing module is responsible for extracting relevant data from the historical experience database and the personnel database, and cleaning, converting and extracting features so as to adapt to the requirements of a deep reinforcement learning algorithm.
The ship tasks comprise a navigation task, a cargo loading and unloading task, a ship maintenance task and the like, the task types are determined according to the task requests and are transmitted to corresponding processing modules, and key characteristics including ship states, environment parameters, task related characteristics and the like are extracted.
And the deep reinforcement learning module adopts a complex deep reinforcement learning algorithm, such as PPO or A2C, and combines a simulator to carry out model training. The module is capable of handling high-dimensional state space and action space, autonomous learning and improvement through interaction with the environment.
And the personalized decision support module is used for generating personalized action suggestions for each person by combining personnel data such as skills, experiences and the like. The module displays suggestions through a user interaction interface and collects feedback of personnel so as to realize personalized decision support.
And the multi-agent cooperation module coordinates respective actions through the multi-agent system in the case that a plurality of ships or departments need to cooperate. The module ensures that the information sharing and communication are smooth so as to realize the efficient completion of the whole task.
And data preprocessing, namely extracting relevant data from a historical experience database and a personnel database, wherein the relevant data comprise task execution records, environment information, personnel skills and the like. And cleaning, converting and extracting features of the data to obtain a data set suitable for the deep reinforcement learning algorithm.
The data preprocessing is responsible for extracting relevant data from a historical experience database and a personnel database, and cleaning, converting and extracting features so as to adapt to the requirements of a deep reinforcement learning algorithm.
Historical experience databases include, but are not limited to, travel logs, incident reports, historical task performance, and the like. These data cover the operational experience and performance of the vessel under different environmental conditions.
Data cleaning, namely removing repeated records, outliers and incomplete data.
And data conversion, namely uniformly converting the data in different formats and units into a format suitable for processing by a deep reinforcement learning algorithm, such as converting time series data into window data with fixed length.
The feature extraction method comprises the following steps:
(1) Ship track data are compressed using Laplacian eigenmaps and Gaussian kernel density estimation (G-KDE) to extract key steering points (see the sketch following this list).
(2) And clustering the steering points by using a fuzzy self-adaptive DBSCAN method, and identifying common steering areas.
(3) And extracting the ship characteristics and the task related characteristics as input of a deep reinforcement learning model.
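The following sketch illustrates steps (1) and (2) above in a simplified form: kernel density estimation keeps high-density track points as candidate steering points (standing in for the Laplacian-eigenmaps compression), and standard DBSCAN from scikit-learn is used in place of the fuzzy adaptive variant; all thresholds are made-up example values.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.cluster import DBSCAN

def extract_turning_regions(track: np.ndarray, density_quantile: float = 0.7,
                            eps: float = 0.01, min_samples: int = 5):
    """Keep high-density track points as candidate steering points, then cluster them."""
    # Gaussian KDE over (lon, lat) points; high density suggests recurring manoeuvres
    density = gaussian_kde(track.T)(track.T)
    keypoints = track[density >= np.quantile(density, density_quantile)]
    # Standard DBSCAN as a stand-in for the fuzzy adaptive variant described in the text
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(keypoints)
    return keypoints, labels
```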
Deep reinforcement learning, namely performing model training by adopting complex deep reinforcement learning algorithms such as PPO or A2C and the like. The historical scene is reproduced by constructing a simulator and expanding the dataset with data enhancement techniques. The model is constantly interacted with the environment in the training process, and the strategy is adjusted according to feedback so as to realize autonomous learning and improvement.
Complex deep reinforcement learning algorithms, such as PPO or A2C, are used in conjunction with simulators to perform model training. The module is capable of handling high-dimensional state space and action space, autonomous learning and improvement through interaction with the environment.
The simulator can simulate the sailing conditions of the ship under different sea conditions, meteorological conditions, mission planning and the like, and provides a rich training environment for the deep reinforcement learning model.
The model training flow is as follows:
the algorithm is selected by adopting a near-end strategy optimization (PPO) or a deep reinforcement learning algorithm such as a dominant actor commentator (A2C).
Simulator construction, namely constructing a simulator based on historical experience data and personnel data.
Training data, namely taking data in a historical experience database as training samples, wherein the training samples comprise ship states, tasks, personnel and corresponding action decisions and result feedback.
And the training process is to input the preprocessed data into a deep reinforcement learning model, and perform multi-round iterative training through a simulator, so as to continuously adjust model parameters to optimize a decision strategy.
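A minimal training sketch of the flow above is given here. It assumes a gymnasium-compatible simulator class named ShipTaskEnv and a module path ship_planning.sim, both hypothetical names introduced only for illustration; the PPO implementation is the one from stable-baselines3, and the hyperparameters are arbitrary examples.

```python
# Assumes a gymnasium-compatible simulator `ShipTaskEnv` (hypothetical name) that
# exposes the ship state, task and personnel features described above.
from stable_baselines3 import PPO
from ship_planning.sim import ShipTaskEnv  # hypothetical module path

env = ShipTaskEnv(history_db="history.parquet", personnel_db="personnel.parquet")
model = PPO("MlpPolicy", env, learning_rate=3e-4, gamma=0.99, verbose=1)
model.learn(total_timesteps=500_000)     # multi-round iterative training in the simulator
model.save("ship_task_ppo")

obs, _ = env.reset()
action, _ = model.predict(obs, deterministic=True)   # decision for the current state
```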
Personalized decision support, namely generating personalized action suggestions for each person by combining personnel data. And displaying the advice through a user interaction interface, and collecting feedback of personnel. The system adjusts and optimizes the advice according to the feedback to provide decision support that better meets the actual needs.
Personalized decision support combines personnel data, such as skills, experience, etc., to generate personalized action suggestions for each person. The module displays suggestions through a user interaction interface and collects feedback of personnel so as to realize personalized decision support.
The specific steps of personalized decision support are as follows:
(1) Personnel data analysis, which collects and analyzes personnel data including skill levels, historical operating records, preference settings, and the like.
(2) Decision advice generation, in which the output of the deep reinforcement learning model and personnel data are combined to generate personalized action advice for each personnel.
(3) User interaction and feedback, namely displaying personalized suggestions through a user interaction interface, and collecting real-time feedback of personnel for further optimizing a decision model.
(4) And updating the model, namely dynamically adjusting decision model parameters according to personnel feedback to realize continuous optimization of personalized decisions.
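As a rough illustration of steps (1) to (4), the sketch below blends the planning model's action scores with a person's skill profile to produce ranked suggestions; the blending rule, the field names and all numbers are hypothetical.

```python
import numpy as np

def personalized_suggestions(action_scores: dict, skill_profile: dict,
                             blend: float = 0.3, top_k: int = 3):
    """Blend model action scores with per-person skill levels (illustrative only)."""
    ranked = []
    for action, score in action_scores.items():
        # skill_profile maps action name -> proficiency in [0, 1]; default neutral 0.5
        proficiency = skill_profile.get(action, 0.5)
        ranked.append((action, (1 - blend) * score + blend * proficiency))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top_k]

# Example: the action names and numbers below are made up for illustration
suggestions = personalized_suggestions(
    action_scores={"adjust_heading": 0.82, "reduce_speed": 0.74, "request_support": 0.55},
    skill_profile={"adjust_heading": 0.9, "request_support": 0.4},
)
```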
Intelligent collaboration by multiple personnel, where multiple vessels or departments are required to collaborate, the respective actions are coordinated by a multi-agent system. The system ensures that the information sharing and communication are smooth so that each ship or department can know the state and intention of each other in real time, thereby realizing the efficient completion of the whole task.
In the case where multiple vessels or departments are required to cooperate, the respective actions are coordinated by a multi-agent system. The module ensures that the information sharing and communication are smooth so as to realize the efficient completion of the whole task.
The specific implementation mode is as follows:
The agent definition is that each ship or key department is mapped into an agent, and each agent has independent deep reinforcement learning model and decision capability.
And the information sharing and communication are carried out, namely a multi-agent communication system is constructed, and the real-time sharing of state information, task progress and decision results among agents is ensured.
Collaborative decision making: key data useful for collaborative decisions are extracted using an attention-based processing method, and interaction experience is accumulated using a memory-driven experience learning method. A multi-head attention mechanism and noisy networks are introduced to enhance the exploration capability and robustness of agent decision making.
And the conflict resolution and policy optimization are that when decision conflict occurs, the conflict is resolved through a negotiation mechanism or a preset rule, so that the high-efficiency completion of the whole task is ensured. Meanwhile, according to feedback in the task execution process, the decision strategy of each agent is dynamically adjusted, and the cooperative optimization closed-loop control is realized.
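A toy sketch of the information-sharing and conflict-resolution loop described above is given below; the message fields, the shared blackboard structure and the conflict rule (the lexicographically smaller agent id keeps the contested resource) are all invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class AgentMessage:
    agent_id: str
    state: dict
    intended_action: str

@dataclass
class Blackboard:
    """Shared store through which ships/departments exchange state and intentions."""
    messages: dict = field(default_factory=dict)

    def post(self, msg: AgentMessage) -> None:
        self.messages[msg.agent_id] = msg

    def resolve_conflicts(self, resource_of) -> dict:
        """If two agents intend to use the same resource, the smaller id keeps it."""
        decisions, claimed = {}, {}
        for agent_id in sorted(self.messages):
            action = self.messages[agent_id].intended_action
            resource = resource_of(action)
            if resource in claimed:
                action = "wait"                    # simple stand-in for a negotiation mechanism
            else:
                claimed[resource] = agent_id
            decisions[agent_id] = action
        return decisions
```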
The deep reinforcement learning flow chart of FIG. 3 shows the model training process, which comprises data input, feature extraction, model training, policy output and other steps. Through continuous iteration and optimization, the model can gradually adapt to environmental changes and propose better decision strategies.
Environment initialization-the environment provides the agent with initial state data that is the basis for understanding the current environment.
And the feature extraction is carried out after the state data of the environment is received, and the original data is converted into feature vectors which can be understood and processed by the model.
Model training, namely training a deep reinforcement learning model based on the extracted features. By adjusting the parameters of the model, the ability of predicting the optimal action according to the current state is continuously improved.
A dynamic programming equation is adopted in the implementation of the deep reinforcement learning, with the following formula:
The term π(a|s) in the equation is a probability distribution representing the probability of selecting action a given state s, and q(s, a) is the action-value function that estimates the expected return obtained by taking action a in state s.
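The formula referenced above appears as an image in the original. Assuming it is the standard Bellman expectation relation implied by the definitions of π(a|s) and q(s, a), it can be written as:

```latex
v_{\pi}(s) = \sum_{a \in A} \pi(a \mid s)\, q_{\pi}(s, a), \qquad
q_{\pi}(s, a) = \sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma\, v_{\pi}(s')\bigr]
```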
(1) And outputting an action strategy, namely outputting the action strategy by the model according to the characteristics of the current state after training.
(2) And (3) environment feedback, namely after the action is executed, the environment gives out a corresponding reward signal and updates the reward signal to a new state, and the new state provides the basis for the next decision.
(3) Model optimization, namely optimizing the model according to the reward signal and the new state fed back by the environment.
Through continuous iteration, the model gradually adapts to the environment, and the accuracy of decision making is improved.
(4) And (3) iterative loop, namely taking the new state as input of the optimized model, and repeating the steps to form a closed loop iterative process. This process will continue until the model reaches a convergence state or a preset stop condition is met.
Example III
Referring to FIG. 4, FIG. 4 is a schematic structural diagram of a ship task adaptive planning device based on deep reinforcement learning according to an embodiment of the present invention. The ship task adaptive planning device based on deep reinforcement learning described in FIG. 4 is applied to the technical field of ship task planning, and the embodiment of the invention is not limited. As shown in FIG. 4, the ship task adaptive planning device based on deep reinforcement learning may include the following modules:
S301, an information acquisition module is used for acquiring ship task planning data information, wherein the ship task planning data information comprises historical data information and personnel planning data information;
s302, a preprocessing module is used for preprocessing the ship task planning data information to obtain preprocessed data information;
s303, a model training module is used for training a preset deep reinforcement learning planning model by utilizing the preprocessing data information to obtain an optimized deep reinforcement learning planning model;
and S304, a ship task planning module is used for processing ship task data information to be processed by using the optimized deep reinforcement learning planning model to obtain a ship task planning scheme.
Example IV
Referring to fig. 5, fig. 5 is a schematic structural diagram of a ship task adaptive planning device based on deep reinforcement learning according to an embodiment of the present invention. The adaptive planning device for the ship task based on the deep reinforcement learning described in fig. 5 is applied to the technical field of ship task planning, and the embodiment of the invention is not limited. As shown in fig. 5, the adaptive planning apparatus for a ship mission based on deep reinforcement learning may include the following operations:
a memory 401 storing executable program codes;
A processor 402 coupled with the memory 401;
The processor 402 invokes executable program code stored in the memory 401 for performing the steps in the deep reinforcement learning based marine task adaptive planning method described in embodiment one, embodiment two.
Example five
The embodiment of the invention discloses a computer-readable storage medium storing a computer program for electronic data exchange, wherein the computer program enables a computer to execute the steps in the ship task adaptive planning method based on deep reinforcement learning described in Embodiment One and Embodiment Two.
The apparatus embodiments described above are merely illustrative, in which the modules illustrated as separate components may or may not be physically separate, and the components shown as modules may or may not be physical, i.e., may be located in one place, or may be distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above detailed description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course by means of hardware. Based on such understanding, the foregoing technical solutions may be embodied essentially or in part in the form of a software product that may be stored in a computer-readable storage medium including Read-Only Memory (ROM), random access Memory (Random Access Memory, RAM), programmable Read-Only Memory (Programmable Read-Only Memory, PROM), erasable programmable Read-Only Memory (Erasable Programmable Read Only Memory, EPROM), one-time programmable Read-Only Memory (OTPROM), electrically erasable programmable Read-Only Memory (EEPROM), compact disc Read-Only Memory (Compact Disc Read-Only Memory, CD-ROM) or other optical disc Memory, magnetic disc Memory, tape Memory, or any other medium that can be used for computer-readable carrying or storing data.
Finally, it should be noted that the ship task adaptive planning method and device based on deep reinforcement learning disclosed above are only preferred embodiments of the present invention and are used only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A ship mission adaptive planning method based on deep reinforcement learning, characterized in that the method comprises:
S1, obtaining ship mission planning data information, wherein the ship mission planning data information includes historical data information and personnel planning data information; the historical data information includes navigation logs, accident reports and historical mission execution status; the personnel planning data information includes skill levels, historical operation records and preference settings;
S2, preprocessing the ship mission planning data information to obtain preprocessed data information;
S3, training a preset deep reinforcement learning planning model with the preprocessed data information to obtain an optimized deep reinforcement learning planning model, wherein the preprocessed data information includes ship status, missions, personnel, and the corresponding action decisions and result feedback;
S4, processing the ship mission data information to be processed with the optimized deep reinforcement learning planning model to obtain a ship mission planning scheme, including:
S41, processing the ship mission data information to be processed with the optimized deep reinforcement learning planning model to obtain decision information;
S42, performing data processing on the decision information and the personnel planning data information to obtain the ship mission planning scheme and to generate personalized action suggestions for each person.

2. The ship mission adaptive planning method based on deep reinforcement learning according to claim 1, characterized in that preprocessing the ship mission planning data information to obtain the preprocessed data information comprises:
S21, performing data cleaning on the ship mission planning data information to obtain cleaned data information;
S22, performing data conversion on the cleaned data information to obtain standard data information;
S23, performing feature extraction on the standard data information to obtain the preprocessed data information.

3. The ship mission adaptive planning method based on deep reinforcement learning according to claim 2, characterized in that performing feature extraction on the standard data information to obtain the preprocessed data information comprises:
S231, performing feature extraction on the standard data information to obtain first feature information, including: extracting features from the standard data information with a preset first feature extraction model to obtain the first feature information; the first feature extraction model is expressed as:
where Cs(t,f) is the first feature information, f is the frequency, t is the time, f(ξ,τ) is the kernel function, s(t) is the input standard data information, τ is the time shift, ξ is the frequency shift, u is the integration variable, exp is the exponential function, σ is a constant, and * denotes the complex conjugate;
S232, performing feature extraction on the standard data information to obtain second feature information, including: performing EMD decomposition on the standard data information into IMFs of different time scales that are simple in composition and relatively stationary;
S233, performing mapping interpolation on the first feature information to obtain first mapped feature information;
S234, performing mapping interpolation on the second feature information to obtain second mapped feature information;
S235, processing the first mapped feature information and the second mapped feature information to obtain the preprocessed data information.

4. The ship mission adaptive planning method based on deep reinforcement learning according to claim 3, characterized in that performing mapping interpolation on the first feature information to obtain the first mapped feature information comprises:
performing mapping interpolation on the first feature information with a preset mapping interpolation model to obtain the first mapped feature information; the preset mapping interpolation model is expressed as:
where M is the first feature information, M′ is the first mapped feature information, M′ij is an element of M′, Mmin is the minimum value in M, Mmax is the maximum value in M, 1 ≤ i ≤ m, 1 ≤ j ≤ n, and m and n are the numbers of rows and columns of the matrix M.

5. The ship mission adaptive planning method based on deep reinforcement learning according to claim 3, characterized in that processing the first mapped feature information and the second mapped feature information to obtain the preprocessed data information comprises:
S2351, processing the first mapped feature information to obtain an auto-covariance matrix Σ11;
S2352, processing the second mapped feature information to obtain an auto-covariance matrix Σ22;
S2353, processing the first mapped feature information and the second mapped feature information to obtain a first cross-covariance matrix Σ12 and a second cross-covariance matrix Σ21;
S2354, solving a preset optimization objective function to obtain transformation matrices Wx and Wy; the preset optimization objective function is expressed as:
where Q1 = Σ11, Q2 = Σ22, β, λ and θ are coefficient constants, and T denotes the transpose;
S2355, processing the transformation matrices Wx and Wy to obtain the preprocessed data information.

6. A ship mission adaptive planning device based on deep reinforcement learning, characterized in that the device comprises:
an information acquisition module, configured to obtain ship mission planning data information, wherein the ship mission planning data information includes historical data information and personnel planning data information; the historical data information includes navigation logs, accident reports and historical mission execution status; the personnel planning data information includes skill levels, historical operation records and preference settings;
a preprocessing module, configured to preprocess the ship mission planning data information to obtain preprocessed data information;
a model training module, configured to train a preset deep reinforcement learning planning model with the preprocessed data information to obtain an optimized deep reinforcement learning planning model, wherein the preprocessed data information includes ship status, missions, personnel, and the corresponding action decisions and result feedback;
a ship mission planning module, configured to process the ship mission data information to be processed with the optimized deep reinforcement learning planning model to obtain a ship mission planning scheme, including:
S41, processing the ship mission data information to be processed with the optimized deep reinforcement learning planning model to obtain decision information;
S42, performing data processing on the decision information and the personnel planning data information to obtain the ship mission planning scheme and to generate personalized action suggestions for each person.

7. A ship mission adaptive planning device based on deep reinforcement learning, characterized in that the device comprises:
a memory storing executable program code; and
a processor coupled to the memory;
wherein the processor calls the executable program code stored in the memory to execute the ship mission adaptive planning method based on deep reinforcement learning according to any one of claims 1-5.

8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions which, when called, are used to execute the ship mission adaptive planning method based on deep reinforcement learning according to any one of claims 1-5.
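For illustration only, and not part of the claims: step S232 of claim 3 decomposes the standardized signal into intrinsic mode functions (IMFs) by empirical mode decomposition. A minimal sketch using the third-party PyEMD package is given below; the package choice, the synthetic signal and all variable names are assumptions, not taken from the patent.

```python
# Hypothetical sketch of step S232: EMD decomposition of one standardized
# 1-D signal into IMFs (assumes the third-party PyEMD package is installed).
import numpy as np
from PyEMD import EMD

# Synthetic stand-in for one channel of the "standard data information".
t = np.linspace(0.0, 10.0, 2000)
signal = (np.sin(2 * np.pi * 0.5 * t)
          + 0.5 * np.sin(2 * np.pi * 3.0 * t)
          + 0.1 * np.random.randn(t.size))

emd = EMD()
imfs = emd(signal)                      # shape: (n_imfs, n_samples)
print("number of IMFs:", imfs.shape[0])
```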
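The mapping-interpolation model of claim 4 is described here only through its variable definitions (the formula itself is not reproduced in this text). Read as a min-max rescaling of the feature matrix M onto [0, 1] — an assumption, since the exact expression is not shown — it could be sketched as follows; the function name and example matrix are illustrative only.

```python
# Hypothetical sketch of the claim-4 mapping interpolation, assuming it is
# the usual min-max rescaling M'_ij = (M_ij - M_min) / (M_max - M_min).
import numpy as np

def map_features(M: np.ndarray) -> np.ndarray:
    """Map a feature matrix M onto [0, 1] element-wise."""
    m_min, m_max = M.min(), M.max()
    if m_max == m_min:                  # degenerate case: constant matrix
        return np.zeros_like(M, dtype=float)
    return (M - m_min) / (m_max - m_min)

M = np.array([[1.0, 4.0], [7.0, 10.0]])
print(map_features(M))                  # [[0.0, 0.333...], [0.666..., 1.0]]
```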
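Claim 5 builds auto- and cross-covariance matrices from the two mapped feature sets and solves an optimization objective (with coefficient constants β, λ, θ whose exact expression is not reproduced here) for transformation matrices Wx and Wy. One common construction matching this description is regularized canonical correlation analysis, where Wx and Wy come from a generalized eigenvalue problem; the sketch below is that reading, and the regularization term, dimensions and names are assumptions rather than the patented objective.

```python
# Hypothetical sketch of claim-5 feature fusion read as regularized CCA:
# covariance matrices -> eigenproblem -> transforms Wx, Wy -> fused features.
import numpy as np

def regularized_cca(X, Y, n_components=2, beta=1e-3):
    # X: (n_samples, dx) first mapped features; Y: (n_samples, dy) second mapped features
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    n = X.shape[0]
    S11 = Xc.T @ Xc / (n - 1) + beta * np.eye(X.shape[1])   # regularized auto-covariance
    S22 = Yc.T @ Yc / (n - 1) + beta * np.eye(Y.shape[1])
    S12 = Xc.T @ Yc / (n - 1)                               # cross-covariance (S21 = S12.T)
    # Canonical directions satisfy S11^-1 S12 S22^-1 S21 wx = rho^2 wx.
    K = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S12.T)
    vals, vecs = np.linalg.eig(K)
    order = np.argsort(-vals.real)[:n_components]
    Wx = vecs[:, order].real
    Wy = np.linalg.solve(S22, S12.T) @ Wx                   # corresponding Y-side transform
    return Wx, Wy

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = X @ rng.normal(size=(5, 4)) + 0.1 * rng.normal(size=(200, 4))
Wx, Wy = regularized_cca(X, Y)
fused = np.hstack([X @ Wx, Y @ Wy])     # fused "preprocessed data information" (step S2355)
```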
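Claims 1 and 6 describe the overall loop: a deep reinforcement learning planning model is trained on states assembled from ship status, mission and personnel features together with action decisions and result feedback, and the optimized model is then queried for decision information (step S41). The claims do not name a specific algorithm or architecture; the sketch below assumes a small DQN-style value network purely for illustration, and the state size, action set and library choice are not taken from the patent.

```python
# Hypothetical sketch of step S41: querying a trained value network for a
# planning decision. A DQN-style network is assumed; the patent does not
# specify the architecture or the reinforcement learning algorithm.
import torch
import torch.nn as nn

N_STATE = 16      # assumed size of ship status + mission + personnel features
N_ACTIONS = 6     # assumed number of candidate planning actions

class PlanningQNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_STATE, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, state):
        return self.net(state)           # one value per candidate action

model = PlanningQNet()
model.eval()                             # the "optimized" model would be loaded from a checkpoint
state = torch.randn(1, N_STATE)          # preprocessed feature vector for the current situation
with torch.no_grad():
    q_values = model(state)
decision = int(q_values.argmax(dim=1))   # decision information handed to step S42
print("selected action index:", decision)
```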
CN202410991536.8A 2024-07-23 2024-07-23 Ship task self-adaptive planning method and device based on deep reinforcement learning Active CN118982176B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410991536.8A CN118982176B (en) 2024-07-23 2024-07-23 Ship task self-adaptive planning method and device based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN118982176A CN118982176A (en) 2024-11-19
CN118982176B true CN118982176B (en) 2025-01-14

Family

ID=93449775

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410991536.8A Active CN118982176B (en) 2024-07-23 2024-07-23 Ship task self-adaptive planning method and device based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN118982176B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111950873A (en) * 2020-07-30 2020-11-17 上海卫星工程研究所 Satellite real-time guiding task planning method and system based on deep reinforcement learning
CN116360434A (en) * 2023-03-22 2023-06-30 西安电子科技大学 Ship path planning method based on improved CSAC-APF algorithm

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112249032B (en) * 2020-10-29 2022-02-18 浪潮(北京)电子信息产业有限公司 An automatic driving decision-making method, system, device and computer storage medium
CN116755447A (en) * 2023-07-12 2023-09-15 王胜正 Ship autonomous collision avoidance decision model construction method and model

Also Published As

Publication number Publication date
CN118982176A (en) 2024-11-19

Similar Documents

Publication Publication Date Title
Jaafra et al. Reinforcement learning for neural architecture search: A review
GB2614849A (en) Computer-based systems, computing components and computing objects configured to implement dynamic outlier bias reduction in machine learning models
US5226092A (en) Method and apparatus for learning in a neural network
CN110647980A (en) Time sequence prediction method based on GRU neural network
CN114519469B (en) Construction method of multivariable long-sequence time sequence prediction model based on transducer framework
CN107169573A (en) Using composite machine learning model come the method and system of perform prediction
CN116957698A (en) Electricity price prediction method based on improved time sequence mode attention mechanism
AU2023226755A1 (en) Single image concept encoder for personalization using a pretrained diffusion model
Zamfirache et al. Adaptive reinforcement learning-based control using proximal policy optimization and slime mould algorithm with experimental tower crane system validation
Samin-Al-Wasee et al. Time-series forecasting of ethereum price using long short-term memory (lstm) networks
CN118982176B (en) Ship task self-adaptive planning method and device based on deep reinforcement learning
CN117217548A (en) Water quality prediction method based on CEEMDAN-LSTM
CN114594443B (en) A method and system for weather radar echo extrapolation based on self-attention mechanism and predictive recurrent neural network
CN112766513B (en) Knowledge tracking method and system for memory collaboration
CN116432707A (en) A deep sequential convolution knowledge tracking method based on autocorrelation error optimization
Liu et al. LHCnn: A novel efficient multivariate time series prediction framework utilizing convolutional neural networks
Zhan et al. GMINN: a generative moving interactive neural network for enhanced short-term load forecasting in modern electricity markets
Liao et al. Stock Market Volatility Prediction Based on Robust GBM-GRU Model
RU2755935C2 (en) Method and system for machine learning of hierarchically organized purposeful behavior
Gustafsson Some contributions to heteroscedastic time series analysis and computational aspects of Bayesian VARs
Makamo Optimization Algorithms in Deep Learning Models for Improving the Forecasting Accuracy in Sequential Datasets with Application in the South African Stock Market Index: A Review
Gong et al. A Novel LSTM-KAN Networks for Demand Forecasting in Manufacturing
Wang et al. Network Traffic Prediction with Decomposition and Multi-Scale Autocorrelation in Large-Scale Cloud Data Centers
De Blasi Machine Learning for Industrial Process Optimization
JP3491317B2 (en) Construction method of feedforward neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant