
CN109362113B - Underwater acoustic sensor network cooperation exploration reinforcement learning routing method - Google Patents


Info

Publication number
CN109362113B
CN109362113B (application CN201811310120.6A)
Authority
CN
China
Prior art keywords
node
value
packet
data packet
data
Prior art date
Legal status
Active
Application number
CN201811310120.6A
Other languages
Chinese (zh)
Other versions
CN109362113A (en)
Inventor
冯晓宁
宋雪
王卓
Current Assignee
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date
Filing date
Publication date
Application filed by Harbin Engineering University filed Critical Harbin Engineering University
Priority to CN201811310120.6A
Publication of CN109362113A
Application granted
Publication of CN109362113B
Legal status: Active
Anticipated expiration

Classifications

    • H04W40/02 Communication route or path selection, e.g. power-based or shortest path routing
    • H04W40/22 Communication route or path selection using selective relaying for reaching a BTS [Base Transceiver Station] or an access point
    • H04W52/0203 Power saving arrangements in the radio access network or backbone network of wireless communication networks
    • H04W84/18 Self-organising networks, e.g. ad-hoc networks or sensor networks
    • Y02D30/70 Reducing energy consumption in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to the technical fields of underwater acoustic sensor networks and underwater acoustic routing protocols, and in particular to a cooperative-exploration reinforcement learning routing method for underwater acoustic sensor networks. The method comprises the following steps: (1) initialize the Q value and V value of each node; (2) judge whether |V_s^{t+1} - V_s^t| < ε holds; (3) a relay node receives a data/control packet, updates its neighbor list, and decides whether to keep forwarding; (4) the sink receives the data packet, ending the transmission. A routing protocol based on reinforcement learning can approximate the global optimum when selecting a path and can combine multiple factors that affect performance. In the present invention, while the algorithm has not converged, the source node sends several control packets together with each data packet to accelerate convergence; otherwise it sends only data packets. After the algorithm converges, an approximately globally optimal path is obtained by selecting the next-hop node with the highest V value, which balances network energy consumption, prolongs network lifetime, and solves the problem of slow reinforcement learning convergence.

Description

An underwater acoustic sensor network cooperative exploration reinforcement learning routing method

Technical field

The invention relates to the technical fields of underwater acoustic sensor networks and underwater acoustic routing protocols, and in particular to a cooperative-exploration reinforcement learning routing method for underwater acoustic sensor networks.

Background

Underwater Acoustic Sensor Networks (UASNs) consist of sensor nodes deployed underwater and sink nodes that collect the sensed data. Such networks support many applications, including environmental monitoring, tactical surveillance, resource exploration, navigation assistance, and disaster prevention. Because radio waves suffer high transmission loss under water, underwater communication usually relies on acoustic waves. At the same time, UASNs face unique challenges such as limited battery capacity, high bit error rates, high end-to-end latency, and limited available bandwidth.

Owing to the high latency, high energy consumption, and low bandwidth inherent to UASNs, their topology is usually distributed. A major problem for routing protocols in such networks is finding efficient, energy-saving paths. Reinforcement learning algorithms, which interact with the environment by trial and error to maximize the expected reward, have been applied to UASNs: with a reinforcement-learning-based routing protocol, each node can approximate the global optimum when choosing a path without knowing the topology of the whole network. Reinforcement learning lets nodes learn and adapt to the dynamic environment they operate in, and can combine multiple factors that affect routing performance, making routing decisions more comprehensive. In the present invention, the convergence speed of reinforcement learning is characterized by the convergence speed of the source node's V value.

In UASNs, as the network scale grows, reinforcement learning converges more slowly and network energy consumption rises; moreover, when the network topology changes, the protocol cannot track the change well, which degrades network performance.

Summary of the invention

The purpose of the present invention is to overcome the above deficiencies of the prior art by proposing a cooperative-exploration reinforcement learning routing method for underwater acoustic sensor networks. While the algorithm has not converged, the source node sends several control packets together with each data packet to explore paths cooperatively, accelerating the convergence of its V value. This solves the problem of slow reinforcement learning convergence while reducing network energy consumption and prolonging network lifetime.

The present invention can be realized by the following technical solution:

A cooperative-exploration reinforcement learning routing method for underwater acoustic sensor networks, comprising the following steps:

(1) Initialize the Q value and V value of each node.

(2) Determine the V value V_s^{t+1} of the source node s at the next moment.

(3) According to the Q and V values of the nodes, judge whether |V_s^{t+1} - V_s^t| < ε holds:

(3.1) If it holds, the source node sends only the data packet.

(3.2) If it does not hold, the source node sends control packets together with the data packet.

(4) On receiving a data or control packet from the source node, a relay node reads the packet header.

(5) The relay node updates its routing table from the received data and checks whether the packet is addressed to it. If so, it computes the Q value, writes the updated V value into the packet header, and continues forwarding the packet.

(6) Judge whether the sink node has received the data packet:

(6.1) If the sink has received the data packet, the transmission ends.

(6.2) If not, repeat steps (2) to (6) until the sink receives the data packet.
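The overall loop of steps (2) to (6) can be sketched in Python. This is an illustrative sketch only: `send_round` is a hypothetical callback standing in for one round of V-value update, packet dispatch, and relay forwarding, and is not part of the patent itself.

```python
def transmit(send_round, max_rounds=100):
    """Repeat steps (2)-(5) until the sink reports receipt of the data packet.

    send_round: callable performing one transmission round; it returns True
    once the sink has received the data packet (step 6.1).
    """
    for n in range(1, max_rounds + 1):
        if send_round():
            return n  # number of rounds until the sink received the packet
    raise RuntimeError("sink never received the data packet")
```

For example, with a stub that succeeds on the third round, `transmit` returns 3.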

Step (1) comprises the following sub-steps:

(1.1) Determine the reward function.

(1.2) From the reward function, determine the Q-value iteration function of each node.

The reward function R_nm of step (1.1) is the immediate reward obtained after the first node n finishes transmitting a data or control packet to the second node m, and is computed as:

R_nm = -g - α1·c + α2·d

where g is the fixed loss of a node when transmitting data, c is the residual-energy consumption function of the node, d describes the energy distribution of the node, and α1 and α2 are the weighting parameters of c and d, respectively.
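A minimal sketch of this reward, assuming all terms are supplied as plain numbers (the patent does not fix concrete forms for g, c, or d):

```python
def reward(g, c, d, alpha1, alpha2):
    """Immediate reward R_nm = -g - alpha1*c + alpha2*d for one transmission.

    g: fixed transmission loss, c: residual-energy consumption term,
    d: node energy-distribution term, alpha1/alpha2: weighting parameters.
    """
    return -g - alpha1 * c + alpha2 * d
```

A higher residual-energy consumption c lowers the reward, while a favourable energy distribution d raises it, steering traffic away from depleted nodes.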

The Q-value iteration function of step (1.2) is computed as:

Q_nm^{t+1} = (1 - α)·Q_nm^t + α·(R_nm + γ·max_k Q_mk^t)

where Q_nm^{t+1} is the Q value of the first node n at time t+1, α is the update rate of the Q value, γ is the discount factor, and Q_mk^t is the Q value of the second node m toward its neighbor k at time t.
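The iteration can be sketched as follows; it is assumed, for illustration, that node n knows the Q values of node m toward m's own neighbors (e.g. summarized by the V value carried in the packet header):

```python
def q_update(q_nm, r_nm, q_m, alpha, gamma):
    """One Q-learning step for the link n -> m:
    Q_nm(t+1) = (1 - alpha)*Q_nm(t) + alpha*(R_nm + gamma*max_k Q_mk(t)).

    q_m: Q values of node m toward each of its neighbors k.
    """
    return (1 - alpha) * q_nm + alpha * (r_nm + gamma * max(q_m))
```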

In the judgment condition |V_s^{t+1} - V_s^t| < ε of step (3), V_s^t denotes the V value of the source node at time t, V_s^{t+1} denotes the V value of the source node at time t+1, and ε is a small positive constant.

If the judgment of step (3.1) holds, the source node stops transmitting control packets and, combining its routing table with the Q-value iteration formula, computes the optimal path and transmits data packets upward until they reach the sink.

The function that computes the V value of the source node at the next moment is:

V_s^{t+1} = (1 - α)·V_s^t + α·ω·Σ_j (R_sj + γ·V_j^t)

where α has the same value as in the Q-value iteration function of step (1.2) and here denotes the learning rate, i.e. the update rate of the V value; ω is a normalization parameter over the paths probed by the control packets; and each term R_sj + γ·V_j^t is the experience obtained by one data or control packet exploring a path through next-hop node j.
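A sketch of this V update, under the assumption that ω simply averages over the packets sent in parallel (the patent states only that ω normalizes over the probed paths):

```python
def v_update(v_s, experiences, alpha):
    """Cooperative-exploration V update for the source node:
    V_s(t+1) = (1 - alpha)*V_s(t) + alpha*omega*sum(experiences).

    experiences: one value R_sj + gamma*V_j per data/control packet path;
    omega = 1/len(experiences) is an assumed normalization choice.
    """
    omega = 1.0 / len(experiences)
    return (1 - alpha) * v_s + alpha * omega * sum(experiences)
```

Sending several control packets per round contributes several experience terms at once, which is what speeds up the convergence of V_s.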

If, as described in step (5), the packet is addressed to this node, the node computes Q values with the Q-value iteration function, selects the node achieving the maximum value Q_max as the next hop, updates its V value to Q_max, rewrites the node information into the packet header, and continues the transmission.

Compared with the prior art, the present invention has the following beneficial effects:

(1) The present invention provides a cooperative-exploration reinforcement learning routing algorithm for underwater acoustic sensor networks. While the algorithm has not converged, the source node sends data packets and control packets at the same time, which speeds up the convergence of the source node's V value.

(2) After the algorithm converges, the present invention obtains an approximately globally optimal path by selecting the next-hop node with the highest V value, thereby balancing network energy consumption and prolonging network lifetime.

Description of drawings

Figure 1 is a structural diagram of an underwater acoustic sensor network.

Figure 2 is a schematic diagram of the cooperative-exploration reinforcement learning routing method.

Figure 3 is a flow chart of the source node executing the cooperative-exploration reinforcement learning algorithm.

Figure 4 is a flow chart of routing and forwarding.

Detailed description of embodiments

The present invention is further described below with reference to the accompanying drawings.

Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The following detailed description of the embodiments provided in the accompanying drawings is therefore not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art from these embodiments without creative work fall within the protection scope of the present invention.

The invention provides a cooperative-exploration reinforcement learning routing method for underwater acoustic sensor networks. A routing protocol based on reinforcement learning can approximate the global optimum when selecting a path and can combine multiple factors that affect performance. In the present invention, while the algorithm has not converged, the source node sends several control packets containing only header information together with each data packet to explore paths cooperatively and accelerate the convergence of the source node's V value; otherwise it sends only data packets. The invention solves the problem of slow reinforcement learning convergence while reducing network energy consumption and prolonging network lifetime. The method specifically comprises the following steps:

(1) Initialize the Q value and V value of each node.

(2) Determine the V value V_s^{t+1} of the source node s at the next moment.

(3) Judge whether |V_s^{t+1} - V_s^t| < ε holds, where V_s^t denotes the V value of the source node at time t and ε is a small positive constant. If it holds, the source node sends only the data packet; otherwise, the source node sends control packets together with the data packet.

(4) A relay node receives the data or control packet, updates its neighbor list, and decides whether to keep forwarding.

(5) The sink receives the data packet, ending the transmission.

In step (2), the V-value iteration function of the source node is:

V_s^{t+1} = (1 - α)·V_s^t + α·ω·Σ_j (R_sj + γ·V_j^t)

where α is the learning rate, i.e. the update rate of the V value, which controls how much of the difference between the previous V value and the new V value is taken into account; γ is the discount factor, i.e. how strongly experience influences the current V value; ω is a normalization parameter over the paths probed by the control packets; and each term R_sj + γ·V_j^t is the experience obtained by one data or control packet exploring a path through next-hop node j. In step (3), when |V_s^{t+1} - V_s^t| < ε, the source node sends only data packets; when |V_s^{t+1} - V_s^t| ≥ ε, the source node sends control packets together with the data packets.
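The dispatch decision of step (3) can be sketched as follows; the default of two control packets matches the embodiment described below but is otherwise a free parameter of this sketch:

```python
def packets_to_send(v_new, v_old, eps, n_control=2):
    """Return the packets the source node emits this round.

    Converged (|V(t+1) - V(t)| < eps): only the data packet.
    Not converged: the data packet plus n_control exploring control packets.
    """
    if abs(v_new - v_old) < eps:
        return ["data"]
    return ["data"] + ["control"] * n_control
```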

Figure 1 is the structural diagram of the underwater acoustic sensor network of an embodiment of the present invention, and Figure 2 is a schematic diagram of the cooperative-exploration reinforcement learning routing method of the embodiment. With reference to these figures, this embodiment discloses an implementation of the cooperative-exploration reinforcement learning routing protocol for underwater acoustic sensor networks, as shown in Figures 3 and 4, as follows:

(1) Initialize the Q value and V value of each node.

(2) Determine the reward function.

In this embodiment, the reward function R_nm is the immediate reward obtained after node n finishes transmitting a data or control packet to node m.

R_nm = -g - α1·c + α2·d

g is the fixed loss of a node when transmitting data, c is the residual-energy consumption function of the node, d describes the energy distribution of the node, and α1 and α2 are the weighting parameters of c and d, respectively.

(3) Determine the Q-value iteration function of each node.

Q_nm^{t+1} = (1 - α)·Q_nm^t + α·(R_nm + γ·max_k Q_mk^t)

where Q_nm^{t+1} is the Q value of node n at time t+1, α is the update rate of the Q value, γ is the discount factor, and Q_mk^t is the Q value of node m toward its neighbor k at time t.

(4) Determine the V-value calculation function of the source node.

V_s^{t+1} = (1 - α)·V_s^t + α·ω·Σ_j (R_sj + γ·V_j^t)

where V_s^{t+1} is the V value of the source node at the next moment, ω is a normalization parameter over the paths probed by the control packets, and each term R_sj + γ·V_j^t is the experience obtained by one data or control packet exploring a path.

(5) The source node performs cooperative exploration.

The structure of the underwater acoustic sensor network is shown in Figure 1. For simplicity, the network of this embodiment has a single source and a single sink. The source node collects data and transmits it upward through the underwater acoustic network along relay nodes, hop by hop, until it reaches the sink. The sink receives data from the relay nodes below the sea surface through an underwater acoustic receiver and forwards it to the base station by radio; the base station then performs subsequent analysis and processing.

As illustrated in Figure 2, when |V_s^{t+1} - V_s^t| ≥ ε, the source node sends data packets and control packets at the same time; in this embodiment, for ease of explanation, the number of control packets is set to two.

The source node computes Q values with the Q-value iteration function and updates its V value, then, based on the results, selects one node to receive the data packet and two nodes to receive control packets. In this embodiment, the source node selects its neighbor node 3 as the next hop for data-packet transmission, and nodes 1 and 5 as the next hops for control-packet transmission.

After overhearing a data or control packet, nodes 1, 3, and 5 read the packet header and add the previous-hop node to their neighbor lists. If the packet is addressed to the node, it computes Q values with the Q-value iteration function, selects the node achieving Q_max as the next hop, updates its V value to Q_max, rewrites the node information into the packet header, and continues the transmission.
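A sketch of this relay behaviour, with hypothetical dictionaries standing in for the node's Q table, the neighbours' advertised V values, and per-link rewards; header fields are modelled as plain dict keys:

```python
def relay_forward(packet, node_id, q_table, neighbor_v, rewards, alpha, gamma):
    """Handle an overheard packet at a relay node.

    Returns None if the packet is not addressed to this node (it is merely
    overheard; the neighbor list would still be updated). Otherwise performs
    one Q update toward every candidate next hop, using gamma*V_m as a
    stand-in for gamma*max_k Q_mk (each node advertises V = Qmax), picks the
    best hop greedily, sets this node's V to Qmax, and rewrites the header.
    """
    if packet["next_hop"] != node_id:
        return None
    for m in q_table:
        q_table[m] = (1 - alpha) * q_table[m] + alpha * (
            rewards[m] + gamma * neighbor_v[m]
        )
    best = max(q_table, key=q_table.get)
    header = dict(packet, prev_hop=node_id, next_hop=best, v=q_table[best])
    return best, header
```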

The neighbor nodes of nodes 1, 3, and 5 repeat the above actions until the data or control packet reaches the sink.

(6) When |V_s^{t+1} - V_s^t| < ε, the source node stops sending control packets.

Once the source node determines that |V_s^{t+1} - V_s^t| < ε holds, it terminates the transmission of control packets and, combining its routing table with the Q-value iteration formula, computes the optimal path and transmits data packets upward until they reach the sink.

Claims (1)

1. An underwater acoustic sensor network cooperative-exploration reinforcement learning routing method, characterized by comprising the following steps:
Step 1: initialize the Q values and V values of all nodes;
Step 2: determine the V value of the source node s at the next moment:
V_s^{t+1} = (1 - α)·V_s^t + α·ω·Σ_j (R_sj + γ·V_j^t)
wherein V_s^t denotes the V value of the source node s at time t; α is the update rate; γ is the discount factor; ω is a normalization parameter over the paths probed by the control packets; the reward function R_sj is the immediate reward obtained after the transmission of a data/control packet from the source node s to node j has been completed, R_sj = -g - α1·c + α2·d, where g is the fixed loss of the node in data transmission, c is the residual-energy consumption function of the node, d is the node energy distribution, and α1 and α2 are the weighting parameters of c and d, respectively;
Step 3: if |V_s^{t+1} - V_s^t| ≥ ε, the source node sends the data packet and the control packets at the same time; if |V_s^{t+1} - V_s^t| < ε, the source node sends only the data packet; ε denotes a small positive constant;
Step 4: after receiving a data/control packet, a relay node reads the packet header, adds the previous-hop node to its neighbor list, and judges whether the packet is addressed to it; if so, it computes the Q value
Q_nm^{t+1} = (1 - α)·Q_nm^t + α·(R_nm + γ·max_k Q_mk^t)
wherein Q_nm^t denotes the Q value of node n at time t, and the reward function R_nm is the immediate reward obtained after the first node n has finished transmitting a data/control packet to the second node m, R_nm = -g - α1·c + α2·d; the node achieving the maximum value Q_max is selected as the next hop, the V value is updated to Q_max, and the node information is rewritten into the packet header for continued transmission;
Step 5: if the sink node receives the data packet, the transmission ends; otherwise, steps 2 to 4 are repeated until the sink receives the data packet.
CN201811310120.6A 2018-11-06 2018-11-06 Underwater acoustic sensor network cooperation exploration reinforcement learning routing method Active CN109362113B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811310120.6A CN109362113B (en) 2018-11-06 2018-11-06 Underwater acoustic sensor network cooperation exploration reinforcement learning routing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811310120.6A CN109362113B (en) 2018-11-06 2018-11-06 Underwater acoustic sensor network cooperation exploration reinforcement learning routing method

Publications (2)

Publication Number Publication Date
CN109362113A (en) 2019-02-19
CN109362113B (en) 2022-03-18

Family

ID=65344072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811310120.6A Active CN109362113B (en) 2018-11-06 2018-11-06 Underwater acoustic sensor network cooperation exploration reinforcement learning routing method

Country Status (1)

Country Link
CN (1) CN109362113B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110719617B (en) * 2019-09-30 2023-02-03 西安邮电大学 Q Routing Method Based on Arctangent Learning Rate Factor
CN110868727A (en) * 2019-10-28 2020-03-06 辽宁大学 Optimization method of data transmission delay in wireless sensor network
CN111629440A (en) * 2020-05-19 2020-09-04 哈尔滨工程大学 A Convergence Judgment Method of MAC Protocol Using Q-learning
CN112351400B (en) * 2020-10-15 2022-03-11 天津大学 A Routing Policy Generation Method for Underwater Multimodal Networks Based on Improved Reinforcement Learning
CN112469103B (en) * 2020-11-26 2022-03-08 厦门大学 Underwater sound cooperative communication routing method based on reinforcement learning Sarsa algorithm
CN112867089B (en) * 2020-12-31 2022-04-05 厦门大学 Underwater sound network routing method based on information importance and Q learning algorithm
CN112954769B (en) * 2021-01-25 2022-06-21 哈尔滨工程大学 Underwater wireless sensor network routing method based on reinforcement learning
CN113141592B (en) * 2021-04-11 2022-08-19 西北工业大学 Long-life-cycle underwater acoustic sensor network self-adaptive multi-path routing method
CN113783782B (en) * 2021-09-09 2023-05-30 哈尔滨工程大学 Opportunity routing candidate set node ordering method for deep reinforcement learning
CN114828141B (en) * 2022-04-25 2024-04-19 广西财经学院 A multi-hop routing method for UWSNs based on AUV networking
CN114786236B (en) * 2022-04-27 2024-05-31 曲阜师范大学 Method and device for heuristic learning of routing protocol by wireless sensor network
CN115175268B (en) * 2022-07-01 2023-07-25 重庆邮电大学 Heterogeneous network energy-saving routing method based on deep reinforcement learning
CN115987886B (en) * 2022-12-22 2024-06-04 厦门大学 A Q-learning routing method for underwater acoustic networks based on meta-learning parameter optimization
CN115843083B (en) * 2023-02-24 2023-05-12 青岛科技大学 Underwater wireless sensor network routing method based on multi-agent reinforcement learning
CN118611781B (en) * 2024-08-08 2024-10-18 中山大学 Underwater network data communication method and system based on reinforcement learning and power control
CN118869095B (en) * 2024-08-15 2025-04-29 中国科学院声学研究所 Cross-layer routing protocol method for underwater acoustic communication network based on channel quality

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106358308A (en) * 2015-07-14 2017-01-25 北京化工大学 Resource allocation method for reinforcement learning in ultra-dense network
CN107809781A (en) * 2017-11-02 2018-03-16 中国科学院声学研究所 A kind of loop free route selection method of load balancing

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ITUB20155144A1 (en) * 2015-10-16 2017-04-16 Univ Degli Studi Di Roma La Sapienza Roma ? METHOD OF ADAPTING AND JOINING THE JOURNEY POLICY AND A RETRANSMISSION POLICY OF A KNOT IN A SUBMARINE NETWORK, AND THE MEANS OF ITS IMPLEMENTATION?

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106358308A (en) * 2015-07-14 2017-01-25 北京化工大学 Resource allocation method for reinforcement learning in ultra-dense network
CN107809781A (en) * 2017-11-02 2018-03-16 中国科学院声学研究所 A kind of loop free route selection method of load balancing

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
"AUV-Aided Communication Method for Underwater Mobile Sensor Network";冯晓宁;《IEEE》;20160413;全文 *
"基于L-π演算的WSN路由协议形式化方法";冯晓宁;《吉林大学学报(工学版)》;20140527;全文 *
"基于反馈的合作强化学习水下路由算法";卜任菲;《通信技术》;20170810;全文 *
"多普勒辅助水下传感器网络时间同步机制研究";王卓;《通信学报》;20170125;全文 *
金志刚."基于指向性换能器水声传感器网络功率控制算法".《华中科技大学学报(自然科学版) 2017-07-14 》.2017,全文. *

Also Published As

Publication number Publication date
CN109362113A (en) 2019-02-19

Similar Documents

Publication Publication Date Title
CN109362113B (en) Underwater acoustic sensor network cooperation exploration reinforcement learning routing method
CN112821940B (en) Satellite network dynamic routing method based on inter-satellite link attribute
CN112867089B (en) Underwater sound network routing method based on information importance and Q learning algorithm
KR101022054B1 (en) Adaptive communication environment setting method and device for underwater sensor network
CN111278078B (en) A Realization Method of Adaptive Routing Protocol for Mobile Sparse Underwater Acoustic Sensor Network
CN108990129A (en) A kind of wireless sensor network cluster-dividing method and system
CN112188583B (en) Ocean underwater wireless sensing network opportunistic routing method based on reinforcement learning
CN109547351B (en) Routing method based on Q-learning and trust model in Ad Hoc network
CN113207156B (en) A wireless sensor network cluster routing method and system
CN102625404A (en) A Distributed Routing Protocol Method Applied to 3D Underwater Acoustic Sensor Network
CN103701567B (en) A kind of self-adaptive modulation method and system for wireless in-ground sensor network
CN103200643A (en) Distributed fault-tolerant topology control method based on dump energy sensing
CN108112050A (en) Energy balance and deep-controlled Routing Protocol based on underwater wireless sensing network
Zou et al. A cluster-based adaptive routing algorithm for underwater acoustic sensor networks
CN106879044B (en) A hole-aware routing method for underwater sensor networks
CN108650030B (en) Water surface multi-sink node deployment method of underwater wireless sensor network
Rahman et al. Routing protocols for underwater ad hoc networks
CN114531716B (en) A routing selection method based on energy consumption and link quality
CN106879042B (en) A kind of underwater wireless sensor network shortest-path rout ing algorithms
Natarajan et al. Adaptive Time Difference of Time of Arrival in Wireless Sensor Network Routing for Enhancing Quality of Service.
CN103607747A (en) Inter-cluster virtual backbone route protocol method based on power control
Diamant et al. Routing in multi-modal underwater networks: A throughput-optimal approach
CN111901237B (en) Source routing method and system, related device and computer readable storage medium
Saravanan et al. Towards an adaptive routing protocol for low power and lossy networks (RPL) for reliable and energy efficient communication in the Internet of Underwater Things (iout)
KR101654734B1 (en) Method for modelling information transmission network having hierarchy structure and apparatus thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant