
CN114706631B - Offloading decision method and system in mobile edge computing based on deep Q-learning - Google Patents

Offloading decision method and system in mobile edge computing based on deep Q-learning

Info

Publication number
CN114706631B
CN114706631B
Authority
CN
China
Prior art keywords
task
layer
edge
mobile
mobile device
Prior art date
Legal status
Active
Application number
CN202210427768.1A
Other languages
Chinese (zh)
Other versions
CN114706631A (en)
Inventor
杨柱天
朱伟强
杨蔚
佟令宇
杨佳敏
陈迪
Current Assignee
Harbin Institute of Technology Shenzhen
8511 Research Institute of CASIC
Original Assignee
Harbin Institute of Technology Shenzhen
8511 Research Institute of CASIC
Priority date
Filing date
Publication date
Application filed by Harbin Institute of Technology Shenzhen and 8511 Research Institute of CASIC
Priority to CN202210427768.1A
Publication of CN114706631A
Application granted
Publication of CN114706631B


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/445Program loading or initiating
    • G06F9/44594Unloading
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

An offloading decision method and system for mobile edge computing based on deep Q-learning, belonging to the technical field of offloading decisions for mobile devices in mobile edge computing systems. The invention addresses the high latency and high energy consumption of the offloading decision process in existing mobile edge computing systems. A deep reinforcement learning algorithm is applied to the offloading decision problem in mobile edge computing, and the corresponding system state, action, and reward function are designed according to the task scheduling model established in the system, comprising a local computation queue, a task transmission queue, and an edge server queue. Comparing the average latency and energy consumption of the method against other algorithms shows that the disclosed offloading decision method greatly reduces the latency and energy consumption of the offloading decision process in a mobile edge computing system. The method is applicable to offloading decisions for mobile devices in mobile edge computing systems.

Description

Method and system for offloading decision-making in mobile edge computing based on deep Q-learning

Technical Field

The invention belongs to the technical field of offloading decision-making for mobile devices in mobile edge computing systems, and in particular relates to a method and system for offloading decision-making in mobile edge computing based on deep Q-learning.

Background

With the rapid development of 5G and Internet of Things (IoT) technologies, we have entered a new world in which everything is connected. In recent years, the number of network-capable mobile devices, such as smartphones, smart home appliances, and smart wearables, has grown explosively. At the same time, emerging applications such as virtual reality, real-time path planning, and online video processing place ever stricter demands on data transmission and computation. Finding an effective way to meet the transmission and computation needs of IoT devices is an urgent problem, and mobile edge computing has emerged as an effective solution.

Although existing mobile edge computing methods have achieved some success, the latency and energy consumption of the offloading decision process in current mobile edge computing systems remain high. It is therefore necessary to propose an offloading decision method for mobile edge computing systems that reduces the latency and energy consumption of this process.

Summary of the Invention

The purpose of the invention is to solve the problems of high latency and high energy consumption in the offloading decision process of existing mobile edge computing systems by proposing a method and system for offloading decision-making in mobile edge computing based on deep Q-learning.

The technical scheme adopted by the invention to solve the above technical problems is as follows:

According to one aspect of the invention, an offloading decision method in mobile edge computing based on deep Q-learning comprises the following steps:

Step 1: Build the reinforcement learning model

Construct the system state, system action, and reward function of the Markov decision process according to the task characteristics.

Step 2: Build the neural network

Construct a neural network comprising an input layer, an LSTM layer, a first fully connected (FC) layer, a second FC layer, and an output layer. The input layer passes the system state information to the LSTM layer and the first FC layer, and the output of the LSTM layer also serves as an input to the first FC layer.

The output of the first FC layer is then fed to the second FC layer, and the output of the second FC layer is fed to the output layer.

Further, the system state is constructed as follows:

Denote the size of the task of mobile device m at the beginning of the current time slot as λ_m(t). If a new task k(t) arrives at mobile device m at the beginning of the current time slot, then λ_m(t) = k(t); otherwise λ_m(t) = 0.

Construct a local computation queue, a task transmission queue, and an edge node computation queue. Denote the number of time slots that the task of mobile device m must wait in the local computation queue at the beginning of the current time slot as $W_m^{l}(t)$, the number of time slots it must wait in the task transmission queue as $W_m^{tr}(t)$, and the queue length of mobile device m at edge node n as $Q_{m,n}(t)$ (the original formula images are not reproduced; the notation used below is introduced in their place).

Construct a matrix M(t) representing the load level of each edge server over the T time slots preceding the current slot; M(t) has dimension T×N, where N is the number of edge servers.

The system state $s_m(t)$ observed by mobile device m at the current time slot is then:

$$s_m(t) = \big(\lambda_m(t),\; W_m^{l}(t),\; W_m^{tr}(t),\; Q_{m,1}(t),\ldots,Q_{m,N}(t),\; M(t)\big)$$

Further, the system action is expressed as a(t) = {0, 1, 2, …, N}, where 0 denotes local computation and k = 1, 2, …, N denotes the index of the edge server selected for offloading.

Further, the reward function is constructed as follows:

If the task is decided to be computed locally, the number of time slots the task waits, $W_m^{l}(t)$, is:

$$W_m^{l}(t) = \max\{\, c_m^{l}(t') - t,\; 0 \,\}$$

where $c_m^{l}(t')$ denotes the time at which the task generated in time slot t′ finishes executing locally.

The energy required for local computation of the task, $E_m^{l}$, is:

$$E_m^{l} = \varepsilon_m\, d_m$$

where ε_m is the energy consumption coefficient of the CPU of mobile device m for local computation, i.e., the energy consumed by the local CPU per computation cycle, and d_m is the computational load of the task currently generated by mobile device m, i.e., the number of CPU cycles required to execute it.

Let the preference coefficients of mobile user m for latency and energy consumption be $\beta_m^{T}$ and $\beta_m^{E}$, respectively. The reward function of mobile user m in the offloading decision process is then:

$$R = -\big(\beta_m^{T}\, T + \beta_m^{E}\, E\big)$$

where R is the reward value, T is the total latency incurred when mobile user m computes locally, i.e., T equals the number of queued waiting time slots $W_m^{l}(t)$ plus the latency of local execution, and E is the total energy consumed by mobile user m, i.e., $E = E_m^{l}$.

Further, for edge execution the reward function is constructed as follows:

If the task is decided to be computed at the edge, the number of time slots the task waits is computed from the time $c_m^{n}(t')$ at which execution on edge server n completes, i.e., the number of waiting slots is $\max\{\, c_m^{n}(t') - t,\; 0 \,\}$.

The energy required for edge computation of a task comprises task upload and task execution. Denote the power of the mobile device during task upload as $p_{up}$ and its power during task execution as $p_e$. For mobile device m, the required energy $E_m^{e}$ is:

$$E_m^{e} = p_{up}\, t_{n,up} + p_e\, t_{n,e}$$

where $t_{n,up}$ is the time consumed by mobile device m to upload the task to edge server n, and $t_{n,e}$ is the time consumed to execute the task of mobile device m on edge server n.

In this case, the reward function of the user in the offloading decision process is:

$$R = -\big(\beta_m^{T}\, T + \beta_m^{E}\, E\big)$$

where R is the reward value, T is the total latency: the task queuing delay plus the upload delay $t_{n,up}$ to edge server n and the execution delay $t_{n,e}$ on edge server n, and E is the total energy consumed by edge computation, i.e., $E = E_m^{e}$.

Further, the reward function provides that if the maximum delay allowed for the task is reached before its execution completes, the task is dropped, and the reward value R at that point is set to a fixed penalty value P.
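The reward construction above can be summarized in a short Python sketch. The weighted-sum form of R and all function and parameter names below are illustrative assumptions; the patent's exact formula images are not reproduced.

```python
# Minimal sketch of the reward described above. The weighted-sum form
# R = -(beta_T*T + beta_E*E) and all names here are illustrative
# assumptions, not the patent's literal formulas.

def reward(action, wait_slots, exec_time, upload_time, beta_T, beta_E,
           eps_m=None, d_m=None, p_up=None, p_e=None,
           dropped=False, penalty=-100.0):
    """Return the reward for one offloading decision.

    action == 0    -> local computation
    action in 1..N -> offload to edge server `action`
    """
    if dropped:                      # max allowed delay exceeded: fixed penalty P
        return penalty
    if action == 0:                  # local: T = queueing + execution, E = eps_m * d_m
        T = wait_slots + exec_time
        E = eps_m * d_m
    else:                            # edge: T adds the upload delay,
        T = wait_slots + upload_time + exec_time
        E = p_up * upload_time + p_e * exec_time   # upload + execution energy
    return -(beta_T * T + beta_E * E)
```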

Further, the LSTM layer predicts the temporal correlation of the edge server load levels from the matrix M(t).

Further, the first and second FC layers learn the mapping from system states to the reward values of system actions; each FC layer comprises a set of neurons with rectified linear units.

Furthermore, the output layer outputs the reward function value corresponding to the currently selected action in the current system state.
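A sketch of this network in Python (PyTorch) follows. Layer widths and the scheme for concatenating the LSTM output with the rest of the state are assumptions for illustration; the patent does not specify them.

```python
import torch
import torch.nn as nn

class OffloadDQN(nn.Module):
    """Sketch of the described network: the load-history matrix M(t) goes
    through an LSTM; its last hidden state is concatenated with the rest of
    the state and passed through two ReLU FC layers to per-action Q values.
    Hidden sizes are illustrative assumptions."""

    def __init__(self, n_servers, state_dim, hidden=128, lstm_hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_servers, hidden_size=lstm_hidden,
                            batch_first=True)
        self.fc1 = nn.Linear(state_dim + lstm_hidden, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, n_servers + 1)  # action 0 = local, 1..N = servers

    def forward(self, state_vec, load_history):
        # load_history: (batch, T, N) matrix M(t); state_vec: (batch, state_dim)
        _, (h_n, _) = self.lstm(load_history)        # h_n: (1, batch, lstm_hidden)
        x = torch.cat([state_vec, h_n.squeeze(0)], dim=1)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.out(x)                           # one Q value per action
```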

According to another aspect of the invention, an offloading decision system in mobile edge computing based on deep Q-learning is provided; the system executes the above offloading decision method in mobile edge computing based on deep Q-learning.

The beneficial effects of the invention are as follows:

The invention applies a deep reinforcement learning algorithm to the offloading decision problem in mobile edge computing, and designs the corresponding system state, action, and reward function according to the task scheduling model established in the system, comprising a local computation queue, a task transmission queue, and an edge server queue. Comparing the average latency and energy consumption of the method against other algorithms shows that the offloading decision method of the invention greatly reduces the latency and energy consumption of the offloading decision process in a mobile edge computing system.

Brief Description of the Drawings

Fig. 1 is a structural diagram of the neural network constructed by the invention;

Fig. 2 shows the convergence curve of the reward value of the method over training iterations;

Fig. 3 plots the average reward of the method and three baseline algorithms as the number of users varies;

Fig. 4 plots the average latency of the method and three baseline algorithms as the number of users varies.

Detailed Description

Embodiment 1: This embodiment proposes a computation offloading strategy based on deep reinforcement learning for the MEC network scenario of multiple mobile devices and multiple servers. Each mobile user is treated as an agent. Because tasks arrive at different times during offloading, they must be queued; the invention constructs a task scheduling model with three queues: a local computation queue, a task transmission queue, and an edge node computation queue. Latency and energy cost models are then established for the two task execution modes, and the method is designed to minimize system cost, i.e., to produce the least system latency and energy consumption over several consecutive time slots.

Step 1: Reinforcement learning model construction:

This step details the specific implementation of task offloading decisions using a DQN, mainly including the definition of the system state, actions, and reward function of the Markov decision process.

1. Markov decision process construction

At the beginning of each time slot, each mobile device observes its state (e.g., task size, queue lengths). If there is a new task to process, the mobile device selects an appropriate offloading decision for it so as to minimize the long-term cost of its task computation. Applying deep reinforcement learning to the task offloading decision problem requires constructing a Markov decision process, in which the system state, actions, and reward function must be specifically defined.
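For concreteness, the standard DQN machinery that such a formulation plugs into is sketched below: an experience replay buffer and the Q-learning target r + γ·max Q_target(s′, a′). These are generic components assumed here, not quoted from the patent; for brevity the state is treated as a single flattened tensor (the two-part state used by the network sketched earlier would be packed into one).

```python
import random
from collections import deque

import torch
import torch.nn.functional as F

# Generic DQN machinery (assumed, not quoted from the patent): a replay
# buffer of (s, a, r, s', done) transitions and the Q-learning update.
replay_buffer = deque(maxlen=10_000)
GAMMA = 0.95  # discount factor, illustrative value

def dqn_update(q_net, target_net, optimizer, batch_size=32):
    if len(replay_buffer) < batch_size:
        return
    batch = random.sample(replay_buffer, batch_size)
    s, a, r, s_next, done = map(torch.stack, zip(*batch))
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a)
    with torch.no_grad():
        # Bellman target: r + gamma * max_a' Q_target(s', a'); 0 if terminal
        target = r + GAMMA * target_net(s_next).max(1).values * (1 - done.float())
    loss = F.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```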

2. System state settings

The first piece of information relevant to the offloading decision is the task itself. First consider the task size λ(t). At the beginning of each time slot, mobile device m observes the size of its own task, denoted λ(t): if a new task k(t) exists at the beginning of the current slot, λ(t) = k(t); otherwise λ(t) = 0. Note that because new tasks are generated only at the beginning of a slot, a task can never appear mid-slot where λ(t) could not be computed.

Another task characteristic that must be considered is the maximum acceptable delay of the task, which is also relevant to the offloading decision and is therefore added to the system state as well.

The waiting times of tasks in the three queues are also relevant to the offloading decision and should likewise be added to the system state. Specifically:

$W_m^{l}(t)$: the number of time slots the task must wait in the local computation queue;

$W_m^{tr}(t)$: the number of time slots the task must wait in the transmission queue;

$Q_{m,n}(t)$: the queue length of mobile device m at edge node n.

The load level of each edge node, i.e., the number of active queues at the node, changes constantly, and the load level at the current moment is strongly correlated with that at the previous moment. We therefore construct an edge node load level matrix M(t) to capture the load level of the edge nodes over a period of time.

Specifically, the matrix M(t) records the load level of each edge node (i.e., the number of active queues at that node, at most the number of mobile devices M) over the previous T time slots (from slot t−T to slot t−1). It is a T×N matrix, where T is the history length in time slots and N is the number of edge nodes. For example, {M(t)}(i,j) is the number of active queues at edge node j in time slot t−T+i−1.
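Maintaining this history matrix can be sketched as follows; the function providing the per-node active-queue counts each slot is hypothetical, and the sizes are illustrative.

```python
import numpy as np

T_HIST, N_NODES = 10, 5   # illustrative history length and node count

# M holds the last T_HIST slots of per-node active-queue counts;
# row i corresponds to slot t - T_HIST + i (row 0 is the oldest).
M = np.zeros((T_HIST, N_NODES), dtype=np.int64)

def update_load_history(M, active_queue_counts):
    """Shift the history up one slot and append the newest load row.
    `active_queue_counts` is a length-N vector for the slot just ended."""
    M = np.roll(M, shift=-1, axis=0)
    M[-1] = active_queue_counts
    return M
```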

In summary, the system state observed by mobile device m at the current time slot can be defined as a vector of several dimensions, denoted s_m(t), i.e.,

$$s_m(t) = \big(\lambda_m(t),\; \tau_m,\; W_m^{l}(t),\; W_m^{tr}(t),\; Q_{m,1}(t),\ldots,Q_{m,N}(t),\; M(t)\big)$$

3. System action

Each mobile device must first decide whether to execute locally or offload to an edge server, and, if offloading, to which server. Let 0 denote local computation and k denote the index of the chosen edge server. With N edge servers in total, the system action can be expressed as a(t) = {0, 1, 2, …, N}.
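In a DQN, this action is typically chosen epsilon-greedily over the predicted Q values. The sketch below assumes the OffloadDQN network sketched earlier; the exploration rate is an illustrative parameter.

```python
import random
import torch

def select_action(q_net, state_vec, load_history, n_servers, epsilon=0.1):
    """Epsilon-greedy offloading decision: 0 = local, 1..N = edge server index."""
    if random.random() < epsilon:                     # explore
        return random.randint(0, n_servers)
    with torch.no_grad():                             # exploit: argmax_a Q(s, a)
        q = q_net(state_vec.unsqueeze(0), load_history.unsqueeze(0))
        return int(q.argmax(dim=1).item())
```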

4. Reward function

What most affects the application experience on a mobile device is the latency and energy consumption caused by offloading. The reward function of the invention is therefore built around the latency and energy consumed during task offloading.

Task latency is considered for the two cases of local computation and edge computation.

If the task is decided to be computed locally, the number of time slots it waits, $W_m^{l}(t)$, is computed as:

$$W_m^{l}(t) = \max\{\, c_m^{l}(t') - t,\; 0 \,\}$$

where $c_m^{l}(t')$ is the local completion time of the task generated in slot t′.

If the task is decided to be computed at the edge, the number of time slots it waits is computed from the time $c_m^{n}(t')$ at which execution at the edge node completes, i.e., $\max\{\, c_m^{n}(t') - t,\; 0 \,\}$.

Task energy consumption is likewise considered for the two cases of local computation and edge computation.

The energy required for local computation of the task is $E_m^{l} = \varepsilon_m\, d_m$.

In edge computation, energy is consumed mainly in task upload and task execution. Let the power of the mobile device be $p_{up}$ during task upload and $p_e$ during task execution. Then for mobile device m:

$$E_m^{e} = p_{up}\, t_{n,up} + p_e\, t_{n,e}$$

Let the preference coefficients of mobile user m for latency and energy be $\beta_m^{T}$ and $\beta_m^{E}$, respectively. The reward function of the user in the offloading decision process is then set to:

$$R = -\big(\beta_m^{T}\, T + \beta_m^{E}\, E\big)$$

On the other hand, if a task is dropped because it has reached its maximum acceptable delay, the reward at that point is defined as a fixed penalty value P, i.e.,

R = P    (5)

Step 2: Neural network construction, as shown in Fig. 1:

1. Input layer: this layer takes the state as input and passes it to the following layers. For mobile device m, the state information λ(t), $W_m^{l}(t)$, $W_m^{tr}(t)$, $Q_{m,n}(t)$, and M(t) is passed on for prediction.

2. LSTM layer: the matrix M(t) represents the load level of each edge node over the previous T time slots, and these load levels are time-dependent. An LSTM layer is therefore used to capture the temporal correlation of the load levels.

3. FC layers: the two FC layers learn the mapping from states to action Q values. Each FC layer contains a set of neurons with rectified linear units.

4. Output layer: the output layer gives the Q value of taking the currently selected action in the current state, reflecting the overall cost of the decision, i.e., the trade-off between latency and energy consumption.
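Putting the pieces together, a training loop for this design might look as follows. The environment object `env` (wrapping the queues, task model, and reward described in this embodiment) and all hyperparameters are assumptions; the one-step update stands in for the minibatch replay update sketched earlier.

```python
import torch

# Illustrative end-to-end loop; `env` is a hypothetical MEC simulator
# exposing reset()/step() over the queues, tasks, and reward above.
q_net = OffloadDQN(n_servers=5, state_dim=8)        # sizes are assumptions
target_net = OffloadDQN(n_servers=5, state_dim=8)
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma, epsilon = 0.95, 0.1

for episode in range(1200):                         # 1200 iterations, as in the example
    (vec, hist), done = env.reset(), False
    while not done:
        a = select_action(q_net, vec, hist, n_servers=5, epsilon=epsilon)
        (nvec, nhist), r, done = env.step(a)        # r is the reward defined above

        # One-step Q-learning update (a replay minibatch would be used in practice).
        q = q_net(vec.unsqueeze(0), hist.unsqueeze(0))[0, a]
        with torch.no_grad():
            best_next = target_net(nvec.unsqueeze(0), nhist.unsqueeze(0)).max()
        target = r if done else r + gamma * best_next
        loss = (q - target) ** 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        vec, hist = nvec, nhist

    if episode % 20 == 0:                           # periodically sync the target network
        target_net.load_state_dict(q_net.state_dict())
```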

Example

This example introduces the system architecture to which the invention applies from three aspects: the network model, the task model, and the task scheduling model.

Step 1: Network model. The targeted scenario consists of two parts: several base stations, each equipped with an edge server of relatively high computing power, and several mobile devices that need to execute computation-intensive tasks. At the beginning of each time slot, each mobile device generates, with a certain probability, an intensive task that must be executed; the task is either executed entirely locally or offloaded entirely to an edge server, and tasks are indivisible.

Consider a period of T time slots, T = {1, 2, 3, …, T}; the simulation sets T = 100, with each slot of duration Δ. For the mobile devices in the cell, at the beginning of each time slot the offloading decision variable a_i indicates whether the computing task is offloaded to an edge server and, if so, to which one: a_i = k if the task is offloaded to the k-th edge server, and a_i = 0 if it is executed locally.

Tasks from different mobile devices are assumed to have the same priority. Each edge node has one CPU for processing the tasks in its queues; at the beginning of each time slot, the task queues of the mobile devices at edge node n share the processing capacity of that node's CPU equally.

Step 2: Task model. A task generated by a mobile device has two attributes. One is its size: here the task size λ_m(t) is a random value between 3 Mbit and 5 Mbit with a step of 0.1 Mbit. The other is the maximum number of waiting slots the task can tolerate, denoted τ_m; if a task waits longer than τ_m, it is dropped by the system.
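A sketch of this task model in Python; the arrival probability and the default maximum waiting time are assumed values, since the text states only that tasks arrive with a certain probability.

```python
import random

def generate_task(arrival_prob=0.6, max_wait_slots=10):
    """Return (size_mbit, max_wait_slots) or None if no task arrives.
    arrival_prob and max_wait_slots are illustrative assumptions; the
    size follows the stated 3-5 Mbit range with 0.1 Mbit steps."""
    if random.random() >= arrival_prob:
        return None
    size = random.randrange(30, 51) / 10.0   # 3.0, 3.1, ..., 5.0 Mbit
    return size, max_wait_slots
```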

Step 3: Task scheduling model. Taking the edge server side as an example, because tasks arrive at different times, at the start of the next slot an edge server may not yet have finished the tasks of the previous slot, so tasks must queue. This part builds three queue models: the local computation queue, the task transmission queue, and the edge node computation queue, which together form the task scheduling model.
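A minimal sketch of these three queues follows; the dispatch and load-level logic mirrors the description above, while service-rate details are left out since they are not specified here.

```python
from collections import deque

class SchedulingModel:
    """Three FIFO queue families as described: local compute, transmission,
    and per-device queues at each edge node. Structure only; service
    rates are an assumption left to the simulator."""

    def __init__(self, n_devices, n_servers):
        self.local = [deque() for _ in range(n_devices)]
        self.transmit = [deque() for _ in range(n_devices)]
        # edge[n][m]: queue of device m's tasks at edge node n
        self.edge = [[deque() for _ in range(n_devices)] for _ in range(n_servers)]

    def dispatch(self, device, task, action):
        if action == 0:
            self.local[device].append(task)                   # compute locally
        else:
            self.transmit[device].append((task, action - 1))  # upload, then queue at node

    def active_queues(self, node):
        """Load level of an edge node = number of non-empty device queues."""
        return sum(1 for q in self.edge[node] if q)
```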

Step 4: Algorithm performance evaluation and analysis:

To study the convergence of the algorithm, a scenario with 120 users and 5 edge servers is selected, the algorithm is run for 1200 iterations, and the reward is plotted as the average over every 100 consecutive decisions. As Fig. 2 shows, the average reward begins to converge after roughly 500 iterations and then gradually stabilizes, indicating that through repeated training the agents have learned a fairly stable offloading policy.

Three baseline algorithms are compared with the DQN-based algorithm designed here, with the number of mobile users varied from 50 to 120 and the number of edge servers fixed at 5. As shown in Fig. 3, the average reward curves of the four algorithms are plotted, with the rewards of the three baselines each averaged over 100 runs.

The simulation curves show that, as the number of users increases, the average reward of all algorithms except the all-local algorithm declines. This is because the number of edge servers is fixed at 5 while the number of users grows, so the server resources available per user become increasingly scarce. The average reward of the all-local algorithm is essentially independent of the number of users. When there are fewer than 90 users, the all-offloading algorithm outperforms the all-local algorithm because server computing resources are still ample; as the number of users grows further, the all-offloading algorithm performs worse than the all-local algorithm.

The simulation curves in Fig. 4 show that the DQN algorithm consistently achieves lower task processing latency than the other three baselines. Similar to the trend of the reward curves, when the number of users exceeds 90, the latency of the all-offloading algorithm begins to exceed that of the all-local algorithm because the computing resources of the edge servers become too scarce.

The above example merely illustrates the computational model and workflow of the invention in detail and does not limit its embodiments. For those of ordinary skill in the art, other variations or modifications can be made on the basis of the above description; not all embodiments can be exhaustively listed here, and any obvious variation or modification derived from the technical solution of the invention remains within the scope of protection of the invention.

Claims (3)

1. An offloading decision method in mobile edge computing based on deep Q-learning, characterized by comprising the following steps:
step one: building the reinforcement learning model
constructing a system state, a system action, and a reward function in the Markov decision process according to the task characteristics;
the system state is constructed in the following manner:
denote the task size of mobile device m at the beginning of the current time slot t as λ_m(t); if a new task k(t) exists for mobile device m when the current time slot t starts, λ_m(t) = k(t), otherwise λ_m(t) = 0, m = 1, 2, …, M_0, where M_0 represents the total number of mobile devices;
construct a local computation queue, a task transmission queue, and an edge node computation queue; denote the number of time slots the task of mobile device m must wait in the local computation queue at the beginning of the current time slot t as $W_m^{l}(t)$, the number of time slots it must wait in the task transmission queue as $W_m^{tr}(t)$, and the queue length of mobile device m at edge node n as $Q_{m,n}(t)$;
construct a matrix M(t) representing the load level of each edge server in the T time slots preceding the current time slot t, the dimension of M(t) being T×N, where N is the number of edge servers and T ≤ T_0, T_0 representing the total number of time slots;
the system state $s_m(t)$ observed by mobile device m at the current time slot t is:

$$s_m(t) = \big(\lambda_m(t),\; W_m^{l}(t),\; W_m^{tr}(t),\; Q_{m,1}(t),\ldots,Q_{m,N}(t),\; M(t)\big)$$
the system action is expressed as a(t) = {0, 1, 2, …, N}, where 0 denotes local computation and k = 1, 2, …, N denotes the index of the edge server selected for offloading;
the reward function is constructed in the following manner:
if the task is decided to be computed locally, the number of time slots the task waits, $W_m^{l}(t)$, is:

$$W_m^{l}(t) = \max\{\, c_m^{l}(t') - t,\; 0 \,\}$$

where $c_m^{l}(t')$ indicates the time at which the task generated in time slot t′ finishes executing locally;
the energy required for local computation of the task, $E_m^{l}$, is:

$$E_m^{l} = \varepsilon_m\, d_m$$

where ε_m represents the energy consumption coefficient of the CPU of mobile device m during local computation, i.e., the energy consumed by the local CPU per computation cycle, and d_m represents the computational load of the task currently generated by mobile device m, i.e., the number of CPU cycles needed to execute it;
setting the preference coefficients of mobile user m for latency and energy consumption as $\beta_m^{T}$ and $\beta_m^{E}$, the reward function of mobile user m in the offloading decision process is:

$$R = -\big(\beta_m^{T}\, T + \beta_m^{E}\, E\big)$$

wherein R is the reward value, T is the total latency generated when mobile user m computes locally, i.e., T equals the number of queued waiting time slots $W_m^{l}(t)$ plus the latency of local execution, and E is the total energy consumption of mobile user m, i.e., $E = E_m^{l}$;
if the task is decided to be computed at the edge, the number of time slots the task waits is computed from the time $c_m^{n}(t')$ at which execution on edge server n completes, i.e., the number of waiting slots is $\max\{\, c_m^{n}(t') - t,\; 0 \,\}$, n = 1, 2, …, n_0, where n_0 represents the total number of edge servers;
the energy required in task edge computing comprises task upload and task execution; denoting the power of the mobile device during task upload as $p_{up}$ and its power during task execution as $p_e$, the energy required by mobile device m, $E_m^{e}$, is:

$$E_m^{e} = p_{up}\, t_{n,up} + p_e\, t_{n,e}$$

where $t_{n,up}$ represents the time consumed by mobile device m to upload the task to edge server n, and $t_{n,e}$ represents the time consumed to execute the task of mobile device m on edge server n;
in this case, the reward function of the user in the offloading decision process is:

$$R = -\big(\beta_m^{T}\, T + \beta_m^{E}\, E\big)$$

wherein R is the reward value, T is the total latency, i.e., the task queuing delay plus the upload delay $t_{n,up}$ to edge server n and the execution delay $t_{n,e}$ on edge server n, and E is the total energy consumption of edge computation, i.e., $E = E_m^{e}$;
if the maximum delay allowed for the task is reached before its execution completes, the task is dropped and the reward value R is set to a fixed penalty value P;
step two: building the neural network
constructing a neural network comprising an input layer, an LSTM layer, a first FC layer, a second FC layer, and an output layer, wherein the input layer passes the system state information to the LSTM layer and the first FC layer, and the output of the LSTM layer serves as an input of the first FC layer;
the output of the first FC layer then serves as the input of the second FC layer, and the output of the second FC layer serves as the input of the output layer;
the LSTM layer is configured to predict the temporal correlation of the edge server load levels from the matrix M(t);
the first FC layer and the second FC layer are configured to learn the mapping from system states to system-action reward values, each comprising a set of neurons with rectified linear units.
2. The offloading decision method in mobile edge computing based on deep Q-learning of claim 1, wherein the output layer outputs the reward function value corresponding to the currently selected action in the current system state.
3. An offloading decision system in mobile edge computing based on deep Q-learning, characterized in that the system executes the offloading decision method in mobile edge computing based on deep Q-learning of claim 1 or claim 2.
CN202210427768.1A 2022-04-22 2022-04-22 Offloading decision method and system in mobile edge computing based on deep Q-learning Active CN114706631B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210427768.1A CN114706631B (en) 2022-04-22 2022-04-22 Offloading decision method and system in mobile edge computing based on deep Q-learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210427768.1A CN114706631B (en) 2022-04-22 2022-04-22 Offloading decision method and system in mobile edge computing based on deep Q-learning

Publications (2)

Publication Number Publication Date
CN114706631A CN114706631A (en) 2022-07-05
CN114706631B true CN114706631B (en) 2022-10-25

Family

ID=82175067

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210427768.1A Active CN114706631B (en) 2022-04-22 2022-04-22 Unloading decision method and system in mobile edge calculation based on deep Q learning

Country Status (1)

Country Link
CN (1) CN114706631B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115766241A (en) * 2022-11-21 2023-03-07 西安工程大学 Distributed intrusion detection system task scheduling and unloading method based on DQN algorithm
CN116909717B (en) * 2023-09-12 2023-12-05 国能(北京)商务网络有限公司 Task scheduling method
CN118656195B (en) * 2024-08-16 2024-11-05 南京博裕物联科技有限公司 Edge computing task scheduling method based on multi-Agent reinforcement learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666149A (en) * 2020-05-06 2020-09-15 西北工业大学 Ultra-dense edge computing network mobility management method based on deep reinforcement learning
CN112616152A (en) * 2020-12-08 2021-04-06 重庆邮电大学 Independent learning-based mobile edge computing task unloading method
CN113067873A (en) * 2021-03-19 2021-07-02 北京邮电大学 Edge cloud collaborative optimization method based on deep reinforcement learning
CN113612843A (en) * 2021-08-02 2021-11-05 吉林大学 A task offloading and resource allocation method for MEC based on deep reinforcement learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666149A (en) * 2020-05-06 2020-09-15 西北工业大学 Ultra-dense edge computing network mobility management method based on deep reinforcement learning
CN112616152A (en) * 2020-12-08 2021-04-06 重庆邮电大学 Independent learning-based mobile edge computing task unloading method
CN113067873A (en) * 2021-03-19 2021-07-02 北京邮电大学 Edge cloud collaborative optimization method based on deep reinforcement learning
CN113612843A (en) * 2021-08-02 2021-11-05 吉林大学 A task offloading and resource allocation method for MEC based on deep reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张文献 et al., "面向多用户移动边缘计算轻量任务卸载优化" (Lightweight task offloading optimization for multi-user mobile edge computing), 《小型微型计算机系统》 (Journal of Chinese Computer Systems), vol. 41, no. 10, Oct. 31, 2020. *

Also Published As

Publication number Publication date
CN114706631A (en) 2022-07-05

Similar Documents

Publication Publication Date Title
CN114706631B (en) Offloading decision method and system in mobile edge computing based on deep Q-learning
CN113612843B (en) MEC task unloading and resource allocation method based on deep reinforcement learning
CN113573324B (en) A Joint Optimization Approach for Collaborative Task Offloading and Resource Allocation in Industrial Internet of Things
CN108920280B (en) Mobile edge computing task unloading method under single-user scene
CN112118601B (en) A method to reduce the task offloading delay of 6G digital twin edge computing network
WO2023040022A1 (en) Computing and network collaboration-based distributed computation offloading method in random network
CN111800828A (en) A mobile edge computing resource allocation method for ultra-dense networks
CN114285853B (en) Task unloading method based on end edge cloud cooperation in equipment-intensive industrial Internet of things
CN117194057B (en) A resource scheduling method based on reinforcement learning to optimize edge energy consumption and load
CN113626104B (en) Multi-objective optimization offloading strategy based on deep reinforcement learning under edge cloud architecture
Nath et al. Multi-user multi-channel computation offloading and resource allocation for mobile edge computing
CN113946423A (en) Multi-task edge computing scheduling optimization method based on graph attention network
CN116489712B (en) A mobile edge computing task offloading method based on deep reinforcement learning
CN114860337A (en) Computing unloading method based on meta reinforcement learning algorithm
CN114205353A (en) Calculation unloading method based on hybrid action space reinforcement learning algorithm
CN118733143A (en) A task offloading method based on Lyapunov and deep reinforcement learning
CN116866353B (en) Distributed resource collaborative scheduling method, device, equipment and medium for integrated computing
Tang et al. Divisible task offloading for multiuser multiserver mobile edge computing systems based on deep reinforcement learning
CN116431326A (en) Multi-user dependency task unloading method based on edge calculation and deep reinforcement learning
CN116467005A (en) Distributed task offloading method, device and storage medium based on reinforcement learning
CN116204319A (en) Cloud-edge-device collaborative offloading method and system based on SAC algorithm and task dependencies
CN115118728A (en) A Task Scheduling Method for Edge Load Balancing Based on Ant Colony Algorithm
Chen et al. Multi-agent deep reinforcement learning for collaborative task offloading in mobile edge computing networks
CN110768827B (en) Task unloading method based on group intelligent algorithm
CN118055160A (en) System and method for distributing tasks of edge computing server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant