CN114706631B - Unloading decision method and system in mobile edge calculation based on deep Q learning - Google Patents
- Publication number: CN114706631B
- Application number: CN202210427768.1A
- Authority
- CN
- China
- Prior art keywords
- task
- layer
- edge
- mobile
- mobile device
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/445—Program loading or initiating
- G06F9/44594—Unloading
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Description
Technical Field
The present invention belongs to the technical field of offloading decision-making for mobile devices in mobile edge computing systems, and in particular relates to a deep Q-learning based offloading decision method and system for mobile edge computing.
Background
With the rapid development of 5G and Internet of Things (IoT) technology, people have stepped into a new world where everything is connected. In recent years, the number of network-capable mobile devices, such as smartphones, smart home appliances, and smart wearables, has grown explosively. At the same time, new applications such as virtual reality, real-time path planning, and online video processing impose ever stricter requirements on data transmission and computation capabilities. Finding an effective way to meet the data transmission and computation needs of IoT devices is an urgent problem, and mobile edge computing has emerged as an effective solution.
Although existing mobile edge computing methods have achieved certain results, the delay and energy consumption produced by the offloading decision process in existing mobile edge computing systems remain high. It is therefore necessary to propose an offloading decision method that reduces the delay and energy consumption of this process.
Summary of the Invention
The purpose of the present invention is to solve the problems of high delay and high energy consumption in the offloading decision process of existing mobile edge computing systems, by proposing a deep Q-learning based offloading decision method and system for mobile edge computing.
The technical scheme adopted by the present invention to solve the above technical problems is as follows.
According to one aspect of the present invention, a deep Q-learning based offloading decision method for mobile edge computing specifically includes the following steps:
Step 1: Reinforcement Learning Model Construction
Construct the system state, system action, and reward function of a Markov decision process according to the task characteristics;
Step 2: Neural Network Construction
Construct a neural network comprising an input layer, an LSTM layer, a first FC (fully connected) layer, a second FC layer, and an output layer. The input layer passes the system state information to the LSTM layer and the first FC layer, and the output of the LSTM layer is also fed to the first FC layer as input;
The output of the first FC layer is then used as the input of the second FC layer, and the output of the second FC layer is used as the input of the output layer.
Further, the system state is constructed as follows:
Denote the size of the task of mobile device m at the beginning of the current time slot as λ_m(t); if a new task k(t) exists at mobile device m at the beginning of the current time slot, then λ_m(t) = k(t), otherwise λ_m(t) = 0;
Construct a local computing queue, a task transmission queue, and an edge-node computing queue. Denote by w_m^comp(t) the number of time slots the task of mobile device m must wait in the local computing queue at the beginning of the current slot, by w_m^tran(t) the number of time slots it must wait in the task transmission queue, and by q_{m,n}(t) the queue length of mobile device m at edge node n;
Construct a matrix M(t) representing the load level of each edge server during the T time slots preceding the current slot; the dimension of M(t) is T×N, where N is the number of edge servers;
The system state s_m(t) observed by mobile device m at the current time slot is then:
s_m(t) = (λ_m(t), τ_m, w_m^comp(t), w_m^tran(t), q_{m,1}(t), …, q_{m,N}(t), M(t))
where τ_m is the maximum number of time slots the task can tolerate waiting.
Further, the system action is expressed as a(t) = {0, 1, 2, …, N}, where 0 denotes local computation and k = 1, 2, …, N denotes the index of the edge server selected for offloading.
Further, the reward function is constructed as follows:
If the task is decided for local computation, the number of time slots the task waits is:
⌈ t̂_m^l(t′) / Δ ⌉ − t′
where t̂_m^l(t′) denotes the moment at which the task generated in time slot t′ completes local execution, and Δ is the length of one time slot;
The energy required for local computation of the task is:
E_m^l = ε_m · d_m
where ε_m denotes the per-cycle energy consumption coefficient of the CPU of mobile device m when computing locally, i.e., the energy the local CPU consumes in one cycle, and d_m denotes the computation amount of the task currently generated by mobile device m, i.e., the number of CPU cycles required to execute it;
Let the preference coefficients of mobile user m for delay and energy consumption be β_m^T and β_m^E, respectively. The reward function of mobile user m in the offloading decision process is then:
R = −(β_m^T · T + β_m^E · E)
where R is the reward value, T is the total delay produced when mobile user m computes locally, i.e., the number of time slots the task waits in the queue plus the delay of local execution, and E is the total energy consumption of mobile user m, i.e., E = ε_m · d_m.
Further, the reward function is constructed as follows:
If the task is decided for edge computing, the number of time slots the task waits is computed from the moment t̂_n^e(t′) at which execution on edge server n completes, i.e., the number of waiting slots is ⌈ t̂_n^e(t′) / Δ ⌉ − t′;
The energy required for edge computing of a task comprises two parts, task upload and task execution. Denote the power of the mobile device during task upload as p_up and its power during task execution as p_e. For mobile device m, the required energy is then:
E_m^e = p_up · t_{n,up} + p_e · t_{n,e}
where t_{n,up} denotes the time consumed by mobile device m to upload the task to edge server n, and t_{n,e} denotes the time consumed to execute the task of mobile device m on edge server n.
In this case, the reward function of the user in the offloading decision process is:
R = −(β_m^T · T + β_m^E · E)
where R is the reward value, T is the sum of the total queuing delay, the upload delay t_{n,up} to edge server n, and the execution delay t_{n,e} on edge server n, and E is the total energy consumption of edge computing, i.e., E = p_up · t_{n,up} + p_e · t_{n,e}.
Further, the reward function is constructed as follows:
If the maximum delay allowed by the task is reached before its execution completes, the task is dropped, and the reward value R is set to a fixed penalty value P.
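Collecting the three cases above (local computation, edge computing, dropped task) into one function gives the following sketch; the function and argument names and the dictionary layout are illustrative assumptions, not anything fixed by the patent text.

```python
def offload_reward(action, delays, energies, beta_t, beta_e, penalty_p, dropped):
    """Reward R for one task, following the three cases defined above."""
    if dropped:                                       # maximum allowed delay reached
        return penalty_p                              # fixed penalty value P
    if action == 0:                                   # local computation
        t_total = delays["queue"] + delays["local_exec"]
        e_total = energies["local"]                   # eps_m * d_m
    else:                                             # offloaded to edge server `action`
        t_total = delays["queue"] + delays["upload"] + delays["exec"]
        e_total = energies["upload"] + energies["exec"]   # p_up*t_up + p_e*t_e
    return -(beta_t * t_total + beta_e * e_total)     # negative weighted cost
```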
Further, the LSTM layer is used to predict the temporal correlation of the edge server load levels from the matrix M(t).
Further, the first and second FC layers are used to learn the mapping from system states to the reward values of system actions; each of the two FC layers contains a set of neurons with rectified linear units.
Furthermore, the output layer outputs the reward function value corresponding to the currently selected action in the current system state.
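The wiring just described (input into the LSTM and first FC layer, LSTM output into the first FC layer, then the second FC layer and the output layer, with ReLU units in the FC layers) can be sketched compactly. The following is a minimal PyTorch sketch; the hidden widths, class and argument names, and batch layout are illustrative assumptions rather than values fixed by the patent.

```python
import torch
import torch.nn as nn

class OffloadQNet(nn.Module):
    """Q network: input layer -> LSTM + FC1 -> FC2 -> output (one Q value per action)."""

    def __init__(self, n_edge, other_dim, lstm_hidden=64, fc_hidden=128):
        super().__init__()
        # LSTM layer: consumes the T x N load-history matrix M(t), one row per past slot
        self.lstm = nn.LSTM(input_size=n_edge, hidden_size=lstm_hidden, batch_first=True)
        # First FC layer: receives the LSTM output together with the rest of the state
        self.fc1 = nn.Linear(lstm_hidden + other_dim, fc_hidden)
        # Second FC layer
        self.fc2 = nn.Linear(fc_hidden, fc_hidden)
        # Output layer: Q values for actions {0 (local), 1..N (edge servers)}
        self.out = nn.Linear(fc_hidden, n_edge + 1)
        self.relu = nn.ReLU()

    def forward(self, load_history, other_state):
        # load_history: (batch, T, N); other_state: (batch, other_dim)
        _, (h_n, _) = self.lstm(load_history)
        x = torch.cat([h_n[-1], other_state], dim=1)   # fuse LSTM summary with state
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        return self.out(x)
```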
According to another aspect of the present invention, a deep Q-learning based offloading decision system for mobile edge computing is provided, the system being configured to execute the above deep Q-learning based offloading decision method.
The beneficial effects of the present invention are as follows.
The present invention applies a deep reinforcement learning algorithm to the offloading decision problem in mobile edge computing, and designs the corresponding system states, actions, and reward equations according to the task scheduling model built in the system, comprising a local computing queue, a task transmission queue, and edge server queues. Comparing the average delay and energy consumption of the proposed method with other algorithms shows that the proposed offloading decision method greatly reduces the delay and energy consumption produced by the offloading decision process in a mobile edge computing system.
Brief Description of the Drawings
Fig. 1 is a structural diagram of the neural network constructed by the present invention;
Fig. 2 is a convergence curve of the reward value of the proposed method versus the number of iterations;
Fig. 3 is a graph of the average reward value of the proposed method and three baseline algorithms versus the number of users;
Fig. 4 is a graph of the average delay of the proposed method and three baseline algorithms versus the number of users.
Detailed Description of Embodiments
Embodiment 1: For the network scenario of multiple mobile devices and multiple servers in an MEC system, this embodiment proposes a computation offloading strategy based on deep reinforcement learning. Each mobile user is regarded as an agent. Since tasks arrive in sequence during offloading, they must be queued; the present invention therefore builds a task scheduling model with three queues: a local computing queue, a task transmission queue, and an edge-node computing queue. Delay and energy cost models are then established for the two task execution modes, and the method is designed with the goal of minimizing system cost, so that the smallest system delay and energy consumption are produced over several consecutive time slots.
Step 1. Reinforcement learning model construction:
This step details the specific implementation of task offloading decisions using DQN in the present invention, mainly covering the definitions of the system state, action, and reward equation of the Markov decision process.
1. Markov decision process construction
At the beginning of each time slot, each mobile device observes its state (e.g., task size and queue lengths). If a new task is to be processed, the mobile device selects an appropriate offloading decision for it so that the long-term cost of its task computation is minimized. Applying deep reinforcement learning to the task offloading decision problem requires constructing a Markov decision process, during which the system state, action, and reward equation must be concretely defined.
2. System state
The first piece of information relevant to the offloading decision is the characteristics of the task itself. First consider the task size λ(t). At the beginning of each time slot, mobile device m first observes the size of its own task, denoted λ(t): if a new task k(t) exists at the beginning of the current slot, then λ(t) = k(t), otherwise λ(t) = 0. Note that because new tasks are generated only at the beginning of a slot, there is no case of a task generated mid-slot for which λ(t) cannot be computed.
Another task characteristic to consider is the task's maximum acceptable delay, which is also relevant to the offloading decision and is therefore likewise added to the system state.
The time a task spends in the three queues is also related to the offloading decision and should likewise be added to the system state. Specifically:
w_m^comp(t) denotes the number of time slots the task must wait in the local computing queue;
w_m^tran(t) denotes the number of time slots the task must wait in the transmission queue;
q_{m,n}(t) denotes the queue length of mobile device m at edge node n.
Since the load level of each edge node, i.e., the number of active queues at the node, changes continuously, and the load level at the current moment is strongly correlated with that of the previous moment, an edge-node load level matrix M(t) is constructed to capture the load levels of the edge nodes over a period of time.
Specifically, the matrix M(t) represents the history of the load level of each edge node (i.e., the number of active queues at that node, at most the number of mobile devices M) over the previous T time slots (from slot t−T to slot t−1). It is a T×N matrix, where T is the length of the history window in time slots and N is the number of edge nodes. For example, {M(t)}_(i,j) is the number of active queues at edge node j in the (t−T+i−1)-th time slot.
Combining the above, the system state observed by mobile device m at the current time slot can be defined as a vector of several dimensions, denoted s_m(t), i.e.:
s_m(t) = (λ(t), τ_m, w_m^comp(t), w_m^tran(t), q_{m,1}(t), …, q_{m,N}(t), M(t))
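As an illustration, s_m(t) can be assembled each slot as below; this is a minimal numpy sketch in which the history length, edge-node count, zero-initialised history, and all names are assumptions made for the example.

```python
import numpy as np
from collections import deque

T_HIST, N_EDGE = 10, 5                       # history window T and edge-node count N (assumed)
load_history = deque([np.zeros(N_EDGE)] * T_HIST, maxlen=T_HIST)

def observe_state(task_size, max_delay, local_wait, tran_wait, edge_queues, active_queues):
    """Assemble s_m(t) = (lambda(t), tau_m, queue waits, edge queue lengths, M(t))."""
    load_history.append(np.asarray(active_queues))   # active-queue counts of slot t-1
    M_t = np.stack(load_history)                     # the T x N load matrix M(t)
    return np.concatenate([[task_size, max_delay, local_wait, tran_wait],
                           np.asarray(edge_queues), M_t.ravel()])
```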
3. System action
The decision each mobile device must make is first whether to offload to an edge server or execute locally, and then to which edge server to offload. Let 0 denote local computation and k denote the index of the edge server selected for offloading. With N edge servers in total, the system action can be expressed as a(t) = {0, 1, 2, …, N}.
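Given a trained Q network (such as the one sketched after the network-construction step), the action a(t) is typically chosen epsilon-greedily; the patent does not prescribe the exploration scheme, so the sketch below is an assumption.

```python
import random
import torch

def select_action(q_net, load_history, other_state, n_edge, epsilon=0.1):
    """Pick a(t) in {0 (local), 1..N (edge server index)}; inputs are batch-1 tensors."""
    if random.random() < epsilon:
        return random.randint(0, n_edge)             # explore uniformly
    with torch.no_grad():
        q = q_net(load_history, other_state)         # shape (1, N + 1)
    return int(q.argmax(dim=1).item())               # exploit the highest Q value
```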
4. Reward function
What most affects the application experience on a mobile device is the delay and energy consumption produced by offloading. The reward function of the present invention is therefore built around the delay and energy consumption produced during task offloading.
The delay produced by a task is considered for two cases: local computing and edge computing.
If the task is decided for local computation, the number of time slots the task waits can be calculated as:
⌈ t̂_m^l(t′) / Δ ⌉ − t′     (1)
where t̂_m^l(t′) is the moment at which the task generated in slot t′ completes local execution.
If the task is decided for edge computing, the number of time slots the task waits is computed from the moment t̂_n^e(t′) at which execution on the edge node completes, i.e.:
⌈ t̂_n^e(t′) / Δ ⌉ − t′     (2)
The energy consumed by a task is likewise considered for the two cases of local computing and edge computing.
The energy required for local computation of a task is:
E_i^l = ε_i · d_i
Energy in task edge computing is consumed mainly in two parts, task upload and task execution. Assume the power of the mobile device is p_up during task upload and p_e during task execution; then for device i:
E_i^e = p_up · t_up + p_e · t_e     (3)
Let the preference coefficients of mobile user i for delay and energy consumption be β_i^T and β_i^E, respectively. The reward function of the user in the offloading decision process is then set as:
R = −(β_i^T · T + β_i^E · E)     (4)
On the other hand, if a task is dropped because it has reached its maximum acceptable delay, the reward function at that point is defined as a fixed penalty value P, i.e.:
R = P     (5)
Step 2. Neural network construction, as shown in Fig. 1:
1. Input layer: this layer takes the state as input and passes it to the subsequent layers. For mobile device m, the state information, such as λ(t) and M(t), is passed on for prediction.
2. LSTM layer: the matrix M(t) represents the load level of each edge node over the previous T time slots, and edge-node load levels are time-dependent. The LSTM layer is therefore used to capture this temporal correlation when predicting load levels.
3. FC layers: the two FC layers learn the mapping from states to action Q values. Each FC layer contains a set of neurons with rectified linear units.
4. Output layer: the values of the output layer correspond to the Q value of the currently selected action in the current state, reflecting the overall cost of the current decision, i.e., the trade-off between delay and energy consumption.
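The Q values at the output layer are trained with the usual deep Q-learning update; the replay-batch layout, target network, and discount factor in the sketch below are standard DQN machinery assumed for illustration rather than details fixed by the patent.

```python
import torch
import torch.nn.functional as F

GAMMA = 0.99  # discount factor (assumed)

def dqn_update(q_net, target_net, optimizer, batch):
    """One gradient step on a replay batch (s, a, r, s', done)."""
    s_load, s_other, a, r, s2_load, s2_other, done = batch
    q = q_net(s_load, s_other).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a)
    with torch.no_grad():
        q_next = target_net(s2_load, s2_other).max(dim=1).values
        y = r + GAMMA * q_next * (1.0 - done)        # Bellman target: r + gamma max Q'
    loss = F.mse_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```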
Example
This example introduces the system architecture to which the present invention applies from three aspects: the network model, the task model, and the task scheduling model.
Step 1. Establishing the network model: the scenario addressed by the present invention consists of two parts: several base stations each equipped with an edge server, and several mobile devices that need to execute computation-intensive tasks. Each base station is equipped with an edge server of relatively high computing capability. It is assumed that at the beginning of each time slot, each mobile device generates, with a certain probability, an intensive task to be executed; the task is either executed locally or offloaded in its entirety to an edge server, and tasks are indivisible.
Consider a period of T time slots, T = {1, 2, 3, …, T}; the simulation of the present invention sets T = 100, with each slot of duration Δ. For the N mobile devices in the cell, at the beginning of each time slot the offloading decision variable a_i indicates whether the computing task is offloaded to an edge server for execution and, if so, to which one: if it is offloaded to the k-th edge server, a_i = k; if the task is executed locally, a_i = 0.
The tasks of the mobile devices are assumed to have the same priority. Each edge node has one CPU for processing the tasks in its queues. At the beginning of each time slot, the task queues of the mobile devices at edge node n share the processing capacity of its CPU equally.
Step 2. Establishing the task model: a task generated by a mobile device is described in two dimensions. One is the storage size of the task; here the task size λ_m(t) is set to a random value between 3 Mbit and 5 Mbit with a step of 0.1 Mbit. The other is the maximum number of time slots the task can tolerate waiting, denoted τ_m; when a task has waited longer than τ_m, it is dropped from the system.
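A sketch of this task model follows; the per-slot arrival probability is an assumption (the text only says "with a certain probability"), and the default τ_m is illustrative.

```python
import random

def generate_task(arrival_prob=0.5, tau_m=10):
    """With probability arrival_prob, return a task of 3-5 Mbit in 0.1 Mbit steps."""
    if random.random() >= arrival_prob:
        return None                                   # no new task this slot
    size_mbit = round(3.0 + 0.1 * random.randint(0, 20), 1)   # 3.0, 3.1, ..., 5.0
    return {"size_mbit": size_mbit, "max_wait_slots": tau_m}
```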
Step 3. Establishing the task scheduling model: taking the edge server side as an example, since tasks arrive in sequence, at the beginning of the next time slot the edge server may not yet have finished the tasks of the previous slot, so tasks are queued. This part builds three queue models, the local computing queue, the task transmission queue, and the edge-node computing queue, which together serve as the task scheduling model.
Step 4. Algorithm performance evaluation and analysis:
A scenario is selected to study the convergence of the algorithm: the number of users is set to 120, the number of edge servers to 5, and the algorithm is run for 1200 iterations; the average reward of the algorithm's reward function over 100 consecutive decisions is plotted. Figure 2 shows that the average reward begins to converge after about 500 iterations and gradually stabilizes in subsequent iterations, indicating that over repeated training the agent has learned a fairly stable offloading policy.
Three other baseline algorithms are selected for comparison with the DQN-based algorithm designed in the present invention. The number of mobile users is varied from 50 to 120 with the number of edge servers fixed at 5. As shown in Fig. 3, the average reward curves of the four algorithms are plotted, where the reward values of the three baselines are each averaged over 100 runs.
The simulation curves show that, as the number of users increases, the average reward of all algorithms except the all-local-computing algorithm shows a downward trend. This is because the number of edge servers in this simulation is fixed at 5 while the number of users grows, so the server resources each user can obtain become increasingly scarce. The average reward of the all-local algorithm is essentially unchanged with the number of users. When the number of users is below 90, the all-offloading algorithm outperforms the all-local algorithm because server computing resources are still relatively plentiful; as the number of users continues to grow, the all-offloading algorithm performs worse than the all-local algorithm.
The simulation curves in Fig. 4 show that, compared with the other three baseline algorithms, the DQN algorithm always achieves a lower task processing delay. Similar to the trend of the reward curves, when the number of users exceeds 90, the delay of the all-offloading algorithm begins to exceed that of the all-local algorithm because the computing resources of the edge servers become too scarce.
The above example of the present invention merely illustrates its computational model and procedure in detail and does not limit its embodiments. For those of ordinary skill in the art, other changes or variations in different forms can be made on the basis of the above description; not all embodiments can be exhaustively listed here, and any obvious change or variation derived from the technical scheme of the present invention remains within its scope of protection.
Claims (3)
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210427768.1A | 2022-04-22 | 2022-04-22 | Unloading decision method and system in mobile edge calculation based on deep Q learning |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN114706631A | 2022-07-05 |
| CN114706631B | 2022-10-25 |
Family ID: 82175067
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210427768.1A (Active) | Unloading decision method and system in mobile edge calculation based on deep Q learning | 2022-04-22 | 2022-04-22 |
Country Status (1)

| Country | Link |
|---|---|
| CN | CN114706631B (en) |
Families Citing this family (3)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115766241A | 2022-11-21 | 2023-03-07 | 西安工程大学 | Distributed intrusion detection system task scheduling and unloading method based on DQN algorithm |
| CN116909717B | 2023-09-12 | 2023-12-05 | 国能(北京)商务网络有限公司 | Task scheduling method |
| CN118656195B | 2024-08-16 | 2024-11-05 | 南京博裕物联科技有限公司 | Edge computing task scheduling method based on multi-Agent reinforcement learning |
Patent Citations (4)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111666149A | 2020-05-06 | 2020-09-15 | 西北工业大学 | Ultra-dense edge computing network mobility management method based on deep reinforcement learning |
| CN112616152A | 2020-12-08 | 2021-04-06 | 重庆邮电大学 | Independent learning-based mobile edge computing task unloading method |
| CN113067873A | 2021-03-19 | 2021-07-02 | 北京邮电大学 | Edge cloud collaborative optimization method based on deep reinforcement learning |
| CN113612843A | 2021-08-02 | 2021-11-05 | 吉林大学 | A task offloading and resource allocation method for MEC based on deep reinforcement learning |
Non-Patent Citations (1)

| Title |
|---|
| 张文献 等, "面向多用户移动边缘计算轻量任务卸载优化" (Lightweight task offloading optimization for multi-user mobile edge computing), 《小型微型计算机系统》 (Journal of Chinese Computer Systems), vol. 41, no. 10, October 2020 |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |