
CN114706631B - Offloading decision method and system in mobile edge computing based on deep Q-learning - Google Patents

Offloading decision method and system in mobile edge computing based on deep Q-learning

Info

Publication number
CN114706631B
CN114706631B
Authority
CN
China
Prior art keywords
task
layer
edge
mobile
mobile device
Prior art date
Legal status
Active
Application number
CN202210427768.1A
Other languages
Chinese (zh)
Other versions
CN114706631A (en)
Inventor
杨柱天
朱伟强
杨蔚
佟令宇
杨佳敏
陈迪
Current Assignee
Harbin Institute of Technology Shenzhen
8511 Research Institute of CASIC
Original Assignee
Harbin Institute of Technology Shenzhen
8511 Research Institute of CASIC
Priority date
Filing date
Publication date
Application filed by Harbin Institute of Technology Shenzhen and 8511 Research Institute of CASIC
Priority to CN202210427768.1A
Publication of CN114706631A
Application granted
Publication of CN114706631B


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/445Program loading or initiating
    • G06F9/44594Unloading
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

An offloading decision method and system for mobile edge computing based on deep Q-learning, belonging to the technical field of offloading decisions for mobile devices in mobile edge computing systems. The invention addresses the high latency and high energy consumption of the offloading decision process in existing mobile edge computing systems. A deep reinforcement learning algorithm is applied to the offloading decision problem in mobile edge computing, and the corresponding system state, action, and reward function are designed according to the task scheduling model established in the system, comprising a local computation queue, a task transmission queue, and an edge server queue. Comparing the average latency and energy consumption of the method against other algorithms shows that the disclosed offloading decision method greatly reduces the latency and energy consumption of the offloading decision process in a mobile edge computing system. The method is applicable to offloading decisions for mobile devices in mobile edge computing systems.

Description

Method and system for offloading decision-making in mobile edge computing based on deep Q-learning

Technical Field

The invention belongs to the technical field of offloading decision-making for mobile devices in mobile edge computing systems, and in particular relates to a method and system for offloading decision-making in mobile edge computing based on deep Q-learning.

Background

With the rapid development of 5G and Internet of Things (IoT) technologies, we have entered a new world in which everything is connected. In recent years, the number of network-capable mobile devices, such as smartphones, smart home appliances, and smart wearables, has grown explosively. At the same time, emerging applications such as virtual reality, real-time path planning, and online video processing place ever stricter demands on data transmission and computation. Finding an effective way to meet the transmission and computation needs of IoT devices is an urgent problem, and mobile edge computing has emerged as an effective solution.

Although existing mobile edge computing methods have achieved some success, the latency and energy consumption of the offloading decision process in current mobile edge computing systems remain high. It is therefore necessary to propose an offloading decision method for mobile edge computing systems that reduces the latency and energy consumption of this process.

Summary of the Invention

The purpose of the invention is to solve the problems of high latency and high energy consumption in the offloading decision process of existing mobile edge computing systems by proposing a method and system for offloading decision-making in mobile edge computing based on deep Q-learning.

The technical scheme adopted by the invention to solve the above technical problems is as follows:

According to one aspect of the invention, an offloading decision method in mobile edge computing based on deep Q-learning comprises the following steps:

Step 1: Build the reinforcement learning model

Construct the system state, system action, and reward function of the Markov decision process according to the task characteristics.

Step 2: Build the neural network

Construct a neural network comprising an input layer, an LSTM layer, a first fully connected (FC) layer, a second FC layer, and an output layer. The input layer passes the system state information to the LSTM layer and the first FC layer, and the output of the LSTM layer also serves as an input to the first FC layer.

The output of the first FC layer is then fed to the second FC layer, and the output of the second FC layer is fed to the output layer.

Further, the system state is constructed as follows:

Denote the size of the task of mobile device m at the beginning of the current time slot as λ_m(t). If a new task k(t) arrives at mobile device m at the beginning of the current time slot, then λ_m(t) = k(t); otherwise λ_m(t) = 0.

Construct a local computation queue, a task transmission queue, and an edge node computation queue. Denote the number of time slots that the task of mobile device m must wait in the local computation queue at the beginning of the current time slot as $W_m^{l}(t)$, the number of time slots it must wait in the task transmission queue as $W_m^{tr}(t)$, and the queue length of mobile device m at edge node n as $Q_{m,n}(t)$ (the original formula images are not reproduced; the notation used below is introduced in their place).

Construct a matrix M(t) representing the load level of each edge server over the T time slots preceding the current slot; M(t) has dimension T×N, where N is the number of edge servers.

The system state $s_m(t)$ observed by mobile device m at the current time slot is then:

$$s_m(t) = \big(\lambda_m(t),\; W_m^{l}(t),\; W_m^{tr}(t),\; Q_{m,1}(t),\ldots,Q_{m,N}(t),\; M(t)\big)$$

Further, the system action is expressed as a(t) = {0, 1, 2, …, N}, where 0 denotes local computation and k = 1, 2, …, N denotes the index of the edge server selected for offloading.

Further, the reward function is constructed as follows:

If the task is decided to be computed locally, the number of time slots the task waits, $W_m^{l}(t)$, is:

$$W_m^{l}(t) = \max\{\, c_m^{l}(t') - t,\; 0 \,\}$$

where $c_m^{l}(t')$ denotes the time at which the task generated in time slot t′ finishes executing locally.

The energy required for local computation of the task, $E_m^{l}$, is:

$$E_m^{l} = \varepsilon_m\, d_m$$

where ε_m is the energy consumption coefficient of the CPU of mobile device m for local computation, i.e., the energy consumed by the local CPU per computation cycle, and d_m is the computational load of the task currently generated by mobile device m, i.e., the number of CPU cycles required to execute it.

Let the preference coefficients of mobile user m for latency and energy consumption be $\beta_m^{T}$ and $\beta_m^{E}$, respectively. The reward function of mobile user m in the offloading decision process is then:

$$R = -\big(\beta_m^{T}\, T + \beta_m^{E}\, E\big)$$

where R is the reward value, T is the total latency incurred when mobile user m computes locally, i.e., T equals the number of queued waiting time slots $W_m^{l}(t)$ plus the latency of local execution, and E is the total energy consumed by mobile user m, i.e., $E = E_m^{l}$.

Further, for edge execution the reward function is constructed as follows:

If the task is decided to be computed at the edge, the number of time slots the task waits is computed from the time $c_m^{n}(t')$ at which execution on edge server n completes, i.e., the number of waiting slots is $\max\{\, c_m^{n}(t') - t,\; 0 \,\}$.

The energy required for edge computation of a task comprises task upload and task execution. Denote the power of the mobile device during task upload as $p_{up}$ and its power during task execution as $p_e$. For mobile device m, the required energy $E_m^{e}$ is:

$$E_m^{e} = p_{up}\, t_{n,up} + p_e\, t_{n,e}$$

where $t_{n,up}$ is the time consumed by mobile device m to upload the task to edge server n, and $t_{n,e}$ is the time consumed to execute the task of mobile device m on edge server n.

In this case, the reward function of the user in the offloading decision process is:

$$R = -\big(\beta_m^{T}\, T + \beta_m^{E}\, E\big)$$

where R is the reward value, T is the total latency: the task queuing delay plus the upload delay $t_{n,up}$ to edge server n and the execution delay $t_{n,e}$ on edge server n, and E is the total energy consumed by edge computation, i.e., $E = E_m^{e}$.

Further, the reward function provides that if the maximum delay allowed for the task is reached before its execution completes, the task is dropped, and the reward value R at that point is set to a fixed penalty value P.
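The reward construction above can be summarized in a short Python sketch. The weighted-sum form of R and all function and parameter names below are illustrative assumptions; the patent's exact formula images are not reproduced.

```python
# Minimal sketch of the reward described above. The weighted-sum form
# R = -(beta_T*T + beta_E*E) and all names here are illustrative
# assumptions, not the patent's literal formulas.

def reward(action, wait_slots, exec_time, upload_time, beta_T, beta_E,
           eps_m=None, d_m=None, p_up=None, p_e=None,
           dropped=False, penalty=-100.0):
    """Return the reward for one offloading decision.

    action == 0    -> local computation
    action in 1..N -> offload to edge server `action`
    """
    if dropped:                      # max allowed delay exceeded: fixed penalty P
        return penalty
    if action == 0:                  # local: T = queueing + execution, E = eps_m * d_m
        T = wait_slots + exec_time
        E = eps_m * d_m
    else:                            # edge: T adds the upload delay,
        T = wait_slots + upload_time + exec_time
        E = p_up * upload_time + p_e * exec_time   # upload + execution energy
    return -(beta_T * T + beta_E * E)
```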

Further, the LSTM layer predicts the temporal correlation of the edge server load levels from the matrix M(t).

Further, the first and second FC layers learn the mapping from system states to the reward values of system actions; each FC layer comprises a set of neurons with rectified linear units.

Furthermore, the output layer outputs the reward function value corresponding to the currently selected action in the current system state.
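A sketch of this network in Python (PyTorch) follows. Layer widths and the scheme for concatenating the LSTM output with the rest of the state are assumptions for illustration; the patent does not specify them.

```python
import torch
import torch.nn as nn

class OffloadDQN(nn.Module):
    """Sketch of the described network: the load-history matrix M(t) goes
    through an LSTM; its last hidden state is concatenated with the rest of
    the state and passed through two ReLU FC layers to per-action Q values.
    Hidden sizes are illustrative assumptions."""

    def __init__(self, n_servers, state_dim, hidden=128, lstm_hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_servers, hidden_size=lstm_hidden,
                            batch_first=True)
        self.fc1 = nn.Linear(state_dim + lstm_hidden, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, n_servers + 1)  # action 0 = local, 1..N = servers

    def forward(self, state_vec, load_history):
        # load_history: (batch, T, N) matrix M(t); state_vec: (batch, state_dim)
        _, (h_n, _) = self.lstm(load_history)        # h_n: (1, batch, lstm_hidden)
        x = torch.cat([state_vec, h_n.squeeze(0)], dim=1)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.out(x)                           # one Q value per action
```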

According to another aspect of the invention, an offloading decision system in mobile edge computing based on deep Q-learning is provided; the system executes the above offloading decision method in mobile edge computing based on deep Q-learning.

The beneficial effects of the invention are as follows:

The invention applies a deep reinforcement learning algorithm to the offloading decision problem in mobile edge computing, and designs the corresponding system state, action, and reward function according to the task scheduling model established in the system, comprising a local computation queue, a task transmission queue, and an edge server queue. Comparing the average latency and energy consumption of the method against other algorithms shows that the offloading decision method of the invention greatly reduces the latency and energy consumption of the offloading decision process in a mobile edge computing system.

Brief Description of the Drawings

Fig. 1 is a structural diagram of the neural network constructed by the invention;

Fig. 2 shows the convergence curve of the reward value of the method over training iterations;

Fig. 3 plots the average reward of the method and three baseline algorithms as the number of users varies;

Fig. 4 plots the average latency of the method and three baseline algorithms as the number of users varies.

Detailed Description

Embodiment 1: This embodiment proposes a computation offloading strategy based on deep reinforcement learning for the MEC network scenario of multiple mobile devices and multiple servers. Each mobile user is treated as an agent. Because tasks arrive at different times during offloading, they must be queued; the invention constructs a task scheduling model with three queues: a local computation queue, a task transmission queue, and an edge node computation queue. Latency and energy cost models are then established for the two task execution modes, and the method is designed to minimize system cost, i.e., to produce the least system latency and energy consumption over several consecutive time slots.

Step 1: Reinforcement learning model construction:

This step details the specific implementation of task offloading decisions using a DQN, mainly including the definition of the system state, actions, and reward function of the Markov decision process.

1. Markov decision process construction

At the beginning of each time slot, each mobile device observes its state (e.g., task size, queue lengths). If there is a new task to process, the mobile device selects an appropriate offloading decision for it so as to minimize the long-term cost of its task computation. Applying deep reinforcement learning to the task offloading decision problem requires constructing a Markov decision process, in which the system state, actions, and reward function must be specifically defined.
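For concreteness, the standard DQN machinery that such a formulation plugs into is sketched below: an experience replay buffer and the Q-learning target r + γ·max Q_target(s′, a′). These are generic components assumed here, not quoted from the patent; for brevity the state is treated as a single flattened tensor (the two-part state used by the network sketched earlier would be packed into one).

```python
import random
from collections import deque

import torch
import torch.nn.functional as F

# Generic DQN machinery (assumed, not quoted from the patent): a replay
# buffer of (s, a, r, s', done) transitions and the Q-learning update.
replay_buffer = deque(maxlen=10_000)
GAMMA = 0.95  # discount factor, illustrative value

def dqn_update(q_net, target_net, optimizer, batch_size=32):
    if len(replay_buffer) < batch_size:
        return
    batch = random.sample(replay_buffer, batch_size)
    s, a, r, s_next, done = map(torch.stack, zip(*batch))
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a)
    with torch.no_grad():
        # Bellman target: r + gamma * max_a' Q_target(s', a'); 0 if terminal
        target = r + GAMMA * target_net(s_next).max(1).values * (1 - done.float())
    loss = F.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```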

2. System state settings

The first piece of information relevant to the offloading decision is the task itself. First consider the task size λ(t). At the beginning of each time slot, mobile device m observes the size of its own task, denoted λ(t): if a new task k(t) exists at the beginning of the current slot, λ(t) = k(t); otherwise λ(t) = 0. Note that because new tasks are generated only at the beginning of a slot, a task can never appear mid-slot where λ(t) could not be computed.

Another task characteristic that must be considered is the maximum acceptable delay of the task, which is also relevant to the offloading decision and is therefore added to the system state as well.

The waiting times of tasks in the three queues are also relevant to the offloading decision and should likewise be added to the system state. Specifically:

$W_m^{l}(t)$: the number of time slots the task must wait in the local computation queue;

$W_m^{tr}(t)$: the number of time slots the task must wait in the transmission queue;

$Q_{m,n}(t)$: the queue length of mobile device m at edge node n.

The load level of each edge node, i.e., the number of active queues at the node, changes constantly, and the load level at the current moment is strongly correlated with that at the previous moment. We therefore construct an edge node load level matrix M(t) to capture the load level of the edge nodes over a period of time.

Specifically, the matrix M(t) records the load level of each edge node (i.e., the number of active queues at that node, at most the number of mobile devices M) over the previous T time slots (from slot t−T to slot t−1). It is a T×N matrix, where T is the history length in time slots and N is the number of edge nodes. For example, {M(t)}(i,j) is the number of active queues at edge node j in time slot t−T+i−1.
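Maintaining this history matrix can be sketched as follows; the function providing the per-node active-queue counts each slot is hypothetical, and the sizes are illustrative.

```python
import numpy as np

T_HIST, N_NODES = 10, 5   # illustrative history length and node count

# M holds the last T_HIST slots of per-node active-queue counts;
# row i corresponds to slot t - T_HIST + i (row 0 is the oldest).
M = np.zeros((T_HIST, N_NODES), dtype=np.int64)

def update_load_history(M, active_queue_counts):
    """Shift the history up one slot and append the newest load row.
    `active_queue_counts` is a length-N vector for the slot just ended."""
    M = np.roll(M, shift=-1, axis=0)
    M[-1] = active_queue_counts
    return M
```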

In summary, the system state observed by mobile device m at the current time slot can be defined as a vector of several dimensions, denoted s_m(t), i.e.,

$$s_m(t) = \big(\lambda_m(t),\; \tau_m,\; W_m^{l}(t),\; W_m^{tr}(t),\; Q_{m,1}(t),\ldots,Q_{m,N}(t),\; M(t)\big)$$

3. System action

Each mobile device must first decide whether to execute locally or offload to an edge server, and, if offloading, to which server. Let 0 denote local computation and k denote the index of the chosen edge server. With N edge servers in total, the system action can be expressed as a(t) = {0, 1, 2, …, N}.
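In a DQN, this action is typically chosen epsilon-greedily over the predicted Q values. The sketch below assumes the OffloadDQN network sketched earlier; the exploration rate is an illustrative parameter.

```python
import random
import torch

def select_action(q_net, state_vec, load_history, n_servers, epsilon=0.1):
    """Epsilon-greedy offloading decision: 0 = local, 1..N = edge server index."""
    if random.random() < epsilon:                     # explore
        return random.randint(0, n_servers)
    with torch.no_grad():                             # exploit: argmax_a Q(s, a)
        q = q_net(state_vec.unsqueeze(0), load_history.unsqueeze(0))
        return int(q.argmax(dim=1).item())
```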

4. Reward function

What most affects the application experience on a mobile device is the latency and energy consumption caused by offloading. The reward function of the invention is therefore built around the latency and energy consumed during task offloading.

Task latency is considered for the two cases of local computation and edge computation.

If the task is decided to be computed locally, the number of time slots it waits, $W_m^{l}(t)$, is computed as:

$$W_m^{l}(t) = \max\{\, c_m^{l}(t') - t,\; 0 \,\}$$

where $c_m^{l}(t')$ is the local completion time of the task generated in slot t′.

If the task is decided to be computed at the edge, the number of time slots it waits is computed from the time $c_m^{n}(t')$ at which execution at the edge node completes, i.e., $\max\{\, c_m^{n}(t') - t,\; 0 \,\}$.

Task energy consumption is likewise considered for the two cases of local computation and edge computation.

The energy required for local computation of the task is $E_m^{l} = \varepsilon_m\, d_m$.

In edge computation, energy is consumed mainly in task upload and task execution. Let the power of the mobile device be $p_{up}$ during task upload and $p_e$ during task execution. Then for mobile device m:

$$E_m^{e} = p_{up}\, t_{n,up} + p_e\, t_{n,e}$$

Let the preference coefficients of mobile user m for latency and energy be $\beta_m^{T}$ and $\beta_m^{E}$, respectively. The reward function of the user in the offloading decision process is then set to:

$$R = -\big(\beta_m^{T}\, T + \beta_m^{E}\, E\big)$$

On the other hand, if a task is dropped because it has reached its maximum acceptable delay, the reward at that point is defined as a fixed penalty value P, i.e.,

R = P    (5)

Step 2: Neural network construction, as shown in Fig. 1:

1. Input layer: this layer takes the state as input and passes it to the following layers. For mobile device m, the state information λ(t), $W_m^{l}(t)$, $W_m^{tr}(t)$, $Q_{m,n}(t)$, and M(t) is passed on for prediction.

2. LSTM layer: the matrix M(t) represents the load level of each edge node over the previous T time slots, and these load levels are time-dependent. An LSTM layer is therefore used to capture the temporal correlation of the load levels.

3. FC layers: the two FC layers learn the mapping from states to action Q values. Each FC layer contains a set of neurons with rectified linear units.

4. Output layer: the output layer gives the Q value of taking the currently selected action in the current state, reflecting the overall cost of the decision, i.e., the trade-off between latency and energy consumption.
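Putting the pieces together, a training loop for this design might look as follows. The environment object `env` (wrapping the queues, task model, and reward described in this embodiment) and all hyperparameters are assumptions; the one-step update stands in for the minibatch replay update sketched earlier.

```python
import torch

# Illustrative end-to-end loop; `env` is a hypothetical MEC simulator
# exposing reset()/step() over the queues, tasks, and reward above.
q_net = OffloadDQN(n_servers=5, state_dim=8)        # sizes are assumptions
target_net = OffloadDQN(n_servers=5, state_dim=8)
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma, epsilon = 0.95, 0.1

for episode in range(1200):                         # 1200 iterations, as in the example
    (vec, hist), done = env.reset(), False
    while not done:
        a = select_action(q_net, vec, hist, n_servers=5, epsilon=epsilon)
        (nvec, nhist), r, done = env.step(a)        # r is the reward defined above

        # One-step Q-learning update (a replay minibatch would be used in practice).
        q = q_net(vec.unsqueeze(0), hist.unsqueeze(0))[0, a]
        with torch.no_grad():
            best_next = target_net(nvec.unsqueeze(0), nhist.unsqueeze(0)).max()
        target = r if done else r + gamma * best_next
        loss = (q - target) ** 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        vec, hist = nvec, nhist

    if episode % 20 == 0:                           # periodically sync the target network
        target_net.load_state_dict(q_net.state_dict())
```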

Example

This example introduces the system architecture to which the invention applies from three aspects: the network model, the task model, and the task scheduling model.

Step 1: Network model. The targeted scenario consists of two parts: several base stations, each equipped with an edge server of relatively high computing power, and several mobile devices that need to execute computation-intensive tasks. At the beginning of each time slot, each mobile device generates, with a certain probability, an intensive task that must be executed; the task is either executed entirely locally or offloaded entirely to an edge server, and tasks are indivisible.

Consider a period of T time slots, T = {1, 2, 3, …, T}; the simulation sets T = 100, with each slot of duration Δ. For the mobile devices in the cell, at the beginning of each time slot the offloading decision variable a_i indicates whether the computing task is offloaded to an edge server and, if so, to which one: a_i = k if the task is offloaded to the k-th edge server, and a_i = 0 if it is executed locally.

Tasks from different mobile devices are assumed to have the same priority. Each edge node has one CPU for processing the tasks in its queues; at the beginning of each time slot, the task queues of the mobile devices at edge node n share the processing capacity of that node's CPU equally.

Step 2: Task model. A task generated by a mobile device has two attributes. One is its size: here the task size λ_m(t) is a random value between 3 Mbit and 5 Mbit with a step of 0.1 Mbit. The other is the maximum number of waiting slots the task can tolerate, denoted τ_m; if a task waits longer than τ_m, it is dropped by the system.
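A sketch of this task model in Python; the arrival probability and the default maximum waiting time are assumed values, since the text states only that tasks arrive with a certain probability.

```python
import random

def generate_task(arrival_prob=0.6, max_wait_slots=10):
    """Return (size_mbit, max_wait_slots) or None if no task arrives.
    arrival_prob and max_wait_slots are illustrative assumptions; the
    size follows the stated 3-5 Mbit range with 0.1 Mbit steps."""
    if random.random() >= arrival_prob:
        return None
    size = random.randrange(30, 51) / 10.0   # 3.0, 3.1, ..., 5.0 Mbit
    return size, max_wait_slots
```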

Step 3: Task scheduling model. Taking the edge server side as an example, because tasks arrive at different times, at the start of the next slot an edge server may not yet have finished the tasks of the previous slot, so tasks must queue. This part builds three queue models: the local computation queue, the task transmission queue, and the edge node computation queue, which together form the task scheduling model.
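A minimal sketch of these three queues follows; the dispatch and load-level logic mirrors the description above, while service-rate details are left out since they are not specified here.

```python
from collections import deque

class SchedulingModel:
    """Three FIFO queue families as described: local compute, transmission,
    and per-device queues at each edge node. Structure only; service
    rates are an assumption left to the simulator."""

    def __init__(self, n_devices, n_servers):
        self.local = [deque() for _ in range(n_devices)]
        self.transmit = [deque() for _ in range(n_devices)]
        # edge[n][m]: queue of device m's tasks at edge node n
        self.edge = [[deque() for _ in range(n_devices)] for _ in range(n_servers)]

    def dispatch(self, device, task, action):
        if action == 0:
            self.local[device].append(task)                   # compute locally
        else:
            self.transmit[device].append((task, action - 1))  # upload, then queue at node

    def active_queues(self, node):
        """Load level of an edge node = number of non-empty device queues."""
        return sum(1 for q in self.edge[node] if q)
```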

Step 4: Algorithm performance evaluation and analysis:

To study the convergence of the algorithm, a scenario with 120 users and 5 edge servers is selected, the algorithm is run for 1200 iterations, and the reward is plotted as the average over every 100 consecutive decisions. As Fig. 2 shows, the average reward begins to converge after roughly 500 iterations and then gradually stabilizes, indicating that through repeated training the agents have learned a fairly stable offloading policy.

Three baseline algorithms are compared with the DQN-based algorithm designed here, with the number of mobile users varied from 50 to 120 and the number of edge servers fixed at 5. As shown in Fig. 3, the average reward curves of the four algorithms are plotted, with the rewards of the three baselines each averaged over 100 runs.

The simulation curves show that, as the number of users increases, the average reward of all algorithms except the all-local algorithm declines. This is because the number of edge servers is fixed at 5 while the number of users grows, so the server resources available per user become increasingly scarce. The average reward of the all-local algorithm is essentially independent of the number of users. When there are fewer than 90 users, the all-offloading algorithm outperforms the all-local algorithm because server computing resources are still ample; as the number of users grows further, the all-offloading algorithm performs worse than the all-local algorithm.

The simulation curves in Fig. 4 show that the DQN algorithm consistently achieves lower task processing latency than the other three baselines. Similar to the trend of the reward curves, when the number of users exceeds 90, the latency of the all-offloading algorithm begins to exceed that of the all-local algorithm because the computing resources of the edge servers become too scarce.

The above example merely illustrates the computational model and workflow of the invention in detail and does not limit its embodiments. For those of ordinary skill in the art, other variations or modifications can be made on the basis of the above description; not all embodiments can be exhaustively listed here, and any obvious variation or modification derived from the technical solution of the invention remains within the scope of protection of the invention.

Claims (3)

1. An offloading decision method in mobile edge computing based on deep Q-learning, characterized by comprising the following steps:
step one: building the reinforcement learning model
constructing a system state, a system action, and a reward function in the Markov decision process according to the task characteristics;
the system state is constructed in the following manner:
denote the task size of mobile device m at the beginning of the current time slot t as λ_m(t); if a new task k(t) exists for mobile device m when the current time slot t starts, λ_m(t) = k(t), otherwise λ_m(t) = 0, m = 1, 2, …, M_0, where M_0 represents the total number of mobile devices;
construct a local computation queue, a task transmission queue, and an edge node computation queue; denote the number of time slots the task of mobile device m must wait in the local computation queue at the beginning of the current time slot t as $W_m^{l}(t)$, the number of time slots it must wait in the task transmission queue as $W_m^{tr}(t)$, and the queue length of mobile device m at edge node n as $Q_{m,n}(t)$;
construct a matrix M(t) representing the load level of each edge server in the T time slots preceding the current time slot t, the dimension of M(t) being T×N, where N is the number of edge servers and T ≤ T_0, T_0 representing the total number of time slots;
the system state $s_m(t)$ observed by mobile device m at the current time slot t is:

$$s_m(t) = \big(\lambda_m(t),\; W_m^{l}(t),\; W_m^{tr}(t),\; Q_{m,1}(t),\ldots,Q_{m,N}(t),\; M(t)\big)$$
the system action is expressed as a(t) = {0, 1, 2, …, N}, where 0 denotes local computation and k = 1, 2, …, N denotes the index of the edge server selected for offloading;
the reward function is constructed in the following manner:
if the task is decided to be computed locally, the number of time slots the task waits, $W_m^{l}(t)$, is:

$$W_m^{l}(t) = \max\{\, c_m^{l}(t') - t,\; 0 \,\}$$

where $c_m^{l}(t')$ indicates the time at which the task generated in time slot t′ finishes executing locally;
the energy required for local computation of the task, $E_m^{l}$, is:

$$E_m^{l} = \varepsilon_m\, d_m$$

where ε_m represents the energy consumption coefficient of the CPU of mobile device m during local computation, i.e., the energy consumed by the local CPU per computation cycle, and d_m represents the computational load of the task currently generated by mobile device m, i.e., the number of CPU cycles needed to execute it;
setting the preference coefficients of mobile user m for latency and energy consumption as $\beta_m^{T}$ and $\beta_m^{E}$, the reward function of mobile user m in the offloading decision process is:

$$R = -\big(\beta_m^{T}\, T + \beta_m^{E}\, E\big)$$

wherein R is the reward value, T is the total latency generated when mobile user m computes locally, i.e., T equals the number of queued waiting time slots $W_m^{l}(t)$ plus the latency of local execution, and E is the total energy consumption of mobile user m, i.e., $E = E_m^{l}$;
if the task is decided to be computed at the edge, the number of time slots the task waits is computed from the time $c_m^{n}(t')$ at which execution on edge server n completes, i.e., the number of waiting slots is $\max\{\, c_m^{n}(t') - t,\; 0 \,\}$, n = 1, 2, …, n_0, where n_0 represents the total number of edge servers;
the energy required in task edge computing comprises task upload and task execution; denoting the power of the mobile device during task upload as $p_{up}$ and its power during task execution as $p_e$, the energy required by mobile device m, $E_m^{e}$, is:

$$E_m^{e} = p_{up}\, t_{n,up} + p_e\, t_{n,e}$$

where $t_{n,up}$ represents the time consumed by mobile device m to upload the task to edge server n, and $t_{n,e}$ represents the time consumed to execute the task of mobile device m on edge server n;
in this case, the reward function of the user in the offloading decision process is:

$$R = -\big(\beta_m^{T}\, T + \beta_m^{E}\, E\big)$$

wherein R is the reward value, T is the total latency, i.e., the task queuing delay plus the upload delay $t_{n,up}$ to edge server n and the execution delay $t_{n,e}$ on edge server n, and E is the total energy consumption of edge computation, i.e., $E = E_m^{e}$;
if the maximum delay allowed for the task is reached before its execution completes, the task is dropped and the reward value R is set to a fixed penalty value P;
step two: building the neural network
constructing a neural network comprising an input layer, an LSTM layer, a first FC layer, a second FC layer, and an output layer, wherein the input layer passes the system state information to the LSTM layer and the first FC layer, and the output of the LSTM layer serves as an input of the first FC layer;
the output of the first FC layer then serves as the input of the second FC layer, and the output of the second FC layer serves as the input of the output layer;
the LSTM layer is configured to predict the temporal correlation of the edge server load levels from the matrix M(t);
the first FC layer and the second FC layer are configured to learn the mapping from system states to system-action reward values, each comprising a set of neurons with rectified linear units.
2. The offloading decision method in mobile edge computing based on deep Q-learning of claim 1, wherein the output layer outputs the reward function value corresponding to the currently selected action in the current system state.
3. An offloading decision system in mobile edge computing based on deep Q-learning, characterized in that the system executes the offloading decision method in mobile edge computing based on deep Q-learning of claim 1 or claim 2.
CN202210427768.1A 2022-04-22 2022-04-22 Offloading decision method and system in mobile edge computing based on deep Q-learning Active CN114706631B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210427768.1A CN114706631B (en) 2022-04-22 2022-04-22 Offloading decision method and system in mobile edge computing based on deep Q-learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210427768.1A CN114706631B (en) 2022-04-22 2022-04-22 Offloading decision method and system in mobile edge computing based on deep Q-learning

Publications (2)

Publication Number Publication Date
CN114706631A CN114706631A (en) 2022-07-05
CN114706631B true CN114706631B (en) 2022-10-25

Family

ID=82175067

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210427768.1A Active CN114706631B (en) 2022-04-22 2022-04-22 Unloading decision method and system in mobile edge calculation based on deep Q learning

Country Status (1)

Country Link
CN (1) CN114706631B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115766241A (en) * 2022-11-21 2023-03-07 西安工程大学 Distributed intrusion detection system task scheduling and unloading method based on DQN algorithm
CN116909717B (en) * 2023-09-12 2023-12-05 国能(北京)商务网络有限公司 Task scheduling method
CN118656195B (en) * 2024-08-16 2024-11-05 南京博裕物联科技有限公司 Edge computing task scheduling method based on multi-Agent reinforcement learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666149A (en) * 2020-05-06 2020-09-15 西北工业大学 Ultra-dense edge computing network mobility management method based on deep reinforcement learning
CN112616152A (en) * 2020-12-08 2021-04-06 重庆邮电大学 Independent learning-based mobile edge computing task unloading method
CN113067873A (en) * 2021-03-19 2021-07-02 北京邮电大学 Edge cloud collaborative optimization method based on deep reinforcement learning
CN113612843A (en) * 2021-08-02 2021-11-05 吉林大学 A task offloading and resource allocation method for MEC based on deep reinforcement learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666149A (en) * 2020-05-06 2020-09-15 西北工业大学 Ultra-dense edge computing network mobility management method based on deep reinforcement learning
CN112616152A (en) * 2020-12-08 2021-04-06 重庆邮电大学 Independent learning-based mobile edge computing task unloading method
CN113067873A (en) * 2021-03-19 2021-07-02 北京邮电大学 Edge cloud collaborative optimization method based on deep reinforcement learning
CN113612843A (en) * 2021-08-02 2021-11-05 吉林大学 A task offloading and resource allocation method for MEC based on deep reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张文献 et al., "面向多用户移动边缘计算轻量任务卸载优化" (Lightweight task offloading optimization for multi-user mobile edge computing), 《小型微型计算机系统》 (Journal of Chinese Computer Systems), vol. 41, no. 10, Oct. 31, 2020. *

Also Published As

Publication number Publication date
CN114706631A (en) 2022-07-05

Similar Documents

Publication Publication Date Title
CN114706631B (en) Offloading decision method and system in mobile edge computing based on deep Q-learning
CN113612843B (en) MEC task unloading and resource allocation method based on deep reinforcement learning
CN113573324B (en) A Joint Optimization Approach for Collaborative Task Offloading and Resource Allocation in Industrial Internet of Things
CN108920280B (en) Mobile edge computing task unloading method under single-user scene
CN112118601B (en) A method to reduce the task offloading delay of 6G digital twin edge computing network
WO2023040022A1 (en) Computing and network collaboration-based distributed computation offloading method in random network
CN111800828A (en) A mobile edge computing resource allocation method for ultra-dense networks
CN114285853B (en) Task unloading method based on end edge cloud cooperation in equipment-intensive industrial Internet of things
CN117194057B (en) A resource scheduling method based on reinforcement learning to optimize edge energy consumption and load
CN113626104B (en) Multi-objective optimization offloading strategy based on deep reinforcement learning under edge cloud architecture
Nath et al. Multi-user multi-channel computation offloading and resource allocation for mobile edge computing
CN113946423A (en) Multi-task edge computing scheduling optimization method based on graph attention network
CN116489712B (en) A mobile edge computing task offloading method based on deep reinforcement learning
CN114860337A (en) Computing unloading method based on meta reinforcement learning algorithm
CN114205353A (en) Calculation unloading method based on hybrid action space reinforcement learning algorithm
CN118733143A (en) A task offloading method based on Lyapunov and deep reinforcement learning
CN116866353B (en) Distributed resource collaborative scheduling method, device, equipment and medium for integrated computing
Tang et al. Divisible task offloading for multiuser multiserver mobile edge computing systems based on deep reinforcement learning
CN116431326A (en) Multi-user dependency task unloading method based on edge calculation and deep reinforcement learning
CN116467005A (en) Distributed task offloading method, device and storage medium based on reinforcement learning
CN116204319A (en) Cloud-edge-device collaborative offloading method and system based on SAC algorithm and task dependencies
CN115118728A (en) A Task Scheduling Method for Edge Load Balancing Based on Ant Colony Algorithm
Chen et al. Multi-agent deep reinforcement learning for collaborative task offloading in mobile edge computing networks
CN110768827B (en) Task unloading method based on group intelligent algorithm
CN118055160A (en) System and method for distributing tasks of edge computing server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant