CN114339879A - A service migration method based on reinforcement learning in mobile edge computing - Google Patents
- Publication number
- CN114339879A
- Authority
- CN
- China
- Prior art keywords
- user
- consumption
- migration
- state
- link
- Prior art date
- 2021-12-08
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Mobile Radio Communication Systems (AREA)
Abstract
Description
Technical Field

The present invention belongs to the field of mobile edge computing, and in particular relates to a service migration method based on reinforcement learning in mobile edge computing.
Background Art

Mobile edge computing provides better service to users by pushing computing resources down to nodes closer to the user, thereby improving the system's quality of service (QoS) and the user's quality of experience (QoE). The emergence of mobile edge computing has driven the development of the Internet of Things (IoT), 5G, and personalized operator services, and it is widely applied in fields such as augmented reality (AR), video optimization and acceleration, video stream analysis, the IoT, and the Internet of Vehicles. The central problem in mobile edge computing is task offloading, which comprises three aspects: the offloading decision, resource allocation, and mobility management. The mobility management problem arises from user mobility, and an effective way to address it is service migration. By moving a service running on a server far from the user to a server closer to the user, service migration greatly reduces latency and provides better service to the user.

Existing service migration strategies are mainly realized through Markov decision processes, time-window techniques, and prediction techniques. Because service migration is a problem caused by user mobility, it usually involves long-term optimization, so the service migration problem can be modeled and solved as a sequential decision problem; the Markov decision process, as the classical formalization of sequential decision problems, can be used to study service migration. Time-window and prediction techniques can predict future energy consumption well and thereby find an optimal service placement strategy, so they can also be used to solve the service migration problem.

Although service migration has been studied extensively with the above methods, studies based on Markov decision processes often give insufficient consideration to environmental factors, rarely account for users' actual movement characteristics, and seldom address how the migration is carried out once the migration decision has been made. It is therefore crucial to make migration decisions that take multiple environmental factors into account and to choose an appropriate migration path.
A prior art search found application publication No. CN110347495A, a task migration method for mobile edge computing using deep reinforcement learning: the parameters of the system model are first set, the decision formulas of reinforcement learning are then described, and a task migration algorithm is given based on those formulas. The method yields an efficient task migration mechanism, which improves the real-time performance of the system, makes full use of computing resources, and reduces energy consumption. It also applies deep reinforcement learning to task scheduling, i.e., deciding whether to migrate a computing task, and in particular uses a Markov decision process to give a near-optimal solution in a very short time with strong real-time performance; the method is suited to deciding whether to switch the serving base station when the user is moving at high speed. That patent solves the task migration problem in mobile edge computing with a deep reinforcement learning algorithm, but because of the uncertainty of the movement state it often cannot cover all of the user's movement trajectories. The present invention predicts the user's next movement direction from the previous one to build a user mobility model, and combines it with the migrate-or-not decision to build a state transition matrix that covers all possible user movement states, so that a migration decision problem closer to real scenarios can be solved; in addition, the present invention also uses a reinforcement learning algorithm to solve the migration path selection problem.

Application publication No. CN110830560A describes a multi-user mobile edge computing migration method based on reinforcement learning, comprising the following steps: the mobile device first determines states such as the current workload arrival rate, renewable energy, and battery level; it then accesses the action-state value matrix and, following an ε-greedy policy, decides how much of the task to process locally and takes the corresponding action; it next computes a reward reflecting how good the current action is and uses it to update the action-state value matrix; finally it computes the total cost of the mobile device (including delay cost and computation cost). That invention applies reinforcement learning to mobile edge computing, one of the key 5G technologies, and exploits the model-free nature of Q-learning to formulate a task allocation strategy that significantly reduces the cost of mobile devices. It solves a multi-user task migration problem through reinforcement learning, mainly to optimize the long-term system cost during task offloading. This differs from the present invention, which solves the problem of making migration decisions for different locations and different movement directions and, in addition, solves the migration path selection problem.
Summary of the Invention

The present invention aims to solve the service migration problem in existing mobile edge computing. It proposes a migration decision-making model that jointly considers the influence of multiple environmental factors and mobility prediction and solves it with a value iteration algorithm, and it uses a reinforcement learning algorithm to select an adaptive migration path in a dynamic network environment. The technical scheme of the present invention is as follows:

A service migration method based on reinforcement learning in mobile edge computing, comprising the following steps:

S1, constructing a reward function from the location of the server hosting the user's service, the user's current location, and the load of the server currently processing the task;

S2, constructing a state transition matrix from the user's current location, the previous movement direction, and whether migration occurs;

S3, using a value iteration algorithm, based on the reward function and the state transition matrix, to make migration decisions;

S4, assigning a link cost by normalizing the delay consumption and network consumption between routers;

S5, based on the normalized link cost, using a reinforcement learning algorithm to select the path and adaptively updating the link selection to adapt to link changes in the dynamic network.
Further, constructing the reward function in S1 from the location of the server hosting the user's service, the user's current location, and the load of the user's current server specifically comprises:

(1) constructing a user service satisfaction function from the distance d_t between the user and the server processing the task and the load h_t of that server;

(2) constructing a migration cost function from the distance d_t between the user and the server processing the task;

(3) using the weighted sum of the service satisfaction function and the migration cost function as the reward function.
Further, in (1) the user satisfaction c_1(s_t, a_t) is constructed from the distance between the user and the server processing the task and the load of that server, with the specific formula:

$$c_1(s_t, a_t) = D - \mu_1 d_t - \mu_2 h_t$$

where D is the maximum service satisfaction the user can obtain, d_t is the user's distance from the server processing the task at time t, h_t is the load of that server at time t, and μ_1 and μ_2 are proportional coefficients expressing how strongly distance and load affect the user's service satisfaction; d_t is obtained as the Euclidean distance between the user's current position l_t = (x_t, y_t) and the position l_s = (x_s, y_s) of the server processing the task;
(2) the migration cost function c_2(s_t, a_t) is constructed from the distance d_t between the user and the server processing the task:

$$c_2(s_t, a_t) = \mu_3 + \mu_4 d_t$$

where the migration cost is a linear function of the distance d_t, μ_3 is a constant cost, and μ_4 is the coefficient of the distance term;
(3) the weighted sum of the user service satisfaction function and the migration cost function is used as the reward function r(s, a):

$$r(s_t, a_t) = \begin{cases} c_1(s_t, a_t) - a_t\, c_2(s_t, a_t), & d_t \le d_{max} \\ -M, & d_t > d_{max} \end{cases}$$

where a is the migration decision, a = 0 meaning no migration and a = 1 meaning migration; d_max is the maximum distance allowed for processing the service, beyond which a large penalty M is incurred.
Further, constructing the state transition matrix in S2 from the user's current location, the previous movement direction, and whether migration occurs mainly comprises:

(1) converting the user's location into a position relative to the server and recording the user's movement direction at the previous moment;

(2) different movement directions affect the user's subsequent trajectory; in the user mobility model the user keeps the same direction with a large probability and changes direction with a small probability;

(3) determining the user's state at the next moment based on the user's mobility model and the migrate-or-not decision.
Further, in (1) the user's movement direction z_t at the previous moment is recorded, and the user's current state is represented by the current position l_t together with the previous movement direction z_t, i.e., s_t = (x_t, y_t, z_t);

(2) the different movement directions z_t affect the user's subsequent trajectory: in the next time slot the user keeps the movement direction z_t unchanged with a large probability p and moves to the corresponding next position, while with the remaining small probability the user changes to one of the other movement directions and moves to the corresponding position;

(3) based on the user's mobility model and the migration decision, the state transition probability P(s'|s, a) is determined, in which, after a migration (a = 1), the user and the server processing the task are at the same location, and the user keeps the same movement direction with probability p; when no migration is performed (a = 0), the user likewise keeps the same movement direction with probability p.
Further, using the value iteration algorithm in S3 to make migration decisions from the reward function and the state transition matrix mainly comprises:

(1) randomly initializing the state value function v(s) of the user for the different positions and movement directions;

(2) based on the Bellman optimality equation, using the state value function values of the previous iteration to update those of the next iteration, with the specific formula:

$$v_{k+1}(s) = \max_{a}\Big[\, r(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, v_k(s') \Big]$$

where v_{k+1}(s) is the state value function of state s at iteration k+1, r(s, a) is the reward obtained by taking action a in state s, P(s'|s, a) is the probability of reaching state s' by taking action a in state s, γ is the discount factor, and v_k(s') is the state value function of state s' at iteration k;

(3) repeating step (2) until the state value functions for all positions and movement directions converge.
Further, the method in S4 of assigning the link cost c by normalizing the delay consumption t and the network consumption p between routers comprises the steps of:

(1) recording the delay consumption t and the network consumption p required for transmission on each link;

(2) normalizing the two quantities and assigning the link cost c as their weighted sum:

$$\bar{t}_i = \frac{t_i - t_{min}}{t_{max} - t_{min}}, \qquad \bar{p}_i = \frac{p_i - p_{min}}{p_{max} - p_{min}}, \qquad c_i = \omega_t \bar{t}_i + \omega_p \bar{p}_i$$

where t_i and p_i are the delay consumption and network consumption of link i, t_min and t_max are the minimum and maximum link delay consumption, p_min and p_max are the minimum and maximum link network consumption, and ω_t and ω_p are the weighting coefficients of delay consumption and network consumption, respectively.
Further, the method in S5 of selecting the migration path using Sarsa reinforcement learning mainly comprises:

(1) randomly initializing the information of the links connected to each router, including the delay consumption t and the network consumption p;

(2) transmitting the data from the original server to the target server along a randomly selected path;

(3) recording the delay consumption t and the network consumption p incurred during the data transmission, normalizing them, and weighting them to obtain the corresponding link cost c;

(4) each router selecting the link for data transmission according to the ε-greedy policy, recording the link cost of transmitting over that link to the next router, and updating its state-action Q-value table according to this data transmission;

(5) as data continue to be transmitted, each router repeating step (4), dynamically updating its own Q-value table and selecting a better path.
Further, the ε-greedy action selection rule and the state-action Q-value table update rule are, respectively:

$$\pi(a \mid s) = \begin{cases} 1 - \varepsilon + \dfrac{\varepsilon}{m}, & a = a^{*} \\ \dfrac{\varepsilon}{m}, & a \ne a^{*} \end{cases}$$

$$Q(S, A) \leftarrow Q(S, A) + \alpha\big(R + \gamma Q(S', A') - Q(S, A)\big)$$

where π(a|s) is the probability of selecting action a in state s, a* is the action that maximizes the Q value in the current state s, and m is the number of available actions; Q(S, A) is the state-action value of selecting each action in each state, α is the learning rate, γ is the discount factor, and Q(S', A') is the state-action value of the next state and action.
The advantages and beneficial effects of the present invention are as follows:

1. In the mobile edge environment, the present invention jointly considers the delay factor that users care about and the migration cost factor that service providers care about, builds a mobility model and a state transition matrix that better match real scenarios on the basis of mobility prediction, and finally uses the value iteration algorithm to obtain migration decisions for different positions and different movement directions, indicating when a migration maximizes the user's benefit. Unlike other service migration strategies that always make the same migration decision at the same position, the migration decisions finally obtained by the present invention can differ at the same position depending on the user's previous movement direction, which better matches real scenarios.

2. Once a service in the mobile edge environment has been determined to need migration, the present invention jointly considers the interests of users and service providers, assigns link costs based on delay and network consumption, and uses a reinforcement learning algorithm to solve the adaptive migration path selection problem in a dynamic network environment. The migration path finally obtained by the present invention is updated dynamically in real time as the network link state changes and can provide a better migration path for the service.
Brief Description of the Drawings

Fig. 1 shows the value-iteration-based migration decision-making algorithm of a preferred embodiment of the present invention;

Fig. 2 shows the Sarsa-based dynamic path selection algorithm;

Fig. 3 is a flowchart of the service migration method based on reinforcement learning in mobile edge computing.
Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and in detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.

The technical solution of the present invention for solving the above technical problems is as follows:
As shown in Fig. 3, the present invention discloses a service migration method based on reinforcement learning in mobile edge computing, comprising the following steps:

S1, constructing the reward function r(s, a) from the location l_s of the server hosting the user's service, the user's current location l_t, and the load h_t of the server processing the task. The reward function represents the benefit the user obtains each time a migration decision is made and serves as a balanced expression of user service experience and migration cost;

S2, constructing the state transition matrix P from the user's current location l_t, the previous movement direction z_t, and whether migration occurs. The state transition matrix represents the state change produced after each migration decision, including changes of position and of movement direction;

S3, based on the reward function r(s, a) and the state transition matrix P, using the value iteration algorithm to solve the migration decision-making problem, thereby determining whether the user's service should be migrated for each position and movement direction;

S4, based on the delay consumption t and the network consumption p between routers, normalizing the two and assigning the link cost c;

S5, based on the normalized link cost c, using the Sarsa reinforcement learning algorithm to select the migration path and adaptively updating the link selection to adapt to link changes in the dynamic network.
In this embodiment, the method of constructing the reward function r(s, a) in step S1 from the location l_s of the server hosting the user's service, the user's current location l_t, and the load h_t of the server processing the task comprises the steps of:

(1) constructing the user satisfaction function c_1(s_t, a_t) from the distance d_t between the user and the server processing the task and the load h_t of that server:

$$c_1(s_t, a_t) = D - \mu_1 d_t - \mu_2 h_t$$

where D is the maximum service satisfaction the user can obtain, d_t is the user's distance from the server processing the task at time t, h_t is the load of that server at time t, and μ_1 and μ_2 are proportional coefficients expressing how strongly distance and load affect the user's service satisfaction; d_t is obtained as the Euclidean distance between the user's current position l_t = (x_t, y_t) and the position l_s = (x_s, y_s) of the server processing the task.

(2) constructing the migration cost function c_2(s_t, a_t) from the distance d_t between the user and the server processing the task:

$$c_2(s_t, a_t) = \mu_3 + \mu_4 d_t$$

where the migration cost is a linear function of the distance d_t, μ_3 is a constant cost, and μ_4 is the coefficient of the distance term.

(3) using the weighted sum of the user service satisfaction function and the migration cost function as the reward function r(s, a):

$$r(s_t, a_t) = \begin{cases} c_1(s_t, a_t) - a_t\, c_2(s_t, a_t), & d_t \le d_{max} \\ -M, & d_t > d_{max} \end{cases}$$

where a is the migration decision, a = 0 meaning no migration and a = 1 meaning migration; d_max is the maximum distance allowed for processing the service, beyond which a large penalty M is incurred.
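By way of illustration only, the reward construction above can be written as a short Python sketch. The numeric values of D, μ_1–μ_4, d_max, and M below are hypothetical placeholders and are not prescribed by this embodiment:

```python
import math

# Hypothetical coefficients chosen only for illustration
D, MU1, MU2 = 10.0, 0.5, 0.2   # max satisfaction, distance weight, load weight
MU3, MU4 = 1.0, 0.3            # constant and distance-dependent migration cost
D_MAX, M = 8.0, 100.0          # allowed service distance and out-of-range penalty

def distance(user_pos, server_pos):
    """Euclidean distance between the user position l_t and the server position l_s."""
    return math.dist(user_pos, server_pos)

def satisfaction(d_t, h_t):
    """c1(s_t, a_t) = D - mu1 * d_t - mu2 * h_t."""
    return D - MU1 * d_t - MU2 * h_t

def migration_cost(d_t):
    """c2(s_t, a_t) = mu3 + mu4 * d_t."""
    return MU3 + MU4 * d_t

def reward(user_pos, server_pos, h_t, a):
    """Reward r(s, a): satisfaction minus the migration cost when migrating (a = 1),
    with a large penalty -M once the user is farther than d_max from the server."""
    d_t = distance(user_pos, server_pos)
    if d_t > D_MAX:
        return -M
    return satisfaction(d_t, h_t) - a * migration_cost(d_t)
```

A call such as reward((0, 0), (3, 4), h_t=2.0, a=1) then returns the one-step reward for migrating when the user is five distance units from the serving node.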
In this embodiment, the method of constructing the state transition matrix in step S2 from the user's current location l_t, the previous movement direction z_t, and whether migration occurs comprises the steps of:

(1) recording the user's movement direction z_t at the previous moment and representing the user's current state by the current position l_t together with the previous movement direction z_t, i.e., s_t = (x_t, y_t, z_t).

(2) the different movement directions z_t affect the user's subsequent trajectory: in the next time slot the user keeps the movement direction z_t unchanged with a large probability p and moves to the corresponding next position, while with the remaining small probability the user changes to one of the other movement directions and moves to the corresponding position.

(3) based on the user's mobility model and the migration decision, determining the user's state transition probability P(s'|s, a), in which, after a migration (a = 1), the user and the server processing the task are at the same location, and the user keeps the same movement direction with probability p; when no migration is performed (a = 0), the user likewise keeps the same movement direction with probability p.
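As a sketch only, the mobility model can be sampled as follows in Python; the assumptions that the user moves on a grid with four candidate directions, that the remaining probability is split evenly between the two turning directions, and the value p = 0.7 are illustrative choices not fixed by this embodiment:

```python
import random

P_KEEP = 0.7  # hypothetical probability p of keeping the previous direction

# Illustrative four-direction grid model: unit moves and the two possible turns
DIRECTIONS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
TURNS = {"N": ("E", "W"), "S": ("E", "W"), "E": ("N", "S"), "W": ("N", "S")}

def next_state(state, server_pos, a):
    """Sample the next user state s' = (x, y, z) given the migration decision a.
    A migration (a = 1) relocates the serving node to the user's current position."""
    (x, y), z = state
    if a == 1:
        server_pos = (x, y)                 # service co-located with the user
    if random.random() < P_KEEP:
        new_z = z                           # keep the previous direction with probability p
    else:
        new_z = random.choice(TURNS[z])     # otherwise turn to one of the other directions
    dx, dy = DIRECTIONS[new_z]
    return ((x + dx, y + dy), new_z), server_pos
```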
In this embodiment, the method of making migration decisions in step S3 from the reward function r(s, a) and the state transition matrix P using the value iteration algorithm comprises the steps of:

(1) randomly initializing the state value function v(s) of the user for the different positions and movement directions;

(2) based on the Bellman optimality equation, using the state value function values of the previous iteration to update those of the next iteration, with the specific formula:

$$v_{k+1}(s) = \max_{a}\Big[\, r(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, v_k(s') \Big]$$

where v_{k+1}(s) is the state value function of state s at iteration k+1, r(s, a) is the reward obtained by taking action a in state s, P(s'|s, a) is the probability of reaching state s' by taking action a in state s, γ is the discount factor, and v_k(s') is the state value function of state s' at iteration k;

(3) repeating step (2) until the state value functions for all positions and movement directions converge.
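For illustration, a generic tabular value iteration over an enumerated state space can be sketched as below; the discount factor γ = 0.9 and the convergence threshold are hypothetical values, and reward_fn / trans_fn stand for the reward and transition probabilities defined above:

```python
def value_iteration(states, actions, reward_fn, trans_fn, gamma=0.9, tol=1e-6):
    """v_{k+1}(s) = max_a [ r(s,a) + gamma * sum_s' P(s'|s,a) * v_k(s') ].
    trans_fn(s, a) returns a dict {s': probability}; reward_fn(s, a) returns r(s, a)."""
    v = {s: 0.0 for s in states}              # arbitrary initial state values
    while True:
        delta = 0.0
        for s in states:
            best = max(
                reward_fn(s, a)
                + gamma * sum(p * v[s2] for s2, p in trans_fn(s, a).items())
                for a in actions
            )
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            return v

def migration_policy(states, actions, reward_fn, trans_fn, v, gamma=0.9):
    """Extract, for every (position, direction) state, the decision a in {0, 1}
    that maximizes the one-step lookahead under the converged values v."""
    return {
        s: max(actions,
               key=lambda a: reward_fn(s, a)
               + gamma * sum(p * v[s2] for s2, p in trans_fn(s, a).items()))
        for s in states
    }
```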
In this embodiment, the method of assigning the link cost c in step S4 by normalizing the delay consumption t and the network consumption p between routers comprises the steps of:

(1) recording the delay consumption t and the network consumption p required for transmission on each link;

(2) normalizing the two quantities and assigning the link cost c as their weighted sum:

$$\bar{t}_i = \frac{t_i - t_{min}}{t_{max} - t_{min}}, \qquad \bar{p}_i = \frac{p_i - p_{min}}{p_{max} - p_{min}}, \qquad c_i = \omega_t \bar{t}_i + \omega_p \bar{p}_i$$

where t_i and p_i are the delay consumption and network consumption of link i, t_min and t_max are the minimum and maximum link delay consumption, p_min and p_max are the minimum and maximum link network consumption, and ω_t and ω_p are the weighting coefficients of delay consumption and network consumption, respectively.
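A minimal sketch of this min-max normalization and weighting is given below; the equal weights ω_t = ω_p = 0.5 are only an assumed example:

```python
def link_costs(delays, traffic, w_t=0.5, w_p=0.5):
    """Min-max normalize the per-link delay t_i and network consumption p_i,
    then combine them into c_i = w_t * t_norm + w_p * p_norm."""
    t_min, t_max = min(delays), max(delays)
    p_min, p_max = min(traffic), max(traffic)

    def norm(x, lo, hi):
        return 0.0 if hi == lo else (x - lo) / (hi - lo)

    return [w_t * norm(t, t_min, t_max) + w_p * norm(p, p_min, p_max)
            for t, p in zip(delays, traffic)]
```

For example, link_costs([2, 5, 9], [30, 10, 20]) yields one cost per link in [0, 1], so links with both low delay and low network consumption receive the smallest c_i.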
In this embodiment, the method in step S5 of using the Sarsa reinforcement learning algorithm, based on the normalized link cost, to select the path and adaptively update the link selection to adapt to link changes in the dynamic network comprises the steps of:

(1) randomly initializing the information of the links connected to each router, including the delay consumption t and the network consumption p;

(2) transmitting the data from the original server to the target server along a randomly selected path;

(3) recording the delay consumption t and the network consumption p incurred during the data transmission, normalizing them, and weighting them to obtain the corresponding link cost c;

(4) each router selecting the link for data transmission according to the ε-greedy policy, recording the link cost of transmitting over that link to the next router, and updating its state-action Q-value table according to this data transmission, where the ε-greedy action selection rule and the state-action Q-value table update rule are, respectively:

$$\pi(a \mid s) = \begin{cases} 1 - \varepsilon + \dfrac{\varepsilon}{m}, & a = a^{*} \\ \dfrac{\varepsilon}{m}, & a \ne a^{*} \end{cases}$$

$$Q(S, A) \leftarrow Q(S, A) + \alpha\big(R + \gamma Q(S', A') - Q(S, A)\big)$$

where π(a|s) is the probability of selecting action a in state s, a* is the action that maximizes the Q value in the current state s, and m is the number of available actions; Q(S, A) is the state-action value of selecting each action in each state, α is the learning rate, γ is the discount factor, and Q(S', A') is the state-action value of the next state and action.
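The path selection loop can be sketched, for illustration, as the tabular Sarsa routine below. The representation of the network as a dictionary mapping each router to its neighboring routers, the use of the negative link cost as the reward R, and the hyperparameter values (α, γ, ε, number of episodes) are assumptions made only for this example:

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, state, actions, eps=0.1):
    """With probability eps pick a random outgoing link, otherwise the greedy one."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def sarsa_path_selection(graph, source, target, link_cost, episodes=500,
                         alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Sarsa over a routing graph: states are routers, actions are the
    neighboring routers, and the per-step reward is the negative link cost c.
    Assumes the target router is reachable from every router in the graph."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s = source
        a = epsilon_greedy(Q, s, graph[s], eps)
        while s != target:
            s2 = a                          # taking link a moves the data to router a
            r = -link_cost(s, s2)           # smaller link cost -> larger reward
            if s2 == target:
                Q[(s, a)] += alpha * (r - Q[(s, a)])
                break
            a2 = epsilon_greedy(Q, s2, graph[s2], eps)
            Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])
            s, a = s2, a2
    return Q
```

Because the Q-value table keeps being updated as traffic flows, re-running the ε-greedy selection on the current table naturally adapts the chosen path when link delays or network consumption change.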
The present invention jointly considers multiple environmental factors when making migration decisions and selecting migration paths. Compared with existing service migration methods, it has the following main advantages: (1) it comprehensively considers multiple environmental factors, introducing the server load as a factor affecting the user's service experience and introducing the user's previous movement direction as a predictor that influences the user's subsequent movement, which better matches real scenarios; (2) it jointly considers the delay factor that users care about and the network consumption factor that service providers care about, and uses reinforcement learning to solve for an adaptive service migration path in a dynamic network environment.
It should also be noted that the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.
The above embodiments should be understood as serving only to illustrate the present invention and not to limit its scope of protection. After reading the contents of the present description, a person skilled in the art can make various changes or modifications to the present invention, and such equivalent changes and modifications likewise fall within the scope defined by the claims of the present invention.
Claims (9)
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111492744.6A | 2021-12-08 | 2021-12-08 | A service migration method based on reinforcement learning in mobile edge computing |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN114339879A | 2022-04-12 |
| CN114339879B | 2025-06-24 |
Family
ID=81050194
Country Status (1)

| Country | Link |
|---|---|
| CN | CN114339879B (en) |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| 2025-04-27 | TA01 | Transfer of patent application right | Applicant after: Jiangxi Hualianyuan Universe Digital Technology Co.,Ltd., Room 003, 13th Floor, Building 2, No. 9 Tianyou Avenue, High-speed Rail Economic Experimental Zone, Shangrao City, Jiangxi Province, 334000, China. Applicant before: Chongqing University of Posts and Telecommunications, Chongwen Road, Nanshan Street, Nanan District, Chongqing, 400065, China. |
| | GR01 | Patent grant | |