
CN109802964B - DQN-based HTTP adaptive flow control energy consumption optimization method - Google Patents

DQN-based HTTP adaptive flow control energy consumption optimization method Download PDF

Info

Publication number
CN109802964B
CN109802964B · CN201910060941.7A · CN201910060941A
Authority
CN
China
Prior art keywords
state
action
value
energy consumption
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910060941.7A
Other languages
Chinese (zh)
Other versions
CN109802964A (en)
Inventor
高岭
赵子鑫
袁璐
刘艺
秦晨光
任杰
王海
郑杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwest University
Original Assignee
Northwest University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwest University filed Critical Northwest University
Priority to CN201910060941.7A priority Critical patent/CN109802964B/en
Publication of CN109802964A publication Critical patent/CN109802964A/en
Application granted granted Critical
Publication of CN109802964B publication Critical patent/CN109802964B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 — Reducing energy consumption in communication networks

Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A DQN-based HTTP adaptive flow control energy consumption optimization method that considers different network conditions, the loading state of the buffer area, and the remaining battery power of the client device, and simulates usage in this environment. During the interaction between the client and the server, the streaming system uses DQN learning to switch the multimedia segments between different quality levels and to switch between the high-frequency and low-frequency cores, thereby optimizing energy consumption.

Description

DQN-based HTTP adaptive flow control energy consumption optimization method
Technical Field
The invention belongs to the technical field of computer network communication, and particularly relates to a DQN-based HTTP adaptive flow control energy consumption optimization method.
Background
In recent years, the multimedia field has developed rapidly and the transmission of multimedia content has received increasing attention. HTTP video delivery became the mainstream way to watch online video after the popularization of the internet, and it has evolved in two stages. The first stage is progressive download: colloquially, the user can download and play a multimedia file at the same time, without waiting for the whole file to finish downloading. However, this is not true streaming and differs little from an ordinary file download. The second stage is HTTP streaming, whose main characteristic is that the server divides a media file into small slices; on receiving a request, the server sends the slices in HTTP responses. During the interaction between server and client, the client adjusts the slice bitrate in real time according to the state of the network: a high bitrate is used when the network is good, a low bitrate when the network is busy, and the switching is automatic. The main implementation is that the server marks the bitrate in each list file it provides, and the client player adjusts automatically according to the playing progress and download speed, improving the user experience as much as possible while keeping playback continuous and smooth. What we need to do is to optimize the energy consumption of the client device at a deeper level under the same premise of continuous and smooth playback. When the client plays online video, the network state, the cache state, and the phone's remaining battery are factors that are often ignored. Bitrate selection in HTTP adaptive streaming is inflexible and cannot cope well with complex network conditions; frequently switching the bitrate of the video stream not only gives viewers an uncomfortable experience but also incurs energy overhead that is usually ignored. We therefore propose an energy consumption optimization model based on deep Q-learning, combining reinforcement learning with a neural network.
Q-learning is a classic reinforcement learning method. The core idea of reinforcement learning is that an agent enters the next state by continuously interacting with the environment and receives a return value for taking a suitable action. The core of Q-learning is the Q-table, whose rows and columns represent states and actions respectively; the Q value in the table measures how good it is to take action a in state s. The neural network used here can be regarded as a black box: the input is a state value and the output is the value of each action in that state. The training data come from data generated while the whole system runs; the data are corrected during the computation of the return, and the corrected values are fed back into the neural network for a second round of training until convergence, at which point the optimal policy is selected.
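The Q-table update described above can be sketched in a few lines. This is a generic tabular Q-learning step under placeholder states, actions, and rewards — not the patent's actual (cache, network, battery) environment:

```python
from collections import defaultdict

# Illustrative hyperparameters; the patent assigns these during initialization.
ALPHA = 0.1   # learning rate
GAMMA = 0.9   # discount factor

q_table = defaultdict(float)  # (state, action) -> Q value, 0.0 by default

def update_q(state, action, reward, next_state, actions):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_target = reward + GAMMA * best_next
    q_table[(state, action)] += ALPHA * (td_target - q_table[(state, action)])

actions = ["high_quality", "low_quality"]
update_q("good_network", "high_quality", 1.0, "good_network", actions)
```

With an all-zero table, one update moves Q("good_network", "high_quality") to ALPHA × 1.0 = 0.1; repeated interaction converges the table toward the optimal values, which is the role the BP network later approximates.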
Disclosure of Invention
In order to overcome the defects of the prior art, the present invention provides a DQN-based HTTP adaptive flow control energy consumption optimization method. It combines the Q-learning reinforcement learning method with a BP (back propagation) neural network to interact with an environment that is continuously changing: the network varies, and power is consumed while the user watches online video. Under this variable environment, the system dynamically matches and switches the video quality in the video player and dynamically schedules different CPU cores, obtaining the most suitable media quality level and the most suitable CPU core, and finally achieving the goal of reducing energy consumption.
In order to achieve the purpose, the invention adopts the technical scheme that:
a DQN-based HTTP adaptive flow control energy consumption optimization method comprises the following steps:
1) Environment acquisition and modeling: Dummynet is used to simulate the networks used in daily life; the client runs in 3G, 4G, and WiFi network environments and collects the current environment information. The state is a set of three components: the client data cache state B, i.e., the segment length in the current cache area, the network state N, and the battery power E, so that S = (B, N, E). Time is divided into multiple time points, which correspond one-to-one with the collected data, and the data are stored;
2) Definition of the client action set and the reward function: the environment data collected in step 1) serve as the state set from which the Q-learning state space of the model is built. The system selects a suitable action to enter the next state according to the network state, the cache state, and the battery power. The action set of the model mainly consists of two kinds of actions: switching the video slice quality and switching between the high-frequency and low-frequency cores. The sum of the energy consumption level and the switching overhead is defined as the reward function, which has two parts. The first is the energy consumption level value, taken from a mapping table that relates the energy consumption level to the different network levels, video qualities, and CPU cores in use. The second is the overhead caused by video switching and big/little core switching; this value is negative feedback. The reward function is therefore R = C1·R_energy + C2·R_switch, where C1 and C2 are the weights of the two reward terms, set according to the user's preference; each weight may be 1;
3) Algorithm implementation: the Deep Q-Learning algorithm, which combines the Q-learning algorithm with a BP neural network, selects the best action by continuously interacting with the environment. The main function of the neural network is to convert a high-dimensional state into a low-dimensional output: it takes the environment state s as input and outputs the Q value corresponding to each action in the form of a vector. An ε-greedy algorithm is used: in each state, an action is selected at random with a small probability ε, and the optimal action according to the BP neural network is selected with probability 1-ε. Both the randomly selected actions and the actions selected by the network are added to the replay-buffer experience pool of the neural network for secondary training. The system takes the action, reaches the next state, and trains the neural network to optimize over the input states; the output value follows the optimal-solution policy and the optimal solution is output;
4) In practical problems, the device obtains the environment state value through the system and, via DQN, selects the best-matching video quality and the core that saves the most power without affecting the user experience.
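The reward R = C1·R_energy + C2·R_switch defined in step 2) can be sketched as follows. The entries of the energy-level mapping table below are invented placeholders — the patent only states that such a table exists — and the switching penalty is one assumed form of the negative feedback:

```python
# User-preference weights; the text allows both to be 1.
C1, C2 = 1.0, 1.0

# Assumed mapping table: (network, quality, core) -> energy level value.
# Higher means a more favourable energy level; real values would be measured.
ENERGY_LEVEL = {
    ("wifi", "hd", "little"): 5,
    ("wifi", "hd", "big"):    3,
    ("3g",   "sd", "little"): 4,
}

def reward(net, quality, core, switched_quality, switched_core):
    r_energy = ENERGY_LEVEL.get((net, quality, core), 0)
    # Switching quality or cores costs energy, so it enters as negative feedback.
    r_switch = -(int(switched_quality) + int(switched_core))
    return C1 * r_energy + C2 * r_switch

r = reward("wifi", "hd", "little", switched_quality=True, switched_core=False)
```

Here one quality switch subtracts 1 from the energy level of 5, giving R = 4.0; keeping both quality and core unchanged would yield the full level of 5.0.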
The system environment information is characterized in that the defined state set S contains the network level, divided into six levels from high to low; measurement showed, however, that even the lowest-quality test video cannot be loaded normally at levels 1 and 2 or under 3G. S also contains the remaining battery power of the phone and the length of the cached segments; the cache state at each unit time point, i.e., the segment length, is obtained by writing a script that reads the cache information.
The system in the invention interacts with the constantly changing states in the environment and allocates a reasonable streaming media quality and a reasonable CPU core to each segment. Experimental results show that the optimization method can effectively reduce the energy that mobile streaming consumes on the device without affecting the user experience; the energy consumption of the loading part is reduced by twenty-one percent.
Drawings
FIG. 1 is a flow chart of the system of the present invention.
Fig. 2 is a diagram of the DQN learning process of the present invention.
Fig. 3 is a diagram of an application scenario of the present invention.
Detailed Description
The present invention will be further described with reference to the following examples, but the present invention is not limited to the following examples.
A DQN-based HTTP adaptive flow control energy consumption optimization method is shown in fig. 1 and 3. HTTP adaptive streaming works mainly by dividing a streaming media file into smaller segments for HTTP request, transmission, and so on. We therefore first receive a slice of the streaming media file, and the system collects the network environment and the current battery condition and processes the data. The specific process is as follows:
1) Define the state set S. The network state is divided into six levels from high to low, but measurement showed that the test video cannot be loaded normally at levels 1 and 2 or under 3G, and these cases are given a return value of 0. S also records the remaining battery power of the phone and the length of the cached segments; the cache state at each unit time point, i.e., the segment length, is obtained by writing a script that reads the cache information.
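Because Q-learning needs a finite state set, the raw measurements of S = (B, N, E) must be discretized. A minimal sketch is shown below; the particular thresholds and level counts (other than the six network levels named in the text) are assumptions for illustration:

```python
from collections import namedtuple

# State tuple S = (B, N, E): buffer level, network level, battery level.
State = namedtuple("State", ["buffer_level", "network_level", "battery_level"])

def discretize(buffer_seconds, bandwidth_mbps, battery_pct):
    # Assumed bucketing: coarse levels so the Q-learning state space stays small.
    buf = min(int(buffer_seconds // 10), 3)   # 0..3 cached-segment levels
    net = min(int(bandwidth_mbps // 2), 5)    # 0..5, i.e. the six network levels
    bat = min(int(battery_pct // 25), 3)      # 0..3 battery levels
    return State(buf, net, bat)

s = discretize(buffer_seconds=25.0, bandwidth_mbps=7.5, battery_pct=60)
```

For these sample measurements the state is State(buffer_level=2, network_level=3, battery_level=2); each unit time point produces one such tuple, which is what gets stored alongside the chosen action and reward.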
2) Define the action set. On the Android XU3 development board we use, with its Cortex-A15 high-frequency cores and Cortex-A7 low-frequency cores, the core actions adjust which core works and which core sleeps according to environmental changes: a task either selects A15 or selects A7. The streaming media quality is divided into lossless, high-definition, and low-definition; these actions are limited to the video set of the experimental tests.
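The action set described above is the cross product of the quality choices and the core choices. A minimal sketch (the identifier names are illustrative, not from the patent):

```python
from itertools import product

# Quality levels named in the text, and the A15 (big) / A7 (little) core pair.
QUALITIES = ["lossless", "high_definition", "low_definition"]
CORES = ["a15_big", "a7_little"]

# Each discrete action fixes both the segment quality and the core to run on.
ACTIONS = list(product(QUALITIES, CORES))
```

This yields six discrete actions, which matches the low-dimensional action vector the BP network outputs a Q value for.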
3) Construct the reward function and select the model. First, the neural network is initialized; the main function of the BP neural network is to estimate the value of each action in every state and to reduce the dimension of the state vector. The learning rate α and discount factor γ in the Q-value iteration formula and the exploration probability ε in action selection are assigned values. Each iteration cycle then proceeds as shown in fig. 2: after initialization is complete, the state of the system is input and the output is the value produced by the current action; the new estimate replaces the previous output, and the optimal solution is approached step by step. After the value of each action is obtained, an ε-greedy strategy is adopted to find the optimal solution. A threshold is initialized with the value 0.8, meaning that eighty percent of actions are selected at random while twenty percent are computed through the neural network and the most suitable one is chosen; as learning continues, this value decreases until actions are no longer selected at random.
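The decaying ε-greedy selection in the embodiment can be sketched as follows. The multiplicative decay rate is an assumption (the text only says the threshold starts at 0.8 and shrinks), and the Q-value list stands in for the BP network's output vector:

```python
import random

EPS_START = 0.8    # initial threshold from the text: 80% of actions random
EPS_DECAY = 0.99   # assumed per-episode decay factor

def epsilon_at(episode):
    # Exploration probability shrinks toward zero as learning continues.
    return EPS_START * (EPS_DECAY ** episode)

def select_action(q_values, episode, rng=random):
    # Explore with the decayed probability; otherwise take the argmax action.
    if rng.random() < epsilon_at(episode):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

random.seed(0)
late = select_action([1.0, 3.0, 2.0], episode=1000)  # exploration has decayed away
```

By episode 1000 the exploration probability is far below 0.1%, so the selection is effectively greedy and returns the index of the largest Q value.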

Claims (1)

1. A DQN-based HTTP adaptive flow control energy consumption optimization method, characterized by comprising the following steps:
1) Environment acquisition and modeling: Dummynet is used to simulate the networks used in daily life; the client runs in 3G, 4G, and WiFi network environments and collects the current environment information. The state is a set of three components: the client data cache state B, i.e., the segment length in the current cache area, the network state N, and the battery power E, so that S = (B, N, E); time is divided into multiple time points corresponding one-to-one with the data, and the data are stored;
2) Definition of the client action set and the reward function: the environment data collected in step 1) serve as the state set from which the Q-learning state space and the action set of the model are built. The system selects a suitable action to enter the next state according to the network state, the cache state, and the battery power. The action set mainly consists of two kinds of actions: switching the video slice quality and switching between the high-frequency and low-frequency cores. The sum of the energy consumption level and the switching overhead is defined as the reward function, which has two parts: the first is the energy consumption level value, taken from a mapping table relating the energy consumption level to the different network levels, video qualities, and CPU cores in use; the second is the overhead caused by video switching and big/little core switching, which is negative feedback. The reward function is therefore R = C1·R_energy + C2·R_switch, where C1 and C2 are the weights of the two reward terms, set according to the user's preference; each weight may be 1;
3) Algorithm implementation: the Deep Q-Learning algorithm, a Q-learning algorithm combined with a BP neural network, selects the best action by continuously interacting with the environment. The main function of the neural network is to convert the high-dimensional state into a low-dimensional output: it takes the environment state s as input and outputs the Q value of each action in the form of a vector. An ε-greedy algorithm is used: in each state an action is selected at random with a small probability ε, and the optimal action according to the BP neural network is selected with probability 1-ε; both the randomly selected actions and the actions selected by the network are added to the replay-buffer experience pool of the neural network for secondary training. The system takes the action, reaches the next state, and trains the neural network to optimize over the input states; the output follows the optimal-solution policy and the optimal solution is output;
4) In practical problems, the device obtains the environment state value through the system and, via DQN, selects the best-matching video quality and the core that saves the most power without affecting the user experience.
CN201910060941.7A 2019-01-23 2019-01-23 DQN-based HTTP adaptive flow control energy consumption optimization method Active CN109802964B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910060941.7A CN109802964B (en) 2019-01-23 2019-01-23 DQN-based HTTP adaptive flow control energy consumption optimization method

Publications (2)

Publication Number Publication Date
CN109802964A CN109802964A (en) 2019-05-24
CN109802964B true CN109802964B (en) 2021-09-28

Family

ID=66560085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910060941.7A Active CN109802964B (en) 2019-01-23 2019-01-23 DQN-based HTTP adaptive flow control energy consumption optimization method

Country Status (1)

Country Link
CN (1) CN109802964B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110414725B (en) * 2019-07-11 2021-02-19 山东大学 Wind farm energy storage system dispatching method and device integrating forecasting and decision making
CN114885208B (en) * 2022-03-21 2023-08-08 中南大学 Dynamic adaptive method, device and medium for scalable streaming media transmission under NDN network
CN117979054B (en) * 2024-02-04 2025-08-22 山东大学 Energy saving method for playing short video streaming media in mobile terminal

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108063961A (en) * 2017-12-22 2018-05-22 北京联合网视文化传播有限公司 A kind of self-adaption code rate video transmission method and system based on intensified learning
CN108737382A (en) * 2018-04-23 2018-11-02 浙江工业大学 SVC coding HTTP streaming media self-adaption method based on Q-L earning
AU2017268276A1 (en) * 2016-05-16 2018-12-06 Wi-Tronix, Llc Video content analysis system and method for transportation system
CN108966330A (en) * 2018-09-21 2018-12-07 西北大学 A kind of mobile terminal music player dynamic regulation energy consumption optimization method based on Q-learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11062207B2 (en) * 2016-11-04 2021-07-13 Raytheon Technologies Corporation Control systems using deep reinforcement learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Evaluation of Q-Learning approach for HTTP Adaptive Streaming;Virginia Martın;《2016 IEEE International Conference on Consumer Electronics》;20160314;第293-294页 *
Live Streaming with Content Centric Networking;Hongfeng Xu;《2012 Third International Conference on Networking and Distributed Computing》;20121231;第1-5页 *
Research on Q-learning-based HTTP adaptive streaming bitrate control methods; Xiong Lirong; Journal on Communications (《通信学报》); 20170925; vol. 38, no. 9; pp. 18-24 *

Also Published As

Publication number Publication date
CN109802964A (en) 2019-05-24

Similar Documents

Publication Publication Date Title
Zhang et al. A multi-agent reinforcement learning approach for efficient client selection in federated learning
CN113434212A (en) Cache auxiliary task cooperative unloading and resource allocation method based on meta reinforcement learning
CN109802964B (en) DQN-based HTTP adaptive flow control energy consumption optimization method
CN104598292B (en) A kind of self adaptation stream adaptation and method for optimizing resources applied to cloud game system
CN113114756A (en) Video cache updating method for self-adaptive code rate selection in mobile edge calculation
US11570063B2 (en) Quality of experience optimization system and method
CN114706433B (en) Equipment control method and device and electronic equipment
CN108966330A (en) A kind of mobile terminal music player dynamic regulation energy consumption optimization method based on Q-learning
CN114090108B (en) Computing task execution methods, devices, electronic equipment and storage media
CN112866756B (en) Code rate control method, device, medium and equipment for multimedia file
CN110209845B (en) Recommendation method, device and storage medium of multimedia content
CN118069506A (en) Method and system for generating path coverage test data based on reinforcement learning selection strategy
CN113672372A (en) A multi-edge collaborative load balancing task scheduling method based on reinforcement learning
Mowafi et al. Energy efficient fuzzy-based DASH adaptation algorithm
CN120264046A (en) Energy-saving video adaptive bitrate optimization method based on deep reinforcement learning
Lin et al. KNN-Q learning algorithm of bitrate adaptation for video streaming over HTTP
CN114138493A (en) Edge computing power resource scheduling method based on energy consumption perception
CN113015179B (en) Network resource selection method and device based on deep Q network and storage medium
Jia et al. DQN algorithm based on target value network parameter dynamic update
CN116233088B (en) Real-time super-division video stream transmission optimization method based on end cloud cooperation
CN118605807A (en) A file storage method based on deep reinforcement learning file value algorithm
CN118093054A (en) Edge computing task offloading method and device
CN110879730B (en) Method and device for automatically adjusting game configuration, electronic equipment and storage medium
CN118301675A (en) Star-earth cooperative network cache optimization method and related equipment
CN110743164A (en) Dynamic resource partitioning method for reducing response delay in cloud game

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant