
CN109802964B - DQN-based HTTP adaptive flow control energy consumption optimization method - Google Patents

DQN-based HTTP adaptive flow control energy consumption optimization method Download PDF

Info

Publication number
CN109802964B
CN109802964B · CN201910060941.7A · CN201910060941A
Authority
CN
China
Prior art keywords
state
action
value
energy consumption
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910060941.7A
Other languages
Chinese (zh)
Other versions
CN109802964A (en)
Inventor
高岭
赵子鑫
袁璐
刘艺
秦晨光
任杰
王海
郑杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwest University
Original Assignee
Northwest University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwest University filed Critical Northwest University
Priority to CN201910060941.7A priority Critical patent/CN109802964B/en
Publication of CN109802964A publication Critical patent/CN109802964A/en
Application granted granted Critical
Publication of CN109802964B publication Critical patent/CN109802964B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 — Reducing energy consumption in communication networks

Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A DQN-based HTTP adaptive flow control energy consumption optimization method that considers different network conditions, the loading state of the buffer area, and the remaining battery power of the client device, and simulates usage in this environment. During the interaction between the client and the server, the streaming system uses DQN learning to switch the multimedia segments between different quality levels and to switch between the high-frequency and low-frequency cores, thereby optimizing energy consumption.

Description

DQN-based HTTP adaptive flow control energy consumption optimization method
Technical Field
The invention belongs to the technical field of computer network communication, and particularly relates to a DQN-based HTTP adaptive flow control energy consumption optimization method.
Background
In recent years, the multimedia field has developed rapidly and the transmission of multimedia content has received increasing attention. HTTP video delivery became the mainstream way to watch online video after the popularization of the internet, and it has evolved in two stages. The first stage is progressive download: colloquially, the user can download and play a multimedia file at the same time, without waiting for the whole file to finish downloading. However, this is not true streaming and differs little from an ordinary file download. The second stage is HTTP streaming, whose main characteristic is that the server divides a media file into small slices; on receiving a request, the server sends the slices in HTTP responses. During the interaction between server and client, the client adjusts the slice bitrate in real time according to the state of the network: a high bitrate is used when the network is good, a low bitrate when the network is busy, and the switching is automatic. The main implementation is that the server marks the bitrate in each list file it provides, and the client player adjusts automatically according to the playing progress and download speed, improving the user experience as much as possible while keeping playback continuous and smooth. What we need to do is to optimize the energy consumption of the client device at a deeper level under the same premise of continuous and smooth playback. When the client plays online video, the network state, the cache state, and the phone's remaining battery are factors that are often ignored. Bitrate selection in HTTP adaptive streaming is inflexible and cannot cope well with complex network conditions; frequently switching the bitrate of the video stream not only gives viewers an uncomfortable experience but also incurs energy overhead that is usually ignored. We therefore propose an energy consumption optimization model based on deep Q-learning, combining reinforcement learning with a neural network.
Q-learning is a classic reinforcement learning method. The core idea of reinforcement learning is that an agent enters the next state by continuously interacting with the environment and receives a return value for taking a suitable action. The core of Q-learning is the Q-table, whose rows and columns represent states and actions respectively; the Q value in the table measures how good it is to take action a in state s. The neural network used here can be regarded as a black box: the input is a state value and the output is the value of each action in that state. The training data come from data generated while the whole system runs; the data are corrected during the computation of the return, and the corrected values are fed back into the neural network for a second round of training until convergence, at which point the optimal policy is selected.
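The Q-table update described above can be sketched in a few lines. This is a generic tabular Q-learning step under placeholder states, actions, and rewards — not the patent's actual (cache, network, battery) environment:

```python
from collections import defaultdict

# Illustrative hyperparameters; the patent assigns these during initialization.
ALPHA = 0.1   # learning rate
GAMMA = 0.9   # discount factor

q_table = defaultdict(float)  # (state, action) -> Q value, 0.0 by default

def update_q(state, action, reward, next_state, actions):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_target = reward + GAMMA * best_next
    q_table[(state, action)] += ALPHA * (td_target - q_table[(state, action)])

actions = ["high_quality", "low_quality"]
update_q("good_network", "high_quality", 1.0, "good_network", actions)
```

With an all-zero table, one update moves Q("good_network", "high_quality") to ALPHA × 1.0 = 0.1; repeated interaction converges the table toward the optimal values, which is the role the BP network later approximates.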
Disclosure of Invention
In order to overcome the defects of the prior art, the present invention provides a DQN-based HTTP adaptive flow control energy consumption optimization method. It combines the Q-learning reinforcement learning method with a BP (back propagation) neural network to interact with an environment that is continuously changing: the network varies, and power is consumed while the user watches online video. Under this variable environment, the system dynamically matches and switches the video quality in the video player and dynamically schedules different CPU cores, obtaining the most suitable media quality level and the most suitable CPU core, and finally achieving the goal of reducing energy consumption.
In order to achieve the purpose, the invention adopts the technical scheme that:
a DQN-based HTTP adaptive flow control energy consumption optimization method comprises the following steps:
1) Environment acquisition and modeling: Dummynet is used to simulate the networks used in daily life; the client runs in 3G, 4G, and WiFi network environments and collects the current environment information. The state is a set of three components: the client data cache state B, i.e., the segment length in the current cache area, the network state N, and the battery power E, so that S = (B, N, E). Time is divided into multiple time points, which correspond one-to-one with the collected data, and the data are stored;
2) Definition of the client action set and the reward function: the environment data collected in step 1) serve as the state set from which the Q-learning state space of the model is built. The system selects a suitable action to enter the next state according to the network state, the cache state, and the battery power. The action set of the model mainly consists of two kinds of actions: switching the video slice quality and switching between the high-frequency and low-frequency cores. The sum of the energy consumption level and the switching overhead is defined as the reward function, which has two parts. The first is the energy consumption level value, taken from a mapping table that relates the energy consumption level to the different network levels, video qualities, and CPU cores in use. The second is the overhead caused by video switching and big/little core switching; this value is negative feedback. The reward function is therefore R = C1·R_energy + C2·R_switch, where C1 and C2 are the weights of the two reward terms, set according to the user's preference; each weight may be 1;
3) Algorithm implementation: the Deep Q-Learning algorithm, which combines the Q-learning algorithm with a BP neural network, selects the best action by continuously interacting with the environment. The main function of the neural network is to convert a high-dimensional state into a low-dimensional output: it takes the environment state s as input and outputs the Q value corresponding to each action in the form of a vector. An ε-greedy algorithm is used: in each state, an action is selected at random with a small probability ε, and the optimal action according to the BP neural network is selected with probability 1-ε. Both the randomly selected actions and the actions selected by the network are added to the replay-buffer experience pool of the neural network for secondary training. The system takes the action, reaches the next state, and trains the neural network to optimize over the input states; the output value follows the optimal-solution policy and the optimal solution is output;
4) In practical problems, the device obtains the environment state value through the system and, via DQN, selects the best-matching video quality and the core that saves the most power without affecting the user experience.
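The reward R = C1·R_energy + C2·R_switch defined in step 2) can be sketched as follows. The entries of the energy-level mapping table below are invented placeholders — the patent only states that such a table exists — and the switching penalty is one assumed form of the negative feedback:

```python
# User-preference weights; the text allows both to be 1.
C1, C2 = 1.0, 1.0

# Assumed mapping table: (network, quality, core) -> energy level value.
# Higher means a more favourable energy level; real values would be measured.
ENERGY_LEVEL = {
    ("wifi", "hd", "little"): 5,
    ("wifi", "hd", "big"):    3,
    ("3g",   "sd", "little"): 4,
}

def reward(net, quality, core, switched_quality, switched_core):
    r_energy = ENERGY_LEVEL.get((net, quality, core), 0)
    # Switching quality or cores costs energy, so it enters as negative feedback.
    r_switch = -(int(switched_quality) + int(switched_core))
    return C1 * r_energy + C2 * r_switch

r = reward("wifi", "hd", "little", switched_quality=True, switched_core=False)
```

Here one quality switch subtracts 1 from the energy level of 5, giving R = 4.0; keeping both quality and core unchanged would yield the full level of 5.0.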
The system environment information is characterized in that the defined state set S contains the network level, divided into six levels from high to low; measurement showed, however, that even the lowest-quality test video cannot be loaded normally at levels 1 and 2 or under 3G. S also contains the remaining battery power of the phone and the length of the cached segments; the cache state at each unit time point, i.e., the segment length, is obtained by writing a script that reads the cache information.
The system in the invention interacts with the constantly changing states in the environment and allocates a reasonable streaming media quality and a reasonable CPU core to each segment. Experimental results show that the optimization method can effectively reduce the energy that mobile streaming consumes on the device without affecting the user experience; the energy consumption of the loading part is reduced by twenty-one percent.
Drawings
FIG. 1 is a flow chart of the system of the present invention.
Fig. 2 is a diagram of the DQN learning process of the present invention.
Fig. 3 is a diagram of an application scenario of the present invention.
Detailed Description
The present invention will be further described with reference to the following examples, but the present invention is not limited to the following examples.
A DQN-based HTTP adaptive flow control energy consumption optimization method is shown in fig. 1 and 3. HTTP adaptive streaming works mainly by dividing a streaming media file into smaller segments for HTTP request, transmission, and so on. We therefore first receive a slice of the streaming media file, and the system collects the network environment and the current battery condition and processes the data. The specific process is as follows:
1) Define the state set S. The network state is divided into six levels from high to low, but measurement showed that the test video cannot be loaded normally at levels 1 and 2 or under 3G, and these cases are given a return value of 0. S also records the remaining battery power of the phone and the length of the cached segments; the cache state at each unit time point, i.e., the segment length, is obtained by writing a script that reads the cache information.
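Because Q-learning needs a finite state set, the raw measurements of S = (B, N, E) must be discretized. A minimal sketch is shown below; the particular thresholds and level counts (other than the six network levels named in the text) are assumptions for illustration:

```python
from collections import namedtuple

# State tuple S = (B, N, E): buffer level, network level, battery level.
State = namedtuple("State", ["buffer_level", "network_level", "battery_level"])

def discretize(buffer_seconds, bandwidth_mbps, battery_pct):
    # Assumed bucketing: coarse levels so the Q-learning state space stays small.
    buf = min(int(buffer_seconds // 10), 3)   # 0..3 cached-segment levels
    net = min(int(bandwidth_mbps // 2), 5)    # 0..5, i.e. the six network levels
    bat = min(int(battery_pct // 25), 3)      # 0..3 battery levels
    return State(buf, net, bat)

s = discretize(buffer_seconds=25.0, bandwidth_mbps=7.5, battery_pct=60)
```

For these sample measurements the state is State(buffer_level=2, network_level=3, battery_level=2); each unit time point produces one such tuple, which is what gets stored alongside the chosen action and reward.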
2) Define the action set. On the Android XU3 development board we use, with its Cortex-A15 high-frequency cores and Cortex-A7 low-frequency cores, the core actions adjust which core works and which core sleeps according to environmental changes: a task either selects A15 or selects A7. The streaming media quality is divided into lossless, high-definition, and low-definition; these actions are limited to the video set of the experimental tests.
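The action set described above is the cross product of the quality choices and the core choices. A minimal sketch (the identifier names are illustrative, not from the patent):

```python
from itertools import product

# Quality levels named in the text, and the A15 (big) / A7 (little) core pair.
QUALITIES = ["lossless", "high_definition", "low_definition"]
CORES = ["a15_big", "a7_little"]

# Each discrete action fixes both the segment quality and the core to run on.
ACTIONS = list(product(QUALITIES, CORES))
```

This yields six discrete actions, which matches the low-dimensional action vector the BP network outputs a Q value for.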
3) Construct the reward function and select the model. First, the neural network is initialized; the main function of the BP neural network is to estimate the value of each action in every state and to reduce the dimension of the state vector. The learning rate α and discount factor γ in the Q-value iteration formula and the exploration probability ε in action selection are assigned values. Each iteration cycle then proceeds as shown in fig. 2: after initialization is complete, the state of the system is input and the output is the value produced by the current action; the new estimate replaces the previous output, and the optimal solution is approached step by step. After the value of each action is obtained, an ε-greedy strategy is adopted to find the optimal solution. A threshold is initialized with the value 0.8, meaning that eighty percent of actions are selected at random while twenty percent are computed through the neural network and the most suitable one is chosen; as learning continues, this value decreases until actions are no longer selected at random.
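The decaying ε-greedy selection in the embodiment can be sketched as follows. The multiplicative decay rate is an assumption (the text only says the threshold starts at 0.8 and shrinks), and the Q-value list stands in for the BP network's output vector:

```python
import random

EPS_START = 0.8    # initial threshold from the text: 80% of actions random
EPS_DECAY = 0.99   # assumed per-episode decay factor

def epsilon_at(episode):
    # Exploration probability shrinks toward zero as learning continues.
    return EPS_START * (EPS_DECAY ** episode)

def select_action(q_values, episode, rng=random):
    # Explore with the decayed probability; otherwise take the argmax action.
    if rng.random() < epsilon_at(episode):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

random.seed(0)
late = select_action([1.0, 3.0, 2.0], episode=1000)  # exploration has decayed away
```

By episode 1000 the exploration probability is far below 0.1%, so the selection is effectively greedy and returns the index of the largest Q value.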

Claims (1)

1. A DQN-based HTTP adaptive flow control energy consumption optimization method, characterized by comprising the following steps:
1) Environment acquisition and modeling: Dummynet is used to simulate the networks used in daily life; the client runs in 3G, 4G, and WiFi network environments and collects the current environment information. The state is a set of three components: the client data cache state B, i.e., the segment length in the current cache area, the network state N, and the battery power E, so that S = (B, N, E); time is divided into multiple time points corresponding one-to-one with the data, and the data are stored;
2) Definition of the client action set and the reward function: the environment data collected in step 1) serve as the state set from which the Q-learning state space and the action set of the model are built. The system selects a suitable action to enter the next state according to the network state, the cache state, and the battery power. The action set mainly consists of two kinds of actions: switching the video slice quality and switching between the high-frequency and low-frequency cores. The sum of the energy consumption level and the switching overhead is defined as the reward function, which has two parts: the first is the energy consumption level value, taken from a mapping table relating the energy consumption level to the different network levels, video qualities, and CPU cores in use; the second is the overhead caused by video switching and big/little core switching, which is negative feedback. The reward function is therefore R = C1·R_energy + C2·R_switch, where C1 and C2 are the weights of the two reward terms, set according to the user's preference; each weight may be 1;
3) Algorithm implementation: the Deep Q-Learning algorithm, a Q-learning algorithm combined with a BP neural network, selects the best action by continuously interacting with the environment. The main function of the neural network is to convert the high-dimensional state into a low-dimensional output: it takes the environment state s as input and outputs the Q value of each action in the form of a vector. An ε-greedy algorithm is used: in each state an action is selected at random with a small probability ε, and the optimal action according to the BP neural network is selected with probability 1-ε; both the randomly selected actions and the actions selected by the network are added to the replay-buffer experience pool of the neural network for secondary training. The system takes the action, reaches the next state, and trains the neural network to optimize over the input states; the output follows the optimal-solution policy and the optimal solution is output;
4) In practical problems, the device obtains the environment state value through the system and, via DQN, selects the best-matching video quality and the core that saves the most power without affecting the user experience.
CN201910060941.7A 2019-01-23 2019-01-23 DQN-based HTTP adaptive flow control energy consumption optimization method Active CN109802964B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910060941.7A CN109802964B (en) 2019-01-23 2019-01-23 DQN-based HTTP adaptive flow control energy consumption optimization method

Publications (2)

Publication Number Publication Date
CN109802964A CN109802964A (en) 2019-05-24
CN109802964B true CN109802964B (en) 2021-09-28

Family

ID=66560085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910060941.7A Active CN109802964B (en) 2019-01-23 2019-01-23 DQN-based HTTP adaptive flow control energy consumption optimization method

Country Status (1)

Country Link
CN (1) CN109802964B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110414725B (en) * 2019-07-11 2021-02-19 山东大学 Wind farm energy storage system dispatching method and device integrating forecasting and decision making
CN114885208B (en) * 2022-03-21 2023-08-08 中南大学 Dynamic adaptive method, device and medium for scalable streaming media transmission under NDN network
CN117979054B (en) * 2024-02-04 2025-08-22 山东大学 Energy saving method for playing short video streaming media in mobile terminal

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108063961A (en) * 2017-12-22 2018-05-22 北京联合网视文化传播有限公司 A kind of self-adaption code rate video transmission method and system based on intensified learning
CN108737382A (en) * 2018-04-23 2018-11-02 浙江工业大学 SVC coding HTTP streaming media self-adaption method based on Q-L earning
AU2017268276A1 (en) * 2016-05-16 2018-12-06 Wi-Tronix, Llc Video content analysis system and method for transportation system
CN108966330A (en) * 2018-09-21 2018-12-07 西北大学 A kind of mobile terminal music player dynamic regulation energy consumption optimization method based on Q-learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11062207B2 (en) * 2016-11-04 2021-07-13 Raytheon Technologies Corporation Control systems using deep reinforcement learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Evaluation of Q-Learning approach for HTTP Adaptive Streaming;Virginia Martın;《2016 IEEE International Conference on Consumer Electronics》;20160314;第293-294页 *
Live Streaming with Content Centric Networking;Hongfeng Xu;《2012 Third International Conference on Networking and Distributed Computing》;20121231;第1-5页 *
Research on Q-learning-based HTTP adaptive streaming bitrate control methods; Xiong Lirong; Journal on Communications (《通信学报》); 20170925; vol. 38, no. 9; pp. 18-24 *

Also Published As

Publication number Publication date
CN109802964A (en) 2019-05-24

Similar Documents

Publication Publication Date Title
Zhang et al. A multi-agent reinforcement learning approach for efficient client selection in federated learning
CN113434212A (en) Cache auxiliary task cooperative unloading and resource allocation method based on meta reinforcement learning
CN109802964B (en) DQN-based HTTP adaptive flow control energy consumption optimization method
CN104598292B (en) A kind of self adaptation stream adaptation and method for optimizing resources applied to cloud game system
CN113114756A (en) Video cache updating method for self-adaptive code rate selection in mobile edge calculation
US11570063B2 (en) Quality of experience optimization system and method
CN114706433B (en) Equipment control method and device and electronic equipment
CN108966330A (en) A kind of mobile terminal music player dynamic regulation energy consumption optimization method based on Q-learning
CN114090108B (en) Computing task execution methods, devices, electronic equipment and storage media
CN112866756B (en) Code rate control method, device, medium and equipment for multimedia file
CN110209845B (en) Recommendation method, device and storage medium of multimedia content
CN118069506A (en) Method and system for generating path coverage test data based on reinforcement learning selection strategy
CN113672372A (en) A multi-edge collaborative load balancing task scheduling method based on reinforcement learning
Mowafi et al. Energy efficient fuzzy-based DASH adaptation algorithm
CN120264046A (en) Energy-saving video adaptive bitrate optimization method based on deep reinforcement learning
Lin et al. KNN-Q learning algorithm of bitrate adaptation for video streaming over HTTP
CN114138493A (en) Edge computing power resource scheduling method based on energy consumption perception
CN113015179B (en) Network resource selection method and device based on deep Q network and storage medium
Jia et al. DQN algorithm based on target value network parameter dynamic update
CN116233088B (en) Real-time super-division video stream transmission optimization method based on end cloud cooperation
CN118605807A (en) A file storage method based on deep reinforcement learning file value algorithm
CN118093054A (en) Edge computing task offloading method and device
CN110879730B (en) Method and device for automatically adjusting game configuration, electronic equipment and storage medium
CN118301675A (en) Star-earth cooperative network cache optimization method and related equipment
CN110743164A (en) Dynamic resource partitioning method for reducing response delay in cloud game

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant