
CN116227558A - Neural network dynamic exit lightweighting method and system for multiple consecutive inferences - Google Patents

Neural network dynamic exit lightweighting method and system for multiple consecutive inferences

Info

Publication number
CN116227558A
CN116227558A (application CN202310306228.2A)
Authority
CN
China
Prior art keywords: inference, reasoning, layer, exit, frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310306228.2A
Other languages
Chinese (zh)
Inventor
邹桉
马叶涵
沈颖涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiao Tong University
Original Assignee
Shanghai Jiao Tong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiao Tong University filed Critical Shanghai Jiao Tong University
Priority to CN202310306228.2A
Publication of CN116227558A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention provides a neural network dynamic-exit lightweighting method and system for multiple consecutive inferences, comprising the following steps. Step 1: construct a neural-network-based inference model; for each inference within a preset small time range, predict the position at which the network will exit and predict a corresponding computing configuration, the computing configuration comprising frequency and voltage; for the multiple inferences within a preset large time range, calibrate the processor frequency and voltage according to the remaining inference workload and the time constraint. Step 2: execute the neural network under the predicted and calibrated computing configuration, thereby realizing dynamic voltage and frequency scaling. Compared with a classical deep learning network, the method achieves energy savings of up to 63.8%, guarantees that multiple neural network inferences complete within the specified time, and, through early exit, terminates inference ahead of time while still obtaining accurate results, reducing computation and energy costs.

Description

Neural network dynamic-exit lightweighting method and system for multiple consecutive inferences

Technical Field

The present invention relates to the field of neural network technology, and in particular to a neural network dynamic-exit lightweighting method and system for multiple consecutive inferences.

Background Art

Deep learning methods such as convolutional neural networks have achieved great success across a wide range of applications. However, one challenge in deploying deep learning models on resource-constrained systems is their large energy cost. Early exit, a dynamic inference method, adds exit layers to the network so that inference can terminate early while still producing accurate results, saving energy. Current passive decision-making for the energy regulation of early exit cannot adapt to the ongoing inference state, varying inference workloads, or time constraints, let alone guide a sensible configuration of the computing platform during inference to capture the potential energy savings.

Patent document US20210056357A1 discloses a system and method for implementing flexible, input-adaptive deep learning neural networks; patent documents US20210012178A1 and EP3997621A1 disclose systems, methods, and devices for early-exit convolution. Although these documents propose early-exit methods, none of them can predict the early exit point.

Patent document CN114997370A discloses a low-power neural network system based on predicted exit and its implementation method. Although it proposes predicting the early exit point, it can only predict for a single neural network inference to reduce computation and system energy consumption; its prediction accuracy is low, it cannot predict with high accuracy from the first few layers of the network (or even from the first layer), and it cannot predict early exit points for large network models with high accuracy.

Summary of the Invention

In view of the defects in the prior art, the object of the present invention is to provide a neural network dynamic-exit lightweighting method and system for multiple consecutive inferences.

The neural network dynamic-exit lightweighting method for multiple consecutive inferences provided by the present invention comprises:

Step 1: construct a neural-network-based inference model; for each inference within a preset small time range, predict the position at which the network will exit and predict the corresponding computing configuration, the computing configuration comprising frequency and voltage;

Step 2: for the multiple inferences within a preset large time range, calibrate the processor frequency and voltage according to the remaining inference workload and the time constraint;

Step 3: execute the neural network under the predicted and calibrated computing configuration, thereby realizing dynamic voltage and frequency scaling.

Preferably, a coordination period for inference, i.e., the inference task deadline, is set during inference, expressed as:

[Equations defining T_c and T_i appear only as images in the source (Figure BDA0004146935970000021, Figure BDA0004146935970000022).]

where λ denotes the tightness of the task deadline, with 0 < λ ≤ 1 (the smaller λ, the tighter the deadline); t_i denotes the actual completion time of each inference; T_c is the coordination period; M is the number of inference tasks; and T_i denotes the coordination deadline of the i-th inference task.

Preferably, when an inference task is executed, the power consumed during computation is the active power P_active, the sum of the dynamic power P_D, the static power P_S, and the constant power P_C;

the power consumed while the computing platform is idle is the idle power P_idle, the sum of the static power P_S and the constant power P_C;

P_D = C · V² · f

P_S = V · N_tr · I_S

where C is the capacitance of the switching logic gates; V is the processor supply voltage; f is the processor clock frequency; N_tr is the number of logic gates; and I_S is the normalized static current of each logic gate.

The energy consumption of each inference within T_i is:

[Equation shown only as an image in the source (Figure BDA0004146935970000023).]

The network layers before the early exit run at (V_H, f_H); after the network exits at time t_i, the computing platform lowers the voltage and frequency to the lowest level (V_L, f_L) via dynamic voltage and frequency scaling (DVFS). The energy consumption of inference task j_i is therefore expressed as:

E_i = P_active(V_H, f_H) · t_i + P_idle(V_L, f_L) · (T_i − t_i)

where E_i denotes the energy consumption of inference task j_i; V_H is the processor's high supply voltage; f_H is the processor's high clock frequency; and V_L is the processor's low supply voltage.
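The power model above can be sketched directly. The split of E_i into an active phase at (V_H, f_H) until the exit time and an idle phase until the deadline follows the surrounding prose; all numeric values used below are illustrative, not taken from the patent.

```python
def dynamic_power(C, V, f):
    # P_D = C * V^2 * f: dynamic power of the switching logic gates.
    return C * V * V * f

def static_power(V, N_tr, I_S):
    # P_S = V * N_tr * I_S: static power of N_tr gates at supply voltage V.
    return V * N_tr * I_S

def inference_energy(t_exit, T_deadline, C, N_tr, I_S, P_C, V_H, f_H, V_L):
    # Active power (P_D + P_S + P_C) is drawn at (V_H, f_H) until the exit
    # at t_exit; idle power (P_S + P_C) is drawn at V_L until the deadline.
    p_active = dynamic_power(C, V_H, f_H) + static_power(V_H, N_tr, I_S) + P_C
    p_idle = static_power(V_L, N_tr, I_S) + P_C
    return p_active * t_exit + p_idle * (T_deadline - t_exit)
```

Earlier exits (smaller t_exit) shrink the high-power active term, which is where the claimed energy savings come from.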

Preferably, within an inference task j_i, the network exit position is predicted first, establishing the number of remaining layers needed to complete the inference; then, based on the remaining layers and the worst-case execution time T_i until the end of the current inference, a predicted (V, f) is established; based on the amount of inference remaining within T_c, a cross-inference calibrator is enabled through a feedback control strategy, calibrating (V, f) according to the multi-inference progress, the total inference workload, and the time constraint; and based on the prediction and calibration, the processor is updated at runtime to the appropriate computing configuration via DVFS.

Preferably, in inference task j_i, each exit layer comprises a BoF (bag-of-features) pooling layer and an FC layer. Let y_i be the intermediate result of the i-th layer and N_c the number of object classes in the dataset. The BoF pooling layer aggregates the features extracted from y_i: a set of feature vectors called a codebook is used to describe y_i, and the weight of each codebook entry is generated by measuring the similarity between that entry and y_i. Since the number of codebook weights exceeds N_c, the FC layer further serves as a classifier that maps the BoF pooling result to an N_c-dimensional vector, thereby estimating the final output at the exit layer. The classifier's estimate, denoted ŷ_i, is:

[Equation shown only as an image in the source (Figure BDA0004146935970000032).]

where f_w(x, i) is the intermediate result of the i-th layer; x is the initial network input; W_i are the parameters of the i-th exit layer (the exit layer being the BoF pooling layer and the FC layer); and ŷ_i is the classifier's estimated result.

The parameters of the exit layers are trained offline, using the same cross-entropy loss function when training W_i:

ℓ(p, q) = − Σ_i p(i) · log q(i)

where p(i) and q(i) are, respectively, the actual and predicted class distributions for each object, and ℓ(p, q) is the cross-entropy.

When training the exit layers, the training-set data {x_1, x_2, …, x_N} is fed into the model, where N is the number of training samples, and the exit-layer parameters are optimized by batch gradient descent. Let L_total be the number of layers of the network, with target vectors {r_1, r_2, …, r_N} giving the correct results. First, without any exit layers, the model is trained with:

W' = W − (η / N) · Σ_{j=1}^{N} ∂ℓ(f_W(x_j), r_j) / ∂W

where W' is the updated overall network parameter set; η is the learning rate, adjusted if the accuracy does not meet requirements; j indexes the position in the training set; ∂ℓ/∂W denotes the parameter gradient; x_j is the j-th training data vector; and r_j is the j-th target data vector.

The original model W is then fixed, and the model is trained again on top of it to optimize the parameters of the BoF pooling layer and the FC layer:

W_i' = W_i − (η / N) · Σ_{j=1}^{N} ∂ℓ(ŷ_i(x_j), r_j) / ∂W_i

where ŷ_i(x_j) is the i-th exit layer's estimate for input x_j.
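Both training passes apply the same batch-gradient-descent update, which can be sketched as one step; the per-sample gradients are assumed to be computed elsewhere (e.g. by backpropagation through the cross-entropy loss).

```python
import numpy as np

def batch_gd_step(W, sample_grads, eta):
    # W' = W - (eta / N) * sum_j grad_j: the batch update applied first to
    # the whole network and then, with the backbone frozen, to the
    # exit-layer parameters only.
    return W - eta / len(sample_grads) * np.sum(sample_grads, axis=0)
```

Freezing the backbone means only the exit-layer parameter array is passed as `W` in the second pass.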

After the exit layers are trained, the average feature weight μ_i is computed as the parameter for the exit decision:

[Equation shown only as an image in the source (Figure BDA0004146935970000042).]

where k denotes the k-th overall classification target.

During inference on an initial network input x_j, at each early exit layer the weight ratio α_W is the maximum feature weight divided by μ_i and multiplied by a user-specified hyperparameter β; the larger the maximum feature weight, the more confident the classification is considered. If α_W exceeds 1, inference terminates and the result at this early exit layer is taken as the final result:

α_W = β · w_max / μ_i

where w_max is the maximum feature weight at the exit layer.
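The exit test can be sketched as below; the grouping α_W = β · max(weights) / μ_i is an assumption read from the prose ("the maximum feature weight divided by μ_i, multiplied by β").

```python
def should_exit(weights, mu_i, beta):
    # alpha_W = beta * max(weights) / mu_i; exit when alpha_W > 1, i.e.
    # when the strongest class response is confident enough relative to
    # the average feature weight mu_i learned during training.
    alpha_w = beta * max(weights) / mu_i
    return alpha_w > 1.0
```

Smaller β makes exits rarer (higher confidence required); larger β trades accuracy for earlier termination.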

Preferably, the predictor starts predicting from layer L_0 of the neural network. The predictor's input is the execution result vector y_0 of layer L_0 and the corresponding exit-layer execution result vector, denoted here w_0 (the original symbol appears only as an image in the source).

This result is expanded in dimension: by appending zeros at both ends of w_0, the vector is extended from its original N_c dimensions to a padded vector, where N_c is the length of w_0 and also the number of inference-result classes, and K is the number of feature vectors in the feature pooling group. The computation is:

[Equation shown only as an image in the source (Figure BDA00041469359700000411).]

where l denotes the l-th element.

The zero-padded output vector then undergoes a one-dimensional convolution to obtain a new feature-weight vector, with the convolution weights given by a one-dimensional vector h of length K:

w'[l] = Σ_{m=1}^{K} h[m] · w_pad[l + m − 1]

where w_pad denotes the zero-padded vector and w' the new feature-weight vector (the original symbols appear only as images in the source).
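A minimal sketch of one predictor step, assuming an odd kernel length K and symmetric "same"-style zero-padding (the patent's exact padded dimension appears only as an image); the names are illustrative.

```python
import numpy as np

def predict_next_exit_weights(w, h):
    # Zero-pad the exit-layer weight vector w on both ends, then slide the
    # length-K kernel h over it (1-D sliding-window correlation) to estimate
    # the weight vector of the exit layer one position deeper. Assumes odd K
    # so the output keeps the length N_c of w.
    K = len(h)
    pad = K // 2
    w_padded = np.concatenate([np.zeros(pad), np.asarray(w, float), np.zeros(pad)])
    # out[l] = sum_m h[m] * w_padded[l + m]
    return np.array([np.dot(h, w_padded[l:l + K]) for l in range(len(w))])
```

Feeding the output back in as `w` and repeating implements the recursion that yields predictions for exit layers L_0 + 2, L_0 + 3, and so on.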

By recursively replacing the intermediate estimate with the newly predicted vector and repeating the computation above, the prediction for the exit layer at position L_0 + 2 is obtained, and likewise for deeper positions.

Placing the exit-layer prediction at any layer after L_0 yields the PREDICT function, described as follows:

[Equations shown only as images in the source (Figure BDA0004146935970000059, Figure BDA0004146935970000051, Figure BDA0004146935970000052).]

Here the PREDICT output is the minimum predicted exit point after L_0, evaluated from the predicted exit confidence of layer L_0 + ζ; L_0 + ζ is therefore the exit point predicted by the prediction function. If no such ζ can be found, a hyperparameter τ is introduced, i.e., ζ ∈ [1, τ] with τ ≤ L_total − L_0; the upper bound of τ prohibits prediction results beyond the last layer. If no integer in [1, τ] satisfies the condition, ζ = τ is set.
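A sketch of the PREDICT search, assuming the exit condition is "predicted confidence exceeds 1" (matching the α_W > 1 exit test); the confidence values themselves are assumed to come from the convolution-based predictor.

```python
def predict_exit_point(confidences, tau):
    # confidences[z - 1] holds the predicted exit confidence of layer
    # L0 + z. Return the smallest z in [1, tau] whose predicted exit
    # fires; fall back to z = tau when none does, matching the patent's
    # zeta = tau fallback.
    for z in range(1, tau + 1):
        if confidences[z - 1] > 1.0:
            return z
    return tau
```

The τ cap keeps the predicted exit within the network, since τ ≤ L_total − L_0.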

The current prediction result ζ of PREDICT is converted into a mid-level frequency f_{M,i}. For inference task j_i, in the interval from the start of inference until the prediction at L_0, a relatively high frequency f⁰_{M,i} is applied conservatively so that all L_total layers of the model could complete in time without exiting:

[Equation shown only as an image in the source (Figure BDA0004146935970000055).]

where f_H is the computing platform's default maximum frequency, at which the platform can complete inference through all L_total layers without exiting early within T_i; the frequency f⁰_{M,i} is calibrated by Δf_i.

When the prediction is made at L_0, f is adjusted to the mid-level frequency f_{M,i} according to the predicted exit point, and the network runs until it exits before T_i. Given a network with L_total layers and an early prediction ζ, the mid-level frequency is computed as:

[Equation shown only as an image in the source (Figure BDA0004146935970000057).]

where f_{M,i} is the lowest frequency predicted to complete the inference by time T_i.

Preferably, for inference task j_{i+1}, the cross-inference DVFS strategy provides a calibration suggestion Δf_{i+1} after all previous tasks, including j_i, have completed. The calibration is implemented by a discrete incremental proportional-integral (PI) regulator:

Δf_{i+1} = Δf_i + (K_P + K_I · (t_i − t_{i−1})) · e(t_i) − K_P · e(t_{i−1})

where K_P and K_I are the proportional and integral coefficients, respectively; the index i denotes the i-th inference task within the current coordination period; t_i denotes the relative completion time of inference j_i from the start of the coordination period; and t_i − t_{i−1} is the interval between the completions of inference tasks j_i and j_{i−1}.

The PI regulator's input deviation e(t_i) evaluates the inference progress against T_c, based on the total inference workload and the inference execution progress since the start of the coordination period:

[Equation shown only as an image in the source (Figure BDA0004146935970000058).]

where the first term is the reference speed for balancing the inference workload and the second term is the processing speed achieved so far.
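The incremental PI update above can be sketched directly; the deviation values e(t_i) are assumed to be supplied by the progress evaluation, and the numbers used are illustrative.

```python
def pi_calibration(df_prev, e_curr, e_prev, t_curr, t_prev, K_P, K_I):
    # Discrete incremental PI step from the patent:
    # df_{i+1} = df_i + (K_P + K_I*(t_i - t_{i-1})) * e(t_i) - K_P * e(t_{i-1})
    return df_prev + (K_P + K_I * (t_curr - t_prev)) * e_curr - K_P * e_prev
```

A positive deviation (inference running behind the reference speed) raises Δf, nudging later inferences to a higher frequency so the coordination deadline T_c is still met.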

Preferably, the frequency and corresponding voltage are configured through the DVFS governor, and the network is executed according to the small-time-scale prediction and the large-time-scale calibration. At the start of each inference, Δf_i is derived from the cross-inference calibration, and the DVFS governor sets the computing configuration conservatively so that an inference that does not exit early still completes. When the early-exit prediction is made at inference layer L_0, the DVFS governor establishes the appropriate frequency configuration f_i* from f_{M,i}, derived by the intra-inference predictor, and Δf_i, derived by the cross-inference calibration:

f_i* = f_{M,i} + Δf_i

The resulting energy consumption is therefore:

[Equation shown only as an image in the source (Figure BDA0004146935970000064).]

where t_i^{L_0} is the time at which the prediction layer L_0 and the network layers before it complete within j_i, and V_i* is the voltage corresponding to the configured frequency f_i*. Since the choice of f is based on the predicted remaining workload within the inference, the intra-inference prediction allows j_i to complete by T_i; the cross-inference calibration targets completion of the M sequential inferences within T_c.
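A minimal sketch of the governor's frequency choice, assuming the additive combination of prediction and calibration described above, with clamping to the platform's supported range (the clamping itself is an assumption; the patent's exact expression is shown only as an image).

```python
def governor_frequency(f_pred, df_cal, f_low, f_high):
    # Combine the intra-inference prediction f_M,i with the cross-inference
    # calibration df_i, clamped to the platform's supported [f_low, f_high].
    return min(max(f_pred + df_cal, f_low), f_high)
```

The corresponding voltage is then looked up from the platform's supported (V, f) pairs, since DVFS hardware exposes discrete voltage-frequency operating points.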

The neural network dynamic-exit lightweighting system for multiple consecutive inferences provided by the present invention comprises:

Module M1: construct a neural-network-based inference model; for each inference within a preset small time range, predict the position at which the network will exit and predict the corresponding computing configuration, the computing configuration comprising frequency and voltage; for the multiple inferences within a preset large time range, calibrate the processor frequency and voltage according to the remaining inference workload and the time constraint;

Module M2: execute the neural network under the predicted and calibrated computing configuration, thereby realizing dynamic voltage and frequency scaling.

Preferably, a coordination period for inference, i.e., the inference task deadline, is set during inference, expressed as:

[Equations defining T_c and T_i appear only as images in the source (Figure BDA0004146935970000067, Figure BDA0004146935970000068).]

where λ denotes the tightness of the task deadline, with 0 < λ ≤ 1 (the smaller λ, the tighter the deadline); t_i denotes the actual completion time of each inference; T_c is the coordination period; M is the number of inference tasks; and T_i denotes the coordination deadline of the i-th inference task;

when an inference task is executed, the power consumed during computation is the active power P_active, the sum of the dynamic power P_D, the static power P_S, and the constant power P_C;

the power consumed while the computing platform is idle is the idle power P_idle, the sum of the static power P_S and the constant power P_C;

P_D = C · V² · f

P_S = V · N_tr · I_S

where C is the capacitance of the switching logic gates; V is the processor supply voltage; f is the processor clock frequency; N_tr is the number of logic gates; and I_S is the normalized static current of each logic gate.

The energy consumption of each inference within T_i is:

[Equation shown only as an image in the source (Figure BDA0004146935970000071).]

The network layers before the early exit run at (V_H, f_H); after the network exits at time t_i, the computing platform lowers the voltage and frequency to the lowest level (V_L, f_L) via dynamic voltage and frequency scaling (DVFS). The energy consumption of inference task j_i is therefore expressed as:

E_i = P_active(V_H, f_H) · t_i + P_idle(V_L, f_L) · (T_i − t_i)

where E_i denotes the energy consumption of inference task j_i; V_H is the processor's high supply voltage; f_H is the processor's high clock frequency; and V_L is the processor's low supply voltage;

within an inference task j_i, the network exit position is predicted first, establishing the number of remaining layers needed to complete the inference; then, based on the remaining layers and the worst-case execution time T_i until the end of the current inference, a predicted (V, f) is established; based on the amount of inference remaining within T_c, a cross-inference calibrator is enabled through a feedback control strategy, calibrating (V, f) according to the multi-inference progress, the total inference workload, and the time constraint; and based on the prediction and calibration, the processor is updated at runtime to the appropriate computing configuration via DVFS.

Compared with the prior art, the present invention has the following beneficial effects:

The present invention predicts, within each inference, where the network will exit and adjusts the computing configuration (i.e., frequency and voltage) accordingly on the small time scale; at the same time, across the multiple inferences on the large time scale, it provides computing configuration recommendations that account for the remaining inference workload and the time constraint, and executes the neural network under the predicted and calibrated configuration, thereby realizing dynamic voltage and frequency scaling. Compared with a classical deep learning network, the invention achieves energy savings of up to 63.8% while guaranteeing that multiple neural network inferences complete within the specified time; through early exit, inference can terminate ahead of time and still obtain accurate results, reducing computation and energy costs.

Brief Description of the Drawings

Other features, objects, and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:

FIG. 1 is a structural diagram of the early-exit-based neural network of the present invention.

Detailed Description

The present invention is described in detail below in conjunction with specific embodiments. The following embodiments will help those skilled in the art to further understand the present invention, but do not limit it in any form. It should be noted that those of ordinary skill in the art can make several changes and improvements without departing from the concept of the present invention, all of which fall within the protection scope of the present invention.

Embodiment 1:

As shown in FIG. 1, for neural network applications that complete multiple neural network inferences within a period of time, the present invention provides a neural network dynamic-exit lightweighting method for multiple consecutive inferences that reduces the computing power and energy consumption of neural network computation by combining the inference state with runtime dynamic voltage and frequency scaling of the processor. It specifically includes the following steps:

Step 1: within each inference on the small time scale, predict where the network will exit and predict the computing configuration (frequency and voltage) accordingly; for the multiple inferences on the large time scale, provide processor frequency and voltage calibration based on the remaining inference workload and the deadline constraint;

Step 2: execute the neural network under the predicted and calibrated computing configuration, thereby realizing dynamic voltage and frequency scaling.

A more detailed description follows.

The coordination period for inference, i.e., the inference task deadline, is expressed as:

[Equations defining T_c and T_i appear only as images in the source (Figure BDA0004146935970000081, Figure BDA0004146935970000082).]

λ denotes the tightness of the task deadline, with 0 < λ ≤ 1 (the smaller λ, the tighter the deadline); t_i denotes the actual completion time of each inference; T_c is the coordination period; M is the number of inference tasks; T_i denotes the coordination deadline of the i-th inference task.

When executing an inference task, the power consumed during computation is the active power Pactive, the sum of the dynamic power PD, the static power PS, and the constant power PC;

The power consumed when the computing platform is idle is the idle power Pidle, the sum of the static power PS and the constant power PC;

PD = C·V²·f

PS = V·Ntr·IS

where C is the capacitance of the switching logic gates; V is the processor supply voltage; f is the processor clock frequency; Ntr is the number of logic gates; IS is the normalized static current of each logic gate;
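The power model above translates directly to code. A minimal sketch follows; the symbol names match the text, and the numeric values used in testing are purely illustrative:

```python
def dynamic_power(c: float, v: float, f: float) -> float:
    """Dynamic power P_D = C * V^2 * f of the switching logic gates."""
    return c * v ** 2 * f

def static_power(v: float, n_tr: int, i_s: float) -> float:
    """Static (leakage) power P_S = V * N_tr * I_S."""
    return v * n_tr * i_s

def active_power(p_d: float, p_s: float, p_c: float) -> float:
    """Active power while computing: P_active = P_D + P_S + P_C."""
    return p_d + p_s + p_c

def idle_power(p_s: float, p_c: float) -> float:
    """Idle power of the platform: P_idle = P_S + P_C."""
    return p_s + p_c
```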

The energy consumption of each inference within Ti is:

Figure BDA0004146935970000091

Since the neural network is deployed with early exits, the network layers before the early exit run at (VH, fH); after the network exits at time ti, the computing platform lowers the voltage and frequency to the lowest level (VL, fL) through DVFS until Ti. The energy consumption of inference task ji is therefore expressed as:

Figure BDA0004146935970000092

Ei denotes the energy consumption; VH denotes the high processor supply voltage; fH denotes the high processor clock frequency; VL denotes the low processor supply voltage;

An appropriate (V, f) is established to match the remaining workload and time constraints; (V, f) is adjusted at runtime based on the prediction of the network exit position and the remaining inferences to be completed before the deadline Tc.

Within an inference task ji, the exit position of the network is first predicted, establishing the number of remaining layers needed to complete the inference. Then, based on the remaining layers and the worst-case execution time Ti of the current inference, an appropriate (V, f) prediction is established to reduce energy cost on the small time scale. Considering the amount of inference remaining within Tc, a cross-inference calibrator enabled by a feedback control strategy calibrates (V, f) according to the multi-inference progress, the total inference workload, and the time constraint, balancing workload, energy, and time cost on the large time scale. Based on the prediction and calibration, the DVFS governor updates the processor to the appropriate computing configuration (i.e., frequency and voltage) at runtime to save energy while meeting the deadline.

In inference task ji, the design of the exit layers is based on existing works, and all exit layers are further modified to share the same topology. Each exit layer contains a BoF (Bag-of-Features) pooling layer and an FC (fully-connected) layer. Let yi be the intermediate result of the i-th layer and Nc the number of object classes in the dataset. The BoF pooling layer aggregates the features extracted from yi. In BoF pooling, a set of feature vectors called codebooks is used to describe yi; the weight of each codebook is generated by measuring the similarity between the codebook and yi. Since the number of codebook weights is usually larger than Nc, the FC layer further acts as a classifier, mapping the result of BoF pooling to
Figure BDA0004146935970000101
thereby estimating the final output at the exit layer.

The function of the exit layer is denoted as:

Figure BDA0004146935970000102

where fw(x, i) is the intermediate result of the i-th layer and x is the initial network input; Wi is the parameter set of the i-th exit layer (BoF and FC);
Figure BDA0004146935970000103
is the estimated result of the classifier.

The parameters in the exit layers need to be trained offline. Since the exit layers work by estimating the final network output, the same cross-entropy loss function is selected when training Wi.

Figure BDA0004146935970000104

where p(i) and q(i) are the actual and predicted class distributions of each object, respectively;
Figure BDA0004146935970000105
is the cross entropy;

When training the exit layers, the training set data {x1, x2, …, xN} is input into the model, where N is the number of training samples, and the exit-layer parameters are optimized by batch gradient descent. Let Ltotal be the number of layers of the network, and let the target vector set be
Figure BDA0004146935970000106
where
Figure BDA0004146935970000107
is the correct result. First, without any exit layers, the model is trained with the following formula:

Figure BDA0004146935970000108

where W' is the updated overall network parameter and η is the learning rate, which can be adjusted if the accuracy does not meet the requirement; j denotes the vector position;
Figure BDA0004146935970000109
denotes the parameter gradient; xj denotes a training data vector element; rj denotes a target data vector element;

Afterwards, the original model W is fixed and the model is trained again on top of the exit layers, optimizing the parameters of the BoF pooling layer and the FC layer; the expression is:

Figure BDA00041469359700001010

After the exit layers are trained, the average feature weight μi is calculated as a parameter for the exit decision; the expression is:

Figure BDA00041469359700001011

k denotes the k-th overall classification target.

During inference on the initial network input xj, at each early-exit layer the weight ratio αW is the maximum feature weight divided by μi and multiplied by the user-specified hyperparameter β. The larger the maximum feature weight, the more confident the classification is considered. Once αW exceeds 1, the inference terminates and the result of this early-exit layer is applied as the final result.

Figure BDA0004146935970000111
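The exit decision can be sketched as follows. The grouping of β is our left-to-right reading of the text ("maximum feature weight divided by μi multiplied by β"), i.e. αW = (max weight / μi) · β; since the exact formula is in an image, this grouping is an assumption.

```python
def should_exit(feature_weights, mu_i: float, beta: float) -> bool:
    """Early-exit decision at one exit layer.

    alpha_w = (max feature weight / mu_i) * beta -- the grouping of beta
    is our reading of the text and therefore an assumption.  Inference
    terminates once alpha_w exceeds 1.
    """
    alpha_w = max(feature_weights) / mu_i * beta
    return alpha_w > 1.0
```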

Assume the predictor starts predicting from the L0-th layer of the neural network. The predictor input is the execution result vector y0 of the L0-th network layer and its corresponding exit-layer execution result vector
Figure BDA0004146935970000112

The zero-padding module expands the dimension of the above result: by appending zeros at both ends of the vector
Figure BDA0004146935970000113
the vector
Figure BDA0004146935970000114
is expanded from
Figure BDA0004146935970000115
dimensions to
Figure BDA0004146935970000116
dimensions, denoted
Figure BDA0004146935970000117
where Nc, the length of the vector
Figure BDA0004146935970000118
is also the number of inference result classes, and K is the number of feature vectors in the feature pooling group. The specific calculation is given by the following formula:

Figure BDA0004146935970000119

l denotes the l-th element.

The output vector of the zero-padding module
Figure BDA00041469359700001110
then undergoes a one-dimensional convolution, producing a new feature weight vector
Figure BDA00041469359700001111
where the convolution weight is a one-dimensional vector h of length K. The specific calculation is given by the following formula:

Figure BDA00041469359700001112

By recursively replacing the intermediate estimate
Figure BDA00041469359700001113
with
Figure BDA00041469359700001114
and repeating the formula, the prediction result of the exit layer at position L0+2 is obtained, denoted
Figure BDA00041469359700001115
Following the above steps, the prediction result of any exit layer after L0 can be computed.
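The zero-pad-then-convolve step can be sketched with NumPy as follows. The padding width and the length-preserving convolution are our assumptions (the exact formulas are in the patent's formula images), and the sketch assumes an odd kernel length K:

```python
import numpy as np

def next_exit_weights(w_exit: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Estimate the next exit layer's feature weights from the current
    exit-layer result (length N_c) with a 1-D convolution of kernel h
    (odd length K).  Length-preserving zero padding is an assumption.
    """
    k = len(h)
    assert k % 2 == 1, "sketch assumes an odd kernel length"
    pad = (k - 1) // 2
    padded = np.pad(w_exit, (pad, pad))          # zero fill both ends
    return np.convolve(padded, h, mode="valid")  # same length as the input
```

Applying the function repeatedly to its own output gives the recursion described above.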

Placing the exit-layer prediction at any layer after L0 yields the PREDICT function, described as follows:

Figure BDA00041469359700001116

where
Figure BDA00041469359700001117
denotes the minimum predicted exit point after L0;

Figure BDA00041469359700001118

where
Figure BDA0004146935970000121
is the predicted exit confidence of layer L0+ζ. Therefore, L0+ζ is the exit point predicted by the PREDICT function. In case no such ζ exists, a hyperparameter τ is further introduced, i.e., ζ ∈ [1, τ], τ ≤ Ltotal − L0. The upper bound τ forbids predictions beyond the last layer. If no integer in [1, τ] satisfies the condition, ζ = τ is set. The time complexity of the prediction is O(Nc × K × τ). The hyperparameters introduced in the prediction, L0, β, and τ, can be tuned by advanced users to balance prediction accuracy and computational cost for different application scenarios.

Frequency prediction: the exit prediction result ζ of PREDICT is converted into an appropriate "mid-level" frequency fM,i. For inference task ji, during the interval from the start of the inference to the prediction at L0, a relatively "high-level"
Figure BDA0004146935970000122
is conservatively applied, such that all Ltotal layers of the model could be completed in time without exiting; the expression is:

Figure BDA0004146935970000123

where fH is the default maximum frequency of the computing platform, at which the platform can complete inference with all Ltotal layers within Ti without an early exit. The frequency
Figure BDA0004146935970000124
is calibrated by Δfi.

When the prediction is made at L0, f is adjusted to the appropriate "mid-level" fM,i according to the predicted exit point, and the network runs until it exits within Ti. Given a network with Ltotal layers and early-exit prediction ζ, the computation frequency is reduced to:

Figure BDA0004146935970000125

where fM,i is the prediction of the lowest frequency at which the inference can still be completed by time Ti.
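The frequency-scaling step can be sketched under a simple linear timing model (layer latency inversely proportional to f). The proportionality model and the parameter names are assumptions; the actual fM,i formula is in the image above.

```python
def mid_level_frequency(f_h: float, zeta: int, t_deadline: float,
                        t_elapsed: float, layer_time_at_fh: float) -> float:
    """Lowest frequency that still finishes the zeta remaining layers by
    the per-inference deadline, assuming layer latency scales as 1/f.

    All parameter names are illustrative; this is a sketch of the idea,
    not the patent's exact formula.
    """
    remaining_time = t_deadline - t_elapsed
    assert remaining_time > 0
    f_needed = f_h * (zeta * layer_time_at_fh) / remaining_time
    return min(f_h, f_needed)
```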

Since our goal is to execute inference tasks in a bursty manner, for inference task ji+1 the cross-inference DVFS strategy provides a calibration suggestion Δfi+1 after completing all previous tasks including ji, taking into account the workload and time constraints on the larger time scale Tc. The calibration is implemented by a discrete incremental proportional-integral (PI) regulator, expressed as:

Δfi+1 = Δfi + (KP + KI(ti − ti−1))·e(ti) − KP·e(ti−1)

where KP and KI are the proportional and integral coefficients, respectively. The index i denotes the i-th inference task within the current coordination cycle; ti denotes the completion time of inference ji relative to the start of the coordination cycle; ti − ti−1 is the interval between the completions of inference tasks ji and ji−1.
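The incremental PI update above translates directly to code:

```python
def pi_calibration(delta_f_i: float, k_p: float, k_i: float,
                   t_i: float, t_prev: float,
                   e_i: float, e_prev: float) -> float:
    """Discrete incremental PI regulator from the text:
    df_{i+1} = df_i + (K_P + K_I*(t_i - t_{i-1})) * e(t_i) - K_P * e(t_{i-1})
    """
    return delta_f_i + (k_p + k_i * (t_i - t_prev)) * e_i - k_p * e_prev
```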

The input deviation e(ti) of the PI regulator evaluates the inference progress over Tc based on the total inference workload and the inference execution progress since the start of the coordination period; it is expressed as:
Figure BDA0004146935970000126
where the first term is the reference speed for balancing the inference workload and the second term is the processing speed achieved so far.
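A plausible reading of e(ti) — reference layers-per-second to finish the total workload within αTc, minus the layers-per-second achieved so far — can be sketched as follows; since the exact terms are in the formula image, this form is an assumption.

```python
def progress_error(total_layers: int, alpha: float, t_c: float,
                   layers_done: int, t_i: float) -> float:
    """e(t_i): reference processing speed minus achieved speed (assumed form)."""
    reference_speed = total_layers / (alpha * t_c)  # layers/s needed for alpha*Tc
    achieved_speed = layers_done / t_i              # layers/s so far
    return reference_speed - achieved_speed
```

A positive error means execution is behind schedule, so the calibrator raises the frequency; a negative error allows it to be lowered.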

In practical applications, a smaller time period αTc is used, 0 < α ≤ 1. The interval between αTc and Tc is a margin reserved by the feedback control to absorb execution-time overshoot. An appropriate choice of α balances energy saving against meeting the deadline. The output of the cross-inference calibrator is the frequency calibration Δfi+1 with which the DVFS governor configures the computing platform for the next inference ji+1.

Finally, the DVFS governor configures the frequency and the corresponding voltage, executing the network according to the predictions and calibrations on the small and large time scales, respectively, to further save energy while meeting the network inference deadlines. The DVFS governor operates (V, f) scaling at a granularity finer than one inference. At the start of each inference, Δfi is derived from the cross-inference calibration, and the DVFS governor sets the computing configuration to
Figure BDA0004146935970000131
to conservatively complete an inference without early exit. When the early-exit prediction is made at layer L0, the DVFS governor establishes an appropriate frequency configuration
Figure BDA0004146935970000132
from fM,i derived by the intra-inference predictor and Δfi derived by the cross-inference calibration; the expression is:

Figure BDA0004146935970000133

Therefore, the resulting energy consumption is:

Figure BDA0004146935970000134

where
Figure BDA0004146935970000137
is the completion time of the prediction layer L0 and the network layers of ji before it, and
Figure BDA0004146935970000135
is the voltage corresponding to a given
Figure BDA0004146935970000136
Since f is chosen based on the remaining workload predicted within the inference, the intra-inference prediction enables ji to be completed by Ti, and the cross-inference calibration targets completing the M sequential inferences by Tc. Since V and f scale computational performance linearly (inversely proportional to inference time), while PD scales cubically and PS linearly with them, the dual-time-scale power management effectively reduces E.

Processor voltage and frequency adjustment can be implemented at the application level through system calls (syscall). According to our observation, the transient time of each (V, f) change is approximately 1 ms to 3 ms. The transient process of changing the processor (V, f) is included in our actual system evaluation.

For each inference at runtime, the computational complexity of the intra-inference exit prediction is O(Nc × K × τ), while the cross-inference calibration and the DVFS governor are both O(1).

We use VGG-19 and ResNet-18 as backbone models on the commonly used CIFAR-10, CIFAR-100, SVHN, and CINIC datasets to evaluate the inference accuracy and timing performance of the present invention. The evaluation is performed on an NVIDIA Jetson TX2 with GPU fH = 1.30050 GHz and fL = 0.11475 GHz. Serial and I/O operations are executed on the CPU, while massively parallel and compute-intensive segments are offloaded to the GPU, managed automatically by the PyTorch library. The energy cost is measured with a Tektronix MDO32 oscilloscope and a TCP2020 current probe at a sampling frequency of 50 kHz. The prediction and calibration parameters of the different benchmarks are listed in Table 1.

Table 1

Figure BDA0004146935970000141

During inference, the frequency is computed according to the prediction of the early exit point. The accuracy of the prediction directly determines the time and energy consumption of the task. Therefore, we test the prediction accuracy on the datasets and models with different L0. According to the prediction and inference accuracy results, L0 = 6 and 7 are set for VGG-19 and ResNet-18, respectively, in the remaining experiments.

The inference accuracy is then evaluated; the results are shown in Table 2. The present invention achieves the same accuracy as other early-exit methods on both models, 1%-3% lower than the classic CNN. In the early-exit prediction, even if a prediction is wrong, the network simply continues to the next prediction instead of being forced to exit, so no additional inference accuracy loss is introduced. The remaining power management settings have no effect on inference accuracy, because the exit-layer prediction of ζ is determined first to achieve high inference accuracy, and the power management is built on top of it.

Table 2

Figure BDA0004146935970000142

We evaluate the timing performance; owing to the effective dual-time-scale DVFS management, the present invention shows more concentrated and shorter execution times than other methods. α balances energy saving against meeting the deadline: α = 0.75 leads to a slightly longer total execution time than α = 0.5, while saving more energy.

Finally, we evaluate the deadline (Tc) satisfaction rate under different values of λ. As a stress test, we compare the timing performance under tighter deadlines, i.e.,
Figure BDA0004146935970000143
When λ = 1, all methods meet the deadline.

To measure the runtime power consumption, the Jetson TX2 board is connected directly to a Keysight E36231A power supply at 19 V. A current probe and an oscilloscope monitor the current fed to the board. Table 3 summarizes the average energy consumption per task. Energy measurements on the real platform show that the present invention saves 63.8% energy compared with the classic deep learning network and 21.5% compared with early exit under the state-of-the-art exit strategy. In most cases, α = 0.75 consumes less energy than α = 0.5.

Table 3

Figure BDA0004146935970000151

The main source of overhead is the early-exit PREDICT. We therefore evaluate the time cost of 1750 early-exit PREDICT invocations for each benchmark. The maximum time cost across all benchmarks is below 300 μs. The box for ResNet-18 is wider than that for VGG-19 because ResNet-18 is expected to exit with a larger ζ. The time cost demonstrates the applicability and scalability of the online power management of the present invention.

Embodiment 2:

The present invention also provides a neural network dynamic-exit lightweight system for multiple continuous inferences. The system can be realized by executing the process steps of the neural network dynamic-exit lightweight method for multiple continuous inferences; that is, those skilled in the art can understand the method as a preferred implementation of the system.

The neural network dynamic-exit lightweight system for multiple continuous inferences provided by the present invention includes: module M1: constructing a neural-network-based inference model; for each inference within a preset small time range, predicting where the network exits and correspondingly predicting the computing configuration, the computing configuration including frequency and voltage; for multiple inferences within a preset large time range, calibrating the processor frequency and voltage based on the remaining inference workload and the time constraint; module M2: executing the neural network according to the predicted and calibrated computing configuration, thereby realizing dynamic voltage and frequency scaling.

During the inference process, the coordination cycle of inference, i.e., the inference task deadline, is set; the expression is:

Figure BDA0004146935970000152

Figure BDA0004146935970000153

where λ denotes the tightness of the task deadline, 0 < λ ≤ 1 (the smaller λ, the tighter the deadline); ti denotes the actual completion time of each inference; Tc is the coordination cycle; M is the number of inference tasks; Ti denotes the coordination deadline of the i-th inference task;

When executing an inference task, the power consumed during computation is the active power Pactive, the sum of the dynamic power PD, the static power PS, and the constant power PC;

The power consumed when the computing platform is idle is the idle power Pidle, the sum of the static power PS and the constant power PC;

PD = C·V²·f

PS = V·Ntr·IS

where C is the capacitance of the switching logic gates; V is the processor supply voltage; f is the processor clock frequency; Ntr is the number of logic gates; IS is the normalized static current of each logic gate;

The energy consumption of each inference within Ti is:

Figure BDA0004146935970000161

The network layers before the early exit run at (VH, fH); after the network exits at time ti, the computing platform lowers the voltage and frequency to the lowest level (VL, fL) through dynamic voltage and frequency scaling (DVFS). The energy consumption of inference task ji is therefore expressed as:

Figure BDA0004146935970000162

Ei denotes the energy consumption of inference task ji; VH denotes the high processor supply voltage; fH denotes the high processor clock frequency; VL denotes the low processor supply voltage;

Within an inference task ji, the network exit position is first predicted, establishing the number of remaining layers needed to complete the inference; then, based on the remaining layers and the worst-case execution time Ti of the current inference, (V, f) is established as a prediction; according to the amount of inference remaining within Tc, a cross-inference calibrator is enabled through a feedback control strategy and (V, f) is calibrated according to the multi-inference progress, the total inference workload, and the time constraint; based on the prediction and calibration, the processor is updated to the appropriate computing configuration at runtime through DVFS.

Those skilled in the art know that, in addition to implementing the system, apparatus, and modules provided by the present invention purely as computer-readable program code, the method steps can be logically programmed so that the system, apparatus, and modules realize the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Therefore, the system, apparatus, and modules provided by the present invention can be regarded as hardware components, and the modules included therein for realizing various programs can also be regarded as structures within the hardware components; the modules for realizing various functions can also be regarded both as software programs implementing the method and as structures within the hardware components.

Specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to the above specific embodiments; those skilled in the art can make various changes or modifications within the scope of the claims without affecting the essence of the present invention. In the absence of conflict, the embodiments of the present application and the features therein can be combined with each other at will.

Claims (10)

1. A neural network dynamic-exit lightweight method for multiple continuous inferences, characterized by comprising:
Step 1: constructing a neural-network-based inference model; for each inference within a preset small time range, predicting where the network exits and correspondingly predicting the computing configuration, the computing configuration including frequency and voltage; for multiple inferences within a preset large time range, calibrating the processor frequency and voltage based on the remaining inference workload and the time constraint;
Step 2: executing the neural network according to the predicted and calibrated computing configuration, thereby realizing dynamic voltage and frequency scaling.
2. The neural network dynamic-exit lightweight method for multiple continuous inferences according to claim 1, characterized in that the coordination cycle of inference, i.e., the inference task deadline, is set during the inference process and expressed as:
Figure FDA0004146935940000011
Figure FDA0004146935940000012
where λ denotes the tightness of the task deadline, 0 < λ ≤ 1 (the smaller λ, the tighter the deadline); ti denotes the actual completion time of each inference; Tc is the coordination cycle; M is the number of inference tasks; Ti denotes the coordination deadline of the i-th inference task.
3. The neural network dynamic-exit lightweight method for multiple continuous inferences according to claim 2, characterized in that, when executing an inference task, the power consumed during computation is the active power Pactive, the sum of the dynamic power PD, the static power PS, and the constant power PC; the power consumed when the computing platform is idle is the idle power Pidle, the sum of the static power PS and the constant power PC;
PD = C·V²·f
PS = V·Ntr·IS
where C is the capacitance of the switching logic gates; V is the processor supply voltage; f is the processor clock frequency; Ntr is the number of logic gates; IS is the normalized static current of each logic gate; the energy consumption of each inference within Ti is:
Figure FDA0004146935940000013
the network layers before the early exit run at (VH, fH); after the network exits at time ti, the computing platform lowers the voltage and frequency to the lowest level (VL, fL) through dynamic voltage and frequency scaling (DVFS); therefore, the energy consumption of inference task ji is expressed as:
(equation image: energy consumption Ei of inference task ji)
where Ei is the energy consumption of inference task ji; VH and fH are the processor's high supply voltage and high clock frequency; and VL is the processor's low supply voltage.
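As a rough check of the power and energy model in this claim, the relations PD = C·V²·f, PS = V·Ntr·IS, Pactive = PD + PS + PC, and Pidle = PS + PC can be sketched in Python. The per-task energy equations survive only as images, so the split of Ei into an active phase until the exit time and an idle phase until the deadline Ti is an assumption based on the surrounding text; all numeric values are illustrative.

```python
def dynamic_power(c, v, f):
    """Dynamic (switching) power: PD = C * V^2 * f."""
    return c * v * v * f

def static_power(v, n_tr, i_s):
    """Static (leakage) power: PS = V * Ntr * IS."""
    return v * n_tr * i_s

def task_energy(t_exit, t_deadline, p_active_high, p_idle_low):
    """Assumed form of Ei: run at (VH, fH) until the early exit at t_exit,
    then idle at (VL, fL) until the per-task deadline Ti."""
    assert 0.0 <= t_exit <= t_deadline
    return p_active_high * t_exit + p_idle_low * (t_deadline - t_exit)

# Illustrative numbers only
p_c = 0.1                                        # constant power PC
p_d = dynamic_power(c=1e-9, v=1.0, f=1e9)        # ~1 W dynamic power
p_s = static_power(v=1.0, n_tr=1e6, i_s=1e-7)    # ~0.1 W static power
p_active = p_d + p_s + p_c                       # Pactive = PD + PS + PC
p_idle = p_s + p_c                               # Pidle = PS + PC
e_i = task_energy(0.5, 1.0, p_active, p_idle)    # exit halfway to deadline
```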
4. The neural-network dynamic-exit lightweighting method for multiple consecutive inferences according to claim 3, wherein, within an inference task ji, the network exit position is first predicted and the number of remaining layers needed to complete the inference is established; (V, f) is then predicted from the remaining layers and the worst-case execution time Ti of the current inference; based on the inference workload remaining within Tc, a cross-inference calibrator is enabled through a feedback control strategy, calibrating (V, f) according to the multi-inference progress, the total inference workload, and the time constraint; and, according to the prediction and calibration, the processor is updated at runtime to the appropriate computing configuration through DVFS.

5. The neural-network dynamic-exit lightweighting method for multiple consecutive inferences according to claim 3, wherein, in inference task ji, the exit layers comprise a BoF pooling layer and an FC layer. Let yi be the intermediate result of the i-th layer and Nc the number of object classes in the dataset. The BoF pooling layer aggregates the features extracted from yi: a set of feature vectors called a codebook is used to describe yi, and the weight of each codebook entry is generated by measuring the similarity between that entry and yi. Since the number of codebook weights is larger than Nc, the FC layer further acts as a classifier that adjusts the BoF pooling result to

(equation image: Nc-dimensional class estimate)

thereby estimating the final output of the exit layer; the classifier's estimate is:

(equation image: classifier estimate)
where fw(x, i) is the intermediate result of the i-th layer; x is the initial network input; Wi are the parameters of the i-th exit layer, the exit layer being the BoF pooling layer and the FC layer; and

(equation image)

is the classifier's estimate;
The parameters of the exit layers are trained offline, and the same cross-entropy loss function is used when training Wi:
H(p, q) = -∑i p(i)·log q(i)
where p(i) and q(i) are the actual and predicted class distributions of each object, respectively, and H(p, q) is the cross-entropy;
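The loss equation itself survives only as an image in the original, but the text identifies it as the cross-entropy between the actual and predicted class distributions, whose standard form can be sketched as:

```python
import math

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_i p(i) * log(q(i)); p is the actual class
    distribution, q the predicted one (both sum to 1).  eps guards
    against log(0) for illustration."""
    return -sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p, q))

# One-hot target: the closer the prediction is to p, the lower the loss
p = [0.0, 1.0, 0.0]
loss_good = cross_entropy(p, [0.1, 0.8, 0.1])
loss_bad = cross_entropy(p, [0.4, 0.2, 0.4])
```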
When training the exit layers, the training-set data {x1, x2, …, xN} are input to the model, where N is the number of training samples, and the exit-layer parameters are optimized by batch gradient descent. Let Ltotal be the number of layers in the network and let the target vector set be

(equation image: target vector set)

whose elements

(equation image)

are the correct results. First, without any exit layers, the model is trained with the following formula:
(equation image: batch gradient-descent update of W)
where W′ is the updated overall network parameter set; η is the learning rate, which is adjusted if the accuracy does not meet the requirement; j denotes the vector position;

(equation image)

denotes the parameter gradient; xj denotes a training-data vector element; and rj denotes a target-data vector element;
The original model W is then fixed and, on the basis of the exit layers, the model is trained again to optimize the parameters of the BoF pooling layer and the FC layer:
(equation image: exit-layer parameter update)
After the exit layers are trained, the average feature weight μi is computed as the parameter for the exit decision:
(equation image: average feature weight μi)
where k denotes the k-th overall classification target.

During inference on the initial network input xj, at each early-exit layer the weight ratio αW is the maximum feature weight divided by μi and multiplied by a user-specified hyperparameter β; the larger the maximum feature weight, the more confident the classification is considered. If αW is greater than 1, inference terminates and the result of this early-exit layer is applied as the final result:
(equation image: early-exit decision rule)
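The exit rule can be sketched as follows, reading "the maximum feature weight divided by μi multiplied by β" as αW = (max weight / μi)·β; that reading, and the numeric values, are assumptions.

```python
def should_exit(feature_weights, mu_i, beta):
    """Early-exit test at an exit layer: alpha_w = (max weight / mu_i) * beta.
    Exit (return True) when alpha_w > 1, i.e. the classifier is confident
    enough that its result can be taken as final."""
    alpha_w = (max(feature_weights) / mu_i) * beta
    return alpha_w > 1.0

# A confident prediction (one dominant weight) exits; a flat one does not
assert should_exit([0.05, 0.85, 0.10], mu_i=0.33, beta=0.5)
assert not should_exit([0.34, 0.33, 0.33], mu_i=0.33, beta=0.5)
```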
6. The neural-network dynamic-exit lightweighting method for multiple consecutive inferences according to claim 5, wherein the predictor starts predicting from layer L0 of the neural network, the predictor input being the execution-result vector y0 of layer L0 and its corresponding exit-layer execution-result vector

(equation image)
The above result is expanded in dimension: zeros are appended at both ends of the vector

(equation image)

expanding it from

(equation image)

dimensions to

(equation image)

dimensions, the expanded vector being denoted

(equation image)

where Nc is the vector length, which is also the number of inference-result classes, and K is the number of feature vectors in the feature pooling group; the computation is given by:

(equation image: zero-padding formula)
where l denotes the l-th element;

the zero-padded output vector

(equation image)

then undergoes a one-dimensional convolution, yielding a new feature-weight vector

(equation image)

where the convolution weight is a one-dimensional vector h of length K; the computation is given by:

(equation image: one-dimensional convolution formula)
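A sketch of the zero-pad-then-convolve step of the predictor. The exact padded dimension survives only as an equation image, so padding K−1 zeros at each end is an assumption; the length-K kernel h matches the claim.

```python
def zero_pad(v, k):
    """Pad K-1 zeros at each end of the exit-layer result vector
    (the exact pad width is an assumption; the claim's dimension
    expression survives only as an image)."""
    pad = [0.0] * (k - 1)
    return pad + list(v) + pad

def conv1d_valid(v, h):
    """Valid 1-D convolution of v with the length-K kernel h."""
    k = len(h)
    return [sum(h[j] * v[i + j] for j in range(k))
            for i in range(len(v) - k + 1)]

def predict_next_exit(y_hat, h):
    """One predictor step: zero-pad, then convolve to get the new
    feature-weight vector for the next exit layer."""
    return conv1d_valid(zero_pad(y_hat, len(h)), h)

# K = 2 kernel on a length-3 vector (illustrative values)
step1 = predict_next_exit([1.0, 2.0, 3.0], [1.0, 1.0])
```

Feeding `predict_next_exit` its own output implements the recursion described next in the claim, producing estimates for the exit layers at L0+2, L0+3, and so on.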
By recursion, the intermediate estimate

(equation image)

is replaced with

(equation image)

and the formula is applied repeatedly, giving the predicted result of the exit layer at position L0+2, denoted

(equation image)

Placing the exit-layer prediction at any layer after L0 yields the PREDICT function, described as follows:

(equation image: PREDICT function)
where

(equation image)

denotes the minimum predicted exit point after L0;

(equation image: exit-point condition)
where

(equation image)

is the predicted exit confidence of layer L0+ζ; L0+ζ is therefore the exit point predicted by the prediction function. If no such ζ can be found, a hyperparameter τ is introduced, i.e., ζ∈[1,τ] with τ≤Ltotal−L0; the upper bound τ forbids predictions beyond the last layer. If no integer in [1,τ] satisfies the condition, ζ=τ is set.
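The selection of ζ with fallback to τ can be sketched directly from this paragraph; treating "confidence exceeds 1" as the exit condition mirrors the αW > 1 rule of claim 5 and is an assumption, since the condition itself appears only as an image.

```python
def find_zeta(confidences, tau):
    """Return the smallest zeta in [1, tau] whose predicted exit
    confidence passes the (assumed) threshold of 1.0; if none does,
    fall back to zeta = tau as in the claim.
    confidences[z - 1] is the predicted confidence at layer L0 + z."""
    for zeta in range(1, tau + 1):
        if confidences[zeta - 1] > 1.0:
            return zeta
    return tau

assert find_zeta([0.4, 0.9, 1.3, 1.5], tau=4) == 3
assert find_zeta([0.2, 0.3, 0.4], tau=3) == 3   # no layer qualifies
```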
The current prediction result ζ of PREDICT is converted into an intermediate frequency fM,i. For inference task ji, during the interval from the start of inference to the prediction at L0, a relatively high frequency

(equation image)

is conservatively applied so that all Ltotal layers of the model can be completed in time without an early exit:

(equation image)
where fH is the computing platform's default maximum frequency, at which the platform can complete inference with all Ltotal layers and does not exit early because of the deadline Ti; the frequency

(equation image)

is calibrated by Δfi.
When the prediction is made at L0, f is adjusted to the intermediate frequency fM,i according to the predicted exit point, and the network runs until it exits at Ti. Given a network with Ltotal layers and early prediction ζ, the intermediate frequency is computed as:

(equation image: intermediate-frequency formula)

where fM,i is the lowest frequency predicted to complete the inference by time Ti.
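The fM,i formula survives only as an image; a sketch consistent with "the lowest frequency predicted to complete inference by Ti" is a proportional work-over-time model, clamped to the platform's frequency range. The per-layer cycle count and all numeric values are illustrative assumptions.

```python
def intermediate_frequency(layers_left, cycles_per_layer, time_left,
                           f_low, f_high):
    """Lowest frequency that finishes the predicted remaining work by the
    per-task deadline, clamped to [fL, fH].  The proportional work/time
    model is an assumption; the claim's fM,i formula survives only as an
    equation image."""
    assert time_left > 0
    f_needed = layers_left * cycles_per_layer / time_left
    return min(max(f_needed, f_low), f_high)

# 4 predicted layers of 1e6 cycles each, 8 ms left -> 0.5 GHz needed
f = intermediate_frequency(4, 1e6, 8e-3, f_low=2e8, f_high=2e9)
```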
7. The neural-network dynamic-exit lightweighting method for multiple consecutive inferences according to claim 6, wherein, for inference task ji+1, the cross-inference DVFS strategy provides a calibration suggestion Δfi+1 after all previous tasks, including ji, are completed; the calibration is implemented by a discrete incremental proportional-integral (PI) regulator:

Δfi+1 = Δfi + (KP + KI(ti − ti−1))e(ti) − KPe(ti−1)

where KP and KI are the proportional and integral coefficients, respectively; the index i denotes the i-th inference task within the current coordination period; ti denotes the relative completion time of inference ji from the start of the coordination period; and ti − ti−1 is the time interval between the completions of inference tasks ji and ji−1.

The input deviation e(ti) of the PI regulator evaluates the inference progress against Tc, based on the total inference workload and the inference-execution progress since the start of the coordination period:

(equation image: input deviation e(ti))

where the first term is the reference speed for inference-workload balancing and the second term is the processing speed so far.
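The discrete incremental PI update above is given explicitly in the claim and translates directly to code; only the gain and deviation values in the example are illustrative, and e(ti) values are fed in directly since its defining equation appears only as an image.

```python
def pi_calibration_step(df_prev, e_now, e_prev, t_now, t_prev, kp, ki):
    """Discrete incremental PI regulator from the claim:
    df_next = df_prev + (KP + KI*(t_now - t_prev)) * e(t_now) - KP * e(t_prev)
    df_prev is the previous calibration suggestion, e_now/e_prev the
    current and previous progress deviations, t_now/t_prev the task
    completion times relative to the coordination period start."""
    return df_prev + (kp + ki * (t_now - t_prev)) * e_now - kp * e_prev

# One calibration step with illustrative gains and deviations
df_next = pi_calibration_step(0.0, 1.0, 0.5, 2.0, 1.0, kp=0.5, ki=0.1)
```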
8. The neural-network dynamic-exit lightweighting method for multiple consecutive inferences according to claim 7, wherein the frequency and its corresponding voltage are configured by a DVFS governor, and the network is executed according to the small-time-scale prediction and the large-time-scale calibration, respectively. At the start of each inference, Δfi is derived from the cross-inference calibration, and the DVFS governor sets the computing configuration to

(equation image)

to conservatively complete an inference that does not exit early. When the early-exit prediction is made at inference layer L0, the DVFS governor establishes the appropriate frequency configuration

(equation image)

from fM,i, derived by the intra-inference predictor, and Δfi, derived by the cross-inference calibration; the expression is:

(equation image)
The resulting energy consumption is therefore:

(equation image: energy consumption)

where

(equation image)

is the completion time of the prediction layer L0 and the network layers of ji before it, and

(equation image)

is the voltage corresponding to a given

(equation image)

Since f is selected based on the remaining workload predicted within the inference, the intra-inference prediction enables ji to be completed by Ti, while the cross-inference calibration targets completing the M sequential inferences within Tc.
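The governor's frequency expression in claim 8 appears only as an image; combining the intra-inference prediction fM,i additively with the cross-inference calibration Δfi and clamping to [fL, fH] is a plausible sketch under those assumptions, not the claim's exact formula.

```python
def governor_frequency(f_m, delta_f, f_low, f_high):
    """Frequency configuration at the prediction layer L0: combine the
    intra-inference prediction fM,i with the cross-inference calibration
    delta_f.  Additive combination and clamping are assumptions; the
    claim's expression survives only as an equation image."""
    return min(max(f_m + delta_f, f_low), f_high)

# Calibration pushes the predicted frequency up, but never past fH
assert governor_frequency(5e8, 1e8, 2e8, 2e9) == 6e8
assert governor_frequency(1.95e9, 2e8, 2e8, 2e9) == 2e9
```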
9. A neural-network dynamic-exit lightweighting system for multiple consecutive inferences, comprising:

Module M1: builds a neural-network-based inference model and, for each inference within a preset small time scale, predicts where the network will exit and predicts the computing configuration accordingly, the computing configuration comprising frequency and voltage;

Module M2: for multiple inferences within a preset large time scale, calibrates the processor frequency and voltage according to the remaining inference workload and the time constraint;

Module M3: executes the neural network with the computing configuration obtained from the prediction and calibration, thereby achieving dynamic voltage and frequency scaling.

10. The neural-network dynamic-exit lightweighting system for multiple consecutive inferences according to claim 9, wherein a coordination period of inference, i.e., the inference-task deadline, is set during inference:
(equation image: definition of the coordination period Tc)
(equation image: definition of the per-task coordination deadline Ti)
where λ denotes the tightness of the task deadline, 0<λ≤1 (the smaller λ, the tighter the deadline); ti denotes the actual completion time of each inference; Tc is the coordination period; M is the number of inference tasks; and Ti denotes the coordination deadline of the i-th inference task.

When an inference task is executed, the power consumed during computation is the active power Pactive, the sum of the dynamic power PD, the static power PS, and the constant power PC; the power consumed while the computing platform is idle is the idle power Pidle, the sum of the static power PS and the constant power PC:

PD = C·V²·f

PS = V·Ntr·IS

where C is the capacitance of the switched logic gates; V is the processor supply voltage; f is the processor clock frequency; Ntr is the number of logic gates; and IS is the normalized static current of each logic gate. The energy consumption of each inference within Ti is:

(equation image: per-inference energy within Ti)

The network layers before the early exit run at (VH, fH); after the network exits at time ti, the computing platform lowers the voltage and frequency to the lowest level (VL, fL) via DVFS. The energy consumption of inference task ji is therefore:

(equation image: energy consumption Ei of inference task ji)

where Ei is the energy consumption of inference task ji; VH and fH are the processor's high supply voltage and high clock frequency; and VL is the processor's low supply voltage.

Within an inference task ji, the network exit position is first predicted and the number of remaining layers needed to complete the inference is established; (V, f) is then predicted from the remaining layers and the worst-case execution time Ti of the current inference; based on the inference workload remaining within Tc, a cross-inference calibrator is enabled through a feedback control strategy, calibrating (V, f) according to the multi-inference progress, the total inference workload, and the time constraint; and, according to the prediction and calibration, the processor is updated at runtime to the appropriate computing configuration through DVFS.
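The three modules of the claimed system can be tied together in a toy coordination loop: M1 predicts a frequency per inference, M2 applies a PI calibration in the style of claim 7, and M3 stands in for the DVFS governor. The workload model, units (GHz, gigacycles), progress-deviation definition, and gains are all illustrative assumptions.

```python
def run_coordination_period(tasks, t_c, f_low, f_high, kp=0.5, ki=0.1):
    """Sketch of modules M1-M3 over one coordination period Tc.
    tasks: list of (predicted_layers, gigacycles_per_layer, time_budget);
    frequencies are in GHz.  All modeling choices are illustrative."""
    df, e_prev, t_prev, elapsed, log = 0.0, 0.0, 0.0, 0.0, []
    for i, (layers, cpl, budget) in enumerate(tasks, start=1):
        # M1: lowest frequency finishing the predicted work in the budget
        f_pred = layers * cpl / budget
        # M3: calibrated configuration, clamped to the platform range
        f_set = min(max(f_pred + df, f_low), f_high)
        elapsed += layers * cpl / f_set        # task completion time
        # M2: progress deviation (positive when behind schedule) + PI step
        e_now = elapsed / t_c - i / len(tasks)
        df = df + (kp + ki * (elapsed - t_prev)) * e_now - kp * e_prev
        e_prev, t_prev = e_now, elapsed
        log.append(f_set)
    return log, elapsed

# Three identical tasks; the loop slows down when ahead of schedule
log, elapsed = run_coordination_period([(4, 1e-3, 8e-3)] * 3, t_c=0.03,
                                       f_low=0.2, f_high=2.0)
```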
CN202310306228.2A 2023-03-24 2023-03-24 Neural network dynamic exit lightweight method and system for multiple continuous reasoning Pending CN116227558A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310306228.2A CN116227558A (en) 2023-03-24 2023-03-24 Neural network dynamic exit lightweight method and system for multiple continuous reasoning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310306228.2A CN116227558A (en) 2023-03-24 2023-03-24 Neural network dynamic exit lightweight method and system for multiple continuous reasoning

Publications (1)

Publication Number Publication Date
CN116227558A true CN116227558A (en) 2023-06-06

Family

ID=86571361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310306228.2A Pending CN116227558A (en) 2023-03-24 2023-03-24 Neural network dynamic exit lightweight method and system for multiple continuous reasoning

Country Status (1)

Country Link
CN (1) CN116227558A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116894469A (en) * 2023-09-11 2023-10-17 西南林业大学 DNN collaborative reasoning acceleration method, device and medium in end-edge cloud computing environment
CN116894469B (en) * 2023-09-11 2023-12-15 西南林业大学 DNN collaborative reasoning acceleration method, device and medium in end-edge cloud computing environment

Similar Documents

Publication Publication Date Title
Cui et al. Reinforcement learning for optimal primary frequency control: A Lyapunov approach
CN110137942A (en) Multiple Time Scales flexible load rolling scheduling method and system based on Model Predictive Control
CN113657661A (en) Enterprise carbon emission prediction method and device, computer equipment and storage medium
Belgioioso et al. Online feedback equilibrium seeking
CN108321795A (en) Start-stop of generator set configuration method based on depth deterministic policy algorithm and system
US7444272B2 (en) Self-modulation in a model-based automated management framework
CN118466224B (en) Flow control method and system for electric propulsion system
Köhler et al. Real time economic dispatch for power networks: A distributed economic model predictive control approach
CN116227558A (en) Neural network dynamic exit lightweight method and system for multiple continuous reasoning
Carvalho et al. Autonomous power management in mobile devices using dynamic frequency scaling and reinforcement learning for energy minimization
HasanzadeZonuzy et al. Model-based reinforcement learning for infinite-horizon discounted constrained markov decision processes
Chen et al. Performance optimization of machine learning inference under latency and server power constraints
CN114997370B (en) Low-power neural network system based on predictive exit and its implementation method
Ferranti et al. A parallel dual fast gradient method for MPC applications
Kotary et al. Learning constrained optimization with deep augmented lagrangian methods
Chen et al. Quality optimization of adaptive applications via deep reinforcement learning in energy harvesting edge devices
CN115391048A (en) Micro-service instance dynamic horizontal expansion and contraction method and system based on trend prediction
CN114265674B (en) Task planning method and related device based on reinforcement learning under temporal logic constraints
Rostam et al. A hybrid Gaussian process approach to robust economic model predictive control
CN119225390A (en) Control model optimization method, device, equipment, storage medium and product
Kang et al. Power-and time-aware deep learning inference for mobile embedded devices
US11407327B1 (en) Controlling ongoing usage of a battery cell having one or more internal supercapacitors and an internal battery
Maasoumy et al. Comparison of control strategies for energy efficient building HVAC systems
Li et al. EENet: Energy efficient neural networks with run-time power management
Cui et al. Structured neural-pi control for networked systems: Stability and steady-state optimality guarantees

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination