CN114328048A - Disk fault prediction method and device

Info

Publication number: CN114328048A
Application number: CN202111582363.7A
Authority: CN
Inventors: 赵利强
Original assignee: Zhengzhou Yunhai Information Technology Co Ltd
Current assignee: Zhengzhou Yunhai Information Technology Co Ltd
Priority date: 2021-12-22
Filing date: 2021-12-22
Publication date: 2022-04-12
Anticipated expiration: 2041-12-22
Also published as: WO2023116111A1; CN114328048B

Abstract

The invention discloses a disk failure prediction method and device. Information of different types of disks is acquired, and a training data set is constructed based on the information of the disks. A deep neural network model including a feature network structure and a classification network structure is constructed, wherein the feature network structure is used to extract feature information of a disk based on the information of the disk, and the classification network structure is used to determine whether the disk is faulty based on the feature information of the disk. The deep neural network model is trained based on the training data set to obtain a trained disk failure prediction model. Information of a target disk whose failure is to be predicted is acquired and input into the disk failure prediction model to obtain a failure prediction result for the target disk. The disk failure prediction model of the present application can thus predict failures for different types of disks at the same time, and has good generality.

Description

Disk failure prediction method and device

Technical field

The present invention relates to the field of storage, and in particular to a disk failure prediction method and device.

Background

With the development of emerging technologies such as cloud computing and blockchain, the demand for storage systems keeps growing. Disks remain the mainstream storage device in today's storage systems. Existing solutions usually rely on S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) to predict whether a disk in a storage system will fail. The principle is as follows: a normal threshold range is set for each type of indicator data of the disk, and when an indicator value falls outside its normal threshold range, the disk is considered to be about to fail. However, a single storage system may contain more than one type of disk, and different storage systems may use different types of disks, while the distributions of the indicator data differ between disk types. Different normal threshold ranges therefore often have to be set for the indicator data of each disk type, which gives this prediction approach poor generality.

Therefore, how to provide a solution to the above technical problem is a problem that those skilled in the art currently need to solve.

Summary of the invention

The purpose of the present invention is to provide a disk failure prediction method and device in which a single disk failure prediction model can predict failures for different types of disks at the same time, and which therefore has good generality.

To solve the above technical problem, the present invention provides a disk failure prediction method, including:

obtaining information of different types of disks, and constructing a training data set based on the information of the disks;

constructing a deep neural network model including a feature network structure and a classification network structure, wherein the feature network structure is used to extract feature information of a disk based on the information of the disk, and the classification network structure is used to determine whether the disk is faulty based on the feature information of the disk;

training the deep neural network model based on the training data set to obtain a trained disk failure prediction model;

obtaining information of a target disk whose failure is to be predicted, and inputting the information of the target disk into the disk failure prediction model to obtain a failure prediction result for the target disk.

Optionally, obtaining information of different types of disks and constructing a training data set based on the information of the disks includes:

obtaining various types of indicator data of different types of disks at different times, and obtaining failure information of the disks;

labeling each item of indicator data of the disks, according to the failure information of the disks and a preset failure prediction lead time, with a label indicating whether the indicator data is fault data;

combining the various types of indicator data of the disks with their corresponding labels to form the training data set.

Optionally, obtaining information of different types of disks and constructing a training data set based on the information of the disks further includes:

after obtaining the various types of indicator data of different types of disks at different times, and before combining the indicator data of the disks with their corresponding labels, erasing invalid data from the indicator data of the disks and normalizing the indicator data that remains after erasing, so that the normalized indicator data of the disks and their corresponding labels are combined to form the training data set.

Optionally, constructing a deep neural network model including a feature network structure and a classification network structure, and training the deep neural network model based on the training data set to obtain a trained disk failure prediction model, includes:

constructing a feature network structure composed, in order, of a first fully connected layer and a plurality of residual layers, wherein the feature network structure is used to extract, from the various types of indicator data of the disks, high-dimensional latent features in one-to-one correspondence with the indicator data;

constructing a classification network structure composed of a second fully connected layer, wherein the output dimension of the classification network structure is 2, corresponding to faulty and normal respectively;

combining the constructed feature network structure and classification network structure to obtain the deep neural network model;

training the deep neural network model based on the training data set to obtain a trained deep neural network model;

combining the high-dimensional latent features of the disks with their corresponding labels to form a new training data set, and training, based on the new training data set, an XGBoost classifier that is to replace the second fully connected layer, to obtain a trained XGBoost classifier;

replacing the second fully connected layer in the trained deep neural network model with the trained XGBoost classifier, and using the resulting model as the disk failure prediction model.

Optionally, constructing a feature network structure composed, in order, of a first fully connected layer and a plurality of residual layers includes:

constructing the first fully connected layer based on F_out = X*W + b, where F_out is the output vector of the first fully connected layer; X is the input vector of the first fully connected layer; W is the network weight of the first fully connected layer; b is the bias of the first fully connected layer;

constructing L residual layers based on x_{a+1} = x_a + F(x_a, W_a) and F(x_a, W_a) = Relu(x_a * W_a), where x_{a+1} is the output of the a-th residual layer; x_a is the input of the a-th residual layer; F(x_a, W_a) is the residual learning function of the a-th layer; Relu is the activation function, Relu(x) = max(0, x); 1 ≤ a ≤ L and a is an integer;

combining the constructed first fully connected layer and the L residual layers to obtain the feature network structure.
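
The forward pass defined by the two formulas above can be sketched in plain Python. This is only an illustrative sketch under stated assumptions, not the applicant's implementation: the function names (matvec, fully_connected, residual_layer, feature_net) are hypothetical, vectors are treated as row vectors, and each residual weight W_a is assumed to be square so that x_a and Relu(x_a * W_a) have the same length.

```python
def matvec(x, W):
    """Row vector x (length n) times matrix W (n rows, m columns)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def relu(v):
    """Element-wise Relu(x) = max(0, x)."""
    return [max(0.0, t) for t in v]


def fully_connected(x, W, b):
    """First fully connected layer: F_out = X*W + b."""
    return [s + bj for s, bj in zip(matvec(x, W), b)]


def residual_layer(x, W):
    """One residual layer: x_{a+1} = x_a + Relu(x_a * W_a).

    Assumption: W is square, so input and output have the same length.
    """
    return [xi + fi for xi, fi in zip(x, relu(matvec(x, W)))]


def feature_net(x, W_fc, b_fc, residual_weights):
    """Apply the first FC layer, then the L residual layers in order."""
    h = fully_connected(x, W_fc, b_fc)
    for W_a in residual_weights:
        h = residual_layer(h, W_a)
    return h
```

The skip connection means a residual layer whose activation is all zeros passes its input through unchanged, which is why such layers can be stacked without losing the signal from the first fully connected layer.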

Optionally, training the deep neural network model based on the training data set to obtain a trained deep neural network model includes:

inputting the various types of indicator data of the disks in the training data set into the deep neural network model to obtain failure prediction results for the disks;

substituting the failure prediction results of the disks and the labels corresponding to the indicator data of the disks into a preset loss function to obtain a first loss;

taking the reduction of the first loss to 0 as the optimization goal, adjusting the tunable parameters of the deep neural network model using a preset back-propagation algorithm until the entire training data set has been trained on the deep neural network model, to obtain the trained deep neural network model.

Optionally, training, based on the new training data set, the XGBoost classifier that is to replace the second fully connected layer to obtain a trained XGBoost classifier includes:

inputting the high-dimensional latent features of the disks in the new training data set into the XGBoost classifier to obtain failure classification results for the disks;

substituting the failure classification results of the disks and the labels corresponding to the high-dimensional latent features of the disks into a preset loss function to obtain a second loss;

taking the reduction of the second loss to 0 as the optimization goal, adjusting the tunable parameters of the XGBoost classifier using a preset back-propagation algorithm until the entire new training data set has been trained on the XGBoost classifier, to obtain the trained XGBoost classifier.

Optionally, before inputting the information of the target disk into the disk failure prediction model, the disk failure prediction method further includes:

learning the optimal structural parameters of the disk failure prediction model according to a preset heuristic genetic algorithm;

adjusting the structural parameters of the disk failure prediction model according to the optimal structural parameters, so that the information of the target disk is input into the disk failure prediction model with the optimal structure.

Optionally, learning the optimal structural parameters of the disk failure prediction model according to a preset heuristic genetic algorithm includes:

combining the structural parameters of the disk failure prediction model into a parameter vector, randomly initializing the parameter vector, and adding the initialized parameter vector to a preset vector priority queue, wherein the vector priority queue is a max-heap whose sort key is the false-alarm-rate score corresponding to each parameter vector;

adjusting the structural parameters of the disk failure prediction model according to the initialized parameter vector to obtain the false-alarm-rate score of the disk failure prediction model under the current structural parameters, and returning to the step of randomly initializing the parameter vector until the number of initializations reaches a preset threshold;

randomly selecting a preset first number of parameter vector pairs from the vector priority queue, and performing a crossover operation on each parameter vector pair according to param_new = (param₁ + param₂)/2 to obtain a new false-alarm-rate score of the disk failure prediction model under the parameter vector param_new, where param₁ and param₂ are the two parameter vectors in each pair;

sorting the new false-alarm-rate scores together with the false-alarm-rate scores currently in the vector priority queue in ascending order, and keeping only the first preset second number of parameter vectors in the vector priority queue;

traversing all parameter vectors in the vector priority queue and, based on the mutation probability p = e^iteration/3, mutating each parameter vector param according to param_var = param + rand*step to obtain a mutated false-alarm-rate score of the disk failure prediction model under the parameter vector param_var, where iteration is the iteration count; step is the basic mutation unit of each parameter vector; rand is a random number generated for each mutation;

sorting the mutated false-alarm-rate scores together with the false-alarm-rate scores currently in the vector priority queue in ascending order, keeping only the first preset second number of parameter vectors in the vector priority queue, and returning to the step of randomly selecting a preset first number of parameter vector pairs from the vector priority queue to start the next iteration, until the iteration count reaches a preset iteration threshold;

after all iterations have finished, selecting the parameter vector with the smallest false-alarm-rate score from the final vector priority queue as the optimal structural parameters.
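
The search loop described above (random initialization, averaging crossover, mutation, and truncation to the best candidates) can be sketched in Python as follows. This is a hedged illustration, not the applicant's implementation. Several points are assumptions: the function name genetic_search and all of its parameters are hypothetical; score_fn stands in for building the model with a given parameter vector and measuring its false-alarm-rate score; rand is drawn uniformly from [-1, 1]; and the mutation probability is taken as exp(-iteration/3), a decaying form chosen here only so that the value stays within [0, 1], since the text gives the expression as p = e^iteration/3. Python's heapq is a min-heap, so scores are stored negated to obtain max-heap behavior.

```python
import heapq
import math
import random


def genetic_search(score_fn, dim, init_count=20, pair_count=10,
                   keep=10, iterations=5, step=0.1, seed=0):
    """Return (best_vector, best_score), where lower scores are better."""
    rng = random.Random(seed)
    heap = []  # entries are (-score, vector), so the root holds the worst score

    def add(vec):
        heapq.heappush(heap, (-score_fn(vec), vec))

    def truncate():
        # Popping the root discards the largest score, which leaves the
        # `keep` smallest scores in the queue.
        while len(heap) > keep:
            heapq.heappop(heap)

    # Random initialization, repeated init_count times.
    for _ in range(init_count):
        add([rng.random() for _ in range(dim)])

    for iteration in range(1, iterations + 1):
        # Crossover: param_new = (param1 + param2) / 2 for random pairs.
        for _ in range(pair_count):
            (_, p1), (_, p2) = rng.sample(heap, 2)
            add([(a + b) / 2 for a, b in zip(p1, p2)])
        truncate()

        # Mutation: param_var = param + rand * step, applied with probability p.
        p_mutate = math.exp(-iteration / 3)  # assumed decaying form
        for _, param in list(heap):
            if rng.random() < p_mutate:
                add([v + rng.uniform(-1.0, 1.0) * step for v in param])
        truncate()

    neg_score, best = max(heap)  # largest negated score = smallest score
    return best, -neg_score
```

Because truncation only ever discards the worst-scoring vectors, the best score in the queue cannot get worse from one iteration to the next.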

To solve the above technical problem, the present invention also provides a disk failure prediction device, including:

a memory for storing a computer program;

a processor for implementing the steps of any one of the above disk failure prediction methods when executing the computer program.

The invention provides a disk failure prediction method in which information of different types of disks is obtained and a training data set is constructed based on the information of the disks; a deep neural network model including a feature network structure and a classification network structure is constructed, wherein the feature network structure is used to extract feature information of a disk based on the information of the disk, and the classification network structure is used to determine whether the disk is faulty based on the feature information of the disk; the deep neural network model is trained based on the training data set to obtain a trained disk failure prediction model; information of a target disk whose failure is to be predicted is obtained and input into the disk failure prediction model to obtain a failure prediction result for the target disk. The disk failure prediction model of the present application can thus predict failures for different types of disks at the same time, and has good generality.

The present invention also provides a disk failure prediction device, which has the same beneficial effects as the above failure prediction method.

Description of drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the prior art and the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a disk failure prediction method provided by an embodiment of the present invention;

FIG. 2 is a schematic diagram of a feature network structure provided by an embodiment of the present invention;

FIG. 3 is a schematic diagram of a disk failure prediction model provided by an embodiment of the present invention;

FIG. 4 is a schematic diagram of a residual layer provided by an embodiment of the present invention;

FIG. 5 is a graph of the mutation probability as a function of the iteration count, provided by an embodiment of the present invention.

Detailed description

The core of the present invention is to provide a disk failure prediction method and device in which a single disk failure prediction model can predict failures for different types of disks at the same time, and which therefore has good generality.

To make the purposes, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

Please refer to FIG. 1, which is a flowchart of a disk failure prediction method provided by an embodiment of the present invention.

The disk failure prediction method includes:

Step S1: obtain information of different types of disks, and construct a training data set based on the information of the disks.

Specifically, the present application obtains information of different types of disks in order to construct, based on that information, a training data set DataSet₁ (used to train the deep neural network model that predicts disk failures).

Step S2: construct a deep neural network model including a feature network structure and a classification network structure, and train the deep neural network model based on the training data set to obtain a trained disk failure prediction model.

Specifically, the present application constructs a deep neural network model including a feature network structure and a classification network structure, wherein the feature network structure is used to extract feature information of a disk based on the information of the disk, and the classification network structure is used to determine whether the disk is faulty based on the feature information of the disk. That is, when the information of a disk is input into the deep neural network model, the model outputs the failure prediction result for that disk.

The present application trains the deep neural network model on the training data set DataSet₁ in order to improve the accuracy of its predictions, and finally obtains a disk failure prediction model with high accuracy.

Step S3: obtain information of a target disk whose failure is to be predicted, and input the information of the target disk into the disk failure prediction model to obtain a failure prediction result for the target disk.

Specifically, after the high-accuracy disk failure prediction model has been obtained, the present application obtains the information of the target disk whose failure is to be predicted and inputs it into the disk failure prediction model to obtain the failure prediction result for the target disk.

The disk failure prediction model of the present application can thus predict failures for different types of disks at the same time, and has good generality.

On the basis of the above embodiment:

As an optional embodiment, obtaining information of different types of disks and constructing a training data set based on the information of the disks includes:

obtaining various types of indicator data of different types of disks at different times, and obtaining failure information of the disks;

labeling each item of indicator data of the disks, according to the failure information of the disks and a preset failure prediction lead time, with a label indicating whether the indicator data is fault data;

combining the various types of indicator data of the disks with their corresponding labels to form the training data set.

Specifically, the training data set of the present application is constructed as follows: 1) obtain various types of indicator data of different types of disks at different times (that is, the S.M.A.R.T. data of different types of disks at different times; the indicator data can also be taken from the open-source disk data set published by Backblaze, a cloud storage provider), and obtain the failure information of the different types of disks (from which the failure date of each disk is known); 2) according to the failure information of the disks of any given type and the preset failure prediction lead time, label each item of indicator data of the disks of that type with a label indicating whether it is fault data. For example, if the failure prediction lead time is set to 14 days, the indicator data of the disk for the failure date and the days before it, 14 days in total, are each labeled as fault data (for example, 1), and the indicator data of the disk at all other times are each labeled as normal data (for example, 0); 3) combine the various types of indicator data of the different types of disks at different times with their corresponding labels to form the training data set DataSet₁.
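
The labeling rule in step 2) can be sketched in Python as follows. This is a minimal sketch rather than the applicant's implementation: the record layout, the function name label_records and the parameter lead_days are assumptions introduced for illustration. The labeled window is taken to be the failure date plus the days before it, lead_days days in total, matching the 14-day example above.

```python
from datetime import date, timedelta


def label_records(records, failure_dates, lead_days=14):
    """Attach a 1 (fault) or 0 (normal) label to each daily record.

    records:       list of (disk_id, day, indicators) tuples (assumed layout)
    failure_dates: dict mapping disk_id to its failure date; disks that
                   never failed are simply absent from the dict
    lead_days:     length of the labeled window, counting the failure day
    """
    labeled = []
    for disk_id, day, indicators in records:
        fail_day = failure_dates.get(disk_id)
        label = 0
        if fail_day is not None:
            window_start = fail_day - timedelta(days=lead_days - 1)
            if window_start <= day <= fail_day:
                label = 1
        labeled.append((disk_id, day, indicators, label))
    return labeled
```

For a disk that failed on 20 January with a 14-day lead time, the records dated 7 January through 20 January receive label 1 and all earlier records receive label 0.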

As an optional embodiment, obtaining information of different types of disks and constructing a training data set based on the information of the disks further includes:

after obtaining the various types of indicator data of different types of disks at different times, and before combining the indicator data of the disks with their corresponding labels, erasing invalid data from the indicator data of the disks and normalizing the indicator data that remains after erasing, so that the normalized indicator data of the disks and their corresponding labels are combined to form the training data set.

Further, after obtaining the various types of indicator data of different types of disks at different times, the present application may also erase invalid data from the indicator data of the different types of disks, so that invalid data does not affect model training. The indicator data that remains after erasing is then normalized according to x = (x_i - x_min)/(x_max - x_min) to simplify subsequent processing, where x_i is the indicator data to be normalized; x_max is the maximum value of that indicator; x_min is the minimum value of that indicator; x is the normalized result. Finally, the normalized indicator data of the different types of disks at different times and their corresponding labels are combined to form the training data set DataSet₁.
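
The erase-then-normalize step can be illustrated with a short Python sketch. The text does not define what counts as invalid data, so this sketch assumes that a row is invalid when any indicator is missing (None) or NaN; the function names erase_invalid and min_max_normalize are hypothetical, and mapping a constant column to 0.0 is likewise an assumption made to avoid division by zero.

```python
import math


def erase_invalid(rows):
    """Drop rows containing a missing (None) or NaN indicator value."""
    return [
        r for r in rows
        if all(v is not None and not math.isnan(v) for v in r)
    ]


def min_max_normalize(rows):
    """Apply x = (x_i - x_min) / (x_max - x_min) to each indicator column."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [
            (v - lo) / (hi - lo) if hi > lo else 0.0  # constant column: assumed 0.0
            for v, lo, hi in zip(r, lows, highs)
        ]
        for r in rows
    ]
```

Each row here holds the indicator values of one disk on one day, so erasing whole rows keeps the remaining rows aligned with their labels.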

As an optional embodiment, constructing a deep neural network model including a feature network structure and a classification network structure, and training the deep neural network model based on the training data set to obtain a trained disk failure prediction model, includes:

constructing a feature network structure composed, in order, of a first fully connected layer and a plurality of residual layers, wherein the feature network structure is used to extract, from the various types of indicator data of the disks, high-dimensional latent features in one-to-one correspondence with the indicator data;

constructing a classification network structure composed of a second fully connected layer, wherein the output dimension of the classification network structure is 2, corresponding to faulty and normal respectively;

combining the constructed feature network structure and classification network structure to obtain the deep neural network model;

training the deep neural network model based on the training data set to obtain a trained deep neural network model;

combining the high-dimensional latent features of the disks with their corresponding labels to form a new training data set, and training, based on the new training data set, an XGBoost classifier that is to replace the second fully connected layer, to obtain a trained XGBoost classifier;

replacing the second fully connected layer in the trained deep neural network model with the trained XGBoost classifier, and using the resulting model as the disk failure prediction model.

Specifically, the disk failure prediction model of the present application is constructed as follows: 1) as shown in FIG. 2, construct a feature network structure FeatureNet consisting, in sequence, of a first FC (Fully Connected) layer and multiple RESNET (Residual Network) layers; the feature network structure extracts, from the various index data x of a disk, the high-dimensional hidden features feature_h in one-to-one correspondence with that index data, i.e., feature_h = x·FeatureNet; 2) construct a classification network structure ClassifierNet consisting of a second FC layer, so that the output of the classification network structure is out = feature_h·ClassifierNet; note that softmax (a function used for multi-class classification) is applied to the output of ClassifierNet, and that out has dimension 2, corresponding to failure (1) and normal (0) respectively; 3) combine the feature network structure (first FC layer + multiple RESNET layers) and the classification network structure (second FC layer) to obtain the deep neural network model; 4) train the deep neural network model on the training data set DataSet₁ to obtain the trained deep neural network model; 5) since the index data of each type of disk at each time has a one-to-one corresponding label and a one-to-one corresponding high-dimensional hidden feature, each high-dimensional hidden feature also corresponds one-to-one to a label; combine the high-dimensional hidden features of the different types of disks at different times with their corresponding labels to form a new training data set DataSet₂; 6) train, on the new training data set DataSet₂, the XGBoost (Extreme Gradient Boosting) classifier that is to replace the second FC layer, obtaining the trained XGBoost classifier; 7) replace the second FC layer in the trained deep neural network model with the trained XGBoost classifier and use the resulting model as the disk failure prediction model; that is, the final disk failure prediction model used for disk failure prediction is the feature network structure FeatureNet + the XGBoost classifier (as shown in FIG. 3).
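Steps 5) to 7) above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: `feature_net` is a hypothetical fixed function standing in for the trained FeatureNet, `StumpClassifier` is a hypothetical single-threshold classifier standing in for the XGBoost classifier, and the four-sample `dataset1` is made-up toy data. The sketch only shows how DataSet₂ is derived from DataSet₁ and how the replacement classifier is composed with the feature extractor.

```python
def feature_net(x):
    # Hypothetical stand-in for the trained FeatureNet: raw metrics -> features.
    return [x[0] + x[1], x[0] - x[1]]


class StumpClassifier:
    """Hypothetical stand-in for the XGBoost classifier: one threshold on one feature."""

    def fit(self, features, labels):
        best = None
        for j in range(len(features[0])):
            for t in sorted({f[j] for f in features}):
                # Count samples whose thresholded prediction disagrees with the label.
                errors = sum((f[j] > t) != bool(y) for f, y in zip(features, labels))
                if best is None or errors < best[0]:
                    best = (errors, j, t)
        _, self.index, self.threshold = best
        return self

    def predict(self, feature):
        return 1 if feature[self.index] > self.threshold else 0


# Toy DataSet1: (index data, label) with 1 = failure, 0 = normal.
dataset1 = [([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.9, 0.8], 1), ([0.7, 0.9], 1)]

# Step 5: pair each extracted feature vector with the label of its source sample.
dataset2 = [(feature_net(x), y) for x, y in dataset1]

# Step 6: train the replacement classifier on DataSet2.
clf = StumpClassifier().fit([f for f, _ in dataset2], [y for _, y in dataset2])


# Step 7: the final model is the feature extractor followed by the new classifier.
def predict_failure(x):
    return clf.predict(feature_net(x))
```

The design point the sketch illustrates is that the feature extractor is left unchanged when the classification head is swapped, so the second classifier sees exactly the features the network learned during the first training stage.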

As an optional embodiment, constructing the feature network structure consisting, in sequence, of a first fully connected layer and multiple residual layers includes:

constructing the first fully connected layer based on F_out = X*W + b, where F_out is the output vector of the first fully connected layer, X is the input vector of the first fully connected layer, W is the network weight of the first fully connected layer, and b is the bias of the first fully connected layer;

combining the constructed first fully connected layer and the L residual layers to obtain the feature network structure.

Specifically, the feature network structure of the present application is constructed as follows: 1) construct the first fully connected layer based on F_out = X*W + b; the fully connected layer maps a vector into another space, which increases model complexity and improves the fit; here F_out is the output vector of the first fully connected layer, i.e., the input vector of the first residual layer; X is the input vector of the first fully connected layer (during model training the input is the training data set DataSet₁; when the model is in actual use the input is the information of the target disk whose failure is to be predicted); W is the network weight of the first fully connected layer; b is the bias of the first fully connected layer; 2) as shown in FIG. 4, construct L residual layers based on x_{a+1} = x_a + F(x_a, W_a) and F(x_a, W_a) = Relu(x_a*W_a), where x_{a+1} is the output of the a-th residual layer, i.e., the input of the (a+1)-th residual layer; x_a is the input of the a-th residual layer; F(x_a, W_a) is the residual learning function of the a-th layer; and Relu is the activation function, Relu(x) = max(0, x); 3) combine the constructed first fully connected layer and the L residual layers to obtain the feature network structure (the output of the last residual layer Resnet_L is the high-dimensional hidden feature representation of the input disk index data, denoted feature_h).
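The forward pass defined by F_out = X*W + b and x_{a+1} = x_a + Relu(x_a*W_a) can be sketched in plain Python as below. This is a minimal sketch under assumptions: the layer sizes (4 input metrics, m = 8 neurons, L = 3 residual layers), the random weight initialization, and all function names are illustrative and do not come from the application. Each W_a is taken to be square (m × m) so that the residual addition is shape-compatible.

```python
import random


def relu(v):
    # Relu(x) = max(0, x), applied element-wise.
    return [max(0.0, x) for x in v]


def matvec(x, W):
    """Row vector x (length n) times matrix W (n x m) gives a vector of length m."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def fully_connected(x, W, b):
    """First fully connected layer: F_out = X*W + b."""
    return [y + bj for y, bj in zip(matvec(x, W), b)]


def residual_layer(x_a, W_a):
    """One residual layer: x_{a+1} = x_a + Relu(x_a * W_a)."""
    f = relu(matvec(x_a, W_a))
    return [xi + fi for xi, fi in zip(x_a, f)]


def feature_net(x, W, b, residual_weights):
    """First FC layer followed by L residual layers; returns feature_h."""
    h = fully_connected(x, W, b)
    for W_a in residual_weights:
        h = residual_layer(h, W_a)
    return h


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_metrics, m, L = 4, 8, 3  # illustrative sizes only
W = random_matrix(n_metrics, m, rng)
b = [0.0] * m
residual_weights = [random_matrix(m, m, rng) for _ in range(L)]

x = [0.2, 0.5, 0.1, 0.9]  # one made-up sample of normalized index data
feature_h = feature_net(x, W, b, residual_weights)
```

Because each residual layer adds its input back to its output, a layer whose weights are all zero passes its input through unchanged, which is the property that makes stacking many such layers easy to train.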

As an optional embodiment, training the deep neural network model based on the training data set to obtain the trained deep neural network model includes:

inputting the various index data of the disks in the training data set into the deep neural network model to obtain failure prediction results for the disks;

substituting the failure prediction results and the labels corresponding to the index data into a preset loss function to compute a first loss;

with reducing the first loss to 0 as the optimization objective, adjusting the tunable parameters of the neural network model using a preset back-propagation algorithm until the entire training data set has been trained on the neural network model, thereby obtaining the trained deep neural network model.

Specifically, the deep neural network model of the present application is trained as follows: 1) input the various index data of the disks in the training data set DataSet₁ into the deep neural network model to obtain failure prediction results for the disks; 2) substitute the failure prediction results and the labels corresponding to the index data into a preset loss function (such as the CrossEntropy function) to compute a first loss; 3) with reducing the first loss to 0 as the optimization objective (that is, making the model's failure predictions agree as closely as possible with the actual failure status of the disks), adjust the tunable parameters of the neural network model using a preset back-propagation algorithm until the entire training data set DataSet₁ has been trained on the neural network model, thereby obtaining the trained deep neural network model.
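The loss computation and parameter update described above can be illustrated for the classification head alone. The sketch below is a simplification under assumptions: it updates only the second fully connected layer by plain gradient descent on a single made-up feature vector, whereas full back-propagation would also carry the gradient into FeatureNet; the learning rate, the zero initialization, and all names are illustrative.

```python
import math


def softmax(z):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def cross_entropy(probs, label):
    # Cross-entropy for one sample: negative log-probability of the true class.
    return -math.log(probs[label])


def head_forward(feature_h, W, b):
    # Second FC layer with 2 outputs: index 0 = normal, index 1 = failure.
    return [sum(f * W[i][j] for i, f in enumerate(feature_h)) + b[j] for j in range(2)]


def sgd_step(feature_h, label, W, b, lr=0.1):
    """One gradient-descent step on the head; returns the loss before the update."""
    probs = softmax(head_forward(feature_h, W, b))
    loss = cross_entropy(probs, label)
    # Gradient of softmax + cross-entropy with respect to the logits: probs - one_hot.
    grad = [probs[j] - (1.0 if j == label else 0.0) for j in range(2)]
    for i, f in enumerate(feature_h):
        for j in range(2):
            W[i][j] -= lr * f * grad[j]
    for j in range(2):
        b[j] -= lr * grad[j]
    return loss


feature_h = [0.5, -1.0, 2.0]  # made-up feature vector for one failing disk
W = [[0.0, 0.0] for _ in feature_h]
b = [0.0, 0.0]
losses = [sgd_step(feature_h, 1, W, b) for _ in range(20)]
```

With zero initial weights both classes start at probability 0.5, so the first loss equals ln 2; each subsequent step moves the logits toward the true class and the loss falls, approaching but never reaching 0.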

As an optional embodiment, training, based on the new training data set, the XGBoost classifier that is to replace the second fully connected layer to obtain the trained XGBoost classifier includes:

inputting the high-dimensional hidden features of the disks in the new training data set into the XGBoost classifier to obtain failure classification results for the disks;

substituting the failure classification results and the labels corresponding to the high-dimensional hidden features into a preset loss function to compute a second loss;

with reducing the second loss to 0 as the optimization objective, adjusting the tunable parameters of the XGBoost classifier using a preset back-propagation algorithm until the entire new training data set has been trained on the XGBoost classifier, thereby obtaining the trained XGBoost classifier.

Specifically, the XGBoost classifier of the present application is trained as follows: 1) input the high-dimensional hidden features of the disks in the new training data set DataSet₂ into the XGBoost classifier to obtain failure classification results for the disks; 2) substitute the failure classification results and the labels corresponding to the high-dimensional hidden features into a preset loss function (such as the CrossEntropy function) to compute a second loss; 3) with reducing the second loss to 0 as the optimization objective (that is, making the classifier's failure classifications agree as closely as possible with the actual failure status of the disks), adjust the tunable parameters of the XGBoost classifier using a preset back-propagation algorithm until the entire new training data set DataSet₂ has been trained on the XGBoost classifier, thereby obtaining the trained XGBoost classifier.
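As background on how a boosted-tree classifier lowers a loss such as the second loss: tree-boosting methods in general reduce the loss round by round, each round adding a tree fitted to the negative gradient of the loss with respect to the current scores. The sketch below is a simplified first-order gradient boosting with depth-1 trees (stumps) on log loss. It is an illustration only: it is not the XGBoost library, it omits XGBoost's second-order terms and regularization, and the names `fit_stump` and `boost`, the learning rate, and the toy data are all assumptions. It also assumes each feature takes at least two distinct values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(scores, labels):
    """Mean binary cross-entropy of raw scores against 0/1 labels."""
    total = 0.0
    for s, y in zip(scores, labels):
        p = sigmoid(s)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def fit_stump(features, residuals):
    """Find the single split that best fits the residuals in squared error."""
    best = None
    for j in range(len(features[0])):
        # Skip the largest value so the right-hand side is never empty.
        for t in sorted({f[j] for f in features})[:-1]:
            left = [r for f, r in zip(features, residuals) if f[j] <= t]
            right = [r for f, r in zip(features, residuals) if f[j] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    return best[1:]


def boost(features, labels, n_trees=10, lr=0.5):
    scores = [0.0] * len(labels)
    trees, history = [], [log_loss(scores, labels)]
    for _ in range(n_trees):
        # Negative gradient of log loss with respect to each score.
        residuals = [y - sigmoid(s) for s, y in zip(scores, labels)]
        j, t, lv, rv = fit_stump(features, residuals)
        trees.append((j, t, lv, rv))
        scores = [s + lr * (lv if f[j] <= t else rv) for s, f in zip(scores, features)]
        history.append(log_loss(scores, labels))
    return trees, history


# Made-up hidden features and labels (1 = failure, 0 = normal).
features = [[0.3, -0.1], [0.3, 0.1], [1.7, 0.1], [1.6, -0.2]]
labels = [0, 0, 1, 1]
trees, history = boost(features, labels)
```

Here `n_trees` plays the role of the number of decision trees n mentioned among the structural parameters, and `history` records the loss after each round.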

It can be seen that the present application uses a deep neural network to extract the high-dimensional hidden features of the disk index data and replaces the network's fully connected classification layer with an XGBoost classifier. This retains the deep neural network's strong feature extraction capability while using a stronger function approximation tool to build a better classifier, so that the false alarm rate is greatly reduced while relatively high prediction accuracy is maintained.

As an optional embodiment, before the information of the target disk is input into the disk failure prediction model, the disk failure prediction method further includes:

learning the optimal structural parameters of the disk failure prediction model according to a preset heuristic genetic algorithm;

adjusting the structural parameters of the disk failure prediction model according to the optimal structural parameters, so that the information of the target disk is input into the disk failure prediction model with the optimal structure.

Further, before inputting the information of the target disk into the disk failure prediction model, the present application may also learn the optimal structural parameters of the disk failure prediction model according to a preset heuristic genetic algorithm (for example, learning the optimal values of the number of neurons m in the first fully connected layer, the number of layers L of the residual network, the training batch size batch_size, the number n of XGBoost decision trees, and the maximum depth k of each decision tree), and adjust the structural parameters of the disk failure prediction model accordingly, so that the information of the target disk is input into the disk failure prediction model with the optimal structure for failure prediction.

As an optional embodiment, learning the optimal structural parameters of the disk failure prediction model according to the preset heuristic genetic algorithm includes:

combining the structural parameters of the disk failure prediction model into one parameter vector, randomly initializing the parameter vector, and adding the initialized parameter vector to a preset vector priority queue, where the vector priority queue is a max-heap structure whose sorting key is the false alarm rate score corresponding to each parameter vector;

adjusting the structural parameters of the disk failure prediction model according to the initialized parameter vector to obtain the false alarm rate score of the disk failure prediction model under the current structural parameters, and returning to the step of randomly initializing the parameter vector until the number of initializations reaches a preset threshold;

randomly selecting a preset first number of parameter vector pairs from the vector priority queue and applying the crossover operation param_new = (param₁ + param₂)/2 to each pair to obtain the new false alarm rate score of the disk failure prediction model under the parameter vector param_new, where param₁ and param₂ are the two parameter vectors of each pair;

sorting the new false alarm rate scores together with the false alarm rate scores currently in the vector priority queue in ascending order, and retaining in the vector priority queue only the first preset second number of parameter vectors;

traversing all parameter vectors in the vector priority queue and, with mutation probability p = e^(iteration/3), mutating each parameter vector param according to param_var = param + rand*step to obtain the mutation false alarm rate score of the disk failure prediction model under the parameter vector param_var, where iteration is the number of iterations, step is the basic mutation unit of each parameter vector, and rand is a random number generated for each mutation;

sorting the mutation false alarm rate scores together with the false alarm rate scores currently in the vector priority queue in ascending order, retaining in the vector priority queue only the first preset second number of parameter vectors, and returning to the step of randomly selecting a preset first number of parameter vector pairs from the vector priority queue for the next iteration, until the number of iterations reaches a preset iteration threshold;

after all iterations have finished, selecting from the final vector priority queue the parameter vector with the smallest false alarm rate score as the optimal structural parameters.

Specifically, the optimal structural parameters of the disk failure prediction model of the present application are learned as follows: 1) combine the structural parameters of the disk failure prediction model into one parameter vector; for example, the number of neurons m in the first fully connected layer, the number of layers L of the residual network, the training batch size batch_size, the number n of XGBoost decision trees, and the maximum depth k of each decision tree are combined into a 5-dimensional parameter vector [m, L, batch_size, n, k], denoted param; randomly initialize param and add the initialized param to a preset vector priority queue, queue; queue is a max-heap structure (a max-heap is a complete binary tree) whose sorting key (node key value) is the false alarm rate score corresponding to each parameter vector; 2) adjust the structural parameters (m, L, batch_size, n, k) of the disk failure prediction model according to the initialized param to obtain the false alarm rate score of the model under the current structural parameters, then return to the step of randomly initializing param until the number of initializations reaches a preset threshold (e.g., 10); 3) randomly select (2 × the preset first number) parameter vectors from queue and group them in pairs to obtain the preset first number (e.g., 20) of parameter vector pairs; apply the crossover operation param_new = (param₁ + param₂)/2 to each pair to obtain the new false alarm rate score of the model under each param_new, where param₁ and param₂ are the two parameter vectors of the pair; 4) sort the first number of new false alarm rate scores together with the scores currently in queue in ascending order and retain in queue only the first preset second number (e.g., 20) of parameter vectors, i.e., keep only the 20 parameter vectors with the smallest false alarm rate scores and discard the rest as natural selection; 5) traverse all parameter vectors in queue and, with mutation probability p = e^(iteration/3), mutate each param according to param_var = param + rand*step to obtain the mutation false alarm rate score of the model under each param_var; here iteration is the number of iterations and p is the mutation probability, whose relationship is shown in FIG. 5 (the more iterations, the smaller the mutation probability); step is the basic mutation unit of each component of the parameter vector, e.g., step = [10, 1, 16, 5, 1], meaning that the basic mutation unit of the number of neurons m in the first fully connected layer is 10, and likewise for the other variables; rand is a random number generated for each mutation, which may be drawn from [-10, 10]; 6) sort the mutation false alarm rate scores together with the scores currently in queue in ascending order and retain in queue only the first preset second number of parameter vectors, i.e., keep only the 20 parameter vectors with the smallest false alarm rate scores and discard the rest as natural selection; then return to step 3) for the next iteration until the number of iterations reaches a preset iteration threshold; 7) after all iterations have finished, select from the final queue the parameter vector with the smallest false alarm rate score as the optimal structural parameters.
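The search loop in steps 1) to 7) can be sketched as follows. This is a hedged illustration, not the applicant's implementation, and it rests on several assumptions that are not in the text: `false_alarm_score` is a hypothetical stand-in (a real run would train the model with the given parameters and measure its false alarm rate on held-out data); the parameter ranges in `random_param` are invented; each crossover pair is drawn independently rather than by partitioning 2 × the first number of vectors; a sorted list replaces the max-heap; and the mutation probability is taken as exp(-iteration/3), i.e., with a negative exponent, so that it decreases with the iteration count as the text and FIG. 5 describe (the formula as printed carries no minus sign).

```python
import math
import random

# Basic mutation unit for [m, L, batch_size, n, k], as given in the text.
STEP = [10, 1, 16, 5, 1]


def false_alarm_score(param):
    """Hypothetical stand-in for training the model and measuring its false alarm rate."""
    target = [64, 4, 128, 50, 6]  # invented optimum, for illustration only
    return sum(abs(p - t) / s for p, t, s in zip(param, target, STEP))


def random_param(rng):
    # Invented ranges for [m, L, batch_size, n, k].
    return [rng.randint(8, 256), rng.randint(1, 10), rng.randint(16, 512),
            rng.randint(5, 200), rng.randint(1, 12)]


def keep_best(scored, limit):
    # Ascending sort by score; retain only the `limit` lowest-scoring vectors.
    return sorted(scored, key=lambda item: item[0])[:limit]


def search(n_init=10, n_pairs=20, keep=20, iterations=15, seed=0):
    rng = random.Random(seed)
    # Steps 1-2: random initialization and scoring.
    queue = [(false_alarm_score(p), p) for p in (random_param(rng) for _ in range(n_init))]
    for iteration in range(1, iterations + 1):
        # Step 3: crossover, param_new = (param1 + param2) / 2.
        children = []
        for _ in range(n_pairs):
            (_, p1), (_, p2) = rng.sample(queue, 2)
            child = [(a + b) / 2 for a, b in zip(p1, p2)]
            children.append((false_alarm_score(child), child))
        # Step 4: selection.
        queue = keep_best(queue + children, keep)
        # Step 5: mutation, param_var = param + rand * step.
        p_mut = math.exp(-iteration / 3)  # assumed negative exponent
        mutants = []
        for _, param in queue:
            if rng.random() < p_mut:
                r = rng.uniform(-10, 10)
                mutant = [v + r * s for v, s in zip(param, STEP)]
                mutants.append((false_alarm_score(mutant), mutant))
        # Step 6: selection again, then the next iteration.
        queue = keep_best(queue + mutants, keep)
    # Step 7: the lowest-scoring vector is the result.
    return queue[0]


best_score, best_param = search()
```

Because every selection step keeps the lowest-scoring vectors, the best score in the queue never gets worse from one iteration to the next. In a real search the averaged and mutated values would additionally need rounding and clamping, since m, L, batch_size, n, and k must be positive integers.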

It can be seen that this strategy solves the problem of searching for optimal model parameters in disk failure prediction. Compared with a classical genetic algorithm, the heuristic genetic algorithm of the present application optimizes the mutation and selection strategies: the mutation probability decreases as the number of iterations increases, and, combined with the priority-queue-based heuristic search, solutions can be searched quickly along every dimension and the optimal solution approached as quickly as possible.

In summary, the present application uses a deep neural network to learn from a large number of samples of the disk index data, mining the characteristics of each feature and the hidden correlations between different feature combinations to extract higher-dimensional features; it then uses an ensemble tree model to predict failures from these features and uses a genetic-algorithm-based automatic learning strategy to learn the best network structure, greatly reducing the false alarm rate while maintaining high recognition accuracy. The deep-learning-based disk failure prediction method of the present application is suitable for operation and maintenance scenarios with heavy disk usage and large data volumes, such as cloud platforms, storage service providers, and government data centers. It can analyze and predict failures from large and complex disk data, identify problematic disks in advance so that preparations and maintenance can be carried out, and thereby improve the efficiency of operation and maintenance personnel, reduce operation and maintenance costs, and improve product competitiveness.

The present application also provides a disk failure prediction device, including:

a memory for storing a computer program;

a processor for implementing the steps of any one of the above disk failure prediction methods when executing the computer program.

For a description of the disk failure prediction device provided by the present application, refer to the above embodiments of the disk failure prediction method; the details are not repeated here.

It should also be noted that, in this specification, relational terms such as first and second are used only to distinguish one entity or operation from another and do not necessarily require or imply any such actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "including a ..." does not preclude the presence of additional identical elements in the process, method, article, or device that includes the element.

The above description of the disclosed embodiments enables those skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A disk failure prediction method is characterized by comprising the following steps:

acquiring information of disks of different models, and constructing a training data set based on the information of the disks;

constructing a deep neural network model comprising a feature network structure and a classification network structure; the feature network structure is used for extracting feature information of the disk based on the information of the disk; the classification network structure is used for judging whether the disk fails based on the feature information of the disk;

training the deep neural network model based on the training data set to obtain a trained disk failure prediction model;

and acquiring information of a target disk to be subjected to failure prediction, and inputting the information of the target disk into the disk failure prediction model to obtain a failure prediction result of the target disk.

2. The disk failure prediction method of claim 1, wherein obtaining information of disks of different models and constructing a training data set based on the information of the disks comprises:

acquiring various index data of different types of disks at different time, and acquiring fault information of the disks;

labeling, one by one, the various index data of the disk with labels indicating whether the index data is fault data, according to the fault information of the disk and a preset failure prediction lead time;

and combining the various index data of the disk and the corresponding labels thereof to form the training data set.

3. The disk failure prediction method of claim 2, wherein obtaining information of disks of different models and constructing a training data set based on the information of the disks further comprises:

after acquiring the various index data of different types of disks at different times and before combining the various index data of the disk with the corresponding labels, erasing invalid data from the various index data of the disk and normalizing the index data remaining after erasure, so that the normalized index data of the disk and the corresponding labels are combined to form the training data set.

4. The disk failure prediction method of claim 2, wherein constructing a deep neural network model comprising a feature network structure and a classification network structure, and training the deep neural network model based on the training data set to obtain a trained disk failure prediction model, comprises:

constructing a feature network structure consisting, in sequence, of a first fully connected layer and a plurality of residual layers; the feature network structure is used for extracting, based on the various index data of the disk, high-dimensional hidden features in one-to-one correspondence with the index data;

constructing a classification network structure consisting of a second fully connected layer; the output dimension of the classification network structure is 2, corresponding to failure and normal respectively;

combining the constructed feature network structure and the classification network structure to obtain a deep neural network model;

training the deep neural network model based on the training data set to obtain a trained deep neural network model;

combining the high-dimensional hidden features of the disk and the labels corresponding to the high-dimensional hidden features to form a new training data set, and training, based on the new training data set, the XGBoost classifier that is to replace the second fully connected layer to obtain a trained XGBoost classifier;

and replacing the second fully connected layer in the trained deep neural network model with the trained XGBoost classifier, and taking the resulting deep neural network model as the disk failure prediction model.

5. The disk failure prediction method of claim 4, wherein constructing a feature network structure consisting of a first fully-connected layer and a plurality of residual layers in sequence comprises:

constructing a first fully connected layer based on F_out = X*W + b; wherein F_out is the output vector of the first fully connected layer; X is the input vector of the first fully connected layer; W is the network weight of the first fully connected layer; b is the bias of the first fully connected layer;

constructing L residual layers based on x_{a+1} = x_a + F(x_a, W_a) and F(x_a, W_a) = Relu(x_a*W_a); wherein x_{a+1} is the output of the a-th residual layer; x_a is the input of the a-th residual layer; F(x_a, W_a) is the residual learning function of the a-th layer; Relu is an activation function, Relu(x) = max(0, x); 1 ≤ a ≤ L, and a is an integer;

and combining the constructed first fully connected layer with the L residual layers to obtain the feature network structure.

6. The disk failure prediction method of claim 4, wherein training the deep neural network model based on the training data set to obtain a trained deep neural network model comprises:

inputting various index data of the disk in the training data set to the deep neural network model to obtain a fault prediction result of the disk;

substituting the failure prediction result of the disk and labels corresponding to various index data of the disk into a preset loss calculation function to perform loss calculation to obtain a first loss;

and, with reducing the first loss to 0 as the optimization objective, adjusting the adjustable parameters of the neural network model by using a preset back propagation algorithm until the entire training data set has been trained on the neural network model, so as to obtain a trained deep neural network model.

7. The disk failure prediction method of claim 4, wherein training, based on the new training data set, the XGBoost classifier that is to replace the second fully connected layer to obtain a trained XGBoost classifier comprises:

inputting each high-dimensional hidden feature of the disk in the new training data set into the XGBoost classifier to obtain a failure classification result of the disk;

substituting the failure classification result of the disk and the label corresponding to each high-dimensional hidden feature of the disk into a preset loss calculation function to perform loss calculation to obtain a second loss;

and, with reducing the second loss to 0 as the optimization objective, adjusting the adjustable parameters of the XGBoost classifier by using a preset back propagation algorithm until the entire new training data set has been trained on the XGBoost classifier, so as to obtain the trained XGBoost classifier.

8. The disk failure prediction method according to any one of claims 1 to 7, wherein before inputting the information of the target disk to the disk failure prediction model, the disk failure prediction method further comprises:

learning the optimal structure parameters of the disk fault prediction model according to a preset heuristic genetic algorithm;

and adjusting the structural parameters of the disk failure prediction model according to the optimal structural parameters so as to input the information of the target disk into the disk failure prediction model with an optimal structure.

9. The disk failure prediction method of claim 8, wherein learning the optimal structural parameters of the disk failure prediction model according to a predetermined heuristic genetic algorithm comprises:

combining the structural parameters of the disk failure prediction model into a parameter vector, randomly initializing the parameter vector, and adding the initialized parameter vector to a preset vector priority queue; wherein the vector priority queue is a max-heap, and the sort key of the max-heap is the false alarm rate score value corresponding to each parameter vector;

adjusting the structural parameters of the disk failure prediction model according to the initialized parameter vector to obtain the false alarm rate score value of the disk failure prediction model under the current structural parameters, and returning to the step of randomly initializing the parameter vector until the number of initializations reaches a preset count threshold;

randomly selecting a preset first number of parameter vector pairs from the vector priority queue, and performing a crossover operation on each parameter vector pair according to $param_{new} = (param_1 + param_2)$ to obtain a new false alarm rate score value of the disk failure prediction model under the parameter vector $param_{new}$; wherein $param_1$ and $param_2$ are the two parameter vectors of the parameter vector pair;

sorting the new false alarm rate score values together with the false alarm rate score values currently in the vector priority queue in ascending order, and keeping in the vector priority queue only the parameter vectors ranked within the first preset second number;

traversing all parameter vectors in the vector priority queue and, based on the variation probability $p = e^{iteration/3}$, performing mutation on each parameter vector $param$ according to $param_{var} = param + rand \cdot step$ to obtain a variation false alarm rate score value of the disk failure prediction model under the parameter vector $param_{var}$; wherein $iteration$ is the number of iterations; $step$ is the basic unit of variation of each parameter vector; and $rand$ is a random number generated in each variation process;

sorting the variation false alarm rate score values together with the false alarm rate score values currently in the vector priority queue in ascending order, keeping in the vector priority queue only the parameter vectors ranked within the first preset second number, and returning to the step of randomly selecting the preset first number of parameter vector pairs from the vector priority queue so as to enter the next iteration, until the number of iterations reaches a preset iteration threshold;

and after all iterations are finished, selecting the parameter vector with the minimum false alarm rate score value from the finally obtained vector priority queue as the optimal structural parameters.
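The search loop recited in this claim can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: `genetic_search` and all of its parameters are hypothetical names, `score` is a caller-supplied function standing in for the false alarm rate score of the model under a given parameter vector, and vectors are initialized uniformly in [-1, 1]. Crossover is the element-wise sum as printed in the claim. The mutation probability is taken as exp(-iteration/3), a decaying form assumed here so that the value stays within [0, 1]; the claim text as printed does not show a sign. Python's `heapq` is a min-heap, so keys are negated to obtain max-heap behavior.

```python
import heapq
import math
import random


def genetic_search(score, dim, init_count=20, pairs=10, keep=10,
                   iterations=15, step=0.1, seed=0):
    """Return (best vector, best score); smaller scores are better."""
    rng = random.Random(seed)
    heap = []    # (-score, insertion id, vector): negated keys emulate a max-heap
    next_id = 0

    def push(vec):
        nonlocal next_id
        heapq.heappush(heap, (-score(vec), next_id, vec))
        next_id += 1

    def trim():
        # The root holds the largest score, so each pop evicts the worst vector.
        while len(heap) > keep:
            heapq.heappop(heap)

    # Random initialization, repeated until the count threshold is reached.
    for _ in range(init_count):
        push([rng.uniform(-1.0, 1.0) for _ in range(dim)])

    for iteration in range(iterations):
        # Crossover: pick the pairs first, then score and enqueue each child.
        parents = [rng.sample(heap, 2) for _ in range(pairs)]
        for first, second in parents:
            push([a + b for a, b in zip(first[2], second[2])])
        trim()

        # Mutation: assumed decaying probability; rand is drawn once per mutation.
        p_mutate = math.exp(-iteration / 3)
        for _, _, vec in list(heap):
            if rng.random() < p_mutate:
                rand = rng.uniform(-1.0, 1.0)
                push([v + rand * step for v in vec])
        trim()

    neg_score, _, best = max(heap)  # largest negated key = smallest score
    return best, -neg_score
```

Keeping the worst vector at the root is what makes the max-heap convenient here: trimming the queue to the preset second number is a sequence of root pops, and the best vector found so far is never evicted.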

10. A disk failure prediction apparatus, comprising:

a memory for storing a computer program;

a processor for implementing the steps of the disk failure prediction method according to any one of claims 1 to 9 when executing the computer program.