Disclosure of Invention
Aiming at the problem that the prior art lacks a feedback mechanism, or provides only a poor one, the invention proposes a power grid network security event analysis method based on machine learning. By transversely comparing and adjusting models during the training process, a model with relatively optimal accuracy is obtained, thereby improving the accuracy of the overall analysis.
The technical scheme of the invention is as follows.
A power grid network security event analysis method based on machine learning comprises the following steps:
S01: inputting a data set comprising historical network security events and their corresponding disposal methods;
S02: dividing the data set into a training set and a verification set, converting the names of the historical network security events into word vectors, and establishing a feature set;
S03: importing the feature set and the disposal methods in the training set, in batches, into neural networks with different parameters for training, to obtain a plurality of trained neural networks;
S04: inputting the feature set of the verification set into each trained neural network to output a disposal method, comparing the output disposal method with the original disposal method in the verification set, and counting the results to obtain an accuracy rate;
S05: adjusting and screening the trained neural networks according to the accuracy rate to obtain a final model;
S06: converting a real-time network security event name into a word vector, establishing a feature set, and inputting the feature set into the final model to obtain a disposal method.
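For illustration, the conversion of event names into word vectors in steps S02 and S06 could be sketched as follows. The invention does not fix a particular word-vector scheme, so a simple bag-of-words encoding stands in here, and the event names are hypothetical examples:

```python
# Illustrative sketch only: a bag-of-words stand-in for the unspecified
# word-vector conversion; event names below are hypothetical.
def build_vocab(event_names):
    """Collect every word appearing in the historical event names."""
    vocab = sorted({word for name in event_names for word in name.split()})
    return {word: i for i, word in enumerate(vocab)}

def to_word_vector(event_name, vocab):
    """Encode an event name as a fixed-length count vector over the vocabulary."""
    vec = [0] * len(vocab)
    for word in event_name.split():
        if word in vocab:
            vec[vocab[word]] += 1
    return vec

events = ["ddos attack on scada gateway", "malware infection on scada host"]
vocab = build_vocab(events)
features = [to_word_vector(e, vocab) for e in events]
```

In practice a trained embedding (for example word2vec) could replace the count vectors without changing the rest of the method.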
The invention exploits the principle that different parameter settings in neural network training directly influence the training results: a plurality of neural networks with different parameters are compared transversely, the networks are adjusted accordingly, and an optimal final model is obtained.
Preferably, the statistical process for the accuracy rate includes: if the comparison results are consistent, incrementing a correct count by one; if they are inconsistent, incrementing an error count by one; accuracy rate = correct count / (correct count + error count).
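The counting rule above amounts to the following short routine (the disposal-method labels are hypothetical):

```python
def accuracy(predicted, actual):
    """Accuracy as defined above: correct count / (correct + error count)."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    error = len(actual) - correct
    return correct / (correct + error)

# e.g. two of three verification-set predictions match the recorded disposal
rate = accuracy(["block", "isolate", "block"], ["block", "isolate", "alert"])
```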
Preferably, the neural network is a BP neural network, and the training process includes: setting the number of input-layer nodes, the number of hidden-layer nodes, the number of output-layer nodes, the number of training iterations and the learning rate of the BP neural network; taking the feature set in the training set as input and the disposal method as output, and importing them into the BP neural network for training, wherein a different number of hidden-layer nodes is set for each batch of training. Because the number of hidden-layer nodes has a large influence on the accuracy rate and the optimal value cannot be obtained in one pass, the node numbers are set in batches and then adjusted afterwards.
Preferably, the value range of the number m of hidden-layer nodes is 2n < m < 30n and m < s, where s is the number of input samples; the number of hidden-layer nodes in each batch of training is either taken at random within this range or arranged as an arithmetic progression. The range above is given by an empirical formula and only defines the general range.
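The arithmetic-progression option can be sketched as below; the values of n and s are hypothetical, and the function merely enumerates one candidate node count per training batch within the stated range:

```python
def hidden_node_candidates(n, s, batches):
    """Arithmetic progression of hidden-layer node counts, one per training
    batch, kept strictly within 2n < m < 30n and m < s as stated above."""
    lo = 2 * n + 1                  # smallest m with m > 2n
    hi = min(30 * n - 1, s - 1)     # largest m with m < 30n and m < s
    if batches == 1:
        return [lo]
    step = (hi - lo) / (batches - 1)
    return [round(lo + i * step) for i in range(batches)]
```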
Preferably, step S05 includes: judging whether the accuracy rate reaches a preset condition; if so, taking the neural network with the highest accuracy rate as the final model; if not, establishing an accuracy-rate-versus-hidden-layer-node-number distribution diagram, performing curve fitting, taking the number of hidden-layer nodes corresponding to the highest accuracy-rate point on the curve as the number of hidden-layer nodes of the final model, and retraining to obtain the final model. In the distribution diagram, the ordinate is the accuracy rate and the abscissa is the number of hidden-layer nodes, so the high point of the curve can be found intuitively, and the number of hidden-layer nodes near this high point is the optimal solution under the current conditions.
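The peak-selection step might look like the following. The text only says "curve fitting" without naming a method, so a three-point moving average stands in here for the fitted curve; the node counts and accuracy rates are hypothetical:

```python
def best_hidden_nodes(node_counts, accuracies):
    """Pick the hidden-layer node count at the high point of a smoothed
    accuracy-vs-node-count curve (a 3-point moving average stands in for
    the unspecified curve fitting)."""
    smoothed = []
    for i in range(len(accuracies)):
        window = accuracies[max(0, i - 1):i + 2]
        smoothed.append(sum(window) / len(window))
    best = max(range(len(smoothed)), key=lambda i: smoothed[i])
    return node_counts[best]
```

The chosen count is then used to retrain the final model, as step S05 describes.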
The beneficial effects of the invention include: security events can be analyzed and disposal methods judged intelligently, and a feedback mechanism allows the learning and training process to be adjusted, thereby improving the accuracy of the analysis.
Detailed Description
The technical scheme of the present application will be described below with reference to examples. In addition, numerous specific details are set forth in the following description in order to provide a better understanding of the present invention. It will be understood by those skilled in the art that the present invention may be practiced without some of these specific details. In some instances, well known methods, procedures, components, and circuits have not been described in detail so as not to obscure the present invention.
Examples:
A power grid network security event analysis method based on machine learning, as shown in Figure 1, comprises the following steps:
S01: a data set is input, the data set comprising historical network security events and their corresponding disposal methods.
S02: the data set is divided into a training set and a verification set, the names of the historical network security events are converted into word vectors, and a feature set is established.
S03: the feature set and the disposal methods in the training set are imported, in batches, into neural networks with different parameters for training, to obtain a plurality of trained neural networks. The neural network adopted in this embodiment is a BP neural network, and the training process includes: setting the number of input-layer nodes, the number of hidden-layer nodes, the number of output-layer nodes, the number of training iterations and the learning rate of the BP neural network; taking the feature set in the training set as input and the disposal method as output, and importing them into the BP neural network for training, wherein a different number of hidden-layer nodes is set for each batch of training. Because the number of hidden-layer nodes has a large influence on the accuracy rate and the optimal value cannot be obtained in one pass, the node numbers are set in batches and then adjusted afterwards. The value range of the number m of hidden-layer nodes is 2n < m < 30n and m < s, where s is the number of input samples; the number of hidden-layer nodes in each batch of training is either taken at random within this range or arranged as an arithmetic progression. The range above is given by an empirical formula and only defines the general range.
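A toy sketch of this batched training follows. The network is a minimal one-hidden-layer BP (back-propagation) network with sigmoid units and no bias terms, trained by plain gradient descent; the data, labels and hyperparameters are all hypothetical illustrations, not the embodiment's actual configuration:

```python
import math
import random

def train_bp(features, labels, hidden_nodes, epochs=500, lr=0.5, seed=0):
    """Train a minimal one-hidden-layer BP network (sigmoid units, no
    biases) by gradient descent and return a predict function."""
    rng = random.Random(seed)
    n_in, n_out = len(features[0]), len(labels[0])
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden_nodes)]
    w2 = [[rng.uniform(-1, 1) for _ in range(hidden_nodes)] for _ in range(n_out)]
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    for _ in range(epochs):
        for x, y in zip(features, labels):
            h = [sig(sum(w * v for w, v in zip(row, x))) for row in w1]
            o = [sig(sum(w * v for w, v in zip(row, h))) for row in w2]
            # back-propagate: output-layer deltas, then hidden-layer deltas
            d_o = [(o[k] - y[k]) * o[k] * (1 - o[k]) for k in range(n_out)]
            d_h = [h[j] * (1 - h[j]) * sum(d_o[k] * w2[k][j] for k in range(n_out))
                   for j in range(hidden_nodes)]
            for k in range(n_out):
                for j in range(hidden_nodes):
                    w2[k][j] -= lr * d_o[k] * h[j]
            for j in range(hidden_nodes):
                for i in range(n_in):
                    w1[j][i] -= lr * d_h[j] * x[i]
    def predict(x):
        h = [sig(sum(w * v for w, v in zip(row, x))) for row in w1]
        o = [sig(sum(w * v for w, v in zip(row, h))) for row in w2]
        return max(range(n_out), key=lambda k: o[k])
    return predict

# one trained network per hidden-layer node count, as in the batches above
X = [[1, 0], [0, 1]]   # word-vector features (hypothetical)
Y = [[1, 0], [0, 1]]   # one-hot disposal-method labels (hypothetical)
models = {m: train_bp(X, Y, hidden_nodes=m) for m in (3, 5)}
```

Each trained network in `models` would then be evaluated on the verification set as described in step S04.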
S04: the feature set of the verification set is input into each trained neural network to output a disposal method, the output disposal method is compared with the original disposal method in the verification set, and the results are counted to obtain an accuracy rate. The statistical process for the accuracy rate includes: if the comparison results are consistent, incrementing a correct count by one; if they are inconsistent, incrementing an error count by one; accuracy rate = correct count / (correct count + error count).
S05: the trained neural networks are adjusted and screened according to the accuracy rate, and the final model is obtained from among them. Step S05 includes: judging whether the accuracy rate reaches a preset condition; if so, taking the neural network with the highest accuracy rate as the final model; if not, establishing an accuracy-rate-versus-hidden-layer-node-number distribution diagram, performing curve fitting, taking the number of hidden-layer nodes corresponding to the highest accuracy-rate point on the curve as the number of hidden-layer nodes of the final model, and retraining to obtain the final model. In the distribution diagram, the ordinate is the accuracy rate and the abscissa is the number of hidden-layer nodes, so the high point of the curve can be found intuitively, and the number of hidden-layer nodes near this high point is the optimal solution under the current conditions.
S06: a real-time network security event name is converted into a word vector, a feature set is established, and the feature set is input into the final model to obtain a disposal method.
This embodiment exploits the principle that different parameter settings in neural network training directly influence the training results: neural networks with different parameters are compared transversely, adjustments are made accordingly, and an optimal final model is obtained.
The beneficial effects of this embodiment include: security events can be analyzed and disposal methods judged intelligently, and a feedback mechanism allows the learning and training process to be adjusted, thereby improving the accuracy of the analysis.
From the foregoing description of the embodiments, it will be appreciated by those skilled in the art that, for convenience and brevity of description, only the above-described division of functional modules is illustrated, and in practical application, the above-described functional allocation may be implemented by different functional modules according to needs, i.e. the internal structure of a specific apparatus is divided into different functional modules to implement all or part of the functions described above.
In the embodiments provided in this application, it should be understood that the disclosed structures and methods may be implemented in other ways. For example, the embodiments described above with respect to structures are merely illustrative, e.g., the division of modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another structure, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via interfaces, structures or units, which may be in electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and the parts shown as units may be one physical unit or a plurality of physical units, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions to cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto; any person skilled in the art can readily conceive of changes or substitutions within the technical scope of the present application, and such changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.