Disclosure of Invention
The invention aims to provide a transformer high-active-value prediction method based on voiceprints and a neural network, so as to solve the problem that, in the prior art, the high-active-value monitoring data of a power transformer are measured inaccurately.
To achieve this purpose, the technical scheme adopted by the invention is as follows:
a transformer high-active-value prediction method based on a voiceprint and a neural network, comprising the following steps:
(1) acquiring the real high active value of the transformer and the audio data corresponding to the high active value over its duration;
(2) uniformly dividing the duration of the high active value into a plurality of time segments, and dividing the audio data, labeled with the high active value of the corresponding time segment, into a training set, a test set and a verification set;
(3) performing feature extraction on the audio data to obtain its Filterbank features, which are multidimensional tensors and can be regarded as spectrograms;
(4) constructing a convolutional neural network composed of an input layer, four groups of convolution-pooling units, a global average pooling layer, a fully connected layer and an output layer, wherein the input to the network is the Filterbank spectrogram of the audio data, each convolution-pooling unit consists of a convolutional layer and a pooling layer, the pooling layer in each unit is an AvgPooling layer, and the output layer produces a 1-dimensional high active value;
(5) taking the Filterbank spectrograms of the audio data obtained in step (3) as training data, and training the convolutional neural network constructed in step (4) on these spectrograms and the corresponding real high active values to obtain a high-active-value prediction model for the transformer;
(6) inputting the test set into the prediction model and comparing the output with the verification set to verify the prediction model.
The transformer high-active-value prediction method based on the voiceprint and the neural network is further characterized in that: in step (1), the high active values are acquired simultaneously with the audio, and the acquired high active values are quantized according to a set step size, so that the continuous high active values are mapped to discrete values and audio data corresponding to the discrete high active values are obtained.
The transformer high-active-value prediction method based on the voiceprint and the neural network is further characterized in that: in step (3), the Filterbank features are extracted by first framing the audio data and computing its short-time Fourier transform, then computing the multidimensional mel log energies, and finally expressing the audio data of each time segment as a multidimensional tensor, i.e. a spectrogram.
The transformer high-active-value prediction method based on the voiceprint and the neural network is further characterized in that: in the Filterbank feature extraction of step (3), the filters are triangular, and the starting frequencies of the filters are distributed at equal intervals on the mel frequency scale.
The transformer high-active-value prediction method based on the voiceprint and the neural network is further characterized in that: the convolutional neural network constructed in step (4) comprises a plurality of convolution-pooling units; in each unit, the convolutional layer extracts different features of the corresponding speech segment from the spectrogram and the pooling layer performs average pooling on the feature maps, and finally the predicted high active value is output through the global average pooling layer and the fully connected layer.
Compared with the prior art, the invention has the following advantages: by acquiring the audio data of the transformer over the duration of the high active value and constructing a prediction model based on a neural network, the high active value of the transformer can be predicted from the model alone; no monitoring device connected to the power equipment and lines needs to be installed on site, the influence of a complex power environment is avoided, and the prediction results are highly accurate.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
The transformer high active value prediction method based on the voiceprint and the neural network comprises the following steps:
(1) Acquire the audio data corresponding to the duration of the high active value of the transformer and, at the same time, acquire the high active value itself; quantize the acquired high active values according to a set step size, so that the continuous high active values are mapped to discrete values and audio data corresponding to the discrete high active values are obtained.
Since the amplitude of the high active value changes continuously, it must be quantized for better classification. When quantized with a step size of 1, the distribution histogram is as shown in fig. 1: within one day the high active value is mainly concentrated between 22 and 28, samples with values greater than 40 are few, and the sample counts across values can differ by dozens of times. Quantizing with a step size of 1 therefore yields imbalanced data, which makes classification by the neural network model difficult.
To make better use of the data, the high active value is instead quantized with a step size of 3. This alleviates the data imbalance to some extent while still reflecting the variation trend of the high active value; the quantization rules are given in Table 1 below.
TABLE 1. Quantization rules

Interval        Quantized value    Interval        Quantized value    Interval        Quantized value
(20.5, 23.5)    22                 (29.5, 32.5)    31                 (38.5, 41.5)    40
(23.5, 26.5)    25                 (32.5, 35.5)    34                 (41.5, 44.5)    43
(26.5, 29.5)    28                 (35.5, 38.5)    37                 (44.5, 47.5)    46
Quantization is the process of mapping continuous values to discrete values and is a basic concept in digital signal processing; quantizing data makes differently formatted data more uniform and easier for a computer to process. In the embodiment of the invention, the high active value is quantized with a step size of 3, so the quantized high active values fall into 9 categories.
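The quantization described above can be sketched as follows. This is a minimal illustration: the 9 centers and the interval boundaries follow Table 1 (step size 3), and the function name is illustrative, not taken from the embodiment.

```python
def quantize_high_active_value(value, centers=(22, 25, 28, 31, 34, 37, 40, 43, 46)):
    """Map a continuous high active value to the nearest of the 9
    quantized centers from Table 1 (quantization step size 3)."""
    return min(centers, key=lambda c: abs(value - c))

# Example: 24.1 falls in the interval (23.5, 26.5) and maps to 25.
```

Because the centers are spaced 3 apart, nearest-center rounding reproduces exactly the half-open intervals of Table 1.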
(2) Uniformly divide the duration of the high active value into a plurality of time segments, and divide the audio data, labeled with the high active value of the corresponding time segment, into a training set, a test set and a verification set.
The data set used in the embodiment of the invention consists of audio data collected in March 2019 from a transformer in Fuyang, with a sampling frequency of 48 kHz. Although the high active value changes with time, it changes slowly and can therefore be considered approximately constant within 10 s. In the embodiment, each recorded audio file with a duration of 15 minutes is segmented into clips of 10 s, and all audio files are divided in a 4:1:1 ratio into a training set, a test set and a verification set; when dividing the data sets, all clips from the same original 15-minute recording are assigned to only one set. The data distribution of the three sets is shown in fig. 2.
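A minimal sketch of the split described above, assuming each recording is represented as a list of its 10 s clip identifiers; the function name and the fixed seed are illustrative. Splitting at the level of whole recordings guarantees that clips from one 15-minute file never appear in two different sets.

```python
import random

def split_by_recording(recordings, ratio=(4, 1, 1), seed=0):
    """Split whole 15-minute recordings (each a list of 10 s clip ids)
    into train/test/verification sets in the given ratio, so that clips
    from one recording never end up in two different sets."""
    rng = random.Random(seed)
    recs = list(recordings)
    rng.shuffle(recs)
    total = sum(ratio)
    n_train = len(recs) * ratio[0] // total
    n_test = len(recs) * ratio[1] // total
    train = [c for r in recs[:n_train] for c in r]
    test = [c for r in recs[n_train:n_train + n_test] for c in r]
    valid = [c for r in recs[n_train + n_test:] for c in r]
    return train, test, valid
```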
(3) Filterbank feature extraction
1) Pre-emphasis: compensates the high-frequency part of the speech signal suppressed by the pronunciation system and highlights the high-frequency formants. 2) Framing: the input audio is split into frames with a frame length of 200 ms and a frame shift of 25 ms. 3) Windowing: a Hamming window is applied to each frame so that both ends of the frame attenuate to near 0. 4) Short-time Fourier transform: each frame is transformed from the time domain to the frequency domain. 5) Mel filtering: the signal is filtered by a set of triangular window filters distributed linearly on the mel frequency scale, and the mel log energy of each frame is computed, with a dimensionality of 256. Finally, each 10 s audio sample is divided into 400 frames, which are in turn expressed as a tensor of dimensions 1x400x256. The flow of the Filterbank feature extraction and the resulting spectrogram are shown in figs. 3 and 4, respectively.
(4) Construct a convolutional neural network composed of an input layer, four groups of convolution-pooling units, a global average pooling layer, a fully connected layer and an output layer, wherein each convolution-pooling unit consists of a convolutional layer and a pooling layer, and the pooling layer in each unit is an AvgPooling layer.
Fig. 5 shows the convolutional neural network for high-active-value prediction in an embodiment of the invention. The network contains 4 convolutional layers and 4 pooling layers; the convolution kernels have size 5, the channel counts are 32, 64, 128 and 256 in sequence, and each pooling layer is an AvgPooling layer with stride 2. After all convolutional and pooling layers, a global average pooling layer reduces both the time and frequency dimensions to 1, and a final fully connected layer maps a 512-dimensional feature vector to the output high active value.
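A minimal shape walkthrough of this architecture in pure Python, assuming 'same'-padded convolutions (which preserve the time/frequency size) and stride-2 average pooling (which halves it). Note an apparent inconsistency in the embodiment: four units with the stated channel counts end at 256 channels, while the text mentions a 512-dimensional feature vector before the fully connected layer, so the exact feature width is left open here.

```python
def conv_pool_unit(shape, out_channels, pool_stride=2):
    """One convolution-pooling unit: a 'same'-padded size-5 convolution
    keeps the time/frequency size; AvgPooling with stride 2 halves it."""
    _, t, f = shape
    return (out_channels, t // pool_stride, f // pool_stride)

shape = (1, 400, 256)              # input Filterbank spectrogram: 1 x 400 x 256
for channels in (32, 64, 128, 256):
    shape = conv_pool_unit(shape, channels)
print(shape)                       # (256, 25, 16)
# Global average pooling then reduces time and frequency to 1,
# leaving a per-sample feature vector of length shape[0].
```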
(5) Extract the Filterbank features from the training set obtained in step (2) to obtain the spectrograms of the audio data, and input the spectrograms together with the corresponding real high active values into the convolutional neural network to obtain the prediction model.
(6) Input the test set into the prediction model and compare the output with the verification set to verify the prediction model.
To demonstrate the practical effect of the invention, a continuous speech segment was reserved when dividing the data set; it appears in none of the training, test and verification sets described above. Unlike those three sets, this portion of the data is continuous in time, and its high active value varies sinusoidally with time.
As shown in fig. 6, the tested audio comprises 62 hours of audio data from 22:00 on 19 March 2019 to 12:00 on 22 March 2019. The solid line represents the real variation of the high active value (normalized to the [0,1] interval) over time, and the dotted line represents the high active value predicted from the audio files. As can be seen from fig. 6, the prediction of the high active value by the invention is essentially correct and reflects the variation trend of the high active value.
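The normalization to the [0,1] interval used for fig. 6 can be sketched as standard min-max scaling; the helper name is illustrative and the embodiment does not specify the exact scaling method.

```python
def minmax_normalize(values):
    """Scale a sequence of high active values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# e.g. minmax_normalize([22, 25, 28]) -> [0.0, 0.5, 1.0]
```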
The embodiments described above are only preferred embodiments of the present invention and do not limit its concept and scope. Various modifications and improvements made to the technical solution of the invention by those skilled in the art without departing from its design concept shall fall within the protection scope of the invention; the claimed technical content of the invention is fully set forth in the claims.