CN114400019A - Model generation method, abnormality detection device, and electronic apparatus - Google Patents
- Publication number: CN114400019A
- Application number: CN202111666960.8A
- Authority: CN (China)
- Prior art keywords: audio, detection model, anomaly detection, trained, network
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Classifications
- G10L25/30 — Speech or voice analysis techniques characterised by the analysis technique using neural networks
- G06F18/214 — Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/044 — Neural network architectures: recurrent networks, e.g. Hopfield networks
- G06N3/08 — Neural networks: learning methods
- G10L25/03 — Speech or voice analysis techniques characterised by the type of extracted parameters
- G10L25/51 — Speech or voice analysis techniques specially adapted for comparison or discrimination
Abstract
The embodiments of this application disclose a model generation method, an anomaly detection method, an apparatus, and an electronic device. The method includes: acquiring a training data set, the training data set including the respective first audio features of multiple pieces of audio information of a target device, the multiple pieces of audio information including normal audio information and abnormal audio information; training a generator network to be trained on the training data set, so as to take the converged generator network as an initial anomaly detection model; and adjusting the initial anomaly detection model through a discriminator network, so as to take the adjusted generator network as a target anomaly detection model. In this way, anomaly detection can be performed on a device to be detected by inputting its first audio feature into the target anomaly detection model, which saves manpower and improves the efficiency of anomaly detection.
Description
Technical Field
This application relates to the field of computer technology, and more particularly, to a model generation method, an anomaly detection method, an apparatus, and an electronic device.
Background
With the continuous growth of power demand, the economy and reliability requirements that power systems place on power-operating equipment keep rising. Long-term operating loads and the natural environment (temperature, air pressure, humidity, contamination, etc.) cause aging and wear of power equipment, so its performance and reliability gradually decline and safety hazards arise. It is therefore necessary to monitor and inspect the operating state of power equipment.
However, related detection methods for power equipment rely on manual inspection and are inefficient.
Summary of the Invention
In view of the above problems, this application proposes a model generation method, an anomaly detection method, an apparatus, an electronic device, and a storage medium to address them.
In a first aspect, this application provides a model generation method applied to an electronic device. The method includes: acquiring a training data set, the training data set including the respective first audio features of multiple pieces of audio information of a target device, the multiple pieces of audio information including normal audio information and abnormal audio information; training a generator network to be trained on the training data set, so as to take the converged generator network as an initial anomaly detection model; and adjusting the initial anomaly detection model through a discriminator network, so as to take the adjusted generator network as a target anomaly detection model.
In a second aspect, this application provides an anomaly detection method applied to an electronic device. The method includes: acquiring audio to be detected; performing framing, windowing, and a fast Fourier transform on the audio to be detected to obtain its corresponding first audio feature, the first audio feature being the spectrogram of the audio to be detected; and inputting the first audio feature into the target anomaly detection model obtained by the above method to obtain the detection result output by the model.
In a third aspect, this application provides a model generation apparatus running on an electronic device. The apparatus includes: a data set acquisition unit, configured to acquire a training data set, the training data set including the respective first audio features of multiple pieces of audio information of a target device, the multiple pieces of audio information including normal audio information and abnormal audio information; an initial anomaly detection model acquisition unit, configured to train a generator network to be trained on the training data set, so as to take the converged generator network as an initial anomaly detection model; and a target anomaly detection model acquisition unit, configured to adjust the initial anomaly detection model through a discriminator network, so as to take the adjusted generator network as a target anomaly detection model.
In a fourth aspect, this application provides an anomaly detection apparatus running on an electronic device. The apparatus includes: an audio acquisition unit, configured to acquire audio to be detected; a first audio feature acquisition unit, configured to perform framing, windowing, and a fast Fourier transform on the audio to be detected to obtain its corresponding first audio feature, the first audio feature being the spectrogram of the audio to be detected; and a detection result acquisition unit, configured to input the first audio feature into the target anomaly detection model obtained by the above method and obtain the detection result output by the model.
In a fifth aspect, this application provides an electronic device including one or more processors and a memory. One or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the methods described above.
In a sixth aspect, this application provides a computer-readable storage medium storing program code, wherein the above methods are performed when the program code runs.
With the model generation method, anomaly detection method, apparatus, electronic device, and storage medium provided by this application, after a training data set containing the first audio features of normal and abnormal audio information is acquired, a generator network is trained on that data set, the converged generator network is taken as an initial anomaly detection model, and the initial model is then adjusted through a discriminator network so that the adjusted generator network serves as the target anomaly detection model. Once the target anomaly detection model has been trained on the first audio features of normal and abnormal audio, anomaly detection on a device to be detected can be performed by inputting the device's first audio feature into the model, which saves manpower and improves detection efficiency. Moreover, adjusting the initial anomaly detection model through the discriminator network allows the target model to perform well even when abnormal audio training data are scarce, i.e., to discriminate between normal and abnormal audio with high accuracy.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below illustrate only some embodiments of this application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a flowchart of a model generation method proposed by an embodiment of this application;
FIG. 2 is a flowchart of one implementation of S110 in FIG. 1;
FIG. 3 is a schematic flow diagram of a method for acquiring an audio data set proposed by this application;
FIG. 4 is a schematic flow diagram of a method for acquiring a first audio feature proposed by this application;
FIG. 5 is a schematic diagram of a generator network to be trained proposed by this application;
FIG. 6 is a flowchart of a model generation method proposed by another embodiment of this application;
FIG. 7 is a flowchart of a model generation method proposed by yet another embodiment of this application;
FIG. 8 is a schematic diagram of an anomaly detection model to be trained proposed by this application;
FIG. 9 is a flowchart of an anomaly detection method proposed by an embodiment of this application;
FIG. 10 is a structural block diagram of a model generation apparatus proposed by an embodiment of this application;
FIG. 11 is a structural block diagram of an anomaly detection apparatus proposed by an embodiment of this application;
FIG. 12 is a structural block diagram of an electronic device proposed by this application;
FIG. 13 is a storage unit according to an embodiment of this application for storing or carrying program code that implements the methods according to the embodiments of this application.
Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
With the continuous growth of power demand, the economy and reliability requirements that power systems place on power-operating equipment keep rising. Long-term operating loads and the natural environment (temperature, air pressure, humidity, contamination, etc.) cause aging and wear of power equipment, so its performance and reliability gradually decline and safety hazards arise. It is therefore necessary to monitor and inspect the operating state of power equipment.
In studying anomaly detection for power equipment, the inventors found that traditional detection methods require manual inspection and are inefficient. Deep-neural-network-based methods, in turn, require a large amount of abnormal data to train the network parameters; in a real production environment, however, abnormal data are scarce, which degrades the performance of such networks.
The inventors therefore propose the model generation method, anomaly detection method, apparatus, and electronic device of this application: after a training data set containing the first audio features of normal and abnormal audio information is acquired, a generator network is trained on that data set, the converged generator network is taken as an initial anomaly detection model, and the initial model is adjusted through a discriminator network so that the adjusted generator network serves as the target anomaly detection model. In this way, once the target anomaly detection model has been trained on the first audio features of normal and abnormal audio, anomaly detection on a device to be detected can be performed by inputting the device's first audio feature into the model, saving manpower and improving detection efficiency. Moreover, the discriminator-based adjustment allows the target model to discriminate between normal and abnormal audio with high accuracy even when abnormal audio training data are scarce.
Referring to FIG. 1, this application provides a model generation method applied to an electronic device. The method includes:
S110: Acquire a training data set, the training data set including the respective first audio features of multiple pieces of audio information of a target device, the multiple pieces of audio information including normal audio information and abnormal audio information.
In the embodiments of this application, the target device may be a device selected for anomaly detection, such as a piece of power equipment: a generator, a motor, a transformer, and the like.
As shown in FIG. 2, acquiring the training data set includes:
S111: Acquire multiple pieces of audio information of the target device.
As one approach, an audio capture device (e.g., a recorder) may repeatedly sample the sound that the target device produces during operation due to its internal structure or hardware conditions, and the sampled clips serve as the target device's multiple pieces of audio information. These may include normal audio information (audio while the target device runs normally) and abnormal audio information (audio while the target device is abnormal, e.g., a transformer carrying overcurrent caused by an external short circuit, or overloaded because the load has exceeded the rated capacity for a long time). For example, when the target device is a transformer, sampling multiple clips during normal operation and during abnormal events may yield audio information with a sampling rate of 16 kHz and a sample depth of 16 bits.
It should be noted that the sampling rate and sample depth generally depend on the hardware of the audio capture device. A higher sampling rate means more samples are captured per second; for example, at a 16 kHz sampling rate the device captures 16000 samples per second. A higher sample depth means each sample can represent a wider range of values; for example, at a 16-bit sample depth each sample value lies between -32768 (-2^15) and +32767 (2^15 - 1).
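As a quick sanity check (a trivial sketch, not part of the patent), the figures quoted above follow directly from the definitions:

```python
# Signed 16-bit PCM: one sign bit, 15 magnitude bits.
bits = 16
lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

# A 16 kHz sampling rate means 16000 samples captured per second.
sr = 16000
print(lo, hi, sr)  # -32768 32767 16000
```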
As shown in FIG. 3, after the audio capture device has collected the target device's multiple pieces of audio information, the clips can be stored and labeled, and the labeled clips used as the audio data set. For example, the label of collected normal audio information may be set to 1 and that of collected abnormal audio information to 0.
It should be noted that the audio information can be labeled in several ways: manually, automatically with a classification model, or automatically with a classification model followed by manual correction.
S112: Perform framing, windowing, and a fast Fourier transform on the audio information to obtain the spectrogram corresponding to the audio information.
As one approach, the audio data set containing the target device's multiple pieces of audio information can be preprocessed, and the result stored as the training data set, which contains the spectrogram corresponding to each piece of audio; this converts the audio into image information that can be fed into a deep neural network for training. The preprocessing may include framing, windowing, and the fast Fourier transform (FFT): as shown in FIG. 4, the labeled audio is first framed and windowed, and an FFT is then applied to each window to obtain the corresponding spectrogram. For example, with 1 s clips at a 16 kHz sampling rate, a 25 ms frame length, a 10 ms frame shift, and a Hanning window, each clip contains 16000 samples and each frame 400 samples. With the window length equal to the frame length, multiplying each frame by the Hanning window function yields 400 windowed samples, and a 512-point FFT of the windowed samples yields 512 spectral lines for that frame. After each computation the Hanning window is shifted by 10 ms (160 samples) and the computation is repeated until all frames are processed, producing the spectrogram corresponding to each piece of audio information.
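The framing/windowing/FFT walkthrough above can be sketched in a few lines of NumPy. This is a minimal illustration using a synthetic test tone; the patent does not specify an implementation, and the log-magnitude conversion at the end is an assumption for visualization:

```python
import numpy as np

# Parameters from the text: 1 s of 16 kHz audio, 25 ms frames (400 samples),
# 10 ms hop (160 samples), Hanning window, 512-point FFT.
sr = 16000
frame_len, hop, n_fft = 400, 160, 512

audio = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # synthetic 440 Hz tone

window = np.hanning(frame_len)
n_frames = 1 + (len(audio) - frame_len) // hop        # 98 frames for 1 s
frames = np.stack([audio[i * hop : i * hop + frame_len] * window
                   for i in range(n_frames)])
spectrum = np.fft.fft(frames, n=n_fft, axis=1)        # 512 lines per frame
# Keep the one-sided spectrum and convert to log magnitude (dB-like scale).
spectrogram = 20 * np.log10(np.abs(spectrum[:, : n_fft // 2 + 1]) + 1e-10)

print(spectrogram.shape)  # (98, 257): frames x one-sided frequency bins
```

Note that a 512-point FFT of a 400-sample frame implicitly zero-pads each frame to 512 samples.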
As another approach, the audio data set containing the target device's multiple pieces of audio information can be processed in real time through framing, windowing, and the fast Fourier transform to obtain the corresponding spectrograms, so that the audio information is converted into image information and fed into the deep neural network for training in real time.
It should be noted that the frame length, frame shift, window function, and number of FFT points can be chosen according to actual requirements. For example, considering both the real-time requirements of the anomaly detection model and the resolution of the audio information, the number of FFT points can be set to 512, which extracts rich audio information while keeping the model's computation fast.
Furthermore, since power equipment rarely behaves abnormally in actual production, the training data set may contain more spectrograms of normal audio than of abnormal audio; the training data set in the embodiments of this application may therefore be imbalanced.
S113: Use the spectrogram as the first audio feature of the audio information.
As one approach, the target device's multiple spectrograms can be used as the respective first audio features of its multiple pieces of audio information.
S120: Train the generator network to be trained on the training data set, so as to take the converged generator network as an initial anomaly detection model.
The generator network to be trained (Generator, G) can be used to generate, from the first audio feature, a second audio feature that matches the distribution of the first audio feature, and to extract features from the second audio feature. In the embodiments of this application, the generator network may include a first audio-feature reconstruction network and a feature extraction network, where the reconstruction network includes a first encoder, an LSTM (Long Short-Term Memory network), and a decoder, and the feature extraction network includes a second encoder. As one approach, the first audio feature is fed into the first encoder, which encodes it through a nonlinear transformation into a low-dimensional feature. The encoded feature is then fed into the LSTM: because the LSTM extracts temporal information well and the audio features in this application are time-dependent, it can further refine the encoded feature into a more effective latent representation (a low-dimensional feature of the first audio feature), allowing the generator network to learn the time-domain characteristics of normal and abnormal audio and thus better distinguish them. The latent representation is then fed into the decoder, which decodes it through an inverse mapping and reconstructs the first audio feature; the reconstruction is called the second audio feature and has the same size as the first. After the second audio feature is obtained, it can be fed into the second encoder for further feature extraction and dimensionality reduction.
Optionally, the generator network to be trained may further include a fully connected layer followed by a softmax activation function, which outputs the anomaly detection result for the audio information corresponding to the first audio feature. Training the generator network on the training data set of first audio features yields a converged generator network, which can be taken as the initial anomaly detection model.
Optionally, to reduce the loss of feature information, residual connections can be used between same-size intermediate features of the first encoder and the decoder in the generator network. For example, as shown in FIG. 5, in the first encoder an intermediate feature is obtained after a 3×3 convolution of the first audio feature; in the decoder, the LSTM output passes through a 3×3 deconvolution to produce an output feature, and adding this output feature to the encoder's intermediate feature yields a decoder intermediate feature of the same size as the encoder's intermediate feature.
需要说明的是，第一编码器和第二编码器的结构可以相同。示例性的，如图5所示，待训练生成器网络的第一编码器和第二编码器分别可以包括3个3×3的二维卷积层，解码器可以包括3个3×3的二维反卷积层，此外，待训练生成器网络还可以包括一个LSTM层和一个全连接层。It should be noted that the structures of the first encoder and the second encoder may be the same. Exemplarily, as shown in FIG. 5, the first encoder and the second encoder of the generator network to be trained may each include three 3×3 two-dimensional convolutional layers, and the decoder may include three 3×3 two-dimensional deconvolution layers. In addition, the generator network to be trained may further include an LSTM layer and a fully connected layer.
再者，需要说明的是，待训练生成器网络中的编码器、解码器、LSTM层以及全连接层的网络层深度和每一层网络对应的参数(例如，卷积层尺寸、反卷积层尺寸等)可以根据不同的目标设备、第一音频特征大小等进行灵活设置的。可选的，为了提高模型的性能，可以在待训练生成器网络中引入注意力机制。Furthermore, it should be noted that the network depth of the encoder, decoder, LSTM layer and fully connected layer in the generator network to be trained, and the parameters corresponding to each layer (for example, convolution layer size, deconvolution layer size, etc.), can be flexibly set according to the target device, the size of the first audio feature, and the like. Optionally, in order to improve the performance of the model, an attention mechanism can be introduced into the generator network to be trained.
S130:通过判别器网络对所述初始异常检测模型进行调整,以将调整后的生成器网络作为目标异常检测模型。S130: Adjust the initial anomaly detection model through a discriminator network, so as to use the adjusted generator network as a target anomaly detection model.
其中,判别器网络(Discriminator,D)可以用于判别初始异常检测模型中的第一音频特征和第二音频特征的分布是否一致。在本申请实施例中,判别器网络可以包括第三编码器,可选的,第三编码器的网络结构可以与第一编码器或者第二编码器相同。The discriminator network (Discriminator, D) can be used to discriminate whether the distributions of the first audio feature and the second audio feature in the initial anomaly detection model are consistent. In this embodiment of the present application, the discriminator network may include a third encoder. Optionally, the network structure of the third encoder may be the same as that of the first encoder or the second encoder.
作为一种方式，由于初始异常检测模型中的生成器网络的训练目的可以为生成符合第一音频特征分布(真实分布)的第二音频特征，判别器网络的训练目的可以为正确判别第一音频特征和第二音频特征的分布是否一致，所以通过判别器网络对初始异常检测模型进行网络参数(例如，权重等)调整，可以使得初始异常检测模型中的生成器网络在与判别器网络相互对抗的过程中学习到正常音频信息和异常音频信息各自对应的第一音频特征的数据分布，从而使得调整后的生成器网络可以作为目标异常检测模型对输入的音频特征进行异常检测。As one approach, since the training objective of the generator network in the initial anomaly detection model may be to generate second audio features that conform to the distribution of the first audio features (the true distribution), and the training objective of the discriminator network may be to correctly judge whether the distributions of the first audio features and the second audio features are consistent, adjusting the network parameters (for example, weights) of the initial anomaly detection model through the discriminator network allows the generator network in the initial anomaly detection model, in the process of competing against the discriminator network, to learn the data distributions of the first audio features corresponding to normal audio information and abnormal audio information respectively, so that the adjusted generator network can serve as the target anomaly detection model to perform anomaly detection on input audio features.
作为另一种方式，可以根据训练数据集中异常音频信息的占比，来确定是否通过判别器网络对初始异常检测模型进行调整，若异常音频信息的占比小于阈值，则通过判别器网络对初始异常检测模型进行调整，以将调整后的生成器网络作为目标异常检测模型。示例性的，假设阈值为A，训练数据集中异常音频信息的占比为B，若B小于A，则表明异常音频信息的训练样本不足，可能会使得初始异常检测模型学习不到异常音频信息对应的第一音频特征的数据分布，导致初始异常检测模型无法准确区分正常音频和异常音频，此时，可以通过判别器网络对初始异常检测模型进行网络参数(例如，权重等)调整，以便初始异常检测模型中的生成器网络在与判别器网络相互对抗的过程中学习到正常音频信息和异常音频信息各自对应的第一音频特征的数据分布，从而使得调整后的生成器网络可以作为目标异常检测模型对输入的音频特征进行异常检测。As another approach, whether to adjust the initial anomaly detection model through the discriminator network can be determined according to the proportion of abnormal audio information in the training data set; if the proportion of abnormal audio information is smaller than a threshold, the initial anomaly detection model is adjusted through the discriminator network, and the adjusted generator network is used as the target anomaly detection model. Exemplarily, assume that the threshold is A and the proportion of abnormal audio information in the training data set is B. If B is smaller than A, it indicates that the training samples of abnormal audio information are insufficient, which may prevent the initial anomaly detection model from learning the data distribution of the first audio features corresponding to abnormal audio information, so that the initial anomaly detection model cannot accurately distinguish normal audio from abnormal audio. In this case, the network parameters (for example, weights) of the initial anomaly detection model can be adjusted through the discriminator network, so that the generator network in the initial anomaly detection model learns, in the process of competing against the discriminator network, the data distributions of the first audio features corresponding to normal audio information and abnormal audio information respectively, and the adjusted generator network can serve as the target anomaly detection model to perform anomaly detection on input audio features.
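The decision rule above (adjust via the discriminator network when the abnormal-sample proportion B falls below the threshold A) can be sketched as follows; the default threshold value is illustrative, since the embodiment does not fix A:

```python
def needs_discriminator_adjustment(num_abnormal, num_total, threshold_a=0.1):
    # B = proportion of abnormal audio samples in the training data set.
    # If B < A, abnormal samples are under-represented, so the initial
    # anomaly detection model is adjusted through the discriminator network.
    proportion_b = num_abnormal / num_total
    return proportion_b < threshold_a

# 30 abnormal clips out of 1000: B = 0.03 < A = 0.1, so adjustment is triggered.
adjust = needs_discriminator_adjustment(num_abnormal=30, num_total=1000)
```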
本实施例提供的一种模型生成方法，在获取包括正常音频信息和异常音频信息的第一音频特征的训练数据集后，通过该训练数据集对待训练生成器网络进行训练，以将收敛的待训练生成器网络作为初始异常检测模型，再通过判别器网络对初始异常检测模型进行调整，以将调整后的生成器网络作为目标异常检测模型。通过上述方式使得，在通过正常音频信息和异常音频信息的第一音频特征训练得到目标异常检测模型后，在对待检测设备进行异常检测的过程中，可以将待检测设备的第一音频特征输入目标异常检测模型对待检测设备进行异常检测，节省了人力，提高了异常检测效率。并且，通过判别器网络对初始异常检测模型进行调整，可以使得目标异常检测模型在异常音频训练数据缺乏的情况下也可以有较好的性能，即在对正常音频和异常音频的判别上可以有较高的准确率。In the model generation method provided by this embodiment, after a training data set including first audio features of normal audio information and abnormal audio information is acquired, the generator network to be trained is trained with the training data set, the converged generator network to be trained is used as the initial anomaly detection model, and the initial anomaly detection model is then adjusted through the discriminator network so that the adjusted generator network serves as the target anomaly detection model. In this way, after the target anomaly detection model is obtained by training on the first audio features of normal and abnormal audio information, the first audio feature of a device to be detected can be input into the target anomaly detection model to perform anomaly detection on that device, which saves manpower and improves anomaly detection efficiency. Moreover, adjusting the initial anomaly detection model through the discriminator network enables the target anomaly detection model to perform well even when abnormal audio training data is scarce, that is, to achieve high accuracy in discriminating normal audio from abnormal audio.
请参阅图6,本申请提供的一种模型生成方法,应用于电子设备,所述方法包括:Please refer to FIG. 6 , a model generation method provided by the present application is applied to an electronic device, and the method includes:
S210:获取训练数据集,所述训练数据集包括目标设备的多个音频信息各自的第一音频特征,所述多个音频信息包括正常音频信息和异常音频信息。S210: Acquire a training data set, where the training data set includes respective first audio features of multiple pieces of audio information of the target device, where the multiple pieces of audio information include normal audio information and abnormal audio information.
S220:将所述训练数据集输入所述待训练生成器网络,得到所述待训练生成器网络的输出。S220: Input the training data set into the generator network to be trained, and obtain the output of the generator network to be trained.
其中,作为一种方式,可以将正常音频信息和异常音频信息各自对应的第一音频特征输入待训练生成器网络,得到待训练生成器网络的输出。在这种方式下,可以有多个正常音频信息和多个异常音频信息,每一个正常或者异常音频信息对应有一个第一音频特征。Wherein, as a method, the first audio features corresponding to the normal audio information and the abnormal audio information can be input into the generator network to be trained to obtain the output of the generator network to be trained. In this manner, there may be multiple normal audio information and multiple abnormal audio information, and each normal or abnormal audio information corresponds to a first audio feature.
S230:基于所述输出、第一损失函数和第二损失函数对所述待训练生成器网络进行训练，以将收敛的待训练生成器网络作为初始异常检测模型，其中，所述第一损失函数为所述第一编码器输出结果和所述第二编码器输出结果的差的绝对值，所述第二损失函数为所述第一音频特征与第二音频特征的差的绝对值，所述第二音频特征为所述解码器的输出结果。S230: Train the generator network to be trained based on the output, the first loss function and the second loss function, so as to use the converged generator network to be trained as the initial anomaly detection model, where the first loss function is the absolute value of the difference between the output of the first encoder and the output of the second encoder, the second loss function is the absolute value of the difference between the first audio feature and the second audio feature, and the second audio feature is the output of the decoder.
其中，在本申请实施例中，第一损失函数可以用于最小化待训练生成器网络(Generator,G)中第一编码器的输出特征与第二编码器的输出特征之间的距离，以便待训练生成器网络可以学习到正常音频信息和异常音频信息各自对应的编码特征分布情况。作为一种方式，第一损失函数可以为第一编码器输出结果和第二编码器输出结果的差的绝对值，第一损失函数的计算公式如下：In the embodiments of the present application, the first loss function can be used to minimize the distance between the output features of the first encoder and the output features of the second encoder in the generator network to be trained (Generator, G), so that the generator network to be trained can learn the distributions of the encoded features corresponding to normal audio information and abnormal audio information respectively. As one approach, the first loss function can be the absolute value of the difference between the output of the first encoder and the output of the second encoder, and is calculated as follows:
Loss_g1=‖z1-z2‖Loss_g 1 =‖z 1 -z 2 ‖
其中,z1可以表示第一编码器的输出结果,z2可以表示第二编码器的输出结果。Wherein, z 1 may represent the output result of the first encoder, and z 2 may represent the output result of the second encoder.
再者，在本申请实施例中，第二损失函数可以用于最小化待训练生成器网络(Generator,G)中第一音频特征与第二音频特征之间的距离，以便待训练生成器网络可以学习到正常音频信息和异常音频信息各自对应的纹理特征分布情况。作为一种方式，第二损失函数可以为第一音频特征和解码器输出结果的差的绝对值，第二损失函数的计算公式如下：Furthermore, in the embodiments of the present application, the second loss function can be used to minimize the distance between the first audio feature and the second audio feature in the generator network to be trained (Generator, G), so that the generator network to be trained can learn the distributions of the texture features corresponding to normal audio information and abnormal audio information respectively. As one approach, the second loss function can be the absolute value of the difference between the first audio feature and the decoder output, and is calculated as follows:
Loss_g2=‖x-G(x)‖Loss_g 2 =‖xG(x)‖
其中,x可以表示第一音频特征,G(x)可以表示解码器的输出结果。Wherein, x may represent the first audio feature, and G(x) may represent the output result of the decoder.
作为一种方式，可以将第一损失函数和第二损失函数的加权和作为待训练生成器网络的损失函数，基于待训练生成器网络的输出和待训练生成器网络的损失函数对待训练生成器网络进行训练，以将收敛的待训练生成器网络作为初始异常检测模型。待训练生成器网络的损失函数的计算公式如下：As one approach, the weighted sum of the first loss function and the second loss function can be used as the loss function of the generator network to be trained, and the generator network to be trained is trained based on its output and this loss function, so that the converged generator network to be trained serves as the initial anomaly detection model. The loss function of the generator network to be trained is calculated as follows:
Loss_G=xLoss_g1+yLoss_g2 Loss_G=xLoss_g 1 +yLoss_g 2
其中,x与y的和为1,x与y的值可以是基于经验进行设置的,也可以是作为待训练生成器网络的可训练参数经训练得到的。The sum of x and y is 1, and the values of x and y can be set based on experience, or can be obtained by training as trainable parameters of the generator network to be trained.
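The two generator losses and their weighted combination Loss_G = x·Loss_g1 + y·Loss_g2 can be sketched numerically as follows; interpreting ‖·‖ as an L1 norm and using equal default weights are both assumptions, since the embodiment does not fix either choice:

```python
def l1_norm(a, b):
    # ‖a - b‖ interpreted as the L1 distance between equal-size vectors.
    return sum(abs(p - q) for p, q in zip(a, b))

def generator_loss(z1, z2, x_feat, g_x, w_x=0.5, w_y=0.5):
    # Loss_G = x * Loss_g1 + y * Loss_g2, with x + y = 1.
    assert abs(w_x + w_y - 1.0) < 1e-9
    loss_g1 = l1_norm(z1, z2)       # first-encoder vs second-encoder outputs
    loss_g2 = l1_norm(x_feat, g_x)  # input feature vs decoder reconstruction G(x)
    return w_x * loss_g1 + w_y * loss_g2

loss = generator_loss(z1=[0.1, 0.2], z2=[0.1, 0.3],
                      x_feat=[1.0, 2.0], g_x=[0.9, 2.1])
```

Here Loss_g1 = 0.1 and Loss_g2 = 0.2, so the weighted total is 0.15.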
S240:通过判别器网络对所述初始异常检测模型进行调整,以将调整后的生成器网络作为目标异常检测模型。S240: Adjust the initial anomaly detection model through a discriminator network, so as to use the adjusted generator network as a target anomaly detection model.
本实施例提供的一种模型生成方法，通过上述方式使得，在通过正常音频信息和异常音频信息的第一音频特征训练得到目标异常检测模型后，在对待检测设备进行异常检测的过程中，可以将待检测设备的第一音频特征输入目标异常检测模型对待检测设备进行异常检测，节省了人力，提高了异常检测效率。并且，通过判别器网络对初始异常检测模型进行调整，可以使得目标异常检测模型在异常音频训练数据缺乏的情况下也可以有较好的性能，即在对正常音频和异常音频的判别上可以有较高的准确率。并且，在本实施例中，通过待训练生成器网络的输出、第一损失函数和第二损失函数对待训练生成器网络进行训练，以将收敛的待训练生成器网络作为初始异常检测模型，从而使得初始异常检测模型可以得到正常音频信息和异常音频信息各自对应的特征分布情况，提高了初始异常检测模型对正常音频和异常音频的判别能力。With the model generation method provided by this embodiment, after the target anomaly detection model is obtained by training on the first audio features of normal and abnormal audio information, the first audio feature of a device to be detected can be input into the target anomaly detection model to perform anomaly detection on that device, which saves manpower and improves anomaly detection efficiency. Moreover, adjusting the initial anomaly detection model through the discriminator network enables the target anomaly detection model to perform well even when abnormal audio training data is scarce, that is, to achieve high accuracy in discriminating normal audio from abnormal audio. Furthermore, in this embodiment, the generator network to be trained is trained with its output, the first loss function and the second loss function, and the converged generator network to be trained is used as the initial anomaly detection model, so that the initial anomaly detection model can capture the feature distributions corresponding to normal audio information and abnormal audio information respectively, which improves its ability to discriminate normal audio from abnormal audio.
请参阅图7,本申请提供的一种模型生成方法,应用于电子设备,所述方法包括:Please refer to FIG. 7 , a model generation method provided by the present application is applied to an electronic device, and the method includes:
S310:获取训练数据集,所述训练数据集包括目标设备的多个音频信息各自的第一音频特征,所述多个音频信息包括正常音频信息和异常音频信息。S310: Acquire a training data set, where the training data set includes respective first audio features of multiple pieces of audio information of the target device, where the multiple pieces of audio information include normal audio information and abnormal audio information.
S320:通过所述训练数据集对待训练生成器网络进行训练,以将收敛的待训练生成器网络作为初始异常检测模型。S320: Train the generator network to be trained by using the training data set, so as to use the converged generator network to be trained as an initial anomaly detection model.
S330:获取待训练异常检测模型,所述待训练异常检测模型包括所述初始异常检测模型和所述第三编码器。S330: Obtain an abnormality detection model to be trained, where the abnormality detection model to be trained includes the initial abnormality detection model and the third encoder.
其中，作为一种方式，如图8所示，待训练异常检测模型可以包括第一编码器、解码器、第二编码器以及第三编码器，其中，第三编码器可以作为判别器网络(Discriminator,D)，第三编码器的输入可以为第一音频特征和解码器的输出(第二音频特征)。As one approach, as shown in FIG. 8, the anomaly detection model to be trained may include the first encoder, the decoder, the second encoder and a third encoder, where the third encoder may serve as the discriminator network (Discriminator, D), and the inputs of the third encoder may be the first audio feature and the output of the decoder (the second audio feature).
可选的,第一编码器、第二编码器和第三编码器的结构可以相同。Optionally, the structures of the first encoder, the second encoder and the third encoder may be the same.
S340:将所述训练数据集输入所述待训练异常检测模型,得到所述待训练异常检测模型的输出。S340: Input the training data set into the anomaly detection model to be trained to obtain an output of the anomaly detection model to be trained.
其中,作为一种方式,可以将正常音频信息和异常音频信息各自对应的多个第一音频特征输入待训练异常检测模型,得到待训练异常检测模型的输出。Wherein, as a method, the plurality of first audio features corresponding to the normal audio information and the abnormal audio information may be input into the abnormality detection model to be trained to obtain the output of the abnormality detection model to be trained.
S350:基于所述输出、所述第一损失函数、所述第二损失函数和第三损失函数对所述待训练异常检测模型进行调整，得到收敛的待训练异常检测模型，其中，所述第三损失函数为第三音频特征与第四音频特征的差的绝对值，所述第三音频特征为所述第一音频特征对应的第三编码器输出结果，所述第四音频特征为所述第二音频特征对应的第三编码器输出结果。S350: Adjust the anomaly detection model to be trained based on the output, the first loss function, the second loss function and a third loss function to obtain a converged anomaly detection model to be trained, where the third loss function is the absolute value of the difference between a third audio feature and a fourth audio feature, the third audio feature is the third encoder output corresponding to the first audio feature, and the fourth audio feature is the third encoder output corresponding to the second audio feature.
其中，在本申请实施例中，第三损失函数可以用于最小化第一音频特征对应的判别器网络输出特征与第二音频特征对应的判别器网络输出特征之间的距离，以便待训练异常检测模型学习到可以欺骗判别器网络(Discriminator,D)的特征，即使得判别器网络无法确认第二音频特征是否为生成的特征。作为一种方式，第三损失函数可以为第三音频特征与第四音频特征的差的绝对值，第三损失函数的计算公式如下：In the embodiments of the present application, the third loss function can be used to minimize the distance between the discriminator network output feature corresponding to the first audio feature and the discriminator network output feature corresponding to the second audio feature, so that the anomaly detection model to be trained learns features that can fool the discriminator network (Discriminator, D), i.e., so that the discriminator network cannot tell whether the second audio feature is a generated feature. As one approach, the third loss function can be the absolute value of the difference between the third audio feature and the fourth audio feature, and is calculated as follows:
Loss_d=‖D(x)-D(G(x))‖Loss_d=‖D(x)-D(G(x))‖
其中，D(x)可以表示第三音频特征，第三音频特征可以为将第一音频特征输入判别器网络所得到的输出结果；D(G(x))可以表示第四音频特征，第四音频特征可以为将第二音频特征输入判别器网络所得到的输出结果。D(x) can represent the third audio feature, which may be the output obtained by inputting the first audio feature into the discriminator network; D(G(x)) can represent the fourth audio feature, which may be the output obtained by inputting the second audio feature into the discriminator network.
作为一种方式，可以将第一损失函数、第二损失函数和第三损失函数的加权和作为待训练异常检测模型的损失函数，通过待训练异常检测模型的输出和待训练异常检测模型的损失函数对待训练生成器网络进行训练，以将收敛的待训练生成器网络作为初始异常检测模型。待训练生成器网络的损失函数的计算公式如下：As one approach, the weighted sum of the first loss function, the second loss function and the third loss function can be used as the loss function of the anomaly detection model to be trained, and the generator network to be trained is trained with the output of the anomaly detection model to be trained and this loss function, so that the converged generator network to be trained serves as the initial anomaly detection model. The loss function is calculated as follows:
Loss=xLoss_g1+yLoss_g2+zLoss_dLoss=xLoss_g 1 +yLoss_g 2 +zLoss_d
其中,x、y、z的和为1,x、y、z的值可以是基于经验进行设置的,也可以是作为待训练异常检测模型的可训练参数经训练得到的。The sum of x, y, and z is 1, and the values of x, y, and z may be set based on experience, or may be obtained through training as trainable parameters of the anomaly detection model to be trained.
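The combined objective above, Loss = x·Loss_g1 + y·Loss_g2 + z·Loss_d with x + y + z = 1, can be sketched numerically as follows; interpreting ‖·‖ as an L1 norm and the particular weight values are both assumptions, since the embodiment leaves them configurable:

```python
def l1_norm(a, b):
    # ‖a - b‖ interpreted as the L1 distance between equal-size vectors.
    return sum(abs(p - q) for p, q in zip(a, b))

def total_loss(z1, z2, x_feat, g_x, d_x, d_gx, w=(0.4, 0.4, 0.2)):
    # Loss = x*Loss_g1 + y*Loss_g2 + z*Loss_d, with x + y + z = 1.
    # d_x  = D(x):    third-encoder output for the first audio feature
    # d_gx = D(G(x)): third-encoder output for the second audio feature
    w_x, w_y, w_z = w
    assert abs(w_x + w_y + w_z - 1.0) < 1e-9
    return (w_x * l1_norm(z1, z2)
            + w_y * l1_norm(x_feat, g_x)
            + w_z * l1_norm(d_x, d_gx))

loss = total_loss(z1=[0.0], z2=[0.5], x_feat=[1.0], g_x=[0.75],
                  d_x=[0.25], d_gx=[0.5])
```

With these toy values the three terms are 0.5, 0.25 and 0.25, giving a weighted total of 0.35.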
S360:将所述收敛的待训练异常检测模型中的生成器网络作为目标异常检测模型。S360: Use the generator network in the converged anomaly detection model to be trained as a target anomaly detection model.
本实施例提供的一种模型生成方法，通过上述方式使得，在通过正常音频信息和异常音频信息的第一音频特征训练得到目标异常检测模型后，在对待检测设备进行异常检测的过程中，可以将待检测设备的第一音频特征输入目标异常检测模型对待检测设备进行异常检测，节省了人力，提高了异常检测效率。并且，通过判别器网络对初始异常检测模型进行调整，可以使得目标异常检测模型在异常音频训练数据缺乏的情况下也可以有较好的性能，即在对正常音频和异常音频的判别上可以有较高的准确率。并且，在本实施例中，通过判别器网络判别待训练异常检测模型中的生成器网络的音频特征的真假(第一音频特征的标签为真，第二音频特征的标签为假)，可以提升生成器网络对正常音频特征的获取能力，从而增大正常音频和异常音频各自对应的第一音频特征在通过生成器网络进行特征提取后的差异性，进一步地提升了待训练异常检测模型中的生成器网络的性能，也就是目标异常检测模型的性能。With the model generation method provided by this embodiment, after the target anomaly detection model is obtained by training on the first audio features of normal and abnormal audio information, the first audio feature of a device to be detected can be input into the target anomaly detection model to perform anomaly detection on that device, which saves manpower and improves anomaly detection efficiency. Moreover, adjusting the initial anomaly detection model through the discriminator network enables the target anomaly detection model to perform well even when abnormal audio training data is scarce, that is, to achieve high accuracy in discriminating normal audio from abnormal audio. Furthermore, in this embodiment, using the discriminator network to judge whether the audio features produced by the generator network in the anomaly detection model to be trained are real or fake (the first audio feature is labeled real, and the second audio feature is labeled fake) can improve the generator network's ability to capture normal audio features, thereby enlarging the difference between the first audio features corresponding to normal audio and abnormal audio after feature extraction by the generator network, which further improves the performance of the generator network in the anomaly detection model to be trained, that is, the performance of the target anomaly detection model.
请参阅图9,本申请提供的一种异常检测方法,应用于电子设备,所述方法包括:Please refer to FIG. 9, an anomaly detection method provided by this application is applied to an electronic device, and the method includes:
S410:获取待检测音频。S410: Acquire the audio to be detected.
其中，待检测音频可以为电力运行设备(发电机、电动机、变压器等)在运行时发出的声音，这个声音可以是电力设备因自身的内部构造或硬件条件而发出的。作为一种方式，可以周期性地通过音频采集设备获取待检测音频，从而使得可以对电力运行设备进行实时检测，以便当电力运行设备出现异常时可以及时发现并维护，避免安全隐患的发生。示例性的，可以每2s通过音频采集设备获取一次待检测音频。The audio to be detected may be the sound emitted by operating power equipment (generators, motors, transformers, etc.) during operation; this sound may be produced by the power equipment due to its own internal structure or hardware condition. As one approach, the audio to be detected can be acquired periodically through an audio acquisition device, so that the operating power equipment can be monitored in real time and any abnormality can be discovered and handled promptly, avoiding safety hazards. Exemplarily, the audio to be detected may be acquired through the audio acquisition device every 2 s.
S420:将所述待检测音频进行分帧、加窗、快速傅里叶变换，以得到所述待检测音频对应的第一音频特征，所述第一音频特征为所述待检测音频对应的语谱图。S420: Perform framing, windowing and fast Fourier transform on the audio to be detected to obtain a first audio feature corresponding to the audio to be detected, where the first audio feature is the spectrogram corresponding to the audio to be detected.
在本申请实施例中，语谱图可以为二维图像，语谱图的尺寸可以与快速傅里叶变换的点数和待检测音频分帧加窗后的帧数有关。其中，语谱图的第一维度尺寸可以通过公式：快速傅里叶变换点数/2+1得到，1可以表示直流分量；语谱图的第二维度尺寸可以通过公式：(采样率×音频时长-采样率×帧长)/(采样率×帧移)+1得到，其中，帧长和帧移的单位为s。示例性的，当采样率为16kHZ、帧长为25ms、帧移为10ms、快速傅里叶变换点数为512时，时长为2s的待检测音频可以得到257×198的语谱图。In the embodiments of the present application, the spectrogram may be a two-dimensional image whose size is related to the number of fast Fourier transform points and the number of frames after framing and windowing of the audio to be detected. The first dimension of the spectrogram is given by the formula: number of FFT points / 2 + 1, where the 1 represents the DC component; the second dimension is given by the formula: (sampling rate × audio duration - sampling rate × frame length) / (sampling rate × frame shift) + 1, where frame length and frame shift are in seconds. Exemplarily, with a sampling rate of 16 kHz, a frame length of 25 ms, a frame shift of 10 ms, and 512 FFT points, a 2 s audio clip to be detected yields a 257×198 spectrogram.
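The two size formulas above can be checked directly; a minimal sketch follows, where truncating the frame-count division to an integer is an assumption, since the text gives only the formula and the 257×198 result:

```python
def spectrogram_shape(sample_rate, duration_s, frame_len_s, frame_shift_s, n_fft):
    # First dimension: n_fft / 2 + 1 (the +1 keeps the DC component).
    dim1 = n_fft // 2 + 1
    # Second dimension:
    # (rate*duration - rate*frame_len) / (rate*frame_shift) + 1,
    # with the division truncated to an integer frame count.
    n_samples = round(sample_rate * duration_s)
    frame_len = round(sample_rate * frame_len_s)
    frame_shift = round(sample_rate * frame_shift_s)
    dim2 = (n_samples - frame_len) // frame_shift + 1
    return dim1, dim2

# The worked example: 16 kHz sampling, 2 s clip, 25 ms frames, 10 ms shift, 512-point FFT.
shape = spectrogram_shape(16000, 2, 0.025, 0.010, 512)
```

This reproduces the 257×198 spectrogram size given in the example.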
S430:将所述第一音频特征输入目标异常检测模型中,获取所述目标异常检测模型输出的检测结果。S430: Input the first audio feature into a target abnormality detection model, and obtain a detection result output by the target abnormality detection model.
其中，作为一种方式，可以将待检测音频对应的语谱图输入目标异常检测模型中，该目标异常检测模型可以输出待检测音频是否为异常音频，若为异常音频，则表明待检测音频所对应的电力运行设备出现异常，需要对该电力设备进行故障排查；若为正常音频，则表明待检测音频所对应的电力运行设备处于正常工作状态。As one approach, the spectrogram corresponding to the audio to be detected can be input into the target anomaly detection model, which outputs whether the audio to be detected is abnormal. If the audio is abnormal, it indicates that the power equipment corresponding to the audio is malfunctioning and needs troubleshooting; if the audio is normal, it indicates that the power equipment corresponding to the audio is in a normal working state.
本实施例提供的一种异常检测方法，通过上述方式使得，在对待检测设备进行异常检测的过程中，可以将待检测设备的第一音频特征输入目标异常检测模型对待检测设备进行异常检测，节省了人力，提高了异常检测效率。With the anomaly detection method provided by this embodiment, in the process of performing anomaly detection on a device to be detected, the first audio feature of the device can be input into the target anomaly detection model to perform anomaly detection on the device, which saves manpower and improves anomaly detection efficiency.
请参阅图10,本申请提供的一种模型生成装置600,运行于电子设备,所述装置600包括:Please refer to FIG. 10 , a model generation apparatus 600 provided by the present application runs on an electronic device, and the apparatus 600 includes:
数据集获取单元610，用于获取训练数据集，所述训练数据集包括目标设备的多个音频信息各自的第一音频特征，所述多个音频信息包括正常音频信息和异常音频信息。The data set acquisition unit 610 is configured to acquire a training data set, where the training data set includes respective first audio features of multiple pieces of audio information of a target device, and the multiple pieces of audio information include normal audio information and abnormal audio information.
初始异常检测模型获取单元620，用于通过所述训练数据集对待训练生成器网络进行训练，以将收敛的待训练生成器网络作为初始异常检测模型。The initial anomaly detection model acquisition unit 620 is configured to train the generator network to be trained with the training data set, so as to use the converged generator network to be trained as the initial anomaly detection model.
目标异常检测模型获取单元630，用于通过判别器网络对所述初始异常检测模型进行调整，以将调整后的生成器网络作为目标异常检测模型。The target anomaly detection model acquisition unit 630 is configured to adjust the initial anomaly detection model through the discriminator network, so as to use the adjusted generator network as the target anomaly detection model.
其中，作为一种方式，数据集获取单元610具体用于获取目标设备的多个音频信息；对所述音频信息进行分帧、加窗、快速傅里叶变换，以得到所述音频信息对应的语谱图；将所述语谱图作为所述音频信息的第一音频特征。As one approach, the data set acquisition unit 610 is specifically configured to acquire multiple pieces of audio information of the target device; perform framing, windowing and fast Fourier transform on the audio information to obtain the spectrogram corresponding to the audio information; and use the spectrogram as the first audio feature of the audio information.
作为一种方式，所述生成器网络包括第一音频特征重构网络和特征提取网络，其中，所述第一音频特征重构网络包括第一编码器、LSTM以及解码器，所述特征提取网络包括第二编码器，初始异常检测模型获取单元620具体用于将所述训练数据集输入所述待训练生成器网络，得到所述待训练生成器网络的输出；基于所述输出、第一损失函数和第二损失函数对所述待训练生成器网络进行训练，以得到初始异常检测模型，其中，所述第一损失函数为所述第一编码器输出结果和所述第二编码器输出结果的差的绝对值，所述第二损失函数为所述第一音频特征与第二音频特征的差的绝对值，所述第二音频特征为所述解码器的输出结果。As one approach, the generator network includes a first audio feature reconstruction network and a feature extraction network, where the first audio feature reconstruction network includes the first encoder, the LSTM and the decoder, and the feature extraction network includes the second encoder. The initial anomaly detection model acquisition unit 620 is specifically configured to input the training data set into the generator network to be trained to obtain its output, and to train the generator network to be trained based on the output, the first loss function and the second loss function to obtain the initial anomaly detection model, where the first loss function is the absolute value of the difference between the output of the first encoder and the output of the second encoder, the second loss function is the absolute value of the difference between the first audio feature and the second audio feature, and the second audio feature is the output of the decoder.
作为一种方式，所述判别器网络包括第三编码器，目标异常检测模型获取单元630具体用于获取待训练异常检测模型，所述待训练异常检测模型包括所述初始异常检测模型和所述第三编码器；将所述训练数据集输入所述待训练异常检测模型，得到所述待训练异常检测模型的输出；基于所述输出、所述第一损失函数、所述第二损失函数和第三损失函数对所述待训练异常检测模型进行调整，得到收敛的待训练异常检测模型，其中，所述第三损失函数为第三音频特征与第四音频特征的差的绝对值，所述第三音频特征为所述第一音频特征对应的第三编码器输出结果，所述第四音频特征为所述第二音频特征对应的第三编码器输出结果；将所述收敛的待训练异常检测模型中的生成器网络作为目标异常检测模型。As one approach, the discriminator network includes the third encoder, and the target anomaly detection model acquisition unit 630 is specifically configured to: acquire an anomaly detection model to be trained, which includes the initial anomaly detection model and the third encoder; input the training data set into the anomaly detection model to be trained to obtain its output; adjust the anomaly detection model to be trained based on the output, the first loss function, the second loss function and the third loss function to obtain a converged anomaly detection model to be trained, where the third loss function is the absolute value of the difference between the third audio feature and the fourth audio feature, the third audio feature is the third encoder output corresponding to the first audio feature, and the fourth audio feature is the third encoder output corresponding to the second audio feature; and use the generator network in the converged anomaly detection model to be trained as the target anomaly detection model.
可选的,所述第一编码器、第二编码器和第三编码器结构相同。Optionally, the structures of the first encoder, the second encoder and the third encoder are the same.
请参阅图11,本申请提供的一种异常检测装置800,运行于电子设备,所述装置800包括:Please refer to FIG. 11 , an abnormality detection apparatus 800 provided by the present application operates on an electronic device, and the apparatus 800 includes:
检测音频获取单元810，用于获取待检测音频。The detection audio acquisition unit 810 is configured to acquire the audio to be detected.
第一音频特征获取单元820，用于将所述待检测音频进行分帧、加窗、快速傅里叶变换，以得到所述待检测音频对应的第一音频特征，所述第一音频特征为所述待检测音频对应的语谱图。The first audio feature acquisition unit 820 is configured to perform framing, windowing and fast Fourier transform on the audio to be detected to obtain a first audio feature corresponding to the audio to be detected, where the first audio feature is the spectrogram corresponding to the audio to be detected.
检测结果获取单元830，用于将所述第一音频特征输入目标异常检测模型中，获取所述目标异常检测模型输出的检测结果。The detection result acquisition unit 830 is configured to input the first audio feature into the target anomaly detection model and acquire the detection result output by the target anomaly detection model.
其中，作为一种方式，检测音频获取单元810具体用于周期性地获取待检测音频。As one approach, the detection audio acquisition unit 810 is specifically configured to periodically acquire the audio to be detected.
下面将结合图12对本申请提供的一种电子设备进行说明。An electronic device provided by the present application will be described below with reference to FIG. 12 .
请参阅图12，基于上述的模型生成方法、异常检测方法、装置，本申请实施例还提供的另一种可以执行前述模型生成方法、异常检测方法的电子设备100。电子设备100包括相互耦合的一个或多个(图中仅示出一个)处理器102、存储器104。其中，该存储器104中存储有可以执行前述实施例中内容的程序，而处理器102可以执行该存储器104中存储的程序。Referring to FIG. 12, based on the above-mentioned model generation method, anomaly detection method and apparatus, an embodiment of the present application further provides another electronic device 100 that can execute the foregoing model generation method and anomaly detection method. The electronic device 100 includes one or more (only one is shown in the figure) processors 102 and a memory 104 coupled to each other, where the memory 104 stores a program that can execute the content of the foregoing embodiments, and the processor 102 can execute the program stored in the memory 104.
其中，处理器102可以包括一个或者多个处理核。处理器102利用各种接口和线路连接整个电子设备100内的各个部分，通过运行或执行存储在存储器104内的指令、程序、代码集或指令集，以及调用存储在存储器104内的数据，执行电子设备100的各种功能和处理数据。可选地，处理器102可以采用数字信号处理(Digital Signal Processing,DSP)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)、可编程逻辑阵列(Programmable Logic Array,PLA)中的至少一种硬件形式来实现。处理器102可集成中央处理器(Central Processing Unit,CPU)、图像处理器(Graphics Processing Unit,GPU)和调制解调器等中的一种或几种的组合。其中，CPU主要处理操作系统、用户界面和应用程序等；调制解调器用于处理无线通信。可以理解的是，上述调制解调器也可以不集成到处理器102中，单独通过一块通信芯片进行实现。The processor 102 may include one or more processing cores. The processor 102 uses various interfaces and lines to connect all parts of the electronic device 100, and performs the various functions of the electronic device 100 and processes data by running or executing instructions, programs, code sets or instruction sets stored in the memory 104 and calling data stored in the memory 104. Optionally, the processor 102 may be implemented in at least one hardware form of digital signal processing (DSP), field-programmable gate array (FPGA), or programmable logic array (PLA). The processor 102 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like, where the CPU mainly handles the operating system, user interface, applications, etc., and the modem handles wireless communication. It can be understood that the modem may also not be integrated into the processor 102 and may instead be implemented separately by a communication chip.
存储器104可以包括随机存储器(Random Access Memory,RAM)，也可以包括只读存储器(Read-Only Memory)。存储器104可用于存储指令、程序、代码、代码集或指令集。存储器104可包括存储程序区和存储数据区，其中，存储程序区可存储用于实现操作系统的指令、用于实现至少一个功能的指令(比如触控功能、声音播放功能、图像播放功能等)、用于实现下述各个方法实施例的指令等。存储数据区还可以存储终端100在使用中所创建的数据(比如电话本、音视频数据、聊天记录数据)等。The memory 104 may include random access memory (RAM) or read-only memory (ROM). The memory 104 may be used to store instructions, programs, code, code sets or instruction sets. The memory 104 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing the operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, an image playback function, etc.), instructions for implementing the following method embodiments, and the like. The data storage area may also store data created by the terminal 100 during use (such as a phone book, audio and video data, chat record data), and the like.
请参考图13,其示出了本申请实施例提供的一种计算机可读存储介质的结构框图。该计算机可读存储介质1000中存储有程序代码,所述程序代码可被处理器调用执行上述方法实施例中所描述的方法。Please refer to FIG. 13 , which shows a structural block diagram of a computer-readable storage medium provided by an embodiment of the present application. Program codes are stored in the computer-readable storage medium 1000, and the program codes can be invoked by a processor to execute the methods described in the above method embodiments.
计算机可读存储介质1000可以是诸如闪存、EEPROM(电可擦除可编程只读存储器)、EPROM、硬盘或者ROM之类的电子存储器。可选地，计算机可读存储介质1000包括非易失性计算机可读存储介质(non-transitory computer-readable storage medium)。计算机可读存储介质1000具有执行上述方法中的任何方法步骤的程序代码1010的存储空间。这些程序代码可以从一个或者多个计算机程序产品中读出或者写入到这一个或者多个计算机程序产品中。程序代码1010可以例如以适当形式进行压缩。The computer-readable storage medium 1000 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or ROM. Optionally, the computer-readable storage medium 1000 includes a non-transitory computer-readable storage medium. The computer-readable storage medium 1000 has storage space for program code 1010 that performs any of the method steps described above. The program code can be read from or written to one or more computer program products. The program code 1010 may, for example, be compressed in an appropriate form.
综上所述，本申请提供的一种模型生成方法、异常检测方法、装置以及电子设备，在获取包括正常音频信息和异常音频信息的第一音频特征的训练数据集后，通过该训练数据集对待训练生成器网络进行训练，以将收敛的待训练生成器网络作为初始异常检测模型，再通过判别器网络对初始异常检测模型进行调整，以将调整后的生成器网络作为目标异常检测模型。通过上述方式使得，在通过正常音频信息和异常音频信息的第一音频特征训练得到目标异常检测模型后，在对待检测设备进行异常检测的过程中，可以将待检测设备的第一音频特征输入目标异常检测模型对待检测设备进行异常检测，节省了人力，提高了异常检测效率。并且，通过判别器网络对初始异常检测模型进行调整，可以使得目标异常检测模型在异常音频训练数据缺乏的情况下也可以有较好的性能，即在对正常音频和异常音频的判别上可以有较高的准确率。To sum up, in the model generation method, anomaly detection method, apparatus and electronic device provided by the present application, after a training data set including first audio features of normal audio information and abnormal audio information is acquired, the generator network to be trained is trained with the training data set, the converged generator network to be trained is used as the initial anomaly detection model, and the initial anomaly detection model is then adjusted through the discriminator network so that the adjusted generator network serves as the target anomaly detection model. In this way, after the target anomaly detection model is obtained by training on the first audio features of normal and abnormal audio information, the first audio feature of a device to be detected can be input into the target anomaly detection model to perform anomaly detection on that device, which saves manpower and improves anomaly detection efficiency. Moreover, adjusting the initial anomaly detection model through the discriminator network enables the target anomaly detection model to perform well even when abnormal audio training data is scarce, that is, to achieve high accuracy in discriminating normal audio from abnormal audio.
最后应说明的是：以上实施例仅用以说明本申请的技术方案，而非对其限制；尽管参照前述实施例对本申请进行了详细的说明，本领域的普通技术人员当理解：其依然可以对前述各实施例所记载的技术方案进行修改，或者对其中部分技术特征进行等同替换；而这些修改或者替换，并不驱使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围。Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application rather than to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements for some of the technical features therein, and these modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.
Claims (11)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111666960.8A CN114400019B (en) | 2021-12-31 | 2021-12-31 | Model generation method, anomaly detection method, device and electronic device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111666960.8A CN114400019B (en) | 2021-12-31 | 2021-12-31 | Model generation method, anomaly detection method, device and electronic device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114400019A true CN114400019A (en) | 2022-04-26 |
CN114400019B CN114400019B (en) | 2025-07-15 |
Family
ID=81229740
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111666960.8A Active CN114400019B (en) | 2021-12-31 | 2021-12-31 | Model generation method, anomaly detection method, device and electronic device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114400019B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109461458A (en) * | 2018-10-26 | 2019-03-12 | 合肥工业大学 | A kind of audio method for detecting abnormality based on generation confrontation network |
CN110781433A (en) * | 2019-10-11 | 2020-02-11 | 腾讯科技(深圳)有限公司 | Data type determination method and device, storage medium and electronic device |
CN110792563A (en) * | 2019-11-04 | 2020-02-14 | 北京天泽智云科技有限公司 | Wind turbine generator blade fault audio monitoring method based on convolution generation countermeasure network |
CN112183581A (en) * | 2020-09-07 | 2021-01-05 | 华南理工大学 | Semi-supervised mechanical fault diagnosis method based on self-adaptive migration neural network |
CN113554040A (en) * | 2021-09-07 | 2021-10-26 | 西安交通大学 | An image description method, apparatus and device based on conditional generative adversarial network |
Non-Patent Citations (5)
Title |
---|
HAOQIANG LIU et al.: "LSTM-GAN-AE: A Promising Approach for Fault Diagnosis in Machine Health Monitoring", IEEE Transactions on Instrumentation and Measurement, 20 December 2021 * |
WENQIAN JIANG et al.: "A GAN-Based Anomaly Detection Approach for Imbalanced Industrial Time Series", IEEE Access, 30 September 2019, pages 143610-143612 * |
XINYING GUO et al.: "An Anomaly Detection Model for ADS-B Systems Based on Improved GAN and LSTM Networks", 2021 IEEE 21st International Conference on Communication Technology (ICCT), 16 October 2021 * |
ZIJIAN NIU et al.: "LSTM-Based VAE-GAN for Time-Series Anomaly Detection", Sensors, 3 July 2020 * |
XUE Yingjie et al.: "Anomalous sound detection based on a generative adversarial one-class network", Journal of Jilin University (Science Edition), 26 November 2021 * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114399005A (en) * | 2022-03-10 | 2022-04-26 | 深圳市声扬科技有限公司 | Training method, device, equipment and storage medium of living body detection model |
CN114399005B (en) * | 2022-03-10 | 2022-07-12 | 深圳市声扬科技有限公司 | Training method, device, equipment and storage medium of living body detection model |
CN115424013A (en) * | 2022-07-13 | 2022-12-02 | 平安科技(深圳)有限公司 | Model training method, image processing apparatus, and medium |
CN115424013B (en) * | 2022-07-13 | 2025-06-24 | 平安科技(深圳)有限公司 | Model training methods, image processing methods and equipment, and media |
CN115426282A (en) * | 2022-07-29 | 2022-12-02 | 苏州浪潮智能科技有限公司 | Voltage abnormality detection method, system, electronic device, and storage medium |
CN115426282B (en) * | 2022-07-29 | 2023-08-18 | 苏州浪潮智能科技有限公司 | Voltage abnormality detection method, system, electronic device and storage medium |
CN115288994A (en) * | 2022-08-03 | 2022-11-04 | 西安安森智能仪器股份有限公司 | Compressor abnormal state detection method based on improved DCGAN |
CN115288994B (en) * | 2022-08-03 | 2024-01-19 | 西安安森智能仪器股份有限公司 | Improved DCGAN-based compressor abnormal state detection method |
CN115565525A (en) * | 2022-12-06 | 2023-01-03 | 四川大学华西医院 | Audio anomaly detection method and device, electronic equipment and storage medium |
CN117951606A (en) * | 2024-03-27 | 2024-04-30 | 国网山东省电力公司梁山县供电公司 | Power equipment fault diagnosis method, system, equipment and storage medium |
CN117951606B (en) * | 2024-03-27 | 2024-07-19 | 国网山东省电力公司梁山县供电公司 | Power equipment fault diagnosis method, system, equipment and storage medium |
CN120071968A (en) * | 2025-04-22 | 2025-05-30 | 中国电建集团成都勘测设计研究院有限公司 | Wind turbine blade fault detection method and system based on audio potential representation |
Also Published As
Publication number | Publication date |
---|---|
CN114400019B (en) | 2025-07-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114400019A (en) | Model generation method, abnormality detection device, and electronic apparatus | |
Jiang et al. | Bearing fault classification based on convolutional neural network in noise environment | |
US20230123117A1 (en) | Method and Apparatus for Inspecting Wind Turbine Blade, And Device And Storage Medium Thereof | |
CN107895571A (en) | Lossless audio file identification method and device | |
CN113345399A (en) | Method for monitoring sound of machine equipment in strong noise environment | |
CN117334218A (en) | A method and system for evaluating the operating status of dry-type transformers | |
CN115270887A (en) | Abnormity identification method and device | |
TWI778437B (en) | Defect-detecting device and defect-detecting method for an audio device | |
CN114004996A (en) | Abnormal sound detection method, abnormal sound detection device, electronic equipment and medium | |
CN108881652A (en) | Echo detection method, storage medium and electronic equipment | |
CN112382293A (en) | Intelligent voice interaction method and system for power Internet of things | |
CN115376526A (en) | A power equipment fault detection method and system based on voiceprint recognition | |
CN119293536A (en) | Real-time vibration data acquisition and analysis method and related system | |
CN112560674A (en) | Method and system for detecting quality of sound signal | |
CN117990368A (en) | Voiceprint signal detection method and device of rolling bearing, medium and electronic equipment | |
CN117809695A (en) | Transformer voiceprint anomaly detection method | |
CN117711433A (en) | Voiceprint anomaly detection method, device and storage medium based on deep U-shaped network | |
CN114255373B (en) | Sequence anomaly detection method, device, electronic equipment and readable medium | |
CN118538239A (en) | Fault detection method, system, equipment and storage medium for wind turbine generator | |
CN116517746A (en) | Water turbine monitoring method and device | |
CN115647933B (en) | Spindle runout abnormality detection method, device and storage medium | |
CN113495974B (en) | Sound classification processing method, device, equipment and medium | |
Shi et al. | A partial discharge acoustic detection method of the switchgear based on wavelet denoising | |
Saharom et al. | Comparative Analysis of MFCC and Mel-Spectrogram Features in Pump Fault Detection Using Autoencoder | |
CN118841036A (en) | Fault identification method and medium of high-voltage direct-current converter valve and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |