CN116741204A

CN116741204A - An abnormal sound detection method based on hierarchical metadata information constraints

Info

Publication number: CN116741204A
Application number: CN202310768780.3A
Authority: CN
Inventors: 兰海燕; 关键; 魏玉明; 杨凯; 康金敏; 单俊
Original assignee: Harbin Engineering University
Current assignee: Harbin Engineering University
Priority date: 2023-06-28
Filing date: 2023-06-28
Publication date: 2023-09-12

Abstract

The invention discloses an abnormal sound detection method based on hierarchical metadata information constraint, which comprises the steps of converting an audio waveform of sound to be detected into Log-Mel spectral frequency characteristics, and then inputting the Log-Mel spectral frequency characteristics into a pre-trained characteristic extractor to obtain advanced audio characteristicsComputing advanced audio featuresEach attribute group center c corresponding to a machine ID of a sound to be detected _m Selecting the minimum value as an anomaly score A, wherein M is the number of attribute groups under the corresponding machine ID, and judging that the sound to be detected is different when A is larger than a given threshold valueConstant sound; the attribute group center c _m The average value of the advanced audio characteristics obtained by a pre-trained characteristic extractor for the audio fragments of the training set; the invention designs a metadata information tree structure, fully utilizes metadata information to extract finer characteristics, effectively improves the performance of the abnormal sound detection system, and solves the problems of insufficient performance and low reliability of detection results of the existing industrial abnormal sound detection method under domain offset.

Description

An abnormal sound detection method based on hierarchical metadata information constraints

技术领域Technical field

本发明属于异音检测方法技术领域，涉及一种基于层次元数据信息约束的异常声音检测方法。The invention belongs to the technical field of abnormal sound detection methods, and relates to an abnormal sound detection method based on hierarchical metadata information constraints.

背景技术Background technique

异常声音检测(Anomalous SoundDetection，ASD)，目的是自动检测目标机器处于异常状况时发出的异常声音。Anomalous Sound Detection (ASD) aims to automatically detect abnormal sounds emitted by the target machine when it is in an abnormal condition.

随着深度学习技术在音频领域的不断发展，无监督异常声音检测中广泛采用自编码器架构。现有的方法通常会利用机械设备正常运转时的声音的对数梅尔频谱图(Log-MelSpectrogram)作为特征输入来训练自编码器模型，并通过输出与输入相同数量帧的对数梅尔频谱图作为重构特征来进行异常声音的检测。然而，在训练过程中自编码器只包含正常声音的约束条件，没有包含异常声音的信息，如果训练得到的特征不能很好地适用于异常声音，那么自编码器重构方法的有效性就会受到限制。With the continuous development of deep learning technology in the audio field, autoencoder architecture is widely used in unsupervised abnormal sound detection. Existing methods usually use the log-Mel Spectrogram of the sound when the mechanical equipment is operating normally as a feature input to train the autoencoder model, and output the log-Mel Spectrogram of the same number of frames as the input. The graph is used as reconstructed features to detect abnormal sounds. However, during the training process, the autoencoder only contains the constraints of normal sounds and does not contain information about abnormal sounds. If the features obtained by training are not well applicable to abnormal sounds, the effectiveness of the autoencoder reconstruction method will be reduced. restricted.

在神经网络的训练过程中，需要足够的标签数据来进行约束，但是获得异常数据本身在工业异常声音检测领域就是一个挑战。为解决这个问题，现有的自监督方法将无监督模型转化成有监督模型来更好地学习正常数据的紧凑表示。其中一个自监督分类的辅助任务通过训练一个分类器来预测每台机器的机器ID，并通过辨别伴随音频数据的机器ID作为标签来学习精细的正常声音特征，从而将它们与异常声音区分开来。如果分类器错误地分类了声音数据的机器ID，则将其视为异常。然而，由于现实世界存在域偏移问题，训练和测试数据的主要特征有时并不具有相似的分布，在实践中，异常声音检测的性能往往会受到限制。例如，机器操作条件或噪声类型的改变可能会导致源域(训练)和目标域(检测)之间的声学特性不同，因此使用来自源域声音训练的模型可能会错误地识别目标域中的异常声音。During the training process of neural networks, sufficient label data is needed for constraints, but obtaining abnormal data itself is a challenge in the field of industrial abnormal sound detection. To solve this problem, existing self-supervised methods convert unsupervised models into supervised models to better learn compact representations of normal data. One of the auxiliary tasks of self-supervised classification learns fine features of normal sounds by training a classifier to predict the machine ID of each machine and distinguishing them from abnormal sounds by discerning the machine IDs of accompanying audio data as labels. . If the classifier incorrectly classifies the machine ID of the sound data, it is considered an anomaly. However, due to the domain shift problem in the real world, the main features of training and test data sometimes do not have similar distributions, and the performance of abnormal sound detection is often limited in practice. For example, changes in machine operating conditions or noise types may cause acoustic properties to differ between the source domain (training) and the target domain (detection), so a model trained using sounds from the source domain may incorrectly identify anomalies in the target domain sound.

自监督分类方案采用机器ID作为音频文件的辅助标签，以进行特征学习，因为每个机器ID代表特定类型的域偏移。然而，除了机器类型和机器ID之外，声音还与各种属性相关联，例如机器的运行速度。因此，属性值的改变也是导致域偏移的原因之一，对于影响域偏移至关重要。仅使用机器ID可能不足以获得有助于表征域偏移的特征。而自监督机器属性分类考虑了工业机器属性对声学特性的影响，并将其作为自监督特征学习的辅助标签。然而，该系统并没有充分利用与音频文件本身相关联的元数据信息，因此提取的特征表示不足够精细。The self-supervised classification scheme adopts machine IDs as auxiliary labels of audio files for feature learning because each machine ID represents a specific type of domain offset. However, in addition to machine type and machine ID, sounds are also associated with various properties, such as how fast the machine is operating. Therefore, changes in attribute values are also one of the causes of domain drift and are crucial to affecting domain drift. Using machine ID alone may not be sufficient to obtain features that help characterize domain shifts. Self-supervised machine attribute classification considers the impact of industrial machine attributes on acoustic characteristics and uses it as an auxiliary label for self-supervised feature learning. However, this system does not make full use of the metadata information associated with the audio file itself, so the extracted feature representations are not refined enough.

发明内容Contents of the invention

针对上述现有技术，本发明要解决的技术问题是提供一种基于层次元数据信息约束的异常声音检测方法，解决机器的音频文件附带的元数据信息没有被充分利用，域偏移下自监督方法提取的特征表示不够精细的问题。In view of the above-mentioned existing technologies, the technical problem to be solved by the present invention is to provide an abnormal sound detection method based on hierarchical metadata information constraints, to solve the problem that the metadata information attached to the audio files of the machine is not fully utilized, and self-supervision under domain offset The feature representation extracted by this method is not precise enough.

为解决上述技术问题，本发明的一种基于层次元数据信息约束的异常声音检测方法，包括：In order to solve the above technical problems, the present invention provides an abnormal sound detection method based on hierarchical metadata information constraints, including:

将待检测声音的音频波形转换为Log-Mel谱频特征，然后输入至预先训练的特征提取器中，得到高级音频特征计算高级音频特征/>与待检测声音对应机器ID的每个属性组中心c_m的马氏距离，选取其中最小值作为异常分数A，/>M为对应机器ID下的属性组个数，当A大于给定阈值时，判定待检测声音为异常声音；Convert the audio waveform of the sound to be detected into Log-Mel spectral frequency features, and then input it into the pre-trained feature extractor to obtain advanced audio features Calculate advanced audio features/> The Mahalanobis distance of the center c _m of each attribute group corresponding to the machine ID of the sound to be detected is selected, and the minimum value is selected as the anomaly score A,/> M is the number of attribute groups under the corresponding machine ID. When A is greater than the given threshold, the sound to be detected is determined to be an abnormal sound;

所述属性组中心c_m为训练集音频片段经过预先训练的特征提取器得到的高级音频特征的平均值；The attribute group center c _m is the average value of the high-level audio features obtained by the pre-trained feature extractor of the training set audio clips;

所述特征提取器的训练过程包括：The training process of the feature extractor includes:

选取机器的一组正常声音片段作为训练集；Select a set of normal sound clips from the machine as a training set;

将每个机器ID对应的训练集音频片段中属性及属性的值均相同的音频片段划分为一个属性组，每个机器ID及对应的属性组构成层次元数据信息；Divide the audio clips with the same attributes and attribute values in the training set audio clips corresponding to each machine ID into an attribute group, and each machine ID and corresponding attribute group constitute hierarchical metadata information;

将训练集音频波形转换为Log-Mel谱频特征并送入特征提取器中，得到音频的低级特征f_l和高级特征f_h；Convert the audio waveform of the training set into Log-Mel spectral frequency features and send them to the feature extractor to obtain the low-level features f _l and high-level features f _h of the audio;

将低级特征f_l和高级特征f_h分别送入机器ID分类器C_ID和机器AG分类器C_AG中，分别得到机器ID分类器C_ID对机器ID辅助标签的预测值和机器AG分类器C_AG对机器AG辅助标签的预测值/>C_ID(·)表示机器ID分类器，C_AG(·)表示机器属性分类器；The low-level features f _l and high-level features f _h are respectively sent to the machine ID classifier C _ID and the machine AG classifier C _AG to obtain the predicted values of the machine ID auxiliary label by the machine ID classifier C _ID respectively. and the predicted value of machine AG auxiliary label by machine AG classifier C _AG /> C _ID (·) represents the machine ID classifier, C _AG (·) represents the machine attribute classifier;

利用总交叉熵损失函数L_Total训练特征提取器，得到训练后的特征提取器，L_Total＝λL_ID+(1-λ)L_AG，λ是设定的权重参数，L_ID表示预测值和层次元数据信息中机器ID标签l_ID之间的差异值的损失函数，L_AG表示预测值/>和层次元数据信息中机器属性组标签l_AG的差异值的损失函数。Use the total cross-entropy loss function L _Total to train the feature extractor to obtain the trained feature extractor, L _Total = λL _ID + (1-λ)L _AG , λ is the set weight parameter, and L _ID represents the predicted value and the loss function of the difference value between the machine ID tag l _ID in the hierarchical metadata information, L _AG represents the predicted value/> and the loss function of the difference value of the machine attribute group label l _AG in the hierarchical metadata information.

进一步的，所述特征提取器包括带有注意力机制的深度网络和二维卷积层，通过带有注意力机制的深度网络提取低级特征f_l，然后经过二维卷积层提取得到高级特征f_h。Further, the feature extractor includes a deep network with an attention mechanism and a two-dimensional convolution layer. The low-level features f _l are extracted through the deep network with the attention mechanism, and then the high-level features are extracted through the two-dimensional convolution layer. _fh .

进一步的，所述带有注意力机制的深度网络为MobileFaceNet。Furthermore, the deep network with attention mechanism is MobileFaceNet.

本发明的有益效果：Beneficial effects of the present invention:

1)针对现有的自监督方法对元数据信息利用不充分的问题，本发明设计了元数据信息树结构，充分利用元数据信息提取更精细的特征；1) In view of the problem that existing self-supervision methods do not fully utilize metadata information, the present invention designs a metadata information tree structure to fully utilize metadata information to extract more refined features;

2)本发明设计的层次元数据信息约束方法能够有效地提升异音检测系统的性能，解决现有工业异音检测方法在域偏移下性能不足，检测结果可信度低的问题。2) The hierarchical metadata information constraint method designed by the present invention can effectively improve the performance of the abnormal sound detection system and solve the problem of insufficient performance of existing industrial abnormal sound detection methods under domain offset and low credibility of detection results.

附图说明Description of drawings

图1是本发明总体技术路线图；Figure 1 is the overall technical roadmap of the present invention;

图2是本发明中构建的层次元数据信息结构图；Figure 2 is a hierarchical metadata information structure diagram constructed in the present invention;

图3是基于MobileFaceNet构建的骨干网络结构。Figure 3 is the backbone network structure built based on MobileFaceNet.

具体实施方式Detailed ways

下面结合说明书附图和实施例对本发明做进一步说明。The present invention will be further described below in conjunction with the accompanying drawings and examples.

本发明一种基于层次元数据信息约束的异常声音检测方法，借助构建的每个机器的元数据信息树结构获取机器ID与属性之间的层次关系，本发明的技术方案是这样实现的：The present invention is an abnormal sound detection method based on hierarchical metadata information constraints, and obtains the hierarchical relationship between machine IDs and attributes with the help of the constructed metadata information tree structure of each machine. The technical solution of the present invention is implemented as follows:

步骤1：提取数据集中机器音频的Log-Mel谱频特征，同时利用机器ID和属性之间的层次关系，构建层次元数据信息结构；Step 1: Extract the Log-Mel spectrum characteristics of the machine audio in the data set, and use the hierarchical relationship between machine IDs and attributes to build a hierarchical metadata information structure;

步骤2：将Log-Mel谱频特征送入特征提取器中，得到低级和高级特征表示。用得到的特征表示和层次元数据信息结构训练模型；Step 2: Feed the Log-Mel spectral frequency features into the feature extractor to obtain low-level and high-level feature representations. Use the obtained feature representation and hierarchical metadata information structure to train the model;

步骤3：用训练好的模型对测试声音使用基于属性组中心的异常检测，判断声音是否异常。Step 3: Use the trained model to use anomaly detection based on the attribute group center on the test sound to determine whether the sound is abnormal.

进一步的，所述步骤1具体包含：Further, the step 1 specifically includes:

步骤101：使用快速傅里叶变换和梅尔滤波器组将音频波形转换为Log-Mel谱频特征；Step 101: Use fast Fourier transform and Mel filter bank to convert the audio waveform into Log-Mel spectral frequency characteristics;

步骤102：利用音频片段所对应的不同属性及其相应的值构建属性组(AttributeGroup，AG)；Step 102: Construct an attribute group (AttributeGroup, AG) using different attributes corresponding to the audio clips and their corresponding values;

步骤103：用机器ID和步骤102得到的属性组构建层次元数据信息结构，得到机器ID辅助标签l_ID和机器AG辅助标签l_AG，所述机器ID和机器属性层次关系用于学习与域偏移相关的特征。Step 103: Use the machine ID and the attribute group obtained in step 102 to construct a hierarchical metadata information structure, and obtain the machine ID auxiliary label l _ID and the machine AG auxiliary label l _AG . The hierarchical relationship between the machine ID and machine attributes is used for learning and domain bias. transfer-related features.

进一步的，所述步骤2具体包括：Further, the step 2 specifically includes:

步骤201：将步骤101得到的Log-Mel谱频特征送入特征提取器中，得到音频的低级和高级特征表示；Step 201: Send the Log-Mel spectral frequency features obtained in step 101 to the feature extractor to obtain low-level and high-level feature representations of the audio;

步骤202：将步骤201得到的低级和高级特征表示分别送入机器ID分类器C_ID和机器AG分类器C_AG中，分别得到机器ID分类器C_ID对机器ID辅助标签的预测值和机器AG分类器C_AG对机器AG辅助标签的预测值，其公式如下：和/> Step 202: Send the low-level and high-level feature representations obtained in step 201 to the machine ID classifier C _ID and machine AG classifier C _AG respectively to obtain the predicted values of the machine ID auxiliary label and machine AG by the machine ID classifier C _ID respectively. The predicted value of classifier C _AG for machine AG auxiliary label, its formula is as follows: and/>

步骤203：将步骤202得到的机器ID和机器AG的预测值分别送入训练总交叉熵损失函数L_Total，以训练基于HMIC的异常声音检测模型，公式如下：和/> Step 203: Send the predicted values of the machine ID and machine AG obtained in step 202 to the training total cross-entropy loss function L _Total to train the HMIC-based abnormal sound detection model. The formula is as follows: and/>

进一步的，所述步骤3具体包括：Further, the step 3 specifically includes:

步骤301：在训练过程中计算正常声音的每个AG的中心点，即属性组中心(Attribute Group Center，AGC)，AG的中心是模型在每个属性组中学到的音频特征的平均值，以检测存在域偏移的测试数据的异常，计算公式如下： Step 301: Calculate the center point of each AG of the normal sound during the training process, that is, the Attribute Group Center (AGC). The center of the AG is the average value of the audio features learned by the model in each attribute group. To detect anomalies in test data with domain offset, the calculation formula is as follows:

其中，c_m表示第m个属性组中心，表示从第n个训练音频片段的模型中得到的高级音频特征，n∈[1,N]。Among them, c _m represents the center of the m-th attribute group, Represents the high-level audio features obtained from the model of the nth training audio clip, n∈[1,N].

步骤302：将测试数据的特征表示与AGC的马氏距离作为异常分数，以衡量测试数据的音频特征表示与每个属性组中心c_m之间的相似度，马氏距离是一种考虑了相关性和协方差矩阵的距离度量方式，可用于评估未知样本与训练数据之间的相似度。测试样本到AGC的马氏距离越近表示该未知声音越可能属于正常数据，到AGC的马氏距离越远表示该未知声音越可能为异常数据。计算公式如下：/> Step 302: Use the Mahalanobis distance between the feature representation of the test data and the AGC as an anomaly score to measure the audio feature representation of the test data The similarity between the center c _m of each attribute group and the Mahalanobis distance is a distance measure that takes into account the correlation and covariance matrix and can be used to evaluate the similarity between unknown samples and training data. The closer the Mahalanobis distance between the test sample and the AGC is, the more likely the unknown sound is to be normal data. The further the Mahalanobis distance to the AGC is, the more likely the unknown sound is to be abnormal data. The calculation formula is as follows:/>

其中，A表示异常分数，Σ^-1是协方差矩阵∑的逆矩阵，并且∑由第m个属性组相同的节段下的所有音频片段的特征获得。Among them, A represents the anomaly score, Σ ^-1 is the inverse matrix of the covariance matrix Σ, and Σ is obtained from the features of all audio segments under the segment with the same m-th attribute group.

下面结合具体参数给出实施例。Examples are given below in conjunction with specific parameters.

结合图1，本发明提供了一种基于层次元数据信息约束的异常声音检测方法，具体包括如下步骤：Combined with Figure 1, the present invention provides an abnormal sound detection method based on hierarchical metadata information constraints, which specifically includes the following steps:

步骤1：step 1:

利用快速傅里叶变换和梅尔滤波器组从原始声音信号中提取Log-Mel谱频特征，其中帧大小设置为1024，帧的跳跃大小为512，梅尔滤波器组的数量为128。对于长达10秒的音频，至少需要生成313帧的对数梅尔频谱图特征，因此，输入的对数梅尔频谱图特征维度为313×128。Fast Fourier transform and Mel filter bank are used to extract Log-Mel spectral features from the original sound signal, where the frame size is set to 1024, the frame jump size is 512, and the number of Mel filter banks is 128. For 10 seconds of audio, at least 313 frames of log-mel spectrogram features need to be generated, so the input log-mel spectrogram feature dimension is 313×128.

之后构建层次元数据信息结构。图2是本发明中构建的层次元数据信息结构图，进一步利用机器ID和属性之间的层次关系。由于每个机器ID下的音频片段可能具有某些不同值的属性，因此将这些属性和相应的值一起分组为该机器ID下的一个AG，从而可以得到每个机器ID具有不同属性值的多个AG，得到机器ID辅助标签l_ID和机器AG辅助标签l_AG。构建了每个机器类型的元数据信息树结构，其中机器ID为信息树的节点，AG为信息树的叶。这种层次关系用作自监督学习中的约束以获得更精细的音频特征表示，其中机器ID表征低级特征学习的域偏移类型，属性组利用域偏移的声学特性用于高级特征学习。A hierarchical metadata information structure is then constructed. Figure 2 is a hierarchical metadata information structure diagram constructed in the present invention, further utilizing the hierarchical relationship between machine IDs and attributes. Since the audio clips under each machine ID may have some attributes with different values, these attributes and corresponding values are grouped together into an AG under the machine ID, so that multiple attributes with different attribute values for each machine ID can be obtained. AG, get the machine ID auxiliary label l _ID and the machine AG auxiliary label l _AG . The metadata information tree structure of each machine type is constructed, where the machine ID is the node of the information tree and AG is the leaf of the information tree. This hierarchical relationship is used as a constraint in self-supervised learning to obtain a more refined audio feature representation, where the machine ID characterizes the domain shift type for low-level feature learning, and the attribute group utilizes the acoustic properties of the domain shift for high-level feature learning.

以机器类型为玩具车为例，机器ID为00的玩具车包含4个不同属性值的属性，其中Car表示汽车型号，Spd表示汽车速度大小，Mic表示采集麦克风数量，Noise表示环境噪声等级，例如Car的值为A1、C2等，而Noise的值为1、2等。通过对这些属性及相应值进行分组，获得机器ID为00的玩具车的AG数量为11个，以及机器类型玩具车的全部机器ID的AG数量44个。因此使用针对所有机器类型的42个机器ID(即七种机器类型，每个具有六个机器ID)下的总共250个AG来构建机器ID与属性之间的层次关系。Taking the machine type as a toy car as an example, the toy car with machine ID 00 contains 4 attributes with different attribute values, where Car represents the car model, Spd represents the speed of the car, Mic represents the number of collecting microphones, and Noise represents the environmental noise level, for example Car's values are A1, C2, etc., while Noise's values are 1, 2, etc. By grouping these attributes and corresponding values, the number of AGs for the toy car with machine ID 00 is 11, and the number of AGs for all machine IDs of the machine type toy car is 44. A total of 250 AGs under 42 machine IDs for all machine types (i.e., seven machine types, each with six machine IDs) are therefore used to build the hierarchical relationship between machine IDs and attributes.

步骤2：Step 2:

将步骤1得到的Log-Mel谱频特征送入特征提取器中得到低级特征表示f_l和高级特征表示f_h：Send the Log-Mel spectrum and frequency features obtained in step 1 to the feature extractor to obtain low-level feature representation f _l and high-level feature representation f _h :

f_l＝F(X)f _l =F(X)

低级特征表示f_l通过骨干网络中的二维卷积层提取高级特征表示f_h：The low-level feature representation f _l extracts the high-level feature representation f _h through the two-dimensional convolutional layer in the backbone network:

f_h＝Conv2D(f_l)f _h =Conv2D(f _l )

其中，F(·)表示骨干网络中的特征提取器，Conv2D(·)表示骨干网络中的2D卷积层。图3展示了基于MobileFaceNet构建的骨干网络结构，所述骨干网络借助特征提取器模块和2D卷积层实现。此外，骨干网络的结构并不局限于上述结构，可替换为带有注意力机制的深度网络层。Among them, F(·) represents the feature extractor in the backbone network, and Conv2D(·) represents the 2D convolution layer in the backbone network. Figure 3 shows the backbone network structure built based on MobileFaceNet, which is implemented with the help of feature extractor modules and 2D convolutional layers. In addition, the structure of the backbone network is not limited to the above structure and can be replaced by a deep network layer with an attention mechanism.

步骤1得到的机器ID和机器属性层次关系用于学习与域偏移相关的特征，其借助低层次的机器ID约束低级特征学习，其借助高层次的机器AG约束高级特征学习：The hierarchical relationship between machine ID and machine attributes obtained in step 1 is used to learn features related to domain offset. It uses low-level machine ID to constrain low-level feature learning, and it uses high-level machine AG to constrain high-level feature learning:

步骤2得到的低级和高级特征表示分别送入机器ID分类器C_ID和机器AG分类器C_AG中，分别得到机器ID分类器C_ID对机器ID辅助标签的预测值和机器AG分类器C_AG对机器AG辅助标签的预测值：The low-level and high-level feature representations obtained in step 2 are respectively sent to the machine ID classifier C _ID and the machine AG classifier C _AG to obtain the predicted values of the machine ID auxiliary label by the machine ID classifier C _ID and the machine AG classifier C _AG respectively. Predicted value for machine AG-assisted tags:

步骤2得到的机器ID和机器AG的预测值分别送入训练总交叉熵损失函数L_Total，以训练基于HMIC的异常声音检测模型：The predicted values of machine ID and machine AG obtained in step 2 are respectively sent to the training total cross-entropy loss function L _Total to train the abnormal sound detection model based on HMIC:

L_Total＝λL_ID+(1-λ)L_AG；L _Total =λL _ID +(1-λ)L _AG ;

其中，λ是在训练期间经验性地选择的权重参数，权重参数λ针对每种机器类型进行调整，对于机器类型的所有机器ID都是相同的。其中，L_ID和L_ID分别为：where λ is a weight parameter chosen empirically during training. The weight parameter λ is adjusted for each machine type and is the same for all machine IDs of the machine type. Among them, L _ID and L _ID are:

其中，CE(·)表示交叉熵(Cross-Entropy，CE)损失函数Among them, CE(·) represents the cross-entropy (Cross-Entropy, CE) loss function

步骤3：Step 3:

计算每个属性组学习到的音频特征的平均值作为AGC来评估测试声音，AGC包含与域偏移相关的正常声音的声学特性，因此可以用来测量存在域偏移的测试样本的异常。异常分数为检测到的声音的音频特征表示与AGC的马氏距离。The average value of the audio features learned for each attribute group is calculated as the AGC to evaluate the test sound. The AGC contains the acoustic characteristics of normal sounds related to the domain shift, so it can be used to measure the anomalies of the test samples with domain shift. The anomaly score is the Mahalanobis distance between the audio feature representation of the detected sound and the AGC.

假设第m个属性组下有N个训练音频片段，标签为其中m∈[1,M]，M为对应机器ID下的属性组个数。Assume that there are N training audio clips under the m-th attribute group, and the labels are Where m∈[1,M], M is the number of attribute groups under the corresponding machine ID.

马氏距离来测量检测到的声音的音频特征表示与每个属性组中心c_m之间的相似度：Mahalanobis distance to measure audio feature representation of detected sounds Similarity to the center c _m of each attribute group:

其中，A表示异常分数，Σ^-1是协方差矩阵Σ的逆矩阵，并且Σ由第m个属性组相同的节段下的所有音频片段的特征获得。Among them, A represents the anomaly score, Σ ^-1 is the inverse matrix of the covariance matrix Σ, and Σ is obtained from the features of all audio segments under the same segment of the m-th attribute group.

本发明提供基于层次元数据信息约束的异常声音检测方法有力解决了现有异音检测方法在域偏移下性能不足的缺陷，表1和表2展示了传统方法与应用本发明提供策略后方法异音检测性能的AUC值标对比，借助于常用的异常检测评价指标AUC来反映源域和目标域中整体测试性能，表3展示了传统方法与应用本发明提供策略后方法异音检测性能的部分AUC(pAUC)指标对比，借助pAUC来说明异音检测方法在低误报率下整体测试性能，从而反映方法的实用性。The present invention provides an abnormal sound detection method based on hierarchical metadata information constraints, which effectively solves the defect of insufficient performance of existing abnormal sound detection methods under domain offset. Table 1 and Table 2 show the traditional methods and applications. The present invention provides a post-strategy method. AUC value scale comparison of abnormal sound detection performance uses the commonly used abnormality detection evaluation index AUC to reflect the overall test performance in the source domain and target domain. Table 3 shows the abnormal sound detection performance of the traditional method and the method after applying the strategy provided by the present invention. Partial AUC (pAUC) index comparison uses pAUC to illustrate the overall test performance of the noise detection method under low false alarm rate, thus reflecting the practicality of the method.

本发明提供的基于层次元数据信息约束的异常声音检测方法远超现有传统方法的异常检测性能和低误报率下的异常检测性能，取得更为出色的多个领域AUC表现和更强的pAUC表现，有力地说明了本发明策略的能在域偏移下出色完成异常声音检测，带来更良好的性能表现。The abnormal sound detection method based on hierarchical metadata information constraints provided by the present invention far exceeds the abnormality detection performance of existing traditional methods and the abnormality detection performance under low false alarm rate, and achieves better AUC performance in multiple fields and stronger The pAUC performance strongly illustrates that the strategy of the present invention can excellently complete abnormal sound detection under domain offset, bringing better performance.

表1Table 1

表2Table 2

表3table 3

Claims

1. An abnormal sound detection method based on hierarchical metadata information constraint is characterized by comprising the following steps:

the audio waveform of the sound to be detected is converted into Log-Mel spectral frequency characteristics, and then the Log-Mel spectral frequency characteristics are input into a pre-trained characteristic extractor to obtain advanced audio characteristicsComputing advanced audio features->Each attribute group center c corresponding to a machine ID of a sound to be detected _m Selecting the minimum value as the abnormality score A, < >>M is the number of attribute groups under the corresponding machine ID, and when A is greater than a given threshold value, the sound to be detected is judged to be abnormal sound;

the attribute group center c _m For training set audio clip channelAn average value of the advanced audio features obtained by the feature extractor trained in advance;

the training process of the feature extractor comprises the following steps:

selecting a group of normal sound clips of the machine as a training set;

dividing the audio clips with the same attribute and attribute value in the training set audio clips corresponding to each machine ID into an attribute group, wherein each machine ID and the corresponding attribute group form hierarchical metadata information;

converting the training set audio waveform into Log-Mel spectral frequency characteristics and sending the characteristics to a characteristic extractor to obtain low-level characteristics f of audio _l And advanced feature f _h ；

Will lower the level of features f _l And advanced feature f _h Respectively sent into a machine ID classifier C _ID And machine AG classifier C _AG Respectively obtain the machine ID classifier C _ID Predictive value for machine ID assisted tagsAnd machine AG classifier C _AG Predictive value for machine AG auxiliary tag +.>C _ID (. Cndot.) represents a machine ID classifier, C _AG (. Cndot.) represents a machine attribute classifier;

using a total cross entropy loss function L _Total Training the feature extractor to obtain a trained feature extractor, L _Total ＝λL _ID +(1-λ)L _AG Lambda is a set weight parameter, L _ID Representing predicted valuesAnd machine ID tag/in hierarchical metadata information _ID A loss function of the difference value between L _AG Representing predicted value +.>And hierarchical metadataMachine attribute group label in information _AG A loss function of the difference value of (2).

2. The abnormal sound detection method based on hierarchical metadata information constraint of claim 1, wherein: the feature extractor comprises a depth network with an attention mechanism and a two-dimensional convolution layer, and extracts low-level features f through the depth network with the attention mechanism _l Then extracting by a two-dimensional convolution layer to obtain an advanced feature f _h 。

3. The abnormal sound detection method based on hierarchical metadata information constraint of claim 1, wherein: the deep network with the attention mechanism is MobileFaceNet.