
CN115035912B - Automatic annotation method of underwater acoustic signal samples based on MOC model


Info

Publication number
CN115035912B
CN115035912B (application CN202210644380.7A)
Authority
CN
China
Prior art keywords
layer
feature
convolution
output
preferred
Prior art date
Legal status
Active
Application number
CN202210644380.7A
Other languages
Chinese (zh)
Other versions
CN115035912A (en)
Inventor
王红滨
张帅
张政超
何鸣
王勇
周连科
孙彧
王念滨
Current Assignee
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date
Filing date
Publication date
Application filed by Harbin Engineering University
Priority to CN202210644380.7A
Publication of CN115035912A
Application granted
Publication of CN115035912B
Legal status: Active

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27: characterised by the analysis technique
    • G10L25/30: using neural networks
    • G10L25/03: characterised by the type of extracted parameters
    • G10L25/18: the extracted parameters being spectral information of each sub-band
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/30: Assessment of water resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Complex Calculations (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

The invention discloses an automatic labeling method for underwater acoustic signal samples based on the MOC model. It addresses the problems of traditional manual labeling of underwater acoustic signal samples, which is time-consuming and labor-intensive, yields low economic benefit, is limited by annotator expertise, and achieves low labeling accuracy. The method comprises: collecting underwater acoustic signals as samples and computing their acoustic features with an acoustic model; establishing the MOC model, which comprises, in sequence, convolution layer one, a preferred convolution residual layer, convolution layer two, an attention mechanism layer, a fully connected layer, and a classification layer, inputting the acoustic features of the samples into the MOC model for training, and outputting labeled samples until the loss converges, yielding a trained MOC model; and applying these steps to the underwater acoustic signal samples to be labeled to obtain labeled samples. The invention belongs to the field of underwater acoustic signal labeling.

Description

Automatic annotation method for underwater acoustic signal samples based on the MOC model

Technical Field

The invention relates to a labeling method, in particular to an automatic labeling method for underwater acoustic signal samples based on the MOC model, and belongs to the field of underwater acoustic signal labeling.

Background Art

With the widespread adoption of deep learning and reinforcement learning, applying them to pattern-recognition tasks on underwater acoustic signals has become generally accepted, which raises a practical problem: how to obtain labeled underwater acoustic signal samples. Underwater acoustic data sets are scarce, small, and often inaccurate, and this seriously hampers progress in the underwater acoustics field. Research shows that the root cause is the slow development of underwater acoustic signal labeling. Traditionally, underwater acoustic signal samples are labeled manually, which is time-consuming and labor-intensive, yields low economic benefit, and, being limited by the annotators' expertise, often fails to reach the required labeling accuracy.

Summary of the Invention

To solve the problems of the traditional manual labeling of underwater acoustic signal samples, which is time-consuming and labor-intensive, has low economic benefit, is limited by annotator expertise, and has low labeling accuracy, the present invention proposes an automatic labeling method for underwater acoustic signal samples based on the MOC model.

The technical solution adopted by the present invention is as follows.

It comprises the following steps:

S1. Collect underwater acoustic signals as samples, and compute the acoustic features of the underwater acoustic signal samples using an acoustic model;

S2. Establish the MOC model, which comprises, in sequence, convolution layer one, a preferred convolution residual layer, convolution layer two, an attention mechanism layer, a fully connected layer, and a classification layer; input the acoustic features of the underwater acoustic signal samples into the MOC model for training and output labeled underwater acoustic signal samples until the loss converges, yielding a trained MOC model;

S3. Apply S1-S2 to the underwater acoustic signal samples to be labeled, obtaining labeled underwater acoustic signal samples.

Preferably, the acoustic model in S1 includes a Gaussian mixture model and a hidden Markov model.

Preferably, the MOC model established in S2 comprises, in sequence, convolution layer one, the preferred convolution residual layer, convolution layer two, the attention mechanism layer, the fully connected layer, and the classification layer; the acoustic features of the underwater acoustic signal samples are input into the MOC model for training, and labeled underwater acoustic signal samples are output until the loss converges, yielding a trained MOC model. The specific process is:

S21. Input the acoustic features of the underwater acoustic signal sample into convolution layer one of the MOC model and output feature one;

S22. Input feature one output by S21 into the preferred convolution residual layer and output feature two;

S23. Input feature two output by S22 into convolution layer two and output feature three;

S24. Input feature three output by S23 into the attention mechanism layer and output feature four;

S25. Input feature four output by S24 into the fully connected layer and output feature five;

S26. Input feature five output by S25 into the classification layer and output the labeled underwater acoustic signal samples.

Preferably, the preferred convolution residual layer in S22 comprises, in sequence, a first convolution layer, a residual layer, a second convolution layer, and an attention-mechanism preferred convolution layer.

Preferably, in S22, feature one output by S21 is input into the preferred convolution residual layer and feature two is output. The specific process is:

S221. Input feature one output by S21 into the first convolution layer and output feature β;

S222. Input feature β output by S221 into the residual layer and output feature γ;

S223. Input feature γ output by S222 into the second convolution layer and output feature δ;

S224. Input feature δ output by S223 into the attention-mechanism preferred convolution layer and output feature ξ;

S225. Multiply feature ξ output by S224 with feature one output by S21 to obtain feature two.

Preferably, the attention-mechanism preferred convolution layer in S224 comprises, in sequence, a global average pooling layer (GAP), a one-dimensional convolution layer, preferred convolution layer one, preferred convolution layer two, and preferred convolution layer three.

Preferably, in S224, feature δ output by S223 is input into the attention-mechanism preferred convolution layer and feature ξ is output. The specific process is:

S2241. Input feature δ output by S223 into the global average pooling layer and output feature a;

S2242. Input feature a output by S2241 into the one-dimensional convolution layer and output feature b;

S2243. Input feature b output by S2242 into preferred convolution layer one and output feature c;

S2244. Input feature c output by S2243 into preferred convolution layer two and output feature d;

S2245. Input feature c output by S2243 and feature d output by S2244 into preferred convolution layer three and output feature e;

S2246. Aggregate feature c output by S2243, feature d output by S2244, and feature e output by S2245 to obtain an aggregated feature; process the aggregated feature with Sigmoid and output feature f;

S2247. Add feature f output by S2246 to feature δ output by S223 to obtain feature ξ.

Preferably, in S2243, feature b output by S2242 is input into preferred convolution layer one and feature c is output. The specific process is:

Ⅰ. Set A convolution kernels of different sizes in preferred convolution layer one, and set the maximum kernel size of preferred convolution layer one according to the dimension of the acoustic features of the underwater acoustic signal samples in S1:

k = |log₂(C)|_odd    (1)

where C is the dimension of the acoustic features of the underwater acoustic signal sample, the acoustic features being a Mel spectrum, and k is the convolution kernel size, an odd integer; |·|_odd denotes the nearest odd integer.

Ⅱ. Select the optimal convolution kernel among the A convolution kernels with a kernel-selection algorithm. The specific process is:

Compute the outputs of the A convolution kernels, obtain the two kernels with the highest similarity using a similarity algorithm, and take the larger of the two kernels as the optimal kernel of preferred convolution layer one, yielding preferred convolution layer one with its optimal kernel;

Ⅲ. Input feature b output by S2242 into preferred convolution layer one with the optimal kernel to obtain feature c.

Preferably, in S2244, feature c output by S2243 is input into preferred convolution layer two and feature d is output. The specific process is:

The number of kernels, the maximum kernel size, and the optimal kernel of preferred convolution layer two are the same as those of preferred convolution layer one; feature c obtained in Ⅲ is input into preferred convolution layer two, and feature d is output.

Preferably, in S2245, feature c output by S2243 and feature d output by S2244 are input into preferred convolution layer three and feature e is output. The specific process is:

The number of kernels, the maximum kernel size, and the optimal kernel of preferred convolution layer three are the same as those of preferred convolution layer one; feature c obtained in Ⅲ and feature d output by S2244 are input into preferred convolution layer three, and feature e is output.

Beneficial effects:

The present invention first uses an existing acoustic model to compute the acoustic features of the underwater acoustic signal samples, preparing for the subsequent extraction of feature embeddings; it then builds the MOC model using preferred convolution, an attention mechanism, and multi-layer feature fusion. The MOC model comprises, in sequence, convolution layer one, the preferred convolution residual layer, convolution layer two, the attention mechanism layer, the fully connected layer, and the classification layer. The preferred convolution residual layer comprises, in sequence, a first convolution layer, a residual layer, a second convolution layer, and an attention-mechanism preferred convolution layer; the attention-mechanism preferred convolution layer comprises, in sequence, a global average pooling layer (GAP), a one-dimensional convolution layer, preferred convolution layer one, preferred convolution layer two, and preferred convolution layer three. Feature embeddings of the acoustic features are extracted with the MOC model, making the extracted embeddings more comprehensive and representative; finally, the MOC model's classifier (the Softmax function) classifies and labels the underwater acoustic signal samples, determining whether the acoustic signal in each sample is a normal signal or a noise signal.

According to the acoustic features of the underwater acoustic signal samples, the present invention uses the MOC model to extract feature embeddings and then automatically classifies and labels the embeddings with the Softmax classifier, replacing the traditional manual labeling method. This saves time and labor, improves economic benefit, is not limited by annotator expertise, and improves labeling accuracy.

Brief Description of the Drawings

Figure 1 is a structural diagram of the MOC model;

Figure 2 is a structural diagram of the preferred convolution residual layer OCA-Res2Block;

Figure 3 is a structural diagram of the attention-mechanism preferred convolution layer OCA-Block.

Detailed Description of the Embodiments

Embodiment 1: This embodiment is described with reference to Figures 1-3. The automatic labeling method for underwater acoustic signal samples based on the MOC model described in this embodiment comprises the following steps:

S1. Collect underwater acoustic signals as samples, and compute the acoustic features of the underwater acoustic signal samples using an acoustic model.

Underwater sound signals (underwater acoustic signals for short) are taken as samples, and their acoustic features are extracted with a traditional acoustic model, in preparation for the subsequent extraction of feature embeddings. The underwater acoustic signals include the sound signals produced by ships under way; the acoustic models include the Gaussian mixture model and the hidden Markov model.
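The patent gives no reference implementation for this step; the sketch below is a minimal illustration, assuming the librosa library, with sample rate, FFT size, and hop length as illustrative parameters (only the 80 Mel bins come from the text, see S2243):

```python
# Hedged sketch: compute an 80-dim log-Mel spectrogram as the acoustic feature.
# librosa and all window/hop parameters are assumptions, not from the patent.
import librosa
import numpy as np

def mel_features(wav_path: str, sr: int = 16000, n_mels: int = 80) -> np.ndarray:
    """Return an (n_mels, frames) log-Mel spectrogram for one recording."""
    signal, _ = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=signal, sr=sr, n_fft=512, hop_length=160, n_mels=n_mels)
    return librosa.power_to_db(mel)  # log compression stabilizes training
```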

S2. Establish the MOC model, which comprises, in sequence, convolution layer one, the preferred convolution residual layer, convolution layer two, the attention mechanism layer, the fully connected layer, and the classification layer; input the acoustic features of the underwater acoustic signal samples into the MOC model for training and output labeled underwater acoustic signal samples until the loss converges, yielding a trained MOC model. The specific process is:

The MOC model, also called the multi-layer preferred convolution network model, combines multi-layer feature fusion with an attention mechanism. Its structure is shown in Figure 1, where Conv1D denotes a one-dimensional convolution; BN denotes batch normalization; ReLU is the activation function; FC denotes the fully connected layer; C and T are the input dimensions; k is the convolution kernel size; d is the dilation rate of the dilated convolution; and S is the number of classes.

S21. Input the acoustic features of the underwater acoustic signal sample into convolution layer one of the MOC model and output feature one. The specific process is:

In convolution layer one, the acoustic features of the underwater acoustic signal sample are processed with an activation function, which introduces non-linearity and improves the expressive power of the MOC model; the processed features are then batch-normalized, yielding the low-dimensional, non-linear feature produced by this first convolution (feature one). Batch normalization constrains the processed features to a bounded range and improves the generalization ability of the MOC model to some extent. The kernel size of convolution layer one is 5 and the dilation rate of its dilated convolution is 1.
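For orientation, here is a hedged PyTorch skeleton of the Figure 1 pipeline. The kernel sizes and dilation rates (k=5, d=1 and k=3, d=2) come from the text; the channel width of 512 and the simplified stand-ins for the OCA-Res2Block and pooling stages are assumptions, with detailed sketches following in later steps:

```python
# Hedged skeleton of the MOC pipeline in Figure 1 (PyTorch assumed).
import torch
import torch.nn as nn

class MOC(nn.Module):
    def __init__(self, C: int = 80, channels: int = 512, num_classes: int = 2):
        super().__init__()
        self.conv1 = nn.Sequential(                      # convolution layer one: k=5, d=1
            nn.Conv1d(C, channels, kernel_size=5, dilation=1, padding=2),
            nn.ReLU(), nn.BatchNorm1d(channels))
        self.oca_res2 = nn.Sequential(                   # stand-in for OCA-Res2Block: k=3, d=2
            nn.Conv1d(channels, channels, kernel_size=3, dilation=2, padding=2),
            nn.ReLU(), nn.BatchNorm1d(channels))
        self.conv2 = nn.Sequential(                      # convolution layer two
            nn.Conv1d(channels, channels, kernel_size=1), nn.ReLU())
        self.fc = nn.Linear(2 * channels, num_classes)   # fully connected layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, C, T)
        h = self.conv2(self.oca_res2(self.conv1(x)))
        stats = torch.cat([h.mean(dim=2), h.std(dim=2)], dim=1)  # pooling stand-in
        return torch.softmax(self.fc(stats), dim=1)      # classification layer

probs = MOC()(torch.randn(4, 80, 200))                   # -> (4, 2) class probabilities
```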

S22. Input feature one output by S21 into the preferred convolution residual layer and output feature two. The specific process is:

The preferred convolution residual layer (OCA-Res2Block) consists of a Res2Block layer and an attention-based preferred convolution layer (OCA-Block); it combines the attention mechanism with the residual idea. Its kernel size is 3 and the dilation rate of its dilated convolution is 2. The preferred convolution residual layer comprises, in sequence, a first convolution layer, a residual layer (Res2Conv1D), a second convolution layer, and the attention-mechanism preferred convolution layer (OCA-Block), as shown in Figure 2.
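The internal structure of Res2Conv1D is not spelled out in the text; the sketch below shows the usual Res2-style split-and-accumulate pattern with the k=3, d=2 parameters given above, while the channel count and number of splits (scale) are assumptions:

```python
# Hedged sketch of a Res2-style residual layer (Res2Conv1D), PyTorch assumed.
import torch
import torch.nn as nn

class Res2Conv1d(nn.Module):
    def __init__(self, channels: int = 512, scale: int = 8,
                 kernel_size: int = 3, dilation: int = 2):
        super().__init__()
        assert channels % scale == 0
        self.scale = scale
        width = channels // scale
        pad = dilation * (kernel_size - 1) // 2      # keeps the frame length
        self.convs = nn.ModuleList(
            nn.Conv1d(width, width, kernel_size, dilation=dilation, padding=pad)
            for _ in range(scale - 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        chunks = list(torch.chunk(x, self.scale, dim=1))
        out = [chunks[0]]                            # first split passes through
        y = None
        for i, conv in enumerate(self.convs, start=1):
            y = chunks[i] if y is None else chunks[i] + y   # hierarchical residual
            y = conv(y)
            out.append(y)
        return torch.cat(out, dim=1)                 # multi-granularity feature

y = Res2Conv1d()(torch.randn(2, 512, 200))           # shape preserved: (2, 512, 200)
```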

S221. Input feature one output by S21 into the first convolution layer and output feature β. The specific process is:

In the first convolution layer, the low-dimensional feature output by S21 (feature one) is processed with an activation function and then batch-normalized, yielding feature β. This convolution raises the dimension of the low-dimensional feature while reducing its parameter count.

S222. Input feature β output by S221 into the residual layer and output feature γ. The specific process is:

In the residual layer (Res2Conv1D), feature β output by S221 (a non-linear feature) is processed with an activation function and then batch-normalized, yielding feature γ with multiple granularities.

S223. Input feature γ output by S222 into the second convolution layer and output feature δ. The specific process is:

In the second convolution layer, the multi-granularity feature γ output by S222 is processed with an activation function and then batch-normalized, yielding the non-linear high-dimensional feature δ.

S224. Input feature δ output by S223 into the attention-mechanism preferred convolution layer and output feature ξ. The specific process is:

As described in the paper Squeeze-and-Excitation Networks, the SE-Block obtains per-channel weights through squeeze and excitation operations, letting the model judge autonomously which feature embeddings are important and which are not. However, the SE-Block can lose some important feature embeddings, degrading its performance. The present invention therefore replaces the first fully connected layer of the SE-Block with a one-dimensional convolution layer, which avoids the large parameter count of a fully connected layer, effectively reduces computation, speeds up the model, and shortens run time. Three preferred convolution layers (OC1d-Layer) are then appended after the one-dimensional convolution layer, giving the attention-mechanism preferred convolution layer (OCA-Block). That is, the attention-mechanism preferred convolution layer comprises, in sequence, a global average pooling layer (GAP), a one-dimensional convolution layer, preferred convolution layer one, preferred convolution layer two, and preferred convolution layer three, as shown in Figure 3, where k is the kernel size and n is odd.

Each preferred convolution layer (OC1d-Layer) contains several convolution kernels of different sizes. The maximum kernel size of each OC1d-Layer is set according to the dimension of the acoustic features produced by the acoustic model, and a kernel-selection algorithm chooses, in each OC1d-Layer, the kernel that best represents the underwater acoustic signal (the optimal kernel); every preferred convolution layer has the same total number of kernels, the same maximum kernel size, and the same optimal kernel. The preferred convolution operation then performs small-range cross-channel interaction over the acoustic features to obtain better, more representative feature embeddings. Since different OC1d-Layers yield different feature information about the underwater acoustic signal, multi-layer feature fusion is used to aggregate the outputs of the OC1d-Layers.

S2241. Input feature δ output by S223 into the global average pooling layer and output feature a;

S2242. Input feature a output by S2241 into the one-dimensional convolution layer and output feature b, which has fewer parameters; the kernel size of the one-dimensional convolution layer is 1.

S2243. Input feature b output by S2242 into preferred convolution layer one and output feature c. The specific process is:

Ⅰ. Set A convolution kernels of different sizes in preferred convolution layer one, and set the maximum kernel size of preferred convolution layer one according to the dimension of the acoustic features of the underwater acoustic signal samples in S1:

k = |log₂(C)|_odd    (1)

where C is the dimension of the acoustic features of the underwater acoustic signal sample, the acoustic features being a Mel spectrum, and k is the convolution kernel size, an odd integer; |·|_odd denotes the nearest odd integer.

Preferred convolution layer one contains A = 4 convolution kernels. The acoustic feature of the underwater acoustic signal is the Mel spectrum, whose frequency-domain dimension is 80, so by formula (1) the maximum kernel size is k = 7; the kernel sizes in preferred convolution layer one are therefore k = 1, 3, 5, 7.
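A worked check of this rule, under the nearest-odd-integer reading of formula (1) reconstructed above:

```python
# Worked check of formula (1) as reconstructed above: the maximum kernel size
# is taken as the odd integer nearest to log2(C). With C = 80 this gives 7.
import math

def max_kernel_size(C: int) -> int:
    x = math.log2(C)
    lower = int(x) if int(x) % 2 == 1 else int(x) - 1   # nearest odd below x
    return lower if x - lower <= (lower + 2) - x else lower + 2

print(max_kernel_size(80))   # log2(80) ≈ 6.32 -> 7, hence kernels k = 1, 3, 5, 7
```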

Ⅱ. Select the optimal convolution kernel among the A convolution kernels with the kernel-selection algorithm. The specific process is:

Compute the outputs of the A convolution kernels, obtain the two kernels with the highest similarity using a similarity algorithm, and take the larger of the two kernels as the optimal kernel of preferred convolution layer one, yielding preferred convolution layer one with its optimal kernel.

Concretely, for the kernels of different sizes, the outputs of the A kernels are computed, the similarity algorithm finds the two most similar kernels, and the larger of the two is selected as the kernel of preferred convolution layer one that best represents the underwater acoustic signal (the optimal kernel), because a larger kernel produces fewer data parameters after convolution. Each subsequent OC1d-Layer proceeds in the same way. The similarity algorithms used in the present invention are cosine similarity, Hamming distance, and the Spearman correlation coefficient.
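A hedged sketch of this selection step using cosine similarity, one of the three measures named above; the candidate sizes follow the k = 1, 3, 5, 7 example, and the randomly initialized kernels are purely illustrative:

```python
# Hedged sketch of kernel selection: run every candidate kernel, find the two
# most similar outputs, and keep the larger of that pair (PyTorch assumed).
import itertools
import torch
import torch.nn as nn
import torch.nn.functional as F

def select_kernel(b: torch.Tensor, sizes=(1, 3, 5, 7)) -> int:
    outs = {k: nn.Conv1d(1, 1, kernel_size=k, padding=k // 2)(b).flatten()
            for k in sizes}
    best_pair, best_sim = None, -2.0
    for k1, k2 in itertools.combinations(sizes, 2):
        sim = F.cosine_similarity(outs[k1], outs[k2], dim=0).item()
        if sim > best_sim:
            best_pair, best_sim = (k1, k2), sim
    return max(best_pair)    # larger kernel -> fewer data parameters downstream

b = torch.randn(1, 1, 80)    # feature b, shaped (batch, channel, Mel dimension)
print(select_kernel(b))      # e.g. 7, if kernels 5 and 7 give the most similar outputs
```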

Ⅲ. Input feature b output by S2242 into preferred convolution layer one with the optimal kernel to obtain feature c.

S2244. Input feature c output by S2243 into preferred convolution layer two and output feature d. The specific process is:

The number of kernels, the maximum kernel size, and the optimal kernel of preferred convolution layer two are the same as those of preferred convolution layer one; feature c obtained in Ⅲ is input into preferred convolution layer two, and feature d is output.

S2245. Input feature c output by S2243 and feature d output by S2244 into preferred convolution layer three and output the fused feature e. The specific process is:

The number of kernels, the maximum kernel size, and the optimal kernel of preferred convolution layer three are the same as those of preferred convolution layer one; feature c obtained in Ⅲ and feature d output by S2244 are input into preferred convolution layer three, and feature e is output.

S2246. Aggregate feature c output by S2243, feature d output by S2244, and feature e output by S2245 to obtain an aggregated feature; process the aggregated feature with Sigmoid and output feature f, which carries multiple characteristics.

S2247. Add feature f output by S2246 to feature δ output by S223 to obtain feature ξ.
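Pulling steps S2241-S2247 together, the sketch below shows one way to realize the OCA-Block. The feeding of the sum c + d into preferred convolution layer three and the pre-selected optimal kernel size k_opt = 7 are assumptions:

```python
# Hedged sketch of the OCA-Block data flow (S2241-S2247), PyTorch assumed.
import torch
import torch.nn as nn

class OCABlock(nn.Module):
    def __init__(self, k_opt: int = 7):
        super().__init__()
        pad = k_opt // 2
        self.conv1d = nn.Conv1d(1, 1, kernel_size=1)   # replaces SE-Block's first FC
        self.oc1 = nn.Conv1d(1, 1, kernel_size=k_opt, padding=pad)  # preferred layer one
        self.oc2 = nn.Conv1d(1, 1, kernel_size=k_opt, padding=pad)  # preferred layer two
        self.oc3 = nn.Conv1d(1, 1, kernel_size=k_opt, padding=pad)  # preferred layer three

    def forward(self, delta: torch.Tensor) -> torch.Tensor:   # delta: (B, C, T)
        a = delta.mean(dim=2, keepdim=True).transpose(1, 2)   # GAP -> (B, 1, C)
        b = self.conv1d(a)                    # feature b
        c = self.oc1(b)                       # feature c
        d = self.oc2(c)                       # feature d
        e = self.oc3(c + d)                   # feature e, fed with c and d
        f = torch.sigmoid(c + d + e)          # aggregate, then Sigmoid gate
        return delta + f.transpose(1, 2)      # feature xi = f + delta

xi = OCABlock()(torch.randn(2, 512, 200))     # -> (2, 512, 200)
```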

S225. Multiply feature ξ output by S224 with feature one output by S21 to obtain feature two. The specific process is:

Feature ξ output by S2247 is multiplied by feature one output by S21, yielding feature two.

S23. Input feature two output by S22 into convolution layer two for convolution and activation, and output feature three.

S24. Input feature three output by S23 into the attention mechanism layer and output feature four with its weights. The specific process is:

Feature three output by S23 is input into the attention mechanism layer; attention-statistics pooling and batch normalization within the layer compute the mean and standard deviation of the final frame-level feature (feature three), yielding feature four with different weights.
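A hedged sketch of the attention-statistics pooling in this step; the attention scoring network and channel width are assumptions:

```python
# Hedged sketch: attention weights over frames give a weighted mean and
# standard deviation of feature three (PyTorch assumed).
import torch
import torch.nn as nn

class AttentiveStatsPool(nn.Module):
    def __init__(self, channels: int = 512):
        super().__init__()
        self.attn = nn.Conv1d(channels, channels, kernel_size=1)  # frame scoring

    def forward(self, h: torch.Tensor) -> torch.Tensor:   # h: (B, C, T), feature three
        w = torch.softmax(self.attn(h), dim=2)             # per-frame attention weights
        mu = (w * h).sum(dim=2)                             # weighted mean
        var = (w * h * h).sum(dim=2) - mu * mu
        sigma = var.clamp(min=1e-8).sqrt()                  # weighted standard deviation
        return torch.cat([mu, sigma], dim=1)                # feature four: (B, 2C)

pooled = AttentiveStatsPool()(torch.randn(2, 512, 200))     # -> (2, 1024)
```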

S25. Input feature four output by S24 into the fully connected layer for full connection and batch normalization, and output feature five, which carries scores.

S26. Input feature five output by S25 into the classification layer and output the labeled underwater acoustic signal samples.

Feature five with its scores, output by S25, is input into the classification layer, and the classifier function (Softmax) classifies and labels it according to the attributes of the underwater acoustic signal samples. The attribute of a sample indicates whether its acoustic signal is a normal signal or a noise signal; that is, the annotated information is the attribute of the underwater acoustic signal sample.
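A hedged sketch of this final labeling step; the label names simply mirror the two attributes described above:

```python
# Hedged sketch: Softmax over the scored feature five, then an attribute label
# per sample (PyTorch assumed; scores are illustrative).
import torch

LABELS = ["normal signal", "noise signal"]   # the two attributes from the text

def annotate(feature_five: torch.Tensor) -> list:
    probs = torch.softmax(feature_five, dim=1)       # Softmax over class scores
    return [LABELS[i] for i in probs.argmax(dim=1)]  # one attribute per sample

scores = torch.tensor([[2.1, -0.7], [-1.3, 0.9]])    # illustrative scored feature five
print(annotate(scores))                              # ['normal signal', 'noise signal']
```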

The present invention uses the MOC model to extract feature embeddings of the acoustic features of the underwater acoustic signals; preferred convolution, the attention mechanism, and multi-layer feature fusion extract comprehensive, representative features, and the classifier finally classifies and labels the underwater acoustic signals, determining whether the acoustic signal in each sample is a normal signal or a noise signal.

S3. Apply S1-S2 to the underwater acoustic signals to be labeled, obtaining labeled underwater acoustic signal samples.

Claims (1)

1. The automatic labeling method for underwater acoustic signal samples based on the MOC model is characterized by comprising the following steps:
S1. collecting an underwater acoustic signal as a sample, and calculating the acoustic features of the underwater acoustic signal sample by using an acoustic model;
S2. establishing an MOC model, wherein the MOC model comprises, in sequence, convolution layer one, a preferred convolution residual layer, convolution layer two, an attention mechanism layer, a fully connected layer and a classification layer; inputting the acoustic features of the underwater acoustic signal samples into the MOC model for training, and outputting labeled underwater acoustic signal samples until the loss converges, to obtain a trained MOC model;
S3. applying S1-S2 to the underwater acoustic signal samples to be labeled to obtain labeled underwater acoustic signal samples;
wherein the acoustic model in S1 comprises a Gaussian mixture model and a hidden Markov model;
the MOC model established in S2 comprises, in sequence, convolution layer one, the preferred convolution residual layer, convolution layer two, the attention mechanism layer, the fully connected layer and the classification layer; the acoustic features of the underwater acoustic signal samples are input into the MOC model for training, and labeled underwater acoustic signal samples are output until the loss converges, to obtain the trained MOC model, the specific process being:
S21. inputting the acoustic features of the underwater acoustic signal sample into convolution layer one of the MOC model, and outputting feature one;
S22. inputting feature one output in S21 into the preferred convolution residual layer, and outputting feature two;
S23. inputting feature two output in S22 into convolution layer two, and outputting feature three;
S24. inputting feature three output in S23 into the attention mechanism layer, and outputting feature four;
S25. inputting feature four output in S24 into the fully connected layer, and outputting feature five;
S26. inputting feature five output in S25 into the classification layer, and outputting labeled underwater acoustic signal samples;
the preferred convolution residual layer in S22 comprises, in sequence, a first convolution layer, a residual layer, a second convolution layer and an attention-mechanism preferred convolution layer;
in S22, feature one output in S21 is input into the preferred convolution residual layer and feature two is output, the specific process being:
S221. inputting feature one output in S21 into the first convolution layer, and outputting feature β;
S222. inputting feature β output in S221 into the residual layer, and outputting feature γ;
S223. inputting feature γ output in S222 into the second convolution layer, and outputting feature δ;
S224. inputting feature δ output in S223 into the attention-mechanism preferred convolution layer, and outputting feature ξ;
S225. multiplying feature ξ output in S224 with feature one output in S21 to obtain feature two;
the attention-mechanism preferred convolution layer in S224 comprises, in sequence, a global average pooling layer (GAP), a one-dimensional convolution layer, preferred convolution layer one, preferred convolution layer two and preferred convolution layer three;
in S224, feature δ output in S223 is input into the attention-mechanism preferred convolution layer and feature ξ is output, the specific process being:
S2241. inputting feature δ output in S223 into the global average pooling layer, and outputting feature a;
S2242. inputting feature a output in S2241 into the one-dimensional convolution layer, and outputting feature b;
S2243. inputting feature b output in S2242 into preferred convolution layer one, and outputting feature c;
S2244. inputting feature c output in S2243 into preferred convolution layer two, and outputting feature d;
S2245. inputting feature c output in S2243 and feature d output in S2244 into preferred convolution layer three, and outputting feature e;
S2246. aggregating feature c output in S2243, feature d output in S2244 and feature e output in S2245 to obtain an aggregated feature, processing the aggregated feature with Sigmoid, and outputting feature f;
S2247. adding feature f output in S2246 to feature δ output in S223 to obtain feature ξ;
in S2243, feature b output in S2242 is input into preferred convolution layer one and feature c is output, the specific process being:
Ⅰ. setting A convolution kernels of different sizes in preferred convolution layer one, and setting the maximum kernel size in preferred convolution layer one according to the dimension of the acoustic features of the underwater acoustic signal sample in S1:
k = |log₂(C)|_odd    (1)
wherein C represents the dimension of the acoustic features of the underwater acoustic signal sample, the acoustic features being a Mel spectrum, k represents the convolution kernel size, k is an odd integer, and |·|_odd denotes the nearest odd integer;
Ⅱ. selecting the optimal convolution kernel among the A convolution kernels by using a kernel-selection algorithm, the specific process being:
calculating the outputs of the A convolution kernels, obtaining the two convolution kernels with the highest similarity by using a similarity algorithm, and selecting the larger of the two convolution kernels as the optimal kernel of preferred convolution layer one, to obtain preferred convolution layer one with the optimal kernel;
Ⅲ. inputting feature b output in S2242 into preferred convolution layer one with the optimal kernel to obtain feature c;
in S2244, feature c output in S2243 is input into preferred convolution layer two and feature d is output, the specific process being:
the number of convolution kernels, the maximum kernel size and the optimal kernel of preferred convolution layer two being the same as those of preferred convolution layer one, feature c obtained in Ⅲ is input into preferred convolution layer two and feature d is output;
in S2245, feature c output in S2243 and feature d output in S2244 are input into preferred convolution layer three and feature e is output, the specific process being:
the number of convolution kernels, the maximum kernel size and the optimal kernel of preferred convolution layer three being the same as those of preferred convolution layer one, feature c obtained in Ⅲ and feature d output in S2244 are input into preferred convolution layer three, and feature e is output.
CN202210644380.7A 2022-06-08 2022-06-08 Automatic annotation method of underwater acoustic signal samples based on MOC model Active CN115035912B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210644380.7A CN115035912B (en) 2022-06-08 2022-06-08 Automatic annotation method of underwater acoustic signal samples based on MOC model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210644380.7A CN115035912B (en) 2022-06-08 2022-06-08 Automatic annotation method of underwater acoustic signal samples based on MOC model

Publications (2)

Publication Number Publication Date
CN115035912A (en) 2022-09-09
CN115035912B (en) 2024-04-26

Family

ID=83122695

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210644380.7A Active CN115035912B (en) 2022-06-08 2022-06-08 Automatic annotation method of underwater acoustic signal samples based on MOC model

Country Status (1)

Country Link
CN (1) CN115035912B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000011658A1 (en) * 1998-08-24 2000-03-02 Conexant Systems, Inc. Speech classification and parameter weighting used in codebook search
CN107301858A (en) * 2017-05-31 2017-10-27 华南理工大学 Audio frequency classification method based on audio feature space hierarchical description
CN108229679A (en) * 2017-11-23 2018-06-29 北京市商汤科技开发有限公司 Convolutional neural networks de-redundancy method and device, electronic equipment and storage medium
CN110298037A (en) * 2019-06-13 2019-10-01 同济大学 The matched text recognition method of convolutional neural networks based on enhancing attention mechanism
CN111414993A (en) * 2020-03-03 2020-07-14 三星(中国)半导体有限公司 Cutting and convolution calculating method and device of convolution neural network
CN111860582A (en) * 2020-06-11 2020-10-30 北京市威富安防科技有限公司 Image classification model construction method and device, computer equipment and storage medium
CN112836569A (en) * 2020-12-15 2021-05-25 泰山学院 Underwater acoustic communication signal identification method, system and device based on sequential convolutional network
CN113269077A (en) * 2021-05-19 2021-08-17 青岛科技大学 Underwater acoustic communication signal modulation mode identification method based on improved gating network and residual error network
CN113299298A (en) * 2021-05-06 2021-08-24 成都数联云算科技有限公司 Residual error unit, network and target identification method, system, device and medium
CN113488063A (en) * 2021-07-02 2021-10-08 国网江苏省电力有限公司电力科学研究院 Audio separation method based on mixed features and coding and decoding
CN113609913A (en) * 2021-07-08 2021-11-05 三峡大学 Pine wood nematode disease tree detection method based on sampling threshold interval weighting
KR20220001985A (en) * 2020-06-30 2022-01-06 연세대학교 산학협력단 Apparatus and method for diagnosing local tumor progression using deep neural networks in diagnostic images

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Proposal and implementation of the IIS MOC model and its dynamic scheduling strategy; Mou Zhouling, Lu Lei; Application Research of Computers; 2003-12-28 (No. 12); full text *
An optimization method for convolutional neural networks; Liu Chen; Qu Changwen; Zhou Qiang; Li Zhi; Ship Electronic Engineering; 2017-05-20 (No. 05); full text *
Classification of underwater communication modulation signals based on deep residual networks; Wang Yan; Lyu Tingting; Yang Hua; Zhang Hao; Digital Ocean & Underwater Warfare; 2020-06-15 (No. 03); full text *

Also Published As

Publication number Publication date
CN115035912A (en) 2022-09-09

Similar Documents

Publication Publication Date Title
CN112069310B (en) Text classification method and system based on active learning strategy
CN106503805B (en) A bimodal multi-party dialogue sentiment analysis method based on machine learning
CN110825848B (en) Text classification method based on phrase vectors
CN110825877A (en) A Semantic Similarity Analysis Method Based on Text Clustering
CN111723666B (en) A signal recognition method and device based on semi-supervised learning
CN111261223B (en) CRISPR off-target effect prediction method based on deep learning
CN111368920A (en) A binary classification method based on quantum twin neural network and its face recognition method
CN111078895B (en) Remote supervision entity relation extraction method based on denoising convolutional neural network
CN113591474B (en) Repeated data detection method of Loc2vec model based on weighted fusion
CN114420151B (en) Speech emotion recognition method based on parallel tensor decomposition convolutional neural network
CN112307760A (en) Deep learning-based financial report emotion analysis method and device and terminal
CN109034248A (en) A kind of classification method of the Noise label image based on deep learning
CN109741733B (en) Speech Phoneme Recognition Method Based on Consistent Routing Network
CN110727758A (en) A public opinion analysis method and system based on multi-length text vector splicing
CN115035912B (en) Automatic annotation method of underwater acoustic signal samples based on MOC model
CN113191133B (en) A Doc2Vec-based audio-text alignment method and system
CN112364662A (en) Intention identification method based on neural network and electronic device
CN112035700A (en) Voice deep hash learning method and system based on CNN
CN118013038A (en) Text increment relation extraction method based on prototype clustering
CN114168782B (en) Deep hash image retrieval method based on triplet network
CN111708896B (en) An Entity Relation Extraction Method Applied in Biomedical Literature
CN114898776A (en) Voice emotion recognition method of multi-scale feature combined multi-task CNN decision tree
CN114443840A (en) Text classification method, device and equipment
CN113988194A (en) Multi-label text classification method and system
CN113792794A (en) A Feature Selection Method Based on Membrane Algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant