
CN113488073A - Multi-feature fusion based counterfeit voice detection method and device - Google Patents

Multi-feature fusion based counterfeit voice detection method and device Download PDF

Info

Publication number
CN113488073A
Authority
CN
China
Prior art keywords
speech
features
fusion
feature
forged
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110762591.6A
Other languages
Chinese (zh)
Other versions
CN113488073B (en)
Inventor
陈晋音
叶林辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202110762591.6A priority Critical patent/CN113488073B/en
Publication of CN113488073A publication Critical patent/CN113488073A/en
Application granted granted Critical
Publication of CN113488073B publication Critical patent/CN113488073B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Image Analysis (AREA)

Abstract


The invention discloses a method and device for detecting forged voice based on multi-feature fusion, comprising: extracting multiple classes of features from the voice, and fusing the extracted features through feature scaling and a feature balance matrix to obtain fusion features that incorporate the information in the voice as fully as possible; the fusion features are then used to train a forged voice detection model based on a long short-term memory network, so as to detect forged voice generated by various voice forgery methods.

Description

Multi-feature fusion based counterfeit voice detection method and device
Technical Field
The invention belongs to the field of deep learning security, and particularly relates to a method and a device for detecting forged voice based on multi-feature fusion.
Background
Voice forgery generates the voice of a specific speaker through technical means; compared with video forgery, it is harder to detect and applicable in a wider range of scenarios. Voiceprint locks, such as WeChat's, can be broken by forged voice, which raises security concerns over property and privacy.
Forged voice can be synthesized by parameter generation, waveform splicing, voice imitation, and other techniques. Most methods for detecting parameter-generated forged voice rely on specific parameters of the forged voice: the dynamic variation of the parameters of voice generated from parameters is often smaller than that of natural voice. In particular, the higher-order cepstral coefficients, which reflect the details of the spectral envelope, tend to be smoothed during the training and generation of hidden Markov model parameters, so the higher-order cepstral components of parameter-generated voice vary less than those of natural voice. Although this difference provides a way to distinguish real voice from parameter-generated voice, it presupposes full knowledge of a particular hidden Markov voice parameter generation system; the same countermeasure may therefore not apply to forged voice produced by generators that use different acoustic parameters.
Because waveform-splicing voice forgery is simple to perform, it is widely used. For forged voice produced by waveform concatenation, a detector compares a new access sample with stored instances of past access attempts. With the rapid development of deep learning, voice anti-spoofing detection systems based on deep learning have attracted attention and can detect imitated voice. Deep neural networks, a special class of machine learning systems, have become widely used models in recent years and achieve nearly the best results when applied to tasks such as biometric recognition. However, existing deep-learning-based forged voice detection techniques handle only a single type of forgery: they can detect only voice generated by one particular forgery method.
Disclosure of Invention
Aiming at the weak generalization of conventional forged voice detection methods, which can detect only one particular kind of forged voice, the invention provides a forged voice detection method based on multi-feature fusion: multiple voice features are fused through a feature balance matrix to construct balanced fusion features, and the constructed fusion features are used to train a forged voice detection model based on a long short-term memory network, realizing the detection of various kinds of forged voice.
In a first aspect, an embodiment of the present invention provides a method for detecting a forged voice based on multi-feature fusion, including the following steps:
acquiring the forged voice and the corresponding normal voice, and constructing labels of the forged voice and the normal voice;
performing the following process on the forged voice and the normal voice respectively: extracting features from the voice to obtain multi-class features of different dimensions, scaling the multi-class features of different dimensions to the same dimension to obtain multi-class base features, and then fusing the multi-class base features with a feature balance matrix to obtain fusion features;
constructing a forged voice detection model composed of a long short-term memory network and a fully connected neural network, and performing supervised learning on the forged voice detection model with the fusion features and labels of the forged voice and of the normal voice, so as to optimize the model parameters of the forged voice detection model and the feature balance matrix;
in application, after the fusion features of the voice to be detected are obtained with the feature balance matrix determined by the fusion weight parameters, the fusion features of the voice to be detected are detected with the parameter-optimized forged voice detection model to output a detection result.
Preferably, the multi-class features obtained by feature extraction of the voice include: fundamental frequency, mel cepstral coefficients, aperiodic components, mel spectrum, energy spectrum, spectrum, linear prediction coefficients, and linear prediction cepstral coefficients.
Preferably, nearest neighbor interpolation is used to scale the multi-class features of different dimensions to the same dimension to obtain the multi-class base features.
Preferably, the feature balance matrix is composed of fusion weight parameters, and the fusion weight parameters are initialized with random numbers conforming to a normal distribution.
Preferably, when supervised learning is performed on the forged voice detection model, a cross-entropy loss function is adopted as the optimization target of the model.
In a second aspect, an embodiment further provides a device for detecting a forged voice based on multi-feature fusion, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the above-mentioned method for detecting a forged voice based on multi-feature fusion when executing the computer program.
The technical scheme provided by the embodiments has at least the following beneficial effects:
Multiple features are extracted from the voice and fused through feature scaling and a feature balance matrix to obtain fusion features that incorporate the information in the voice as fully as possible; a forged voice detection model based on a long short-term memory network is then trained with the fusion features, realizing the detection of forged voice generated by various voice forgery methods.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for detecting forged voice based on multi-feature fusion provided by an embodiment;
fig. 2 is a flow diagram of multi-feature fusion provided by an embodiment.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the detailed description and specific examples are intended for purposes of illustration only and are not intended to limit the scope of the invention.
Voice forgery techniques usually model only one or a few characteristics of the voice to synthesize forged voice, and conventional forged voice detection methods can detect only specific forgeries and generalize poorly. Accordingly, this embodiment provides a forged voice detection method based on multi-feature fusion: multiple features are extracted from the voice and fused through feature scaling and a feature balance matrix to obtain fusion features, and the fusion features are used to train a forged voice detection model based on a long short-term memory network, so as to detect forged voice generated by various voice forgery methods.
FIG. 1 is a flowchart of a method for detecting forged voice based on multi-feature fusion provided by an embodiment; fig. 2 is a flow diagram of multi-feature fusion provided by an embodiment. As shown in fig. 1 and fig. 2, the method for detecting a forged voice based on multi-feature fusion provided by the embodiment includes the following steps:
step1, acquiring the fake voice and the corresponding normal voice, and constructing labels of the fake voice and the normal voice.
In the embodiment, forged voice generated by methods such as parameter generation, waveform splicing, and deep learning, together with the corresponding normal voice, can be collected. The forged voice and the normal voice form the data set for training the forged voice detection model based on the long short-term memory network; the class of normal voice is labeled 0 and the class of forged voice is labeled 1.
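As a minimal sketch of this step (the data/normal and data/forged directory names are assumptions for illustration, not part of the patent), the labeled data set can be assembled as follows:

```python
from pathlib import Path

# Illustrative sketch only: the patent does not prescribe a file layout.
# Hypothetical folders data/normal and data/forged hold the two classes;
# normal voice is labeled 0 and forged voice is labeled 1.
dataset = [(str(p), 0) for p in sorted(Path("data/normal").glob("*.wav"))] \
        + [(str(p), 1) for p in sorted(Path("data/forged").glob("*.wav"))]
```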
Step 2: extract features from the voice to obtain multi-class features of different dimensions.
In an embodiment, features such as fundamental frequency, mel cepstral coefficients, aperiodic components, mel spectrum, energy spectrum, spectrum, linear prediction coefficients, and linear prediction cepstral coefficients may be extracted. The number of features is chosen according to the actual situation; the more features are extracted, the wider the range of forged voice the model can detect. Feature extraction does not require a deep model and can be performed with conventional voice feature extraction methods. For example, mel cepstral coefficients are extracted as follows:
Step 1: pre-emphasize the voice signal x(n). Pre-emphasis is implemented with the transfer function H(z) = 1 - αz^(-1), where α is the pre-emphasis coefficient and 0.9 < α < 1.0; the pre-emphasized signal is y(n) = x(n) - α·x(n-1).
Step 2: frame and window the pre-emphasized voice. Framing of the voice signal is realized by weighting with a movable window of finite length. Typically there are about 33-100 frames per second. Framing generally uses overlapping segmentation; the overlap between adjacent frames is called the frame shift, and the ratio of frame shift to frame length usually lies between 0 and 0.5. The window used is the Hamming window:

w(n) = 0.54 - 0.46·cos(2πn/(N-1)),  0 ≤ n ≤ N-1   (1)

where N is the frame length.
Step 3: apply the discrete Fourier transform to the pre-processed signal to obtain the discrete spectrum X(k):

X(k) = Σ_{n=0}^{N-1} y(n)·e^(-j2πnk/N),  0 ≤ k ≤ N-1   (2)
step 4: inputting X (k) into a Mel filter bank, and then taking logarithm to obtain a logarithmic spectrum:
Figure BDA0003150514100000061
wherein Hm(k) Is a band pass filter.
Step 5: transform S(m) to the cepstral domain by the discrete cosine transform; the resulting mel cepstral coefficients are:

C(n) = Σ_{m=0}^{M-1} S(m)·cos( πn(m + 0.5)/M ),  n = 1, 2, …, L   (4)

where L is the order of the mel cepstral coefficients.
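The steps above can be sketched with librosa; the sampling rate, the frame parameters (n_fft=512, hop_length=256), the coefficient count, and α = 0.97 are illustrative assumptions within the ranges given above, not values fixed by the patent:

```python
import numpy as np
import librosa

def extract_mfcc(path, n_mfcc=13, alpha=0.97):
    y, sr = librosa.load(path, sr=16000)            # assumed sampling rate
    y = np.append(y[0], y[1:] - alpha * y[:-1])     # step 1: y(n) = x(n) - a*x(n-1)
    # steps 2-5: Hamming-windowed frames -> DFT -> mel filter bank -> log -> DCT
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=512, hop_length=256, window="hamming")
```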
and 3, respectively zooming the multi-class features with different dimensions to the same dimension to obtain the multi-class base features.
As shown in fig. 2, the extracted multi-class features are fused by concatenation. If the features were concatenated directly, a feature imbalance problem would arise: features of larger dimension would overwhelm or mask features of smaller dimension, so that small-dimension features contribute too little to discrimination while large-dimension features dominate. Therefore, the extracted voice features are first scaled so that every feature becomes a base feature of the same dimension.
In an embodiment, nearest neighbor interpolation is used to scale the features; the matrix obtained by scaling the features extracted from the input voice is called the base matrix. The specific process is as follows:

Feature 1, feature 2, …, feature n in the figure denote the n features extracted from the voice, such as the fundamental frequency and the mel cepstrum. Suppose an original feature matrix has dimensions A = [H_src, W_src] and the scaled feature has dimensions B = [H_dst, W_dst]. The width scaling factor f_x and the height scaling factor f_y are then

f_x = W_src / W_dst,  f_y = H_src / H_dst   (5)

During scaling, the source indices

i_src = round(i·f_y),  j_src = round(j·f_x)

must be rounded to integers, where i ∈ [0, H_dst] and j ∈ [0, W_dst].
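A minimal NumPy sketch of this nearest-neighbor scaling (the function and variable names are illustrative):

```python
import numpy as np

def nn_scale(feat, h_dst, w_dst):
    """Scale a 2-D feature matrix to (h_dst, w_dst) by nearest-neighbor interpolation."""
    h_src, w_src = feat.shape
    f_y, f_x = h_src / h_dst, w_src / w_dst    # scaling factors, formula (5)
    rows = np.minimum(np.round(np.arange(h_dst) * f_y).astype(int), h_src - 1)
    cols = np.minimum(np.round(np.arange(w_dst) * f_x).astype(int), w_src - 1)
    return feat[np.ix_(rows, cols)]            # rounded source indices
```

Each extracted feature matrix is passed through nn_scale with the same target dimensions, yielding the stack of same-sized base features.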
Step 4: fuse the multi-class base features with the feature balance matrix to obtain the fusion features.
In an embodiment, the base matrix is further processed to obtain the fusion matrix. Feature scaling makes the features dimensionally consistent, but concatenating them directly would still leave a certain feature imbalance: the contribution of each base feature to the model's discrimination remains unbalanced, in that the higher the original dimension of a feature before scaling, the greater its contribution during discrimination. A feature balance matrix is therefore introduced. This matrix is a trainable parameter, so the trained model can assign fusion weights according to each feature's contribution to classification. The specific steps are as follows:

Step 1: initialize the feature balance matrix. The feature balance matrix is a trainable parameter matrix whose entries are initialized with random numbers conforming to a normal distribution.

Step 2: place the feature balance matrix into the model for training, so that during training the matrix balances the contribution of each base feature to the model's classification. The base features are fused through the feature balance matrix as follows:

F_u = F × [w_1, w_2, …, w_n]^T   (6)

where W = [w_1, w_2, …, w_n] denotes the feature balance matrix, F denotes the base features to be fused, and F_u denotes the fusion feature. On the basis of the fusion features, a forged voice discrimination model is built to judge whether the voice is genuine or forged.
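In a framework such as PyTorch, the feature balance matrix can be realized as a trainable weight vector initialized from a normal distribution; this is a sketch of formula (6) under that assumption, not the patent's exact implementation:

```python
import torch
import torch.nn as nn

class FeatureBalanceFusion(nn.Module):
    def __init__(self, n_features):
        super().__init__()
        # W = [w1, ..., wn], initialized with normally distributed random numbers
        self.w = nn.Parameter(torch.randn(n_features))

    def forward(self, base_feats):
        # base_feats: (n, T, D) stack of the n scaled base features
        # Fu = sum_k w_k * F_k, i.e. formula (6) applied across the stack
        return torch.einsum("n,ntd->td", self.w, base_feats)
```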
Step 5: construct and train the forged voice detection model.
Since voice is a time-series signal and adjacent frames are correlated, a forged voice detection model based on a long short-term memory (LSTM) network is constructed; the LSTM network can model the time series and extract features along the time dimension to detect forged voice. The constructed model is shown in fig. 1, and the specific construction steps are as follows:

Step 1: build the forged voice detection model based on the LSTM network. As shown in fig. 1, the model consists of an LSTM network and a fully connected neural network, and its input is the extracted fusion features. The LSTM network extracts features from the input fusion features along the time dimension, and the extracted features are classified by the fully connected neural network. Because the input voice is judged to be either normal or forged, the last layer of the model consists of two neurons that output the discrimination result.
Step 2: train the forged voice detection model based on the LSTM network with the data set constructed in step 1. The LSTM network contains three gating units: an input gate, a forget gate, and an output gate. Each gate is in fact a fully connected layer, and the network parameters are updated as in equations (7) to (11):

g_i^(t) = f(U_i·x^(t) + W_i·h^(t-1) + b_i)   (7)

g_f^(t) = f(U_f·x^(t) + W_f·h^(t-1) + b_f)   (8)

g_o^(t) = f(U_o·x^(t) + W_o·h^(t-1) + b_o)   (9)

s^(t) = g_f^(t) ⊙ s^(t-1) + g_i^(t) ⊙ f(U_s·x^(t) + W_s·h^(t-1) + b_s)   (10)

h^(t) = g_o^(t) ⊙ f(s^(t))   (11)
where s^(t) denotes the cell state, h^(t) the hidden state, g_i the input gate, g_f the forget gate, g_o the output gate, and f the activation function; t denotes the current time step, b the bias, U the input-to-hidden weights, W the hidden-to-hidden weights, and ⊙ element-wise multiplication.
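Equations (7) to (11) correspond to one LSTM time step, sketched below in NumPy; the dictionary-of-weights layout is an illustrative choice:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, s_prev, U, W, b):
    """One LSTM time step; U, W, b are dicts keyed by 'i', 'f', 'o', 's'."""
    g_i = sigmoid(U['i'] @ x + W['i'] @ h_prev + b['i'])   # input gate, eq. (7)
    g_f = sigmoid(U['f'] @ x + W['f'] @ h_prev + b['f'])   # forget gate, eq. (8)
    g_o = sigmoid(U['o'] @ x + W['o'] @ h_prev + b['o'])   # output gate, eq. (9)
    s = g_f * s_prev + g_i * np.tanh(U['s'] @ x + W['s'] @ h_prev + b['s'])  # eq. (10)
    h = g_o * np.tanh(s)                                   # eq. (11)
    return h, s
```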
Step 3: a cross-entropy loss function is adopted as the optimization target of the model:

L = -(1/n) Σ_{i=1}^{n} p(x_i)·log q(x_i)
where x_i denotes the i-th input sample, n the total number of samples in the training set, p(x_i) the true class label of sample x_i, and q(x_i) the class label predicted by the model for input sample x_i.
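A minimal training loop under these choices; torch.nn.CrossEntropyLoss combines the softmax and the loss above, and the optimizer and learning rate are assumptions:

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=10, lr=1e-3):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()          # cross-entropy optimization target
    model.train()
    for _ in range(epochs):
        for feats, labels in loader:         # fusion features and 0/1 labels
            optimizer.zero_grad()
            loss = loss_fn(model(feats), labels)
            loss.backward()
            # updates the LSTM, the classifier and, if registered as a
            # submodule of the model, the feature balance weights as well
            optimizer.step()
```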
Step 6: apply the forged voice detection model.
In application, after the fusion features of the voice to be detected are obtained with the feature balance matrix determined by the trained fusion weight parameters, the fusion features of the voice to be detected are detected with the parameter-optimized forged voice detection model to output the detection result.
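Detection of an utterance under test then reduces to a single forward pass (a sketch; model refers to the illustrative detector above):

```python
import torch

@torch.no_grad()
def detect(model, fused_feat):
    """fused_feat: (T, D) fusion features of the voice under test."""
    model.eval()
    logits = model(fused_feat.unsqueeze(0))   # add a batch dimension
    return logits.argmax(dim=1).item()        # 0 = normal voice, 1 = forged voice
```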
The embodiment also provides a device for forged voice detection based on multi-feature fusion, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the above method for detecting forged voice based on multi-feature fusion, comprising the following steps:
step1, acquiring the fake voice and the corresponding normal voice, and constructing labels of the fake voice and the normal voice.
And 2, performing feature extraction on the voice to obtain various features with different dimensionalities.
And 3, respectively zooming the multi-class features with different dimensions to the same dimension to obtain the multi-class base features.
And 4, fusing the multi-class base characteristics by using the abnormal balance matrix to obtain fused characteristics.
And 5, constructing and training a forged voice detection model.
In the above method and device for forged voice detection based on multi-feature fusion, multiple features are extracted from the voice and fused through feature scaling and a feature balance matrix to obtain fusion features that incorporate the information in the voice as fully as possible; a forged voice detection model based on a long short-term memory network is then trained with the fusion features, realizing the detection of forged voice generated by various voice forgery methods.
The above embodiments are intended to illustrate the technical solutions and advantages of the present invention. It should be understood that they are only preferred embodiments of the present invention and are not intended to limit the invention; any modification, supplement, or equivalent substitution made within the scope of the principles of the present invention shall be included in the protection scope of the present invention.

Claims (6)

1. A forged voice detection method based on multi-feature fusion, characterized by comprising the following steps:

acquiring forged voice and corresponding normal voice, and constructing labels for the forged voice and the normal voice;

performing the following process on the forged voice and the normal voice respectively: extracting features from the voice to obtain multi-class features of different dimensions, scaling the multi-class features of different dimensions to the same dimension to obtain multi-class base features, and then fusing the multi-class base features with a feature balance matrix to obtain fusion features;

constructing a forged voice detection model composed of a long short-term memory network and a fully connected neural network, and performing supervised learning on the forged voice detection model with the fusion features and labels of the forged voice and of the normal voice, so as to optimize the model parameters of the forged voice detection model and the feature balance matrix;

in application, after the fusion features of the voice to be tested are obtained with the feature balance matrix determined by the fusion weight parameters, detecting the fusion features of the voice to be tested with the parameter-optimized forged voice detection model to output a detection result.

2. The forged voice detection method based on multi-feature fusion according to claim 1, characterized in that the multi-class features obtained by feature extraction of the voice include: fundamental frequency, mel cepstral coefficients, aperiodic components, mel spectrum, energy spectrum, spectrum, linear prediction coefficients, and linear prediction cepstral coefficients.

3. The forged voice detection method based on multi-feature fusion according to claim 1, characterized in that nearest neighbor interpolation is used to scale the multi-class features of different dimensions to the same dimension to obtain the multi-class base features.

4. The forged voice detection method based on multi-feature fusion according to claim 1, characterized in that the feature balance matrix is composed of fusion weight parameters, and the fusion weight parameters are initialized with random numbers conforming to a normal distribution.

5. The forged voice detection method based on multi-feature fusion according to claim 1, characterized in that, when supervised learning is performed on the forged voice detection model, a cross-entropy loss function is adopted as the optimization target of the model.

6. A forged voice detection device based on multi-feature fusion, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the forged voice detection method based on multi-feature fusion according to any one of claims 1 to 5.
CN202110762591.6A 2021-07-06 2021-07-06 A method and device for forged speech detection based on multi-feature fusion Active CN113488073B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110762591.6A CN113488073B (en) 2021-07-06 2021-07-06 A method and device for forged speech detection based on multi-feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110762591.6A CN113488073B (en) 2021-07-06 2021-07-06 A method and device for forged speech detection based on multi-feature fusion

Publications (2)

Publication Number Publication Date
CN113488073A true CN113488073A (en) 2021-10-08
CN113488073B CN113488073B (en) 2023-11-24

Family

ID=77940641

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110762591.6A Active CN113488073B (en) 2021-07-06 2021-07-06 A method and device for forged speech detection based on multi-feature fusion

Country Status (1)

Country Link
CN (1) CN113488073B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113724693A (en) * 2021-11-01 2021-11-30 中国科学院自动化研究所 Voice judging method and device, electronic equipment and storage medium
CN113808579A (en) * 2021-11-22 2021-12-17 中国科学院自动化研究所 Detection method, device, electronic device and storage medium for generating speech
CN116153336A (en) * 2023-04-19 2023-05-23 北京中电慧声科技有限公司 Synthetic voice detection method based on multi-domain information fusion
CN117059131A (en) * 2023-10-13 2023-11-14 南京龙垣信息科技有限公司 False audio detection method based on emotion recognition
CN117690455A (en) * 2023-12-21 2024-03-12 合肥工业大学 Partially synthesized forged speech detection method and system based on sliding window
CN119517008A (en) * 2025-01-21 2025-02-25 蚂蚁智信(杭州)信息技术有限公司 Synthetic voice detection method and device, storage medium and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019173304A1 (en) * 2018-03-05 2019-09-12 The Trustees Of Indiana University Method and system for enhancing security in a voice-controlled system
CN110444208A (en) * 2019-08-12 2019-11-12 浙江工业大学 A kind of speech recognition attack defense method and device based on gradient estimation and CTC algorithm
WO2019232833A1 (en) * 2018-06-04 2019-12-12 平安科技(深圳)有限公司 Speech differentiating method and device, computer device and storage medium
CN111564163A (en) * 2020-05-08 2020-08-21 宁波大学 An RNN-based voice detection method for multiple forgery operations
CN111613240A (en) * 2020-05-22 2020-09-01 杭州电子科技大学 A camouflaged speech detection method based on attention mechanism and Bi-LSTM
CN112767967A (en) * 2020-12-30 2021-05-07 深延科技(北京)有限公司 Voice classification method and device and automatic voice classification method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019173304A1 (en) * 2018-03-05 2019-09-12 The Trustees Of Indiana University Method and system for enhancing security in a voice-controlled system
WO2019232833A1 (en) * 2018-06-04 2019-12-12 平安科技(深圳)有限公司 Speech differentiating method and device, computer device and storage medium
CN110444208A (en) * 2019-08-12 2019-11-12 浙江工业大学 A kind of speech recognition attack defense method and device based on gradient estimation and CTC algorithm
CN111564163A (en) * 2020-05-08 2020-08-21 宁波大学 An RNN-based voice detection method for multiple forgery operations
CN111613240A (en) * 2020-05-22 2020-09-01 杭州电子科技大学 A camouflaged speech detection method based on attention mechanism and Bi-LSTM
CN112767967A (en) * 2020-12-30 2021-05-07 深延科技(北京)有限公司 Voice classification method and device and automatic voice classification method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN Jinyin: "Black-box adversarial attack method for speech recognition systems", Journal of Chinese Computer Systems, pages 1 - 11 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113724693A (en) * 2021-11-01 2021-11-30 中国科学院自动化研究所 Voice judging method and device, electronic equipment and storage medium
CN113724693B (en) * 2021-11-01 2022-04-01 中国科学院自动化研究所 Voice judging method and device, electronic equipment and storage medium
CN113808579A (en) * 2021-11-22 2021-12-17 中国科学院自动化研究所 Detection method, device, electronic device and storage medium for generating speech
CN116153336A (en) * 2023-04-19 2023-05-23 北京中电慧声科技有限公司 Synthetic voice detection method based on multi-domain information fusion
CN117059131A (en) * 2023-10-13 2023-11-14 南京龙垣信息科技有限公司 False audio detection method based on emotion recognition
CN117059131B (en) * 2023-10-13 2024-03-29 南京龙垣信息科技有限公司 False audio detection method based on emotion recognition
CN117690455A (en) * 2023-12-21 2024-03-12 合肥工业大学 Partially synthesized forged speech detection method and system based on sliding window
CN117690455B (en) * 2023-12-21 2024-05-28 合肥工业大学 Partially synthesized forged speech detection method and system based on sliding window
CN119517008A (en) * 2025-01-21 2025-02-25 蚂蚁智信(杭州)信息技术有限公司 Synthetic voice detection method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN113488073B (en) 2023-11-24

Similar Documents

Publication Publication Date Title
CN113488073A (en) Multi-feature fusion based counterfeit voice detection method and device
EP3955246B1 (en) Voiceprint recognition method and device based on memory bottleneck feature
WO2018107810A1 (en) Voiceprint recognition method and apparatus, and electronic device and medium
CN107731233B (en) Voiceprint recognition method based on RNN
KR102198273B1 (en) Machine learning based voice data analysis method, device and program
CN112541533B (en) A modified car recognition method based on neural network and feature fusion
CN114495950A (en) Voice deception detection method based on deep residual shrinkage network
Dawood et al. A robust voice spoofing detection system using novel CLS-LBP features and LSTM
Chakravarty et al. Spoof detection using sequentially integrated image and audio features
Singh A text independent speaker identification system using ANN, RNN, and CNN classification technique
Chakravarty et al. A lightweight feature extraction technique for deepfake audio detection
Wani et al. Deepfakes audio detection leveraging audio spectrogram and convolutional neural networks
Imran et al. An analysis of audio classification techniques using deep learning architectures
Nirmal et al. A hybrid bald eagle-crow search algorithm for gaussian mixture model optimisation in the speaker verification framework
Cheng et al. DNN-based speech enhancement with self-attention on feature dimension
Woubie et al. Voice quality features for replay attack detection
Namburi Speaker recognition based on mutated monarch butterfly optimization configured artificial neural network
Panda et al. Study of speaker recognition systems
CN118098247A (en) Voiceprint recognition method and system based on parallel feature extraction model
CN117976006A (en) Audio processing method, device, computer equipment and storage medium
Wang et al. Revealing the processing history of pitch-shifted voice using CNNs
CN111310836A (en) Method and device for defending voiceprint recognition integrated model based on spectrogram
CN116665649A (en) Synthetic voice detection method based on prosody characteristics
Nguyen et al. Vietnamese speaker authentication using deep models
CN115620731A (en) A Speech Feature Extraction and Detection Method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant