CN107564536A

CN107564536A - AMR pitch delay steganalysis method based on the Markov transition probability feature of intra-group differences of pitch delay subframe groups

Info

Publication number: CN107564536A
Application number: CN201710797602.8A
Authority: CN
Inventors: 任延珍; 杨婧; 王丽娜
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2017-09-06
Filing date: 2017-09-06
Publication date: 2018-01-09

Abstract

The invention discloses an AMR pitch delay steganalysis method based on the Markov transition probability feature of intra-group differences of pitch delay subframe groups. Targeting the two existing steganographic algorithms that modify the pitch delay during speech coding, the method proposes an AMR steganalysis feature based on the Markov transition probability of the intra-group difference of pitch delay subframe groups and uses a support vector machine for classification and prediction, thereby realizing steganalysis oriented to AMR pitch delay. The advantage of the invention is that, for the two existing pitch-delay-oriented steganography methods, the detection rate exceeds 95% at a relative embedding rate of 50%.

Description

AMR pitch delay steganalysis method based on the Markov transition probability feature of intra-group differences of pitch delay subframe groups

Technical field

The invention relates to the technical field of digital media processing, and in particular to multimedia information content security: determining whether secret information has been steganographically embedded in AMR speech.

Background

In recent years, as mobile communication has matured, vendors have broadly rolled out communication technologies for the mobile Internet. Demand for voice communication services keeps growing, and the requirements on coding quality and network bandwidth usage have risen accordingly. To meet the needs of mobile voice communication, in October 1999 3GPP (3rd Generation Partnership Project) designated AMR (Adaptive Multi-Rate) speech coding as the speech compression coding standard for mobile Internet voice communication, and AMR coding has since been widely used in GSM, TDMA, UMTS and VoLTE. At present, mobile phone manufacturers such as Samsung, Apple and Huawei use AMR as the voice format on their handsets. Communication apps installed on smart terminals, such as WeChat, QQ and Skype, also use AMR coding for their voice chat and voice message functions. With the spread of the AMR coding standard, AMR speech offers a new carrier for steganography.

Steganographic algorithms for AMR speech have gradually appeared. They are mainly integrated into the AMR encoding process: relevant parameters are adjusted and modified during encoding so that the parameters carry the secret information. Given the coding characteristics of AMR speech, existing steganographic algorithms mainly involve the linear prediction stage, the adaptive codebook search stage and the fixed codebook modulation stage of speech compression coding. Among them, steganographic algorithms for the adaptive codebook search stage hide secret information by fine-tuning the pitch delay during encoding; they are difficult to detect, and few corresponding steganalysis algorithms exist.

In the adaptive codebook search stage of AMR speech coding, the pitch delay is the predicted value of the speech pitch period, and pitch delays within voiced segments exhibit short-term stability. Existing pitch-delay-oriented steganography methods embed secret information by controlling the search range of the pitch delay, which alters this short-term stability in voiced segments. Based on this observation, using the Markov transition probability of the intra-group difference of pitch delay subframe groups as a feature enables effective detection of AMR pitch delay steganography.

The steganalysis method of the invention operates on AMR speech, so the encoding and decoding principles of AMR speech are introduced before the summary of the invention.

AMR speech compression coding is a hybrid speech coding technique based on ACELP. Its distinguishing feature is that a suitable coding bit rate can be selected according to the audio to be compressed and the communication environment. Figure 1 is a schematic diagram of the AMR coding principle. When the original speech is encoded frame by frame, suitable code vectors are selected from the adaptive codebook and the fixed codebook to replace the residual signal, under the criterion of minimizing the weighted mean square error between the synthesized speech and the original speech; the code vector addresses and gains and the parameters of each filter are then quantized, encoded and transmitted to the receiver. The AMR encoder takes input speech sampled at 8 kHz as 16-bit uncompressed linear PCM and encodes it in 20 ms frames. Each frame contains 160 samples and is divided into four 5 ms subframes. By function, the encoder can be divided into three main parts: the linear prediction stage, the adaptive codebook search stage and the fixed codebook modulation stage. The technical terms involved are explained below:

1. Compressed audio: audio that has undergone lossy compression; MP3, WMA and AMR are all lossy compressed audio formats.

2. Cover: carrier audio, i.e. audio in which no secret information has been embedded.

3. Stego: stego audio, i.e. audio in which secret information has been embedded.

4. Subframe group: each AMR speech frame contains 4 subframes; the first and second subframes form one subframe group, and the third and fourth subframes form another.

5. Intra-group relationship: the relationship between the pitch delay of the first subframe and the pitch delay of the second subframe within a subframe group.
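The frame layout and the subframe grouping defined above can be sketched in a few lines of Python. This is only an illustration of the arithmetic stated in the text; the function name `subframe_groups` is a hypothetical helper, not something defined by the patent or the AMR standard.

```python
# Frame layout as stated in the text: 8 kHz input, 20 ms frames, 5 ms subframes.
SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
SUBFRAME_MS = 5

samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000
subframes_per_frame = FRAME_MS // SUBFRAME_MS
samples_per_subframe = SAMPLE_RATE_HZ * SUBFRAME_MS // 1000


def subframe_groups(frame_pitch_delays):
    """Split the 4 pitch delays of one frame into its 2 subframe groups.

    Hypothetical helper: (T1, T2) form the first group, (T3, T4) the second.
    """
    if len(frame_pitch_delays) != 4:
        raise ValueError("expected 4 pitch delays per frame")
    t1, t2, t3, t4 = frame_pitch_delays
    return [(t1, t2), (t3, t4)]


print(samples_per_frame, subframes_per_frame, samples_per_subframe)
print(subframe_groups([58, 58, 57, 59]))
```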

Summary of the invention

The invention addresses the relative scarcity of steganalysis methods for AMR and realizes steganalysis oriented to AMR pitch delay.

The technical solution of the invention extracts features using the Markov transition probability of the intra-group difference of pitch delay subframe groups as a measure of short-term stability, and uses an SVM classifier to perform binary classification of the Cover and Stego samples of AMR speech. Specifically, an AMR pitch delay steganalysis method based on the Markov transition probability feature of intra-group differences of pitch delay subframe groups comprises:

Step 1. Extract features using the Markov transition probability of the intra-group difference of pitch delay subframe groups as a measure of short-term stability, specifically:

Step 1.1. Construct the intra-group difference of pitch delay subframe groups. In AMR speech compression coding, 20 ms of speech forms one frame, each frame is divided into four 5 ms subframes, and each subframe has one pitch delay. Within a frame, the pitch delays of the second subframe $T_2$ and the fourth subframe $T_4$ are obtained by correlation within a limited interval based on those of the first subframe $T_1$ and the third subframe $T_3$ respectively. As shown in Figure 2, the first subframe $T_1$ and the second subframe $T_2$ of a frame form one subframe group, and the third subframe $T_3$ and the fourth subframe $T_4$ form another;

For ease of description, the pitch delay sequence of an AMR speech signal is written by subframe group as $P = (p_{11}, p_{12}, \ldots, p_{t1}, p_{t2}, \ldots, p_{N1}, p_{N2})$, where $N$ is the total number of subframe groups in the AMR speech, $t$ is the time-ordered index of a subframe group, and $p_{t1}$ and $p_{t2}$ are the pitch delays of the first and second subframes of subframe group $t$. Because pitch delays in voiced speech segments have short-term stability, the intra-group difference is computed to measure the stability between the two subframes of a group. The intra-group difference $D_{\mathrm{intra}}$ is given by Formula 1, with $t \in [1, N]$:

$$D_{\mathrm{intra}}(t) = p_{t1} - p_{t2} \qquad \text{(Formula 1)}$$
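Formula 1 can be sketched as follows. The sketch assumes the pitch delays have already been parsed from the AMR bitstream into a flat list of integers, two per subframe group; the function name `intra_group_differences` and the integer representation are assumptions made for illustration.

```python
def intra_group_differences(pitch_delays):
    """Return D_intra(t) = p_t1 - p_t2 for each subframe group t.

    pitch_delays is the flat sequence (p_11, p_12, ..., p_N1, p_N2),
    i.e. two pitch delays per subframe group, in time order.
    """
    if len(pitch_delays) % 2 != 0:
        raise ValueError("expected an even number of pitch delays")
    return [pitch_delays[k] - pitch_delays[k + 1]
            for k in range(0, len(pitch_delays), 2)]


# Three subframe groups: (58, 58), (57, 59), (60, 58)
print(intra_group_differences([58, 58, 57, 59, 60, 58]))
```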

Step 1.2. Construct the Markov transition probability feature of the intra-group difference of pitch delay subframe groups. A Markov transition probability matrix measures the probability that a variable moves from one state to another, so it can represent how the intra-group difference changes between consecutive subframe groups. The Markov transition probability of the intra-group difference is computed as the steganalysis feature; it captures how the intra-group differences of Cover speech and Stego speech differ and thereby distinguishes the two;

The Markov transition probability feature of the intra-group difference is denoted $M1_{\mathrm{intra}}$ and is calculated according to Formula 2: $M1_{\mathrm{intra}}(i, j)$ is the Markov transition probability that $D_{\mathrm{intra}}(t+1)$ equals $j$ given that $D_{\mathrm{intra}}(t)$ equals $i$, where $t$ is the time index of the intra-group differences and $S_0$ is the total number of intra-group differences computed.
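The expression for Formula 2 is not reproduced in this text. A standard empirical estimate that matches the definitions given here would be the following; it is an assumption based on those definitions, not the patent's verbatim formula. Here $\delta(\cdot)$ equals 1 when all of its conditions hold and 0 otherwise.

```latex
M1_{\mathrm{intra}}(i, j) =
  \frac{\sum_{t=1}^{S_0 - 1} \delta\bigl(D_{\mathrm{intra}}(t) = i,\; D_{\mathrm{intra}}(t+1) = j\bigr)}
       {\sum_{t=1}^{S_0 - 1} \delta\bigl(D_{\mathrm{intra}}(t) = i\bigr)}
```

Under this estimate each row of $M1_{\mathrm{intra}}$ with a nonzero denominator sums to 1.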

According to the AMR coding scheme, within a frame the pitch delays of the second subframe $T_2$ and the fourth subframe $T_4$ are obtained by correlation within a limited interval based on those of the first subframe $T_1$ and the third subframe $T_3$ respectively. Therefore, for each bit rate the interval of the intra-group difference is fixed, which determines the range of $D_{\mathrm{intra}}$ and the corresponding dimension of the $M1_{\mathrm{intra}}$ feature. The Markov transition probability of the intra-group difference of pitch delay subframe groups is extracted from the AMR speech and used as the steganalysis classification feature for pitch delay modification;

Step 2. Use an SVM classifier to perform binary classification of the Cover and Stego samples of AMR speech and carry out steganalysis detection.

In the above AMR pitch delay steganalysis method based on the Markov transition probability feature of intra-group differences of pitch delay subframe groups, step 2 specifically comprises:

Step 2.1. Classifier training, specifically:

Step 2.1.1. Take WAV samples as input, generate cover samples and the corresponding stego samples, and extract the Markov transition probability classification feature of the intra-group difference of pitch delay subframe groups according to the method of step 1;

Step 2.1.2. After step 2.1.1, the training-set cover samples and an equal number of stego samples for each of the two embedding algorithms are available; different numbers of stego samples and cover samples are then randomly selected to train the steganalysis model with the SVM classifier;

Step 2.2. Steganalysis detection, specifically:

Steganalysis detection with the above steganalysis model comprises the following steps:

Step 2.2.1. Extract the steganalysis feature set of the sample under test;

Step 2.2.2. Feed the features into the trained steganalysis model to obtain the steganalysis decision for the sample.

Description of the drawings

Figure 1 is a flowchart of AMR encoding and decoding.

Figure 2 is a schematic diagram of subframe group division and intra-group differences.

Figure 3a shows the pitch delay distributions of Cover speech and Stego speech.

Figure 3b shows the distributions of the intra-group difference of pitch delay subframe groups for Cover speech and Stego speech.

Figure 4a shows the Markov transition probability feature of the intra-group difference of pitch delay subframe groups for Cover speech.

Figure 4b shows the Markov transition probability feature of the intra-group difference of pitch delay subframe groups for Stego speech.

Figure 5 shows the steganalysis training and detection framework of the invention.

Detailed description

1. Feature extraction method

1.1 Feature extraction

1.1.1 Constructing the intra-group difference of pitch delay subframe groups

In AMR speech compression coding, 20 ms of speech forms one frame, each frame is divided into four 5 ms subframes, and each subframe has one pitch delay. Within a frame, the pitch delays of the second subframe $T_2$ and the fourth subframe $T_4$ are obtained by correlation within a limited interval based on those of the first subframe $T_1$ and the third subframe $T_3$ respectively. As shown in Figure 2, the first subframe $T_1$ and the second subframe $T_2$ of a frame form one subframe group, and the third subframe $T_3$ and the fourth subframe $T_4$ form another.

For ease of description, the pitch delay sequence of an AMR speech signal is written by subframe group as $P = (p_{11}, p_{12}, \ldots, p_{t1}, p_{t2}, \ldots, p_{N1}, p_{N2})$, where $N$ is the total number of subframe groups in the AMR speech, $t$ is the time-ordered index of a subframe group, and $p_{t1}$ and $p_{t2}$ are the pitch delays of the first and second subframes of subframe group $t$. Because pitch delays in voiced speech segments have short-term stability, the intra-group difference is computed to measure the stability between the two subframes of a group. The intra-group difference $D_{\mathrm{intra}}$ is given by Formula 1, with $t \in [1, N]$:

$$D_{\mathrm{intra}}(t) = p_{t1} - p_{t2} \qquad \text{(Formula 1)}$$

1.1.2 Constructing the Markov transition probability feature of the intra-group difference of pitch delay subframe groups

A Markov transition probability matrix measures the probability that a variable moves from one state to another, so it can represent how the intra-group difference changes between consecutive subframe groups. The Markov transition probability of the intra-group difference is computed as the steganalysis feature; it captures how the intra-group differences of Cover speech and Stego speech differ and thereby distinguishes the two.

According to the AMR coding scheme, within a frame the pitch delays of the second subframe $T_2$ and the fourth subframe $T_4$ are obtained by correlation within a limited interval based on those of the first subframe $T_1$ and the third subframe $T_3$ respectively. Therefore, for each bit rate the interval of the intra-group difference is fixed; the range of the intra-group difference $D_{\mathrm{intra}}$ and the corresponding dimension of the $M1_{\mathrm{intra}}$ feature are shown in Table 1.

Table 1. Range of the intra-group difference $D_{\mathrm{intra}}$ and dimension of the $M1_{\mathrm{intra}}$ feature

The Markov transition probability of the intra-group difference of pitch delay subframe groups is extracted from the AMR speech and used as the steganalysis classification feature for pitch delay modification.
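A minimal sketch of the feature extraction described in this subsection is given below. It estimates a first-order transition matrix over the intra-group differences and flattens it into a feature vector. The function name `markov_transition_feature` and the parameters `d_min` and `d_max` are hypothetical: the actual difference range depends on the bit rate mode and is not reproduced in this text, so it is passed in as an argument. Transitions that involve a value outside the range are skipped, which is a simplification chosen for this sketch.

```python
def markov_transition_feature(diffs, d_min, d_max):
    """Empirical first-order transition matrix over states d_min..d_max.

    Returns a flattened row-major list of length K*K, K = d_max - d_min + 1.
    Entry (i, j) estimates P(D(t+1) = j | D(t) = i). Rows for states that
    never occur as a source state are left at zero.
    """
    k = d_max - d_min + 1
    counts = [[0] * k for _ in range(k)]
    for cur, nxt in zip(diffs, diffs[1:]):
        # Skip transitions that leave the assumed state range.
        if d_min <= cur <= d_max and d_min <= nxt <= d_max:
            counts[cur - d_min][nxt - d_min] += 1
    feature = []
    for row in counts:
        total = sum(row)
        feature.extend(c / total if total else 0.0 for c in row)
    return feature


# Toy difference sequence with an assumed range of [-1, 1] (3 x 3 = 9 dimensions).
print(markov_transition_feature([0, 0, 1, 0, 0, -1], -1, 1))
```

The length of the returned vector is the feature dimension: a range of K possible difference values gives a K × K matrix.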

1.2 Analysis of the feature principle

The pitch period of voiced speech segments has short-term stability, and the pitch delay is the prediction of the pitch period made during speech coding. The pitch delay in voiced segments therefore also has short-term stability: adjacent pitch delays should be fairly steady, and the difference between the two pitch delays of a subframe group should be small. Existing steganography methods for pitch delay embed by controlling the search range of the pitch delay during encoding. Embedding changes the pitch delay values found by the search, so the short-term stability between adjacent pitch delays in voiced segments is disrupted and the difference between the two pitch delays of a subframe group becomes larger.
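The effect described above can be illustrated with two simple statistics on the intra-group differences: the fraction of zero differences and the mean absolute difference. The two sequences below are hand-picked toy values for illustration only, not measured data, and `stability_stats` is a hypothetical helper rather than part of the patented feature.

```python
def stability_stats(diffs):
    """Return (fraction of zero differences, mean absolute difference)."""
    n = len(diffs)
    zero_fraction = sum(1 for d in diffs if d == 0) / n
    mean_abs = sum(abs(d) for d in diffs) / n
    return zero_fraction, mean_abs


# Toy values chosen by hand; they only mimic the qualitative behavior.
cover_like = [0, 0, 1, 0, -1, 0, 0, 1]     # small, concentrated differences
stego_like = [3, -2, 4, 0, -5, 2, -3, 4]   # larger, more variable differences

print(stability_stats(cover_like))
print(stability_stats(stego_like))
```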

Figure 3 shows, for a voiced segment of Cover speech and of Stego speech, the pitch delay sequence and the distribution of the intra-group difference $D_{\mathrm{intra}}$. The Cover speech is an AMR file in 12.2 kbps mode generated by a public AMR encoder, and the Stego speech is the AMR sample generated by embedding at maximum capacity with the steganographic algorithm of reference [1]. Figure 3(a) shows that the pitch delays across subframes are steadier in Cover speech than in Stego speech. Figure 3(b) shows the distribution of the intra-group difference $D_{\mathrm{intra}}$ in Cover speech and Stego speech, and the two clearly differ: in Cover speech the differences are small and concentrated, consecutive differences change little, and runs of consecutive zero differences occur; in Stego speech the differences are larger, consecutive differences vary widely, and zero differences are rare. The comparison indicates that embedding does disrupt the short-term stability between adjacent pitch delays in voiced segments.

A Markov transition probability matrix measures the probability that a variable moves from one state to another, so it can represent the change between consecutive differences. Computing the Markov transition probability of the intra-group difference of pitch delay subframe groups as a feature captures the difference between Cover speech and Stego speech and makes it possible to distinguish them.

The Markov transition probability distributions of the intra-group difference of pitch delay subframe groups were computed separately for Cover speech and Stego speech and are shown in Figure 4. The Cover speech is an AMR file in 12.2 kbps mode generated by a public AMR encoder, and the Stego speech is the AMR sample generated by embedding at maximum capacity with the steganographic algorithm of reference [1]. In Cover speech the feature distribution has a pronounced peak in its central part, whereas in Stego speech the distribution is comparatively flat. The Markov transition probability of the intra-group difference therefore differs clearly between Cover speech and Stego speech.

The comparisons in Figure 3 and Figure 4 show that the Markov transition probability feature of the intra-group difference of pitch delay subframe groups is effective as a classification feature for distinguishing cover audio from stego audio.

2. Steganalysis detection

2.1 Classifier training

Step 2.1.1. Take WAV samples as input, generate cover samples and the corresponding stego samples, and extract the Markov transition probability classification feature of the intra-group difference of pitch delay subframe groups according to the method of section 1.

Step 2.1.2. After step 2.1.1, the training-set cover samples and an equal number of stego samples for each of the two embedding algorithms are available; different numbers of stego samples and cover samples are then randomly selected to train the steganalysis model with the SVM classifier.

2.2 Steganalysis detection

Step 2.2.1. Extract the steganalysis feature set of the sample under test.
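The training and detection flow of sections 2.1 and 2.2 can be sketched as follows. The text does not state which SVM implementation, kernel or hyperparameters were used, so this sketch substitutes a small linear SVM trained by subgradient descent on the hinge loss, written in plain Python so that it runs without third-party packages. All names (`train_linear_svm`, `predict`), the hyperparameter values and the 2-D toy feature vectors are assumptions; in practice the feature vectors would be the flattened $M1_{\mathrm{intra}}$ matrices and a library SVM would normally be used.

```python
def train_linear_svm(features, labels, epochs=200, lr=0.1, reg=0.01):
    """Train (w, b) by subgradient descent on the L2-regularized hinge loss.

    labels must be +1 (stego) or -1 (cover). Samples are visited in a fixed
    order, so the result is deterministic.
    """
    dim = len(features[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The regularization term shrinks w slightly on every step.
            w = [wi * (1.0 - lr * reg) for wi in w]
            if margin < 1.0:
                # Hinge loss is active: push this sample toward the
                # correct side of the margin.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(model, x):
    """Return +1 (stego) or -1 (cover) for one feature vector."""
    w, b = model
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy 2-D features standing in for flattened transition matrices.
train_x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_y = [-1, -1, 1, 1]   # -1 = cover, +1 = stego

model = train_linear_svm(train_x, train_y)        # section 2.1: training
print(predict(model, [0.95, 0.05]))               # section 2.2: detection
print(predict(model, [0.05, 0.95]))
```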

2.3 Experimental results of steganalysis

To verify the effectiveness of the proposed algorithm, steganalysis models were trained for the different steganography methods; the experimental results are shown in Table 2 and Table 3. TPR denotes the rate at which stego audio (Stego) is detected as Stego, and TNR denotes the rate at which cover audio (Cover) is detected as Cover.
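The two metrics can be computed from true and predicted labels as sketched below. The function name `tpr_tnr` and the label encoding (+1 for Stego, -1 for Cover) are assumptions made for illustration, and the label lists are toy values rather than experimental results.

```python
def tpr_tnr(true_labels, predicted_labels, stego=1, cover=-1):
    """Return (TPR, TNR): the fraction of Stego samples detected as Stego
    and the fraction of Cover samples detected as Cover."""
    pairs = list(zip(true_labels, predicted_labels))
    n_stego = sum(1 for t, _ in pairs if t == stego)
    n_cover = sum(1 for t, _ in pairs if t == cover)
    tp = sum(1 for t, p in pairs if t == stego and p == stego)
    tn = sum(1 for t, p in pairs if t == cover and p == cover)
    return tp / n_stego, tn / n_cover


# Toy labels: 4 stego samples (one missed) and 4 cover samples (all correct).
truth = [1, 1, 1, 1, -1, -1, -1, -1]
preds = [1, 1, 1, -1, -1, -1, -1, -1]
print(tpr_tnr(truth, preds))
```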

The experimental results show that, in the 12.2 kbps, 10.2 kbps, 7.95 kbps, 7.40 kbps, 6.70 kbps and 5.90 kbps coding modes, the classification model of the invention detects both existing pitch-delay-modification steganography methods well: at a relative embedding rate of 50%, the detection rate exceeds 95% in every case.

Table 2. Detection performance against the steganographic algorithm of reference [1]

Table 3. Detection performance against the steganographic algorithm of reference [2]

The specific embodiments described herein merely illustrate the spirit of the invention. Those skilled in the art to which the invention belongs may modify or supplement the described embodiments, or substitute similar approaches, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.

Claims

1. An AMR pitch delay steganalysis method based on the Markov transition probability feature of intra-group differences of pitch delay subframe groups, characterized by comprising:

Step 1. Extracting features using the Markov transition probability of the intra-group difference of pitch delay subframe groups as a measure of short-term stability, specifically:

Step 1.1. Constructing the intra-group difference of pitch delay subframe groups: in AMR speech compression coding, 20 ms of speech forms one frame, each frame is divided into four 5 ms subframes, and each subframe has one pitch delay; within a frame, the pitch delays of the second subframe $T_2$ and the fourth subframe $T_4$ are obtained by correlation within a limited interval based on those of the first subframe $T_1$ and the third subframe $T_3$ respectively; as shown in Figure 2, the first subframe $T_1$ and the second subframe $T_2$ of a frame form one subframe group, and the third subframe $T_3$ and the fourth subframe $T_4$ form another;

For ease of description, the pitch delay sequence of an AMR speech signal is written by subframe group as $P = (p_{11}, p_{12}, \ldots, p_{t1}, p_{t2}, \ldots, p_{N1}, p_{N2})$, where $N$ is the total number of subframe groups in the AMR speech, $t$ is the time-ordered index of a subframe group, and $p_{t1}$ and $p_{t2}$ are the pitch delays of the first and second subframes of subframe group $t$; because pitch delays in voiced speech segments have short-term stability, the intra-group difference is computed to measure the stability between the two subframes of a group; the intra-group difference $D_{\mathrm{intra}}$ is given by Formula 1, with $t \in [1, N]$:

$$D_{\mathrm{intra}}(t) = p_{t1} - p_{t2} \qquad \text{(Formula 1)}$$

Step 1.2. Constructing the Markov transition probability feature of the intra-group difference of pitch delay subframe groups: a Markov transition probability matrix measures the probability that a variable moves from one state to another, so it can represent how the intra-group difference changes between consecutive subframe groups; the Markov transition probability of the intra-group difference is computed as the steganalysis feature, which captures how the intra-group differences of Cover speech and Stego speech differ and thereby distinguishes the two;

The Markov transition probability feature of the intra-group difference is denoted $M1_{\mathrm{intra}}$ and is calculated according to Formula 2: $M1_{\mathrm{intra}}(i, j)$ is the Markov transition probability that $D_{\mathrm{intra}}(t+1)$ equals $j$ given that $D_{\mathrm{intra}}(t)$ equals $i$, where $t$ is the time index of the intra-group differences and $S_0$ is the total number of intra-group differences computed;

According to the AMR coding scheme, within a frame the pitch delays of the second subframe $T_2$ and the fourth subframe $T_4$ are obtained by correlation within a limited interval based on those of the first subframe $T_1$ and the third subframe $T_3$ respectively; therefore, for each bit rate the interval of the intra-group difference is fixed, which determines the range of $D_{\mathrm{intra}}$ and the corresponding dimension of the $M1_{\mathrm{intra}}$ feature; the Markov transition probability of the intra-group difference of pitch delay subframe groups is extracted from the AMR speech and used as the steganalysis classification feature for pitch delay modification;

Step 2. Using an SVM classifier to perform binary classification of the Cover and Stego samples of AMR speech and carry out steganalysis detection.

2. The AMR pitch delay steganalysis method based on the Markov transition probability feature of intra-group differences of pitch delay subframe groups according to claim 1, characterized in that step 2 specifically comprises:

Step 2.1. Classifier training, specifically:

Step 2.1.1. Taking WAV samples as input, generating cover samples and the corresponding stego samples, and extracting the Markov transition probability classification feature of the intra-group difference of pitch delay subframe groups according to the method of step 1;

Step 2.1.2. After step 2.1.1, obtaining the training-set cover samples and an equal number of stego samples for each of the two embedding algorithms, then randomly selecting different numbers of stego samples and cover samples to train the steganalysis model with the SVM classifier;

Step 2.2. Steganalysis detection, specifically:

Steganalysis detection with the above steganalysis model comprises the following steps:

Step 2.2.1. Extracting the steganalysis feature set of the sample under test;

Step 2.2.2. Feeding the features into the trained steganalysis model to obtain the steganalysis decision for the sample.