
CN109192217A - General information hiding detection method for multi-class low-rate compressed speech steganography - Google Patents

General information hiding detection method for multi-class low-rate compressed speech steganography

Info

Publication number
CN109192217A
CN109192217A (application CN201810884205.9A)
Authority
CN
China
Prior art keywords
acl
steganography
symbol
frame
network
Prior art date
Legal status
Granted
Application number
CN201810884205.9A
Other languages
Chinese (zh)
Other versions
CN109192217B (en)
Inventor
李松斌
刘鹏
杨洁
Current Assignee
Research Station Of South China Sea Institute Of Acoustics Chinese Academy Of Sciences
Institute of Acoustics CAS
Original Assignee
Research Station Of South China Sea Institute Of Acoustics Chinese Academy Of Sciences
Institute of Acoustics CAS
Priority date
Filing date
Publication date
Application filed by Research Station Of South China Sea Institute Of Acoustics Chinese Academy Of Sciences and Institute of Acoustics CAS
Priority to CN201810884205.9A priority Critical patent/CN109192217B/en
Publication of CN109192217A publication Critical patent/CN109192217A/en
Application granted granted Critical
Publication of CN109192217B publication Critical patent/CN109192217B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/0018 - Speech coding using phonetic or linguistical decoding of the source; Reconstruction using text-to-speech synthesis
    • G10L19/04 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, using predictive techniques
    • G10L19/16 - Vocoder architecture
    • G10L19/18 - Vocoders using multiple modes
    • G10L19/24 - Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract


The invention discloses a general information hiding detection method for multi-class low-rate compressed speech steganography, used to detect bitstreams produced by the G.723.1 low-rate speech encoder. The method comprises the following steps. Step 1) based on the code elements in the low-rate compressed speech bitstream, construct a code element spatiotemporal correlation network (CESN) describing the intra-frame and inter-frame correlations of all code elements. Step 2) use the strongly correlated edges of the CESN to further construct a code element Bayesian network (CEBN) for steganalysis. Step 3) learn the network parameters of the CEBN from training samples and obtain a threshold J_thr. Step 4) for a given speech segment of N frames, compute the speech steganography index J from the CEBN parameters; when J < J_thr, the segment is judged to be a non-steganographic speech segment, and when J ≥ J_thr, it is judged to be a steganographic speech segment. The general steganalysis capability of the method of the invention is clearly higher than that of the general steganalysis method based on uncompressed-domain MFCC features.

Description

General information hiding detection method for multi-class low-rate compressed speech steganography
Technical field
The present invention relates to the field of information security technology, and in particular to a general information hiding detection method for multi-class low-rate compressed speech steganography.
Background technique
VoIP (Voice over Internet Protocol) speech information hiding is a technique that embeds secret information into a VoIP voice carrier so that the secret information is difficult for a regulator to discover. VoIP systems use multiple speech coders, including G.711, G.726, G.728, G.729, G.723.1, iLBC, AMR and their derived versions. These coders can be roughly divided into two classes: high-rate waveform coders based on pulse code modulation (Pulse Code Modulation, PCM), typified by G.711; and medium/low-rate hybrid coders based on analysis-by-synthesis linear predictive coding (Analysis by Synthesis-Linear Predictive Coding, AbS-LPC), typified by G.723.1, G.729 and AMR. Since high-rate speech coders are rarely used in actual VoIP systems, low-rate coders are mostly adopted to save bandwidth. At present, common speech information hiding methods embed the secret information in AbS-LPC low-rate compressed speech coding. According to the existing literature, information hiding methods based on AbS-LPC low-rate compressed speech coders can be divided into three classes according to the embedding position: the first class hides information in the pitch synthesis filter; the second class hides information in the LPC synthesis filter; the third class hides information by directly modifying code elements in the compressed speech bitstream.
Information hiding detection technology, whose purpose is to detect whether hidden information exists in a speech bitstream, is also called speech steganalysis. It has developed in step with speech information hiding; the two oppose and promote each other. Existing AbS-LPC low-rate compressed speech steganalysis methods are likewise divided into three classes according to the targeted hiding method: the first class targets information hiding based on the pitch synthesis filter; the second class targets information hiding based on the LPC synthesis filter; the third class targets information hiding by code element modification in the speech bitstream.
The above three classes of AbS-LPC low-rate compressed speech steganalysis methods mainly comprise three steps: first, various features are extracted from the AbS-LPC low-rate compressed speech; then feature fusion is performed, with feature selection or dimensionality reduction if the feature dimension is too high; finally the processed features are used as feature vectors to train a classifier with a support vector machine (Support Vector Machine, SVM), realizing AbS-LPC low-rate compressed speech steganalysis. These methods have two main characteristics: first, they follow the "feature extraction, feature processing, SVM classification" framework; second, they are designed to detect a specific information hiding method and only use specific coding information or specific code element information during feature extraction, so they cannot perform general steganalysis. To perform general steganalysis, multiple steganalysis methods would have to be combined, whose complexity is too high to meet application requirements. Some steganalysis methods extract speech features in the uncompressed domain, such as Mel-cepstral features; such methods are general steganalysis methods and can also be used for AbS-LPC low-rate speech coding steganalysis. However, they are mainly designed for steganalysis of uncompressed speech files and do not exploit the AbS-LPC coding principle, so their detection accuracy on AbS-LPC low-rate compressed speech is relatively low and their generalization ability is weak. An efficient general steganalysis method for AbS-LPC low-rate compressed speech is therefore needed.
Summary of the invention
The object of the invention is to overcome the above technical deficiencies and propose a general information hiding detection method for multi-class low-rate compressed speech steganography, used to detect the coding of the G.723.1 low-rate speech encoder. The method comprises the following steps:
Step 1) based on the code elements in the low-rate compressed speech bitstream, construct a code element spatiotemporal correlation network CESN describing the intra-frame and inter-frame correlations of all code elements;
Step 2) use the strongly correlated edges of the CESN to further construct a code element Bayesian network CEBN for steganalysis;
Step 3) learn the network parameters of the CEBN from training samples and obtain a threshold J_thr;
Step 4) for a given speech segment of N frames, compute the speech steganography index J from the CEBN parameters; when J < J_thr, the segment is judged to be a non-steganographic speech segment; when J ≥ J_thr, it is judged to be a steganographic speech segment.
As an improvement of the above method, step 1) specifically comprises:
Step 1-1) each frame of the bitstream of the G.723.1 low-rate speech encoder consists of 24 code elements: 3 LPC vector quantization indices VQ_1, VQ_2 and VQ_3; 4 adaptive codebook delays ACL_0, ACL_1, ACL_2 and ACL_3; 4 combined gains GAIN_0, GAIN_1, GAIN_2 and GAIN_3; 5 pulse position indices POS_0, POS_1, POS_2, POS_3 and MSBPOS; 4 pulse sign indices PSIG_0, PSIG_1, PSIG_2 and PSIG_3; and 4 grid indices GRID_0, GRID_1, GRID_2 and GRID_3;
Step 1-2) denote the 24 code elements as C_i, i = 1, 2, ..., 24, and construct a directed graph with the C_i as vertices and the correlations within and between frames as edges, denoted D = <V, E>, where V is the set of vertices of D, with v_i[m] denoting the i-th code element of frame m, and E is the set of edges of D, each edge being a directed edge from vertex v_i[p] to vertex v_j[q]; an edge with p = q and i ≠ j is an intra-frame edge, and an edge with q > p is an inter-frame edge; only correlations with q - p ∈ {0, 1} are analyzed, so a frame of 24 code elements yields 276 intra-frame edges and 576 inter-frame edges, 852 edges in total;
Step 1-3) compute the relative index I_c(C_i, C_j) of code elements C_i and C_j, where r_i and r_j denote the maximum values of C_i and C_j, p(C_i = c_i) and p(C_j = c_j) denote the marginal probabilities of C_i = c_i and C_j = c_j, and p(C_i = c_i, C_j = c_j) denotes their joint probability; if C_i and C_j are mutually independent, then p(C_i = c_i)p(C_j = c_j) = p(C_i = c_i, C_j = c_j) holds for arbitrary c_i and c_j, and I_c(C_i, C_j) = 0;
Step 1-4) among the 852 edges, retain the strong edges with I_c(C_i, C_j) > 0.5; the code element spatiotemporal correlation network CESN then contains 4 intra-frame strong edges and 7 inter-frame strong edges, 11 edges in total; the intra-frame strong edges are ACL_0-ACL_2, VQ_1-VQ_2, VQ_1-VQ_3 and VQ_2-VQ_3; the inter-frame strong edges are ACL_0-ACL_0, ACL_0-ACL_2, ACL_2-ACL_0, ACL_2-ACL_2, VQ_1-VQ_1, VQ_2-VQ_2 and VQ_3-VQ_3.
As an improvement of the above method, step 2) specifically comprises:
Step 2-1) take the speech frame class as the root node O; the speech frame class is either non-steganographic, denoted 0, or steganographic, denoted 1;
Step 2-2) take the 24 code element values as child nodes of the root node O; merge the four grid index nodes GRID_0, GRID_1, GRID_2 and GRID_3 into a single node, denoted GRID, with a bit allocation of 4; merge the code elements ACL_1 and ACL_3 into a single node, denoted ACL, with a bit allocation of 4; after merging there are 20 child nodes, and directed edges from O to the 20 nodes are created;
Step 2-3) from the intra-frame strong edges ACL_0-ACL_2, VQ_1-VQ_2, VQ_1-VQ_3 and VQ_2-VQ_3 obtained in step 1-4), delete the directed edge VQ_1-VQ_3 from VQ_1 to VQ_3;
Step 2-4) take the values in the adjacent frame as child nodes ACL_0', ACL_2', VQ_1', VQ_2', VQ_3' of the nodes ACL_0, ACL_2, VQ_1, VQ_2, VQ_3 respectively, creating the seven directed edges ACL_0 to ACL_0', ACL_0 to ACL_2', ACL_2 to ACL_2', ACL_2 to ACL_0', VQ_1 to VQ_1', VQ_2 to VQ_2', VQ_3 to VQ_3'; remove the two edges ACL_0 to ACL_2' and ACL_2 to ACL_0'; add the directed edges O to ACL_0', O to ACL_2', O to VQ_1', O to VQ_2', O to VQ_3' to the network;
Step 2-5) the finally constructed code element Bayesian network CEBN is a multi-layer network consisting of 26 nodes that contains all code element information, among which the 15 nodes GAIN_0, GAIN_1, GAIN_2, GAIN_3, POS_0, POS_1, POS_2, POS_3, MSBPOS, PSIG_0, PSIG_1, PSIG_2, PSIG_3, ACL and GRID are isolated nodes.
As an improvement of the above method, step 3) specifically comprises:
Step 3-1) denote the nodes of the code element Bayesian network CEBN by the random variables X_1, X_2, ..., X_26, where X_1 corresponds to the root node O and the other random variables correspond to the child nodes; denote the values of the random variables by x_1, x_2, ..., x_26; the joint probability distribution of the network is then the product over all 26 nodes of the conditional probabilities P(X_i | Pa(X_i)), where Pa(X_i) denotes the parent nodes of the random variable X_i;
Step 3-2) let the random variable X_i have K_i possible values, and let θ_ijk denote the conditional probability that X_i takes its k-th value when Pa(X_i) takes its j-th value; then θ_ijk is expressed as:
θ_ijk = P(X_i = x_ik | Pa(X_i) = Pa(X_i)_j)
Step 3-3) learn the values of the parameters θ_ijk of the code element Bayesian network CEBN by Bayesian network parameter learning;
Step 3-4) compute the threshold J_thr from the training samples.
As an improvement of the above method, step 3-3) specifically comprises:
Step 3-3-1) Bayesian network parameter learning combines the prior distribution π(θ) over the parameter θ with the sample information χ to obtain the posterior distribution π(θ | χ); the prior distribution π(θ) is a Dirichlet distribution Dir(·) with hyperparameters α_ijk, 1 ≤ k ≤ K_i, whose density is written in terms of the gamma function Γ(·);
Step 3-3-2) let β_ijk be the number of samples in χ satisfying X_i = x_ik and Pa(X_i) = Pa(X_i)_j; since the posterior distribution π(θ | χ) also obeys a Dirichlet distribution, it is determined by the hyperparameters α_ijk and the counts β_ijk;
Step 3-3-3) the maximum a posteriori estimate of the parameter θ_ijk of the network CEBN is obtained from this posterior.
As an improvement of the above method, step 3-4) specifically comprises:
Step 3-4-1) assume the training samples contain M speech segments, and compute for each segment the steganography index in the non-steganographic case and the steganography index in the steganographic case; the steganography indices of all training samples in the non-steganographic case form the set J_C = {J_c1, J_c2, ..., J_cM}, and those in the steganographic case form the set J_S = {J_s1, J_s2, ..., J_sM};
Step 3-4-2) the threshold J_thr is chosen to maximize CNT(J_C : J_cl < J_thr) + CNT(J_S : J_sl ≥ J_thr), where CNT(J_C : J_cl < J_thr) and CNT(J_S : J_sl ≥ J_thr) denote, respectively, the number of non-steganography indices in J_C satisfying J_cl < J_thr and the number of steganography indices in J_S satisfying J_sl ≥ J_thr, 1 ≤ l ≤ M.
As an improvement of the above method, step 4) specifically comprises:
Step 4-1) for a given speech segment of N frames, compute the speech steganography index measuring the overall degree of steganography of the segment, J = N_stego / N, where N_stego denotes the number of frames in the segment judged to be steganographic;
each speech frame is classified by bottom-up diagnostic reasoning, i.e. the probability of the parent node is computed from the known distributions of the child nodes: for x_1 = 0 and x_1 = 1, the resulting values are the posterior probabilities that the speech frame is a non-steganographic frame and a steganographic frame, respectively, given the observed values X_i = x_i, i = 2, 3, ..., 26; if the probability of x_1 = 1 is greater than that of x_1 = 0, the frame is considered steganographic, otherwise non-steganographic; N_stego is obtained in this way;
Step 4-2) when J < J_thr, the segment is judged to be non-steganographic speech; when J ≥ J_thr, it is judged to be steganographic speech.
The invention has the following advantages:
1. The general steganalysis capability of the method of the invention is clearly higher than that of the general steganalysis method based on uncompressed-domain MFCC features;
2. When the speech duration is short, the advantage of the method of the invention over traditional feature-extraction-plus-SVM steganography detection methods is even more obvious;
3. The method of the invention still has a high general steganalysis capability when the network complexity is low;
4. The method of the invention can be used both as a dedicated steganalysis method and as a general steganalysis method, whereas current dedicated steganalysis methods can only detect part of the steganography methods.
Description of the drawings
Fig. 1 is a diagram of the code element spatiotemporal correlation network proposed by the invention;
Fig. 2 is the code element Bayesian network proposed by the invention.
Specific embodiment
Analysis-by-synthesis linear predictive coding (Analysis by Synthesis-Linear Predictive Coding, AbS-LPC) is widely used in a variety of low-rate speech coders. Existing AbS-LPC low-rate compressed speech steganography detection methods are designed for specific types of steganography and generalize poorly. The invention therefore proposes a general information hiding detection method for multi-class low-rate compressed speech steganography. The code elements in an AbS-LPC low-rate compressed speech bitstream exhibit spatiotemporal correlations, and every AbS-LPC low-rate compressed speech steganography method ultimately changes code element values. Starting from the code elements, the invention first builds a code element spatiotemporal correlation network, then uses the strongly correlated edges of that network to build a code element Bayesian network, learns the network parameters with a Dirichlet distribution as the prior, and performs general steganography detection of multi-class low-rate compressed speech steganography by Bayesian inference.
The invention is further described below with reference to the drawings and specific embodiments.
The invention proposes a general information hiding detection method for multi-class low-rate compressed speech steganography; the specific steps of the method are as follows:
Step S1) based on the code elements in the low-rate compressed speech bitstream, construct a network of the intra-frame and inter-frame correlations of all code elements, called the code element spatiotemporal correlation network (Code Element Spatiotemporal Network, CESN);
Step S1) specifically comprises:
Step S1-1) each frame of the bitstream of the G.723.1 low-rate speech encoder consists of 24 code elements: the LPC vector quantization indices VQ_1, VQ_2, VQ_3; the adaptive codebook delays ACL_0, ACL_1, ACL_2, ACL_3; the combined gains GAIN_0, GAIN_1, GAIN_2, GAIN_3; the pulse position indices POS_0, POS_1, POS_2, POS_3, MSBPOS; the pulse sign indices PSIG_0, PSIG_1, PSIG_2, PSIG_3; and the grid indices GRID_0, GRID_1, GRID_2, GRID_3. Denote the 24 code elements as C_i (i = 1, 2, ..., 24) and construct a directed graph with the C_i as vertices and the correlations within and between frames as edges, as shown in Fig. 1, denoted D = <V, E>, where V is the set of vertices of D, with v_i[m] denoting the i-th code element of frame m, and E is the set of edges of D, each edge being a directed edge from vertex v_i[p] to vertex v_j[q]. An edge with p = q and i ≠ j is an intra-frame edge; an edge with q > p is an inter-frame edge. Since the strength of the inter-frame correlation decays with time (the larger the frame gap, the weaker the correlation), only the correlations between adjacent frames are considered for simplicity, i.e. only edges with q - p ∈ {0, 1} are analyzed. A frame of 24 code elements therefore yields 276 intra-frame edges and 576 inter-frame edges, 852 edges in total.
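For illustration only, the following minimal Python sketch enumerates the candidate edges of the CESN for a pair of adjacent frames; the list of code element names and the assumption that the code elements have already been parsed out of the G.723.1 bitstream are illustrative choices, not details taken from the patent.

```python
from itertools import combinations, product

# Hypothetical name list for the 24 code elements of one G.723.1 frame
# (extracting them from an actual bitstream is assumed to be done elsewhere).
CODE_ELEMENTS = (
    ["VQ1", "VQ2", "VQ3"]
    + [f"ACL{k}" for k in range(4)]
    + [f"GAIN{k}" for k in range(4)]
    + [f"POS{k}" for k in range(4)] + ["MSBPOS"]
    + [f"PSIG{k}" for k in range(4)]
    + [f"GRID{k}" for k in range(4)]
)
assert len(CODE_ELEMENTS) == 24

def candidate_edges():
    """Candidate CESN edges for a pair of adjacent frames.

    intra: pairs of different code elements inside the same frame (24 choose 2 = 276)
    inter: element in frame m paired with any element in frame m+1 (24 * 24 = 576)
    """
    intra = list(combinations(CODE_ELEMENTS, 2))
    inter = list(product(CODE_ELEMENTS, CODE_ELEMENTS))
    return intra, inter

intra, inter = candidate_edges()
print(len(intra), len(inter), len(intra) + len(inter))  # 276 576 852
```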
Step S1-2) the CESN contains too many edges to analyze directly. Since the strength of the correlation differs from edge to edge, the weakly correlated edges can be removed and only the strongly correlated edges retained to simplify the CESN. To quantify the strength of the correlation between code elements, the invention defines the relative index I_c(C_i, C_j) of code elements C_i and C_j, where r_i and r_j denote the maximum values of C_i and C_j, p(C_i = c_i) and p(C_j = c_j) denote the marginal probabilities of C_i = c_i and C_j = c_j, and p(C_i = c_i, C_j = c_j) denotes their joint probability. If C_i and C_j are mutually independent, then p(C_i = c_i)p(C_j = c_j) = p(C_i = c_i, C_j = c_j) holds for arbitrary c_i and c_j, and I_c(C_i, C_j) = 0; the stronger the correlation between C_i and C_j, the larger I_c.
Using 0.5 as the threshold on I_c, the 852 edges are classified and only the strong edges with I_c > 0.5 are retained. The simplified CESN contains 4 intra-frame strong edges and 7 inter-frame strong edges, 11 edges in total. The intra-frame strong edges are ACL_0-ACL_2, VQ_1-VQ_2, VQ_1-VQ_3 and VQ_2-VQ_3; the inter-frame strong edges are ACL_0-ACL_0, ACL_0-ACL_2, ACL_2-ACL_0, ACL_2-ACL_2, VQ_1-VQ_1, VQ_2-VQ_2 and VQ_3-VQ_3.
The statistics show that the correlations between code elements of different types are weak while those between code elements of the same type are strong: VQ_1, VQ_2 and VQ_3 have strong spatiotemporal correlations, as do ACL_0 and ACL_2. This is because the LPC vector quantization indices VQ_1, VQ_2, VQ_3 result from the short-term analysis of the speech signal, while the adaptive codebook delays ACL_0 and ACL_2 result from the long-term analysis, i.e. the pitch period, so their correlations are strong. The code elements ACL_1 and ACL_3 represent the differences from the previous subframe and are determined by ACL_0 and ACL_2 respectively, so their correlations are weaker. The remaining code elements represent the residual signal after short-term and long-term prediction and show no obvious spatiotemporal correlation.
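The defining formula of I_c is not reproduced in this text; as a rough stand-in, the sketch below uses the mutual information between two code element value sequences, normalized by log(min(r_i, r_j)) so that independent code elements give 0. The normalization and the sequence interface are assumptions, not the patent's exact definition.

```python
import numpy as np

def relative_index(xs, ys, r_x, r_y):
    """Mutual-information-style correlation index of two code element sequences.

    xs, ys -- equally long integer sequences with values in [0, r_x) and [0, r_y);
              for an inter-frame edge, ys should already be shifted by one frame.
    r_x, r_y -- numbers of possible values (the maximum values r_i, r_j above).
    This sketch divides by log(min(r_x, r_y)) so the index is 0 for independent
    code elements and at most 1; the patent's exact normalization may differ.
    """
    joint = np.zeros((r_x, r_y))
    for x, y in zip(xs, ys):
        joint[x, y] += 1
    joint /= joint.sum()
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    mi = 0.0
    for a in range(r_x):
        for b in range(r_y):
            if joint[a, b] > 0:
                mi += joint[a, b] * np.log(joint[a, b] / (px[a] * py[b]))
    return mi / np.log(min(r_x, r_y))

def strong_edges(sequences, ranges, candidate_pairs, threshold=0.5):
    """Keep the candidate edges whose relative index exceeds the threshold (0.5 here)."""
    return [(a, b) for a, b in candidate_pairs
            if relative_index(sequences[a], sequences[b], ranges[a], ranges[b]) > threshold]
```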
Step S2) use the strongly correlated edges of the code element spatiotemporal correlation network CESN to further construct the code element Bayesian network (Code Element Bayesian Network, CEBN) for steganalysis;
Step S2) specifically comprises:
Step S2-1) take the speech frame class as the root node O; there are two classes, non-steganographic (denoted 0) and steganographic (denoted 1).
Step S2-2) take the 24 code element values as child nodes of the root node O. Since the grid indices GRID_0, GRID_1, GRID_2, GRID_3 are weakly correlated and each accounts for only a very small share of the bitstream (1 bit each), these 4 nodes are merged into a single node, denoted GRID, with a bit allocation of 4. Similarly, the code elements ACL_1 and ACL_3 are merged into a single node, denoted ACL, with a bit allocation of 4. After merging there are 20 child nodes, and directed edges from O to the 20 nodes are created. The values of each node are binned in the same way as in the correlation analysis: the values of a code element whose range exceeds T are divided into T intervals.
Step S2-3) from the intra-frame strong edges ACL_0-ACL_2, VQ_1-VQ_2, VQ_1-VQ_3, VQ_2-VQ_3, construct the four directed edges ACL_0 to ACL_2, VQ_1 to VQ_2, VQ_1 to VQ_3, VQ_2 to VQ_3. Since the relative index of VQ_2-VQ_3 is larger than that of VQ_1-VQ_3, i.e. the value of VQ_3 is influenced more by VQ_2 than by VQ_1, the directed edge VQ_1 to VQ_3 is removed.
Step S2-4) from the inter-frame strong edges ACL_0-ACL_0, ACL_0-ACL_2, ACL_2-ACL_0, ACL_2-ACL_2, VQ_1-VQ_1, VQ_2-VQ_2, VQ_3-VQ_3, take the values in the adjacent frame as child nodes ACL_0', ACL_2', VQ_1', VQ_2', VQ_3' of the nodes ACL_0, ACL_2, VQ_1, VQ_2, VQ_3 respectively, and construct the seven directed edges ACL_0 to ACL_0', ACL_0 to ACL_2', ACL_2 to ACL_2', ACL_2 to ACL_0', VQ_1 to VQ_1', VQ_2 to VQ_2', VQ_3 to VQ_3'. Since the inter-frame edge ACL_0-ACL_0 is more strongly correlated than ACL_2-ACL_0, and ACL_2-ACL_2 is more strongly correlated than ACL_0-ACL_2, the two edges ACL_0 to ACL_2' and ACL_2 to ACL_0' are removed. In addition, since the values of ACL_0', ACL_2', VQ_1', VQ_2', VQ_3' are also influenced by the speech frame class, the directed edges O to ACL_0', O to ACL_2', O to VQ_1', O to VQ_2', O to VQ_3' are added to the network.
The finally constructed code element Bayesian network, shown in Fig. 2, is a multi-layer network consisting of 26 nodes that contains all code element information; among them the 15 nodes GAIN_0, GAIN_1, GAIN_2, GAIN_3, POS_0, POS_1, POS_2, POS_3, MSBPOS, PSIG_0, PSIG_1, PSIG_2, PSIG_3, ACL and GRID are isolated nodes.
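Reading off Fig. 2 and steps S2-1) to S2-4), the CEBN can be written down as an explicit edge list; the sketch below is one such encoding, with node names chosen for illustration (nodes of the adjacent frame are written with a trailing "p").

```python
# Directed edges of the CEBN described in steps S2-1) to S2-4).
# O is the frame-class root node; names ending in "p" denote the same code
# element in the adjacent (next) frame.
CURRENT_FRAME_NODES = [
    "VQ1", "VQ2", "VQ3", "ACL0", "ACL2", "ACL",
    "GAIN0", "GAIN1", "GAIN2", "GAIN3",
    "POS0", "POS1", "POS2", "POS3", "MSBPOS",
    "PSIG0", "PSIG1", "PSIG2", "PSIG3", "GRID",
]
NEXT_FRAME_NODES = ["ACL0p", "ACL2p", "VQ1p", "VQ2p", "VQ3p"]

CEBN_EDGES = (
    [("O", n) for n in CURRENT_FRAME_NODES]               # root -> 20 merged child nodes
    + [("ACL0", "ACL2"), ("VQ1", "VQ2"), ("VQ2", "VQ3")]   # intra-frame strong edges kept (VQ1 -> VQ3 removed)
    + [("ACL0", "ACL0p"), ("ACL2", "ACL2p"),               # inter-frame strong edges kept
       ("VQ1", "VQ1p"), ("VQ2", "VQ2p"), ("VQ3", "VQ3p")]  # (ACL0 -> ACL2p and ACL2 -> ACL0p removed)
    + [("O", n) for n in NEXT_FRAME_NODES]                 # root -> next-frame nodes
)
# 26 nodes in total: O, the 20 current-frame nodes and the 5 next-frame nodes.
```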
Step S3) learn the network parameters of the code element Bayesian network CEBN from training samples;
Step S3) specifically comprises:
Step S3-1) for ease of description, the nodes of the CEBN are denoted by the random variables X_1, X_2, ..., X_26, where X_1 corresponds to the root node O and the other random variables correspond to the child nodes; the values of the random variables are denoted x_1, x_2, ..., x_26. The joint probability distribution of the network is then the product over all 26 nodes of the conditional probabilities P(X_i | Pa(X_i)), where Pa(X_i) denotes the parent nodes of the random variable X_i.
Step S3-2) let the random variable X_i have K_i possible values, and let θ_ijk denote the conditional probability that X_i takes its k-th value when Pa(X_i) takes its j-th value; θ_ijk may then be expressed as
θ_ijk = P(X_i = x_ik | Pa(X_i) = Pa(X_i)_j)
Step S3-3) learning the network parameters essentially means learning the value of each θ_ijk. Bayesian network parameter learning combines the prior distribution π(θ) over the parameter θ with the sample information χ to obtain the posterior distribution π(θ | χ).
Step S3-4) in Bayesian network parameter learning, the prior distribution is usually chosen to be conjugate, i.e. the prior π(θ) and the posterior π(θ | χ) belong to the same family of distributions. The invention uses the common Dirichlet distribution as the prior, with hyperparameters α_ijk, 1 ≤ k ≤ K_i, where Γ(·) is the gamma function and Dir(·) denotes the Dirichlet distribution function.
Step S3-5) let β_ijk be the number of samples in χ satisfying X_i = x_ik and Pa(X_i) = Pa(X_i)_j; since the posterior distribution π(θ | χ) also obeys a Dirichlet distribution, it is determined by the hyperparameters α_ijk and the counts β_ijk, and the maximum a posteriori (MAP) estimate of the network parameter θ is obtained from this posterior.
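The posterior and MAP formulas themselves are not reproduced in this text; under the Dirichlet-prior setting described above they take the standard conjugate form, which the following sketch implements for one row of a conditional probability table. The choice of hyperparameter value is an assumption.

```python
import numpy as np

def map_cpt_row(counts, alpha=2.0):
    """MAP estimate of one row of a CEBN conditional probability table.

    counts[k] -- beta_ijk: number of training frames with X_i = x_ik while Pa(X_i)
                 is fixed at its j-th value.
    alpha     -- Dirichlet hyperparameter alpha_ijk, assumed here to be identical
                 for all k and >= 1 (the concrete choice is an assumption).
    Standard conjugate result: the posterior is Dirichlet(alpha + beta), whose mode
    gives theta_ijk = (alpha + beta_k - 1) / sum over k' of (alpha + beta_k' - 1).
    """
    counts = np.asarray(counts, dtype=float)
    unnormalized = counts + alpha - 1.0
    return unnormalized / unnormalized.sum()

# Example: X_i with K_i = 3 values, observed 5, 0 and 15 times for this parent value.
print(map_cpt_row([5, 0, 15]))  # smoothed probabilities summing to 1
```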
Step S4) for a given speech segment of N frames, the invention defines the speech steganography index J to measure the overall degree of steganography of the segment, i.e. J = N_stego / N, where N_stego denotes the number of frames in the segment judged to be steganographic. The invention decides whether the speech is steganographic by setting a threshold J_thr: when J < J_thr, the segment is judged to be non-steganographic speech; when J ≥ J_thr, it is judged to be steganographic speech.
The threshold J_thr is obtained from the training samples so that the classification accuracy on the training samples is highest. Suppose the training samples contain M speech segments; for each segment, the steganography index in the non-steganographic case and in the steganographic case can be computed. Let the steganography indices of all training samples in the non-steganographic case form the set J_C = {J_c1, J_c2, ..., J_cM}, and those in the steganographic case form the set J_S = {J_s1, J_s2, ..., J_sM}. J_thr is then chosen to maximize CNT(J_C : J_cl < J_thr) + CNT(J_S : J_sl ≥ J_thr), where CNT(J_C : J_cl < J_thr) and CNT(J_S : J_sl ≥ J_thr) denote, respectively, the number of non-steganography indices in J_C satisfying J_cl < J_thr and the number of steganography indices in J_S satisfying J_sl ≥ J_thr. Once J_thr has been obtained, steganography detection can be performed on a given test sample.
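A minimal sketch of the threshold selection follows, assuming that J_thr is searched over the observed index values so that the number of correctly classified training segments is maximized; the candidate grid is an assumption.

```python
def choose_threshold(j_cover, j_stego):
    """Pick J_thr that classifies the most training segments correctly.

    j_cover -- indices J_c1..J_cM of the M non-steganographic training segments
    j_stego -- indices J_s1..J_sM of the M steganographic training segments
    A cover segment counts as correct when its index is below the threshold,
    a stego segment when its index is at or above it.
    """
    candidates = sorted(set(j_cover) | set(j_stego) | {0.0, 1.0})
    best_thr, best_correct = 0.5, -1
    for thr in candidates:
        correct = sum(j < thr for j in j_cover) + sum(j >= thr for j in j_stego)
        if correct > best_correct:
            best_thr, best_correct = thr, correct
    return best_thr
```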
For a given speech segment of N frames, the probability that each frame is a non-steganographic frame and the probability that it is a steganographic frame can be computed; denote by p_i^cover the probability that the i-th frame is non-steganographic and by p_i^stego the probability that it is steganographic. In principle, if p_i^cover < p_i^stego the frame is a steganographic frame, otherwise it is a non-steganographic frame. However, it is difficult to classify every frame of a segment correctly, so the invention does not decide whether the whole segment is steganographic from the classification result of any single frame, but from the overall degree of steganography of the segment.
Each frame is classified by bottom-up diagnostic reasoning, i.e. the probability of the parent node is computed from the known distributions of the child nodes: for x_1 = 0 and x_1 = 1, the resulting values are the posterior probabilities that the speech frame is a non-steganographic frame and a steganographic frame, respectively, given the observed values X_i = x_i (i = 2, 3, ..., 26). If the probability of x_1 = 1 is greater than that of x_1 = 0, the frame is considered steganographic.
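Putting the detection stage together, the sketch below classifies each frame by comparing the joint probabilities of the observed code element values under the two root-node hypotheses (which are proportional to the posterior of the root node), counts the frames judged steganographic, and thresholds J = N_stego / N. The CPT lookup interface and the equal root prior default are assumptions.

```python
import math

def frame_is_stego(frame_values, log_cpt, log_prior=(math.log(0.5), math.log(0.5))):
    """Bottom-up diagnostic inference for one speech frame.

    frame_values -- dict mapping each non-root CEBN node name to its observed (binned) value
    log_cpt(node, value, frame_class) -- hypothetical lookup returning
        log P(node = value | parents), with the root class fixed to frame_class
        (0 = non-steganographic, 1 = steganographic) and the other parent values
        taken from frame_values.
    log_prior -- (log P(O = 0), log P(O = 1)); equal priors are assumed here.
    The class with the larger joint log-probability also has the larger posterior.
    """
    score = {0: log_prior[0], 1: log_prior[1]}
    for cls in (0, 1):
        for node, value in frame_values.items():
            score[cls] += log_cpt(node, value, cls)
    return score[1] > score[0]

def detect_segment(frames, log_cpt, j_thr):
    """Decide whether a speech segment of N frames is steganographic."""
    n_stego = sum(frame_is_stego(f, log_cpt) for f in frames)
    j = n_stego / len(frames)
    return ("steganographic" if j >= j_thr else "non-steganographic"), j
```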
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solution of the invention. Although the invention has been described in detail with reference to the embodiments, those skilled in the art should understand that modifications or equivalent substitutions of the technical solution of the invention that do not depart from its spirit and scope shall all be covered by the scope of the claims of the invention.

Claims (7)

1. A general information hiding detection method for multi-class low-rate compressed speech steganography, used to detect the coding of the G.723.1 low-rate speech encoder, the method comprising the following steps:
Step 1) based on the code elements in the low-rate compressed speech bitstream, constructing a code element spatiotemporal correlation network CESN describing the intra-frame and inter-frame correlations of all code elements;
Step 2) using the strongly correlated edges of the code element spatiotemporal correlation network CESN to further construct a code element Bayesian network CEBN for steganalysis;
Step 3) learning the network parameters of the code element Bayesian network CEBN from training samples and obtaining a threshold J_thr;
Step 4) for a given speech segment of N frames, computing the speech steganography index J from the network parameters of the code element Bayesian network CEBN; when J < J_thr, judging the segment to be a non-steganographic speech segment; when J ≥ J_thr, judging it to be a steganographic speech segment.
2. The general information hiding detection method for multi-class low-rate compressed speech steganography according to claim 1, characterized in that step 1) specifically comprises:
Step 1-1) each frame of the bitstream of the G.723.1 low-rate speech encoder consists of 24 code elements: 3 LPC vector quantization indices VQ_1, VQ_2 and VQ_3; 4 adaptive codebook delays ACL_0, ACL_1, ACL_2 and ACL_3; 4 combined gains GAIN_0, GAIN_1, GAIN_2 and GAIN_3; 5 pulse position indices POS_0, POS_1, POS_2, POS_3 and MSBPOS; 4 pulse sign indices PSIG_0, PSIG_1, PSIG_2 and PSIG_3; and 4 grid indices GRID_0, GRID_1, GRID_2 and GRID_3;
Step 1-2) denoting the 24 code elements as C_i, i = 1, 2, ..., 24, and constructing a directed graph with the C_i as vertices and the correlations within and between frames as edges, denoted D = <V, E>, where V is the set of vertices of D, with v_i[m] denoting the i-th code element of frame m, and E is the set of edges of D, each edge being a directed edge from vertex v_i[p] to vertex v_j[q]; an edge with p = q and i ≠ j is an intra-frame edge, and an edge with q > p is an inter-frame edge; only correlations with q - p ∈ {0, 1} are analyzed, so a frame of 24 code elements yields 276 intra-frame edges and 576 inter-frame edges, 852 edges in total;
Step 1-3) computing the relative index I_c(C_i, C_j) of code elements C_i and C_j, where r_i and r_j denote the maximum values of C_i and C_j, p(C_i = c_i) and p(C_j = c_j) denote the marginal probabilities of C_i = c_i and C_j = c_j, and p(C_i = c_i, C_j = c_j) denotes their joint probability; if C_i and C_j are mutually independent, then p(C_i = c_i)p(C_j = c_j) = p(C_i = c_i, C_j = c_j) holds for arbitrary c_i and c_j, and I_c(C_i, C_j) = 0;
Step 1-4) among the 852 edges, retaining the strong edges with I_c(C_i, C_j) > 0.5; the code element spatiotemporal correlation network CESN then contains 4 intra-frame strong edges and 7 inter-frame strong edges, 11 edges in total; the intra-frame strong edges are ACL_0-ACL_2, VQ_1-VQ_2, VQ_1-VQ_3 and VQ_2-VQ_3; the inter-frame strong edges are ACL_0-ACL_0, ACL_0-ACL_2, ACL_2-ACL_0, ACL_2-ACL_2, VQ_1-VQ_1, VQ_2-VQ_2 and VQ_3-VQ_3.
3. The general information hiding detection method for multi-class low-rate compressed speech steganography according to claim 2, characterized in that step 2) specifically comprises:
Step 2-1) taking the speech frame class as the root node O, the speech frame class being either non-steganographic, denoted 0, or steganographic, denoted 1;
Step 2-2) taking the 24 code element values as child nodes of the root node O; merging the four grid index nodes GRID_0, GRID_1, GRID_2 and GRID_3 into a single node, denoted GRID, with a bit allocation of 4; merging the code elements ACL_1 and ACL_3 into a single node, denoted ACL, with a bit allocation of 4; after merging there are 20 child nodes, and directed edges from O to the 20 nodes are created;
Step 2-3) from the intra-frame strong edges ACL_0-ACL_2, VQ_1-VQ_2, VQ_1-VQ_3 and VQ_2-VQ_3 obtained in step 1-4), deleting the directed edge VQ_1-VQ_3 from VQ_1 to VQ_3;
Step 2-4) taking the values in the adjacent frame as child nodes ACL_0', ACL_2', VQ_1', VQ_2', VQ_3' of the nodes ACL_0, ACL_2, VQ_1, VQ_2, VQ_3 respectively, creating the seven directed edges ACL_0 to ACL_0', ACL_0 to ACL_2', ACL_2 to ACL_2', ACL_2 to ACL_0', VQ_1 to VQ_1', VQ_2 to VQ_2', VQ_3 to VQ_3'; removing the two edges ACL_0 to ACL_2' and ACL_2 to ACL_0'; adding the directed edges O to ACL_0', O to ACL_2', O to VQ_1', O to VQ_2', O to VQ_3' to the network;
Step 2-5) the finally constructed code element Bayesian network CEBN is a multi-layer network consisting of 26 nodes that contains all code element information, among which the 15 nodes GAIN_0, GAIN_1, GAIN_2, GAIN_3, POS_0, POS_1, POS_2, POS_3, MSBPOS, PSIG_0, PSIG_1, PSIG_2, PSIG_3, ACL and GRID are isolated nodes.
4. The general information hiding detection method for multi-class low-rate compressed speech steganography according to claim 3, characterized in that step 3) specifically comprises:
Step 3-1) denoting the nodes of the code element Bayesian network CEBN by the random variables X_1, X_2, ..., X_26, where X_1 corresponds to the root node O and the other random variables correspond to the child nodes, and denoting the values of the random variables by x_1, x_2, ..., x_26; the joint probability distribution of the network is then the product over all 26 nodes of the conditional probabilities P(X_i | Pa(X_i)), where Pa(X_i) denotes the parent nodes of the random variable X_i;
Step 3-2) letting the random variable X_i have K_i possible values and letting θ_ijk denote the conditional probability that X_i takes its k-th value when Pa(X_i) takes its j-th value; θ_ijk is then expressed as:
θ_ijk = P(X_i = x_ik | Pa(X_i) = Pa(X_i)_j)
Step 3-3) learning the values of the parameters θ_ijk of the code element Bayesian network CEBN by Bayesian network parameter learning;
Step 3-4) computing the threshold J_thr from the training samples.
5. The general information hiding detection method for multi-class low-rate compressed speech steganography according to claim 4, characterized in that step 3-3) specifically comprises:
Step 3-3-1) Bayesian network parameter learning combines the prior distribution π(θ) over the parameter θ with the sample information χ to obtain the posterior distribution π(θ | χ); the prior distribution π(θ) is a Dirichlet distribution Dir(·) with hyperparameters α_ijk, 1 ≤ k ≤ K_i, whose density is written in terms of the gamma function Γ(·);
Step 3-3-2) letting β_ijk be the number of samples in χ satisfying X_i = x_ik and Pa(X_i) = Pa(X_i)_j; since the posterior distribution π(θ | χ) also obeys a Dirichlet distribution, it is determined by the hyperparameters α_ijk and the counts β_ijk;
Step 3-3-3) the maximum a posteriori estimate of the parameter θ_ijk of the network CEBN is obtained from this posterior.
6. The general information hiding detection method for multi-class low-rate compressed speech steganography according to claim 5, characterized in that step 3-4) specifically comprises:
Step 3-4-1) assuming the training samples contain M speech segments, computing for each segment the steganography index in the non-steganographic case and the steganography index in the steganographic case; the steganography indices of all training samples in the non-steganographic case form the set J_C = {J_c1, J_c2, ..., J_cM}, and those in the steganographic case form the set J_S = {J_s1, J_s2, ..., J_sM};
Step 3-4-2) the threshold J_thr is chosen to maximize CNT(J_C : J_cl < J_thr) + CNT(J_S : J_sl ≥ J_thr), where CNT(J_C : J_cl < J_thr) and CNT(J_S : J_sl ≥ J_thr) denote, respectively, the number of non-steganography indices in J_C satisfying J_cl < J_thr and the number of steganography indices in J_S satisfying J_sl ≥ J_thr, 1 ≤ l ≤ M.
7. The general information hiding detection method for multi-class low-rate compressed speech steganography according to claim 6, characterized in that step 4) specifically comprises:
Step 4-1) for a given speech segment of N frames, computing the speech steganography index measuring the overall degree of steganography of the segment, J = N_stego / N, where N_stego denotes the number of frames in the segment judged to be steganographic;
each speech frame is classified by bottom-up diagnostic reasoning, i.e. the probability of the parent node is computed from the known distributions of the child nodes: for x_1 = 0 and x_1 = 1, the resulting values are the posterior probabilities that the speech frame is a non-steganographic frame and a steganographic frame, respectively, given the observed values X_i = x_i, i = 2, 3, ..., 26; if the probability of x_1 = 1 is greater than that of x_1 = 0, the frame is considered steganographic, otherwise non-steganographic; N_stego is obtained in this way;
Step 4-2) when J < J_thr, the segment is judged to be non-steganographic speech; when J ≥ J_thr, it is judged to be steganographic speech.
CN201810884205.9A 2018-08-06 2018-08-06 Multi-class low-rate compressed voice steganography-oriented general information hiding detection method Active CN109192217B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810884205.9A CN109192217B (en) 2018-08-06 2018-08-06 Multi-class low-rate compressed voice steganography-oriented general information hiding detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810884205.9A CN109192217B (en) 2018-08-06 2018-08-06 Multi-class low-rate compressed voice steganography-oriented general information hiding detection method

Publications (2)

Publication Number Publication Date
CN109192217A (en) 2019-01-11
CN109192217B CN109192217B (en) 2023-03-31

Family

ID=64920198

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810884205.9A Active CN109192217B (en) 2018-08-06 2018-08-06 Multi-class low-rate compressed voice steganography-oriented general information hiding detection method

Country Status (1)

Country Link
CN (1) CN109192217B (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5659659A (en) * 1993-07-26 1997-08-19 Alaris, Inc. Speech compressor using trellis encoding and linear prediction
US20050010401A1 (en) * 2003-07-07 2005-01-13 Sung Ho Sang Speech restoration system and method for concealing packet losses
CN101431578A (en) * 2008-10-30 2009-05-13 南京大学 Information concealing method based on G.723.1 silence detection technology
CN102227767A (en) * 2008-11-12 2011-10-26 Scti控股公司 System and method for automatic speach to text conversion
CN101640802A (en) * 2009-08-28 2010-02-03 北京工业大学 Video inter-frame compression coding method based on macroblock features and statistical properties
WO2015015058A1 (en) * 2013-07-31 2015-02-05 Nokia Corporation Method and apparatus for video coding and decoding
US20150302543A1 (en) * 2014-01-31 2015-10-22 Digimarc Corporation Methods for encoding, decoding and interpreting auxiliary data in media signals
CN107610711A (en) * 2017-08-29 2018-01-19 中国民航大学 G.723.1 voice messaging steganalysis method based on quantization index modulation QIM
CN107910009A (en) * 2017-11-02 2018-04-13 中国科学院声学研究所 A kind of symbol based on Bayesian inference rewrites Information Hiding & Detecting method and system

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
JIE YANG ET AL.: "Steganalysis of joint codeword quantization index modulation steganography based on codeword Bayesian network", 《NEUROCOMPUTING》 *
WEI ZENG ET AL.: "An Algorithm of Echo Steganalysis based on Bayes Classifier", 《PROCEEDINGS OF THE 2008 IEEE》 *
李松斌等: "低速率语音码流中的码元替换信息隐藏检测", 《网络新媒体技术》 *
杨洁等: "基于贝叶斯网络的压缩语音信息隐藏检测", 《计算机应用》 *
汪云路等: "基于统计特征的语音回声隐藏分析", 《数据采集与处理》 *
涂山山等: "基于半监督学习的即时语音通信隐藏检测", 《清华大学学报(自然科学版)》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110120228A (en) * 2019-04-28 2019-08-13 武汉大学 Audio general steganalysis method and system based on sonograph and depth residual error network
CN110689897A (en) * 2019-10-09 2020-01-14 中国科学院声学研究所南海研究站 Information hiding and hidden information extraction method based on linear prediction speech coding

Also Published As

Publication number Publication date
CN109192217B (en) 2023-03-31

Similar Documents

Publication Publication Date Title
US20220254350A1 (en) Method, apparatus and device for voiceprint recognition of original speech, and storage medium
CN101470897B (en) Sensitive film detection method based on audio/video amalgamation policy
CN104240256B (en) A kind of image significance detection method based on the sparse modeling of stratification
WO2016155047A1 (en) Method of recognizing sound event in auditory scene having low signal-to-noise ratio
CN109817233A (en) Speech stream steganalysis method and system based on hierarchical attention network model
CN110299142A (en) A kind of method for recognizing sound-groove and device based on the network integration
CN112735435A (en) Voiceprint open set identification method with unknown class internal division capability
Yang et al. Steganalysis of joint codeword quantization index modulation steganography based on codeword Bayesian network
CN107910009B (en) A method and system for the detection of code element rewriting information hiding based on Bayesian inference
Yang et al. A common method for detecting multiple steganographies in low-bit-rate compressed speech based on Bayesian inference
CN105869658A (en) Voice endpoint detection method employing nonlinear feature
CN109192217A (en) General information towards multiclass low rate compression voice steganography hides detection method
CN116721677A (en) Noise-containing voice emotion recognition method based on multitasking collaborative attention gating network
CN116884433A (en) Forgery speech detection method and system based on graph attention
Li et al. SANet: A compressed speech encoder and steganography algorithm independent steganalysis deep neural network
Büker et al. Deep convolutional neural networks for double compressed AMR audio detection
CN113743188B (en) Feature fusion-based internet video low-custom behavior detection method
CN117711421A (en) Two-stage voice separation method based on coordination simple attention mechanism
Sun et al. Steganalysis of adaptive multi-rate speech with unknown embedding rates using clustering and ensemble learning
Hanna et al. Audio Features for Noisy Sound Segmentation.
Li et al. Fdn: Finite difference network with hierarchical convolutional features for text-independent speaker verification
Hu et al. Speaker Recognition Algorithm Based on Fca-Res2Net.
CN114419731B (en) A lip reading recognition method and system based on phased cross-training
Abirami et al. Multimodal Cognitive Learning for Media Forgery Detection: A Comprehensive Framework Combining Random Forest and Deep Ensemble Architectures (Xception, ResNeXt) across Image, Video, and Audio Modalities
Geetha et al. An Optimized Hybrid Quantum Deep Neural Network Model For Quantum Audio Steganography and Steganalysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant