CN106055632B

CN106055632B - Video authentication method based on scene frame fingerprints

Info

Publication number: CN106055632B
Application number: CN201610367884.3A
Authority: CN
Inventors: 毛家发; 张明国; 钟丹虹; 高飞; 肖刚
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2016-05-27
Filing date: 2016-05-27
Publication date: 2019-06-14
Anticipated expiration: 2036-05-27
Also published as: CN106055632A

Abstract

The video authentication method based on scene frame fingerprints first uses a scene-frame determination method to extract the fingerprints of 5 consecutive, distinct scene frames from a video clip; together these form the video fingerprint. The video fingerprint is then combined with the ID information of the video itself to form meta-fingerprint data. The fingerprint information is stored in Bag-Words form, which saves 75% of the storage space. During search and authentication, an inverted-file binary (halving) search is used to improve matching speed. In simulation experiments, the proposed method shows good detection performance, with an average accuracy above 98%. In the Matlab software environment, search and authentication take about 12 seconds per video, which can achieve real-time detection in a network environment.

Description

Video authentication method based on scene frame fingerprint

Technical field

The invention belongs to the field of video authentication technology and discloses a new method for authenticating video and combating piracy in the new-media environment.

Background art

Most current digital video works are protected by data encryption: the digital video content is encrypted, and only authorized users obtain the decryption key. However, the key can be stolen during transmission, and once it is stolen the digital video is no longer protected. Digital watermarking addresses the problem of key loss. It embeds a hidden mark in the digital content, which a detection tool extracts and matches to achieve copyright protection. However, current digital watermarking products resist intentional and unintentional attacks poorly, and this lack of robustness greatly restricts the application of digital watermarking.

Fingerprinting can make up for the shortcomings of encryption and digital watermarking. A video fingerprint is a digital signature that represents the important visual features of a video signal; its main purpose is to provide an effective mechanism for comparing the perceptual quality of two videos. Note that comparing the video data directly is usually costly because the data are large, whereas the corresponding digital fingerprints are much smaller and cheaper to compare.

Video fingerprinting is concerned with accuracy, robustness, fingerprint size, granularity, authentication speed, and versatility. Accuracy covers the correct recognition rate, the false alarm rate, and the false dismissal rate. Robustness means that an unknown video can still be identified after undergoing fairly severe video signal processing. Fingerprint size largely determines the capacity required of the fingerprint database. Granularity is an application-dependent parameter: how long an unknown video clip must be in order to identify the whole video. For practical commercial online video fingerprinting systems, authentication speed is a key parameter. Versatility is the ability to recognize videos in different formats. Around these properties, many researchers have studied video fingerprinting in terms of the spatio-temporal domain, the spatial domain, the temporal domain, and color space, with encouraging results. In recent years fingerprinting has been widely applied to copyright authentication, copy monitoring, multimedia retrieval, and piracy tracing, and many video fingerprinting algorithms have been proposed. Existing video fingerprinting algorithms fall into 4 classes: color-space-based, temporal, spatial, and spatio-temporal.

Color-space fingerprint extraction relies on color histograms over the spatio-temporal domain of the video; it extracts the fingerprint from the color statistics of the video clip. However, the vast majority of today's video is 24-bit true color, so the volume of statistics is enormous, which slows fingerprint extraction. In addition, color changes noticeably between video formats, and color-space fingerprint extraction cannot be applied to black-and-white video, so this method is not widely used.

Temporal fingerprint extraction mainly extracts temporal features from the video sequence. This method needs a fairly long video sequence and is unsuitable for short clips. Because short videos are now very common on web pages, temporal fingerprints are not suited to online applications.

Spatial fingerprinting extracts features from every frame or from key frames; these methods resemble image fingerprinting. Spatial fingerprints are further divided into global and local fingerprints. Global fingerprints capture global properties, such as the statistical properties of the image histogram. Local fingerprints mainly extract local image features, such as local interest points in a frame, which are commonly used for object retrieval in multimedia. However, extracting interest points requires preprocessing the image, and the number of video frames is enormous, so this consumes a large amount of computer memory; this kind of fingerprint extraction is therefore rarely applied to video.

Spatio-temporal fingerprints contain both the temporal and the spatial information of the video, so they perform better than purely temporal or purely spatial fingerprints. The main spatio-temporal fingerprint extraction methods at present are 3D-DCT, TIRI-DCT, and 3D-STIP. Taken together, these video fingerprinting algorithms resist some common attacks to a certain extent, such as resolution reduction, frame-rate reduction, added noise, brightness change, and contrast change, but their ability to authenticate video after attacks such as re-encoding, re-capture, added logo/text, and picture-in-picture is limited.

Summary of the invention

The present invention overcomes the above shortcomings of the prior art and provides a video authentication method based on scene frame fingerprints.

The video authentication method based on scene frame fingerprints according to the invention comprises the following steps:

1) Preprocessing of the video frames;

(1.1) Perform color space conversion on each color frame of the video and take its luminance component to obtain a grayscale image;

(1.2) Crop the borders of the video frame, keeping its central part, and rescale it to a fixed size (W × H pixels);

(1.3) Filter the video frame with a 3 × 3 Gaussian low-pass filter with standard deviation 0.95;

(1.4) Scale the image to 3/4 QCIF size (QCIF is 144 × 176 pixels).
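The preprocessing steps (1.1) to (1.4) can be sketched as follows. This is a minimal illustration in NumPy and not the patent's implementation: the BT.601 luminance weights, the crop fraction `keep`, the intermediate size `fixed_size`, the edge-replicated borders, and the nearest-neighbour resize are all assumptions, since the text fixes only the 3 × 3 Gaussian with standard deviation 0.95 and the 3/4 QCIF target. QCIF is read here as 144 rows × 176 columns, giving 108 × 132.

```python
import numpy as np

def to_luma(rgb):
    """(1.1) Luminance of an H x W x 3 RGB frame (BT.601 weights are an assumption)."""
    return rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114

def center_crop(img, keep=0.8):
    """(1.2) Keep the central part of the frame; `keep` is a hypothetical fraction."""
    h, w = img.shape
    ch, cw = int(h * keep), int(w * keep)
    top, left = (h - ch) // 2, (w - cw) // 2
    return img[top:top + ch, left:left + cw]

def gaussian_kernel_3x3(sigma=0.95):
    """Normalized 3 x 3 Gaussian kernel."""
    ax = np.array([-1.0, 0.0, 1.0])
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def gaussian_filter_3x3(img, sigma=0.95):
    """(1.3) 3 x 3 Gaussian low-pass filter with edge-replicated borders (assumption)."""
    k = gaussian_kernel_3x3(sigma)
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * p[dy:dy + h, dx:dx + w]
    return out

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize (the interpolation method is an assumption)."""
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

def preprocess(rgb, fixed_size=(288, 352)):
    """Steps (1.1)-(1.4); `fixed_size` stands in for the unspecified W x H."""
    gray = to_luma(rgb.astype(float))
    gray = center_crop(gray)
    gray = resize_nearest(gray, *fixed_size)
    gray = gaussian_filter_3x3(gray)
    return resize_nearest(gray, 108, 132)   # 3/4 of QCIF (144 x 176)
```

The output is always 108 × 132, which divides evenly into 9 × 11 blocks as required by step (2.1).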

2) Fingerprint extraction from the preprocessed video frames, comprising the following steps:

(2.1) Divide the preprocessed video frame into blocks. Within each 9 × 11 region, a to h are averages of local pixels. The frame elements are extracted as follows: (1) the mean element of the entire 9 × 11 sub-region; (2) four difference elements a-b, c-d, e-f, and g-h. In total 720 frame elements are obtained: 144 mean elements, denoted A elements, and 576 difference elements, denoted D elements;
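The element counts in step (2.1) follow from the frame size: a 108 × 132 frame holds 12 × 12 = 144 blocks of 9 × 11 pixels, each contributing one mean element and four difference elements. The sketch below computes only the A elements; the positions of the local areas a to h are defined by Fig. 1, which is not reproduced in the text, so the D elements are not computed here. The function name and the block orientation (9 rows × 11 columns) are assumptions.

```python
import numpy as np

def mean_elements(frame, block_h=9, block_w=11):
    """Mean of every non-overlapping block_h x block_w block, in row-major block order."""
    h, w = frame.shape
    assert h % block_h == 0 and w % block_w == 0
    blocks = frame.reshape(h // block_h, block_h, w // block_w, block_w)
    return blocks.mean(axis=(1, 3)).ravel()

frame = np.arange(108 * 132, dtype=float).reshape(108, 132)
A = mean_elements(frame)
n_A = A.size        # one mean element per block
n_D = 4 * n_A       # four difference elements per block (a-b, c-d, e-f, g-h)
```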

(2.2) Quantize the A elements into quaternary (four-level) values. For the A elements of dimensions 1-144, let A_i be the A element value; formula (1) quantizes these A elements into quaternary values x_i:

(2.3) Determine the threshold ThA dynamically, in the following steps:

(2.3.1) Take a_i = abs(A_i - 128), where abs(·) is the absolute-value operator, and arrange the a_i in ascending order as a_k = {a₁, a₂, …, a_k, …, a_N}; here index i and index k are not the same;

(2.3.2) The threshold is ThA = a_k, where k = floor(0.25*N), N = 144, and floor rounds down;

(2.4) Quantize the D elements into quaternary values. For the D elements D_i of dimensions 145-720, formula (2) quantizes them into quaternary values x_i:

(2.5) Determine the threshold ThD dynamically, in the following steps:

(2.5.1) Take d_i = abs(D_i) and arrange the d_i in ascending order as d_k = {d₁, d₂, …, d_k, …, d_N}; here index i and index k are not the same;

(2.5.2) The threshold is ThD = d_k, where k = floor(0.25*N), N = 576, and floor rounds down;
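Steps (2.3) and (2.5) use the same rule: sort the magnitudes in ascending order and take the element at position k = floor(0.25*N). A minimal sketch follows, assuming that a_k is counted from 1, so that k = 36 for N = 144 selects the 36th smallest magnitude. The function name and the `offset` parameter (128 for A elements, 0 for D elements) are illustrative.

```python
import math

def dynamic_threshold(values, offset=0.0, fraction=0.25):
    """Return the k-th smallest |value - offset| with k = floor(fraction * N), 1-based."""
    mags = sorted(abs(v - offset) for v in values)
    k = math.floor(fraction * len(mags))
    return mags[k - 1]   # assumes k >= 1, i.e. at least 4 values

# ThA: magnitudes are taken around the mid-gray level 128
th_a = dynamic_threshold(range(128, 128 + 144), offset=128)
# ThD: magnitudes of the difference elements themselves
th_d = dynamic_threshold(range(-288, 288))
```

Because the threshold is taken from the data of each frame, roughly a quarter of the magnitudes fall at or below it regardless of the overall contrast of the frame.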

(2.6) Store the extracted quaternary elements X = {x₁, x₂, …, x₇₂₀} in binary-coded form.

Let word_i, i = 1, 2, …, 180, be a coding unit that packs 4 consecutive elements. It is computed as:

$$\mathrm{word}_i = 4^3\,x_{4(i-1)+1} + 4^2\,x_{4(i-1)+2} + 4\,x_{4(i-1)+3} + x_{4(i-1)+4} \qquad (3)$$
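Formula (3) reads four quaternary digits as one base-4 number, so each word lies in 0-255 and fits in one byte. The sketch below packs and unpacks such words; it assumes each element takes a value in {0, 1, 2, 3}, and the function names are illustrative.

```python
def pack_words(x):
    """Pack quaternary digits (0..3) into words according to formula (3)."""
    assert len(x) % 4 == 0 and all(0 <= v <= 3 for v in x)
    return [64 * x[j] + 16 * x[j + 1] + 4 * x[j + 2] + x[j + 3]
            for j in range(0, len(x), 4)]

def unpack_words(words):
    """Recover the digits by repeatedly dividing each word by 4 and taking remainders."""
    x = []
    for w in words:
        digits = []
        for _ in range(4):
            w, r = divmod(w, 4)
            digits.append(r)        # least significant digit first
        x.extend(reversed(digits))  # restore most-significant-first order
    return x

words = pack_words([0, 0, 0, 1, 1, 2, 3, 0])
digits = unpack_words(words)
```

If each element would otherwise occupy one byte, 720 elements shrink to 180 bytes, which matches the 75% storage saving stated for the Bag-Words form; that one-byte baseline is an interpretation, not something the text states.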

(2.7) Extraction of the scene frame fingerprints, in the following steps:

(2.7.1) Black-screen test. Formula (4) is used to decide whether the frame is a black screen:

$$\operatorname{mean}(F) < Th_{BS} \qquad (4)$$

Here mean(F) is the mean of the image pixels and Th_BS is the black-screen threshold;

(2.7.2) Scene-frame test. Let the fingerprint of the previous scene frame be SF_{i-1} and the fingerprint of the current frame be F_i, i = 2, …, 5. If (5) holds, the current frame is judged to be a new scene frame; otherwise the current frame still belongs to the previous scene:

$$d(SF_{i-1}, F_i) \ge Th_{SF}, \quad i = 2, \ldots, 5 \qquad (5)$$

Here d(SF_{i-1}, F_i) is the distance between the current frame fingerprint F_i and the previous scene frame fingerprint SF_{i-1}, and Th_SF is the decision threshold;
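The selection of five scene frames by tests (4) and (5) can be sketched as a single pass over the frames. Two points are assumptions here: the distance d is taken to be the fraction of fingerprint positions that differ, and the first non-black frame is taken as the first scene frame, since formula (5) only covers i = 2, …, 5. The function names and the threshold values in the example are illustrative.

```python
def normalized_hamming(f1, f2):
    """Fraction of positions at which two fingerprints differ (assumed form of d)."""
    assert len(f1) == len(f2) and len(f1) > 0
    return sum(a != b for a, b in zip(f1, f2)) / len(f1)

def select_scene_frames(frames, th_bs, th_sf, count=5):
    """frames: iterable of (mean_pixel_value, fingerprint) pairs in playback order."""
    scene = []
    for mean_value, fp in frames:
        if mean_value < th_bs:
            continue                      # formula (4): black screen, skip the frame
        if not scene or normalized_hamming(scene[-1], fp) >= th_sf:
            scene.append(fp)              # formula (5): a new scene frame
            if len(scene) == count:
                break
    return scene

frames = [
    (5, [0, 0, 0, 0]),     # black screen
    (100, [0, 0, 0, 0]),   # first scene frame
    (100, [0, 0, 0, 1]),   # d = 0.25, same scene
    (100, [1, 1, 0, 0]),   # d = 0.50, new scene
    (100, [1, 1, 0, 0]),   # d = 0.00, same scene
    (100, [2, 2, 2, 2]),   # d = 1.00, new scene
]
picked = select_scene_frames(frames, th_bs=20, th_sf=0.426, count=3)
```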

3) Building the video fingerprint library. The user information, product information, and fingerprint information of each video requiring copyright authentication are bound into one record to generate metadata (meta data). The set of metadata constitutes the metadata database, which is sorted and stored according to the inverted-file rule described below;

4) Exploiting the characteristic of the fingerprint, namely quaternary values (Quaternion value), the invention proposes an inverted-file binary search matching algorithm (inverted file & binary-based Search Matching), whose steps are as follows:

(4.1) Combine the 3600-dimensional fingerprint vector into 900 words by formula (3); these form the Bag-Words, and each word takes a value in the range 0-255;

(4.2) Build the inverted-file queue. Each video fingerprint is inserted into the inverted-file queue in ascending order of its first word; if the first words are equal, fingerprints are ordered by ascending value of the second word, and so on, until all original video fingerprints have been inserted into the queue. The video fingerprints and video information sorted by the inverted-file rule constitute the meta-fingerprint database;
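The ordering rule of step (4.2), first word ascending with ties broken by the second word and so on, is lexicographic order over the word sequence. A minimal sketch using Python's `bisect` module follows; the record layout `(words, info)` with a string `info` and the function names are assumptions made for illustration.

```python
import bisect

def insert_record(queue, words, info):
    """Insert a record so the queue stays in lexicographic order of its words."""
    bisect.insort(queue, (tuple(words), info))

def first_word_range(queue, word):
    """Binary-search the slice [lo, hi) of records whose first word equals `word`."""
    lo = bisect.bisect_left(queue, ((word,),))
    hi = bisect.bisect_left(queue, ((word + 1,),))
    return lo, hi

queue = []
insert_record(queue, (5, 1, 0, 0), "video-c")
insert_record(queue, (2, 9, 9, 9), "video-a")
insert_record(queue, (5, 0, 7, 7), "video-b")
insert_record(queue, (2, 3, 0, 0), "video-d")
order = [info for _, info in queue]
```

Because the queue is kept sorted at insertion time, every lookup on the first word costs O(log n) comparisons rather than a full scan.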

(4.3) Binary search matching. Let the Bag-Words sequence of the fingerprint of the video to be authenticated be AuBW_i, i = 1, 2, …, 900. The binary search proceeds as follows:

(4.3.1): Mark every record in the metadata database as unchecked;

(4.3.2): Take the first word, AuBW₁, and binary-search for AuBW₁ in the inverted-file queue. Three outcomes are possible:

A1) Exactly one record is found. Restore the Bag-Words in that record to the quaternary fingerprint MeF_i; each word is restored by repeatedly dividing by 4 and taking the remainder. Compute the normalized Hamming distance d by formula (6):

Here i = 1, 2, …, L, where L is the fingerprint length and AuF is the fingerprint of the video to be authenticated. Then evaluate T by formula (7);

When T = 0, the query ends: the video corresponding to this meta record is the video to be authenticated. When T = 1, note the position of the meta record and its Hamming distance, and mark the record as checked. When T = 2, only mark the record as checked;

A2) Several records are found. Compute the Hamming distances of all these records by formula (6) and mark them as checked. Take the minimum Hamming distance and evaluate it by formula (7). When T = 0, the query ends: the video corresponding to this meta record is the video to be authenticated. When T = 1, note the position of the meta record and its Hamming distance. When T = 2, do nothing and go directly to the next step;

A3) No record is found. Do nothing and go directly to the next step;

(4.3.3): Take the i-th word, AuBW_i, i = 2, 3, …, K, and binary-search for AuBW_i in the inverted-file queue. Four outcomes are possible. Note that K is not known in advance, but it always satisfies K ≤ L/m, where m is the length of a word; here m = 4;

B1) The records found are already marked as checked. Go directly to the next step;

B2) Exactly one record not marked as checked is found. Handle it as in case A1) of (4.3.2);

B3) Several records not marked as checked are found. Handle them as in case A2) of (4.3.2);

B4) No record is found. Handle this as in case A3) of (4.3.2);

Repeat (4.3.3) until T = 0 occurs or all records are marked as checked;

(4.3.4): If T = 0 did not occur in the two preceding steps, only two cases remain:

C1) At least one record satisfies T = 1. Take the meta record with the smallest Hamming distance; this meta record is the video to be authenticated. The query ends;

C2) No record satisfies T = 1. The video to be authenticated is not in the metadata database, and a rejection message is issued. The query ends.
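The control flow of steps (4.3.1) to (4.3.4) can be sketched as below. Formulas (6) and (7) are not reproduced in the text, so this sketch makes loud assumptions: d is the fraction of differing quaternary digits, and T is 0, 1, or 2 according to two hypothetical thresholds `th_accept` and `th_candidate`. Lookups on the first word use binary search; lookups on later words fall back to a linear scan of unchecked records, because the queue is sorted lexicographically. This is an illustration of the control flow, not the patent's exact procedure.

```python
import bisect

def unpack(words):
    """Restore quaternary digits from words (two bits per digit)."""
    return [(w >> s) & 3 for w in words for s in (6, 4, 2, 0)]

def distance(words_a, words_b):
    """Assumed formula (6): fraction of quaternary digits that differ."""
    a, b = unpack(words_a), unpack(words_b)
    return sum(p != q for p, q in zip(a, b)) / len(a)

def classify(d, th_accept, th_candidate):
    """Assumed formula (7): 0 = accept, 1 = candidate, 2 = no match."""
    if d <= th_accept:
        return 0
    return 1 if d <= th_candidate else 2

def authenticate(queue, query, th_accept=0.05, th_candidate=0.2):
    """queue: lexicographically sorted list of (words_tuple, info); returns info or None."""
    checked = [False] * len(queue)          # (4.3.1) all records start unchecked
    best = None                             # best T = 1 candidate as (distance, index)
    for i, word in enumerate(query):
        if i == 0:                          # (4.3.2) binary search on the first word
            lo = bisect.bisect_left(queue, ((word,),))
            hi = bisect.bisect_left(queue, ((word + 1,),))
            hits = list(range(lo, hi))
        else:                               # (4.3.3) later words, unchecked records only
            hits = [j for j, (w, _) in enumerate(queue)
                    if not checked[j] and w[i] == word]
        if not hits:
            continue                        # cases A3, B1, B4
        dists = [(distance(queue[j][0], query), j) for j in hits]
        for j in hits:
            checked[j] = True
        d, j = min(dists)                   # cases A1/A2, B2/B3
        t = classify(d, th_accept, th_candidate)
        if t == 0:
            return queue[j][1]              # T = 0: authenticated, query ends
        if t == 1 and (best is None or d < best[0]):
            best = (d, j)
        if all(checked):
            break
    return queue[best[1]][1] if best else None   # (4.3.4) cases C1 and C2

db = [((10, 20, 30, 40), "A"), ((10, 21, 30, 40), "B"), ((200, 20, 30, 41), "C")]
exact = authenticate(db, (10, 21, 30, 40))
near = authenticate(db, (200, 20, 30, 40))
missing = authenticate(db, (255, 255, 255, 255))
```

In the example, the first query matches record "B" exactly, the second is closest to "C" without an exact match, and the third shares no word with any record and is rejected.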

The advantages of the invention are:

A. The central region of the video frame is selected as the object of fingerprint extraction, which is consistent with the idea of using human fingerprints to distinguish different people; it also reduces the amount of data processed during fingerprint extraction and increases extraction speed.

B. Quaternary values are used to characterize the differences between regions of the video frame, which is finer and more reasonable than characterization by binary hash or ternary values, and thereby improves the authentication recognition rate.

C. The fingerprint metadata database is stored in Bag-Words form, saving 75% of the storage space.

D. The inverted-file binary search algorithm improves search and matching speed.

Brief description of the drawings

Fig. 1 is a schematic diagram of the image blocks of the invention.

Fig. 2 is a flowchart of the video fingerprint extraction of the invention.

Fig. 3 shows five consecutive distinct scene frames from a clip of the film "28 Weeks Later" when Th_SF = 0.426.

Fig. 4a shows the five distinct scene frames obtained when Th_SF = 0.40. Fig. 4b shows the five distinct scene frames obtained when Th_SF = 0.412. Fig. 4c shows the five distinct scene frames obtained when Th_SF = 0.44. Fig. 4d shows the five distinct scene frames obtained when Th_SF = 0.452.

Fig. 5 is the architecture diagram of the video fingerprint matching of the invention.

Detailed description of the embodiments

The invention is further described below with reference to the drawings.

The video authentication method based on scene frame fingerprints of the invention comprises the following steps:

1) Preprocessing of the video frames;

2) Fingerprint extraction from the preprocessed video frames, following the flow shown in Fig. 2, comprising the following steps:

(2.1) As shown in Fig. 1, divide the preprocessed video frame into blocks. Within each 9 × 11 region, a to h are averages of local pixels. The frame elements are extracted as follows: (1) the mean element of the entire 9 × 11 sub-region; (2) four difference elements a-b, c-d, e-f, and g-h. In total 720 frame elements are obtained: 144 mean elements, denoted A elements, and 576 difference elements, denoted D elements;

As shown in Fig. 3, when Th_SF = 0.426, five consecutive distinct scene frames are obtained from a clip of the film "28 Weeks Later". Different decision thresholds lead to different scene frames being selected, as shown in Fig. 4: Fig. 4a shows the five distinct scene frames obtained when Th_SF = 0.40, Fig. 4b when Th_SF = 0.412, Fig. 4c when Th_SF = 0.44, and Fig. 4d when Th_SF = 0.452.

3) Building the video fingerprint library. The user information, product information, and fingerprint information of each video requiring copyright authentication are bound into one record to generate metadata (meta data). The set of metadata constitutes the metadata database, which is sorted and stored according to the inverted-file rule. The Meta Fingerprint Database in Fig. 5 is the video fingerprint library established in this way;

4) Exploiting the characteristic of the fingerprint, namely quaternary values (Quaternion value), the invention proposes an inverted-file binary search matching algorithm (inverted file & binary-based Search Matching). Fig. 5 is the architecture diagram of video fingerprint matching and illustrates the overall matching process. Its steps, (4.1) to (4.3.4), are as set out above.

Claims

1. A video authentication method based on scene frame fingerprints, including the following steps:

1) Preprocessing of the video frames;

(1.1) Perform color space conversion on each color frame of the video and take its luminance component to obtain a grayscale image;

(1.2) Crop the borders of the video frame, keeping its central part, and rescale it to a fixed size of W × H pixels;

(1.3) Filter the video frame with a 3 × 3 Gaussian low-pass filter with a standard deviation of 0.95;

(1.4) Scale the image to 3/4 QCIF size, where QCIF is an image of 144 pixels × 176 pixels;

2) Perform fingerprint extraction on the preprocessed video frames, including the following steps:

(2.1) Divide the preprocessed video frame into blocks. Within each 9 × 11 region, a to h are averages of local pixels; the frame elements are then extracted as: (1) the mean element of the whole 9 × 11 sub-region; (2) four difference elements a-b, c-d, e-f and g-h; a total of 720 frame elements are obtained, of which the 144 mean elements are denoted A elements and the 576 difference elements are denoted D elements;

(2.2) Quantize the A elements into quaternary values; for the A elements of dimensions 1-144, let A_i be the A element value, and use formula (1) to quantize these A elements into quaternary values x_i:

(2.3) Dynamically obtain the threshold ThA, including the following steps:

(2.3.1) Take a_i = abs(A_i - 128), where abs(·) is the absolute value operator, and arrange the a_i in ascending order as a_k = {a₁, a₂, ..., a_k, ..., a_N}; here the index i is not the same as the index k;

(2.3.2) Threshold ThA = a_k, where k = floor(0.25*N), N = 144, and floor rounds down;

(2.4) Quantize the D elements into quaternary values; for the D elements D_i of dimensions 145-720, apply formula (2) to quantize them into quaternary values x_i:

(2.5) Dynamically obtain the threshold ThD, including the following steps:

(2.5.1) Take d_i = abs(D_i) and arrange the d_i in ascending order as d_k = {d₁, d₂, ..., d_k, ..., d_N}; here the index i is not the same as the index k;

(2.5.2) Threshold ThD = d_k, where k = floor(0.25*N), N = 576, and floor rounds down;

(2.6) Store the extracted quaternary elements X = {x₁, x₂, ..., x₇₂₀} in binary-coded form;

Let word_i, i = 1, 2, ..., 180, be a coding unit that packs 4 consecutive elements. It is calculated by the following formula:

$$\mathrm{word}_i = 4^3\,x_{4(i-1)+1} + 4^2\,x_{4(i-1)+2} + 4\,x_{4(i-1)+3} + x_{4(i-1)+4} \qquad (3)$$

(2.7) The extraction algorithm of scene frame fingerprint, including the following steps:

(2.7.1) Black-screen test; use formula (4) to decide whether the frame is a black screen;

$$\operatorname{mean}(F) < Th_{BS} \qquad (4)$$

mean(F) is the mean value of the image pixels, and Th_BS is the black-screen threshold;

(2.7.2) Scene-frame test; let the fingerprint of the previous scene frame be SF_{i-1} and the fingerprint of the current frame be F_i, i = 2, ..., 5; if formula (5) holds, the current frame is judged to be another scene frame, otherwise the current frame still belongs to the previous scene frame;

$$d(SF_{i-1}, F_i) \ge Th_{SF}, \quad i = 2, \ldots, 5 \qquad (5)$$

Here d(SF_{i-1}, F_i) is the distance between the current frame fingerprint F_i and the previous scene frame fingerprint SF_{i-1}, and Th_SF is the decision threshold;

3) Establishment of the video fingerprint database; bind the user information, product information and fingerprint information of each video that needs copyright authentication into one record to generate metadata (meta data); the metadata collection constitutes a metadata database, which is sorted and stored according to the inverted-file rule described below;

4) Exploiting the characteristic of the fingerprint, namely quaternary values (Quaternion value), an inverted-file binary search matching algorithm (inverted file & binary-based Search Matching) is proposed. The steps are as follows:

(4.1) Combine the 3600-dimensional fingerprint vector into 900 words according to formula (3), namely the Bag-Words, where the value of each word ranges from 0 to 255;

(4.2) Establish an inverted-file queue; each video fingerprint is inserted into the inverted-file queue in ascending order of its first word. If the first words are the same, fingerprints are sorted in ascending order by the value of the second word, and so on, until all the original video fingerprints are inserted into the inverted-file queue; the video fingerprints and video information sorted by the inverted-file rule constitute a meta-fingerprint database;

(4.3) Binary search matching method; let the Bag-Words sequence of the fingerprint of the video to be authenticated be AuBW_i, i = 1, 2, ..., 900; the specific binary search steps are as follows:

(4.3.1): mark all records in the metadata database with unchecked marks;

(4.3.2): Take the first word as AuBW ₁ , and search for AuBW ₁ in the inverted text queue. There may be three situations in the search result:

A1) There is only one record; the Bag-Words in the record are restored to the quaternary fingerprint MeF_i, where each word is restored by repeatedly dividing by 4 and taking the remainder; according to formula (6), find its normalized Hamming distance d:

Here i = 1, 2, ..., L, L is the length of the fingerprint, and AuF is the fingerprint of the video to be authenticated; then evaluate T according to formula (7);

When T = 0, the query ends, indicating that the video corresponding to the record is the video that needs to be authenticated; when T = 1, write down the location and Hamming distance of the record, and mark the record as checked; when T = 2, only mark the record as checked;

A2) There are multiple records; calculate the Hamming distances of all these records according to formula (6), and mark these records as checked; take the minimum Hamming distance and evaluate it according to formula (7); when T = 0, the query ends, indicating that the video corresponding to the record is the video that needs to be authenticated; when T = 1, write down the location of the record and the Hamming distance; when T = 2, do not do any processing, and go directly to the next step;

A3) There is no record; do not do any processing, go directly to the next step;

(4.3.3): Take the i-th word as AuBW_i, i = 2, 3, ..., K; binary-search for AuBW_i in the inverted-file queue; the search result may have four situations; note that K here is not known in advance, but it must satisfy K ≤ L/m, where m is the length of a word; here m = 4;

B1) There are several records marked with checked marks; in this case, go directly to the next step;

B2) There is only one record that is not marked with the checked mark; this case is handled as in the case of A1) in (4.3.2);

B3) There are multiple records that are not marked with the checked mark; this situation shall be handled as in the case of A2) in (4.3.2);

B4) There is no record; in this case, it is handled as in A3) in (4.3.2);

Repeat (4.3.3) until T=0 or all records are marked as checked;

(4.3.4): If there is no T=0 situation in the first two steps, then there are only two situations:

C1) At least one record satisfies T=1; in this case, take the meta record with the smallest Hamming distance, and this meta record is the video that needs to be authenticated; the query ends;

C2) No record satisfies T=1; this situation indicates that the authenticated video is not in the metadata database, and a rejection message is issued; the query ends.