Video authentication method based on scene frame fingerprint
Technical field
The invention belongs to the field of video authentication technology and discloses a new method for authenticating video and combating piracy in the new-media environment.
Background technique
Most digital video works are currently protected by data encryption: the digital video content is encrypted, and only authorized users obtain the decryption key. However, data encryption faces the problem that the key may be stolen during transmission; once the key is stolen, the digital video can no longer be protected. Digital watermarking can solve the problem of key loss. Digital watermarking embeds a hidden mark in the digital content, which a detection tool later extracts and matches to achieve copyright protection. However, current digital watermarking products resist intentional and unintentional attacks poorly, and this weak robustness greatly restricts the application and development of digital watermarking.
Fingerprinting can make up for the shortcomings of encryption and digital watermarking. A video fingerprint is a digital signature that represents the important visual features of a segment of video signal; its main purpose is to provide an effective mechanism for comparing the perceptual quality of two pieces of video data. Note that the video data themselves, which are usually very large, are not compared directly; instead, their corresponding digital fingerprints, which are usually much smaller, are compared.
Video fingerprinting is concerned with accuracy, robustness, fingerprint size, granularity, authentication speed and versatility. Accuracy covers the correct recognition rate, the false alarm rate and the false dismissal rate; robustness means that an unknown video can still be identified after undergoing fairly severe video signal processing; fingerprint size largely determines the capacity of the fingerprint database; granularity is an application-dependent parameter, namely how long an unknown video clip must be in order to identify the whole video; for a practical commercial video fingerprinting system, authentication speed is a key parameter; versatility is the ability to identify videos in different formats. Around these properties, many researchers have studied video fingerprinting from the perspectives of the spatio-temporal domain, the spatial domain, the temporal domain and the color space, and have obtained encouraging results. In recent years, fingerprinting has been widely used in copyright authentication, copy monitoring, multimedia retrieval and piracy tracing, and researchers have proposed many video fingerprinting algorithms. Existing video fingerprinting algorithms can be grouped into four classes: color-space-based, temporal, spatial and spatio-temporal.
Color-space fingerprint extraction relies on color histograms over the spatio-temporal domain of the video, using the color statistics of a video clip to extract the fingerprint. However, the vast majority of current video is 24-bit true color, so the volume of statistics is very large, which slows fingerprint extraction. In addition, colors change noticeably between video formats, and color-space fingerprint extraction cannot be applied to black-and-white video, so this method is not widely used.
Temporal fingerprint extraction mainly extracts temporal features from the video sequence. This method needs a fairly long video sequence and is unsuitable for short video clips. Since short videos are now very common on web pages, temporal fingerprints are not suited to online applications.
Spatial fingerprint methods extract features from every frame or from key frames, and are similar to image fingerprinting methods. Spatial fingerprints are further divided into global and local fingerprints. Global fingerprints capture global properties, such as the statistical properties of the image histogram. Local fingerprints mainly extract local image features, such as local interest points in a frame; such interest points are commonly used for object retrieval in multimedia. However, extracting interest points requires pre-processing the image, and the number of video frames is enormous, so a large amount of computer memory is consumed; this kind of fingerprint extraction is therefore rarely applied to video.
Spatio-temporal fingerprints contain both the temporal and the spatial information of the video, so they perform better than temporal or spatial fingerprints alone. Current spatio-temporal fingerprint extraction methods mainly include 3D-DCT, TIRI-DCT and 3D-STIP. Taken together, these video fingerprinting algorithms withstand, to a certain extent, some common attacks such as resolution reduction, frame-rate reduction, added noise, brightness change and contrast change, but their ability to authenticate video subjected to attacks such as re-encoding, re-capturing, Logo/Text insertion and picture-in-picture is limited.
Summary of the invention
The present invention aims to overcome the above shortcomings of the prior art and provides a video authentication method based on scene frame fingerprints.
The video authentication method based on scene frame fingerprints according to the present invention comprises the following steps:
1) Pre-processing of the video frames:
(1.1) convert the color space of each color frame in the video and take its luminance component to obtain a grayscale image;
(1.2) crop the border of the video frame, keeping the central part, and then scale it to a fixed size (W × H pixels);
(1.3) filter the video frame with a 3 × 3 Gaussian low-pass filter with a standard deviation of 0.95;
(1.4) scale the image to 3/4 QCIF size (QCIF is 144 × 176 pixels).
2) Fingerprint extraction from the pre-processed video frame, comprising the following steps:
(2.1) divide the pre-processed video frame into blocks; within one 9 × 11 region, a to h are averages of local pixels; the frame elements are then extracted as: (1) the mean element of the whole 9 × 11 sub-region; (2) the four difference elements a-b, c-d, e-f and g-h; 720 frame elements are obtained in total, of which the 144 mean elements are denoted A elements and the 576 difference elements are denoted D elements;
(2.2) quantize the A elements into quaternary (four-level) values; for the A elements of dimensions 1-144, let A_i be the value of an A element; these A elements are quantized into quaternary values x_i by formula (1);
(2.3) determine the threshold ThA dynamically, comprising the following steps:
(2.3.1) take a_i = abs(A_i - 128), where abs() is the absolute-value operator, and sort the a_i in ascending order as a_k = {a_1, a_2, …, a_k, …, a_N}; here the index i and the index k are not the same;
(2.3.2) the threshold is ThA = a_k, with k = floor(0.25*N) and N = 144, where floor denotes rounding down;
(2.4) quantize the D elements into quaternary values; the D elements D_i of dimensions 145-720 are quantized into quaternary values x_i by formula (2);
(2.5) determine the threshold ThD dynamically, comprising the following steps:
(2.5.1) take d_i = abs(D_i) and sort the d_i in ascending order as d_k = {d_1, d_2, …, d_k, …, d_N}; here the index i and the index k are not the same;
(2.5.2) the threshold is ThD = d_k, with k = floor(0.25*N) and N = 576, where floor denotes rounding down;
(2.6) store the extracted quaternary elements X = {x_1, x_2, …, x_720} in binary-coded form; let word_i, i = 1, 2, …, 180, be defined so that every 4 element dimensions occupy one coding unit; this encoding is computed by the following formula:
word_i = 4^3 * x_{(i-1)*4+1} + 4^2 * x_{(i-1)*4+2} + 4 * x_{(i-1)*4+3} + x_{(i-1)*4+4}   (3)
(2.7) the scene frame fingerprint extraction algorithm, comprising the following steps:
(2.7.1) black-screen detection; formula (4) is applied to decide whether the frame is a black screen:
mean(F) < Th_BS   (4)
where mean(F) is the mean of the image pixels and Th_BS is the black-screen threshold;
(2.7.2) scene-frame detection; suppose the fingerprint of the previous scene frame is SF_{i-1} and the fingerprint of the current frame is F_i, i = 2, …, 5; if (5) holds, the current frame is judged to be a new scene frame; otherwise the current frame still belongs to the previous scene frame:
d(SF_{i-1}, F_i) ≥ Th_SF, i = 2, …, 5   (5)
where d(SF_{i-1}, F_i) denotes the distance between the current frame fingerprint F_i and the previous scene frame fingerprint SF_{i-1}, and Th_SF is the decision threshold;
3) Building the video fingerprint library: the user information, product information and fingerprint information of each video requiring copyright authentication are bound into one record to generate metadata (meta data); the set of metadata forms the metadata library, which is sorted and stored according to the inverted-file rule;
4) Exploiting the feature of our fingerprint, namely quaternary values (Quaternion value), the invention proposes an inverted-file binary search matching algorithm (inverted file & binary-based Search Matching), whose steps are as follows:
(4.1) the 3600-dimensional fingerprint vector is combined by formula (3) into 900 words, forming the Bag-Words; each word takes values in the range 0-255;
(4.2) build the inverted-file queue; each video fingerprint is inserted into the inverted-file queue in ascending order of its first word; if the first words are identical, the fingerprints are ordered by ascending value of the second word, and so on, until all original video fingerprints have been inserted into the inverted-file queue; the video fingerprints and video information sorted by the inverted-file rule form the meta fingerprint database;
(4.3) the binary search matching procedure; suppose the Bag-Words sequence of the fingerprint of the video to be authenticated is AuBW_i, i = 1, 2, …, 900; the binary search steps are as follows:
(4.3.1): mark every record in the metadata library as "not searched";
(4.3.2): take its first word AuBW_1 and binary-search the inverted-file queue for AuBW_1; the search may produce three cases:
A1) there is exactly one record; the Bag-Words in that record are restored to the quaternary fingerprint MeF_i by repeatedly dividing each word by 4 and taking the remainders; the normalized Hamming distance d is computed by formula (6), where i = 1, 2, …, L, L is the fingerprint length, and AuF is the fingerprint of the video being authenticated; a decision value T is then obtained from formula (7); when T = 0, the query ends, and the video corresponding to this meta record is the video to be authenticated; when T = 1, the position of the metadata and the Hamming distance are recorded, and the record is marked "searched"; when T = 2, the record is only marked "searched";
A2) there are several records; the Hamming distances of all these records are computed by formula (6), and the records are marked "searched"; the minimum Hamming distance is evaluated by formula (7); when T = 0, the query ends, and the video corresponding to this meta record is the video to be authenticated; when T = 1, the position of the metadata and the Hamming distance are recorded; when T = 2, nothing is done and the procedure goes directly to the next step;
A3) there is no record; nothing is done and the procedure goes directly to the next step;
(4.3.3): take its i-th word AuBW_i, i = 2, 3, …, K, and binary-search the inverted-file queue for AuBW_i; the search may produce four cases; note that K here is not known in advance, but it always satisfies K ≤ L/m, where m is the word length, here m = 4;
B1) there are records that are already marked "searched"; in this case go directly to the next step;
B2) there is exactly one record not marked "searched"; handle it as in case A1) of (4.3.2);
B3) there are several records not marked "searched"; handle them as in case A2) of (4.3.2);
B4) there is no record; handle it as in case A3) of (4.3.2);
repeat (4.3.3) until T = 0 occurs or all records have been marked "searched";
(4.3.4): if T = 0 did not occur in the previous two steps, only two cases can arise:
C1) at least one record satisfies T = 1; in this case the meta record with the smallest Hamming distance is taken, and this meta record is the video to be authenticated; the query ends;
C2) no record satisfies T = 1; this indicates that the video being authenticated is not in the metadata library, and a rejection message is issued; the query ends.
The advantages of the invention are:
A. The central region of the video frame is selected as the object of fingerprint extraction, consistent with the idea of using human fingerprints to distinguish different people; this also reduces the amount of computation in fingerprint extraction and increases extraction speed.
B. Quaternary values are used to characterize the differences between video frame regions, which is finer and more reasonable than characterization with binary hashes or ternary values, and thus improves the authentication recognition rate.
C. The fingerprint metadata library is stored in Bag-words form, saving 75% of the storage space.
D. The inverted-file binary search algorithm increases search and matching speed.
Brief description of the drawings
Fig. 1 is a schematic diagram of the image blocking of the invention.
Fig. 2 is a flow chart of the video fingerprint extraction of the invention.
Fig. 3 shows five consecutive different scene frames from a clip of the film "28 Weeks Later" when Th_SF = 0.426.
Fig. 4a shows the five different scene frames obtained when Th_SF = 0.40. Fig. 4b shows the five different scene frames obtained when Th_SF = 0.412. Fig. 4c shows the five different scene frames obtained when Th_SF = 0.44. Fig. 4d shows the five different scene frames obtained when Th_SF = 0.452.
Fig. 5 is the video fingerprint matching architecture diagram of the invention.
Specific embodiment
The present invention is further described below with reference to the accompanying drawings.
The video authentication method based on scene frame fingerprints of the invention comprises the following steps:
1) Pre-processing of the video frames:
(1.1) convert the color space of each color frame in the video and take its luminance component to obtain a grayscale image;
(1.2) crop the border of the video frame, keeping the central part, and then scale it to a fixed size (W × H pixels);
(1.3) filter the video frame with a 3 × 3 Gaussian low-pass filter with a standard deviation of 0.95;
(1.4) scale the image to 3/4 QCIF size (QCIF is 144 × 176 pixels).
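Steps (1.1) and (1.3) can be illustrated with a minimal sketch in plain Python. It is not the patented implementation: the BT.601 luma weights and the border handling (replicating edge pixels) are assumptions, since the text only specifies "luminance component", a 3 × 3 window and a standard deviation of 0.95. The function names are hypothetical.

```python
import math

def luminance(r, g, b):
    # Assumed BT.601 luma weights; the text does not name the conversion.
    return 0.299 * r + 0.587 * g + 0.114 * b

def gaussian_kernel_3x3(sigma=0.95):
    # Sample a 2-D Gaussian at offsets -1, 0, 1 and normalise to sum 1.
    raw = [[math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
            for dx in (-1, 0, 1)] for dy in (-1, 0, 1)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]

def gaussian_filter(gray, sigma=0.95):
    # gray: list of rows of luminance values.
    # Border pixels are replicated by clamping coordinates (an assumption).
    h, w = len(gray), len(gray[0])
    k = gaussian_kernel_3x3(sigma)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += k[dy + 1][dx + 1] * gray[yy][xx]
            out[y][x] = acc
    return out
```

Because the kernel is normalised, a uniformly gray frame passes through the filter unchanged, which is a convenient sanity check.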
2) Fingerprint extraction from the pre-processed video frame, following the flow shown in Fig. 2 of the accompanying drawings and comprising the following steps:
(2.1) as shown in Fig. 1 of the accompanying drawings, divide the pre-processed video frame into blocks; within one 9 × 11 region, a to h are averages of local pixels; the frame elements are then extracted as: (1) the mean element of the whole 9 × 11 sub-region; (2) the four difference elements a-b, c-d, e-f and g-h; 720 frame elements are obtained in total, of which the 144 mean elements are denoted A elements and the 576 difference elements are denoted D elements;
(2.2) quantize the A elements into quaternary (four-level) values; for the A elements of dimensions 1-144, let A_i be the value of an A element; these A elements are quantized into quaternary values x_i by formula (1);
(2.3) determine the threshold ThA dynamically, comprising the following steps:
(2.3.1) take a_i = abs(A_i - 128), where abs() is the absolute-value operator, and sort the a_i in ascending order as a_k = {a_1, a_2, …, a_k, …, a_N}; here the index i and the index k are not the same;
(2.3.2) the threshold is ThA = a_k, with k = floor(0.25*N) and N = 144, where floor denotes rounding down;
(2.4) quantize the D elements into quaternary values; the D elements D_i of dimensions 145-720 are quantized into quaternary values x_i by formula (2);
(2.5) determine the threshold ThD dynamically, comprising the following steps:
(2.5.1) take d_i = abs(D_i) and sort the d_i in ascending order as d_k = {d_1, d_2, …, d_k, …, d_N}; here the index i and the index k are not the same;
(2.5.2) the threshold is ThD = d_k, with k = floor(0.25*N) and N = 576, where floor denotes rounding down;
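The dynamic thresholds of steps (2.3) and (2.5) amount to picking the element of rank floor(0.25*N) among the sorted magnitudes. The sketch below is one possible reading, assuming the rank k is 1-based as the notation a_1 … a_N suggests; the function name and the `offset` parameter are hypothetical. It deliberately stops at the thresholds and does not attempt the quantization formulas (1) and (2).

```python
import math

def dynamic_threshold(values, offset=0.0, fraction=0.25):
    # Sort the absolute deviations in ascending order (steps 2.3.1 / 2.5.1).
    mags = sorted(abs(v - offset) for v in values)
    n = len(mags)
    # k = floor(0.25 * N) is read as a 1-based rank (assumption),
    # hence the index k - 1 into the 0-based Python list.
    k = math.floor(fraction * n)
    return mags[max(k, 1) - 1]
```

Under these assumptions ThA would be obtained as `dynamic_threshold(A, offset=128)` for the 144 A elements, and ThD as `dynamic_threshold(D)` for the 576 D elements.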
(2.6) store the extracted quaternary elements X = {x_1, x_2, …, x_720} in binary-coded form; let word_i, i = 1, 2, …, 180, be defined so that every 4 element dimensions occupy one coding unit; this encoding is computed by the following formula:
word_i = 4^3 * x_{(i-1)*4+1} + 4^2 * x_{(i-1)*4+2} + 4 * x_{(i-1)*4+3} + x_{(i-1)*4+4}   (3)
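Formula (3) treats four consecutive quaternary values as the digits of a base-4 number, so each word fits in one byte (0-255). A minimal sketch, assuming each x value lies in 0..3, is given below together with the inverse operation (repeated division by 4, keeping the remainders); the function names are hypothetical.

```python
def pack_words(x):
    # x: sequence of quaternary values (each 0..3); length must be a multiple of 4.
    if len(x) % 4 != 0:
        raise ValueError("fingerprint length must be a multiple of 4")
    words = []
    for j in range(0, len(x), 4):
        a, b, c, d = x[j:j + 4]
        # Formula (3): 4^3 * a + 4^2 * b + 4 * c + d
        words.append(64 * a + 16 * b + 4 * c + d)
    return words

def unpack_words(words):
    # Inverse of pack_words: divide by 4 repeatedly and keep the remainders.
    x = []
    for w in words:
        digits = []
        for _ in range(4):
            w, r = divmod(w, 4)
            digits.append(r)
        # Remainders come out least-significant first, so reverse them.
        x.extend(reversed(digits))
    return x
```

For a 720-element frame fingerprint this yields 180 words, matching the index range i = 1, 2, …, 180 in the text.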
(2.7) the scene frame fingerprint extraction algorithm, comprising the following steps:
(2.7.1) black-screen detection; formula (4) is applied to decide whether the frame is a black screen:
mean(F) < Th_BS   (4)
where mean(F) is the mean of the image pixels and Th_BS is the black-screen threshold;
(2.7.2) scene-frame detection; suppose the fingerprint of the previous scene frame is SF_{i-1} and the fingerprint of the current frame is F_i, i = 2, …, 5; if (5) holds, the current frame is judged to be a new scene frame; otherwise the current frame still belongs to the previous scene frame:
d(SF_{i-1}, F_i) ≥ Th_SF, i = 2, …, 5   (5)
where d(SF_{i-1}, F_i) denotes the distance between the current frame fingerprint F_i and the previous scene frame fingerprint SF_{i-1}, and Th_SF is the decision threshold;
Fig. 3 of the accompanying drawings shows five consecutive different scene frames from a clip of the film "28 Weeks Later" when Th_SF = 0.426. Different decision thresholds yield different scene frames, as shown in Fig. 4 of the accompanying drawings: Fig. 4a shows the five different scene frames obtained with threshold Th_SF = 0.40; Fig. 4b shows those obtained with Th_SF = 0.412; Fig. 4c shows those obtained with Th_SF = 0.44; Fig. 4d shows those obtained with Th_SF = 0.452.
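The decisions of step (2.7) can be sketched as follows. Several points are assumptions rather than statements of the method: the distance d in (5) is taken to be the normalized Hamming distance (the fraction of positions at which two fingerprints differ), the first non-black frame is taken as the first scene frame, and the black-screen threshold value is left to the caller because the text gives none. All function names are hypothetical.

```python
def normalized_hamming(f1, f2):
    # Fraction of positions at which the two fingerprints differ
    # (assumed form of the distance d).
    if len(f1) != len(f2):
        raise ValueError("fingerprints must have equal length")
    return sum(1 for a, b in zip(f1, f2) if a != b) / len(f1)

def is_black_screen(gray, th_bs):
    # Formula (4): the frame is a black screen if its mean pixel value is below Th_BS.
    pixels = [p for row in gray for p in row]
    return sum(pixels) / len(pixels) < th_bs

def select_scene_frames(fingerprints, th_sf=0.426, wanted=5):
    # fingerprints: per-frame fingerprints of non-black frames, in display order.
    # The first frame is taken as the first scene frame (assumption).
    scenes = []
    for f in fingerprints:
        # Formula (5): a new scene frame starts when d(SF_{i-1}, F_i) >= Th_SF.
        if not scenes or normalized_hamming(scenes[-1], f) >= th_sf:
            scenes.append(f)
            if len(scenes) == wanted:
                break
    return scenes
```

The default `th_sf=0.426` simply mirrors the value used for Fig. 3; as Fig. 4 indicates, other thresholds select different scene frames.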
3) Building the video fingerprint library: the user information, product information and fingerprint information of each video requiring copyright authentication are bound into one record to generate metadata (meta data); the set of metadata forms the metadata library, which is sorted and stored according to the inverted-file rule; the Meta Fingerprint Database in Fig. 5 of the accompanying drawings is the video fingerprint library built in this way;
4) Exploiting the feature of our fingerprint, namely quaternary values (Quaternion value), the invention proposes an inverted-file binary search matching algorithm (inverted file & binary-based Search Matching); Fig. 5 of the accompanying drawings is the video fingerprint matching architecture diagram, which illustrates the overall fingerprint matching process; the steps are as follows:
(4.1) the 3600-dimensional fingerprint vector is combined by formula (3) into 900 words, forming the Bag-Words; each word takes values in the range 0-255;
(4.2) build the inverted-file queue; each video fingerprint is inserted into the inverted-file queue in ascending order of its first word; if the first words are identical, the fingerprints are ordered by ascending value of the second word, and so on, until all original video fingerprints have been inserted into the inverted-file queue; the video fingerprints and video information sorted by the inverted-file rule form the meta fingerprint database;
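The ordering rule of step (4.2) is a lexicographic sort on the word sequence, and a lookup by first word can then use binary search. The sketch below illustrates this with Python's standard `bisect` module; the record layout (a `(words, info)` pair) and the function names are hypothetical, and only lookup by the first word is shown.

```python
from bisect import bisect_left, bisect_right

def build_queue(records):
    # records: list of (words, info) pairs, where words is a tuple of ints 0..255.
    # Sorting by the tuple orders by first word, then second word, and so on.
    return sorted(records, key=lambda rec: rec[0])

def find_by_first_word(queue, word):
    # Binary search for all records whose first word equals `word`.
    firsts = [rec[0][0] for rec in queue]
    lo = bisect_left(firsts, word)
    hi = bisect_right(firsts, word)
    return queue[lo:hi]
```

The returned slice may be empty, hold one record, or hold several, which corresponds to the three outcomes a first-word lookup can have.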
(4.3) the binary search matching procedure; suppose the Bag-Words sequence of the fingerprint of the video to be authenticated is AuBW_i, i = 1, 2, …, 900; the binary search steps are as follows:
(4.3.1): mark every record in the metadata library as "not searched";
(4.3.2): take its first word AuBW_1 and binary-search the inverted-file queue for AuBW_1; the search may produce three cases:
A1) there is exactly one record; the Bag-Words in that record are restored to the quaternary fingerprint MeF_i by repeatedly dividing each word by 4 and taking the remainders; the normalized Hamming distance d is computed by formula (6), where i = 1, 2, …, L, L is the fingerprint length, and AuF is the fingerprint of the video being authenticated; a decision value T is then obtained from formula (7); when T = 0, the query ends, and the video corresponding to this meta record is the video to be authenticated; when T = 1, the position of the metadata and the Hamming distance are recorded, and the record is marked "searched"; when T = 2, the record is only marked "searched";
A2) there are several records; the Hamming distances of all these records are computed by formula (6), and the records are marked "searched"; the minimum Hamming distance is evaluated by formula (7); when T = 0, the query ends, and the video corresponding to this meta record is the video to be authenticated; when T = 1, the position of the metadata and the Hamming distance are recorded; when T = 2, nothing is done and the procedure goes directly to the next step;
A3) there is no record; nothing is done and the procedure goes directly to the next step;
(4.3.3): take its i-th word AuBW_i, i = 2, 3, …, K, and binary-search the inverted-file queue for AuBW_i; the search may produce four cases; note that K here is not known in advance, but it always satisfies K ≤ L/m, where m is the word length, here m = 4;
B1) there are records that are already marked "searched"; in this case go directly to the next step;
B2) there is exactly one record not marked "searched"; handle it as in case A1) of (4.3.2);
B3) there are several records not marked "searched"; handle them as in case A2) of (4.3.2);
B4) there is no record; handle it as in case A3) of (4.3.2);
repeat (4.3.3) until T = 0 occurs or all records have been marked "searched";
(4.3.4): if T = 0 did not occur in the previous two steps, only two cases can arise:
C1) at least one record satisfies T = 1; in this case the meta record with the smallest Hamming distance is taken, and this meta record is the video to be authenticated; the query ends;
C2) no record satisfies T = 1; this indicates that the video being authenticated is not in the metadata library, and a rejection message is issued; the query ends.
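The closing step (4.3.4) reduces to choosing the best of the candidates noted during the search. A minimal sketch follows, assuming that every time T = 1 occurred the pair (Hamming distance, record position) was appended to a list; that list layout and the function name are hypothetical, and the evaluation of T by formula (7) is outside the scope of the sketch.

```python
def final_decision(candidates):
    # candidates: list of (hamming_distance, record_position) pairs
    # collected whenever T = 1 occurred during the search.
    if not candidates:
        # Case C2: no record satisfied T = 1, so the video is rejected.
        return None
    # Case C1: take the record with the smallest Hamming distance.
    best = min(candidates, key=lambda c: c[0])
    return best[1]
```

A return value of `None` stands for the rejection message; any other value is the position of the matching meta record.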