
CN103618900A - Video region-of-interest extraction method based on encoding information - Google Patents

Video region-of-interest extraction method based on encoding information

Info

Publication number
CN103618900A
Authority
CN
China
Prior art keywords
frame
mode
current
macroblock
coded
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310591430.0A
Other languages
Chinese (zh)
Other versions
CN103618900B (en)
Inventor
刘鹏宇
贾克斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei Hongyi Environmental Protection Technology Co ltd
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN201310591430.0A priority Critical patent/CN103618900B/en
Publication of CN103618900A publication Critical patent/CN103618900A/en
Application granted granted Critical
Publication of CN103618900B publication Critical patent/CN103618900B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical



Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a method for extracting video regions of interest based on visual perception features and coding information, and relates to the field of video coding. The method comprises the following steps: first, the luminance information of the current coded macroblock is extracted from the original video stream; next, the inter-frame prediction mode type of the current coded macroblock is used to identify regions of spatial visual feature saliency; then, taking the average horizontal and vertical motion vectors of the previous frame's coded macroblocks as dynamic dual thresholds, regions of temporal visual feature saliency are identified by comparing the horizontal and vertical motion vectors of the current coded macroblock against these thresholds; finally, the spatial and temporal saliency labels are combined to define video interest priorities, realizing automatic extraction of video regions of interest. The method can provide an important coding basis for ROI-based (Region of Interest) video coding techniques.

Description

Extraction method of video region of interest based on coding information

Technical Field

The invention belongs to the field of video information processing. It uses video coding techniques and the principles of human visual perception to realize a fast method for extracting video regions of interest. The method automatically analyzes an input video stream and uses its coding information to label and output the regions of interest in the video.

Background Art

The latest video coding standard, H.264/AVC, adopts a variety of advanced coding techniques. While these improve coding performance, they also sharply increase coding complexity, which limits the standard's wide application in multimedia information processing and real-time communication services. How to increase H.264/AVC encoding speed has been studied in depth, and a large number of fast encoding optimization algorithms have been proposed. Most of these algorithms, however, do not distinguish the visual importance of the different regions of a video image; they apply the same coding scheme to all content, ignoring the differences in how the human visual system (HVS) perceives video scenes.

Visual neuroscience research has demonstrated that the HVS perceives video scenes selectively, assigning different visual importance to different regions. Analyzing visual perception features from existing coding information, and then preferentially allocating computing resources to regions of interest according to those features, therefore has important theoretical and practical value for improving the real-time performance of video coding algorithms and reducing their computational complexity. Fast and effective visual feature analysis, especially effective detection of visually interesting regions, is an important foundation for optimizing coding resources and designing efficient video coding schemes.

Summary of the Invention

Unlike existing video moving-object extraction methods such as optical flow, frame differencing, motion energy detection, and background subtraction, the present invention is based on coding information in the video bitstream, such as prediction modes and motion vectors. Using the correlation between this coding information and visually interesting regions, it identifies regions of spatial visual feature saliency and regions of temporal visual feature saliency in the coded video content, thereby realizing automatic labeling and acquisition of video regions of interest.

According to the characteristics of the HVS, the human eye is more sensitive to luminance information than to chrominance information; the method of the present invention therefore operates on the coding information of the luminance component of the video sequence to automatically label and acquire video regions of interest.

The method of the present invention specifically comprises the following steps:

Step 1: Input a video sequence in YUV format with a GOP (Group of Pictures) structure of IPPP, read the luminance component Y of the coded macroblocks, configure the encoding parameters, and initialize parameters;

Step 2: Perform intra-frame predictive coding on the first frame of the video sequence, i.e. the I frame;

In the video coding standard, the I frame serves as a random-access reference point and contains a large amount of information. Since it cannot exploit the temporal correlation between adjacent frames, intra-frame predictive coding is used: the coding information of already-encoded and reconstructed macroblocks in the current frame is used to predict the current macroblock, eliminating spatial redundancy. Intra-coding the first frame of a video sequence, i.e. the I frame, is a conventional practice in video coding.

Step 3: Perform inter-frame predictive coding on the current P frame, exploiting the correlation between the content of adjacent frames to eliminate temporal redundancy. Record the inter prediction mode types of all coded macroblocks in the current frame, denoted Mode_pn;

where p = 1, 2, 3, …, L-1 denotes the p-th inter-coded video frame, L is the total number of frames encoded in the entire video sequence, and n is the index of the n-th coded macroblock in the current frame.

Step 4: Identify the regions of spatial visual feature saliency in the current P frame. Specifically, if the inter prediction mode Mode_pn of the current coded macroblock belongs to the sub-partition mode set or the intra prediction mode set, i.e. Mode_pn ∈ {8×8, 8×4, 4×8, 4×4} or {Intra16×16, Intra4×4}, mark the macroblock as S_Yp(x, y, Mode_pn) = 1, belonging to a region of spatial visual feature saliency; otherwise mark S_Yp(x, y, Mode_pn) = 0. Here Y denotes the luminance component of the coded macroblock, (x, y) the position coordinates of the macroblock, and p and Mode_pn are defined as above. Traverse all coded macroblocks in the current P frame;

FIG. 1 shows a schematic diagram of the inter prediction mode selection process in the H.264 standard.

Experiments show that in H.264/AVC encoding there is a strong correlation between the prediction results and the regions the human eye attends to. For moving or texture-rich regions that attract high attention, Mode_pn mostly falls in the sub-partition mode set {8×8, 8×4, 4×8, 4×4}. At shot changes, abrupt changes in video content, or the appearance of fast-moving objects, human attention is highest, and only then does Mode_pn select the intra prediction mode set {Intra16×16, Intra4×4}. For smooth background regions with low attention, Mode_pn mostly selects the macroblock partition mode set {Skip, 16×16, 16×8, 8×16}. FIG. 2 takes the Claire sequence as an example and shows the inter prediction mode distribution of its 50th frame; in the regions with high human attention, most coded macroblocks select the inter sub-partition prediction modes.
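As a minimal illustration (not the patent's implementation), the mode-based spatial saliency test of Step 4 can be sketched as follows; the string mode names are stand-ins for whatever mode identifiers the encoder exposes:

```python
# Hypothetical sketch of Step 4: spatial saliency from the prediction mode.
SUB_PARTITION_MODES = {"8x8", "8x4", "4x8", "4x4"}
INTRA_MODES = {"Intra16x16", "Intra4x4"}

def spatial_saliency(mode: str) -> int:
    """Return S_Yp = 1 if the macroblock's prediction mode indicates a
    spatially salient region (sub-partition or intra mode), else 0."""
    return 1 if mode in SUB_PARTITION_MODES or mode in INTRA_MODES else 0
```

Macroblocks coded with {Skip, 16×16, 16×8, 8×16} fall through to 0, matching the smooth-background case described above.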

Step 5: Record the horizontal motion vector V_xpn and the vertical motion vector V_ypn of every coded macroblock in the p-th frame, and compute the average horizontal motion vector $\overline{V}_{x(p-1)th}$ and the average vertical motion vector $\overline{V}_{y(p-1)th}$ of all coded macroblocks in the previous coded frame:

$$\overline{V}_{x(p-1)th} = \frac{1}{Num}\sum_{n=1}^{Num} V_{x(p-1)n}, \qquad \overline{V}_{y(p-1)th} = \frac{1}{Num}\sum_{n=1}^{Num} V_{y(p-1)n}$$

where V_x(p-1)n and V_y(p-1)n are the horizontal and vertical motion vectors of each coded macroblock in the previous coded frame; p and n are defined as in Step 3; and Num is the number of macroblocks contained in a coded frame, i.e. the number of accumulated terms. FIG. 3 takes QCIF-format video (176×144) as an example and shows the positions and indices n of all coded macroblocks (16×16) in a coded frame; in this case Num = (176/16) × (144/16) = 11 × 9 = 99.
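A minimal sketch of this per-frame averaging, assuming the encoder exposes each macroblock's motion vector as a numeric pair:

```python
# Hypothetical sketch of Step 5: per-frame average motion vectors, used as
# the dynamic dual thresholds when processing the next frame.
def average_motion_vectors(mvs):
    """mvs: list of (Vx, Vy) pairs, one per coded macroblock of a frame.
    Returns (mean_Vx, mean_Vy)."""
    num = len(mvs)  # e.g. 99 for QCIF: (176 // 16) * (144 // 16)
    mean_x = sum(vx for vx, _ in mvs) / num
    mean_y = sum(vy for _, vy in mvs) / num
    return mean_x, mean_y
```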

Step 6: Identify the regions of temporal visual feature saliency in the current P frame. Specifically, if the horizontal motion vector V_xpn of the current coded macroblock is greater than the previous frame's average horizontal motion vector $\overline{V}_{x(p-1)th}$, or its vertical motion vector V_ypn is greater than the previous frame's average vertical motion vector $\overline{V}_{y(p-1)th}$, then the macroblock belongs to a region of temporal visual feature saliency and is marked T_Yp(x, y, V_xpn, V_ypn) = 1; otherwise it is marked T_Yp(x, y, V_xpn, V_ypn) = 0. Traverse all coded macroblocks in the current P frame;

where Y denotes the luminance component of the coded macroblock, (x, y) the position coordinates of the macroblock, and p is defined as in Step 3.
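The dynamic dual-threshold test of Step 6 amounts to a single comparison per macroblock; a sketch with illustrative names, not the patent's code:

```python
# Hypothetical sketch of Step 6: temporal saliency against the dynamic dual
# thresholds (the previous frame's average motion vector components).
def temporal_saliency(vx, vy, mean_vx_prev, mean_vy_prev):
    """T_Yp = 1 if either motion component strictly exceeds the previous
    frame's average in that direction, else 0."""
    return 1 if vx > mean_vx_prev or vy > mean_vy_prev else 0
```

Note the comparison is strict, so a macroblock moving exactly at the frame average is not marked as temporally salient.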

Motion perception is one of the most important visual processing mechanisms of the human visual system. Experiments show that coded content with large motion vectors corresponds exactly to the moving regions the human eye is interested in (such as heads, arms, and people), while coded content with small or zero motion vectors corresponds to static background regions that receive little attention. FIG. 4 takes the Akiyo sequence as an example and shows the motion vector distribution of its 50th frame; in the face and head-and-shoulder regions with high human attention, the coded macroblocks usually have larger motion vectors.

The setting of the decision thresholds strongly affects whether the motion of the current coded macroblock is judged as strong. To reduce the misclassification rate, the present invention takes the horizontal and vertical motion decision thresholds to be $\overline{V}_{x(p-1)th}$ and $\overline{V}_{y(p-1)th}$, where $\overline{V}_{x(p-1)th}$ is the average horizontal motion vector of all coded macroblocks in the previous frame and $\overline{V}_{y(p-1)th}$ is the average vertical motion vector of all coded macroblocks in the previous frame. This dynamic threshold setting fully exploits the temporal correlation of the video sequence: the thresholds track the changes in the previous frame's average macroblock motion vectors, effectively reducing misclassification and allowing regions of temporal visual feature saliency to be obtained quickly and accurately.

Step 7: Label the video regions of interest in the current P frame. Specifically, traverse all coded macroblocks in the current P frame and label each one according to its spatial and temporal visual feature saliency, using the following rule:

$$ROI_{Yp}(x,y)=\begin{cases}3, & S_{Yp}(x,y,Mode_{pn})=1 \text{ and } T_{Yp}(x,y,V_{xpn},V_{ypn})=1\\ 2, & S_{Yp}(x,y,Mode_{pn})=0 \text{ and } T_{Yp}(x,y,V_{xpn},V_{ypn})=1\\ 1, & S_{Yp}(x,y,Mode_{pn})=1 \text{ and } T_{Yp}(x,y,V_{xpn},V_{ypn})=0\\ 0, & S_{Yp}(x,y,Mode_{pn})=0 \text{ and } T_{Yp}(x,y,V_{xpn},V_{ypn})=0\end{cases}$$

The video regions of interest are labeled according to the following cases:

If the current coded macroblock has both spatial and temporal visual feature saliency, i.e. S_Yp(x, y, Mode_pn) = 1 and T_Yp(x, y, V_xpn, V_ypn) = 1, the macroblock is both rich in texture detail and has a large motion vector, so human interest is highest; label ROI_Yp(x, y) = 3.

If it has only temporal but not spatial visual feature saliency, i.e. T_Yp(x, y, V_xpn, V_ypn) = 1 and S_Yp(x, y, Mode_pn) = 0, the macroblock has a large motion vector. Since, according to the perceptual characteristics of the HVS, the human eye is highly sensitive to object motion, interest is second highest; label ROI_Yp(x, y) = 2.

If the macroblock has little motion and thus no temporal visual feature saliency, but has rich texture information and therefore only spatial visual feature saliency, i.e. S_Yp(x, y, Mode_pn) = 1 and T_Yp(x, y, V_xpn, V_ypn) = 0, interest ranks third; label ROI_Yp(x, y) = 1.

If it has neither spatial nor temporal visual feature saliency, i.e. S_Yp(x, y, Mode_pn) = 0 and T_Yp(x, y, V_xpn, V_ypn) = 0, the macroblock has flat texture and little or no motion, usually corresponding to a static background region. It is then a non-interest region with the lowest human attention; label ROI_Yp(x, y) = 0.

Here ROI_Yp(x, y) denotes the visual interest priority of the current coded macroblock; T_Yp(x, y, V_xpn, V_ypn) its temporal visual feature saliency; S_Yp(x, y, Mode_pn) its spatial visual feature saliency; (x, y) the position coordinates of the macroblock; Y the luminance component of the macroblock; p the index of the p-th inter-coded video frame; and n the index of the n-th coded macroblock in the current frame.
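The four cases above can be sketched as a small function (illustrative only, assuming the 0/1 saliency flags defined in Steps 4 and 6):

```python
# Hypothetical sketch of Step 7: combining the spatial flag S_Yp and the
# temporal flag T_Yp into an interest priority ROI_Yp in {0, 1, 2, 3}.
def roi_priority(s: int, t: int) -> int:
    if s == 1 and t == 1:
        return 3  # texture-rich and moving: highest interest
    if s == 0 and t == 1:
        return 2  # moving only: motion dominates attention
    if s == 1 and t == 0:
        return 1  # texture-rich only
    return 0      # flat, static background: non-interest
```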

Step 8: Output the video bitstream. Specifically, according to the labeled interest priority ROI_Yp(x, y), process the luminance component Y of all macroblocks in the current P frame as follows, and output the labeled video stream:

$$Y_p(x,y)=\begin{cases}255, & ROI_{Yp}(x,y)=3\\ 150, & ROI_{Yp}(x,y)=2\\ 100, & ROI_{Yp}(x,y)=1\\ 0, & ROI_{Yp}(x,y)=0\end{cases}$$

Since the luminance component of a coded macroblock takes values Y ∈ [0, 255], with 0 to 255 representing the 256 levels from all black to all white, the invention maps the labeled interest priorities ROI_Yp(x, y) onto the luminance component Y as follows and outputs the labeled video stream.

If ROI_Yp(x, y) = 3, interest and human attention are highest; the luminance component of the coded macroblock is set to 255, the highest output value, i.e. Y_p(x, y) = 255;

If ROI_Yp(x, y) = 2, interest is second highest and human attention is relatively high; the luminance component is set to 150, a relatively high output value, i.e. Y_p(x, y) = 150;

If ROI_Yp(x, y) = 1, interest ranks third and human attention is relatively low; the luminance component is set to 100, a relatively low output value, i.e. Y_p(x, y) = 100;

If ROI_Yp(x, y) = 0, the macroblock is a non-interest region with the lowest human attention; the luminance component is set to 0, the lowest output value, i.e. Y_p(x, y) = 0.
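A minimal sketch of this priority-to-luminance mapping, applied per macroblock over a whole frame (the 2-D list layout is an assumption for illustration):

```python
# Hypothetical sketch of Step 8: overwrite each macroblock's luminance with a
# gray level encoding its interest priority, producing the labeled frame.
ROI_TO_LUMA = {3: 255, 2: 150, 1: 100, 0: 0}

def mark_frame(roi_map):
    """roi_map: 2-D list of ROI_Yp priorities, one entry per macroblock.
    Returns the per-macroblock luminance values Y_p of the labeled frame."""
    return [[ROI_TO_LUMA[level] for level in row] for row in roi_map]
```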

Step 9: Return to Step 3 and process the next frame, until the entire video sequence has been traversed.

FIG. 5 shows the flow chart of the video region-of-interest labeling and extraction method.

FIG. 6 shows the labeled region-of-interest output for typical video sequences.

Beneficial Effects

The method realizes fast extraction of video regions of interest from basic coding information. Using the correlation between basic coding information and the regions the human eye is interested in, it identifies regions of spatial visual feature saliency and regions of temporal visual feature saliency in the coded video content, then combines the two labeling results to define video interest priorities, finally realizing automatic extraction of video regions of interest. The method can provide an important coding basis for ROI-based (Region of Interest) video coding techniques.

Brief Description of the Drawings

FIG. 1. Schematic diagram of the inter prediction mode selection process in the H.264 standard;

FIG. 2. Inter prediction mode distribution of the 50th frame of the Claire sequence;

FIG. 3. Positions and indices of the coded macroblocks in a video frame;

FIG. 4. Motion vector distribution of the 50th frame of the Akiyo sequence;

FIG. 5. Flow chart of the method of the present invention;

FIG. 6. Output of labeling video regions of interest with the method of the present invention.

Detailed Description of the Embodiments

Since the human eye is more sensitive to luminance than to chrominance information, the method of the present invention encodes the luminance component of the video frames. The video sequence is first read in, its luminance component is extracted, and the region-of-interest extraction module of the present invention is invoked to complete the automatic labeling and extraction of the regions of interest.

In an implementation of the present invention, a video capture device (such as a digital camera) acquires the video images and transmits them to a computer, where the regions of interest are automatically labeled from the coding information in the video bitstream. Regions of spatial visual feature saliency are identified from the predictive coding mode of the current coded macroblock; regions of temporal visual feature saliency are then identified from its horizontal and vertical motion vectors, with dynamic motion-vector decision thresholds reducing the impact of different video motion types on extraction accuracy; finally, the video interest classification is obtained from the spatial/temporal saliency labels, realizing automatic extraction of the video regions of interest.

Specifically, the following procedure is carried out on the computer:

Step 1: Read in the video sequence according to the encoder configuration file encoder.cfg and configure the encoder with its parameters, for example: bitstream structure GOP = IPPP…; number of encoded frames FramesToBeEncoded = 100; frame rate FrameRate = 30 f/s; video width SourceWidth = 176 and height SourceHeight = 144; output file name OutputFile = ROI.264; quantization step sizes QPISlice = 28 and QPPSlice = 28; motion estimation search range SearchRange = ±16; number of reference frames NumberReferenceFrames = 5; rate-distortion cost function enabled, RDOptimization = on; entropy coding type SymbolMode = CAVLC. Initialize the parameters L = number of encoded frames and p = 1;

Step 2: Read the luminance component values Y of the coded macroblocks frame by frame, in order, from the input video sequence;

Step 3: Perform intra-frame predictive coding on the first frame of the video sequence, i.e. the I frame;

Step 4: Perform inter-frame predictive coding on the current P frame and record the inter prediction mode type Mode_pn of each coded macroblock, where p = 1, 2, 3, …, L-1 denotes the p-th inter-coded video frame, L is the total number of frames encoded in the entire video sequence, and n is the index of the n-th coded macroblock in the current frame.

Step 5: Identify the regions of spatial visual feature saliency. If the inter prediction mode Mode_pn of the current coded macroblock belongs to the sub-partition mode set or the intra prediction mode set, Mode_pn ∈ {8×8, 8×4, 4×8, 4×4} or {Intra16×16, Intra4×4}, mark the macroblock as S_Yp(x, y, Mode_pn) = 1, belonging to a region of spatial visual feature saliency; otherwise mark S_Yp(x, y, Mode_pn) = 0;

$$S_{Yp}(x,y,Mode_{pn})=\begin{cases}1, & Mode_{pn}\in\{8\times 8,8\times 4,4\times 8,4\times 4\}\ \text{or}\ \{Intra16\times 16,Intra4\times 4\}\\ 0, & \text{else}\end{cases}$$

Step 6: If p ≠ 1, record the horizontal motion vector V_xpn and the vertical motion vector V_ypn of every coded macroblock in the p-th frame, and compute the average horizontal motion vector $\overline{V}_{x(p-1)th}$ and the average vertical motion vector $\overline{V}_{y(p-1)th}$ of all coded macroblocks in the previous coded frame; otherwise, jump to Step 10;

Step 7: Identify the regions of temporal visual feature saliency. If the horizontal motion vector V_xpn of the current coded macroblock is greater than the previous frame's average horizontal motion vector $\overline{V}_{x(p-1)th}$, or its vertical motion vector V_ypn is greater than the previous frame's average vertical motion vector $\overline{V}_{y(p-1)th}$, i.e. if either condition is satisfied, the macroblock belongs to a region of temporal visual feature saliency and is marked T_Yp(x, y, V_xpn, V_ypn) = 1; otherwise it is marked T_Yp(x, y, V_xpn, V_ypn) = 0;

$$T_{Yp}(x,y,V_{xpn},V_{ypn})=\begin{cases}1, & V_{xpn}>\overline{V}_{x(p-1)th}\ \text{or}\ V_{ypn}>\overline{V}_{y(p-1)th}\\ 0, & \text{else}\end{cases}$$

Step 8: Label the video regions of interest.

$$ROI_{Yp}(x,y)=\begin{cases}3, & S_{Yp}(x,y,Mode_{pn})=1 \text{ and } T_{Yp}(x,y,V_{xpn},V_{ypn})=1\\ 2, & S_{Yp}(x,y,Mode_{pn})=0 \text{ and } T_{Yp}(x,y,V_{xpn},V_{ypn})=1\\ 1, & S_{Yp}(x,y,Mode_{pn})=1 \text{ and } T_{Yp}(x,y,V_{xpn},V_{ypn})=0\\ 0, & S_{Yp}(x,y,Mode_{pn})=0 \text{ and } T_{Yp}(x,y,V_{xpn},V_{ypn})=0\end{cases}$$

If the current coded macroblock has both spatial and temporal visual feature saliency, i.e. S_Yp(x, y, Mode_pn) = 1 and T_Yp(x, y, V_xpn, V_ypn) = 1, human interest is highest; label ROI_Yp(x, y) = 3;

If it has only temporal visual feature saliency, i.e. T_Yp(x, y, V_xpn, V_ypn) = 1 and S_Yp(x, y, Mode_pn) = 0, interest is second highest; label ROI_Yp(x, y) = 2;

If it has only spatial visual feature saliency, i.e. S_Yp(x, y, Mode_pn) = 1 and T_Yp(x, y, V_xpn, V_ypn) = 0, interest ranks third; label ROI_Yp(x, y) = 1;

若既不具有空域视觉特征显著度也不具有时域视觉特征显著度,即SYp(x,y,Modepn)=0并且TYp(x,y,Vxpn,Vypn)=0,则为人眼非感兴趣区域,标记ROIYp(x,y)=0;If there is neither spatial visual feature salience nor temporal visual feature saliency, that is, S Yp (x,y,Mode pn )=0 and T Yp (x,y,V xpn ,V ypn )=0, then For the non-interest area of the human eye, mark the ROI Yp (x,y)=0;
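As a minimal sketch (with illustrative names, not the patent's implementation), the four cases above reduce to a small decision on the two saliency flags:

```python
def roi_priority(s, t):
    """Combine the spatial flag s and temporal flag t into the
    four-level ROI priority described above."""
    if s == 1 and t == 1:
        return 3  # both salient: highest human-eye interest
    if s == 0 and t == 1:
        return 2  # temporal saliency only
    if s == 1 and t == 0:
        return 1  # spatial saliency only
    return 0      # neither: non-region-of-interest

assert [roi_priority(s, t) for s, t in [(1, 1), (0, 1), (1, 0), (0, 0)]] == [3, 2, 1, 0]
```

Note the ordering implied by the priorities: temporal saliency alone outranks spatial saliency alone, reflecting the stronger pull of motion on human attention.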

Step 9: Output the video coding stream.

Y_p(x,y) = 255,  if ROI_Yp(x,y)=3
           150,  if ROI_Yp(x,y)=2
           100,  if ROI_Yp(x,y)=1
             0,  if ROI_Yp(x,y)=0

If ROI_Yp(x,y)=3, the degree of interest and human-eye attention is highest; the luminance component of the coded macroblock is set to 255, giving the highest output luminance, i.e. Y_p(x,y)=255.

If ROI_Yp(x,y)=2, the degree of interest is second highest and human-eye attention is relatively high; the luminance component is set to 150, i.e. Y_p(x,y)=150.

If ROI_Yp(x,y)=1, the degree of interest is third and human-eye attention is relatively low; the luminance component is set to 100, i.e. Y_p(x,y)=100.

If ROI_Yp(x,y)=0, the macroblock is a non-region-of-interest with the lowest human-eye attention; the luminance component is set to 0, giving the lowest output luminance, i.e. Y_p(x,y)=0.
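The output mapping above is a fixed lookup from ROI level to luminance value; a brief sketch (illustrative names) for the visualization stream:

```python
ROI_TO_LUMA = {3: 255, 2: 150, 1: 100, 0: 0}

def output_luma(roi_level):
    """Luminance component written to the output stream for a
    macroblock with the given ROI priority, per Step 9."""
    return ROI_TO_LUMA[roi_level]

assert output_luma(3) == 255  # highest interest: brightest
assert output_luma(0) == 0    # non-ROI: black
```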

Step 10: If p≠L-1, set p=p+1 and jump to Step 3; otherwise, end coding.

A schematic diagram of the output results of marking video regions of interest with the method of the present invention is shown in FIG. 6. Taking a typical video surveillance sequence (Hall) and an indoor activity sequence (Salesman) as examples, the motion vector distribution and the inter-frame prediction mode selection results are used to mark the video region of interest: the higher the human-eye interest in a macroblock, the higher the luminance value at that position in the output video, and vice versa. The marking results in the rightmost column of FIG. 6 show that the region of interest obtained by the method of the present invention is irregular in shape; compared with regions of interest obtained by traditional moving-object detection methods that use fixed-shape templates, the results of the present method are closer to the shape of the target the human eye actually attends to, and thus mark the region of interest more accurately.

The method of the present invention can also be combined with other fast coding techniques: on the premise of guaranteeing the coding quality of regions the human eye is interested in, it reduces the coding complexity of background regions the human eye is not interested in and further shortens coding time. It can also be used in H.264-based scalable coding to realize selective enhancement coding of the region of interest.

Claims (1)

1. A method for extracting a video region of interest based on coding information, characterized by comprising the following steps:

Step 1: Input a video sequence in YUV format with a GOP (Group of Pictures) structure of IPPP, read the luminance component Y of the coded macroblocks, and configure the coding parameters.

Step 2: Perform intra-frame predictive coding on the first frame of the video sequence, i.e. the I frame.

Step 3: Perform inter-frame predictive coding on the current p-th frame and record the inter-frame prediction mode types of all coded macroblocks in it, denoted Mode_pn; p=1,2,3,…,L-1 indexes the p-th inter-coded video frame, L is the total number of coded frames in the video sequence, and n is the sequence number of the n-th coded macroblock in the current coded frame.

Step 4: Identify the spatial-domain visual feature saliency region of the current p-th frame, specifically: if the inter-frame prediction mode Mode_pn of the current coded macroblock belongs to the sub-partition mode set or the intra-frame prediction mode set, i.e. Mode_pn ∈ {8×8, 8×4, 4×8, 4×4} or {Intra16×16, Intra4×4}, mark the macroblock S_Yp(x,y,Mode_pn)=1 as belonging to the spatial saliency region; otherwise mark S_Yp(x,y,Mode_pn)=0. Y denotes the luminance component of the coded macroblock and (x,y) its position coordinates. Traverse all coded macroblocks in the current p-th frame.

Step 5: Record the horizontal motion vector V_xpn and vertical motion vector V_ypn of every coded macroblock in the p-th frame, and compute the average horizontal motion vector V̄_x(p-1) = (1/Num) Σ_{n=1..Num} V_x(p-1)n and the average vertical motion vector V̄_y(p-1) = (1/Num) Σ_{n=1..Num} V_y(p-1)n over all coded macroblocks of the previous coded frame, where Num is the number of macroblocks contained in a coded frame, i.e. the number of accumulation terms.

Step 6: Identify the time-domain visual feature saliency region of the current p-th frame, specifically: if the horizontal motion vector V_xpn of the current coded macroblock is greater than the previous frame's average horizontal motion vector V̄_x(p-1), or its vertical motion vector V_ypn is greater than the previous frame's average vertical motion vector V̄_y(p-1), the macroblock belongs to the temporal saliency region and is marked T_Yp(x,y,V_xpn,V_ypn)=1; otherwise it is marked T_Yp(x,y,V_xpn,V_ypn)=0. Traverse all coded macroblocks in the current p-th frame.

Step 7: Mark the video region of interest of the current p-th frame, specifically: traverse all coded macroblocks in the current p-th frame and mark each according to its spatial and temporal visual feature saliency, using the formula

ROI_Yp(x,y) = 3,  if S_Yp(x,y,Mode_pn)=1 and T_Yp(x,y,V_xpn,V_ypn)=1
              2,  if S_Yp(x,y,Mode_pn)=0 and T_Yp(x,y,V_xpn,V_ypn)=1
              1,  if S_Yp(x,y,Mode_pn)=1 and T_Yp(x,y,V_xpn,V_ypn)=0
              0,  if S_Yp(x,y,Mode_pn)=0 and T_Yp(x,y,V_xpn,V_ypn)=0

that is: if the current coded macroblock has both spatial and temporal saliency (S=1 and T=1), mark ROI_Yp(x,y)=3; if it has only temporal saliency (T=1 and S=0), mark ROI_Yp(x,y)=2; if it has only spatial saliency (S=1 and T=0), mark ROI_Yp(x,y)=1; if it has neither (S=0 and T=0), mark ROI_Yp(x,y)=0.

Step 8: Output the video coding stream, specifically: according to the marked ROI_Yp(x,y) interest priority, process the luminance component Y of all macroblocks in the current p-th frame as follows and output the marked video stream:

Y_p(x,y) = 255,  if ROI_Yp(x,y)=3
           150,  if ROI_Yp(x,y)=2
           100,  if ROI_Yp(x,y)=1
             0,  if ROI_Yp(x,y)=0

Step 9: Return to Step 3 and process the next frame until the entire video sequence has been traversed.
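The spatial-saliency test of Step 4 can likewise be sketched; the mode labels below are illustrative strings standing in for H.264 partition and intra modes, not an actual encoder API.

```python
SUBPARTITION_MODES = {"8x8", "8x4", "4x8", "4x4"}
INTRA_MODES = {"Intra16x16", "Intra4x4"}

def mark_spatial_saliency(mode):
    """Spatial-saliency flag S for one macroblock: 1 when its
    prediction mode is a sub-partition mode or an intra mode
    (fine partitioning suggests texture detail), else 0."""
    return 1 if mode in SUBPARTITION_MODES or mode in INTRA_MODES else 0

assert mark_spatial_saliency("4x4") == 1
assert mark_spatial_saliency("Intra4x4") == 1
assert mark_spatial_saliency("16x16") == 0  # large partition: smooth background
```

The design choice here is that the encoder's own mode decision doubles as a free texture detector: no extra analysis pass is needed, since fine sub-partitions are only chosen where prediction residuals are complex.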
CN201310591430.0A 2013-11-21 2013-11-21 Video area-of-interest exacting method based on coding information Active CN103618900B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310591430.0A CN103618900B (en) 2013-11-21 2013-11-21 Video area-of-interest exacting method based on coding information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310591430.0A CN103618900B (en) 2013-11-21 2013-11-21 Video area-of-interest exacting method based on coding information

Publications (2)

Publication Number Publication Date
CN103618900A true CN103618900A (en) 2014-03-05
CN103618900B CN103618900B (en) 2016-08-17

Family

ID=50169604

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310591430.0A Active CN103618900B (en) 2013-11-21 2013-11-21 Video area-of-interest exacting method based on coding information

Country Status (1)

Country Link
CN (1) CN103618900B (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104079934A (en) * 2014-07-14 2014-10-01 武汉大学 Method for extracting regions of interest in real-time video communication
CN104539962A (en) * 2015-01-20 2015-04-22 北京工业大学 Layered video coding method fused with visual perception features
CN106331711A (en) * 2016-08-26 2017-01-11 北京工业大学 A Dynamic Bit Rate Control Method Based on Network Features and Video Features
CN107371029A (en) * 2017-06-28 2017-11-21 上海大学 Content-based Video Packet Priority Allocation Method
CN107483934A (en) * 2017-08-17 2017-12-15 西安万像电子科技有限公司 Decoding method, device and system
CN107563371A (en) * 2017-07-17 2018-01-09 大连理工大学 The method of News Search area-of-interest based on line laser striation
CN107623848A (en) * 2017-09-04 2018-01-23 浙江大华技术股份有限公司 A kind of method for video coding and device
CN109151479A (en) * 2018-08-29 2019-01-04 南京邮电大学 Significance extracting method based on H.264 compression domain model with feature when sky
CN109379594A (en) * 2018-10-31 2019-02-22 北京佳讯飞鸿电气股份有限公司 Video coding compression method, device, equipment and medium
CN109862356A (en) * 2019-01-17 2019-06-07 中国科学院计算技术研究所 A video coding method and system based on region of interest
CN110572579A (en) * 2019-09-30 2019-12-13 联想(北京)有限公司 image processing method and device and electronic equipment
CN110784716A (en) * 2019-08-19 2020-02-11 腾讯科技(深圳)有限公司 Media data processing method, device and medium
CN111079567A (en) * 2019-11-28 2020-04-28 中科驭数(北京)科技有限公司 Sampling method, model generation method, video behavior identification method and device
WO2021093059A1 (en) * 2019-11-15 2021-05-20 网宿科技股份有限公司 Method, system and device for recognizing region of interest
WO2022127865A1 (en) * 2020-12-18 2022-06-23 中兴通讯股份有限公司 Video processing method, apparatus, electronic device, and storage medium
CN115550536A (en) * 2021-06-29 2022-12-30 Oppo广东移动通信有限公司 Image processing method, image processor and electronic device

Citations (5)

Publication number Priority date Publication date Assignee Title
JPH11112973A (en) * 1997-10-01 1999-04-23 Matsushita Electric Ind Co Ltd Device and method for converting video signal
CN101640802A (en) * 2009-08-28 2010-02-03 北京工业大学 Video inter-frame compression coding method based on macroblock features and statistical properties
US20120020407A1 (en) * 2010-07-20 2012-01-26 Vixs Systems, Inc. Resource adaptive video encoding system with region detection and method for use therewith
CN102510496A (en) * 2011-10-14 2012-06-20 北京工业大学 Quick size reduction transcoding method based on region of interest
CN102740073A (en) * 2012-05-30 2012-10-17 华为技术有限公司 Coding method and device


Non-Patent Citations (1)

Title
LIU Pengyu, JIA Kebin: "Fast Extraction and Coding Algorithm for Video Regions of Interest", Journal of Circuits and Systems *

Cited By (26)

Publication number Priority date Publication date Assignee Title
CN104079934A (en) * 2014-07-14 2014-10-01 武汉大学 Method for extracting regions of interest in real-time video communication
US10313692B2 (en) 2015-01-20 2019-06-04 Beijing University Of Technology Visual perception characteristics-combining hierarchical video coding method
CN104539962A (en) * 2015-01-20 2015-04-22 北京工业大学 Layered video coding method fused with visual perception features
WO2016115968A1 (en) * 2015-01-20 2016-07-28 北京工业大学 Visual perception feature-fused scaled video coding method
CN104539962B (en) * 2015-01-20 2017-12-01 北京工业大学 It is a kind of merge visually-perceptible feature can scalable video coding method
CN106331711A (en) * 2016-08-26 2017-01-11 北京工业大学 A Dynamic Bit Rate Control Method Based on Network Features and Video Features
CN106331711B (en) * 2016-08-26 2019-07-05 北京工业大学 A kind of dynamic code rate control method based on network characterization and video features
CN107371029A (en) * 2017-06-28 2017-11-21 上海大学 Content-based Video Packet Priority Allocation Method
CN107371029B (en) * 2017-06-28 2020-10-30 上海大学 Content-based video packet priority allocation method
CN107563371A (en) * 2017-07-17 2018-01-09 大连理工大学 The method of News Search area-of-interest based on line laser striation
CN107563371B (en) * 2017-07-17 2020-04-07 大连理工大学 Method for dynamically searching interesting region based on line laser light strip
CN107483934A (en) * 2017-08-17 2017-12-15 西安万像电子科技有限公司 Decoding method, device and system
CN107623848A (en) * 2017-09-04 2018-01-23 浙江大华技术股份有限公司 A kind of method for video coding and device
CN107623848B (en) * 2017-09-04 2019-11-19 浙江大华技术股份有限公司 A kind of method for video coding and device
CN109151479A (en) * 2018-08-29 2019-01-04 南京邮电大学 Significance extracting method based on H.264 compression domain model with feature when sky
CN109379594A (en) * 2018-10-31 2019-02-22 北京佳讯飞鸿电气股份有限公司 Video coding compression method, device, equipment and medium
CN109862356A (en) * 2019-01-17 2019-06-07 中国科学院计算技术研究所 A video coding method and system based on region of interest
CN109862356B (en) * 2019-01-17 2020-11-10 中国科学院计算技术研究所 Video coding method and system based on region of interest
CN110784716A (en) * 2019-08-19 2020-02-11 腾讯科技(深圳)有限公司 Media data processing method, device and medium
CN110784716B (en) * 2019-08-19 2023-11-17 腾讯科技(深圳)有限公司 Media data processing method, device and medium
CN110572579A (en) * 2019-09-30 2019-12-13 联想(北京)有限公司 image processing method and device and electronic equipment
WO2021093059A1 (en) * 2019-11-15 2021-05-20 网宿科技股份有限公司 Method, system and device for recognizing region of interest
CN111079567A (en) * 2019-11-28 2020-04-28 中科驭数(北京)科技有限公司 Sampling method, model generation method, video behavior identification method and device
CN111079567B (en) * 2019-11-28 2020-11-13 中科驭数(北京)科技有限公司 Sampling method, model generation method, video behavior identification method and device
WO2022127865A1 (en) * 2020-12-18 2022-06-23 中兴通讯股份有限公司 Video processing method, apparatus, electronic device, and storage medium
CN115550536A (en) * 2021-06-29 2022-12-30 Oppo广东移动通信有限公司 Image processing method, image processor and electronic device

Also Published As

Publication number Publication date
CN103618900B (en) 2016-08-17

Similar Documents

Publication Publication Date Title
CN103618900B (en) Video area-of-interest exacting method based on coding information
Zhao et al. Real-time moving object segmentation and classification from HEVC compressed surveillance video
CN111670580B (en) Progressive compressed domain computer vision and deep learning system
WO2020097888A1 (en) Video processing method and apparatus, electronic device, and computer-readable storage medium
CN102158712B (en) Multi-viewpoint video signal coding method based on vision
CN101184221A (en) Video Coding Method Based on Visual Attention
CN101729891B (en) Method for encoding multi-view depth video
CN104065962B (en) The macroblock layer bit distribution optimization method that view-based access control model notes
Kong et al. Object-detection-based video compression for wireless surveillance systems
CN101937578A (en) A Color Image Rendering Method of Virtual Viewpoint
CN104796694A (en) Intraframe video encoding optimization method based on video texture information
CN103327327B (en) For the inter prediction encoding unit selection method of high-performance video coding HEVC
CN100593792C (en) A Text Tracking and Multi-Frame Enhancement Method in Video
WO2023005740A1 (en) Image encoding, decoding, reconstruction, and analysis methods, system, and electronic device
CN112001308A (en) Lightweight behavior identification method adopting video compression technology and skeleton features
CN111083477A (en) HEVC Optimization Algorithm Based on Visual Saliency
CN103561261B (en) The panoramic locatable video coded method that view-based access control model notes
CN107820095A (en) A kind of long term reference image-selecting method and device
CN103957420B (en) Comprehensive movement estimation modified algorithm of H.264 movement estimation code
CN116437102B (en) Can learn general video coding methods, systems, equipment and storage media
CN101917627B (en) Video fault-tolerant coding method based on self-adaptation flexible macro-block order
CN106604029B (en) A kind of bit rate control method of the moving region detection based on HEVC
CN112449182B (en) Video encoding method, device, equipment and storage medium
Ko et al. An energy-quality scalable wireless image sensor node for object-based video surveillance
WO2020227911A1 (en) Method for accelerating coding/decoding of hevc video sequence

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240130

Address after: 073099 Room 309, 3rd Floor, Commercial and Residential Building B, Xinhai Science and Technology Plaza, East Side of Beimen Street and South Side of Beimen Street Market, Dingzhou City, Baoding City, Hebei Province

Patentee after: HEBEI HONGYI ENVIRONMENTAL PROTECTION TECHNOLOGY Co.,Ltd.

Country or region after: China

Address before: 100124 No. 100 Pingleyuan, Chaoyang District, Beijing

Patentee before: Beijing University of Technology

Country or region before: China

TR01 Transfer of patent right