RU2493602C1 - Method and system for selecting key frames from video sequences

Info

Publication number: RU2493602C1
Application number: RU2012134258/08A
Authority: RU
Inventors: Екатерина Витальевна ТОЛСТАЯ; Сеунг-Хун ХАН
Original assignee: Корпорация "САМСУНГ ЭЛЕКТРОНИКС Ко., Лтд."
Priority date: 2012-08-10
Filing date: 2012-08-10
Publication date: 2013-09-20

Abstract

FIELD: information technology.

SUBSTANCE: invention discloses a method and a system for solving a specific task of converting video from monocular to stereoscopic and from black and white to colour, in semi-automatic mode. The method of selecting key frames and supplementing a video sequence with depth or colour information includes the following operations: obtaining data for initialising objects of each key object in each frame; detecting change of scene in an input video sequence and breaking the video sequence into scenes; for each scene, detecting data on activity of each object through a module for analysing video data and global movement (GM) data on all frames of the scene and storing said data in a video analysis result storage; wherein after processing the video scene, stored data on activity of each object are first analysed, key frames are selected, data on GM and key frames of the object are then analysed; key frames are extracted and output through the video data analysis unit; after which the video analysis result storage is cleared and then switched to the next scene of the input video sequence until reaching the end of the video sequence. The system consists of three basic parts: a video data analysis unit; a video analysis result storage; a video analysis result processing unit.

EFFECT: converting video from monocular to stereoscopic and from black and white to colour, in a semi-automatic mode.

21 cl, 7 dwg

Description

The claimed invention relates to video processing technology and, more specifically, to devices and methods for automatically extracting key frames from a video so that an operator can supplement the video with information such as depth, for subsequent conversion of a monocular video sequence into a stereo sequence, or color, for subsequent conversion of a black-and-white video sequence into a color one.

In recent years, various manufacturers have been actively developing stereoscopic displays capable of reproducing stereo images. Producing a sense of three-dimensionality (stereo) requires a video sequence in a special format, which comprises video recorded from different viewpoints, separately for the left and the right eye. Many methods exist for forming images from different viewpoints, for example shooting with several cameras or rendering video from a three-dimensional model.

Most footage produced over the history of cinema is intended for conventional monoscopic displays. To obtain a sense of three-dimensionality from old films, this footage must be converted into stereoscopic video. This is achieved by assigning a depth map to each video frame and generating the left and right views from the video frames and the depth map.

Video conversion systems may be either fully automatic, operating without operator intervention, or semi-automatic, where the conversion is performed with an operator's participation. In the latter case, the operator typically selects key frames of the video sequence and manually assigns (draws) depth maps, in some cases using special auxiliary techniques (see, for example, US patent application 2002/0048395 [1]). These depth maps are then propagated to the remaining video frames (for example, by the methods of US patent applications 2010/0194856 [2] and 2009/0116732 [3]). Semi-automatic systems provide much higher quality than fully automatic ones.

In recent years, many old black-and-white films have been converted to color. Today's viewers want a fuller experience when watching a film, including complete color information. However, a great many cinema masterpieces were recorded on black-and-white film, so their color information was lost. Restoring this precious heritage in color is a tedious and expensive procedure. Film is usually processed as follows: color information is added (assigned) to one of the frames (a key frame) of the video sequence (colorization), and this information is then propagated to neighboring frames, as described, for example, in US patent 4755870 [4]. In that invention, however, the key frames are selected manually.

The method by which key frames are selected plays an important role in the conversion of video sequences.

In US patent application 2011/0110649 [5], key frames are selected automatically, but they are intended for video summarization, that is, for selecting the most visually representative video frames. In US patent 7046731 [6], key frames are selected automatically by determining the direction of global motion and clustering the global motion, but this selection method is designed for efficient video representation and summarization. US patent application 2007/0263128 [7] presents techniques for an adaptive process of extracting key frames from video. The selection process includes evaluating frame quality so that the highest-quality frames are chosen as key frames; key frames are selected on the basis of frame entropy, sharpness, and contrast. This approach, however, considers the frame as a whole, that is, it does not take the objects in the frame into account. In US patent 7843512 [8], the rate of change of frame content (relative to the adjacent video frame) is determined, and a video frame is selected as a key frame if that rate exceeds a threshold. In US patent application [3], key frames are selected automatically by analyzing the motion field of objects. In US patent 7158676 [9], key frames and objects of interest are selected manually; interactive content data are embedded together with the object, the object is tracked through the entire frame sequence, and the interactive content data are embedded in each frame. Overall, existing key frame selection technologies are intended for visual summarization of video sequences, for better visual understanding.

US patent [9] and US patent application [3] can be regarded as prototypes.

The claimed method is intended to solve the specific problem of converting video from monocular to stereoscopic and from black-and-white to color in a semi-automatic mode. It takes into account specific characteristics of video sequences, such as data on the main characters (key frames) of the video sequence, which allows a more targeted selection of video frames.

The technical result is achieved by developing an improved method for selecting key frames during semi-automatic supplementing of a video sequence with depth or color information, the claimed method comprising the following operations:

• scene changes are detected in the input video sequence, and the video sequence is split into scenes;

• for each scene, object initialization data are supplied to the video data analysis module, the activity of each object and the global motion (GM) data are detected on all frames of the scene, and the video analysis results are written to the storage;

• after all frames of the video scene have been processed by the video data analysis module, the stored activity data of each object are analyzed first and the object key frames are extracted; the GM data and the object key frames are then analyzed, and the key frames are extracted and output; after that, the video analysis result storage is cleared and the next scene of the input video sequence is loaded into the video data analysis module.

As a rule, key frames are selected to summarize the visual information of a video. The novelty of the claimed invention is confirmed by the following distinguishing features:

• key frames are selected on the basis of user-supplied information about the scene content;

• each key object is marked in one or more frames;

• each key object is tracked through all frames of the cut scene;

• the quality of the frame regions containing the key object is evaluated;

• the motion trajectory of each key object is analyzed;

• key frames are selected on the basis of an analysis of the frame regions and/or the quality points of the trajectory.

In the claimed method, the object initialization data include the coordinates of the rectangular bounding box (RBB), namely the {x, y} coordinates of the upper-left corner, the width, and the height; the frame f₀ to which this RBB belongs; and two frames, f_begin and f_end, delimiting the interval over which the RBB is tracked, such that f_begin ≤ f₀ and f₀ ≤ f_end.

In the claimed method, to detect object activity, the RBB of the object is tracked through all frames of the video scene; that is, the RBB coordinates in each video frame are determined so that comparing the frame regions enclosed by the RBB in the current and the next frame yields the maximum value of a predetermined metric. Object parameters, which are image features of the frame region enclosed by the RBB, are then computed.
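The single tracking step described above can be sketched as an exhaustive search over small displacements. This is only an illustration: the text does not name the metric, so the negative sum of absolute differences used here, the search radius, and the function names `crop`, `similarity`, and `track_rbb` are all assumptions. Frames are plain lists of rows of grey levels.

```python
def crop(frame, x, y, w, h):
    """Return the w-by-h region whose upper-left corner is (x, y)."""
    return [row[x:x + w] for row in frame[y:y + h]]


def similarity(a, b):
    """Assumed metric: negative sum of absolute differences (higher is better)."""
    return -sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def track_rbb(current, following, rbb, radius=2):
    """Find the RBB position in `following` that best matches the RBB in `current`.

    rbb is (x, y, width, height); radius is a hypothetical search range in pixels.
    """
    x, y, w, h = rbb
    template = crop(current, x, y, w, h)
    height, width = len(following), len(following[0])
    best, best_score = rbb, None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            nx, ny = x + dx, y + dy
            # Skip candidate boxes that fall outside the frame.
            if nx < 0 or ny < 0 or nx + w > width or ny + h > height:
                continue
            score = similarity(template, crop(following, nx, ny, w, h))
            if best_score is None or score > best_score:
                best, best_score = (nx, ny, w, h), score
    return best
```

A practical tracker would likely use a more robust metric and a smarter search, but the structure (compare the enclosed regions, keep the position with the maximum metric value) stays the same.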

In the claimed method, additional key frames are selected in the intervals between object key frames on the basis of an analysis of the global motion (GM) data.

In the claimed method, the RBB is tracked from the object initialization frame f₀ forward, from frame f₀ to frame f_end, and then backward, from frame f₀ to frame f_begin, to obtain the tracked RBB coordinates.
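The forward-then-backward order of tracking can be sketched as follows. The callback `step`, which moves an RBB from one frame to an adjacent one, is a hypothetical placeholder for whatever tracker is used; the function name and the dictionary return type are likewise assumptions of this sketch.

```python
def track_bidirectional(rbb0, f0, f_begin, f_end, step):
    """Track an RBB forward from f0 to f_end, then backward from f0 to f_begin.

    step(rbb, f_from, f_to) is a hypothetical callback returning the RBB in
    frame f_to given the RBB in the adjacent frame f_from.
    Returns a dict mapping frame index to the tracked RBB.
    """
    tracked = {f0: rbb0}

    # Forward pass: f0 -> f_end.
    rbb = rbb0
    for f in range(f0, f_end):
        rbb = step(rbb, f, f + 1)
        tracked[f + 1] = rbb

    # Backward pass: restart from the operator-marked RBB in f0 -> f_begin.
    rbb = rbb0
    for f in range(f0, f_begin, -1):
        rbb = step(rbb, f, f - 1)
        tracked[f - 1] = rbb

    return tracked
```

Both passes start from the operator-marked box in f₀, so tracking drift accumulated in one direction does not leak into the other.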

In the claimed method, the video analysis result processing module compares the accumulated RBB coordinates and object parameters and outputs a set of object key frames KFo:

,

where f_start and f_finish are successive frames of the scene for which the object activity data are computed,

where f₁ = max(f − T, f_start), f₂ = min(f + T, f_finish), and T is a predetermined threshold,

.

In the claimed method, the video analysis result processing module analyzes the curvature of the object trajectory in the video frames and outputs a set of object key frames KFo:

$k_{f} = \frac{\left| (x^{f+1} - x^{f}) (y^{f+1} - 2 y^{f} + y^{f-1}) - (y^{f+1} - y^{f}) (x^{f+1} - 2 x^{f} + x^{f-1}) \right|}{\left[ (x^{f+1} - x^{f})^{2} + (y^{f+1} - y^{f})^{2} \right]^{\frac{3}{2}}}, \quad f = f_{start} + 1, \ldots, f_{finish} - 1,$

where f₁ = max(f − T, f_start + 1), f₂ = min(f + T, f_finish − 1), and T is a predetermined threshold,

.
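The curvature formula above can be evaluated directly from a list of per-frame trajectory points. The sketch below assumes that the trajectory is given as (x, y) pairs, one per frame, and that a zero denominator (an object that does not move between frames f and f + 1) is reported as zero curvature; neither detail is stated in the text, and the function name is hypothetical.

```python
from typing import List, Tuple


def trajectory_curvature(points: List[Tuple[float, float]]) -> List[float]:
    """Return k_f for every interior frame f = 1 .. len(points) - 2."""
    curvatures = []
    for f in range(1, len(points) - 1):
        (xp, yp), (x, y), (xn, yn) = points[f - 1], points[f], points[f + 1]
        dx, dy = xn - x, yn - y                        # first differences
        ddx, ddy = xn - 2 * x + xp, yn - 2 * y + yp    # second differences
        denom = (dx * dx + dy * dy) ** 1.5
        # Assumption of this sketch: no motion means zero curvature.
        k = abs(dx * ddy - dy * ddx) / denom if denom > 0 else 0.0
        curvatures.append(k)
    return curvatures
```

A straight-line trajectory gives zero curvature everywhere, while a sharp turn gives a large value at the turning frame, which is what makes curvature a plausible cue for key frames.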

In the claimed method, the object parameters include estimates of sharpness and contrast.

In the claimed method, object parameters such as contrast C and sharpness S are analyzed, and functions F and F′ of C and S are computed, where T₁ and T₂ are predetermined real values:

,

where f₁ = max(f − T, f_start + 1), f₂ = min(f + T, f_finish − 1), and T is a predetermined threshold

.
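The text does not say how contrast C and sharpness S are measured, so the following is only one possible choice, given for orientation: contrast as the standard deviation of the grey levels inside the RBB, and sharpness as the mean absolute horizontal plus vertical gradient. Both definitions and both function names are assumptions, and the functions F and F′ are not modeled here.

```python
from statistics import pstdev


def contrast(region):
    """Assumed contrast measure: population standard deviation of grey levels."""
    pixels = [p for row in region for p in row]
    return pstdev(pixels)


def sharpness(region):
    """Assumed sharpness measure: mean absolute gradient over the region."""
    total, count = 0, 0
    for y in range(len(region) - 1):
        for x in range(len(region[0]) - 1):
            total += abs(region[y][x + 1] - region[y][x])   # horizontal step
            total += abs(region[y + 1][x] - region[y][x])   # vertical step
            count += 1
    return total / count if count else 0.0
```

Under these definitions a flat region scores zero on both measures, and a region with a strong edge scores high on both.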

In addition, the claimed invention provides a system for implementing the method of selecting key frames during semi-automatic supplementing of a video sequence with depth or color information, characterized in that it consists of three main parts:

a video data analysis module configured to extract data from the input video stream and to receive a set of initialization data for key objects in the video via a video marking device;

a video analysis result storage configured to store the accumulated video analysis data detected by the analysis module;

and a video analysis result processing module configured to analyze the accumulated data.

In the claimed system, the video data analysis module is configured to receive a set of rectangular bounding box (RBB) coordinates, frame indices, and the number of frames via the video marking device.

In the claimed system, the video data analysis module includes a scene change detector, a global motion data detector, and an object activity detector.

In the claimed system, the video analysis result storage consists of an object activity data accumulator, which stores the accumulated object activity data, and a global motion data accumulator, which stores parameters describing the relative displacement of static objects between two consecutive video frames.

In the claimed system, the video analysis result processing module includes a data extractor, an object key frame detector, and a key frame detector.

For a better understanding of the claimed invention, a detailed description with drawings follows.

FIG. 1 shows the main parts of the system that implements the method for selecting key frames.

FIG. 2 illustrates the main steps of video analysis for selecting key frames.

FIG. 3 explains in detail step 203, the detection of video analysis data in a video scene.

FIG. 4 illustrates the process of analyzing a video sequence.

FIG. 5 explains in detail step 303, the processing of a particular video frame and the extraction of object activity data during the "forward" pass through the video frames.

FIG. 6 explains in detail step 308, the processing of a particular video frame and the extraction of object activity data during the "backward" pass through the video frames.

FIG. 7 explains in detail step 204, the processing of video analysis data, the extraction of key frames, and the clearing of the storage.

The system that implements the method for selecting key frames while a monocular video is supplemented with depth (2D-to-3D conversion) or a black-and-white video is supplemented with color consists of three main parts: a video data analysis module, which extracts data from the input video stream and can receive a set of initialization data for key objects in the video via a video marking device; a video analysis result storage, which stores the accumulated video analysis data detected by the analysis module; and a video analysis result processing module, which analyzes the accumulated data.

The video data analysis module can receive a set of rectangular bounding box (RBB) coordinates and frames from the video marking device. The video analysis result storage keeps the object activity data in the object activity data accumulator and the global motion (GM) data (parameters describing the relative displacement of stationary objects between two consecutive video frames) in the global motion data accumulator. The video analysis result processing module analyzes the accumulated data and selects key frames for the operator to supplement further.

The method for selecting key frames during semi-automatic supplementing of a video comprises the following steps:

object initialization data are obtained for each key object in each frame;

scene changes are detected in the input video sequence, and the video sequence is split into scenes;

for each scene, the activity data of each object and the GM data are detected on all frames of the scene by the video data analysis module, and these data are saved in the video analysis result storage;

after the video scene has been processed, the stored activity data of each object are analyzed first and the object key frames are selected; the GM data and the object key frames are then analyzed, and the key frames are extracted and output by the video data analysis module; after that, the video analysis result storage is cleared, and processing moves on to the next scene of the input video sequence until the end of the video sequence is reached.
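The scene-splitting step listed above can be sketched as follows. The text does not specify how scene changes are detected, so this example assumes a deliberately simple rule: each frame is reduced to a single number (for instance its mean intensity), and a jump larger than a threshold between consecutive frames is treated as a cut. The function name, the per-frame signature, and the threshold are all hypothetical.

```python
def split_into_scenes(frame_signatures, threshold):
    """Split frame indices into scenes at large jumps of a per-frame signature.

    frame_signatures: one number per frame (assumed, e.g. mean intensity).
    Returns a list of range objects, one per scene.
    """
    scenes, start = [], 0
    for f in range(1, len(frame_signatures)):
        # Assumed rule: a jump above the threshold marks a scene change.
        if abs(frame_signatures[f] - frame_signatures[f - 1]) > threshold:
            scenes.append(range(start, f))
            start = f
    if frame_signatures:
        scenes.append(range(start, len(frame_signatures)))
    return scenes
```

Each returned range can then be processed independently, which matches the per-scene analysis and storage clearing described in the steps.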

FIG. 1 shows how the main components of the system implementing the claimed method operate. The input video is processed by the video data analysis module 101, which receives initialization data for each object that is to be taken into account.

The initialization data (identifier) of a particular object k are given for one or more video frames as {k, RBB, f₀, f_begin, f_end}, where:

k - the object index,

RBB - the rectangular bounding box marking the position of the object in the video frame,

f₀ - the frame in which the object is marked,

f_begin - the frame from which computation of the object activity data (object tracking) starts,

f_end - the frame at which computation of the object activity data (object tracking) ends.

This information is provided by the operator together with the video sequence via the video marking device 100, which may include at least a display and a pointing device such as a computer mouse. The number of objects is left to the operator's discretion. While processing the video sequence, the video data analysis module 101 outputs video analysis data, such as object activity and global motion (GM) data. The GM data include parameters describing the relative displacement of stationary (static) objects between two consecutive video frames. The video analysis data are accumulated in the video analysis result storage 102. When the video data analysis module 101 detects a scene change, the video analysis result processing module 108 extracts and analyzes the accumulated data and outputs the detected key frames. The video data analysis module 101 includes a scene change detector 103, a GM data detector 104, and an object activity detector 105. The video analysis result storage 102 stores the object activity data in the object activity data accumulator 107 and the GM data in the global motion data accumulator 106. The video analysis result processing module 108 includes a data extractor 111, an object key frame detector 110, and a key frame detector 109.

All components of this system can be built using modern electronic circuit design tools, supplemented by executable programs.

FIG. 2 shows the main steps of the claimed method. At step 201, initialization data are entered for all objects. Video processing then proceeds scene by scene, starting from the first frame (step 202). Within each scene, the video analysis data are detected and written to the video analysis result storage (step 203). Once the scene has been analyzed, the video analysis data are processed, key frames are selected, and the video analysis result storage is cleared (step 204). Condition 206 is then checked: if the last processed frame was not the last frame of the video sequence, the process moves on to the next video frame (step 205); otherwise, processing ends at step 207.
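The control flow of FIG. 2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the callables `is_scene_change`, `analyze_frame` and `select_key_frames` are hypothetical placeholders supplied by the caller, and the per-scene analysis is reduced to a single forward scan.

```python
from typing import Callable, List, Sequence


def extract_key_frames(
    frames: Sequence[object],
    is_scene_change: Callable[[object, object], bool],
    analyze_frame: Callable[[object], dict],
    select_key_frames: Callable[[List[dict]], List[int]],
) -> List[int]:
    """Scene-by-scene loop of FIG. 2 (steps 202-207), heavily simplified.

    All three callables are hypothetical stand-ins, not part of the source.
    Returns key-frame indices relative to the whole video sequence.
    """
    key_frames: List[int] = []
    storage: List[dict] = []  # plays the role of the video analysis result storage
    scene_start = 0
    for i, frame in enumerate(frames):
        storage.append(analyze_frame(frame))  # step 203
        is_last = i == len(frames) - 1
        if is_last or is_scene_change(frame, frames[i + 1]):
            # Step 204: process the accumulated data, emit key frames, clear storage.
            key_frames += [scene_start + k for k in select_key_frames(storage)]
            storage.clear()
            scene_start = i + 1  # step 205: continue with the next scene
    return key_frames  # step 207
```

Here `select_key_frames` returns indices relative to the scene, which the loop converts to indices in the full sequence before the storage is cleared.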

FIG. 3 explains in detail how the video analysis data are detected while a scene is processed (step 203). Starting from the first frame of the scene (CSFI, the current scene frame index, equal to 0), the scene is processed frame by frame (step 301). Two passes are made over the scene (see also FIG. 4). The scene is first processed in the forward direction, that is, with increasing CSFI. After the end of the scene is reached, the frames are processed in reverse order, that is, with decreasing CSFI. In each frame the GM data are extracted (step 302). As mentioned above, the GM data comprise parameters describing the relative displacement of stationary (static) objects between two consecutive video frames. Many methods for detecting such displacement are known in the art, for example the one in US Patent 7312819 [10]. At step 303 the object activity data are extracted from the video frame; this process is detailed below with reference to FIG. 5. At step 305 CSFI is increased by 1. If condition 306 is not satisfied, processing continues with the next frame (step 304). Condition 306 is checked with a scene change detector, that is, by comparing the current frame with the next frame. Many methods for detecting scene changes are known in the art, for example the one in US Patent 7123769 [11]. If a scene change is detected, the current frame is taken as the last frame of the scene. The scene is then analyzed in the backward direction, starting from its last frame (step 307). At step 308 the object activity data are extracted from the video frame; this process is detailed below with reference to FIG. 6. CSFI is then decreased by 1 (step 309). When the first frame of the scene is reached (CSFI = 0, the same frame as in step 301), processing of the video sequence continues from the last frame of the scene, as in step 307. Otherwise, if the condition at step 311 is not met, the next frame is processed (step 310). At step 312 the current video frame index is set to the last frame of the scene so that processing of the next scene can begin (steps 205 onwards).
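The two-pass traversal of FIG. 3 may be easier to follow as an explicit visiting order. The sketch below only records which frame is visited in which direction; the function name `scan_scene` and the caller-supplied predicate `is_scene_change` are assumptions made for illustration, and the per-frame work (steps 302, 303, 308) is left out.

```python
from typing import Callable, List, Sequence, Tuple


def scan_scene(
    frames: Sequence[object],
    start: int,
    is_scene_change: Callable[[object, object], bool],
) -> Tuple[List[Tuple[str, int]], int]:
    """Return the visiting order for one scene and the index of its last frame.

    Each visit is (direction, CSFI), where CSFI is the frame index relative
    to the first frame of the scene.
    """
    visits: List[Tuple[str, int]] = []
    csfi = 0
    # Forward pass (steps 301-306): advance until a scene change or end of video.
    while True:
        visits.append(("forward", csfi))
        nxt = start + csfi + 1
        if nxt >= len(frames) or is_scene_change(frames[nxt - 1], frames[nxt]):
            break
        csfi += 1
    last = csfi
    # Backward pass (steps 307-311): walk back from the last frame to CSFI = 0.
    for csfi in range(last, -1, -1):
        visits.append(("backward", csfi))
    # Step 312: report the absolute index of the last frame of the scene.
    return visits, start + last
```

The second return value corresponds to step 312: the caller resumes from the frame after it when the next scene is processed.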

FIG. 5 describes the process of detecting object activity data. The process is performed for a particular frame of the scene, with index CSFI (current scene frame index), step 501, for all objects starting with the object with index k = 0, step 502. The object activity data may consist of, but are not limited to, {k, {RBB_ƒ}|ƒ, ƒ₀, ƒ_begin, ƒ_end, {features_ƒ}}, where:

k is the index of the object,

{RBB_ƒ} is the set of rectangular bounding boxes marking the position of the object in the video frame with index ƒ, 0 < ƒ < N, where N is the index of the last frame in the scene,

ƒ₀ is the frame in which the object is marked,

ƒ_begin is the frame from which computation of the object activity data (object tracking) starts,

ƒ_end is the frame at which computation of the object activity data (object tracking) ends,

{features_ƒ} is a set of parameters associated with frame ƒ and RBB_ƒ, such as image features computed for the image patch contained in the corresponding RBB_ƒ. The image features may include sharpness, contrast, or the like.

RBB = {x, y, w, h}, where {x, y} are the coordinates of the upper-left corner and w and h are the width and height, respectively.

At step 503, the initialization data (ID) are checked for an entry whose ƒ₀ equals CSFI and whose object index equals k. If such an entry is present, the object is initialized, that is, its RBB is set from the RBB in the initialization data (step 504), and the object parameters are then computed (step 505, explained below). If condition 503 is not satisfied, steps 504 and 505 are skipped. At step 506 it is checked whether the current scene frame index (CSFI) lies between ƒ₀ and ƒ_end of the object. If so, the RBB coordinates of the object are computed at step 507 from the coordinates of this RBB in the previous frame, CSFI − 1. Many methods for computing such coordinates are known in the art, for example those in US Patents 5099324 [12] and 7620204 [13]. At step 505 the parameters of the image region containing the object are computed in order to evaluate the object parameters. At step 508 it is checked whether any unprocessed objects remain. If none remain, the process ends (step 510). Otherwise the object index k is increased by 1 (step 509) and steps 503 onwards are repeated.
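The object activity record and the forward-pass handling of one frame (FIG. 5) might be organized along the following lines. This is a sketch under assumptions: the class `ObjectActivity`, the function `forward_step`, and the callables `track` and `describe` are hypothetical names, the tracker itself is not implemented, and treating "between ƒ₀ and ƒ_end" as ƒ₀ < CSFI ≤ ƒ_end is one possible reading of step 506.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

RBB = Tuple[int, int, int, int]  # (x, y, w, h): upper-left corner plus size


@dataclass
class ObjectActivity:
    k: int          # object index
    f0: int         # frame in which the operator marked the object
    f_begin: int    # frame from which tracking starts
    f_end: int      # frame at which tracking ends
    init_rbb: RBB   # RBB supplied in the initialization data for frame f0
    rbb: Dict[int, RBB] = field(default_factory=dict)        # {RBB_f}
    features: Dict[int, dict] = field(default_factory=dict)  # {features_f}


def forward_step(
    objects: List[ObjectActivity],
    csfi: int,
    track: Callable[[int, RBB], RBB],
    describe: Callable[[int, RBB], dict],
) -> None:
    """Handle frame `csfi` for every object during the forward pass."""
    for obj in objects:  # steps 502, 508, 509
        if obj.f0 == csfi:  # step 503
            obj.rbb[csfi] = obj.init_rbb  # step 504
            obj.features[csfi] = describe(csfi, obj.init_rbb)  # step 505
        elif obj.f0 < csfi <= obj.f_end and (csfi - 1) in obj.rbb:  # step 506
            # Step 507: derive the RBB from its position in the previous frame.
            obj.rbb[csfi] = track(csfi, obj.rbb[csfi - 1])
            obj.features[csfi] = describe(csfi, obj.rbb[csfi])  # step 505
```

A backward-pass counterpart would mirror this with `csfi + 1` as the previously processed frame and the range ƒ_begin to ƒ₀.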

The following parameters are computed at step 505: contrast and sharpness. The contrast C is computed as the difference between the maximum and minimum values in the image region contained in the RBB:

$$C = \max_{(x, y) \in RBB} I(x, y) - \min_{(x, y) \in RBB} I(x, y),$$

where I(x, y) denotes the pixel value at position (x, y).

The sharpness S is computed, for example, as described in Safonov, I. V.; Rychagov, M. N.; Kang, KiMin; Kim, Sang Ho, "Adaptive sharpening of photos", Proceedings of the SPIE, Volume 6807, pp. 68070U-68070U-12 (2008) [14].
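As a rough illustration of step 505, the sketch below computes the contrast of the RBB patch as maximum minus minimum, as stated in the text. The sharpness function is only a stand-in (mean absolute horizontal difference) and is not the measure of reference [14]; the function names and the row-major `image[y][x]` layout are assumptions.

```python
from typing import List, Sequence, Tuple

RBB = Tuple[int, int, int, int]  # (x, y, w, h)


def crop(image: Sequence[Sequence[float]], rbb: RBB) -> List[List[float]]:
    """Return the image patch covered by the RBB (rows are image[y][x])."""
    x, y, w, h = rbb
    return [list(row[x:x + w]) for row in image[y:y + h]]


def contrast(patch: Sequence[Sequence[float]]) -> float:
    """Contrast as described in the text: max minus min inside the RBB."""
    values = [v for row in patch for v in row]
    return max(values) - min(values)


def sharpness_proxy(patch: Sequence[Sequence[float]]) -> float:
    """Mean absolute horizontal difference; a stand-in, NOT the measure of [14]."""
    diffs = [abs(row[i + 1] - row[i]) for row in patch for i in range(len(row) - 1)]
    return sum(diffs) / len(diffs) if diffs else 0.0
```

For example, on a 2 × 2 patch with rows (10, 50) and (20, 40), `contrast` gives 40 and `sharpness_proxy` gives 30.0.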

FIG. 6 describes in detail the process of detecting object activity data for a frame during the backward pass (step 308). The process is performed for a particular frame of the scene, with index CSFI (current scene frame index), step 601, for all objects starting with the object with index k = 0, step 602. If CSFI lies between ƒ_begin and ƒ₀ of the current object, the RBB coordinates of the object are computed from the coordinates of this RBB in the previously processed frame, CSFI + 1 (since this is the backward pass), in the same way as at step 507. At step 605 the object parameters are computed in the same way as at step 505. If the current object is the last one (condition 606), the process ends; otherwise the object index is increased by 1 and the process continues with the next object (step 607).

FIG. 7 explains in more detail step 204, which covers processing the video analysis data, selecting key frames, and clearing the storage. At step 701 a set of key frames KFo is selected from the detected object activity data; this process is detailed below. At step 702 additional key frames are inserted, where necessary, between the frames already in the set KFo. Methods for doing so are known in the art, for example S. V. Porter, M. Mirmehdi, and B. T. Thomas, "A shortest path representation for video summarization", in Proc. of 12th ICIAP, pp. 460-465, IEEE Comp. Soc., Sept. 2003 [15], where global motion data are used to detect key frames. The resulting set of key frames is output, and the video analysis result storage is cleared for use in processing the next video scene (step 703). The set of key frames KFo is selected at step 701 from the detected object activity data as follows. In one embodiment of the claimed invention, key frames are selected on the basis of the speed of the object. The video analysis result processing module compares the accumulated RBB coordinates and object parameters.

[equation for D_ƒ not reproduced in the source text]

where ƒ_start and ƒ_finish are the first and last of the consecutive scene frames for which the object activity data are computed. Let T be a predetermined threshold, 0 < T < [upper bound not reproduced in the source text],

[equation for D′_ƒ not reproduced in the source text]

where ƒ₁ = max(ƒ − T, ƒ_start), ƒ₂ = min(ƒ + T, ƒ_finish).
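The windowed selection rule can be sketched as follows. Only two things are taken from the document: the window bounds ƒ₁ = max(ƒ − T, ƒ_start), ƒ₂ = min(ƒ + T, ƒ_finish), and the rule KFo = {ƒ: D′_ƒ = D_ƒ} stated in claim 13. Everything else is an assumption: the per-frame values D_ƒ are passed in by the caller because their defining equation is not available, and D′_ƒ is assumed to be an extremum (minimum by default) of D over the window.

```python
from typing import Callable, Dict, Iterable, List


def windowed_key_frames(
    d: Dict[int, float],
    f_start: int,
    f_finish: int,
    T: int,
    pick: Callable[[Iterable[float]], float] = min,
) -> List[int]:
    """Select frames f whose value D_f equals the windowed extremum D'_f.

    `d` must hold a value for every frame in [f_start, f_finish].
    ASSUMPTION: D'_f is `pick` (min or max) of D over [f1, f2]; the source
    does not reproduce the actual definition.
    """
    key_frames: List[int] = []
    for f in range(f_start, f_finish + 1):
        f1, f2 = max(f - T, f_start), min(f + T, f_finish)
        d_prime = pick(d[g] for g in range(f1, f2 + 1))
        if d[f] == d_prime:  # KFo = {f : D'_f == D_f}
            key_frames.append(f)
    return key_frames
```

With `pick=min` the rule keeps frames where the measured quantity is locally smallest within ±T frames; passing `pick=max` keeps local maxima instead. Which of the two the method intends is not recoverable from the text.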

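In another embodiment of the claimed invention, the curvature of the object trajectory across the video frames is analyzed. With (x^ƒ, y^ƒ) denoting the trajectory point in frame ƒ, the curvature given in claim 14 is

$$k_{f} = \frac{\left| (x^{f+1} - x^{f})(y^{f+1} - 2y^{f} + y^{f-1}) - (y^{f+1} - y^{f})(x^{f+1} - 2x^{f} + x^{f-1}) \right|}{\left[ (x^{f+1} - x^{f})^{2} + (y^{f+1} - y^{f})^{2} \right]^{3/2}}, \quad f = f_{start}+1, \ldots, f_{finish}-1,$$

and the windowed quantity k″_ƒ [equation not reproduced in the source text] is evaluated over the frames ƒ₁ to ƒ₂, where ƒ₁ = max(ƒ − T, ƒ_start + 1), ƒ₂ = min(ƒ + T, ƒ_finish − 1).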
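The curvature expression stated in claim 14 translates directly into code. The sketch below assumes that the trajectory points (x^ƒ, y^ƒ) are per-frame object positions stored in a dictionary keyed by frame index (for instance RBB corners or centres, which the text does not specify); returning 0.0 for a stationary object, where the denominator vanishes, is a choice made here and not taken from the source.

```python
from typing import Dict, Tuple


def curvature(xy: Dict[int, Tuple[float, float]], f: int) -> float:
    """Discrete curvature k_f from the trajectory points at f-1, f and f+1."""
    (xp, yp), (x, y), (xn, yn) = xy[f - 1], xy[f], xy[f + 1]
    dx, dy = xn - x, yn - y                      # first differences
    ddx, ddy = xn - 2 * x + xp, yn - 2 * y + yp  # second differences
    denom = (dx * dx + dy * dy) ** 1.5
    # Zero denominator means no motion between f and f+1; 0.0 is an assumed fallback.
    return abs(dx * ddy - dy * ddx) / denom if denom else 0.0
```

A straight-line trajectory yields zero curvature, while a right-angle turn with unit steps yields 1.0, so frames where the object changes direction stand out.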

In yet another embodiment of the claimed invention, image parameters such as contrast C and sharpness S are analyzed. Let F be a function of C and S, and let T₁ and T₂ be predefined real values.

[equation for F not reproduced in the source text]

where ƒ₁ = max(ƒ − T, ƒ_start + 1), ƒ₂ = min(ƒ + T, ƒ_finish − 1).

A person skilled in the art will understand that other embodiments of the invention are also possible and that elements of the invention can be modified in various ways without departing from its concept. The drawings and description should therefore be regarded as illustrative and not restrictive.

The claimed method is intended for implementation in the software of semi-automatic "mono-to-stereo" or "black-and-white-to-color" conversion systems running on modern computing equipment (personal computers, workstations, computer clusters, or the like). The method is applicable to "mono-to-stereo" or "black-and-white-to-color" video conversion systems equipped with a processor, memory, input/output devices, and a data bus.

Claims

1. A method of selecting key frames in the process of semi-automatically supplementing a video sequence with depth or color information, characterized in that it includes the following operations:
receiving initialization data for each key object in each frame;
detecting scene changes in the input video sequence and splitting the video sequence into scenes;
for each scene, detecting, by means of a video data analysis module, activity data of each object and global motion (GM) data for all frames of the scene, and storing these data in a video analysis result storage;
wherein, after the video scene has been processed, the stored activity data of each object are first analyzed and key frames are selected, then the global motion (GM) data and the key frames of the object are analyzed, and key frames are extracted and output via the video data analysis module; after which the video analysis result storage is cleared and processing moves on to the next scene of the input video sequence until the end of the video sequence is reached.

2. The method according to claim 1, characterized in that the key frames are selected based on user-supplied information about the contents of the scene.

3. The method according to claim 1, characterized in that each key object is marked in one or more frames.

4. The method according to claim 1, characterized in that each key object is tracked in all frames of the cut scene.

5. The method according to claim 1, characterized in that the quality of the frame regions containing the key object is assessed.

6. The method according to claim 1, characterized in that the trajectory of each key object is analyzed.

7. The method according to claim 1, characterized in that the key frames are selected based on an analysis of the quality of the frame regions and/or of the trajectory points.

8. The method according to claim 1, characterized in that scene changes are detected by comparing histograms computed over image blocks.

9. The method according to claim 1, characterized in that the object initialization data include the RBB coordinates, namely the coordinates {x, y} of the upper-left corner, the width and the height, the frame f₀ for this RBB, and two frames, f_begin and f_end, between which the RBB is to be tracked, subject to two conditions on f_begin, f₀ and f_end [not reproduced in the source text].

10. The method according to claim 9, characterized in that the RBB is tracked from the object initialization frame f₀ forward, from frame f₀ to frame f_end, and then in the opposite direction, from frame f₀ to frame f_begin, to obtain the tracked RBB coordinates.

11. The method according to claim 1, characterized in that, to detect the object activity, the RBB of the object is tracked over all frames of the video scene, namely, the RBB coordinates in a video frame are determined so that a comparison of the video frame regions contained within the RBB in the current and the next frame yields the maximum value of a predefined metric, and the object parameters are computed, these being image features of the video frame region contained within the RBB.

12. The method according to claim 1, characterized in that additional key frames are selected in the intervals between the key frames of the objects, based on an analysis of the global motion (GM) data.

13. The method according to claim 1, characterized in that the accumulated RBB coordinates and object parameters are compared in the video analysis result processing module and a set of key frames KFo is output:

[equation for D_f not reproduced in the source text],
where f_start and f_finish are the first and last of the consecutive scene frames for which the object activity data are computed,

[equation for D′_f not reproduced in the source text],
where f₁ = max(f − T, f_start), f₂ = min(f + T, f_finish), T is a predetermined threshold,
KFo = {f: D′_f = D_f}

14. The method according to claim 1, characterized in that the video analysis result processing module analyzes the curvature of the object trajectory across the video frames and obtains a set of key frames KFo:

$$k_{f} = \frac{\left| (x^{f+1} - x^{f})(y^{f+1} - 2y^{f} + y^{f-1}) - (y^{f+1} - y^{f})(x^{f+1} - 2x^{f} + x^{f-1}) \right|}{\left[ (x^{f+1} - x^{f})^{2} + (y^{f+1} - y^{f})^{2} \right]^{3/2}}, \quad f = f_{start}+1, \ldots, f_{finish}-1,$$

[equation for k″_f not reproduced in the source text],
where f₁ = max(f − T, f_start + 1), f₂ = min(f + T, f_finish − 1), T is a predetermined threshold,
KFo = {f: k″_f = k_f}

15. The method according to claim 1, characterized in that the object parameters include estimates of sharpness and contrast.

16. The method according to claim 15, characterized in that the object parameters, such as contrast C and sharpness S, are analyzed and the functions F and F′ of C and S are computed, where T₁ and T₂ are predefined real values:

[equation not reproduced in the source text],

where f₁ = max(f − T, f_start + 1), f₂ = min(f + T, f_finish − 1), T is a predetermined threshold,
KFo = {f: F″_f = F_f}

17. A system for implementing the method of selecting key frames in the process of semi-automatically supplementing a video sequence with depth or color information, characterized in that it consists of three main parts:
a video data analysis module configured to extract data from the input video stream and obtain a plurality of initialization data for key objects in the video through the video marking device;
a video analysis result storage device configured to store accumulated video analysis data detected by the analysis module;
and a video analysis result processing module configured to analyze accumulated data.

18. The system of claim 17, wherein the video data analysis module is configured to obtain a plurality of coordinates of rectangular frames (RBB), frame indices and number of frames through a video marking device.

19. The system according to claim 17, characterized in that the video data analysis module includes a frame change detector, a global motion data detector, and an object activity detector.

20. The system according to claim 17, characterized in that the video analysis result storage consists of an object activity data accumulator, configured to store the accumulated object activity data, and a global motion data accumulator, configured to store parameters describing the relative displacement of static objects between two consecutive video frames.

21. The system according to claim 17, characterized in that the video analysis result processing module includes a data extractor, an object key frame detector and a key frame detector.