RU2276407C2

RU2276407C2 - Method and device for background segmentation on basis of movement localization

Info

Publication number: RU2276407C2
Application number: RU2004115026/09A
Authority: RU
Inventors: Александр В. Бовырин (RU); Виктор Львович Ерухимов (RU); Сергей А. Молинов (RU)
Original assignee: Интел Зао
Priority date: 2001-10-22
Filing date: 2001-10-22
Publication date: 2006-05-10
Also published as: RU2004115026A

Abstract

FIELD: motion detection systems, technical cybernetics; in particular, a system and method for detecting a static background in video sequences that contain moving foreground objects.

SUBSTANCE: the method comprises localizing moving objects in each frame and training a background model on the remainder of the image.

EFFECT: faster and more reliable background extraction from frames, with the ability to handle random background changes and camera motion.

4 cl, 14 dwg

Description

Technical field

This invention relates to motion detectors and motion detection systems and, in particular, to a method and apparatus for background segmentation based on motion localization.

Background art

Video conferencing and automatic video surveillance are rapidly developing areas of technology whose growth is driven by the increasing availability of low-cost systems and by progress in motion detection technology. Video imaging displays a sequence of images on a display device such as a computer display. The sequence of images changes over time so that it can adequately represent motion in the scene.

A frame is a single image in a sequence of images that is sent to a monitor for display. Each frame consists of picture elements (pels or pixels), which are the basic units of programmable color in an image or frame. A pixel is the smallest area of a monitor screen that can be turned on and off to create an image; its physical size depends on the resolution of the computer display. Pixels are arranged in rows and columns of the computer display to render a frame. If the frame contains a color image, each pixel can be turned on with a specific color to render the frame. The specific color that a pixel produces is a mixture of color spectrum components, typically red, green, and blue.

Video sequences can contain both stationary and moving objects. Stationary objects remain motionless from one frame to the next, so the pixels used to render the colors of a stationary object remain essentially the same in successive frames. Areas of a frame that contain objects of unchanging color are called the background. Moving objects change position in a frame relative to their position in the preceding frame of the image sequence. If an object changes its position in the next frame relative to its position in the preceding frame, the pixels used to render the object also change color in successive frames. Such areas of the frame are called the foreground.

Some applications, such as video imaging, often rely on detecting the motion of objects in video sequences. In many systems this motion detection is based on background subtraction. Background subtraction is a simple and effective way to identify objects and events of interest in a video sequence. An essential stage of background subtraction is training a background model to learn the particular environment. In most cases this means acquiring background images for subsequent comparison with test images in which foreground objects may be present. However, this approach runs into problems in applications where the background is not available or changes rapidly.
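The background subtraction idea described above can be illustrated with a minimal sketch. It assumes grayscale frames stored as nested lists and a fixed threshold; the function name `subtract_background` and the threshold value are hypothetical and are not taken from the patent.

```python
def subtract_background(frame, background, threshold):
    """Return a binary mask: 1 where the frame differs from the stored background."""
    return [
        [1 if abs(f - b) > threshold else 0 for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]

# A previously acquired background image and a test frame with a bright object.
background = [[10, 10, 10, 10],
              [10, 10, 10, 10],
              [10, 10, 10, 10]]
frame = [[10, 11, 10, 10],
         [10, 90, 95, 10],
         [10, 10, 9, 10]]

mask = subtract_background(frame, background, threshold=20)
for row in mask:
    print(row)
```

The sketch only works when a clean background image is available in advance, which is exactly the limitation the paragraph above points out.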

Some prior-art methods aimed at solving these problems are often called background segmentation. Approaches to the background segmentation problem can be roughly divided into two stages: motion segmentation and background training. Motion segmentation is used to find, in each frame of a video sequence, the areas that correspond to moving objects. Motion segmentation starts from a motion field obtained from the optical flow computed on two consecutive frames. The motion field is divided into two clusters using k-means. The largest cluster is taken to be the background.

Background training consists of training a background model on the remainder of the image. Model-based background subtraction consists of subtracting the background from "museum" color images on the basis of assumptions about image properties. These include a small number of objects in the background, which is relatively smooth, with color variations in space and light textures.

A drawback of these prior-art background segmentation solutions is that they take a pixel-based approach to motion segmentation. In a pixel-based approach, each pixel is analyzed to decide whether or not it belongs to the background. The processing time T for each pixel is therefore the sum of the motion detection time T1 and the background training time T2. If a frame consists of N pixels, the processing time for one frame is T·N. Such an approach can be quite reliable, but it is very time-consuming.

Brief description of the drawings

The invention is described below using a specific example with reference to the accompanying drawings, which do not limit the essence of the invention and in which:

FIG. 1 shows an embodiment of a method for extracting a background image from a video sequence;

FIG. 2A is an example of a frame from a video sequence;

FIG. 2B is another example of a frame from the video sequence, following the frame of FIG. 2A;

FIG. 2C is an example of a change detection image;

FIG. 2D is an example of the boundary contours of the change detection image of FIG. 2C;

FIG. 2E is an example of hull construction;

FIG. 3 is an embodiment of iterative hull construction;

FIG. 4 is an embodiment of a background training scheme;

FIG. 5 is an example of the relative spread of running averages as a function of a;

FIG. 6 is an example of features to track on the background of an example frame;

FIG. 7 is an embodiment of camera motion detection and compensation;

FIG. 8 is an example of the percentage of moving pixels segmented by the motion localization algorithm;

FIG. 9 is an example of the percentage of background pixels segmented as foreground by the motion localization algorithm;

FIG. 10 is an example of a computer system with a camera.

Detailed description

In the following description, numerous specific details are set forth, such as examples of specific systems, techniques, and components, to provide a thorough understanding of the invention. It will be apparent to those skilled in the art, however, that these specific details are not required to practice the invention. In other instances, well-known components and methods are not described in detail so as not to obscure the description unnecessarily.

The invention includes various steps, which are described below. The steps may be performed by hardware components, or may be embodied in machine-executable instructions that cause a general-purpose or special-purpose processor programmed with the instructions to perform the steps. Alternatively, the steps may be performed by a combination of hardware and software.

The invention may be provided as a computer program product or software that may include a machine-readable medium having instructions stored on it that can be used to program a computer system (or other electronic devices) to perform the method according to the invention. A machine-readable medium includes any mechanism for storing or transmitting information in a form (e.g., software, a processing application) readable by a machine (e.g., a computer). A machine-readable medium may include, but is not limited to: a magnetic storage medium (e.g., a floppy disk); an optical storage medium (e.g., CD-ROM); a magneto-optical storage medium; read-only memory (ROM); random access memory (RAM); erasable programmable memory (e.g., EPROM and EEPROM); flash memory; an electrical, optical, acoustic, or other form of propagated signal (e.g., carrier waves, infrared signals, digital signals); or other types of media suitable for storing electronic instructions.

The invention may also be practiced in a distributed computing environment in which the machine-readable medium is stored on and/or executed by more than one computer system. In addition, information transferred between computer systems may be pulled or pushed across the communication medium that connects the computer systems.

Some portions of the description are presented in terms of algorithms and symbolic representations of operations on data bits that may be stored in memory and operated on by a computer. These algorithms and representations are the means used by those skilled in the art to carry out their work effectively. An algorithm is generally understood to be a self-consistent sequence of acts leading to a desired result. The acts are those requiring manipulation of quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals that can be stored, transferred, combined, compared, and otherwise manipulated. It is sometimes convenient, for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, parameters, or the like.

A method and system for extracting a background image from a video sequence with foreground objects are described below. Background areas of a frame that are not occluded by foreground objects during the video sequence can be captured by processing individual frames of the video sequence.

FIG. 1 shows a specific, non-limiting embodiment of a method for extracting a background image from a video sequence. In one embodiment, the method may include localizing moving objects in an image using a change detection mask at step 110 and training a background model on the remaining areas of the image at step 120. When moving objects are localized at step 110, the boundaries of moving objects that have a uniform color over at least two consecutive frames are marked by constructing one or more hulls that enclose the areas corresponding to the moving objects. The remainder of the image is treated as background and is used to train the background model at step 120. In one embodiment, the background may also be used to detect and compensate for camera motion at step 130.

FIGS. 2A and 2B show two consecutive frames of the same video sequence. As an example of step 110 of FIG. 1, assume that the video sequence contains only one moving object 205 (e.g., parts of a walking person) of uniform color. In frame 255, parts of the walking person 205 may have changed position relative to their position in frame 250. The difference between the two frames, frame 250 and frame 255, is the object, or those of its parts, that has moved; it is shown in FIG. 2C as change detection image 209. For example, the person's left foot 262 is almost invisible in image 209 because the person is taking a step with the right foot 264 while keeping the left foot 262 essentially motionless on the floor. The left foot 262 therefore does not appear in change detection image 209. By contrast, the heel of the person's right foot 264 has risen between frame 250 and frame 255, and it therefore appears in change detection image 209.

Applying the change detection mask 219 marks only the boundary contours 210, 211, and 212 of the uniformly colored moving areas 209, not the entire areas, as shown in FIG. 2D. For example, contour 210 corresponds to the boundary around the torso, arms, and outer parts of the legs of object 205; contour 211 corresponds to the boundary around the inner parts of the legs of object 205; and contour 212 corresponds to the head and neck of moving object 205. As a result, the change detection mask 219 contains far fewer pixels than the total number of pixels in the frame. Applying a change detection algorithm to a high-resolution image and then processing the change detection mask to localize motion takes much less time than applying a complex raster technique such as optical flow.
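The observation that a change detection mask marks only the boundaries of a uniformly colored moving area can be checked on a toy example. The sketch below assumes simple absolute frame differencing with a fixed threshold; the helper names `make_frame` and `change_mask`, the frame size, and the intensity values are all hypothetical.

```python
def make_frame(width, height, left, right, top, bottom, obj=200, bg=20):
    """Uniform background with one uniform-color rectangle [left, right) x [top, bottom)."""
    return [[obj if (left <= x < right and top <= y < bottom) else bg
             for x in range(width)]
            for y in range(height)]

def change_mask(prev, curr, threshold):
    """Mark pixels whose intensity changed by more than the threshold between frames."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev, curr)]

prev = make_frame(12, 6, left=2, right=8, top=1, bottom=5)
curr = make_frame(12, 6, left=3, right=9, top=1, bottom=5)  # object moved right by 1 px

mask = change_mask(prev, curr, threshold=30)
marked = sum(sum(row) for row in mask)   # pixels flagged by the mask
object_area = (8 - 2) * (5 - 1)          # pixels covered by the object
print(marked, object_area, 12 * 6)
```

Only the trailing and leading edges of the rectangle are flagged, so the mask is much smaller than both the object and the frame. The interior of the object keeps its color and stays unmarked, which is why the hull construction described next is needed.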

All moving objects are localized by applying fast connected-component analysis to the change detection mask 219, which constructs a hull 239 around the contour of each moving area, as shown in FIG. 2E. For example, hull 220 is constructed around contour 210, hull 221 around contour 211, and hull 222 around contour 212.
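Connected-component analysis of a binary mask can be sketched as follows. This is an illustrative breadth-first labeling with 8-connectivity, not the specific fast algorithm the patent refers to; the function name `connected_components` and the choice of 8-connectivity are assumptions.

```python
from collections import deque

def connected_components(mask):
    """Return the 8-connected components of a binary mask as lists of (x, y) pixels."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill starting from an unvisited marked pixel.
            queue, pixels = deque([(x, y)]), []
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            components.append(pixels)
    return components

mask = [[1, 1, 0, 0, 0],
        [0, 1, 0, 0, 1],
        [0, 0, 0, 0, 1],
        [1, 0, 0, 0, 0]]
components = connected_components(mask)
sizes = sorted(len(c) for c in components)
print(sizes)
```

Each returned component plays the role of one marked area of the change detection mask, around which a hull can then be built.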

Let I_t be the image at time t, m_t ⊂ I_t the set of pixels that correspond to actually moving objects, and M_t ⊂ I_t the set of pixels that belong to one of the hulls. Localization means that M_t should contain m_t. In practice, if a pixel p belongs to S_t = I_t − M_t, it corresponds to a static object with a high degree of confidence.

To find moving objects, a change detection algorithm is applied to frames of the video sequence (e.g., frames 250 and 255). In one embodiment, the change detection algorithm described in Emanuele Trucco and Alessandro Verri, "Introductory Techniques for 3-D Computer Vision", Prentice Hall, 1998, may be used. Alternatively, other change detection algorithms may be used. The change detection algorithm may also be selected according to the requirements of a particular application.

If for any n [formula omitted in source], then the pixel is considered moving, where [symbol omitted in source] is the maximum change in successive running averages such that the background model for the pixel is considered trained. The threshold [symbol omitted in source] is chosen as a multiple of σ⁽ⁿ⁾ calculated from a sequence of images of a static scene, where σ is the standard deviation of the normal distribution of the pixel color in the case of one or more color channels. In one embodiment, the change detection mask marks areas of noise and illumination change in addition to the boundaries of uniformly colored moving areas. As noted above, to localize a moving object a hull of these areas is constructed so that it contains the moving pixels and, as far as possible, does not capture static pixels.
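A per-channel threshold test of this kind might look like the sketch below. Because the exact condition is not reproduced in the text, the sketch assumes that the compared quantity is the absolute change of each color channel n between two consecutive frames and that the threshold is k·σ⁽ⁿ⁾; the multiplier `k = 3.0` and the function names are hypothetical.

```python
import statistics

def channel_sigmas(static_samples):
    """Per-channel standard deviation of one pixel's color over frames of a static scene."""
    return [statistics.pstdev(values) for values in zip(*static_samples)]

def is_moving(prev_color, curr_color, sigmas, k=3.0):
    """Flag the pixel as moving if ANY channel n changes by more than k * sigma[n].

    Assumption: the test compares consecutive frames; k is an illustrative multiplier.
    """
    return any(abs(c - p) > k * s
               for p, c, s in zip(prev_color, curr_color, sigmas))

# Colors (R, G, B) of one pixel observed in four frames of a static scene.
static_samples = [(100, 50, 20), (102, 52, 22), (98, 48, 18), (100, 50, 20)]
sigmas = channel_sigmas(static_samples)

noise_only = is_moving((100, 50, 20), (103, 51, 19), sigmas)
real_change = is_moving((100, 50, 20), (100, 50, 40), sigmas)
print(sigmas, noise_only, real_change)
```

Estimating σ from a static scene ties the threshold to the sensor noise, so small fluctuations are ignored while a large change in even one channel marks the pixel as moving.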

A moving object is an accumulation of the change detection areas at the current time t. For simplicity, assume that there is only one moving object. All connected components of the change detection mask and their contours are found. In one embodiment, areas with a small area are filtered out to get rid of noise contours (e.g., contour 231 in FIG. 2D). Then the contour C_max with the largest area (which corresponds to the object or its boundary) is selected, for example contour 220 in FIG. 2D. Iterative construction of the hull H starts by joining C_max with other contour areas (e.g., contours 221 and 222). These other contour areas represent other moving areas of the moving object 205.
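The filtering and selection step can be sketched as below. The sketch assumes that each contour is represented by the list of pixels of its connected component and that "area" is the pixel count; the function name `select_main_contour` and the `min_area` value are hypothetical.

```python
def select_main_contour(components, min_area):
    """Drop small (noise) components and split the rest into C_max and the others."""
    kept = [c for c in components if len(c) >= min_area]
    if not kept:
        return None, []          # nothing but noise in this frame
    c_max = max(kept, key=len)   # component with the largest area
    others = [c for c in kept if c is not c_max]
    return c_max, others

components = [
    [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2)],   # 5 pixels: candidate for C_max
    [(7, 7)],                                   # 1 pixel: treated as noise
    [(5, 0), (5, 1), (6, 1)],                   # 3 pixels: another moving area
]
c_max, others = select_main_contour(components, min_area=2)
print(len(c_max), [len(c) for c in others])
```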

FIG. 3 shows an embodiment of iterative hull construction. At step 120, convex hulls are constructed for all contours C_i. A convex hull is the smallest convex polygon that contains one or more components of the moving areas. The convex hull of contour C_i is denoted H_i, and the convex hull of contour C_max is denoted H_max. At step 320, the index k is found such that the Euclidean distance between H_k and H_max is minimal:

k = arg min_i dist(H_i, H_max) and d_k = min_i dist(H_i, H_max).
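The hull construction and the nearest-hull search can be sketched as follows. The convex hull uses Andrew's monotone chain algorithm, which is a standard choice and not necessarily the one used in the patent. As a simplification, `hull_distance` measures the smallest vertex-to-vertex distance rather than the exact polygon-to-polygon distance; all function names are hypothetical.

```python
import math

def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def hull_distance(h1, h2):
    """Simplified distance: minimum Euclidean distance between hull vertices."""
    return min(math.dist(p, q) for p in h1 for q in h2)

def nearest_hull(hulls, h_max):
    """Return (k, d_k): index of the hull closest to h_max and its distance."""
    k = min(range(len(hulls)), key=lambda i: hull_distance(hulls[i], h_max))
    return k, hull_distance(hulls[k], h_max)

h_max = convex_hull([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)])
hulls = [convex_hull([(6, 0), (8, 0), (8, 2), (6, 2)]),
         convex_hull([(20, 20), (22, 20), (22, 22)])]
k, d_k = nearest_hull(hulls, h_max)
print(h_max, k, d_k)
```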

At step 340, it is determined whether the nearest hull lies within the threshold distance D_max of the hull of C_max, i.e. whether d_k is less than D_max. If so, a convex hull is constructed around the set of hulls H_k and H_max at step 350. If not, step 340 is repeated for the next contour at step 345. At step 360 the newly constructed hull is taken as H_max, and at step 370 it is determined whether all contours have been considered. If not, the procedure repeats from step 320 until all C_i have been considered; otherwise it proceeds to step 380. At step 380, the moving region is set equal to the latest maximal hull (M_t = H_max). The above steps can be generalized to the case of several moving objects.
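The iterative merging can be sketched as below. This is one reading of the flowchart under stated assumptions, not the exact procedure of FIG. 3: contours are lists of (x, y) points, the largest contour is picked by the area of its hull, and the hull-to-hull distance is simplified to the smallest vertex-to-vertex distance. All function names are hypothetical.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(poly):
    """Shoelace formula; evaluates to 0 for fewer than three vertices."""
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2.0

def hull_distance(h1, h2):
    """Assumption: vertex-to-vertex minimum instead of true polygon distance."""
    return min(math.dist(a, b) for a in h1 for b in h2)

def build_moving_region(contours, d_max):
    """Grow H_max by absorbing the nearest hull while it is closer than d_max."""
    hulls = [convex_hull(c) for c in contours]
    i_max = max(range(len(hulls)), key=lambda i: polygon_area(hulls[i]))
    h_max = hulls[i_max]
    remaining = [h for i, h in enumerate(hulls) if i != i_max]
    while remaining:
        k = min(range(len(remaining)),
                key=lambda i: hull_distance(remaining[i], h_max))
        if hull_distance(remaining[k], h_max) >= d_max:
            break  # the nearest hull is too far, so every other hull is too
        h_max = convex_hull(h_max + remaining.pop(k))  # hull around H_k and H_max
    return h_max

big = [(0, 0), (10, 0), (10, 10), (0, 10)]
near = [(12, 0), (14, 0), (14, 2), (12, 2)]
far = [(40, 40), (42, 40), (42, 42), (40, 42)]
region = build_moving_region([big, near, far], d_max=5)
```

With these inputs the nearby square is absorbed into the hull of the large square, while the distant square is left out because its distance exceeds `d_max`.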

The quality of the above algorithm can be assessed using two quantities. The first is the conditional probability that a pixel is treated as moving given that it actually belongs to a moving object:

$$P_1 = P(p \in M_t \mid p \in m_t).$$

The second is the conditional probability that a pixel is treated as moving given that it is static: P₂ = P(p ∈ M_t | p ∈ I_t − m_t), where I_t is the image at time t, m_t is the set of pixels in I_t that belong to moving objects, and M_t is the set of pixels in I_t that underwent a significant change in color over the last frame or the last several frames.
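For a single frame with known ground truth, the two quantities can be estimated as empirical frequencies. The sketch below assumes pixel sets are represented as Python sets of (row, col) tuples; `localization_quality` and the toy data are hypothetical.

```python
def localization_quality(detected, truth, all_pixels):
    """Return (P1, P2) as empirical conditional frequencies for one frame.

    detected   -- M_t, pixels marked as moving
    truth      -- m_t, pixels that really belong to moving objects
    all_pixels -- I_t, every pixel of the frame
    """
    static = all_pixels - truth  # I_t - m_t
    p1 = len(detected & truth) / len(truth) if truth else float("nan")
    p2 = len(detected & static) / len(static) if static else float("nan")
    return p1, p2

all_pixels = {(y, x) for y in range(4) for x in range(5)}   # 20 pixels
truth = {(1, 1), (1, 2), (2, 1), (2, 2)}                    # 4 moving pixels
detected = {(1, 1), (1, 2), (2, 1), (0, 0), (3, 4)}         # 3 hits, 2 false alarms
p1, p2 = localization_quality(detected, truth, all_pixels)
```

In this toy frame three of the four moving pixels are detected and two of the sixteen static pixels are falsely marked as moving.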

P₁ should be as large as possible, while P₂ should be small. If P₁ is not large enough, a corrupted background may be learned; if P₂ is not small enough, training time increases. Both P₁ and P₂ increase with D_max, since a larger threshold merges more regions into the hull. D_max is therefore chosen as the smallest value that keeps P₁ above a given confidence level. The selection of D_max is described with reference to FIG. 8.

As noted above, the change detection mask marks only the boundaries of homogeneous moving regions. It may also fail to mark regions that move slowly enough. As a result, some slowly moving objects may persistently pass into the background, and some moving objects may occasionally be treated as background. One solution to the first problem is to perform change detection several times with different reference frames, for example one frame before the current frame, two frames before the current frame, and so on. One solution to the second problem is to train the background while allowing for the fact that some background frames may be corrupted. In this respect, two characteristics of the motion localization algorithm are of interest: the probability P^(m) that a moving pixel is misclassified m times in a row, and the index m* for which the probability P^(m*) falls below the confidence level. In that case m* can be used as a parameter of the background training algorithm.
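Change detection against several reference frames can be illustrated with simple frame differencing. This is a sketch under assumptions: the text does not fix the change detector, so absolute grayscale difference with a fixed threshold is used here, and the masks for the different lags are combined with a logical OR. The function names, `lags`, and `threshold` are hypothetical.

```python
def change_mask(current, reference, threshold):
    """Mark pixels whose absolute difference to the reference exceeds threshold."""
    return [[abs(c - r) > threshold for c, r in zip(crow, rrow)]
            for crow, rrow in zip(current, reference)]

def multi_reference_change_mask(frames, lags=(1, 2), threshold=5):
    """OR together the change masks of the newest frame against frames[-1 - lag]."""
    current = frames[-1]
    combined = [[False] * len(current[0]) for _ in current]
    for lag in lags:
        if lag >= len(frames):
            continue  # not enough history for this reference frame
        mask = change_mask(current, frames[-1 - lag], threshold)
        combined = [[a or b for a, b in zip(ra, rb)]
                    for ra, rb in zip(combined, mask)]
    return combined

frames = [
    [[50, 50, 50, 50]],   # t-2
    [[50, 54, 50, 50]],   # t-1: the second pixel drifts slowly
    [[50, 58, 50, 90]],   # t:   the fourth pixel jumps
]
single = multi_reference_change_mask(frames, lags=(1,), threshold=5)
double = multi_reference_change_mask(frames, lags=(1, 2), threshold=5)
```

The slowly drifting pixel changes by only 4 between consecutive frames and is missed with a single reference frame, but its change of 8 against the frame two steps back exceeds the threshold.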

As shown in FIG. 1, once all moving regions in the current frame have been localized at step 110, the background model is trained on the static pixels of the current frame at step 120. The color of a pixel at a given time can be characterized by three values {X^(n)}, n = 1...3, which for a static pixel can reasonably be modeled by normal distributions N(μ⁽ⁿ⁾, σ⁽ⁿ⁾) with unknown means μ⁽ⁿ⁾ and standard deviations σ⁽ⁿ⁾.

Training is performed in several stages in order to remove outliers produced by incorrect prediction at step 110. Occasional background changes can be handled in a similar way. If a foreground pixel exhibits a normal distribution with small deviation over a long time, this is treated as a background change and the background model is updated immediately. Background subtraction can be used to segment the background in each image, as described, for example, in "Non-parametric Model for Background Subtraction" by Ahmed Elgammal, David Harwood, and Larry Davis, Proc. ECCV, vol. 2, pp. 751-767, 2000. In alternative embodiments of the invention, other background subtraction methods may be used.

During training, the values μ⁽ⁿ⁾ are computed with a running-average update:

$$\mu^{(n)}_{t_i} = (1-\alpha)\,\mu^{(n)}_{t_{i-1}} + \alpha\,X^{(n)}_{t_i} \qquad (1)$$

where t_i denotes the frames in which the pixel was classified as static.

When the sequence converges, i.e. the difference between successive values of the running average is small (inequality (2)), the background model is considered trained at this pixel, and the running average is taken as the pixel's background value. Each pixel can therefore be in one of the four states shown in FIG. 4: unknown background state 410 (pixels that have never been in S_t), untrained background state 420 (statistics are being collected and inequality (2) is not satisfied), trained background state 430 (inequality (2) is satisfied), and foreground state 440 (background training is complete and foreground is detected in the current frame by background subtraction). The possible transitions are shown in FIG. 4. Transition A 471 occurs when a pixel appears in S_t for the first time. Transition B 472 occurs when the pixel model is considered sufficiently trained. Transition C 473 occurs when the foreground remains static for a long period of time.
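The four states and the transitions A, B, and C can be sketched as a small per-pixel state machine. Several parts are assumptions rather than the patented method: the running average is taken in the usual exponential form, the convergence test is assumed to be |μ_new − μ_old| < β, and the return from foreground to trained background, `fg_thresh`, and `c_frames` are hypothetical. A single color channel is used for brevity.

```python
from dataclasses import dataclass
from enum import Enum

class State(Enum):
    UNKNOWN = 410      # never seen among the static pixels S_t
    UNTRAINED = 420    # collecting statistics
    TRAINED = 430      # running average has converged
    FOREGROUND = 440   # differs from the trained background

@dataclass
class PixelModel:
    state: State = State.UNKNOWN
    mean: float = 0.0
    static_run: int = 0  # consecutive static frames while in FOREGROUND

def step(px, value, is_static, alpha=0.25, beta=0.71, fg_thresh=30.0, c_frames=50):
    """Advance one pixel by one frame and return its new state."""
    if px.state is State.UNKNOWN:
        if is_static:                                   # transition A
            px.state, px.mean = State.UNTRAINED, value
    elif px.state is State.UNTRAINED:
        if is_static:
            new_mean = (1 - alpha) * px.mean + alpha * value
            converged = abs(new_mean - px.mean) < beta  # assumed form of the test
            px.mean = new_mean
            if converged:                               # transition B
                px.state = State.TRAINED
    elif px.state is State.TRAINED:
        if abs(value - px.mean) > fg_thresh:            # background subtraction
            px.state, px.static_run = State.FOREGROUND, 0
    elif px.state is State.FOREGROUND:
        if abs(value - px.mean) <= fg_thresh:
            px.state = State.TRAINED                    # assumption: object left
        else:
            px.static_run = px.static_run + 1 if is_static else 0
            if px.static_run >= c_frames:               # transition C
                px.state, px.mean, px.static_run = State.UNTRAINED, value, 0
    return px.state

px = PixelModel()
trace = [step(px, 100.0, True), step(px, 100.5, True), step(px, 200.0, False)]
trace += [step(px, 200.0, True, c_frames=3) for _ in range(3)]
```

In this trace the pixel goes from unknown to untrained, converges to a trained background near 100, is flagged as foreground when its value jumps to 200, and is sent back to the untrained state after staying static for three frames.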

For simplicity, a pixel at a given time t can be characterized by a single value X_t. Equation (1) and inequality (2) contain unknown parameters α and β, which govern the training process. A suitable choice of these parameters gives background training that is fast and at the same time statistically optimal. Assuming that X_t = I + Δ_t, where I is the constant color value of the background pixel and Δ_t is zero-mean Gaussian noise in the pixel color at time t with standard deviation σ_Δ, then for δ_t = μ_t − I one obtains δ_{t_i} = (1−α)·δ_{t_{i−1}} + α·Δ_{t_i}, where δ_t is the difference between the running average and the constant background color.
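The recursion for δ follows by direct substitution. The derivation below assumes the running-average update has the standard exponential form, which is the form consistent with the recursion stated above, and additionally assumes that the noise samples are independent of each other in order to obtain the variance recursion.

$$\mu_{t_i} = (1-\alpha)\,\mu_{t_{i-1}} + \alpha X_{t_i}, \qquad X_{t_i} = I + \Delta_{t_i}$$

$$\delta_{t_i} = \mu_{t_i} - I = (1-\alpha)\,\mu_{t_{i-1}} + \alpha\,(I + \Delta_{t_i}) - I = (1-\alpha)\,(\mu_{t_{i-1}} - I) + \alpha\,\Delta_{t_i} = (1-\alpha)\,\delta_{t_{i-1}} + \alpha\,\Delta_{t_i}$$

$$\sigma_{t_i}^2 = (1-\alpha)^2\,\sigma_{t_{i-1}}^2 + \alpha^2\,\sigma_\Delta^2 \quad \text{(independent noise samples)}$$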

The quantity δ_t is normally distributed, with a mean and a deviation σ_t that are determined by α, the running-average constant.

To obtain a reliable background, training must run long enough to be sure that the background has not been learned from a moving object. In other words, if the pixel value changes significantly, training should continue for at least m* frames. The following inequality must therefore hold:

where δ_{t_0} equals σ_Δ and m* is the minimal number of consecutive frames for which the probability P^(m*) is below the confidence level; in other words, it can be assumed that no pixel was misclassified in all m* consecutive frames. In one embodiment, there may be no reason to make β smaller than the value given by inequality (4), since doing so sharply increases the time needed to train the background.

At the same time, the standard deviation of δ_{m*} should be as small as possible. It can be proved that the relative deviation ζ(α), as a function of α ∈ [0, 1], has a single minimum, which is given by equality (5).

Examples of ζ(α) for different numbers of frames are shown in FIG. 5.

FIG. 5 shows an example of the relative dispersion of the running average as a function of α. In one embodiment, the solid line 510 corresponds to the fifth frame, the dashed line 520 to the tenth frame, and the dash-dotted line 530 to the twentieth frame.

Choosing α too large or too small leads to a large statistical uncertainty in δ. The running-average constant α can be set to the optimum for m* frames, so that for a static background pixel the running average μ_{t_m*}, which is taken as the pixel's background value, has the smallest possible standard deviation. Given m*, inequality (4) and equality (5) determine the optimal values of β and α.
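The dependence of the relative deviation on α can be explored numerically. The sketch below is not equality (5) itself: it assumes independent noise samples, iterates the variance recursion σ² ← (1−α)²σ² + α²σ_Δ² that follows from the δ recursion, starts from σ = σ_Δ (the text states that δ_{t_0} equals σ_Δ), and locates the minimum by a grid search. The function names are hypothetical.

```python
def relative_deviation(alpha, m):
    """sigma_m / sigma_Delta after m running-average updates.

    Assumes independent noise samples and an initial deviation equal to
    sigma_Delta, so the variance is tracked in units of sigma_Delta**2.
    """
    var = 1.0
    for _ in range(m):
        var = (1 - alpha) ** 2 * var + alpha ** 2
    return var ** 0.5

def best_alpha(m, steps=1000):
    """Grid search over [0, 1] for the alpha that minimises the deviation."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda a: relative_deviation(a, m))

if __name__ == "__main__":
    for m in (5, 10, 20):   # frame counts used for the curves of FIG. 5
        a = best_alpha(m)
        print(m, round(a, 3), round(relative_deviation(a, m), 3))
```

Under these assumptions the minimum for m = 5 lies close to α = 0.25, and the optimal α decreases as the number of frames grows, because a longer averaging window can afford a smaller weight on each new sample.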

In one embodiment, background changes can be taken into account when training the background model. Suppose the camera does not move but the background changes significantly and then remains static, for example because one of the static objects has been moved to a different position. The system marks both the previous and the current location of the object as foreground. Such pixels are not typical foreground pixels; they are static background. This property makes it possible to track such background changes and adapt the background model. The model is trained for each such pixel, and if the pixel behaves statically for a long period of time, its state is changed to untrained background. After a given number of frames (for example, three frames) it becomes trained background.

As shown in FIG. 1, in one embodiment the background can also be used to detect and compensate for camera motion at step 130. The methods described above can be generalized to the case of a moving camera by including fast global motion detection. Once part of the image reaches the trained background state 430 of FIG. 4, background subtraction 450 can be applied to each frame, and a global motion estimation algorithm is applied to the resulting background mask.

FIG. 7 shows an embodiment of camera motion detection and compensation. In one embodiment, frame features are selected for tracking on the background at step 710, for example corners 681-693 as shown in FIG. 6. Optical flow can be used to track a few strong background features in order to determine camera motion at step 720. In one embodiment, the feature selection technique follows "Good Features to Track" by Jianbo Shi and Carlo Tomasi, Proc. CVPR, pp. 593-600, 1994. In one embodiment, the feature tracking technique follows "Introductory Techniques for 3-D Computer Vision" by Emanuele Trucco and Alessandro Verri, Prentice Hall, 1998. Alternatively, other features, feature selection methods, and tracking methods can be used.

Once global motion in the background, which indicates camera motion, is detected at step 730, the background model is reset at step 740 by putting all pixels into the unknown background state (e.g., state 410 in FIG. 4). Feature tracking gives a good estimate of global motion when points are tracked stably over a long time. If all background pixels are lost, a certain percentage of the pixels from the change detection algorithm can be tracked instead. If a false end of motion is detected (the change detection rate may be low during camera motion, for example because of a homogeneous background), the motion localization and training steps 110 and 120 of FIG. 1 filter out the incorrect pixel values. When the camera stops moving at step 760, training of the background model starts again for each pixel value (step 120 in FIG. 1).
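The reset logic can be sketched as follows. This is an illustration under assumptions, not the method of FIG. 7: feature positions in the previous and current frame are taken as given (the tracker is out of scope), global motion is summarized by the median feature displacement so that a single mistracked feature does not dominate, and `motion_thresh` is a hypothetical threshold in pixels.

```python
from statistics import median

def global_motion(prev_pts, curr_pts):
    """Median displacement (dx, dy) of the tracked background features."""
    dx = median(c[0] - p[0] for p, c in zip(prev_pts, curr_pts))
    dy = median(c[1] - p[1] for p, c in zip(prev_pts, curr_pts))
    return dx, dy

def update_on_camera_motion(states, prev_pts, curr_pts, motion_thresh=2.0):
    """Reset every pixel state to 'unknown' when the camera appears to move."""
    dx, dy = global_motion(prev_pts, curr_pts)
    moving = (dx * dx + dy * dy) ** 0.5 > motion_thresh
    if moving:
        states = [["unknown"] * len(row) for row in states]
    return moving, states

states = [["trained", "trained"], ["foreground", "trained"]]
prev_pts = [(10, 10), (50, 20), (30, 40), (70, 70)]
curr_pts = [(15, 10), (55, 20), (35, 40), (70, 90)]   # last feature is mistracked

moving, new_states = update_on_camera_motion(states, prev_pts, curr_pts)
still, same_states = update_on_camera_motion(states, prev_pts, prev_pts)
```

Three of the four features shift by five pixels to the right, so the median displacement is (5, 0) and the model is reset; with unchanged feature positions the states are left as they were.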

Some experimental results obtained with the motion localization and background training methods are given below. The experimental results are provided only to describe the invention more clearly and are not meant to limit it. In one embodiment, the described scheme was implemented using the Intel® Image Processing Library and Intel's Open Source Computer Vision Library (OpenCV), in a system capable of processing 320×240 images in 15 ms. Testing was performed on a large number of video sequences captured directly from a USB video camera.

The motion localization threshold D_max can, in one embodiment, be chosen according to FIG. 8. FIG. 8 shows, by way of example, the results of testing the algorithm on a video sequence and comparing them with foreground segmentation based on background subtraction. The value P₁ is the percentage of foreground pixels that were classified as moving pixels. In alternative embodiments, D_max can be chosen from other empirical data or by other means, such as simulation, modeling, and assumptions.

FIG. 9 shows the percentage of background pixels segmented as foreground, obtained with the same methods. The quantities P₁ and P₂ defined above can be varied through the parameter D_max. For D_max = 15, the number n(m) of foreground pixels that were misclassified m times in a row was computed. The results are given in the following table:

m       1     2     3     4     5     6
n(m)    542   320   238   128   3     0

Taking m* = 5, inequality (4) and equality (5) above give the following values: α ≈ 0.25 and β ≈ 0.71.
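One way to relate the choice m* = 5 to the tabulated counts is sketched below. The normalization is an assumption: the text does not say how P^(m) is estimated, so here n(m) is divided by n(1), and the 1% confidence level is a hypothetical value. Only the counts themselves come from the table.

```python
# n(m) from the table: foreground pixels misclassified m times in a row
counts = {1: 542, 2: 320, 3: 238, 4: 128, 5: 3, 6: 0}

def choose_m_star(counts, confidence=0.01):
    """Smallest m whose estimated P(m) = n(m) / n(1) drops below confidence.

    Dividing by n(1) is an assumed normalization, not taken from the text.
    """
    total = counts[min(counts)]
    for m in sorted(counts):
        if counts[m] / total < confidence:
            return m
    return None

m_star = choose_m_star(counts)
```

Because n(m) falls from 128 at m = 4 to 3 at m = 5, the same result is obtained for a wide range of confidence levels under this normalization.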

FIG. 10 shows an embodiment of a computer system (e.g., a client system or a server) in the form of a digital processing system, such as a workstation, server, personal computer, compact computer, laptop computer, personal digital assistant, wireless telephone, or set-top box, in which features of the invention can be implemented. The digital processing system 1000 can be used for applications such as video surveillance, video conferencing, and robot vision.

The digital processing system 1000 includes one or more buses or other means for transferring data between its components. It also includes processing means, such as a processor 1002 coupled to a system bus, for processing information. The processor 1002 may represent one or more general-purpose processors (e.g., a Motorola PowerPC processor or an Intel Pentium processor) or a special-purpose processor such as a digital signal processor (DSP) (e.g., a Texas Instruments DSP). The processor 1002 may be configured to execute instructions for performing the operations and steps described above. For example, the processor 1002 may be configured to run an algorithm that localizes a moving object in the frames of a video sequence.

The digital processing system 1000 further includes system memory 1004, which may include random access memory (RAM) or another dynamic storage device, coupled to a memory controller 1065, for storing information and instructions to be executed by the processor 1002. The memory controller 1065 controls operations between the processor 1002 and memory devices such as the memory 1004. The memory 1004 can also be used to store temporary variables or other intermediate information while the processor 1002 executes instructions. The memory 1004 represents one or more memory devices; for example, it may also include read-only memory (ROM) and/or another static storage device for storing static information and instructions for the processor 1002.

The digital processing system 1000 may also include an input/output (I/O) controller 1070 for controlling operations between the processor 1002 and one or more I/O devices 1075, such as a keyboard and a mouse. The I/O controller 1070 may also control operations between the processor 1002 and peripheral devices, for example a storage device 1007. The storage device 1007 represents one or more storage devices (e.g., a magnetic disk drive or an optical disk drive) coupled to the I/O controller 1070 for storing information and instructions. The storage device 1007 can be used to store instructions for performing the steps described above. The I/O controller 1070 may also be coupled to a basic input/output system (BIOS) 1050 for booting the digital processing system 1000.

The digital processing system also includes a video camera 1071 for recording and/or playing back video sequences. The camera 1071 can be coupled to the I/O controller 1070 using, for example, a universal serial bus (USB) 1073. Alternatively, other types of buses, for example a FireWire bus, can be used to connect the camera 1071 to the I/O controller 1070. A display device 1021, such as a cathode ray tube or a liquid crystal display, can be coupled to the I/O controller 1070 for displaying video sequences to the user.

A communication device 1026 (e.g., a modem or a network interface card) may also be coupled to the I/O controller 1070. For example, the communication device 1026 may be an Ethernet card, a token ring card, or another type of interface that provides a communication link to a network with which the digital processing system 1000 is adapted to establish a connection. For example, the communication device 1026 can be used to receive data relating to video sequences from another camera and/or a computer system over a network.

It should be noted that the architecture shown in FIG. 10 is only an example. Alternative embodiments of the present invention may use a different architecture for the digital processor system 1000. For example, the memory controller 1065 and the I/O controller 1070 may be integrated into a single component, and/or the various components may be connected to each other in other configurations (for example, directly to one another) or by means of other types of buses.

The above description has presented, by way of specific examples, a new and fast method for extracting the background from a sequence of images containing moving foreground objects. The method uses image and contour processing operations and can reliably extract the background from a small number of frames. For example, the method can operate on about 30 frames of a typical videoconferencing sequence with a static background and a single person in the foreground. This is a significant advantage for real-time video applications, such as surveillance and robot vision, over prior art systems that rely on computationally expensive operations. The methods of this invention can be applied to a wide range of problems that involve a stationary background and foreground objects of interest. In addition, the versatility of the system allows the change detection algorithm to be chosen to suit the needs of a particular application. Such methods can also be used in conjunction with video compression, taking advantage of the knowledge of static regions in the sequence.
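The core idea summarized above (localize the moving regions, then train the background only outside them) can be illustrated with a minimal sketch. It is not the implementation described in this patent: plain frame differencing stands in for the change detection mask, the shell construction around the moving object is omitted, and a running average stands in for the background model. Frames are assumed to be lists of rows of grayscale integers, and the names `change_mask`, `train_background`, `extract_background`, `DIFF_THRESHOLD`, and `ALPHA` are hypothetical.

```python
# Illustrative sketch only; not the patented implementation.
# DIFF_THRESHOLD and ALPHA are hypothetical tuning values.
DIFF_THRESHOLD = 15   # minimum intensity change that counts as motion
ALPHA = 0.5           # blending weight of the newest frame


def change_mask(prev, curr, threshold=DIFF_THRESHOLD):
    """Return a boolean mask that is True where a pixel changed noticeably."""
    return [[abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev, curr)]


def train_background(background, frame, mask, alpha=ALPHA):
    """Blend the frame into the background only where no change was detected."""
    return [[b if moving else (1 - alpha) * b + alpha * f
             for b, f, moving in zip(bg_row, frame_row, mask_row)]
            for bg_row, frame_row, mask_row in zip(background, frame, mask)]


def extract_background(frames):
    """Estimate the static background from a list of grayscale frames."""
    # Assumption: the first frame seeds the background estimate.
    background = [[float(v) for v in row] for row in frames[0]]
    for prev, curr in zip(frames, frames[1:]):
        mask = change_mask(prev, curr)
        background = train_background(background, curr, mask)
    return background
```

In this sketch a bright object that crosses a uniform background never leaks into the estimate, because every pixel it touches is masked out in the frames where it appears or disappears. A uniformly colored object would defeat plain differencing in its interior, which is the case the shell construction in the claims is meant to address.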

In the above description, the invention has been presented with reference to specific embodiments. It will be evident, however, that various modifications and changes may be made without departing from the broader spirit and scope of the invention as set forth in the appended claims. Accordingly, the description and drawings are to be regarded as illustrative rather than restrictive.

Claims

1. A method of extracting the background of an image from a video sequence, comprising the following steps: localizing a moving object in a video sequence based on a change in a moving object in a plurality of frames of a video sequence, wherein the moving object occupies frame zones of a changing color; and training a background model for a plurality of frames outside the frame zones of a changing color.

2. The method according to claim 1, characterized in that the localization comprises localizing a moving object using a change detection mask.

3. The method according to claim 1, characterized in that the localization comprises defining a boundary for a moving object that has a uniform color and constructing a shell around the moving object using the boundary.

4. The method according to claim 3, characterized in that the definition of the boundary includes the following steps: determining the maximum contour from the set of contours of a moving object, the maximum contour having the largest area from the set of contours; determination of other contours of a moving object and combining the maximum contour with other contours.

5. The method according to claim 4, characterized in that it further comprises the step of eliminating the smallest contour from the union with the maximum contour.

6. The method according to claim 4, characterized in that the combining comprises combining one of the other contours with the maximum contour if the distance between the maximum contour and that one of the other contours is less than a predetermined distance.

7. The method according to claim 6, characterized in that the frames contain a plurality of pixels and the predetermined distance is based on the probability that a pixel of the plurality of pixels is considered to be moving given that it belongs to a moving object.

8. The method according to claim 7, characterized in that the predetermined distance is based on the probability that the pixel is considered to be moving given that it is static.

9. The method according to claim 3, characterized in that the frames contain a plurality of pixels and the shell is constructed so that it contains only pixels whose color changes in successive frames.

10. The method according to claim 3, characterized in that the construction of the shell includes the following steps: determining all connected components in the boundary, each of the components having a contour with an area; filtering out the contour with the smallest area; selecting the contour with the maximum area; and combining the contour with the maximum area with the other contours of the connected components.

11. The method according to claim 1, characterized in that the frames contain a plurality of pixels and the training comprises characterizing the color of a pixel at a given point in time by a value based on a state, each pixel corresponding to one state of a plurality of states.

12. The method according to claim 11, characterized in that the plurality of states includes an untrained background state.

13. The method according to claim 11, characterized in that the plurality of states includes the state of the trained background.

14. The method according to claim 11, characterized in that the plurality of states includes a foreground state.

15. The method according to claim 11, characterized in that the plurality of states includes an unknown background state.

16. The method according to claim 11, characterized in that the training includes the following steps: training the background model for the pixel in the foreground and changing the state of the pixel to the state of an untrained background if the pixel exhibits static behavior for a certain period of time.

17. The method according to claim 16, characterized in that it further comprises changing the state to the trained background state after a given number of frames.

18. The method according to claim 1, characterized in that the video sequence is recorded using a video camera and the method further comprises the following steps: detecting the movement of the video camera and compensating for the movement of the video camera.

19. The method according to claim 18, characterized in that the motion detection comprises the following steps: selecting a frame attribute and tracking frame attributes for a plurality of frames.

20. The method according to claim 19, characterized in that the compensation includes a return to the initial state of the background model when the movement stops.

21. A machine-readable storage medium for extracting the background of an image from a video sequence, having instructions recorded on it that, when executed by a processor, cause the processor to perform the following operations: localizing a moving object in a video sequence based on a change in the moving object over a plurality of frames of the video sequence, wherein the moving object occupies frame zones of a changing color; and training a background model for a plurality of frames outside the frame zones of a changing color.

22. The machine-readable storage medium according to claim 21, wherein the localization includes localizing a moving object using a change detection mask.

23. The machine-readable storage medium according to claim 21, characterized in that the localization comprises the following steps: determining a boundary for a moving object that has a uniform color; and constructing a shell around the moving object using the boundary.

24. The machine-readable storage medium according to claim 23, wherein the determination of the boundary comprises the following steps: determining the maximum contour from the set of contours of the moving object, the maximum contour having the largest area in the set of contours; determining other contours of the moving object; and combining the maximum contour with the other contours.

25. The machine-readable storage medium according to claim 24, wherein the processor additionally executes the following commands: determining the smallest contour in the set of contours and excluding the smallest contour from the union with the maximum contour.

26. The machine-readable storage medium according to claim 24, wherein the combining includes combining one of the other contours with the maximum contour if the distance between the maximum contour and that one of the other contours is less than a predetermined distance.

27. The machine-readable storage medium according to claim 23, wherein the construction of the shell by the processor includes the processor executing the following commands: determining all connected components in the boundary, each of the components having a contour with an area; filtering out the contour with the smallest area; selecting the contour with the maximum area; and combining the contour with the maximum area with the other contours of the connected components.

28. The machine-readable storage medium according to claim 21, wherein the frames comprise a plurality of pixels and the training performed by the processor includes the processor characterizing the color of a pixel at a given point in time by a value based on a state, each pixel corresponding to one state of a plurality of states.

29. The machine-readable storage medium according to claim 28, characterized in that the training performed by the processor includes the processor executing the following commands: training the background model for a pixel in the foreground and changing the state of the pixel to the untrained background state if the pixel exhibits static behavior for a certain period of time.

30. The machine-readable storage medium according to claim 21, wherein the video sequence is recorded by a video camera and the processor additionally executes the following commands: detecting the movement of the video camera and compensating for the movement of the video camera.

31. The machine-readable storage medium according to claim 30, characterized in that the motion detection performed by the processor includes the processor executing the following commands: selecting a frame attribute and tracking frame attributes for a plurality of frames.

32. The machine-readable storage medium according to claim 30, wherein the compensation performed by the processor includes the processor resetting the background model to the initial state when the movement stops.

33. A device for extracting the background of an image from a video sequence, comprising means for localizing a moving object in a video sequence based on a change in the moving object over a plurality of frames of the video sequence, wherein the moving object occupies frame zones of a changing color; and means for training a background model for a plurality of frames outside the frame zones of a changing color.

34. The device according to p, characterized in that the means for localization includes means for determining the boundary for a moving object that has a uniform color; and means for constructing a shell around a moving object using a border.

35. The device according to claim 33, wherein the video sequence is recorded by a video camera and the device further includes means for detecting the movement of the camera and means for compensating for the movement of the camera.

36. A device for extracting the background of an image from a video sequence, comprising: a processor for executing one or more programs for localizing a moving object in a video sequence based on a change in the moving object over a plurality of frames of the video sequence, wherein the moving object occupies frame zones of a changing color, and for training a background model for a plurality of frames outside the frame zones of a changing color; and a storage device connected to the processor, the storage device having stored therein the one or more programs for localizing the moving object and training the background model.

37. The device according to claim 36, wherein the processor is adapted to execute one or more programs to localize a moving object using a change detection mask.

38. The device according to claim 36, wherein the processor is adapted to execute one or more programs for determining a boundary of a moving object that has a uniform color and for constructing a shell around the moving object using the boundary.

39. The device according to claim 36, characterized in that it further comprises a display connected to the processor and adapted to display the plurality of frames of the video sequence.

40. The device according to claim 36, characterized in that it further comprises a camera connected to the processor for recording the plurality of frames of the video sequence.

41. The device according to p, characterized in that the processor is adapted to run one or more programs for detecting the movement of the camera in order to compensate for the movement of the camera.
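The per-pixel states named in claims 11 to 17 can be read as a small state machine. The sketch below is one possible interpretation, not the claimed implementation: only the two transitions stated in claims 16 and 17 are modeled, the thresholds `STATIC_FRAMES` and `TRAINING_FRAMES` are hypothetical, the return to the foreground state on motion is an assumption, and the unknown background state is listed but given no transitions because the claims do not define any.

```python
from enum import Enum


class PixelState(Enum):
    UNKNOWN_BACKGROUND = "unknown background"      # named in claim 15, unused here
    FOREGROUND = "foreground"
    UNTRAINED_BACKGROUND = "untrained background"
    TRAINED_BACKGROUND = "trained background"


# Hypothetical thresholds: the claims say only "a certain period of time"
# and "a given number of frames".
STATIC_FRAMES = 3
TRAINING_FRAMES = 5


class PixelModel:
    """Tracks the state of a single pixel across frames."""

    def __init__(self):
        # Assumption: a pixel starts in the foreground state.
        self.state = PixelState.FOREGROUND
        self.static_count = 0  # consecutive static frames in the current state

    def update(self, is_moving):
        """Advance the state by one frame and return the new state."""
        if is_moving:
            # Assumption: any detected motion returns the pixel to foreground.
            self.state = PixelState.FOREGROUND
            self.static_count = 0
            return self.state
        self.static_count += 1
        if (self.state is PixelState.FOREGROUND
                and self.static_count >= STATIC_FRAMES):
            # Claim 16: static behavior for a period moves the pixel
            # to the untrained background state.
            self.state = PixelState.UNTRAINED_BACKGROUND
            self.static_count = 0
        elif (self.state is PixelState.UNTRAINED_BACKGROUND
                and self.static_count >= TRAINING_FRAMES):
            # Claim 17: after a given number of frames the pixel
            # becomes trained background.
            self.state = PixelState.TRAINED_BACKGROUND
        return self.state
```

With these hypothetical thresholds, a pixel that stays static becomes untrained background after 3 frames and trained background after 5 more, and a single moving frame sends it back to the foreground state.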