
CN101641717A - Estimating a location of an object in an image - Google Patents

Estimating a location of an object in an image

Info

Publication number
CN101641717A
Authority
CN
China
Prior art keywords
particle
estimated
images
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN200780043330A
Other languages
Chinese (zh)
Inventor
Yu Huang
Joan Llach
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Thomson Licensing SAS
Original Assignee
Thomson Licensing SAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Thomson Licensing SAS
Publication of CN101641717A

Landscapes

  • Image Analysis (AREA)

Abstract

An implementation provides a method for estimating a location for an object in a particular image of a sequence of images. The location is estimated using a particle-based framework, such as a particle filter. It is determined that the estimated location for the object in the particular image is occluded. A trajectory is estimated for the object based on one or more previous locations of the object in one or more previous images in the sequence of images. The estimated location of the object is changed based on the estimated trajectory.

Description

Estimating the position of an object in an image
Cross Reference to Related Applications
The present application claims priority from each of the following three applications: (1) U.S. provisional application serial No. 60/872,145 (attorney docket No. PU060244), entitled "cluttered background and object tracking", filed December 1, 2006; (2) U.S. provisional application serial No. 60/872,146 (attorney docket No. PU060245), entitled "model for object tracking", filed December 1, 2006; and (3) U.S. provisional application serial No. 60/885,780 (attorney docket No. PU070030), entitled "object tracking", filed January 19, 2007. All three priority applications are hereby incorporated by reference in their entireties for all purposes.
Technical Field
At least one implementation described in this application relates to dynamic state estimation.
Background
A dynamic system is a system whose state changes over time. The state is represented as a collection of variables that may be arbitrarily selected, but which typically include the variable of interest. For example, a dynamic system may be used to represent a video. The video may depict a tennis match, and the state may be chosen to be the location of the tennis ball. The system is dynamic because the position of the tennis ball changes over time. It is often of interest to estimate the state of the system, that is, the position of the tennis ball in a new frame of the video.
Disclosure of Invention
According to a general aspect of the invention, the location of an object is estimated in a particular image of a sequence of images. The location is estimated using a particle-based framework. It is determined that the estimated location of the object in the particular image is occluded. A trajectory is estimated for the object based on one or more previous locations of the object in one or more previous images of the sequence. The estimated location of the object is changed based on the estimated trajectory.
The details of one or more implementations are set forth in the accompanying drawings and the description below. Even if described in one particular manner, these implementations may be configured or embodied in various other ways. For example, an implementation may be performed as a method, embodied as an apparatus for performing a set of operations, embodied as an apparatus storing instructions for performing a set of operations, or embodied as a signal. Other aspects and features will become apparent from the following detailed description considered in conjunction with the accompanying drawings and the claims.
Drawings
FIG. 1 is a block diagram illustrating one implementation of a state estimator;
FIG. 2 is a block diagram illustrating one implementation of an apparatus for implementing the state estimator of FIG. 1;
FIG. 3 is a block diagram illustrating one implementation of a system for encoding data based on states estimated by the state estimator of FIG. 1;
FIG. 4 is a block diagram illustrating one implementation of a system for processing data based on states estimated by the state estimator of FIG. 1;
FIG. 5 illustrates various functions performed by an implementation of the state estimator of FIG. 1;
FIG. 6 is a flow diagram illustrating one implementation of a method for determining the position of an object in an image of a sequence of digital images;
FIG. 7 is a flow diagram illustrating one implementation of a method for implementing a particle filter;
FIG. 8 is a flow chart illustrating another method for implementing a particle filter;
FIG. 9 is a flow diagram illustrating one implementation of a method for implementing a dynamic model in the method of FIG. 8;
FIG. 10 is a flow diagram illustrating one implementation of a method for implementing a dynamic model in a particle filter that includes computing motion estimates;
FIG. 11 is a flow diagram illustrating one implementation of a method for implementing a measurement model in a particle filter;
FIG. 12 illustrates one embodiment of a projected trajectory of an object position with an occlusion;
FIG. 13 is a flow diagram illustrating one implementation of a method for determining whether to update a template after estimating a state using a particle filter;
FIG. 14 is a flow diagram illustrating one implementation of a method for determining whether to update a template and improve the position of an object after estimating a state using a particle filter;
FIG. 15 illustrates one implementation of a method for improving an estimated position of an object relative to a projected trajectory;
FIG. 16 is a flow diagram illustrating one implementation of a method for estimating a position of an object;
FIG. 17 is a flow diagram illustrating one implementation of a method for selecting a position estimate;
FIG. 18 is a flow diagram illustrating one implementation of a method for determining particle locations in a particle filter;
FIG. 19 is a flow diagram illustrating one implementation of a method for determining whether to update a template;
FIG. 20 is a flow diagram illustrating one implementation of a method for detecting occlusion of a particle in a particle filter;
FIG. 21 is a flow diagram illustrating one implementation of a method for estimating a state based on particles output by a particle filter;
FIG. 22 is a flow diagram illustrating one implementation of a method for changing an estimated position of an object;
FIG. 23 is a flow diagram illustrating one implementation of a method for determining a position of an object.
Detailed Description
One or more embodiments provide a method for estimating a dynamic state. One such embodiment uses dynamic state estimation to predict the motion of a video feature between frames. An example of video is compressed video, e.g., video compressed using the MPEG-2 format. In compressed video, typically only a subset of the frames include the full information of the image associated with the frame. Such frames, which include all of the information, are referred to as I-frames in the MPEG-2 format. Most frames provide only information indicating the difference between the frame and one or more neighboring frames (e.g., neighboring I-frames); in the MPEG-2 format, such frames are referred to as P-frames and B-frames. Predicting the change of a feature in video on the basis of compressed data is a challenge.
For example, a ball in a sporting event may be a feature of a video; such balls include tennis balls, soccer balls, and basketballs. One application of the method is predicting the position of the ball between frames of a multi-frame video. The ball may be a relatively small object, for example, occupying fewer than 30 pixels. Another example of a video feature is an athlete or referee at a sporting event.
A difficulty in tracking the motion of an object between video frames is the presence of an occlusion of the object within one or more frames. One form of occlusion occurs when the object is hidden behind a foreground feature; this is referred to as a "real occlusion". For example, in a tennis match, the tennis ball may pass behind a player. Such occlusion may be described in several ways, e.g., the object being hidden, obstructed, or covered. In another case, the occlusion may take the form of a background that makes it difficult or impossible to determine the object's position; this is referred to as a "virtual occlusion". For example, a tennis ball may pass in front of a cluttered background, such as a crowd, making it difficult or impossible to distinguish the tennis ball from other similar objects, because the crowd contains many objects close in size and color to the tennis ball. In another example, a tennis ball may pass in front of a region that is the same color as the ball, so that its position is difficult or impossible to determine. Occlusion, including clutter-induced occlusion, makes it difficult to generate correct likelihood estimates for the particles in a particle filter, and often results in uncertainty in object tracking.
These problems often become more severe for small or fast-moving objects. One reason is that the positions of a small object typically do not overlap between successive images (e.g., frames) of a video. When the object positions do not overlap, the objects themselves do not overlap; that is, the object moves a distance of at least its own width in the interval between two successive images. Due to this lack of overlap, it is often difficult to find the object in the next image, or to find it with a high degree of confidence.
The uncertainty of object tracking is not limited to small objects. For example, a cluttered background may contain features similar to the object being tracked, causing tracking uncertainty regardless of the object's size.
Determining whether an object is occluded may also be a challenge. For example, one known method for determining occlusion of an object is the inlier/outlier ratio. However, in the presence of small objects and/or cluttered backgrounds, it is typically difficult to determine whether an object is occluded using the inlier/outlier ratio.
One implementation addresses these challenges by forming a metric surface in a particle-based framework. Another implementation addresses these challenges by computing and using motion estimates in a particle-based framework. Yet another implementation addresses these challenges by using multiple hypotheses in the likelihood estimation.
In a particle-based framework, Monte Carlo simulation is typically used with many particles. For example, the particles may represent different possible positions of an object in a frame, and a particular particle may be selected based on the likelihood determined by the Monte Carlo simulation. A particle filter (PF) is a typical particle-based framework. In a particle filter, particles are generated that represent possible states, corresponding to possible positions of an object in an image. Each particle is associated with a likelihood (also referred to as a weight). Particles with a lower likelihood or weight are excluded in one or more resampling steps. The state output by the particle filter may be, for example, a weighted average of the particles.
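As a minimal illustrative sketch (not the claimed method itself), the predict-weight-resample cycle of a generic particle filter might look as follows in Python; the dynamic model and the likelihood function are supplied by the caller, and the uniform re-weighting after resampling is one common convention:

```python
import numpy as np

def particle_filter_step(particles, frame, dynamics, likelihood, rng):
    """One predict-weight-resample cycle of a generic particle filter."""
    # Predict: propagate each particle through the dynamic model.
    particles = dynamics(particles, rng)
    # Weight: score each candidate position against the current frame.
    weights = np.array([likelihood(p, frame) for p in particles])
    weights = weights / weights.sum()
    # Resample: draw particles in proportion to their weights, so that
    # low-weight particles tend to be discarded.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    particles = particles[idx]
    weights = np.full(len(particles), 1.0 / len(particles))
    # The output state is an average of the (resampled) particles.
    estimate = particles.mean(axis=0)
    return particles, weights, estimate
```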
In one implementation, shown in FIG. 1, the system 100 includes a state estimator 110, which may be implemented on a computer. The state estimator 110 includes a particle algorithm module 120, a local mode module 130, and a quantity adaptation module 140. The particle algorithm module 120 executes a particle-based algorithm for estimating the dynamic system state; for example, the algorithm may be a particle filter. The local mode module 130 applies a local mode-seeking mechanism, for example, by performing mean-shift analysis on the particles of the PF. The quantity adaptation module 140 modifies the number of particles used by the particle-based algorithm, for example, by applying a Kullback-Leibler distance (KLD) sampling process to the particles of the PF. In one implementation, the particle filter performs adaptive sampling based on the size of the state space in which the particles are located. For example, if the particles are found to occupy a very small portion of the state space, a small number of particles is sampled; if the state space is large or the state uncertainty is high, a large number of particles is sampled. The modules 120-140 may be implemented separately or integrated into a single algorithm.
State estimator 110 takes as input a start state 150 and a data input 160, and provides as output an estimated state 170. The start state 150 may be determined, for example, by a start-state detector or a manual process. More specific examples are provided by considering a system whose state is the position of an object in an image (e.g., a frame of video) of a sequence of digital images. In such a system, the starting object position may be determined automatically, using edge detection and template comparison, or manually, by a user viewing the video. The data input 160 may be a sequence of video images, and the estimated state 170 may be an estimate of the location of a ball in a particular video image.
FIG. 2 illustrates an exemplary apparatus 190 for implementing the state estimator 110 of FIG. 1. The apparatus 190 comprises a processing device 180 arranged to receive the start state 150 and the data input 160 and to provide the estimated state 170 as an output. The processing device 180 has access to a storage device 185, which is arranged to store data relating to a particular image of the sequence of digital images.
The estimated state 170 may be applied to a variety of uses. To provide further context, several application examples are described with reference to FIGS. 3 and 4.
In one implementation, shown in FIG. 3, system 200 includes an encoder 210 coupled to a transmission/storage device 220. The encoder 210 and the transmission/storage device 220 may be implemented using, for example, a computer or a communications encoder. The encoder 210 reads the estimated state 170 provided by the state estimator 110 of the system 100 of FIG. 1, and reads the data input 160 used by the state estimator 110. The encoder 210 encodes the data input 160 according to one or more of a variety of encoding algorithms, and provides an encoded data output 230 to the transmission/storage device 220.
Further, the encoder 210 may use the estimated state 170 to encode different portions of the data input 160 differently. For example, if the state represents the position of an object in the video, the encoder 210 may encode the portion of the video corresponding to the estimated position using a first encoding algorithm, and the remainder of the video using a second encoding algorithm. The first encoding algorithm may provide more coding redundancy than the second, so that upon reconstruction the estimated position of the object (and presumably the object itself) will have more detail and better resolution than the rest of the video.
Thus, a low-resolution transmission may still provide higher resolution for the tracked object, allowing a user, for example, to more easily see the ball in a golf game. One such implementation allows a golf game to be viewed on a mobile device over a low-bandwidth (low data rate) connection; the mobile device may be a cellular telephone or a personal digital assistant. The video of the golf game is encoded at a low data rate to keep the overall rate down, but additional bits are used to encode the golf ball relative to other portions of the image.
The transmission/storage device 220 may include one or more of a storage device and a transmission device. Accordingly, the transmission/storage device 220 reads the encoded data 230 and either transmits the data 230 or stores the data 230.
In one implementation, shown in FIG. 4, system 300 includes a processing device 310 connected to a local storage device 315 and a display 320. The processing device 310 reads the estimated state 170 provided by the state estimator 110 of the system 100 of FIG. 1, and reads the data input 160 used by the state estimator 110. The processing device 310 uses the estimated state 170 to enhance the data input 160 and provides an enhanced data output 330. The processing device 310 may store data, including the estimated states, the data inputs, and elements thereof, in the local storage device 315, and may retrieve such data from it. The display 320 reads the enhanced data output 330 and displays the enhanced data.
In FIG. 5, graph 400 shows a probability distribution function 410 over the states of a dynamic system, and schematically depicts various functions performed by one implementation of the state estimator 110. Graph 400 shows one or more functions at each of levels A, B, C, and D.
Level A shows the generation of four particles A1, A2, A3, and A4 by the PF. For convenience, the corresponding position of each of the four particles A1, A2, A3, and A4 on the probability distribution function 410 is indicated by a vertical dashed line.
Level B shows the transfer of the four particles A1-A4 to corresponding particles B1-B4 based on mean-shift analysis using a local mode search algorithm. For convenience, the corresponding position of each of the four particles B1, B2, B3, and B4 on the probability distribution function 410 is indicated by a vertical solid line. The transfer of each of the particles A1-A4 is illustrated by respective arrows MS1-MS4, which indicate the movement of the particles from the positions indicated by A1-A4 to the positions indicated by B1-B4, respectively.
Level C shows weighted particles C2-C4, which have the same positions as particles B2-B4, respectively. The particles C2-C4 are drawn with different sizes indicating the weights determined in the PF for particles B2-B4. Level C also reflects a reduction in the number of particles according to a sampling process, e.g., the KLD sampling process, in which particle B1 is discarded.
Level D shows three new particles generated in a resampling process. The number of particles generated in level D is the same as the number of particles in level C, as indicated by arrow R (R represents resampling).
FIG. 6 shows a process flow 600 of a method for determining the position of an object in an image of a sequence of digital images. The trajectory of the object is estimated based on the position information of previous frames (605); trajectory estimation methods are known to those skilled in the art. A particle filter is run (610); various implementations of particle filters are described below. It is checked whether the object position predicted by the output of the particle filter is occluded (615); implementations of this check are also described below. If an occlusion is found (620), the position is determined using trajectory projection and interpolation (625); one implementation of this position determination is described below in conjunction with FIG. 16. If no occlusion is found, the output of the particle filter is used to determine the object location (630), and the template is checked for drift (635). Drift is a change in the object's appearance relative to the template, for example, caused by the object moving farther away or closer, or changing color. If the drift exceeds a threshold (635), the object template is not updated (640). This is useful because a large drift value suggests an occlusion, and updating the template during an occlusion could cause an incorrect template to be used. Conversely, if the drift does not exceed the threshold, the template is updated (645); small variations (small drift values) are largely true changes in the object's appearance rather than the result of occlusion.
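The overall flow of FIG. 6 can be outlined in Python as follows. This is an illustrative sketch only: run_pf, check_occlusion, and drift are caller-supplied stand-ins for steps 610, 615, and 635, and the drift threshold value is an assumption.

```python
import numpy as np

DRIFT_THRESHOLD = 12.0  # assumed value; the text leaves the threshold open

def track_frame(frame, history, template, run_pf, check_occlusion, drift):
    """One pass of the FIG. 6 flow for a single frame."""
    position = run_pf(frame, history)                  # 610/630: PF estimate
    if check_occlusion(frame, position, template):     # 615/620
        position = project_on_trajectory(history)      # 625: fall back
    elif drift(frame, position, template) <= DRIFT_THRESHOLD:
        template = extract_window(frame, position)     # 645: update template
    # otherwise (large drift, step 640): keep the old template
    return position, template

def project_on_trajectory(history):
    """Constant-velocity extrapolation from the last two known positions."""
    (x1, y1), (x2, y2) = history[-2], history[-1]
    return (2 * x2 - x1, 2 * y2 - y1)

def extract_window(frame, position, size=16):
    """Crop a size x size template window centered on the position."""
    x, y = int(position[0]), int(position[1])
    h = size // 2
    return frame[y - h:y + h, x - h:x + h].copy()
```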
FIG. 7 illustrates a method 500 of implementing a particle filter. The method 500 includes reading a starting set of particles and the cumulative weighting factors from a previous state (510). The cumulative weighting factors may be generated from the set of particle weights and generally allow fast processing. Note that when the method 500 is performed for the first time, the previous state is the start state, and the starting set of particles and weights (cumulative weighting factors) must be generated; the start state may be, for example, the start state 150 of FIG. 1.
Turning again to FIG. 7, a loop control variable "it" is initialized (515), and loop 520 is repeated until the current state is determined. Loop 520 uses the loop control variable "it" and executes "iterate" times. Within loop 520, each particle in the starting set is processed individually in a loop 525. In one implementation, a PF is applied to a video of a tennis match to track the tennis ball: loop 520 is executed a predetermined number of times (the value of the loop repeat variable) for each new frame, and each execution of loop 520 refines the positions of the particles, so that the estimate of the tennis ball's position in each frame is presumed to be based on good particles.
Loop 525 includes selecting a particle based on the cumulative weighting factors (530). One selection method is to select the remaining particle position having the greatest weight. Note that many particles may be present at the same location, in which case loop 525 need only be performed once per location. Loop 525 then includes updating the particle by predicting a new position in the state space for the selected particle (535). The prediction uses the dynamic model of the PF; this step is explained in detail below.
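One reason cumulative weighting factors allow fast processing is that the selection step (530) can be done by binary search over the running sums. The sketch below draws particles in proportion to their weight, which is a common selection scheme; the greatest-weight selection described above is an alternative:

```python
import bisect

def select_particle(cum_weights, rng):
    """Draw one particle index using cumulative weighting factors.

    cum_weights is the running sum of the particle weights, e.g.,
    cum_weights = np.cumsum(weights). Binary search over the sums
    selects a particle in O(log N) time."""
    u = rng.uniform(0.0, cum_weights[-1])
    return bisect.bisect_left(cum_weights, u)
```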
The dynamic model characterizes the change of the object's state from frame to frame. For example, a motion model or motion estimate reflecting the amount of object motion may be used. In one implementation, a constant-velocity model with fixed noise variance is fitted to the object's positions in past frames.
Loop 525 then includes determining the weight of the updated particle using the measurement model of the PF (540). Determining the weight includes analyzing observed/measured data (e.g., the video data of the current frame). Continuing with the tennis example, the data at the particle's indicated position in the current frame is compared with data from the tennis ball's previous position; this comparison may include analyzing a color histogram or performing edge detection. The weight of the particle is determined based on the result of the comparison. Operation 540 also includes determining a cumulative weighting factor for the particle's location.
Loop 525 then includes determining whether there are additional particles to process (542). If so, loop 525 is repeated and method 500 returns to operation 530. After loop 525 has completed for all particles in the starting (old) set, a complete set of updated particles has been generated.
Loop 520 then includes using a resampling algorithm to generate a new set of particles and new cumulative weighting factors (545). The resampling algorithm is based on the weights of the particles, so that particles with larger weights are favored. The resampling algorithm generates a set of particles in which all particles have the same weight, but multiple particles may be located at the same position; these positions typically have different cumulative weighting factors.
Resampling may also alleviate the degeneracy problem common in PFs. Many resampling methods are available, such as multinomial resampling, residual resampling, stratified resampling, and systematic resampling. One implementation uses residual resampling because it is insensitive to particle order.
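A sketch of residual resampling in Python (an illustration, not the claimed implementation): each particle is first copied a deterministic number of times proportional to its weight, and the remaining slots are filled by multinomial draws over the fractional residuals.

```python
import numpy as np

def residual_resample(weights, rng):
    """Return indices of the resampled particle set (residual resampling)."""
    n = len(weights)
    scaled = n * np.asarray(weights, dtype=float)
    counts = np.floor(scaled).astype(int)      # deterministic copies
    residual = scaled - counts                 # fractional leftovers
    n_rest = int(n - counts.sum())
    if n_rest > 0:
        residual = residual / residual.sum()
        extra = rng.choice(n, size=n_rest, p=residual)
        counts += np.bincount(extra, minlength=n)
    return np.repeat(np.arange(n), counts)

# Usage: idx = residual_resample(weights, np.random.default_rng(0))
#        new_particles = particles[idx]
```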
Loop 520 continues by incrementing the loop control variable "it" (550) and comparing "it" to the repeat variable "iterate" (555). If loop 520 is to continue running, the new set of particles and their cumulative weighting factors are made available (560).
After "iterate" is performed on loop 520, this set of particles should be a good set of particles and the current state 565 determined. The new state is determined by averaging the particles in the new set of particles.
FIG. 8 illustrates another implementation of a process flow including a particle filter. The overall flow is similar to that described above in connection with FIG. 7, and elements common to FIGS. 7 and 8 are not described again in detail. The method 800 includes obtaining a starting set of particles and cumulative weighting factors from a previous state (805). A loop control variable "it" is initialized (810), and a loop is executed repeatedly until the current state is determined. In this loop, a particle is selected based on the cumulative weighting factors. The method then updates the particle by predicting a new position in the state space for the selected particle (820); the prediction uses the dynamic model of the PF.
Then, a correlation surface, such as an SSD-based correlation surface, is used to find the local mode for the particle (825). A local minimum of the SSD is identified, and the particle's location is changed to that local minimum. Another implementation uses a suitable surface for which a local maximum is identified, and changes the particle's position to the identified local maximum. The weight of the moved particle is then determined in the measurement model (830); an example of calculating the weight using the correlation surface and multiple hypotheses is described below. If there are additional particles to process (835), the loop returns to select another particle. Once all particles have been processed, the particles are resampled based on the new weights and a new set of particles is generated (840). The loop control variable "it" is incremented (845). If "it" is less than the repeat threshold (850), the method swaps the old and new particle sets (870) and repeats the process described above.
Once all iterations have been performed, one further step is needed before the current state is obtained. An occlusion indicator for the object in the previous frame is checked (855). If the indicator shows that there was an occlusion in the previous frame, then a subset of the particles is considered for selecting the current state (860). The subset consists of the particle having the largest weight; if more than one particle shares the maximum weight, all of them are included in the subset. The resulting state may be regarded as a detection state. The subset is used because occlusion reduces the reliability of the lower-weight particles. If the occlusion indicator shows that there was no occlusion in the previous frame, the mean of the new set of particles is used to determine the current state (865); in this case the state is a tracking state. The mean may be weighted according to the particle weights, and other statistical measures besides the mean may also be used to determine the current state.
FIG. 9 illustrates one implementation 900 of the dynamic model 820 of FIG. 8. In this dynamic model, motion information from the previous frame is used. By using the motion information of the previous frame, the particles are more likely to be near the actual position of the object, improving efficiency, accuracy, or both. Alternatively, a random walk may be used to generate particles in the dynamic model.
The dynamic model may use a state-space model for small-object tracking. For example, at time $t$, a state-space model for tracking a small object in one image of a sequence of digital images may be formulated as:
$$X_{t+1} = f(X_t, \mu_t),$$
$$Z_t = g(X_t, \xi_t),$$
where $X_t$ represents the object state vector, $Z_t$ represents the observation vector, $f$ and $g$ represent two vector-valued functions (the dynamic model and the observation model, respectively), and $\mu_t$ and $\xi_t$ represent process (dynamic) noise and observation noise, respectively. For motion estimation, the object state vector is defined as $X = (x, y)$, where $(x, y)$ are the coordinates of the center of the object window. Preferably, the estimated motion is obtained from the data of the previous frame, possibly using optical flow equations. Let $V_t$ denote the estimated motion of the object in the image at time $t$. The dynamic model may then be represented as:
$$X_{t+1} = X_t + V_t + \mu_t.$$
The variance of the prediction noise $\mu_t$ may be estimated from the motion data, e.g., from error measures of the motion estimation; the motion residual of the optical flow equation may be used. Alternatively, the prediction noise variance may be based on a luminance criterion, such as a motion-compensated residual; however, methods based on motion data may be preferred over methods based on luminance data.
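A minimal sketch of this constant-velocity prediction step in Python (the Gaussian form of the noise is an assumption; the text only requires some prediction noise $\mu_t$):

```python
import numpy as np

def predict_particles(particles, velocity, noise_var, rng):
    """Dynamic model X_{t+1} = X_t + V_t + mu_t.

    particles:  (N, 2) array of (x, y) object-window centers.
    velocity:   motion estimate V_t from the previous frame.
    noise_var:  prediction-noise variance estimated from the motion data."""
    noise = rng.normal(0.0, np.sqrt(noise_var), size=particles.shape)
    return particles + np.asarray(velocity) + noise
```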
As shown at block 905, a stored occlusion indicator is read for each particle. The indicator shows whether the object was occluded in the previous frame. If the indicator shows that the object was occluded (910), motion estimation is not applied in the dynamic model (925), since occlusion impairs the accuracy of the motion prediction, and the particle's prediction noise variance may be set to a maximum value (930). Conversely, if the indicator shows that there was no occlusion in the previous frame, motion prediction is used to generate the particle (915), and the prediction noise variance is estimated from the motion data (920).
FIG. 10 illustrates one implementation 1000 of the process flow performed for each particle in the dynamic model of the particle filter before sampling. Initially, the in-memory occlusion indicator is checked (1005). The indicator shows whether the object was occluded in the previous frame. If the object was occluded in the previous frame (1010), motion estimation is not used in the dynamic model (1030) and the particle's prediction noise variance is set to a maximum value (1035). If the stored indicator does not show that the object was occluded in the previous frame, motion estimation is performed (1015).
Motion estimation may be based on using the object's positions in past frames in the optical flow equations, which are known to those skilled in the art. After motion estimation, failure detection is performed on the particle location resulting from the motion estimation (1020). A variety of metrics may be used for failure detection. In one implementation, the mean absolute luminance difference is calculated between the object image corresponding to the template and the image block centered at the particle position obtained from the motion estimation. If this mean exceeds a selected threshold, the motion prediction is deemed to have failed (1025) and the motion prediction result is not used for the particle (1030); the particle's prediction noise variance may be set to a maximum value (1035). If the motion prediction is not deemed to have failed, the motion prediction result is saved as the prediction for the particle (1040), and the prediction noise variance is then estimated (1045). For example, an optical flow equation may provide a motion residual value, which may be used as the prediction noise variance.
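The failure check of steps 1020/1025 can be sketched as follows; the threshold value is an assumption, since the text only calls for a selected threshold:

```python
import numpy as np

def motion_failed(template, frame, position, threshold=25.0):
    """Mean absolute luminance difference between the object template and
    the image block centered at the motion-predicted particle position."""
    th, tw = template.shape
    x = int(position[0]) - tw // 2
    y = int(position[1]) - th // 2
    block = frame[y:y + th, x:x + tw].astype(float)
    if block.shape != template.shape:  # prediction fell outside the image
        return True
    return np.abs(block - template.astype(float)).mean() > threshold
```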
FIG. 11 depicts one implementation of calculating particle weights using a measurement model. Method 1100 is performed for each particle. The method 1100 begins with the computation of a metric surface, which may be, for example, a correlation surface (1105). The metric surface measures the difference between the template (or target model) and the current candidate particle. In one implementation, the metric surface may be generated as described below.
The metric for the difference between the template and the candidate particle may be a metric surface, such as a correlation surface. One implementation uses a sum-of-squared-differences (SSD) surface, formulated as follows:
$$Z_t = \arg\min_{X_t \in Neib} \sum_{\chi \in W} \left[ T(\chi) - I(\chi + X_t) \right]^2,$$
where $W$ represents the object window and $Neib$ is a small peripheral region around the object center $X_t$. $T$ is the object template and $I$ is the image in the current frame. For small objects against a cluttered background, this surface does not by itself represent an accurate likelihood estimate. A further exemplary correlation is as follows:
$$r(X_t) = \sum_{\chi \in W} \left[ T(\chi) - I(\chi + X_t) \right]^2, \quad X_t \in Neib.$$
the sizes of the relevant faces may vary. The size of the correlation surface may vary based on the quality of the motion estimation. Wherein the quality of the motion estimation can be determined as the inverse of the variance. In general, the higher the quality of the motion estimation, the smaller the correlation surface.
Multiple hypotheses for the particle are generated based on the metric surface (1110). The candidate hypotheses are associated with local minima (or maxima) of the metric surface. For example, if $J$ candidates on the SSD correlation surface are identified in the region $Neib$, then $J + 1$ hypotheses are defined as:
$$H_0 = \{ c_j = C : j = 1, \ldots, J \},$$
$$H_j = \{ c_j = T,\ c_i = C : i = 1, \ldots, J,\ i \neq j \}, \quad j = 1, \ldots, J,$$
where $c_j = T$ means that the $j$-th candidate is a true match, and $c_j = C$ means the opposite. Hypothesis $H_0$ means that none of the candidates is a true match. In this implementation, the clutter is assumed to be uniformly distributed over the peripheral region $Neib$, while the true-match-oriented measurement is Gaussian distributed.
Based on the above assumptions, the likelihood associated with each particle is expressed as
$$P(z_t \mid X_t) = q_0\, U(\cdot) + C_N \sum_{j=1}^{J} q_j\, N(r_t, \sigma_t),$$
$$q_0 + \sum_{j=1}^{J} q_j = 1,$$
where $C_N$ is a normalization factor, $q_0$ is the prior probability of hypothesis $H_0$, and $q_j$ is the prior probability of hypothesis $H_j$, $j = 1, \ldots, J$. Accordingly, with the use of multiple hypotheses, the SSD-based likelihood measure is further refined, so that clutter can be taken into account.
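One plausible reading of this mixture, sketched in Python. The equal priors $q_j = (1 - q_0)/J$, the Gaussian over particle-to-candidate distance, and the omission of the normalization factor $C_N$ are all simplifying assumptions of the sketch:

```python
import numpy as np

def multi_hypothesis_likelihood(particle, candidates, sigma, q0, neib_area):
    """Mixture likelihood: uniform clutter term (H_0) plus one Gaussian
    per candidate true-match hypothesis (H_1..H_J)."""
    J = len(candidates)
    qj = (1.0 - q0) / J                       # equal priors (assumption)
    p = q0 / neib_area                        # uniform clutter over Neib
    for cand in candidates:
        d2 = float(np.sum((np.asarray(particle) - np.asarray(cand)) ** 2))
        p += qj * np.exp(-d2 / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return p
```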
In block 1115, a response distribution variance is estimated.
An occlusion determination is performed to decide whether the particle is occluded. The determination may be based on a luminance-based evaluation (1120), such as a sum of absolute differences (SAD) metric comparing the object template to the candidate particle; such luminance-based evaluations are known to those skilled in the art. Based on the SAD, particles that are likely to be occluded are identified. The luminance-based evaluation of occlusion is computationally simple, but may be inaccurate in the presence of a cluttered background. By setting a large threshold, some particles may nevertheless be reliably determined to be occluded using the luminance-based evaluation (1125); the weights of these particles are set to a minimum value (1130), since for them the likelihood of occlusion is high. For example, the threshold may be selected so that real occlusions without clutter are identified, while occlusions in other situations are not.
If the luminance-based evaluation indicates no occlusion, a probabilistic occlusion determination is performed (1135). Probabilistic occlusion detection may be based on the generated multiple hypotheses and the response distribution variance estimate. A distribution approximating the SSD surface is generated, and the eigenvalues of the covariance matrix of that distribution are used to determine the presence or absence of occlusion, as explained below.
The response distribution is defined to approximate the probability distribution over the true match locations. In other words, the probability $D$ that a particle position is the true match position is:
$$D(X_t) = \exp(-\rho \cdot r(X_t)),$$
where $\rho$ is a normalization factor, selected to ensure a chosen maximum response, e.g., a maximum response of 0.95. The covariance matrix $R_t$ associated with the measurement $Z_t$, constructed from the response distribution, is:
$$R_t = \frac{1}{N_R} \begin{bmatrix} \sum_{(x,y) \in Neib} D_t(x,y)\,(x - x_p)^2 & \sum_{(x,y) \in Neib} D_t(x,y)\,(x - x_p)(y - y_p) \\ \sum_{(x,y) \in Neib} D_t(x,y)\,(x - x_p)(y - y_p) & \sum_{(x,y) \in Neib} D_t(x,y)\,(y - y_p)^2 \end{bmatrix},$$
where $(x_p, y_p)$ is the center of each candidate window and $N_R = \sum_{(x,y) \in Neib} D_t(x,y)$ is the covariance normalization factor. The reciprocals of the eigenvalues of the covariance matrix $R_t$ are used as confidence measures associated with the candidate. In one implementation, the maximum eigenvalue of the covariance matrix $R_t$ is compared to a threshold: if the maximum eigenvalue exceeds the threshold, an occlusion is detected. In response to detecting an occlusion (1140), the particle is given a minimum effective weight (1130), which is typically not 0. If no occlusion is detected, a likelihood is calculated.
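A sketch of this eigenvalue test in Python. Using the SSD minimum as the candidate center $(x_p, y_p)$ and the eigenvalue threshold value are assumptions of the sketch; a broad, flat response distribution produces a large eigenvalue and is flagged as an occlusion:

```python
import numpy as np

def occluded_by_response(surface, rho, eig_threshold):
    """Occlusion test on the response distribution D = exp(-rho * r)."""
    D = np.exp(-rho * surface)
    ys, xs = np.mgrid[0:surface.shape[0], 0:surface.shape[1]]
    yp, xp = np.unravel_index(np.argmin(surface), surface.shape)
    dx, dy = xs - xp, ys - yp
    # 2x2 spatial covariance R_t of the response mass around the candidate
    R = np.array([[(D * dx * dx).sum(), (D * dx * dy).sum()],
                  [(D * dx * dy).sum(), (D * dy * dy).sum()]]) / D.sum()
    return float(np.linalg.eigvalsh(R).max()) > eig_threshold
```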
In one implementation, if an occlusion is detected, the method generates the particle likelihood based on luminance and motion, without considering trajectory, rather than setting the weight or likelihood to a minimum. If no occlusion is detected, the particle likelihood is generated based on luminance.
In one implementation, the weight assigned to a particle is based at least in part on the portion of the image surrounding the particle's indicated location. For example, for a given particle, a slice of the object template, e.g., a 5 x 5 block of pixels, is compared to the location indicated by the particle and to other surrounding regions. The comparison may be based on sum of absolute differences (SAD) metrics or, especially for large objects, histograms. The object template is thus compared with the image around the particle's indicated position. If the off-position comparisons show a significant difference, a greater weight is given to the particle; if, on the other hand, the region indicated by the particle is very similar to the other regions, the particle's weight is reduced accordingly. Based on the comparison, a correlation surface, such as an SSD surface, is generated to model the off-position area.
If the determination is that the particle is not occluded, the trajectory likelihood is estimated (1145). The weighting determinations are then used to estimate the particle weight (1150).
The weight determination includes one or more of a luminance likelihood (e.g., template matching), a motion likelihood (e.g., linear extrapolation of the object's past positions), and a trajectory likelihood. The particle filter may consider these factors to determine a likelihood, or weight, for each particle. In one implementation, the motion of the camera is assumed not to affect trajectory smoothness (and thus the trajectory likelihood). In one implementation, the particle likelihood is defined as follows:
$$P(Z_t \mid X_t) = P(Z_t^{int} \mid X_t)\, P(Z_t^{mot} \mid X_t)\, P(Z_t^{trj} \mid X_t),$$
where $Z_t^{int}$ is the luminance measurement based on the SSD surface, $Z_t^{mot}$ is the motion likelihood measurement, and $Z_t^{trj}$ is the trajectory likelihood measurement. These three values are generally considered independent. The calculation of the luminance likelihood $P(Z_t^{int} \mid X_t)$ is known to those skilled in the art.
The motion likelihood is calculated based on the difference between the particle's position change (velocity) and the average change of the object's position up to the current frame:
$$d_{mot}^2 = \left( |\Delta x_t| - \overline{\Delta x} \right)^2 + \left( |\Delta y_t| - \overline{\Delta y} \right)^2, \quad t > 1,$$
where $(\Delta x_t, \Delta y_t)$ is the particle's displacement relative to $(x_{t-1}, y_{t-1})$, and $(\overline{\Delta x}, \overline{\Delta y})$ is the average velocity of the object up to the current frame, i.e.,
$$\overline{\Delta x} = \sum_{s=1}^{t-1} |x_s - x_{s-1}| \,/\, (t-1), \qquad \overline{\Delta y} = \sum_{s=1}^{t-1} |y_s - y_{s-1}| \,/\, (t-1).$$
Thus, the motion likelihood may be calculated from the distance $d_{mot}$ (e.g., a Euclidean distance) between the particle position and the position predicted by the dynamic model, as follows:
$$P(Z_t^{mot} \mid X_t) = \frac{1}{\sqrt{2\pi}\,\sigma_{mot}} \exp\left( -\frac{d_{mot}^2}{2\sigma_{mot}^2} \right).$$
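In Python, the motion likelihood above might be sketched as follows, where mean_velocity holds $(\overline{\Delta x}, \overline{\Delta y})$:

```python
import numpy as np

def motion_likelihood(particle, prev_pos, mean_velocity, sigma_mot):
    """Gaussian motion likelihood P(Z_t^mot | X_t) from the deviation of
    the particle's per-frame speed from the object's average speed."""
    dx = particle[0] - prev_pos[0]
    dy = particle[1] - prev_pos[1]
    d2 = (abs(dx) - mean_velocity[0]) ** 2 + (abs(dy) - mean_velocity[1]) ** 2
    return np.exp(-d2 / (2 * sigma_mot**2)) / (np.sqrt(2 * np.pi) * sigma_mot)
```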
In one implementation, the trajectory smoothness likelihood is estimated from the proximity of the particle to a trajectory calculated from a series of positions of the object in frames leading up to the current frame. The trajectory function is expressed as $y = f(x)$, with the following parametric form:
$$y = \sum_{i=0}^{m} a_i x^i,$$
where $a_i$ represents a polynomial parameter and $m$ is the order of the polynomial function (e.g., $m = 2$). Two modifications may be made when computing the trajectory function. First, if the object's position in a past frame corresponds to an occluded state, that position is ignored or given no consideration. Second, a weighting factor, called the forgetting factor, is calculated to weight the proximity of the particle to the trajectory: the greater the number of frames in which the object has been occluded, the less reliable the estimated trajectory and, correspondingly, the greater the effect of the forgetting factor.
The forgetting factor is simply a confidence value, assigned by the user based on a variety of considerations, for example: whether the object was occluded in a previous image, the number of previous frames in which the object was occluded, the number of consecutive previous frames in which the object was occluded, and the reliability of the non-occluded data. Each image may have a different forgetting factor.
In one example implementation, the trajectory smoothness likelihood is given by:
$$P(Z_t^{trj} \mid X_t) = \frac{1}{\sqrt{2\pi}\,\sigma_{trj}} \exp\left( -\frac{\left[ d_{trj} / (\lambda_f)^{t\_ocl} \right]^2}{2\sigma_{trj}^2} \right),$$
where the proximity value is $d_{trj} = |y - f(x)|$, $\lambda_f$ is a manually chosen forgetting factor with $0 < \lambda_f < 1$ (e.g., $\lambda_f = 0.9$), and $t\_ocl$ is the number of most recent frames in which the object was occluded.
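A corresponding Python sketch; the polynomial coefficients are assumed to come from a least-squares fit over the non-occluded past positions, and the distance is rescaled by $\lambda_f^{t\_ocl}$ exactly as in the formula above:

```python
import numpy as np

def trajectory_likelihood(particle, poly_coeffs, t_ocl, sigma_trj, forget=0.9):
    """Trajectory-smoothness likelihood P(Z_t^trj | X_t)."""
    x, y = particle
    f_x = np.polyval(poly_coeffs, x)       # poly_coeffs: highest degree first
    d = abs(y - f_x) / (forget ** t_ocl)   # rescale by the forgetting factor
    return np.exp(-d**2 / (2 * sigma_trj**2)) / (np.sqrt(2 * np.pi) * sigma_trj)

# poly_coeffs can be obtained with, e.g., np.polyfit(xs, ys, deg=2)
# over the non-occluded positions (x_s, y_s) of past frames.
```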
In one implementation, if it is determined that the object was occluded in the previous frame, the particle likelihood is determined from the luminance likelihood and the trajectory likelihood, without taking the motion likelihood into account. If it is determined that the object was not occluded in the previous frame, the particle likelihood is determined from the luminance likelihood and the motion likelihood, without taking the trajectory likelihood into account. This is often beneficial: when the object's position is known in the previous frame, the trajectory constraint generally provides little benefit, and introducing it breaks the temporal Markov chain assumption, i.e., it makes the current state depend on states earlier than the previous frame. Conversely, if the object is occluded, or the motion estimation quality is determined to be below a threshold, there is no benefit to including the motion likelihood in the particle likelihood. In this implementation, the particle likelihood is expressed as follows:
$$P(Z_t \mid X_t) = P(Z_t^{int} \mid X_t)\, P(Z_t^{mot} \mid X_t)^{O_{t-1}}\, P(Z_t^{trj} \mid X_t)^{1 - O_{t-1}},$$
where $O_t = 0$ if the object is occluded, and $O_t = 1$ otherwise.
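This switching rule is compact enough to state directly in code (a literal transcription of the formula above):

```python
def particle_likelihood(p_int, p_mot, p_trj, occluded_prev):
    """Combined likelihood: the motion term is used when the object was
    visible in the previous frame (O_{t-1} = 1), the trajectory term when
    it was occluded (O_{t-1} = 0)."""
    o = 0 if occluded_prev else 1
    return p_int * (p_mot ** o) * (p_trj ** (1 - o))
```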
FIG. 12 illustrates fitting a trajectory to the positions of an object across video frames. Elements 1205, 1206, and 1207 represent the position of a small object in three frames of the video; they lie in a region 1208 and are not occluded. Elements 1230 and 1231 represent the position of the small object in two later video frames; they lie in a region 1232 and are determined to be occluded, so there is a high level of uncertainty in the determined positions. Accordingly, in FIG. 12, $t\_ocl = 2$. Element 1210 is the actual trajectory, which is projected onto a predicted trajectory 1220.
FIG. 13 illustrates a process flow for one implementation of template updating. At the start of the process flow of FIG. 13, a new state of the object has been estimated, for example using a particle filter; this new estimated state corresponds to the estimated position of the object in a new frame. The process flow 1300 of FIG. 13 may be used to determine whether the existing template should be used when estimating the state for the next frame. As shown in step 1305, occlusion detection is performed at the newly estimated position of the object in the current frame. If an occlusion is detected (1310), an occlusion indicator is set in memory (1330); this indicator may be used, for example, by the particle filter for subsequent frames. If no occlusion is detected, the process flow continues with drift detection (1315). In one implementation, drift is measured as a motion residual between the object's image in the new frame and the starting template. If the drift exceeds a threshold (1320), the template is not updated (1335). If the drift does not exceed the threshold, the template is updated using the object window of the current frame, and the object motion parameters are also updated.
FIG. 14 is a flow diagram illustrating another implementation, process flow 1400, for updating the object template and improving the position estimate. In process flow 1400, after the current object state is determined, occlusion detection is performed on the determined object position in the current frame (1405). If an occlusion is detected (1410), the estimated object position is modified. Such a modification is beneficial because occlusion reduces the confidence that the determined position is correct, so an improved position estimate is useful. In one example, the occlusion determination may be based on the presence of clutter, and the determined object position may actually be one of the clutter locations.
The modification uses information related to trajectory smoothness. Using the position data of previous frames, the object position is projected onto a determined trajectory (1415); for example, a straight-line projection at constant velocity may be employed. The location is then improved (1420).
FIG. 15 shows a process of projecting an object position onto a trajectory and improving the object position. Element 1505 is a trajectory. Position 1510 represents the position of the object in a previous frame. Data point 1515 represents the position $X_j$ in a previous frame at time $j$, and data point 1520 represents the position $X_i$ in a previous frame at time $i$. Data points 1510, 1515, and 1520 represent object positions that were not occluded and are therefore relatively high-quality data. Data points 1525, 1530, 1535, and 1540 represent positions of the object in previous frames that were occluded. Accordingly, these data points are either ignored or given small weights in the trajectory calculation. The trajectory 1505 is generated by fitting these data points, with the occluded points down-weighted.
A straight line at constant velocity is used for the initial calculation of the object position in the current frame at time $cur$, using the formula:
$$\hat{X}_{cur} = X_i + (X_i - X_j) \cdot (cur - i)/(i - j).$$
This is a straight-line projection 1550 (also known as linear extrapolation). The estimated starting position 1545 of the current frame, $\hat{X}_{cur}$ (also referred to as the linear position estimate), is obtained by this straight-line prediction. The estimated starting position of the current frame is then projected to a position $\tilde{X}_{cur}$ on the calculated trajectory (also referred to as the projected point); $\tilde{X}_{cur}$ is the point on the trajectory closest to $\hat{X}_{cur}$. The formula used for this prediction is as follows:
$$\hat{X}_{cur} = (1 - \lambda_f^{t\_ocl})\, \hat{X}_{cur} + \tilde{X}_{cur} \cdot \lambda_f^{t\_ocl},$$
where $\lambda_f$ is the forgetting factor, $0 < \lambda_f < 1$ (e.g., $\lambda_f = 0.9$), and $t\_ocl$ is the number of frames in which the object has been occluded since it was last visible. In one implementation, the projection may instead be a point interpolated between $\hat{X}_{cur}$ and $\tilde{X}_{cur}$, so that the projection lies on the line between $\hat{X}_{cur}$ and $\tilde{X}_{cur}$. In this implementation, the projection is represented as follows:
$$X_{cur} = (1 - \lambda_f^{t\_ocl})\, \hat{X}_{cur} + \tilde{X}_{cur} \cdot \lambda_f^{t\_ocl}.$$
in fig. 15, the object is blocked in the last two frames, 1530 and 1535. Therefore, t _ ocl is 2. By applying this formula the object position is moved to a position between the trajectory and the straight line projection. the larger the value of t _ ocl, the more uncertain the trajectory and, correspondingly, the closer the object position is to the straight projection. In the example given in FIG. 15, an interpolated alternative location 1540 is determined. Because position 1540 is in blocked region 1560, position 1540 is blocked.
Turning back to FIG. 14, the following describes the processing when no occlusion is detected. The drift of the object template is determined 1425: motion estimation is applied between the current template and the starting template, and the results are compared. If the difference between the two templates after motion estimation exceeds a threshold 1430, drift exists; in this case the previous template is retained and not updated 1445. If the difference does not exceed the threshold, the template is updated 1435. A sketch of such a drift check appears below.
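The drift test might look like the following sketch, in which a small exhaustive translation search stands in for the motion estimation; the search radius, the cropped interior comparison region, and the squared-error measure are assumptions.

```python
import numpy as np

def template_drift(current, start, radius=2):
    # Smallest residual between the current template and the starting template
    # over small integer translations of the current template.
    h, w = start.shape
    best = np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(np.roll(current, dy, axis=0), dx, axis=1)
            a = shifted[radius:h - radius, radius:w - radius].astype(float)
            b = start[radius:h - radius, radius:w - radius].astype(float)
            best = min(best, float(np.mean((a - b) ** 2)))
    return best  # drift exists (step 1430) if this exceeds the threshold
```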
The process also includes updating the occlusion indicator in memory 1440. When position estimation is performed for the following frame, the particle filter checks the occlusion indicator stored for the previous frame.
FIG. 16 illustrates a method 1600. The method includes generating a measurement surface 1605 in a particle-based framework for tracking the object; the measurement surface is associated with a particular image in a sequence of digital images. Based on the measurement surface, multiple hypotheses of the object's position in the image are generated 1610, and the position of the object is estimated based on the probabilities of the multiple hypotheses 1615. One way to form such hypotheses is sketched below.
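A hypothetical sketch: it builds the measurement surface as a normalized correlation of the template against a search region and takes the strongest local maxima as hypotheses. The correlation score, the 8-neighborhood maximum test, and `top_k` are assumptions.

```python
import numpy as np

def hypotheses_from_surface(search_region, template, top_k=5):
    th, tw = template.shape
    H = search_region.shape[0] - th + 1
    W = search_region.shape[1] - tw + 1
    surface = np.empty((H, W))
    t = template.astype(float) - template.mean()
    for y in range(H):
        for x in range(W):
            p = search_region[y:y + th, x:x + tw].astype(float)
            p = p - p.mean()
            # Normalized correlation between the template and this patch.
            surface[y, x] = (p * t).sum() / (np.linalg.norm(p) * np.linalg.norm(t) + 1e-9)
    # Local maxima of the surface serve as position hypotheses (step 1610).
    hyps = [(surface[y, x], (y, x))
            for y in range(1, H - 1) for x in range(1, W - 1)
            if surface[y, x] >= surface[y - 1:y + 2, x - 1:x + 2].max()]
    hyps.sort(key=lambda h: h[0], reverse=True)
    return surface, [pos for _, pos in hyps[:top_k]]
```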
In FIG. 17, a method 1700 includes evaluating 1705 a motion estimate for an object in a particular image of a sequence of digital images, the motion estimate being based on a previous image of the sequence. At least one position estimate is selected 1710 for the object based on the result of the evaluation. The position estimate is part of a particle-based framework for tracking the object.
In FIG. 18, method 1800 includes selecting a particle 1805, having a position, in a particle-based framework for tracking an object between images of a digital image sequence. Method 1800 also includes reading a surface indicating the degree to which one or more particles match the object 1810, and determining a location on the surface 1815 that is associated with the selected particle and that further indicates how well the selected particle matches the object. Method 1800 also includes associating 1820 the determined location with a local minimum or local maximum of the surface, and moving 1825 the position of the selected particle to correspond to the determined local minimum or local maximum. One possible hill-climbing realization is sketched below.
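A minimal sketch, assuming `surface` is a 2-D NumPy array such as the one returned by `hypotheses_from_surface` above and that higher values indicate a better match; greedy 8-neighbor ascent is an illustrative choice.

```python
def climb_to_local_max(surface, pos):
    # Greedily move to the best 8-neighbor until no neighbor improves the score.
    y, x = pos
    while True:
        ny, nx = y, x
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                yy, xx = y + dy, x + dx
                if (0 <= yy < surface.shape[0] and 0 <= xx < surface.shape[1]
                        and surface[yy, xx] > surface[ny, nx]):
                    ny, nx = yy, xx
        if (ny, nx) == (y, x):
            return y, x              # arrived at a local maximum
        y, x = ny, nx
```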
In FIG. 19, a method 1900 includes generating an object template 1905 for an object in a sequence of digital images. Method 1900 further includes generating an estimated object position in a particular image of the sequence of digital images, wherein the estimate is generated using a particle-based framework. The object template is compared 1915 with the portion of the image at the estimated location. It is determined whether to update the object template based on the result of the comparison 1920.
In FIG. 20, method 2000 includes performing a brightness-based evaluation to detect occlusion 2005 in a particle-based framework for tracking an object between images of a sequence of digital images. In one implementation, the brightness-based evaluation may be based on data correlation. If no occlusion is detected 2010, a probability evaluation is performed to detect occlusion 2015; in one implementation, the probability evaluation may include the correlation-surface-based approach described above. An indicator of the occlusion-detection result is selectively stored 2020. A two-stage sketch follows.
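A hypothetical two-stage test; the mean-absolute-difference brightness measure, the correlation-peak probability proxy, and both thresholds are assumptions rather than details taken from this disclosure.

```python
import numpy as np

def detect_occlusion(window, template, surface, bright_thresh=40.0, corr_thresh=0.5):
    # Stage 1 (step 2005): brightness-based evaluation.
    if np.mean(np.abs(window.astype(float) - template.astype(float))) > bright_thresh:
        return True
    # Stage 2 (step 2015), reached only when stage 1 finds no occlusion: if even
    # the best point on the measurement surface matches poorly, declare occlusion.
    return float(surface.max()) < corr_thresh
```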
In FIG. 21, method 2100 includes selecting a subset of active particles for tracking an object 2105 between images of a sequence of digital images. In one implementation, as shown in FIG. 21, the particles with the highest probabilities are selected. The state is then estimated 2110 based on the selected subset of particles, for example as sketched below.
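A minimal sketch of estimating the state from the most probable particles; the weighted-mean estimator and `top_k` are assumptions.

```python
import numpy as np

def estimate_state(particles, weights, top_k=10):
    # particles: shape (N, d); weights: shape (N,).
    idx = np.argsort(weights)[-top_k:]        # the top_k most probable particles
    w = weights[idx] / weights[idx].sum()     # renormalize over the subset
    return (particles[idx] * w[:, None]).sum(axis=0)
```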
In FIG. 22, method 2200 includes determining whether an estimated object position in a frame of the digital image sequence is occluded 2205. A trajectory is estimated 2210 for the object, and the estimated object position is changed 2215 based on the estimated trajectory.
In FIG. 23, method 2300 includes determining an object trajectory 2310. For example, the object is in a particular image of a sequence of digital images, and the trajectory is based on one or more previous positions in one or more previous images of the sequence. Method 2300 also includes determining a particle weight 2320 based on the distance of the particle to the trajectory; the particle is used in a particle-based framework for tracking the object. Method 2300 also includes determining an object position 2330 based on the determined particle weights; for example, the particle-based framework may determine the object position as a weighted combination of the particles, as sketched below.
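A minimal sketch of trajectory-based particle weighting; the Gaussian distance penalty, `sigma`, and `traj_point` (the trajectory position predicted for the current frame) are assumptions.

```python
import numpy as np

def trajectory_weights(particles, base_weights, traj_point, sigma=5.0):
    # Penalize particles by their distance to the predicted trajectory point.
    d = np.linalg.norm(particles - traj_point, axis=1)
    w = base_weights * np.exp(-0.5 * (d / sigma) ** 2)
    return w / w.sum()

# The object position (step 2330) can then be, for example, the weighted mean:
# position = (particles * trajectory_weights(...)[:, None]).sum(axis=0)
```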
Implementations of the invention may be used, for example, to generate a position estimate for an object. The position estimate may then be used to encode an image containing the object, using MPEG-1, MPEG-2, MPEG-4, H.264, or another encoding technique, and the position estimate or the encoding may be provided on a signal medium or a processor-readable medium. These implementations may also be adapted to applications other than object tracking, or to non-video applications; for example, the state may represent a feature other than the position of an object and need not relate to an object at all.
Implementations described herein may be embodied in a method, a process, a device, or a software program. Even if an implementation is described in only one form (for example, only as a method), it may also be implemented in other forms (for example, as an apparatus or a program). An apparatus may be implemented in appropriate hardware, software, and firmware. A method may be implemented in a device such as a processor, which refers generally to a processing device including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processing devices also include communication devices such as computers, cellular telephones, portable/personal digital assistants, and other devices that facilitate the exchange of information between end users.
Implementations of the various process flows and features described herein may be embodied in a variety of devices or applications, for example devices or applications associated with data encoding and decoding. Examples include video encoders, video decoders, video codecs, web servers, set-top boxes, notebooks, personal computers, cellular telephones, personal digital assistants, and other communication devices. The device may be mobile and may even be installed in a moving vehicle.
Furthermore, the methods may be implemented by instructions executed by a processor, and such instructions may be stored on a processor-readable medium such as an integrated circuit, a software carrier, or another storage device, for example a hard disk, an optical disc, a random access memory (RAM), or a read-only memory (ROM). The instructions may form application software tangibly stored on a processor-readable medium, and may reside in hardware, firmware, software, or a combination thereof. Instructions may be found, for example, in an operating system, a separate application, or a combination of the two. A processor may therefore be characterized both as a device configured to carry out a process flow and as a device that includes a computer-readable medium holding instructions for carrying out a process flow.
As will be evident to one skilled in the art, implementations may produce signals formatted to carry information that may be stored or transmitted, including instructions for performing a method or data produced by one of the described implementations. A signal may be formatted, for example, as an electromagnetic wave (for example, using the radio-frequency portion of the spectrum) or as a baseband signal; the formatting may include encoding a data stream and modulating a carrier with the encoded data stream. The information carried by the signal may be analog or digital, and the signal may be transmitted over a variety of wired or wireless links.
A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and process flows may be substituted for those disclosed, and that the resulting implementations will perform at least substantially the same function, in at least substantially the same way, to achieve at least substantially the same result. Accordingly, these and other implementations are contemplated by this disclosure and are within the scope of the following claims.

Claims (18)

1. A method, comprising:
estimating a position of an object in a particular image of a sequence of images, the position being estimated using a particle-based framework;
determining that the estimated object position in the particular image is occluded;
estimating a trajectory for the object based on one or more previous object positions in one or more previous images of the sequence of images; and
changing the estimated object position based on the estimated trajectory.
2. The method of claim 1, further comprising:
determining an object portion of said particular image containing the changed estimated object position;
determining a non-object portion of said particular image separate from said object portion; and
encoding the object portion and the non-object portion such that the object portion is encoded with more coding redundancy than the non-object portion.
3. The method of claim 1, wherein changing the estimated object position comprises:
determining a linear position estimate based on a linear extrapolation of one or more previous object positions in one or more previous images of the sequence of images;
determining the changed estimated object position based on the linear position estimate.
4. The method of claim 3, wherein determining the changed estimated object position comprises:
determining a projection point on the estimated trajectory that is closest to the linear position estimate;
selecting a location on a line connecting the determined projection point and the linear position estimate, the location being selected based on a confidence value of the estimated trajectory.
5. The method of claim 4, wherein the confidence value is based on a number of previous images in the sequence of images in which the object is successively blocked.
6. The method of claim 1, wherein the object is small enough that one or more previous positions of the object do not overlap each other in an image.
7. The method of claim 1, wherein the estimated trajectory is non-linear.
8. The method of claim 1, wherein the one or more previous object positions used to estimate the trajectory are non-occluded positions.
9. The method of claim 1, wherein the estimation of the trajectory is based at least in part on a weighting for occurrences of occlusion of the object in previous images of the sequence of images.
10. The method of claim 1, wherein positions at which the object was occluded in previous images of the sequence of images are not considered in estimating the trajectory.
11. The method of claim 1, wherein the reliability of the estimated trajectory is weighted using information related to occlusion of the object in one or more previous images.
12. The method of claim 1, wherein the object is less than 30 pixels in size.
13. The method of claim 1, wherein the particle-based framework comprises a particle filter.
14. The method of claim 1, wherein the method is implemented in an encoder.
15. An apparatus, comprising:
storage means for storing data relating to a particular image of the sequence of digital images; and
a processor for performing a brightness-based measurement to detect occlusion in a particle-based framework for tracking objects in a sequence of digital images, and for performing a probability measurement to detect occlusion in the particle-based framework if no occlusion is detected in performing the brightness-based measurement.
16. The device of claim 15, further comprising an encoder including the storage device and the processor.
17. A processor-readable medium comprising instructions stored thereon for performing:
performing, in a particle-based framework for tracking an object in a sequence of digital images, a brightness-based evaluation to detect occlusion; and
performing a probability measurement in the particle-based framework to detect occlusion if no occlusion is detected in the step of performing the brightness-based evaluation.
18. An apparatus, comprising:
means for storing data relating to a particular image of the sequence of digital images;
means for performing, in a particle-based framework for tracking an object in a sequence of digital images, a brightness-based evaluation to detect occlusion; and
means for performing a probability measurement in said particle-based framework to detect occlusion if no occlusion is detected in the step of performing the brightness-based evaluation.
CN200780043330A 2006-12-01 2007-11-30 Estimating a location of an object in an image Pending CN101641717A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US87214506P 2006-12-01 2006-12-01
US60/872,145 2006-12-01
US60/872,146 2006-12-01
US60/885,780 2007-01-19

Publications (1)

Publication Number Publication Date
CN101641717A true CN101641717A (en) 2010-02-03

Family

ID=41615768

Family Applications (2)

Application Number Title Priority Date Filing Date
CN200780043330A Pending CN101641717A (en) 2006-12-01 2007-11-30 Estimating a location of an object in an image
CN200780043360A Pending CN101647043A (en) 2006-12-01 2007-11-30 Estimating a location of an object in an image

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN200780043360A Pending CN101647043A (en) 2006-12-01 2007-11-30 Estimating a location of an object in an image

Country Status (1)

Country Link
CN (2) CN101641717A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111915917A (en) * 2019-05-07 2020-11-10 现代安波福Ad有限责任公司 System and method for planning and updating trajectory of vehicle
CN111915917B (en) * 2019-05-07 2023-09-26 动态Ad有限责任公司 Computer-implemented methods, storage media and delivery vehicles
US11772638B2 (en) 2019-05-07 2023-10-03 Motional Ad Llc Systems and methods for planning and updating a vehicle's trajectory
CN110170167A (en) * 2019-05-28 2019-08-27 上海米哈游网络科技股份有限公司 A kind of picture display process, device, equipment and medium
CN110170167B (en) * 2019-05-28 2023-02-28 上海米哈游网络科技股份有限公司 Picture display method, device, equipment and medium

Also Published As

Publication number Publication date
CN101647043A (en) 2010-02-10

Similar Documents

Publication Publication Date Title
CN101681517A (en) Estimating a location of an object in an image
US8229174B2 (en) Technique for estimating motion and occlusion
Shen et al. Probabilistic multiple cue integration for particle filter based tracking
JP4849464B2 (en) Computerized method of tracking objects in a frame sequence
Shin et al. Optical flow-based real-time object tracking using non-prior training active feature model
JP4619987B2 (en) How to model a scene
US20070092110A1 (en) Object tracking within video images
CN101512528A (en) Dynamic state estimation
WO2011113444A1 (en) Method and apparatus for trajectory estimation, and method for segmentation
CN110532921B (en) SSD-based generalized label detection multi-Bernoulli video multi-target tracking method
US20140126818A1 (en) Method of occlusion-based background motion estimation
Al-Najdawi et al. An automated real-time people tracking system based on KLT features detection.
KR20060055296A (en) How to estimate noise displacement from video sequence
Jung et al. Sequential Monte Carlo filtering with long short-term memory prediction
US20100239019A1 (en) Post processing of motion vectors using sad for low bit rate video compression
CN101641717A (en) Estimating a location of an object in an image
JP4879257B2 (en) Moving object tracking device, moving object tracking method, and moving object tracking program
KR102107177B1 (en) Method and apparatus for detecting scene transition of image
WO2010070128A1 (en) Method for multi-resolution motion estimation
JP4688147B2 (en) Moving image processing device
Tissainayagam et al. Performance measures for assessing contour trackers
Loutas et al. Entropy-based metrics for the analysis of partial and total occlusion in video object tracking
Mecke et al. A robust method for motion estimation in image sequences
Tran et al. Object tracking at multiple levels of spatial resolutions
Gao et al. Real time object tracking using adaptive Kalman particle filter

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20100203