CV Lab Project
MeanShift
MeanShift is a computer vision algorithm used for object tracking and clustering. It works by
iteratively shifting a window or kernel towards the mode (peak) of the data distribution. In the
context of object tracking, the algorithm is applied to track an object by iteratively updating the
position and size of the tracking window to locate the object in subsequent frames.
MeanShift Code
import cv2
import numpy as np
from google.colab.patches import cv2_imshow  # frame display helper for Colab

cap = cv2.VideoCapture('video.mp4')  # placeholder path to the input video
while True:
    ret, frame = cap.read()
    if not ret:  # stop once no more frames can be read
        break
    # ... per-frame back-projection and cv2.meanShift() go here ...
cap.release()
CAMShift
CAMShift (Continuously Adaptive Mean Shift) is an extension of the MeanShift algorithm that
allows for adaptive tracking of objects with changing scale and orientation.
The CAMShift algorithm incorporates the concept of color histograms, similar to MeanShift, but
it adds the ability to adaptively adjust the size and orientation of the tracking window. This makes
it particularly useful for tracking objects that undergo changes in size, shape, and orientation over
time.
The main steps of the CAMShift algorithm are as follows:
1. Initialize the tracking window: Define the initial position and size of the tracking window around
the target object.
2. Convert the image region inside the tracking window to the HSV color space: this allows
for more robust color-based feature extraction.
3. Calculate the histogram of the hue channel within the tracking window: This histogram
represents the color distribution of the target object.
4. Backproject the histogram: Calculate the probability of each pixel belonging to the target object
based on the histogram. This creates a backprojection image.
5. Apply the CAMShift algorithm: Iteratively adjust the size and orientation of the tracking
window based on the backprojection image. This is done by computing the centroid of the
backprojection and updating the tracking window to enclose the region of high probability. The
centroid computation is performed using moments.
6. Repeat steps 3 to 5 until convergence: The algorithm iteratively refines the tracking window
until it stabilizes or reaches a predefined stopping criterion.
7. Extract the final position, size, and orientation of the tracking window: These parameters
represent the tracked object in subsequent frames.
CAMShift offers several advantages over MeanShift, including the ability to handle scale and
orientation changes, improved accuracy, and adaptability to various tracking scenarios. However,
it may struggle with significant occlusions, complex backgrounds, or objects with similar colors.
CAMShift has found applications in various domains, including video surveillance, augmented
reality, human-computer interaction, and robotics. It provides a powerful and versatile approach
for tracking objects in video sequences.
CAMShift Code
import cv2
import numpy as np
from google.colab.patches import cv2_imshow  # frame display helper for Colab

cap = cv2.VideoCapture('video.mp4')  # placeholder path to the input video
while True:
    ret, frame = cap.read()
    if not ret:  # stop once no more frames can be read
        break
    # ... per-frame back-projection and cv2.CamShift() go here ...
cap.release()
Code Explanation
1. The necessary libraries are imported, including `cv2` for OpenCV functions, `numpy` for
numerical operations, and `cv2_imshow` from `google.colab.patches` to display the frames in
Google Colab.
2. The code starts by specifying the path to the input video file and creating a `VideoCapture`
object to read the video.
3. The first frame of the video is read using `cap.read()`.
4. Manual coordinates are defined to set the initial region of interest (ROI) for tracking. These
values represent the top-left corner `(x, y)` and the width `w` and height `h` of the ROI.
5. The tracking window is initialized with the initial ROI coordinates.
6. The ROI is converted from the BGR color space to the HSV color space using `cv2.cvtColor()`.
7. A binary mask is created by applying a range thresholding on the HSV image. This mask is used
to extract the ROI within the HSV color space.
8. The histogram of the hue channel within the ROI is calculated using `cv2.calcHist()`. The
histogram represents the color distribution of the target object.
9. The histogram is normalized to a range of 0-255 using `cv2.normalize()`.
10. CAMShift termination criteria are defined, specifying the number of iterations and the required
accuracy.
11. A loop is started to read frames from the video. For each frame:
a. The current frame is converted to the HSV color space using `cv2.cvtColor()`.
b. The back projection of the hue channel is calculated using `cv2.calcBackProject()`. This back
projection image represents the probability of each pixel belonging to the target object based on
the histogram.
c. The CAMShift algorithm is applied to the back projection image, the current tracking window,
and the termination criteria. It returns the updated tracking window and a rotated rectangle that
encapsulates the object.
d. The four corner points of the rotated rectangle are obtained using `cv2.boxPoints()`. These
points are then drawn as a polygon on the frame using `cv2.polylines()`, visualizing the tracked
object.
e. The ROI within the HSV color space is updated based on the new tracking window
coordinates.
f. Steps 7-9 are repeated for the updated ROI to recalculate and normalize the histogram.
g. The frame with the visualized tracked object is displayed using `cv2_imshow()`.
h. The loop continues until there are no more frames to read from the video. The video capture
is then released, and all windows are closed.
In summary, this code performs object tracking using the CAMShift algorithm. It initializes a
tracking window based on a manually defined ROI, calculates the histogram of the target object,
and iteratively applies CAMShift to update the tracking window and visualize the tracked object
in subsequent frames. The HSV color space is used for better color-based feature extraction, and
the back projection is used to calculate the probability of each pixel belonging to the object.
Conclusion
In conclusion, the MeanShift and CAMShift algorithms provide effective solutions for object
tracking in computer vision applications. These algorithms rely on color-based feature extraction
and adaptively adjust the tracking window to follow the target object across frames.
The MeanShift algorithm calculates the back projection of the target object's color histogram and
iteratively updates the tracking window's position to maximize the probability of finding the object
within the window. MeanShift is suitable for tracking objects with relatively stable scale and
orientation.
On the other hand, the CAMShift algorithm is an extension of MeanShift that introduces the ability
to handle scale and orientation changes. By iteratively adjusting the size and orientation of the
tracking window based on the back projection, CAMShift provides more robust tracking
capabilities. It utilizes a rotated rectangle to represent the object's position and orientation.
Both algorithms have been widely applied in various domains, including video surveillance, object
tracking in augmented reality, human-computer interaction, and robotics. They offer real-time
tracking performance and are capable of handling different types of objects.
In practice, the choice between MeanShift and CAMShift depends on the specific tracking
requirements. MeanShift is suitable for scenarios where the object's scale and orientation remain
relatively constant, while CAMShift is more appropriate for situations where the object undergoes
scale and orientation changes.
Overall, MeanShift and CAMShift are valuable tools in the field of object tracking, providing
reliable and efficient solutions for tracking objects in videos or real-time camera streams. Their
versatility and adaptability make them essential algorithms in various computer vision
applications.