
COMPUTER VISION

PROJECT: OBJECT TRACKING USING MEANSHIFT AND CAMSHIFT

Course Instructor: Kamran Aziz Bhatti
Lab Engineer: Kamran Aziz Bhatti
Name: Muhammed Umer Siddiq
Reg no: 296908
Degree: DE 41 EE B
Background
Object tracking is a fundamental task in computer vision and plays a crucial role in various
applications, including surveillance, video analysis, augmented reality, robotics, and autonomous
driving. The objective of object tracking is to locate and follow a specific object or multiple objects
over a sequence of frames in a video.
There are several challenges involved in object tracking due to factors such as changes in
appearance, occlusions, scale variations, cluttered backgrounds, and camera motion. To address
these challenges, various algorithms and techniques have been developed over the years.
One common approach for object tracking is the use of motion-based methods. These methods
exploit the motion information between consecutive frames to estimate the position and movement
of the object.
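As a concrete illustration, one simple motion-based cue is absolute frame differencing. The following is a minimal sketch of the idea (the video path is a placeholder, not a file from this project):

import cv2

# Minimal frame-differencing sketch: highlight pixels that changed
# between consecutive frames. 'video.mp4' is a placeholder path.
cap = cv2.VideoCapture('video.mp4')
ret, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, prev_gray)  # per-pixel intensity change
    _, motion_mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    prev_gray = gray
cap.release()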
Overall, object tracking is a challenging and dynamic research field with a wide range of
applications. Researchers and developers continue to explore new algorithms, techniques, and
datasets to improve the accuracy, efficiency, and robustness of object tracking systems.

MeanShift
MeanShift is a computer vision algorithm used for object tracking and clustering. It works by
iteratively shifting a window or kernel towards the mode (peak) of the data distribution. In the
context of object tracking, the algorithm is applied to track an object by iteratively updating the
position and size of the tracking window to locate the object in subsequent frames.

Here's an overview of the MeanShift algorithm for object tracking:


1. Initialization: Define the initial position and size of the tracking window around the target object
in the first frame.
2. Feature Extraction: Extract the features of the target object within the tracking window. These
features can include color, texture, gradient, or any other relevant characteristics.
3. Kernel Estimation: Create a kernel density estimation (usually a Gaussian kernel) based on the
extracted features of the target object within the tracking window. The kernel represents the
probability distribution of the target object's appearance.
4. Iterate MeanShift: Repeat the following steps until convergence or until a maximum number of
iterations is reached:
a. Calculate the weighted mean of the feature vectors within the current tracking window using
the kernel density estimation. This mean represents the estimated position of the target object.
b. Shift the tracking window to the new estimated position.
c. Update the kernel density estimation based on the new position of the tracking window.
d. Check for convergence by comparing the distance between the current and previous estimated
positions. If the distance falls below a certain threshold, the algorithm is considered to have
converged.
5. Output: The final estimated position and size of the tracking window represent the tracked object
in subsequent frames.
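To make step 4 concrete, the following is a minimal NumPy sketch of a single mean-shift update over a back-projection image. It is a simplified illustration, not the OpenCV implementation; `prob` and the window coordinates are assumed inputs.

import numpy as np

def mean_shift_step(prob, x, y, w, h):
    # One update: move the window so it is centered on the weighted
    # centroid of the back-projection values inside it (steps 4a-4b).
    roi = prob[y:y+h, x:x+w].astype(np.float64)
    total = roi.sum()
    if total == 0:
        return x, y  # no evidence inside the window; stay put
    ys, xs = np.mgrid[0:h, 0:w]
    cx = (xs * roi).sum() / total  # centroid x, relative to window origin
    cy = (ys * roi).sum() / total  # centroid y, relative to window origin
    return int(x + cx - w / 2), int(y + cy - h / 2)

Iterating this update until the shift falls below a threshold (step 4d) approximates what `cv2.meanShift` does internally.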
MeanShift is robust to gradual changes in object appearance and position and can tolerate moderate
occlusions. However, because the tracking window keeps a fixed size, it struggles in scenarios with
significant occlusions, abrupt changes in scale or orientation, or similarly colored objects in the scene.
The MeanShift algorithm is widely used in various computer vision applications, including video
surveillance, object tracking, and image segmentation. It provides a computationally efficient and
effective approach for tracking objects in real-time video streams.

MeanShift Code
import cv2
import numpy as np
from google.colab.patches import cv2_imshow

# Load the video file
video_path = '/content/drive/MyDrive/Colab Notebooks/boy-walking.mp4'
cap = cv2.VideoCapture(video_path)

# Read the first frame
ret, frame = cap.read()

# Manually define the ROI coordinates (top-left corner, width and height)
x, y, w, h = 100, 100, 200, 200

# Initialize the tracking window
track_window = (x, y, w, h)

# Convert the ROI (not the whole frame) to HSV and build its hue histogram
roi = frame[y:y+h, x:x+w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

# Termination criteria: at most 10 iterations, or a shift below 1 pixel
criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

# Start object tracking
while True:
    # Read a new frame
    ret, frame = cap.read()
    if not ret:
        break

    # Convert the frame to HSV color space
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

    # Calculate the back projection of the hue histogram
    back_projection = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)

    # Apply the MeanShift algorithm to track the object
    ret, track_window = cv2.meanShift(back_projection, track_window, criteria)

    # Extract the new position and size of the tracking window
    x, y, w, h = track_window

    # Draw the tracked region on the frame
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

    # Display the frame
    cv2_imshow(frame)

    # Exit if the ESC key is pressed
    if cv2.waitKey(1) == 27:
        break

# Release the video capture
cap.release()
cv2.destroyAllWindows()
Code Explanation
First, the necessary libraries are imported, including `cv2` for OpenCV functions, `numpy` for
numerical operations, and `cv2_imshow` from `google.colab.patches` to display the frames in
Google Colab.
The code begins by specifying the path to the input video file and creating a `VideoCapture` object
to read the video. The first frame of the video is then read using `cap.read()`.
Next, an initial position and size of the tracking window are manually defined. These values can
be adjusted to set the initial region of interest (ROI) to track. The tracking window is initialized
with these values.
To track the object, the initial ROI is converted from the BGR color space to the HSV color space.
The histogram of the hue channel in the ROI is calculated using `cv2.calcHist()`, and it is then
normalized. This histogram will be used to represent the appearance of the object.
The termination criteria for the MeanShift algorithm are set: the search stops after at most 10
iterations, or once the window shifts by less than 1 pixel, whichever comes first.
A loop is started to read frames from the video. For each frame, the current frame is converted to
HSV color space, and the back projection of the hue channel is calculated using the histogram as
a probability distribution.
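For intuition, the following is a rough NumPy sketch of what `cv2.calcBackProject` computes for a single hue channel, assuming 180 bins over the range [0, 180) as in the code above; `hue` and `hist` are assumed inputs.

import numpy as np

def back_project_hue(hue, hist, n_bins=180):
    # Look up each pixel's hue bin in the (normalized) histogram; the
    # result is a per-pixel score for belonging to the target object.
    bin_idx = np.clip(hue.astype(np.int32) * n_bins // 180, 0, n_bins - 1)
    return hist.ravel()[bin_idx]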
The MeanShift algorithm is applied to the back projection image, the current tracking window, and
the termination criteria. It returns the updated tracking window, which represents the new position
and size of the tracked object.
The updated tracking window is used to draw a rectangle on the frame, visualizing the tracked
region. The frame with the rectangle is displayed using `cv2_imshow()`.
The loop continues until there are no more frames to read from the video. The video capture is
then released, and all windows are closed.
In summary, this code tracks an object in a video by initializing a tracking window and using the
MeanShift algorithm to iteratively update the window's position based on the back projection. The
algorithm utilizes the color information of the object, represented by the histogram, to track it
across frames. The resulting tracked region is visualized by drawing a rectangle on the frames.

CAMShift
CAMShift (Continuously Adaptive Mean Shift) is an extension of the MeanShift algorithm that
allows for adaptive tracking of objects with changing scale and orientation.
The CAMShift algorithm incorporates the concept of color histograms, similar to MeanShift, but
it adds the ability to adaptively adjust the size and orientation of the tracking window. This makes
it particularly useful for tracking objects that undergo changes in size, shape, and orientation over
time.
The main steps of the CAMShift algorithm are as follows:
1. Initialize the tracking window: Define the initial position and size of the tracking window around
the target object.
2. Convert the tracking window to HSV color space: This allows for more robust color-based
feature extraction.
3. Calculate the histogram of the hue channel within the tracking window: This histogram
represents the color distribution of the target object.
4. Backproject the histogram: Calculate the probability of each pixel belonging to the target object
based on the histogram. This creates a backprojection image.
5. Apply the CAMShift algorithm: Iteratively adjust the size and orientation of the tracking
window based on the backprojection image. This is done by computing the centroid of the
backprojection and updating the tracking window to enclose the region of high probability. The
centroid is computed from image moments (see the sketch after this list).
6. Repeat steps 3 to 5 until convergence: The algorithm iteratively refines the tracking window
until it stabilizes or reaches a predefined stopping criterion.
7. Extract the final position, size, and orientation of the tracking window: These parameters
represent the tracked object in subsequent frames.
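As a small illustration of the moment computation in step 5, the sketch below derives a centroid from image moments with `cv2.moments`. The patch here is random stand-in data, not output from the tracker.

import cv2
import numpy as np

patch = np.random.rand(80, 60).astype(np.float32)  # stand-in back projection
m = cv2.moments(patch)        # spatial moments m00, m10, m01, ...
cx = m['m10'] / m['m00']      # centroid x = M10 / M00
cy = m['m01'] / m['m00']      # centroid y = M01 / M00
# CAMShift also uses the second-order central moments (mu20, mu11, mu02)
# to estimate the orientation and axis lengths of the tracked ellipse.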
CAMShift offers several advantages over MeanShift, including the ability to handle scale and
orientation changes, improved accuracy, and adaptability to various tracking scenarios. However,
it may struggle with significant occlusions, complex backgrounds, or objects with similar colors.
CAMShift has found applications in various domains, including video surveillance, augmented
reality, human-computer interaction, and robotics. It provides a powerful and versatile approach
for tracking objects in video sequences.

CAMShift Code
import cv2
import numpy as np
from google.colab.patches import cv2_imshow

# Load the video file
video_path = '/content/drive/MyDrive/Colab Notebooks/pexels-zlatin-georgiev-5607553-426x240-24fps.mp4'
cap = cv2.VideoCapture(video_path)

# Read the first frame
ret, frame = cap.read()

# Manually define the ROI coordinates (top-left corner, width and height)
x, y, w, h = 100, 100, 200, 200

# Initialize the tracking window
track_window = (x, y, w, h)

# Convert the ROI (not the whole frame) to HSV and build a masked hue histogram
roi = frame[y:y+h, x:x+w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
roi_mask = cv2.inRange(hsv_roi, np.array((0., 60., 32.)),
                       np.array((180., 255., 255.)))
roi_hist = cv2.calcHist([hsv_roi], [0], roi_mask, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

# Termination criteria: at most 10 iterations, or a shift below 1 pixel
criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

# Start object tracking
while True:
    # Read a new frame
    ret, frame = cap.read()
    if not ret:
        break

    # Convert the frame to HSV color space
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

    # Calculate the back projection of the hue histogram
    back_projection = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)

    # Apply CAMShift (the OpenCV function is named cv2.CamShift)
    ret, track_window = cv2.CamShift(back_projection, track_window, criteria)

    # Extract the new position and size of the tracking window
    x, y, w, h = track_window

    # Draw the object's rotated rectangle
    pts = cv2.boxPoints(ret)
    pts = np.int32(pts)
    cv2.polylines(frame, [pts], True, (0, 255, 0), 2)

    # Update the ROI histogram for the next iteration (re-learning the model
    # every frame can drift if the window slips off the object)
    hsv_roi = hsv[y:y+h, x:x+w]
    roi_mask = cv2.inRange(hsv_roi, np.array((0., 60., 32.)),
                           np.array((180., 255., 255.)))
    roi_hist = cv2.calcHist([hsv_roi], [0], roi_mask, [180], [0, 180])
    cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

    # Display the frame
    cv2_imshow(frame)

    # Exit if the ESC key is pressed
    if cv2.waitKey(1) == 27:
        break

# Release the video capture
cap.release()
cv2.destroyAllWindows()

Code Explanation
1. The necessary libraries are imported, including `cv2` for OpenCV functions, `numpy` for
numerical operations, and `cv2_imshow` from `google.colab.patches` to display the frames in
Google Colab.
2. The code starts by specifying the path to the input video file and creating a `VideoCapture`
object to read the video.
3. The first frame of the video is read using `cap.read()`.
4. Manual coordinates are defined to set the initial region of interest (ROI) for tracking. These
values represent the top-left corner `(x, y)` and the width `w` and height `h` of the ROI.
5. The tracking window is initialized with the initial ROI coordinates.
6. The ROI is converted from the BGR color space to the HSV color space using `cv2.cvtColor()`.
7. A binary mask is created by range-thresholding the HSV ROI with `cv2.inRange()`. The mask
excludes low-saturation and low-value pixels, so unreliable hue measurements do not contaminate
the histogram.
8. The histogram of the hue channel within the ROI is calculated using `cv2.calcHist()`. The
histogram represents the color distribution of the target object.
9. The histogram is normalized to a range of 0-255 using `cv2.normalize()`.
10. CAMShift termination criteria are defined, specifying the number of iterations and the required
accuracy.
11. A loop is started to read frames from the video. For each frame:
a. The current frame is converted to the HSV color space using `cv2.cvtColor()`.
b. The back projection of the hue channel is calculated using `cv2.calcBackProject()`. This back
projection image represents the probability of each pixel belonging to the target object based on
the histogram.
c. The CAMShift algorithm is applied to the back projection image, the current tracking window,
and the termination criteria. It returns the updated tracking window and a rotated rectangle that
encapsulates the object.
d. The four corner points of the rotated rectangle are obtained using `cv2.boxPoints()`. These
points are then drawn as a polygon on the frame using `cv2.polylines()`, visualizing the tracked
object.
e. The ROI within the HSV color space is updated based on the new tracking window
coordinates.
f. Steps 7-9 are repeated for the updated ROI to recalculate and normalize the histogram.
g. The frame with the visualized tracked object is displayed using `cv2_imshow()`.
h. The loop continues until there are no more frames to read from the video.
12. After the loop ends, the video capture is released and all windows are closed.
In summary, this code performs object tracking using the CAMShift algorithm. It initializes a
tracking window based on a manually defined ROI, calculates the histogram of the target object,
and iteratively applies CAMShift to update the tracking window and visualize the tracked object
in subsequent frames. The HSV color space is used for better color-based feature extraction, and
the back projection is used to calculate the probability of each pixel belonging to the object.
Conclusion
In conclusion, the MeanShift and CAMShift algorithms provide effective solutions for object
tracking in computer vision applications. These algorithms rely on color-based feature extraction
and adaptively adjust the tracking window to follow the target object across frames.
The MeanShift algorithm calculates the back projection of the target object's color histogram and
iteratively updates the tracking window's position to maximize the probability of finding the object
within the window. MeanShift is suitable for tracking objects with relatively stable scale and
orientation.
On the other hand, the CAMShift algorithm is an extension of MeanShift that introduces the ability
to handle scale and orientation changes. By iteratively adjusting the size and orientation of the
tracking window based on the back projection, CAMShift provides more robust tracking
capabilities. It utilizes a rotated rectangle to represent the object's position and orientation.
Both algorithms have been widely applied in various domains, including video surveillance, object
tracking in augmented reality, human-computer interaction, and robotics. They offer real-time
tracking performance and are capable of handling different types of objects.
In practice, the choice between MeanShift and CAMShift depends on the specific tracking
requirements. MeanShift is suitable for scenarios where the object's scale and orientation remain
relatively constant, while CAMShift is more appropriate for situations where the object undergoes
scale and orientation changes.
Overall, MeanShift and CAMShift are valuable tools in the field of object tracking, providing
reliable and efficient solutions for tracking objects in videos or real-time camera streams. Their
versatility and adaptability make them essential algorithms in various computer vision
applications.
