Course Code: BAI505B
Course Coordinator: Prof. Sanjay M Belgaonkar
Department of Artificial Intelligence & Machine Learning
Introduction to the Syllabus
Brief Introduction, Why We Learn It, and Industry Applications
Module I: Introduction and Image Formation
• What it covers: Basics of computer vision, geometric primitives, the image
formation process, pinhole perspective, camera models, the human eye
analogy, and intrinsic/extrinsic parameters (a short code sketch follows this
module's bullets).
• Why we learn it: To understand how images are captured, represented,
and transformed, which is the foundation of all CV tasks.
• Industry relevance: Essential for camera calibration in robotics,
augmented reality (AR), autonomous vehicles, and 3D reconstruction.
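To give a concrete feel for the camera-model topics above, here is a minimal, purely illustrative sketch of pinhole projection in NumPy. The intrinsic matrix, pose, and 3D point are made-up values for illustration, not taken from the course material.

```python
import numpy as np

# Pinhole projection: world point -> pixel, using made-up intrinsics/extrinsics.
K = np.array([[800.0,   0.0, 320.0],    # fx, skew, cx
              [  0.0, 800.0, 240.0],    # fy, cy
              [  0.0,   0.0,   1.0]])   # intrinsic matrix K

R = np.eye(3)                            # extrinsic rotation (camera aligned with world)
t = np.array([0.0, 0.0, 0.0])            # extrinsic translation

X_world = np.array([0.2, -0.1, 2.0])     # a 3D point two metres in front of the camera

X_cam = R @ X_world + t                  # world -> camera coordinates
x_hom = K @ X_cam                        # camera -> homogeneous image coordinates
u, v = x_hom[:2] / x_hom[2]              # perspective divide -> pixel coordinates
print(f"pixel: ({u:.1f}, {v:.1f})")      # ~ (400.0, 200.0)
```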
Module II: Early Vision – One Image
• What it covers: Linear filters, convolution, Fourier transforms,
sampling/aliasing, gradients, and feature detection (a small convolution
sketch follows this module's bullets).
• Why we learn it: These are the building blocks for detecting edges,
textures, and patterns in images.
• Industry relevance: Used in medical imaging (tumor detection),
quality inspection in manufacturing, satellite image analysis, and
biometric authentication.
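As a flavour of the linear-filtering topics above, the sketch below convolves a tiny synthetic image with a Sobel kernel to highlight a vertical edge. The image, the kernel choice, and the helper function are illustrative assumptions (NumPy only), not prescribed lab code.

```python
import numpy as np

def convolve2d(img, kernel):
    """Direct 2D convolution (no padding): the basic linear-filter operation."""
    kh, kw = kernel.shape
    k = np.flipud(np.fliplr(kernel))                 # convolution flips the kernel
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

# Tiny synthetic image with a vertical bright region on the right half.
img = np.zeros((5, 6))
img[:, 3:] = 1.0

# Sobel-x kernel: a linear filter approximating the horizontal image gradient.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

print(convolve2d(img, sobel_x))   # large magnitudes only where the edge is
```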
Module III: Early Vision – Multiple Images
• What it covers: Stereopsis, structure from motion, and depth estimation
from multiple views (a depth-from-disparity sketch follows this module's bullets).
• Why we learn it: Helps computers perceive depth and 3D structure
from 2D images.
• Industry relevance: Core to self-driving cars (depth sensing), AR/VR
applications, drone navigation, and 3D mapping (Google Maps,
LiDAR alternatives).
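A rough illustration of depth from stereo disparity for a rectified camera pair, using the standard relation Z = f·B/d. The focal length, baseline, and disparities below are made-up numbers chosen only to show the idea.

```python
import numpy as np

# Stereo depth from disparity (rectified pair), with made-up calibration values.
focal_px = 800.0      # focal length in pixels (assumed)
baseline_m = 0.12     # distance between the two cameras in metres (assumed)

# Disparities (in pixels) measured for a few matched points.
disparity_px = np.array([64.0, 32.0, 16.0])

depth_m = focal_px * baseline_m / disparity_px   # Z = f * B / d
print(depth_m)   # [1.5, 3.0, 6.0] -> smaller disparity means the point is farther away
```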
Module IV: Mid-level Vision
• What it covers: Image segmentation, clustering, grouping, and model
fitting (Hough transform, fitting lines/planes); a Hough-voting sketch
follows this module's bullets.
• Why we learn it: To break down images into meaningful
regions/objects and fit geometric models.
• Industry relevance: Used in medical image segmentation
(organ/tumor detection), surveillance systems, document analysis,
and remote sensing (land-use mapping).
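A toy sketch of Hough voting for line fitting: each edge point votes for the (theta, rho) line parameters it could lie on, and the most-voted cell gives the fitted line. The edge points are hand-placed on the line y = x, and the bin sizes and ranges are arbitrary illustrative choices.

```python
import numpy as np

# Minimal Hough transform for lines: each edge point votes for (theta, rho) pairs.
points = [(i, i) for i in range(10)]             # edge pixels placed on the line y = x

thetas = np.deg2rad(np.arange(0, 180))           # candidate line orientations
rhos = np.arange(-20, 21)                        # candidate distances from the origin
accumulator = np.zeros((len(rhos), len(thetas)), dtype=int)

for x, y in points:
    for t_idx, theta in enumerate(thetas):
        rho = x * np.cos(theta) + y * np.sin(theta)
        r_idx = np.argmin(np.abs(rhos - rho))    # nearest rho bin
        accumulator[r_idx, t_idx] += 1

r_idx, t_idx = np.unravel_index(accumulator.argmax(), accumulator.shape)
print(f"best line: theta = {np.degrees(thetas[t_idx]):.0f} deg, rho = {rhos[r_idx]}")
# For points on y = x the peak lands near theta = 135 deg, rho = 0.
```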
Module V: High-level Vision
• What it covers: Object recognition, registration, smooth surface
representation, and outlines of shapes (a toy recognition sketch follows
this module's bullets).
• Why we learn it: Moves from pixels to semantic understanding—
identifying and recognizing objects/scenes.
• Industry relevance: Powers facial recognition, product recommendation
systems (visual search), industrial robotics (object picking), video
surveillance, and autonomous navigation.
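As a minimal flavour of recognition by matching, here is a toy nearest-neighbour classifier over hand-made feature vectors. The object names and feature values are invented purely for illustration and do not correspond to any real descriptor from the course.

```python
import numpy as np

# Toy object recognition: nearest-neighbour matching of feature vectors.
gallery = {
    "mug":    np.array([0.9, 0.1, 0.3]),
    "bottle": np.array([0.2, 0.8, 0.5]),
    "plate":  np.array([0.4, 0.4, 0.9]),
}

query = np.array([0.85, 0.15, 0.35])     # features extracted from a new image

# Recognize by picking the gallery object whose descriptor is closest to the query.
label = min(gallery, key=lambda name: np.linalg.norm(gallery[name] - query))
print(label)   # "mug"
```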
In Summary
• Low-level vision (Modules I–II) = How images are
captured & basic feature extraction.
• Mid-level vision (Modules III–IV) = Understanding
structure, segmentation, grouping.
• High-level vision (Module V) = Recognizing and
interpreting objects/scenes.
• This layered approach mirrors how modern AI systems
process visual data, and the skills are highly valued in
industries like autonomous vehicles, healthcare,
AR/VR, surveillance, robotics, and e-commerce.