CS231A
Computer Vision:
From 3D Reconstruction
to Recognition
Class Time: M-W, 11:30-12:50 PM
Silvio Savarese & Jeanette Bohg, Lecture 1, 11-Jan-21
CS231A
• Instructors
  o Silvio Savarese
    - ssilvio@stanford.edu
    - Office: Gates Building, room 154
    - Office hours: Friday 2-3pm or by appointment
  o Jeanette Bohg
    - bohg@stanford.edu
    - Office: Gates Building, room 140
    - Office hours: Friday 1-2pm or by appointment
• CAs: Andrey Kurenkov, Kuan Fang, Brent Yi, Krishnan Srinivasan
Lecture 1
Introduction
• An introduction to computer vision
• Course overview
AI is a driving force behind today's technology
Smart Agriculture
Courtesy of Agriculture Corner
Health care
Courtesy of D. Rubin, Stanford
Retail
Courtesy of Amazon.com
From "Imagining the Retail Store of the Future", The New York Times, April 12, 2017
Manufacturing
Courtesy of ITRI, 2017
Transportation and Logistics
Construction Management
Why is this acceleration happening now?
Enabling factors
• Big data
  - ImageNet, 2009
  - ShapeNet, 2015
• Faster hardware
• New algorithms
  - Representation learning
  - Neural networks
  - Injecting learning into deterministic reasoning
Computer vision
A sensing device plus a computational device support two tasks:
1. Information extraction: features, 3D structure, motion flows, etc.
2. Interpretation: recognizing objects, scenes, actions, and events
Major areas in Computer Vision
Space/Geometry:
• Object shape recovery
• Depth estimation
• 3D scene reconstruction
Semantics/Learning:
• Object detection and pose estimation
• Object tracking
• Scene understanding
Recovering 3D models of the environment
Armeni et al. 2016
Courtesy of Luminar
This is critical for autonomous driving or navigation!
Detecting and tracking objects in the environment
(Example detections: building, pedestrian, cars)
3D Scene Parsing
Held, Thrun, Savarese, 2016
CS 231A course overview
1. Space/Geometry
   Estimating spatial properties of objects and scenes from images through geometric methods
2. Semantics/Learning
   Estimating semantic and dynamic properties of scene elements from images through learning methods
Camera systems
Establish a mapping from 3D to 2D
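To make the 3D-to-2D mapping concrete, here is a minimal numpy sketch of the pinhole projection that the camera-model lectures will derive; the intrinsics and pose below are made-up placeholder values, not real calibration data.

```python
import numpy as np

# Hypothetical intrinsics: focal lengths and principal point (placeholders).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Hypothetical extrinsics: identity rotation, camera shifted 2 m along z.
R = np.eye(3)
t = np.array([0.0, 0.0, 2.0])

def project(X_world):
    """Map a 3D world point to 2D pixel coordinates (pinhole model)."""
    X_cam = R @ X_world + t          # world frame -> camera frame
    x = K @ X_cam                    # camera frame -> homogeneous pixels
    return x[:2] / x[2]              # perspective division

print(project(np.array([0.1, -0.2, 3.0])))   # -> pixel (u, v)
```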
How to calibrate a camera
Estimate camera parameters such as pose or focal length
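As a hedged sketch of what calibration computes, the snippet below recovers the intrinsic matrix with OpenCV's calibrateCamera from synthetic 3D-2D correspondences; the "true" intrinsics and the three poses are invented so the example is self-contained, whereas a real pipeline would detect checkerboard corners in photographs.

```python
import numpy as np
import cv2

# Synthetic planar target: an 8x6 grid of 3D points with Z = 0.
obj = np.zeros((8 * 6, 3), np.float32)
obj[:, :2] = np.mgrid[0:8, 0:6].T.reshape(-1, 2)

K_true = np.array([[800.0, 0.0, 320.0],
                   [0.0, 800.0, 240.0],
                   [0.0,   0.0,   1.0]])

# Project the target under three made-up camera poses.
obj_pts, img_pts = [], []
for rx in (0.1, -0.2, 0.3):
    rvec = np.array([rx, 0.2, 0.0])           # axis-angle rotation
    tvec = np.array([0.0, 0.0, 10.0])
    proj, _ = cv2.projectPoints(obj, rvec, tvec, K_true, None)
    obj_pts.append(obj)
    img_pts.append(proj.astype(np.float32))

# Recover the intrinsics from the correspondences alone.
rms, K_est, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_pts, img_pts, (640, 480), None, None)
print(K_est.round(1))                          # ~ K_true
```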
Single view metrology
Estimate 3D properties of the world from a single image
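A toy illustration of the idea: if a planar reference of known size is visible, the image-to-plane homography lets us take measurements on that plane from a single photo. The pixel coordinates below are hand-picked, hypothetical values for a floor tile assumed to be 1 m x 1 m.

```python
import numpy as np
import cv2

# Hypothetical pixel corners of a 1 m x 1 m floor tile.
img_quad = np.float32([[312, 410], [398, 402], [421, 468], [322, 479]])
world_quad = np.float32([[0, 0], [1, 0], [1, 1], [0, 1]])   # metres

# Homography from image coordinates to metric ground-plane coordinates.
H = cv2.getPerspectiveTransform(img_quad, world_quad)

def to_world(u, v):
    p = H @ np.array([u, v, 1.0])
    return p[:2] / p[2]                        # perspective division

# Metric distance between two image points lying on the floor.
a, b = to_world(350, 450), to_world(500, 520)
print("distance on the floor:", np.linalg.norm(a - b), "m")
```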
Multiple view geometry
Estimate 3D properties of the world from multiple views
Epipolar geometry
Structure from motion
Courtesy of Oxford Visual Geometry Group
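As a preview of the epipolar-geometry lectures, the sketch below builds a synthetic two-view scene and estimates the fundamental matrix F, which encodes the constraint x2^T F x1 = 0 for corresponding points x1, x2; every number here is invented for the demo.

```python
import numpy as np
import cv2

# Random 3D points in front of both cameras.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(30, 3)) + np.array([0.0, 0.0, 5.0])

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R2, _ = cv2.Rodrigues(np.array([0.0, 0.3, 0.0]))   # second view: rotated
t2 = np.array([[1.0], [0.0], [0.0]])               # ...and translated

def project(X, R, t):
    x = (K @ (R @ X.T + t)).T
    return (x[:, :2] / x[:, 2:]).astype(np.float32)

pts1 = project(X, np.eye(3), np.zeros((3, 1)))
pts2 = project(X, R2, t2)

# Eight-point estimate of the fundamental matrix.
F, _ = cv2.findFundamentalMat(pts1, pts2, cv2.FM_8POINT)

# The epipolar constraint should (nearly) vanish for a correspondence.
x1, x2 = np.append(pts1[0], 1.0), np.append(pts2[0], 1.0)
print("epipolar residual:", x2 @ F @ x1)
```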
Panoramic Photography
kolor
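Panoramas are a direct application of this geometry: views taken from roughly one camera center are related by a 3x3 homography. Below is a rough OpenCV sketch with hypothetical input filenames and a naive paste instead of proper blending.

```python
import numpy as np
import cv2

# Hypothetical overlapping photos (placeholder filenames).
img1 = cv2.imread("left.jpg")
img2 = cv2.imread("right.jpg")

# Detect and match local features between the two images.
orb = cv2.ORB_create(2000)
k1, d1 = orb.detectAndCompute(img1, None)
k2, d2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)

# Robustly estimate the homography mapping image 2 into image 1's frame.
src = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Warp image 2 onto a wide canvas and paste image 1 on top.
h, w = img1.shape[:2]
pano = cv2.warpPerspective(img2, H, (2 * w, h))
pano[:h, :w] = img1
cv2.imwrite("panorama.jpg", pano)
```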
3D Modeling of landmarks
Accurate 3D Object Prototyping
Scanning Michelangelo’s “The David”
• The Digital Michelangelo Project: http://graphics.stanford.edu/projects/mich/
• 2 billion polygons, accuracy to 0.29 mm
Augmented Reality
• Magic Leap
• Daqri
• Meta
• etc.
CS 231A course overview
1. Space/Geometry
   Estimating spatial properties of objects and scenes from images through geometric methods
2. Semantics/Learning
   Estimating semantic and dynamic properties of scene elements from images through learning methods
Representations and
Representation Learning
Example from Advances in Computer Vision – MIT – 6.869/6.819
Feature Tracking and Flow
J. J. Gibson, The Ecological Approach to Visual Perception
Lucas-Kanade feature tracking over multiple frames. Picture adapted from the OpenCV webpage.
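A minimal version of the tracker pictured above, using OpenCV's pyramidal Lucas-Kanade implementation; the video filename is a placeholder.

```python
import cv2

# Placeholder clip; any short, textured video works.
cap = cv2.VideoCapture("clip.mp4")
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

# Pick Shi-Tomasi corners, then follow them frame to frame.
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                              qualityLevel=0.01, minDistance=7)

while True:
    ok, frame = cap.read()
    if not ok or pts is None or len(pts) == 0:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    new_pts, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    pts = new_pts[status.ravel() == 1].reshape(-1, 1, 2)   # keep tracked points
    print("tracking", len(pts), "points")
    prev_gray = gray
cap.release()
```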
Object Pose Estimation and Tracking
Wang et al., "DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion", CVPR
Manuel Wüthrich et al., "Probabilistic Object Tracking using a Depth Camera", IROS
SLAM and Localization
Accumulated registered point cloud from lidar SLAM.
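A real lidar SLAM pipeline is far more involved, but its core registration step can be sketched compactly: alternate nearest-neighbor matching with a closed-form rigid alignment (ICP). A toy numpy/scipy version on synthetic clouds:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_step(src, dst):
    """One ICP iteration: match each source point to its nearest
    destination point, then solve the best rigid fit (Kabsch)."""
    matched = dst[cKDTree(dst).query(src)[1]]
    mu_s, mu_d = src.mean(0), matched.mean(0)
    U, _, Vt = np.linalg.svd((src - mu_s).T @ (matched - mu_d))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflections
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return src @ R.T + t

# Synthetic demo: a cloud and a slightly rotated, shifted copy of it.
rng = np.random.default_rng(0)
dst = rng.uniform(-1, 1, size=(500, 3))
a = 0.1
Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
               [np.sin(a),  np.cos(a), 0.0],
               [0.0, 0.0, 1.0]])
src = dst @ Rz.T + np.array([0.05, -0.02, 0.01])

for _ in range(20):                 # iterate matching + alignment
    src = icp_step(src, dst)
print("mean residual:", np.linalg.norm(src - dst, axis=1).mean())
```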
Autonomous navigation and safety
Mobileye: vision systems in high-end BMW, GM, and Volvo models
But also Toyota, Google, Apple, Tesla, Nissan, Ford, etc.
Source: A. Shashua, S. Seitz
Personal robotics
More Applications
• Assistive technologies
• Surveillance
• Factory inspection
• Exploration and remote operations
Syllabus

Geometry (January):
1. Introduction
2. Camera models
3. Camera calibration
4. Single view metrology
5. Epipolar geometry
6. Multi-view geometry
7. Structure from motion / SLAM (project proposal due)
8. Volumetric stereo
9. Fitting and matching (midterm)
Learning (February):
10. Low-level representations
11. Depth estimation, low-level tracking
12. Optical and scene flow
13. 6D pose estimation, object tracking
14. Object tracking continued
15. SLAM
March:
16. Guest lecture
Final projects: project presentations
Prerequisites
• This course requires knowledge of linear algebra,
probability, statistics, machine learning and computer
vision, as well as decent programming skills.
• Though not an absolute requirement, we encourage you to have taken CS221, CS229, or CS131A, or to have equivalent knowledge.
• We will leverage concepts from low-level image processing (CS131A), e.g., linear filters, edge detectors, corner detectors, and machine learning (CS229), e.g., SVMs, basic Bayesian inference, clustering, neural networks, which we won't cover in this class.
• We will provide links to background material related to
CS131A and CS229 (or discuss during TA sessions) so
students can refresh or study those topics if needed.
Textbooks
Required:
‐ [FP] D. A. Forsyth and J. Ponce. Computer Vision: A Modern Approach (2nd Edition). Prentice Hall, 2011.
‐ [HZ] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision (2nd Edition). Cambridge University Press, 2004.
Recommended:
‐ R. Szeliski. Computer Vision: Algorithms and Applications. Springer, 2011.
‐ D. Hoiem and S. Savarese. Representations and Techniques for 3D Object Recognition and Scene Interpretation. Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool Publishers, 2011.
‐ G. Bradski and A. Kaehler. Learning OpenCV. O'Reilly Media, 2008.
Course assignments
• 1 warm-up problem set (HW0)
• 4 problem sets
• 1 midterm exam
• 1 project
• See the class schedule for release and due dates.
• Problems will be released through the schedule page and must
be submitted through Gradescope.
Midterm Exam
• The exam will be on 02/22. It will be released on
Canvas and be available for 48 hours. You will have 2
hours to complete it once you start.
• More details (material covered, review sessions, etc.) will be announced as we approach the midterm.
Course Projects
• Replicate an interesting paper
• Compare different methods on a test bed
• Develop a new approach to an existing problem
• Conduct original research
• Write a 10-page paper summarizing your results
• Release the final code
• Give a final in-class presentation
  - SCPD students can send videos instead
• We will introduce projects in 1-2 weeks
• Important dates: see the class schedule
Course Projects
• Form your team:
  - 1-3 people
  - The larger the team, the more work we expect from it
  - Be nice to your partners: let them know if you plan to drop the course
• Evaluation:
  - Quality of the project (including writing)
  - Final in-class project presentation (~TBA-minute spotlight presentations)
Grading policy
• Homeworks: 37%
  - HW0: 1%
  - HW1, HW2, HW3, HW4: 9% each
• Midterm exam: 20%
• Course project: 38%
  - Project proposal: 1%
  - Midterm progress report: 5%
  - Final report: 25%
  - Presentation: 7%
• Attendance and class participation: 5%
  - Questions, answers, remarks, Piazza posts, …
  - Class participation is waived for SCPD students. For the project presentation, SCPD students can send videos instead.
Grading policy (HWs)
– 25% will be deducted per day late. For example, a HW scoring 80 that is submitted two days late (with no bonus) receives 80 x 0.5 = 40.
– Two one-time 48-hour late-submission "bonuses" are available: each lets you submit one HW up to 48 hours late without penalty. Once both bonuses are used, the standard late-submission policy applies.
– No exceptions will be made.
Grading policy (project)
– 1 day late: 25% off the project grade
– 2 days late: 50% off the project grade
– More than 2 days late: zero credit
– No "late submission bonus" may be applied to the progress report or the final project report
CS231A
Introduction to Computer Vision
Next lecture: Camera systems