Computer Vision: From 3D Reconstruction to Recognition

This document provides an overview of the CS231A Computer Vision course at Stanford University. The course covers two main areas: 1) Space/Geometry, which involves estimating spatial properties of objects and scenes from images using geometric methods, and 2) Semantics/Learning, which involves estimating semantic and dynamic properties through learning methods. The syllabus outlines 16 lectures covering topics like camera models, calibration, 3D reconstruction, tracking, SLAM, representation learning, and applications in areas like robotics, augmented reality, and autonomous systems. Prerequisites include knowledge of linear algebra, probability, statistics, machine learning and basic computer vision and programming skills.

CS231A

Computer Vision:
From 3D Reconstruction 
to Recognition
Class Time
M‐W; 11:30—12:50PM

Silvio Savarese & Jeanette Bohg Lecture 1 11-Jan-21


CS231A
• Instructor: Silvio Savarese
o ssilvio@stanford.edu
o Office: Gates Building, room 154
o Office hour: Friday 2-3pm or by appointment
• Instructor: Jeanette Bohg
o bohg@stanford.edu
o Office: Gates Building, room 140
o Office hour: Friday 1-2pm or by appointment

CAs:
- Andrey Kurenkov, Kuan Fang, Brent Yi, Krishnan
Srinivasan



Lecture 1
Introduction

• An introduction to computer vision
• Course overview



AI is a propelling force of 
today’s technology 

Smart Agriculture

Courtesy of Agriculture Corner

Courtesy of D. Rubin, Stanford

Courtesy of Amazon.com
Health care

Courtesy of D. Rubin, Stanford
Retail

From Imagining the Retail Store of the Future ‐ The New York Times, April 12, 2017
Manufacturing

Courtesy of ITRI, 2017
Transportation and Logistics
Construction Management
Why is this acceleration 
happening now?

Enabling factors
• Big data
– ImageNet, 2009
– ShapeNet, 2015
• Faster hardware
• New algorithms
– Representation learning
– Neural networks
– Inject learning into deterministic reasoning

Computer vision

[Figure: a sensing device feeds a computational device, which performs information extraction and interpretation]

1. Information extraction: features, 3D structure, motion 
flows, etc…
2. Interpretation: recognize objects, scenes, actions, 
events
Major areas in Computer Vision

Space/Geometry
• Object shape recovery
• Depth estimation
• 3D scene reconstruction

Semantics/Learning
• Object detection and pose estimation
• Object tracking
• Scene understanding

Recovering 3D models of the environments 

Armeni et al. 2016
Courtesy of Luminar
This is critical for autonomous 
driving or navigation!

Detecting and tracking objects in the environment
[Figure: scene with detections labeled building, pedestrian, car, car]
3D Scene Parsing
Held, Thrun, Savarese, 2016‐206

CS 231A course overview

1. Space/Geometry
Estimating spatial properties of objects and scenes
from images through geometric methods

2. Semantics/Learning
Estimating semantic and dynamic properties of 
scene elements from images through learning 
methods
Camera systems
Establish a mapping from 3D to 2D
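As a minimal sketch of such a 3D-to-2D mapping, the snippet below projects a point given in camera coordinates onto the image plane with an ideal pinhole model. The function name `project_pinhole`, the focal length `f` and the principal point `(cx, cy)` are illustrative assumptions, not notation from the slides; lens distortion and camera pose are ignored.

```python
def project_pinhole(point_3d, f, cx=0.0, cy=0.0):
    """Project a 3D point (X, Y, Z) in camera coordinates to pixel (u, v).

    Assumes an ideal pinhole camera: focal length f in pixels,
    principal point (cx, cy), no skew and no lens distortion.
    """
    X, Y, Z = point_3d
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    # Perspective division: image coordinates scale with 1/Z.
    u = f * X / Z + cx
    v = f * Y / Z + cy
    return u, v


# The same (X, Y) offset seen from twice the depth lands closer
# to the principal point (320, 240).
near = project_pinhole((1.0, 0.5, 2.0), f=500.0, cx=320.0, cy=240.0)
far = project_pinhole((1.0, 0.5, 4.0), f=500.0, cx=320.0, cy=240.0)
print(near, far)
```

The division by Z is what makes the mapping lose depth: every point along the same viewing ray lands on the same pixel.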
How to calibrate a camera
Estimate camera parameters such as pose or focal length
Single view metrology
Estimate 3D properties of the world from a single image

Multiple view geometry
Estimate 3D properties of the world from multiple views

Epipolar geometry
Structure from motion
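One of the simplest multi-view cases is a rectified stereo pair, where depth follows from the horizontal shift (disparity) of a point between the two images. The sketch below assumes two identical, horizontally aligned pinhole cameras; the function name and the example numbers are hypothetical and not taken from the slides.

```python
def depth_from_disparity(x_left, x_right, focal_px, baseline_m):
    """Depth (in metres) of a point seen in a rectified stereo pair.

    x_left, x_right: horizontal pixel coordinates of the same point
    in the left and right images.
    focal_px: focal length in pixels, baseline_m: camera separation in metres.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    # Similar triangles: Z = f * B / d.
    return focal_px * baseline_m / disparity


# 700 * 0.5 / 35 = 10.0 metres
z = depth_from_disparity(400.0, 365.0, focal_px=700.0, baseline_m=0.5)
print(z)
```

Because depth is inversely proportional to disparity, distant points have small disparities and their depth estimates are more sensitive to matching errors.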

Courtesy of Oxford Visual Geometry Group


Panoramic Photography

kolor
3D Modeling of landmarks

Accurate 3D Object Prototyping

Scanning Michelangelo’s “The David”
• The Digital Michelangelo Project
‐ http://graphics.stanford.edu/projects/mich/
• 2 billion polygons, accurate to 0.29 mm
Augmented Reality

• Magic Leap
• Daqri
• Meta
• Etc.

Representations and 
Representation Learning

Example from Advances in Computer Vision – MIT – 6.869/6.819


Feature Tracking and Flow

J. J. Gibson, The Ecological Approach to Visual Perception.

Lucas‐Kanade feature tracking over multiple frames. Picture adopted from OpenCV webpage.
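To make the idea behind Lucas-Kanade tracking concrete, here is a minimal single-step sketch: it estimates one translation for a small patch by least squares, under the brightness-constancy and small-motion assumptions. It is not the implementation behind the figure (practical trackers iterate and use image pyramids); the function name and the synthetic test patch are hypothetical.

```python
def lucas_kanade_step(prev, curr):
    """Estimate one translation (dx, dy) between two small grayscale patches.

    prev and curr are equally sized 2D lists of floats (rows of pixels).
    A single least-squares step over the patch interior; no pyramid, no iteration.
    """
    rows, cols = len(prev), len(prev[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Central differences for the spatial gradient of the first frame.
            ix = (prev[r][c + 1] - prev[r][c - 1]) / 2.0
            iy = (prev[r + 1][c] - prev[r - 1][c]) / 2.0
            # Temporal derivative between the two frames.
            it = curr[r][c] - prev[r][c]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("patch has too little texture (aperture problem)")
    # Solve [[sxx, sxy], [sxy, syy]] [dx, dy]^T = -[sxt, syt]^T by Cramer's rule.
    dx = (-sxt * syy + syt * sxy) / det
    dy = (-syt * sxx + sxt * sxy) / det
    return dx, dy


# Synthetic check: intensity I(x, y) = x * y, shifted by 0.25 pixels along x.
prev = [[float(x * y) for x in range(5)] for y in range(5)]
curr = [[(x - 0.25) * y for x in range(5)] for y in range(5)]
dx, dy = lucas_kanade_step(prev, curr)
print(dx, dy)
```

The determinant check reflects why trackers prefer corner-like features: a patch whose gradients all point the same way gives a singular system and the motion cannot be recovered.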
Object Pose Estimation and Tracking

Wang et al., DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion, CVPR.
Manuel Wüthrich et al., Probabilistic Object Tracking using a Depth Camera, IROS.
SLAM and Localization

Accumulated registered point cloud from lidar SLAM.
Autonomous navigation and 
safety

Mobileye: vision systems in high‐end BMW, GM, Volvo models
But also Toyota, Google, Apple, Tesla, Nissan, Ford, etc.

Source: A. Shashua, S. Seitz


Personal robotics

More Applications

• Assistive technologies
• Surveillance
• Factory inspection
• Exploration and remote operations
Syllabus
(January to March)

Lecture  Topic
1        Introduction

Geometry
2        Camera models
3        Camera calibration
4        Single view metrology
5        Epipolar geometry
6        Multi‐view geometry
7        Structure from motion / SLAM
         Proposal due
8        Volumetric stereo
9        Fitting and Matching
         Mid term

Learning
10       Low Level Representations
11       Depth Estimation, Low Level Tracking
12       Optical and Scene Flow
13       6D Pose Estimation, Object Tracking
14       Object Tracking Continued
15       SLAM
16       Guest

Project presentations
Final projects


Prerequisites
• This course requires knowledge of linear algebra, 
probability, statistics, machine learning and computer 
vision, as well as decent programming skills. 
• Though not an absolute requirement, it is encouraged and 
preferred that you have at least taken either CS221 or 
CS229 or CS131A or have equivalent knowledge. 
• We will leverage concepts from low‐level image processing 
(CS131A) (e.g., linear filters, edge detectors, corner 
detectors, etc…) and machine learning (CS229) (e.g., SVM, 
basic Bayesian inference, clustering, neural networks, 
etc…) which we won’t cover in this class. 
• We will provide links to background material related to 
CS131A  and CS229 (or discuss during TA sessions) so 
students can refresh or study those topics if needed.
Textbooks

Required:
‐ [FP] D. A. Forsyth and J. Ponce. Computer Vision: A Modern Approach (2nd 
Edition). Prentice Hall, 2011.
‐ [HZ] R. Hartley and A. Zisserman. Multiple View Geometry in Computer 
Vision. Cambridge University Press, 2003.

Recommended:
‐ R. Szeliski. Computer Vision: Algorithms and Applications. Springer, 2011.
‐ D. Hoiem and S. Savarese. Representations and Techniques for 3D Object 
Recognition and Scene Interpretation. Synthesis Lectures on Artificial 
Intelligence and Machine Learning, Morgan & Claypool Publishers, 2011.
‐ G. Bradski and A. Kaehler. Learning OpenCV. O'Reilly Media, 2008.
Course assignments
• 1 warm up problem set (HW‐0)
• 4 problem sets 
• 1 mid‐term exam
• 1 project 

• Look up class schedule for release and due dates.
• Problems will be released through the schedule page and must 
be submitted through Gradescope. 
Midterm Exam
• The exam will be on 02/22. It will be released on 
Canvas and be available for 48 hours. You will have 2 
hours to complete it once you start.

• You will be updated with more details, e.g., material 
to be covered, review sessions etc., as we approach 
the midterm.

Course Projects
• Replicate an interesting paper
• Compare different methods on a test bed
• A new approach to an existing problem
• Original research

• Write a 10‐page paper summarizing your results
• Release the final code
• Give a final in‐class presentation 
• SCPD students can send videos instead.

• We will introduce projects in 1‐2 weeks
• Important dates: look up class schedule
Course Projects
• Form your team:
– 1‐3 people
– The larger the team, the more work we expect 
from the team
– Be nice to your partner: do you plan to drop the 
course?
• Evaluation
– Quality of the project (including writing)
– Final project in‐class presentation (~ TBA minutes 
spotlight presentations)
Grading policy
• Homeworks: 37%
– 1% for HW0
– 9% for HW1, HW2, HW3, HW4 (each) 

• Mid term exam: 20%
• Course project: 38%
– Project proposal 1% 
– mid term progress report 5%
– final report 25% 
– presentation 7%

• Attendance and class participation: 5%
– Questions, answers, remarks, piazza posts,…
– Class participation is waived for SCPD students. For the project 
presentation, SCPD students can send videos instead. 
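The weights above add up to 100%, which the short sketch below checks before combining component scores into a course total. The dictionary keys and the `course_total` helper are hypothetical names, and scoring each component as a fraction between 0 and 1 is an assumption, not part of the stated policy.

```python
# Weights in percent, copied from the grading policy above.
WEIGHTS = {
    "hw0": 1, "hw1": 9, "hw2": 9, "hw3": 9, "hw4": 9,   # homeworks: 37
    "midterm": 20,
    "proposal": 1, "progress_report": 5,                # project: 38
    "final_report": 25, "presentation": 7,
    "participation": 5,
}

assert sum(WEIGHTS.values()) == 100


def course_total(scores):
    """Weighted total in percent; scores maps each component to a fraction in [0, 1]."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Example: full marks everywhere except half credit on the midterm.
scores = {name: 1.0 for name in WEIGHTS}
scores["midterm"] = 0.5
total = course_total(scores)
print(total)
```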
Grading policy (HWs)

– 25% will be deducted per day late. 
– Two one‐time 48‐hour late submission 
“bonuses” are available; each bonus lets you 
submit one HW up to 48 hours late. This is a 
one‐time deal: after you use both bonuses, 
you must adhere to the standard late 
submission policy. 
– No exceptions will be made.
Grading policy (project)

– If 1 day late, 25% off the grade for the 
project
– If 2 days late, 50% off the grade for the 
project
– Zero credits if more than 2 days
– No "late submission bonus" is allowed when 
submitting your progress report or project 
report
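The project late policy can be read as a simple lookup, sketched below. The function name is hypothetical, and counting lateness in whole days is an assumption; the slide does not say how partial days are rounded.

```python
def project_late_multiplier(days_late):
    """Fraction of the project grade kept after a late submission.

    days_late is a whole number of days (assumption: partial days are
    already rounded by whatever rule the course staff applies).
    """
    if days_late <= 0:
        return 1.0   # on time
    if days_late == 1:
        return 0.75  # 25% off
    if days_late == 2:
        return 0.50  # 50% off
    return 0.0       # more than 2 days late: zero credit


for d in range(4):
    print(d, project_late_multiplier(d))
```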
CS231
Introduction to 
Computer Vision

Next lecture: Camera systems
