SYMBIOSIS INTERNATIONAL (DEEMED UNIVERSITY)
(Established under section 3 of the UGC Act 1956, by notification No. F.9-12/2001-U3, Government of India)
Re-accredited by NAAC with ‘A’ Grade
Founder: Prof. Dr. S. B. Mujumdar, M.Sc., Ph.D. (Awarded Padma Bhushan and Padma Shri by President of India)
___________________________________________________________________________
Sub Committee - Specialization for Curriculum Development
Post Graduate/ Under Graduate
Course Title: Introduction to Data Science
Number of Credits : 3
Level : 3
Learning Objective/Outcome(s):
1. Describe the major concepts in data science.
2. Identify probability distributions commonly used as foundations for statistical modeling. Use R to carry out basic statistical modeling and analysis.
3. Explain the significance of exploratory data analysis (EDA) in data science. Apply basic tools (plots, graphs, summary statistics) to carry out EDA.
4. Apply basic machine learning algorithms for predictive modeling. Identify common approaches used for feature generation and basic feature selection algorithms.
5. Create effective visualizations of given data.
6. Discuss ethical and privacy issues in the conduct of data science.
Pre-learning:
Students are expected to have basic knowledge of algorithms, reasonable programming
experience, and some familiarity with basic linear algebra, probability, and statistics.
Course Outline
Sr.No. Topics Hours
1. Introduction: What is Data Science? 4
- Big Data and Data Science hype - and getting past the hype
- Why now? - Datafication
- Current landscape of perspectives
- Skill sets needed
2. Statistical Inference 8
- Populations and samples
- Statistical modeling, probability distributions, fitting a model
- Intro to R
3. Exploratory Data Analysis and the Data Science Process 8
- Basic tools (plots, graphs and summary statistics) of EDA
- Philosophy of EDA
- The Data Science Process
- Case Study: RealDirect (online real estate firm)
4. Three Basic Machine Learning Algorithms 10
- Linear Regression
- k-Nearest Neighbors (k-NN)
- k-means
Feature Generation and Feature Selection (Extracting Meaning From Data)
- Motivating application: user (customer) retention
- Feature Generation (brainstorming, role of domain expertise, and place for imagination)
- Feature Selection algorithms
  - Filters; Wrappers; Decision Trees; Random Forests
5. Mining Social-Network Graphs 10
- Social networks as graphs
- Clustering of graphs
- Direct discovery of communities in graphs
- Partitioning of graphs
- Neighborhood properties in graphs
Data Visualization
- Basic principles, ideas and tools for data visualization
6. Data Science and Ethical Issues 5
- Discussions on privacy, security, ethics
- A look back at Data Science
- Next-generation data scientists
Pedagogy
1. 30% of the sessions are classroom teaching.
2. 50% of the content is to be delivered through hands-on sessions.
3. Some of the advanced topics will be covered through self-learning.
Books Recommended :
Cathy O'Neil and Rachel Schutt. Doing Data Science: Straight Talk from the Frontline. O'Reilly. 2014.
Suggested Assessment/ Evaluation Methods
A) Continuous Assessment
1. Essential
a) Unit tests
b) Seminars
c) Assignments
2. Optional
a) Quiz/MCQ
b) Mini Project
B) End Semester Examination
a) Written Exam
Benchmarked against similar courses in other national/international universities/organizations
S. No.    Name of the Course    Name of University where it is offered
Name of Members    Designation    Org. / Inst.    Signature
Name of Experts    Designation    Org. / Inst.    Signature
Signature of Dean:
Date: