P.E.S. College of Engineering, Mandya
Department of Information Science & Engineering
DATA SCIENCE
[As per Choice Based Credit System (CBCS) & OBE Scheme]
SEMESTER – VII
Course Code: P21IS704
Credits: 04
Teaching Hours/Week (L:T:P): 3:0:2
Total Number of Teaching Hours: 40
Total Laboratory Hours: 24
CIE Marks: 50
SEE Marks: 50
Course Learning Objectives: This course will enable the students to:
Describe the fundamentals of Data Science.
Carry out EDA on a given dataset.
Apply basic machine learning algorithms to a given dataset using R, taking ethical issues into account.
UNIT – I Introduction to Data Science 08 Hours
What is Data Science? Big Data and Data Science hype - and getting past the hype, Why now? –
Datafication, Current landscape of perspectives, Skill sets needed.
Statistical Inference - Populations and samples, Statistical modeling, probability distributions, fitting
a model.
Self-study component: Intro to R.
Practical Topics (06 Hours): Programs to implement the following statistical tests:
i) Correlation test between two variables
ii) Correlation Matrix between multiple variables
iii) Comparing the means of two groups
iv) Comparing the means of more than two groups
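As an illustration of practical topic i), the following is a minimal sketch of a correlation test between two variables. It is written in Python for illustration only; the course itself specifies R, where the built-in `cor.test` function serves this purpose. The function names and the sample data (`hours`, `marks`) are hypothetical and not part of the syllabus.

```python
import math


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("samples must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t_statistic(r, n):
    """t statistic for H0: no linear correlation (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical data: study hours and marks for five students.
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 70, 72]

r = pearson_r(hours, marks)
t = correlation_t_statistic(r, len(hours))
print(f"r = {r:.3f}, t = {t:.2f} on {len(hours) - 2} degrees of freedom")
```

The t statistic would then be compared against a t distribution with n - 2 degrees of freedom to obtain a p-value; that lookup is left out of this sketch.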
UNIT – II Data Science Process 08 Hours
Basic tools (plots, graphs and summary statistics) of EDA, Philosophy of EDA, The Data Science
Process, Case Study: RealDirect (online real estate firm).
Three Basic Machine Learning Algorithms - Linear Regression, k-Nearest Neighbors (k-NN), k-means.
Self-study component: Exercise: Basic Machine Learning Algorithms.
Practical Topics (06 Hours):
i) Program to perform data exploration and pre-processing on a given dataset.
ii) Program to implement linear regression for a given dataset.
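The linear regression practical can be sketched as an ordinary least-squares fit of a straight line. The example below is in Python for illustration only (the course specifies R, where `lm` is the usual tool); the function name `fit_line` and the data are hypothetical.

```python
def fit_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]

intercept, slope = fit_line(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f} * x")
```

The closed-form slope is the ratio of the covariance of x and y to the variance of x, and the intercept forces the line through the point of means.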
UNIT – III Applications of Machine Learning Algorithms 08 Hours
Motivating application: Filtering Spam, Why Linear Regression and k-NN are poor choices for
Filtering Spam, Naive Bayes and why it works for Filtering Spam.
Feature Generation and Feature Selection (Extracting Meaning From Data) - Motivating application:
user (customer) retention, Feature Generation (brainstorming, role of domain expertise, and place for
imagination), Feature Selection algorithms, Filters; Wrappers.
Self-study component: Data Wrangling: APIs and other tools for scraping the Web.
Practical Topics (04 Hours):
i) Program to implement Multiple Linear regression for a given dataset.
ii) Program to implement K-NN algorithm on a given dataset.
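A minimal sketch of the k-NN practical is given below, assuming Euclidean distance and a simple majority vote among the k closest training points. It is in Python for illustration only (the course specifies R); the function name `knn_predict` and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional data with two well-separated classes.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (2, 2)))
print(knn_predict(points, labels, (8.5, 8.5)))
```

In practice the features are usually scaled before distances are computed, and k is chosen by validation; both steps are omitted here.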
UNIT – IV Recommendation Systems and Mining Social-Network Graphs 08 Hours
Building a User-Facing Data Product – Algorithmic ingredients of a Recommendation Engine,
Dimensionality Reduction, Singular Value Decomposition, Principal Component Analysis.
Mining Social-Network Graphs – Social networks as graphs, Clustering of graphs, Direct discovery
of communities in graphs, Partitioning of graphs.
Self-study component: Neighborhood properties in graphs.
Practical Topics (04 Hours): Build a recommendation system using:
i) Item-Based Collaborative Filtering
ii) User-Based Collaborative Filtering
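Item-based collaborative filtering can be sketched as follows, assuming cosine similarity between items (computed over the users who rated both) and a similarity-weighted average for the prediction. This is one common formulation and not necessarily the one used in the course. The code is in Python for illustration only (the course specifies R); all names and ratings are hypothetical.

```python
import math


def cosine_similarity(ratings, item_a, item_b):
    """Cosine similarity of two items over the users who rated both."""
    users = [u for u, r in ratings.items() if item_a in r and item_b in r]
    if not users:
        return 0.0
    va = [ratings[u][item_a] for u in users]
    vb = [ratings[u][item_b] for u in users]
    dot = sum(a * b for a, b in zip(va, vb))
    norm = math.sqrt(sum(a * a for a in va)) * math.sqrt(sum(b * b for b in vb))
    return dot / norm if norm else 0.0


def predict_rating(ratings, user, target_item):
    """Predict a rating as the similarity-weighted mean of the user's own ratings."""
    numerator = 0.0
    denominator = 0.0
    for item, rating in ratings[user].items():
        if item == target_item:
            continue
        sim = cosine_similarity(ratings, target_item, item)
        numerator += sim * rating
        denominator += abs(sim)
    return numerator / denominator if denominator else None


# Hypothetical user -> {item: rating} data; "dev" has not rated item "B".
ratings = {
    "asha": {"A": 5, "B": 4, "C": 1},
    "bala": {"A": 4, "B": 5, "C": 2},
    "chitra": {"A": 1, "B": 2, "C": 5},
    "dev": {"A": 5, "C": 1},
}

predicted = predict_rating(ratings, "dev", "B")
print(f"predicted rating of B for dev: {predicted:.2f}")
```

User-based collaborative filtering follows the same pattern with the roles swapped: similarities are computed between users, and the prediction averages the ratings that similar users gave to the target item.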
UNIT – V Data Visualization 08 Hours
Basic principles, ideas and tools for data visualization, Examples of inspiring (industry) projects,
Exercise: create your own visualization of a complex dataset.
Data Science and Ethical Issues – Discussions on privacy, security, ethics, A look back at Data
Science.
Self-study component: Next-generation data scientists.
Practical Topics (04 Hours): The United States has resettled more than 600,000 refugees from 60
different countries since 2006. Download the Department of Homeland
Security’s annual count of people granted refugee status between 2006 and
2015. Use ggplot, Illustrator, Inkscape, or Gravit Designer to explore
where these refugees have come from, handling any Personally
Identifiable Information (PII) appropriately.
Course Outcomes: On completion of this course, students are able to:
COs | Course Outcomes with Action verbs for the Course topics | Bloom’s Taxonomy Level | Level Indicator
CO1 | Explain data science process and statistical inference. | Understand | L2
CO2 | Illustrate EDA and feature engineering. | Apply | L3
CO3 | Identify basic machine learning algorithms to use in applications. | Apply | L3
CO4 | Illustrate mining social-network graphs. | Apply | L3
CO5 | Create effective visualization of a given data (to communicate or persuade ethically). | Apply | L3
Text Book(s):
1. Cathy O’Neil and Rachel Schutt. Doing Data Science, Straight Talk From The Frontline.
O’Reilly. 2014.
Reference Book(s):
1. Jure Leskovec, Anand Rajaraman and Jeffrey Ullman. Mining of Massive Datasets. V2.1,
Cambridge University Press. 2014.
2. Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. ISBN 0262018020. 2013.
3. Foster Provost and Tom Fawcett. Data Science for Business: What You Need to Know about
Data Mining and Data-analytic Thinking. ISBN 1449361323. 2013.
Web and Video link(s):
1. https://infyspringboard.onwingspan.com/web/en/app/toc/lex_auth_0130944380316303362566_shared/overview
E-Books/Resources:
• https://sites.google.com/view/brameshsm
P21 Scheme - VII & VIII Semester Syllabus Page | 29