Christ MSC Data Science
Syllabus for
Master of Science (Data Science)
Academic Year (2021)
1 Semester - 2021 - Batch
Course Code | Course | Type | Hours Per Week | Credits | Marks
MDS131 | MATHEMATICAL FOUNDATION FOR DATA SCIENCE - I | Core Courses | 4 | 4 | 100
MDS132 | PROBABILITY AND DISTRIBUTION THEORY | Core Courses | 4 | 4 | 100
MDS133 | PRINCIPLES OF DATA SCIENCE | Core Courses | 4 | 4 | 100
MDS134 | RESEARCH METHODOLOGY | Core Courses | 2 | 2 | 50
MDS161A | INTRODUCTION TO STATISTICS | Generic Elective | 2 | 2 | 50
MDS161B | INTRODUCTION TO COMPUTERS AND PROGRAMMING | Generic Elective | 2 | 2 | 50
MDS161C | LINUX ADMINISTRATION | Generic Elective | 2 | 2 | 50
MDS171 | DATA BASE TECHNOLOGIES | Core Courses | 6 | 5 | 150
MDS172 | INFERENTIAL STATISTICS | Core Courses | 6 | 5 | 150
MDS173 | PROGRAMMING FOR DATA SCIENCE IN PYTHON | Core Courses | 6 | 4 | 100
2 Semester - 2021 - Batch
Course Code | Course | Type | Hours Per Week | Credits | Marks
MDS231 | MATHEMATICAL FOUNDATION FOR DATA SCIENCE - II | - | 4 | 4 | 100
MDS232 | REGRESSION ANALYSIS | - | 4 | 4 | 100
MDS241A | MULTIVARIATE ANALYSIS | - | 4 | 4 | 100
MDS241B | STOCHASTIC PROCESS | - | 4 | 4 | 100
Department Overview:
Department of Computer Science of CHRIST (Deemed to be
University) strives to shape outstanding computer professionals with
https://christuniversity.in/School of Sciences/COMPUTER SCIENCE/MSc in Data Science/syllabus/480/2021 2/100
08/01/2022, 01:21 https://christuniversity.in/School of Sciences/COMPUTER SCIENCE/MSc in Data Science/syllabus/480/2021
Mission Statement:
Vision The Department of Computer Science endeavours to imbibe
the vision of the University "Excellence and Service". The
department is committed to this philosophy which pervades every
aspect and functioning of the department.
Introduction to Program:
Data Science is widely used across academia, business sectors, and
research and development to make effective decisions in day-to-day
activities. MSc in Data Science is a two-year programme with four
semesters. The programme aims to give all candidates the opportunity
to master the skill sets specific to data science with a research
bent. The curriculum supports students in obtaining adequate
knowledge of the theory of data science along with hands-on
experience in relevant domains and tools. Candidates gain exposure
to research models and industry-standard applications in data science
through guest lectures, seminars, projects, internships, etc.
Program Objective:
Programme Objective
Programme Outcome
PO3 Enhance research culture and uphold the scientific integrity and
objectivity
Assessment Pattern
CIA - 50%
ESE - 50%
Program Objective:
Programme Objective
- To acquire in-depth understanding of the theoretical concepts in statistics, data
  analysis, data mining, machine learning and other advanced data science techniques.
- To gain practical experience in programming tools for data sciences, database
  systems, machine learning and big data tools.
- To strengthen the analytical and problem solving skill through developing real time
  applications.
- To empower students with tools and techniques for handling, managing, analyzing
  and interpreting data.
- To imbibe quality research and develop solutions to the social issues.
Programme Specific Outcomes
PSO1: Abstract thinking: Ability to understand the abstract concepts that lead to
various data science theories in Mathematics, Statistics and Computer science.
PSO2: Problem Analysis and Design: Ability to identify, analyze and design solutions
for data science problems using fundamental principles of mathematics, Statistics,
computing sciences, and relevant domain disciplines.
PSO3: Modern software tool usage: Acquire the skills in handling data science
programming tools towards problem solving and solution analysis for domain specific pr
Assessment Pattern
50-50
Definition of Linear Maps - Algebraic Operations on L(V,W) -
Null spaces and Injectivity - Range and Surjectivity -
Fundamental Theorems of Linear Maps - Representing
a Linear Map by a Matrix - Invertible Linear Maps -
Isomorphic Vector spaces - Linear Map as Matrix
Multiplication - Operators - Products of Vector Spaces -
Product of Direct Sum - Quotients of Vector spaces.
Unit-3 Teaching Hours:12
EIGENVALUES, EIGENVECTORS, AND
INNER PRODUCT SPACES
Eigenvalues and Eigenvectors - Eigenvectors and Upper
Triangular matrices - Eigenspaces and Diagonal Matrices -
Inner Products and Norms - Linear functionals on Inner
Product spaces.
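
As an illustration of the eigenvalue and eigenvector topics in this unit, the following is a minimal sketch of power iteration in plain Python. It is not taken from the syllabus; the function names, the starting vector, and the example matrix are assumptions chosen for the example, and the method is only expected to converge when the matrix has a single dominant eigenvalue and the starting vector is not orthogonal to its eigenvector.

```python
import math


def mat_vec(A, x):
    """Multiply a square matrix A (a list of rows) by a vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def power_iteration(A, iterations=200):
    """Approximate the dominant eigenvalue and a unit eigenvector of A."""
    # Hypothetical starting vector: the first standard basis vector.
    x = [1.0] + [0.0] * (len(A) - 1)
    for _ in range(iterations):
        y = mat_vec(A, x)
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]  # renormalise to unit length each step
    # Rayleigh quotient x^T A x (x already has unit length).
    eigenvalue = sum(a * b for a, b in zip(x, mat_vec(A, x)))
    return eigenvalue, x


# Illustrative symmetric matrix with eigenvalues 3 and 1.
A = [[2.0, 1.0], [1.0, 2.0]]
eigenvalue, eigenvector = power_iteration(A)
```

For this matrix the estimate approaches the eigenvalue 3, with an eigenvector proportional to (1, 1).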
Unit-4 Teaching Hours:12
BASIC MATRIX METHODS FOR
APPLICATIONS
Matrix Norms – Least squares problem – Singular value
decomposition – Householder Transformation and QR
decomposition – Non-Negative Matrix Factorization –
bidiagonalization.
Unit-5 Teaching Hours:12
MATHEMATICS APPLIED TO DATA
SCIENCE
Handwritten digits recognition using simple algorithm -
Classification of handwritten digits using SVD bases and Tangent
distance - Text Mining using Latent semantic index, Clustering,
Non-negative Matrix Factorization and LGK bidiagonalization.
Text Books And Reference Books:
CIA - 50%
ESE - 50%
MDS131L - MATHEMATICAL FOUNDATION
FOR DATA SCIENCE I (2021 Batch)
Total Teaching Hours for Semester:60 No of Lecture Hours/Week:4
Max Marks:100 Credits:4
Course Objectives/Course Description
Linear Algebra plays a fundamental role in the theory of Data Science. This course aims at
introducing the basic notions of vector spaces, Linear Algebra and the use of Linear Algebra
in applications to Data Science
Learning Outcome
Understand the properties of Vector spaces
Use the properties of Linear Maps in solving problems on Linear Algebra
Demonstrate proficiency on the topics Eigenvalues, Eigenvectors and Inner Product
Spaces
Apply mathematics for some applications in Data Science
2. Eldén Lars, Matrix methods in data mining and pattern recognition, Society for Industrial
and Applied Mathematics, 2007.
Essential Reading / Recommended Reading
1. E. Davis, Linear algebra and probability for computer science applications, CRC Press,
2012.
2. J. V. Kepner and J. R. Gilbert, Graph algorithms in the language of linear algebra, Society
for Industrial and Applied Mathematics, 2011.
3. D. A. Simovici, Linear algebra tools for data mining, World Scientific Publishing, 2012.
4. P. N. Klein, Coding the matrix: linear algebra through applications to computer science,
Newtonian Press, 2015.
Evaluation Pattern
CIA I : 10%
CIA II : 25%
ATTENDANCE : 5%
ESE : 50%
MDS132 - PROBABILITY AND DISTRIBUTION
THEORY (2021 Batch)
Total Teaching Hours for Semester:60 No of Lecture Hours/Week:4
Max Marks:100 Credits:4
Course Objectives/Course Description
Probability and probability distributions play an essential role in
modeling data from real-world phenomena. This course will
equip students with thorough knowledge of probability and various
probability distributions, and enable them to model real-life data sets
with an appropriate probability distribution.
Learning Outcome
CO1: Describe random event and probability of events
CIA: 50%
ESE: 50%
MDS132L - PROBABILITY AND DISTRIBUTION
THEORY (2021 Batch)
Total Teaching Hours for Semester:60 No of Lecture Hours/Week:4
Max Marks:100 Credits:4
Course Objectives/Course Description
Course Objectives
To enable the students to understand the properties and applications of
various probability functions.
Learning Outcome
CO1: Demonstrate random variables and their functions
CO2: Infer the expectations for random variable functions and generating functions.
1. B.R Bhat, Modern Probability Theory, New Age International, 4th
Edition, 2014.
2. H.A David and H.N Nagaraja, Order Statistics, John Wiley & Sons, 3rd
Edition, 2003.
Evaluation Pattern
CIA - 50%
ESE - 50%
MDS133 - PRINCIPLES OF DATA SCIENCE (2021
Batch)
Total Teaching Hours for Semester:60 No of Lecture Hours/Week:4
Max Marks:100 Credits:4
Course Objectives/Course Description
To provide a strong foundation for data science and its application
areas related to information technology, and to understand the
underlying core concepts and emerging technologies in data science.
Learning Outcome
CO1: Explore the fundamental concepts of data science
Unit-1 Teaching Hours:10
INTRODUCTION TO DATA
SCIENCE
Definition – Big Data and Data Science Hype – Why data science –
Getting Past the Hype – The Current Landscape – Who is Data
Scientist? - Data Science Process Overview – Defining goals –
Retrieving data – Data preparation – Data exploration – Data
modeling – Presentation.
Unit-2 Teaching Hours:12
BIG DATA
Problems when handling large data – General techniques for
handling large data – Case study – Steps in big data – Distributing
data storage and processing with Frameworks – Case study.
Unit-3 Teaching Hours:12
MACHINE LEARNING
Machine learning – Modeling Process – Training model – Validating
model – Predicting new observations – Supervised learning
algorithms – Unsupervised learning algorithms.
Unit-4 Teaching Hours:12
DEEP LEARNING
Introduction – Deep Feedforward Networks – Regularization –
Optimization of Deep Learning – Convolutional Networks –
Recurrent and Recursive Nets – Applications of Deep Learning.
Unit-5 Teaching Hours:14
DATA VISUALIZATION
Introduction to data visualization – Data visualization options –
Filters – MapReduce – Dashboard development tools – Creating an
interactive dashboard with dc.js – Summary.
Unit-5 Teaching Hours:14
ETHICS AND RECENT TRENDS
Data Science Ethics – Doing good data science – Owners of the data
- Valuing different aspects of privacy - Getting informed consent -
The Five Cs – Diversity – Inclusion – Future Trends.
Text Books And Reference Books:
[1]. Data Science from Scratch: First Principles with Python, Joel
Grus, O’Reilly, 1st edition, 2015
[2]. Doing Data Science, Straight Talk from the Frontline, Cathy
O'Neil, Rachel Schutt, O’Reilly, 1st edition, 2013
CIA : 50 %
ESE : 50 %
MDS133L - PRINCIPLES OF DATA SCIENCE (2021
Batch)
Course Objectives:
To provide a strong foundation for data science and its application areas related to
information technology, and to understand the underlying core concepts and emerging
technologies in data science.
Learning Outcome
CO1: Explore the fundamental concepts of data science
CO5: Learn to think through the ethics surrounding privacy, data sharing and algorithmic
decision-making
Learning Outcome
CO1: Understand the essence of research and the necessity of
defining a research problem.
Unit-1 Teaching Hours:8
RESEARCH METHODOLOGY
Defining research problem: Selecting the problem, Necessity of
defining the problem, Techniques involved in defining a problem -
Ethics in Research.
Unit-2 Teaching Hours:8
RESEARCH DESIGN
Principles of experimental design, Working with Literature:
Importance, finding literature, Using your resources, Managing the
literature, Keep track of references, Using the literature, Literature
review, On-line Searching: Database, SCIFinder, Scopus, Science
Direct, Searching research articles, Citation Index, Impact Factor,
H-index.
Unit-3 Teaching Hours:7
RESEARCH DATA
Measurement of Scaling: Quantitative, Qualitative, Classification of
Measure scales, Data Collection, Data Preparation.
Unit-4 Teaching Hours:7
REPORT WRITING
Scientific Writing and Report Writing: Significance, Steps, Layout,
Types, Mechanics and Precautions, Latex: Introduction, Text, Tables,
Figures, Equations, Citations, Referencing, and Templates (IEEE
CIA- 50%
ESE- 50%
CO2: Infer the concept of correlation and regression for relating two
or more related variables.
BASICS OF PROBABILITY
Random experiment, sample point and sample space, event, algebra
of events. Definition of Probability: classical, empirical and
axiomatic approaches to probability, properties of probability.
Theorems on probability, conditional probability and independent
events, Laws of total probability, Bayes' theorem and its applications.
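
To make the law of total probability and Bayes' theorem from this unit concrete, here is a small sketch in Python. It is an illustration only: the function name, parameter names, and the numerical values (a 1% prior, a 95% true-positive rate, and a 5% false-positive rate) are assumptions invented for the example and do not come from the syllabus.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) given P(H), P(E | H) and P(E | not H)."""
    # Law of total probability: P(E) = P(E|H) P(H) + P(E|not H) P(not H)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    # Bayes' theorem: P(H|E) = P(E|H) P(H) / P(E)
    return likelihood * prior / evidence


# Hypothetical screening-test numbers, chosen only for illustration.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
```

With these assumed inputs the posterior is 0.0095 / 0.059, roughly 0.161, which shows how a rare event stays fairly unlikely even after a positive result from a reasonably accurate test.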
Text Books And Reference Books:
[2]. Walpole R.E, Myers R.H, and Myers S.L, Probability and
Statistics for Engineers and Scientists, Pearson, New Delhi, 2017.
[4]. Mood A.M, Graybill F.A and Boes D.C, Introduction to the
Theory of Statistics, McGraw Hill, New Delhi, 2008.
Evaluation Pattern
CIA - 50%
ESE - 50%
MDS161B - INTRODUCTION TO COMPUTERS
AND PROGRAMMING (2021 Batch)
Total Teaching Hours for Semester:30 No of Lecture Hours/Week:2
Max Marks:50 Credits:2
Course Objectives/Course Description
To enable the students to understand the fundamental concepts of
problem solving and programming structures.
Learning Outcome
CO1: Demonstrate the systematic approach for problem-solving
using computers.
CIA: 50%
ESE: 50%
MDS161BL - INTRODUCTION TO COMPUTERS
AND PROGRAMMING (2021 Batch)
Total Teaching Hours for Semester:30 No of Lecture Hours/Week:2
Max Marks:50 Credits:2
Course Objectives/Course Description
To enable the students to understand the fundamental concepts of
problem solving and programming structures.
Learning Outcome
CO1: Demonstrate the systematic approach for problem solving
using computers. EM
Essential Reading / Recommended Reading
[1]. E Balagurusamy, Fundamentals of Computers, TMH, 2011
Evaluation Pattern
CIA:50%
ESE:50%
MDS161C - LINUX ADMINISTRATION (2021 Batch)
Total Teaching Hours for Semester:30 No of Lecture Hours/Week:2
Max Marks:50 Credits:2
Course Objectives/Course Description
To enable the students to excel on the Linux platform.
Learning Outcome
CO2: Manage the Linux environment to work with open source data
science tools
Unit-1 Teaching Hours:10
Module-1
RHEL 7.5, breaking root password, Understand and use essential tools for handling
files, directories, command-line environments, and documentation - Configure
local storage using partitions and logical volumes
Unit-2 Teaching Hours:10
Module-2
Swapping, Extend LVM Partitions, LVM Snapshot - Manage users and groups,
including use of a centralized directory for authentication
Unit-3 Teaching Hours:10
Module-3
Kernel updates, yum and nmcli configuration, Scheduling jobs: at, crontab -
Configure firewall settings using firewall-config, firewall-cmd, or iptables,
Configure key-based authentication for SSH, Set enforcing and permissive modes
for SELinux, List and identify SELinux file and process context, Restore default
file contexts
Text Books And Reference Books:
1. https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/
2. https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/
Essential Reading / Recommended Reading
-
Evaluation Pattern
CIA:50%
ESE:50%
MDS161LA - INTRODUCTION TO STATISTICS (2021 Batch)
Total Teaching Hours for Semester:1 No of Lecture Hours/Week:2
Max Marks:50 Credits:2
Course Objectives/Course Description
CO2: Infer the concept of correlation and regression for relating two or more related
variables.
[2]. Gupta S.C and Kapoor V.K, Fundamentals of Mathematical Statistics, 11th edition,
Sultan Chand & Sons, New Delhi, 2014.
Essential Reading / Recommended Reading
[1]. Mukhopadhyay P, Mathematical Statistics, Books and Allied (P) Ltd, Kolkata,
2015.
[2]. Walpole R.E, Myers R.H, and Myers S.L, Probability and Statistics for
Engineers and Scientists, Pearson, New Delhi, 2017.
[3]. Montgomery D.C and Runger G.C, Applied Statistics and Probability for
Engineers, Wiley India, New Delhi, 2013.
[4]. Mood A.M, Graybill F.A and Boes D.C, Introduction to the Theory of Statistics, McGraw
Hill, New Delhi, 2008.
Evaluation Pattern
CIA - 50%
ESE - 50%
MDS171 - DATA BASE TECHNOLOGIES (2021 Batch)
Total Teaching Hours for Semester:90 No of Lecture Hours/Week:6
Max Marks:150 Credits:5
Course Objectives/Course Description
The main objective of this course is to provide fundamental knowledge of, and
practical experience with, database concepts. It includes the concepts and
terminologies which facilitate the construction of relational databases, writing
effective queries, and comprehending data warehouses and NoSQL databases and their types.
Learning Outcome
CO1: Demonstrate various databases and Compose effective queries
Lab Exercises
1. Data Definition,
2. Table Creation
3. Constraints
Unit-2 Teaching Hours:18
RELATIONAL MODEL AND
DATABASE DESIGN
SQL and Integrity Constraints, Concept of DDL, DML, DCL. Basic
Structure, Set operations, Aggregate Functions, Null Values, Domain
Constraints, Referential Integrity Constraints, assertions, views,
Nested Subqueries, Functional Dependency, Different anomalies in
Lab Exercises
3. Views
Unit-3 Teaching Hours:18
DATA WAREHOUSE: THE BUILDING
BLOCKS
Defining Features, Data Warehouses and Data Marts, Architectural
Types, Overview of the Components, Metadata in the Data
warehouse, Data Design and Data Preparation: Principles of
Dimensional Modeling, Dimensional Modeling Advanced Topics
From Requirements To Data Design, The Star Schema, Star Schema
Keys, Advantages of the Star Schema, Star Schema: Examples,
Dimensional Modeling: Advanced Topics, Updates to the Dimension
Tables, Miscellaneous Dimensions, The Snowflake Schema,
Aggregate Fact Tables, Families of Stars
Lab Exercises:
Lab Exercises:
3. Generating Reports
4. Creating the Pivot table and pivot chart using some existing data
Unit-5 Teaching Hours:18
NOSQL Databases
Lab Exercises:
1. MongoDB Exercise - 1
2. MongoDB Exercise - 2
Text Books And Reference Books:
CIA: 50%
ESE: 50%
MDS171L - DATABASE TECHNOLOGIES (2021
Batch)
Total Teaching Hours for Semester:90 No of Lecture Hours/Week:6
Max Marks:150 Credits:5
Course Objectives/Course Description
Course Description and Course Objectives
The main objective of this course is to provide fundamental knowledge of, and
practical experience with, database concepts. It includes the concepts and
terminologies which facilitate the construction of relational databases, writing
effective queries, and comprehending data warehouses and NoSQL databases and their types.
Learning Outcome
CO1: Demonstrate various databases and Compose effective queries
Lab Exercises
1. Data Definition,
2. Table Creation
3. Constraints
Unit-2 Teaching Hours:18
RELATIONAL MODEL AND
DATABASE DESIGN
SQL and Integrity Constraints, Concept of DDL, DML, DCL. Basic Structure, Set
operations, Aggregate Functions, Null Values, Domain Constraints, Referential
Integrity Constraints, assertions, views, Nested Subqueries, Functional Dependency,
Different anomalies in designing a Database, Normalization: using functional
dependencies, Boyce-Codd Normal Form, 4NF
Lab Exercises
3. Views
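
The DDL, DML, integrity-constraint, aggregate, nested-subquery and view topics of this unit can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the syllabus does not name a database engine, and the table names, column names and rows below are hypothetical. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled

# DDL: a domain (CHECK) constraint and a referential integrity constraint.
conn.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE emp ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " salary REAL CHECK (salary >= 0),"
    " dept_id INTEGER REFERENCES dept(id))"
)

# DML: insert rows, including one NULL salary.
conn.executemany("INSERT INTO dept VALUES (?, ?)", [(1, "Stats"), (2, "CS")])
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [(1, "Asha", 50.0, 1), (2, "Ravi", 70.0, 2), (3, "Meera", None, 2)],
)

# View built on aggregate functions; AVG skips NULL values.
conn.execute(
    "CREATE VIEW dept_salary AS"
    " SELECT d.name AS dept, COUNT(e.id) AS staff, AVG(e.salary) AS avg_salary"
    " FROM dept d JOIN emp e ON e.dept_id = d.id GROUP BY d.name"
)
rows = conn.execute("SELECT * FROM dept_salary ORDER BY dept").fetchall()

# Nested subquery: employees earning more than the overall average.
above_avg = conn.execute(
    "SELECT name FROM emp WHERE salary > (SELECT AVG(salary) FROM emp)"
).fetchall()

# Referential integrity: department 99 does not exist, so this is rejected.
try:
    conn.execute("INSERT INTO emp VALUES (4, 'Kiran', 40.0, 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

The NULL salary is counted by `COUNT(e.id)` but ignored by `AVG(e.salary)`, which is the usual null-value behaviour of SQL aggregates.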
Unit-3 Teaching Hours:18
DATA WAREHOUSE: THE BUILDING
BLOCKS
Defining Features, Data Warehouses and Data Marts, Architectural Types, Overview
of the Components, Metadata in the Data warehouse, Data Design and Data
Preparation: Principles of Dimensional Modeling, Dimensional Modeling Advanced
Topics From Requirements To Data Design, The Star Schema, Star Schema Keys,
Advantages of the Star Schema, Star Schema: Examples, Dimensional Modeling:
Advanced Topics, Updates to the Dimension Tables, Miscellaneous Dimensions,
The Snowflake Schema, Aggregate Fact Tables, Families of Stars
Lab Exercises:
Requirements, ETL Data Structures, Extracting, Cleaning and Conforming,
Delivering Dimension Tables, Delivering Fact Tables, Real-Time ETL Systems
Lab Exercises:
3. Generating Reports
4. Creating the Pivot table and pivot chart using some existing data
Lab Exercises:
1. MongoDB Exercise - 1
2. MongoDB Exercise - 2
Text Books And Reference Books:
[1]. Henry F. Korth and Silberschatz Abraham, “Database System Concepts”,
McGraw Hill.
CO3: Test the hypothesis using nonparametric tests for real world
problems.
Lab Exercise:
8. Single sample mean test
9. Independent and Paired sample mean test
10. Tests of proportion of one and two samples based on t-distribution
11. Test of equality of two variances
12. Chi-square test for independence of attributes and goodness of fit.
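
As a hedged illustration of lab exercise 8 (single sample mean test), the sketch below computes the one-sample t statistic with the Python standard library. The syllabus does not prescribe a tool; the function name and the sample values are assumptions made for the example, and the p-value lookup against the t-distribution is deliberately left out.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return the t statistic and degrees of freedom for H0: mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical observations, chosen only for illustration.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
t_stat, dof = one_sample_t(sample, mu0=4.0)
```

Here the sample mean is 5 and the sample variance is 32/7, so the statistic equals the square root of 1.75 (about 1.32) on 7 degrees of freedom; it would then be compared with the critical value of the t-distribution at the chosen significance level.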
Unit-4 Teaching Hours:18
ANALYSIS OF VARIANCE
Meaning and assumptions - Fixed, random and mixed effect models -
Analysis of variance of one-way and two-way classified data with and
without interaction effects – Multiple comparison tests: Tukey’s method -
critical difference.
Lab Exercise:
13. Construction of one-way ANOVA
14. Construction of two-way ANOVA with interaction
15. Construction of two-way ANOVA without interaction
16. Multiple comparison test using Tukey’s method and critical difference
methods
Unit-5 Teaching Hours:18
NONPARAMETRIC TESTS
Concept of Nonparametric tests - Run test for randomness - Sign test and
Wilcoxon Signed Rank Test for one and paired samples - Run test - Median
test and Mann-Whitney-Wilcoxon tests for two samples.
Lab Exercise:
17. Test of one sample using Run and sign tests
18. Test of paired sample using Wilcoxon signed rank test
19. Test of two samples using Run test and Median test
1. Walpole R.E, Myers R.H and Myers S.L, Probability and Statistics
for Engineers and Scientists, 9th edition, Pearson, New Delhi, 2017.
CIA: 50%
ESE:50%
MDS172L - INFERENTIAL STATISTICS (2021 Batch)
Total Teaching Hours for Semester:90 No of Lecture Hours/Week:6
Max Marks:150 Credits:5
Course Objectives/Course Description
This course is designed to introduce the concepts of the theory of
estimation and testing of hypotheses. This paper also deals with the
concept of parametric tests for large and small samples. It also
provides knowledge about non-parametric tests and their applications.
Learning Outcome
CO1: Demonstrate the concepts of point and interval estimation of
unknown parameters and their significance using large and small
samples.
CO3: Infer the concept of nonparametric tests for single sample and
two samples.
Unit-1 Teaching Hours:15
SUFFICIENT STATISTICS
Neyman - Fisher Factorisation theorem - the existence and
construction of minimal sufficient statistics - Minimal sufficient
statistics and exponential family - sufficiency and completeness -
sufficiency and invariance.
Lab Exercise
Lab Exercise:
Lab Exercise:
4. Tests for mean, equality of means when variance is (i) known, (ii)
unknown under normality
Lab Exercise:
Lab Exercise :
2. Nonparametric tests.
[2]. Linear Statistical Inference and its Applications, Rao C.R, Wiley
Publications, 2nd Edition, 2001.
Evaluation Pattern
CIA - 50%
ESE - 50%
MDS173 - PROGRAMMING FOR DATA
SCIENCE IN PYTHON (2021 Batch)
Total Teaching Hours for Semester:90 No of Lecture Hours/Week:6
Max Marks:100 Credits:4
Course Objectives/Course Description
The objective of this course is to provide comprehensive
knowledge of python programming paradigms required for
Data Science.
Learning Outcome
Lab Exercises
1. Demonstrate usage of branching and looping statements
3. Demonstrate Lists
Unit-2 Teaching Hours:17
SEQUENCE DATATYPES AND OBJECT-ORIENTED
PROGRAMMING
2. Demonstrate Dictionaries
2. Demonstrate 3D plotting
Text Books And Reference Books:
[1]. Jake VanderPlas, Python Data Science Handbook - Essential Tools for Working
with Data, O’Reilly Media, Inc., 2016
[2]. Zhang Y., An Introduction to Python and Computer Programming,
Springer Publications, 2016
Essential Reading / Recommended Reading
[1]. Joel Grus, Data Science from Scratch: First Principles with Python, O’Reilly Media, 2016
[2]. T.R. Padmanabhan, Programming with Python, Springer Publications, 2016
Evaluation Pattern
CIA: 50%
ESE: 50%
MDS173L - PROGRAMMING OF DATA SCIENCE IN PYTHON (2021
Batch)
Total Teaching Hours for Semester:90 No of Lecture Hours/Week:6
Max Marks:100 Credits:4
Course Objectives/Course Description
This course aims at laying down the foundational concepts of python programming. Starting
with the fundamental programming using python, it escalates to the advanced programming
concepts required for Data Science. It enables the students to organize, process and
visualize data using the packages available in Python.
CO3: Implement numerical programming, data handling and visualization through NumPy,
Pandas and MatplotLib modules.
Unit-1 Teaching Hours:17
INTRODUCTION TO PYTHON
Structure of Python Program - Underlying mechanism of Module Execution -
Branching and Looping - Problem Solving Using Branches and Loops - Functions -
Lists and Mutability - Problem Solving Using Lists and Functions
Unit-2 Teaching Hours:17
SEQUENCE DATATYPES AND OBJECT-
ORIENTED PROGRAMMING
Sequences, Mapping and Sets - Dictionaries - Classes: Classes and Instances -
Inheritance - Exception Handling - Introduction to Regular Expressions using “re”
module.
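
The topics of this unit (dictionaries, classes and instances, inheritance, exception handling, and the `re` module) can be seen working together in a short sketch. The class names, the record format and the sample strings below are hypothetical, made up purely for illustration.

```python
import re


class Record:
    """Base class holding a raw text entry."""

    def __init__(self, text):
        self.text = text

    def describe(self):
        return f"Record({self.text!r})"


class MarkRecord(Record):
    """Subclass that parses 'CODE: marks' strings with a regular expression."""

    # Hypothetical format: a course code such as MDS131 or MDS161A, then marks.
    PATTERN = re.compile(r"^(MDS\d{3}[A-Z]?):\s*(\d+)$")

    def parse(self):
        match = self.PATTERN.match(self.text)
        if match is None:
            raise ValueError(f"cannot parse {self.text!r}")
        return match.group(1), int(match.group(2))

    def describe(self):  # overrides the inherited method
        code, marks = self.parse()
        return f"{code} scored {marks}"


entries = ["MDS131: 78", "MDS161A: 41", "not a record"]
parsed, failed = {}, []  # a dictionary of results and a list of rejects
for entry in entries:
    try:
        code, marks = MarkRecord(entry).parse()
        parsed[code] = marks
    except ValueError:
        failed.append(entry)
```

Raising `ValueError` inside `parse` and catching it in the loop keeps malformed input from stopping the run, while the valid entries end up in the dictionary.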
Unit-3 Teaching Hours:13
USING NUMPY
Basics of NumPy-Computation on NumPy-Aggregations-Computation on Arrays-
Comparisons, Masks and Boolean Arrays-Fancy Indexing-Sorting Arrays-Structured
Data: NumPy’s Structured Array.
Unit-4 Teaching Hours:13
DATA MANIPULATION WITH PANDAS -I
Introduction to Pandas Objects-Data indexing and Selection-Operating on Data in
Pandas- Handling Missing Data-Hierarchical Indexing - Combining Data Sets
Unit-5 Teaching Hours:17
DATA MANIPULATION WITH PANDAS -II
Aggregation and Grouping - Pivot Tables - Vectorized String Operations - Working
with Time Series - High Performance Pandas: eval() and query()
Unit-6 Teaching Hours:13
VISUALIZATION AND MATPLOTLIB
Basic functions of matplotlib - Simple Line Plot, Scatter Plot - Density and Contour
Plots - Histograms, Binnings and Density - Customizing Plot Legends, Colour Bars -
Three-Dimensional Plotting in Matplotlib
Text Books And Reference Books:
Evaluation Pattern
Unit-4 Teaching Hours:12
Graph Theory - Basics
Graph Classes: Definition of a Graph and Graph terminology, isomorphism of
graphs, Complete graphs, bipartite graphs, complete bipartite graphs - Vertex degree:
adjacency and incidence, regular graphs - subgraphs, spanning subgraphs, induced
subgraphs, removing or adding edges of a graph, removing vertices from graphs -
Graph Operations: Graph Union, intersection, complement, self complement, Paths
and Cycles, Connected graphs, Eulerian and Hamiltonian Graphs.
Unit-5 Teaching Hours:12
Graph Theory - More concepts
Matrix Representation of Graphs, Adjacency matrices, Incidence Matrices, Trees
and its properties, Bridges (cut-edges), spanning trees, weighted Graphs, minimal
spanning tree problems, Shortest path problems, cut vertices, cuts, vertex and edge
connectivity, Graph Algorithms - Applications of Graph Theory
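
One example of the shortest path problems and graph algorithms listed in this unit is Dijkstra's algorithm on a weighted graph. The syllabus does not say which algorithms are covered, so this choice is an assumption; the adjacency-dictionary representation and the small example graph are likewise hypothetical. The sketch assumes non-negative edge weights.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from source in a graph with non-negative weights."""
    dist = {source: 0}
    heap = [(0, source)]  # priority queue of (distance, vertex)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical undirected weighted graph stored as adjacency dictionaries.
graph = {
    "A": {"B": 4, "C": 1},
    "B": {"A": 4, "C": 2, "D": 5},
    "C": {"A": 1, "B": 2, "D": 8},
    "D": {"B": 5, "C": 8},
}
distances = dijkstra(graph, "A")
```

In this graph the route A, C, B (cost 3) beats the direct edge from A to B (cost 4), and D is reached through B at a total cost of 8.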
Text Books And Reference Books:
CIA:50%
ESE :50%
MDS232 - REGRESSION ANALYSIS (2021 Batch)
Total Teaching Hours for Semester:60 No of Lecture Hours/Week:4
Max Marks:100 Credits:4
Course Objectives/Course Description
This course aims to provide the grounding knowledge about the regression
model building of simple and multiple regression.
Learning Outcome
CO3: Understand the forward, backward and stepwise methods for selecting
the variables
[2]. S. Chatterjee and A. Hadi, Regression Analysis by Example, 4th Ed., John
Wiley and Sons, Inc, 2006
[3]. Seber, G.A.F. and Lee, A.J. (2003) Linear Regression Analysis, John Wiley,
Relevant sections from chapters 3, 4, 5, 6, 7, 9, 10.
Essential Reading / Recommended Reading
[1]. Iain Pardoe, Applied Regression Modeling, John Wiley and Sons, Inc,
2012.
CIA - 50%
ESE - 50%
MDS241A - MULTIVARIATE ANALYSIS (2021 Batch)
Total Teaching Hours for Semester:60 No of Lecture Hours/Week:4
Max Marks:100 Credits:4
Course Objectives/Course Description
This course lays the foundation of Multivariate data analysis. The exposure
provided to multivariate data structure, multinomial and multivariate normal
distribution, estimation and testing of parameters, various data reduction
methods would help the students in having a better understanding of research
data, its presentation and analysis.
Learning Outcome
CO1: Understand multivariate data structure, multinomial and multivariate
normal distribution
CIA - 50%
ESE - 50%
MDS241B - STOCHASTIC PROCESS (2021 Batch)
Total Teaching Hours for Semester:60 No of Lecture Hours/Week:4
Max Marks:100 Credits:4
Course Objectives/Course Description
This course is designed to introduce the concepts of theory of estimation and
testing of hypothesis. This paper also deals with the concept of parametric
tests for large and small samples. It also provides knowledge about non-
parametric tests and its applications.
Learning Outcome
CO1: Demonstrate the concepts of point and interval estimation of
unknown parameters and their significance using large and small
samples.
CO3: Infer the concept of nonparametric tests for single sample and
two samples.
Unit-1 Teaching Hours:12
INTRODUCTION TO STOCHASTIC
PROCESSES
Classification of Stochastic Processes, Markov Processes – Markov Chain -
Countable State Markov Chain. Transition Probabilities, Transition
Probability Matrix. Chapman - Kolmogorov's Equations, Calculation of n -
step Transition Probability and its limit.
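
The calculation of n-step transition probabilities and their limit can be sketched in plain Python. By the Chapman-Kolmogorov equations, the n-step transition matrix is the n-th power of the one-step matrix. The two-state matrix below is a hypothetical example, and the helper names are assumptions made for this illustration.

```python
def mat_mul(P, Q):
    """Product of two square matrices given as lists of rows."""
    n = len(P)
    return [
        [sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def n_step(P, n):
    """n-step transition matrix P^n, built by repeated multiplication."""
    size = len(P)
    # Start from the identity matrix (the 0-step transition matrix).
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


# Hypothetical two-state chain; each row sums to 1.
P = [[0.9, 0.1], [0.5, 0.5]]
P2 = n_step(P, 2)
P50 = n_step(P, 50)
```

For this chain both rows of the high powers approach the stationary distribution (5/6, 1/6), which illustrates the limiting behaviour named in the unit.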
Unit-2 Teaching Hours:12
POISSON PROCESS
Classification of States, Recurrent and Transient States - Transient Markov
Chain, Random Walk and Gambler's Ruin Problem. Continuous Time
Markov Process: Poisson Processes, Birth and Death Processes,
Kolmogorov’s Differential Equations, Applications.
Unit-3 Teaching Hours:12
BRANCHING PROCESS
Branching Processes – Galton – Watson Branching Process - Properties of
Generating Functions – Extinction Probabilities – Distribution of Total
Number of Progeny. Concept of Wiener Process.
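
The link between generating functions and extinction probabilities in a Galton-Watson process can be illustrated numerically. The extinction probability is the smallest non-negative root of s = G(s), where G is the offspring probability generating function, and iterating G from 0 converges to that root. The offspring distribution below is a hypothetical example, and the function names are assumptions made for this sketch.

```python
def pgf(probs, s):
    """Offspring probability generating function G(s) = sum_k p_k * s**k."""
    return sum(p * s**k for k, p in enumerate(probs))


def extinction_probability(probs, iterations=1000):
    """Iterate q <- G(q) from q = 0; the limit is the extinction probability."""
    q = 0.0
    for _ in range(iterations):
        q = pgf(probs, q)
    return q


# Hypothetical offspring distribution: P(0)=0.25, P(1)=0.25, P(2)=0.5.
offspring = [0.25, 0.25, 0.5]
q = extinction_probability(offspring)
```

Here the mean number of offspring is 1.25, which exceeds 1, so extinction is not certain: solving s = 0.25 + 0.25s + 0.5s^2 gives roots 0.5 and 1, and the iteration settles on the smaller root, 0.5.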
Unit-4 Teaching Hours:12
RENEWAL PROCESS
Renewal Processes – Renewal Process in Discrete and Continuous Time –
Renewal Interval – Renewal Function and Renewal Density – Renewal
Equation – Renewal theorems: Elementary Renewal Theorem. Probability
Generating Function of Renewal Processes.
Unit-5 Teaching Hours:12
STATIONARY PROCESS
Stationary Processes: Discrete Parameter Stochastic Process – Application to
Time Series. Auto-covariance and Auto-correlation functions and their
properties. Moving Average, Autoregressive, Autoregressive Moving
Average, Autoregressive Integrated Moving Average Processes. Basic ideas
of residual analysis, diagnostic checking, forecasting.
Text Books And Reference Books:
[2]. Stochastic Processes, S.M Ross, Wiley India Pvt. Ltd, 2008.
Essential Reading / Recommended Reading
[1]. Stochastic Processes from Applications to Theory, P.D Moral and S.
Penev, CRC Press, 2016
CIA - 50%
ESE - 50%
MDS241C - CATEGORICAL DATA ANALYSIS
(2021 Batch)
Total Teaching Hours for Semester:60 No of Lecture Hours/Week:4
Max Marks:100 Credits:4
Course Objectives/Course Description
Categorical data analysis deals with the study of information
captured through expressions or verbal forms. This course equips the
students with the theory and methods to analyse categorical
responses.
Learning Outcome
CO1: Describe the categorical response.
1. Agresti, A. (2012). Categorical Data Analysis, 3rd Edition. New York: Wiley
Essential Reading / Recommended Reading
1. Le, C.T. (2009). Applied Categorical Data Analysis and Translational Research,
2nd Ed., John Wiley and Sons.
2. Agresti, A. (2010). Analysis of ordinal categorical data. John Wiley & Sons.
3. Stokes, M. E., Davis, C. S., & Koch, G. G. (2012). Categorical data analysis
using SAS. SAS Institute.
4. Agresti, A. (2018). An introduction to categorical data analysis. John Wiley &
Sons.
CIA:50%
ESE:50%
MDS271 - MACHINE LEARNING (2021 Batch)
Total Teaching Hours for Semester:90 No of Lecture Hours/Week:6
Max Marks:150 Credits:5
Course Objectives/Course Description
The objective of this course is to provide an introduction to the principles and design of machine
learning algorithms. The course is aimed at providing foundations for conceptual
aspects of machine learning algorithms along with their applications to solve real
world problems.
Learning Outcome
CO1: Understand the basic principles of machine learning techniques.
CO2: Understand how machine learning problems are formulated and solved.
CO3: Apply machine learning algorithms to solve real world problems.
Unit-1 Teaching Hours:18
INTRODUCTION
Machine Learning - Examples of Machine Learning Applications - Learning Associations -
Classification - Regression - Unsupervised Learning - Reinforcement
Learning. Supervised Learning: Learning a class from examples - Probably
Approximately Correct (PAC) Learning - Noise - Learning Multiple classes. Regression -
Model Selection and Generalization.
Introduction to Parametric methods - Maximum Likelihood Estimation: Bernoulli
Density - Multinomial Density - Gaussian Density, Nonparametric Density
Estimation: Histogram Estimator - Kernel Estimator - K-Nearest
Neighbour Estimator.
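The maximum likelihood estimators for the Bernoulli and Gaussian densities named in this unit have simple closed forms, sketched below. This is a minimal illustration with made-up samples. The function names are hypothetical, and the Gaussian variance is the maximum likelihood version (divides by n, not n - 1).

```python
def bernoulli_mle(samples):
    """MLE of the Bernoulli parameter p: the fraction of ones in the sample."""
    return sum(samples) / len(samples)


def gaussian_mle(samples):
    """MLE of a Gaussian's mean and variance (variance divides by n)."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n
    return mu, var


# Made-up data used only for illustration.
print(bernoulli_mle([1, 0, 1, 1]))
print(gaussian_mle([1.0, 2.0, 3.0]))
```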
Lab Exercise
1. Linear discrimination
2. Logistic discrimination
Lab Exercise
2. Ensemble Learning
Unit-5 Teaching Hours:18
UNSUPERVISED LEARNING
Clustering
Lab Exercise
2. Hierarchical clustering
Text Books And Reference Books:
1. C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2016.
3. K. P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.
Evaluation Pattern
CIA: 50%
ESE: 50%
MDS272A - HADOOP (2021 Batch)
Total Teaching Hours for Semester:90 No of Lecture Hours/Week:6
Max Marks:150 Credits:5
Course Objectives/Course Description
This course introduces Big Data as it arises in real-time applications and shows how
it is processed using emerging technologies. It breaks down the complexity of
processing Big Data by taking a practical approach to developing Java applications
on top of the Hadoop platform. It describes the Hadoop architecture and how to work
with the Hadoop Distributed File System (HDFS) and HBase on the Ubuntu platform.
Learning Outcome
Lab Exercise
1. Installing and Configuring Hadoop
Unit-2 Teaching Hours:15
CONFIGURATIONS OF HADOOP
Lab Exercise:
1. Count the number of missing and invalid values through joining
two large given datasets.
3. Analyze the sentiment for product reviews, this work proposes
Lab Exercise
1. Trend Analysis based on Access Pattern over Web Logs using
Hadoop.
2. Service Rating Prediction by Exploring Social Mobile Users Geographical
Locations.
Unit-6 Teaching Hours:15
HBASE
RDBMS vs NoSQL, HBasics, Installation, Building an online query
application – Schema design, Loading Data, Online Queries, Successful
service.
Hands On: Single Node Hadoop Cluster Set up in any cloud service
provider - How to create an instance. How to connect to that instance using
PuTTY. Installing the Hadoop framework on this instance. Run sample programs
which come with the Hadoop framework.
Lab Exercise:
Springer, 2015.
[2] Jonathan R. Owens, Jon Lentz and Brian Femiano, Hadoop Real-World
Solutions Cookbook, Packt Publishing, 2013.
[3] Tom White, Hadoop: The Definitive Guide, O'Reilly, 2012.
Evaluation Pattern
CIA - 50%
ESE - 50%
MDS272B - IMAGE AND VIDEO ANALYTICS (2021
Batch)
Total Teaching Hours for Semester:90 No of Lecture Hours/Week:6
Max Marks:150 Credits:5
Course Objectives/Course Description
This course will provide a basic foundation towards digital image
processing and video analysis. This course will also provide brief
introduction about various Object Detection, Recognition,
Segmentation and Compression methods which will help the
students to demonstrate real-time image and video analytics
applications.
Learning Outcome
CO1: Understand the fundamental principles of image and video
analysis
Lab Programs:
Essential Reading / Recommended Reading
[2] Richard Szeliski, Computer Vision – Algorithms and Applications, Springer, 2011.
ESE: 50%
6. Create a basic sound system by making use of knobs, speakers, LED bulbs
etc., to mimic the sound produced by a race car, ambulance, siren etc.
10. Elderly care: We want to monitor very senior citizens whether they had a
sudden fall. If a very senior citizen falls suddenly while walking, due to stroke or
slippery ground etc, a notification should be sent out so that he/she can get
immediate medical attention. shown, for better functionality of later projects.
Unit-5 Teaching Hours:18
MAC, Routing and Transport Protocols in
WSN
Introduction – Fundamentals of MAC Protocols – MAC protocols for
WSN – Sensor MAC Case Study – Routing Challenges and Design Issues –
Routing Strategies – Transport Control Protocols –
Transport Protocol Design Issues – Performance of Transport Protocols
Lab Exercise
11. Smart street lights: The street lights should increase or decrease their
intensity based on the actual requirements of the amount of light needed at that
time of the day. This will save a lot of energy for the municipal corporation.
a. Glow RED if the Binary bit is '0'. Glow GREEN if the binary bit is '1'
i. For example:
iii. 001 = 1 (Two LEDs Should be RED , and one LED should be GREEN)
iv. If Button is pressed in between, Reset the counter and Re-start from 0.
Theft prevention system for night: When the room is dark and Board is moved or
tilted (say around 90 degree), it should alarm.
Text Books And Reference Books:
[1] Arshdeep Bahga and Vijay Madisetti, Internet of Things: A Hands-on
Approach, Hyderabad University Press, 2015.
[2] Kazem Sohraby, Daniel Minoli and Taieb Znati, Wireless Sensor
Networks: Technology, Protocols, and Applications, Wiley Publications,
2010.
[3] Waltenegus Dargie and Christian Poellabauer, Fundamentals of
Wireless Sensor Networks: Theory and Practice, A John Wiley and
Sons Ltd., 2010.
Essential Reading / Recommended Reading
[1] Edgar Callaway, Wireless Sensor Networks: Architecture and
Protocols, Auerbach Publications, 2003.
[2] Michael Miller, The Internet of Things, Pearson Education, 2015.
[3] Holger Karl and Andreas Willig, Protocols and Architectures for
Wireless Sensor Networks, John Wiley & Sons Inc., 2005.
[4] Erdal Çayırcı and Chunming Rong, Security in Wireless Ad Hoc and
Sensor Networks, John Wiley and Sons, 2009.
[5] Carlos De Morais Cordeiro and Dharma Prakash Agrawal, Ad Hoc
and Sensor Networks: Theory and Applications, World Scientific
Publishing, 2011.
CIA - 50%
ESE - 50%
MDS273 - PROGRAMMING FOR DATA SCIENCE IN R
(2021 Batch)
Total Teaching Hours for Semester:90 No of Lecture Hours/Week:6
Max Marks:100 Credits:4
Course Objectives/Course Description
This lab is designed to introduce implementation of practical machine learning algorithms
using R programming language. The lab will extensively use datasets from real life
situations.
Learning Outcome
CO1: Demonstrate the use of R on any OS (Windows / Mac / Linux).
CO3: Demonstrate exploratory data analysis (EDA) for a given data set.
CO5: Implement and assess relevance and effectiveness of machine learning algorithms
for a given dataset.
ESE - 50%
MDS331 - NEURAL NETWORKS AND DEEP
LEARNING (2020 Batch)
Total Teaching Hours for Semester:60 No of Lecture Hours/Week:4
Max Marks:100 Credits:4
Course Objectives/Course Description
The main aim of this course is to provide fundamental knowledge of
neural networks and deep learning. On successful completion of the
course, students will understand the basics of neural networks,
shallow neural networks, deep neural networks, and the forward and
backward propagation process, and will be able to build various
research projects.
Learning Outcome
CO1: Understand the major technology trends in neural networks
and deep learning
CO2: Build, train and apply neural networks and fully connected
deep neural networks
CIA: 50%
ESE: 50%
MDS331L - NEURAL NETWORKS AND DEEP
LEARNING (2020 Batch)
Total Teaching Hours for Semester:60 No of Lecture Hours/Week:4
Max Marks:100 Credits:4
Course Objectives/Course Description
The main aim of this course is to provide fundamental knowledge of neural
networks and deep learning. On successful completion of the course, students will
understand the basics of neural networks, shallow neural networks, deep neural
networks, and the forward and backward propagation process, and will be able to
build various research projects.
Learning Outcome
CO1: Understand the major technology trends in neural networks and deep learning
CO2: Build, train and apply neural networks and fully connected deep neural
networks
CO3: Implement efficient (vectorized) neural networks for real time application
Unit-1 Teaching Hours:12
INTRODUCTION TO ARTIFICIAL
NEURAL NETWORKS
Neural Networks-Application Scope of Neural Networks- Fundamental Concept of
ANN: The Artificial Neural Network-Biological Neural Network-Comparison
between Biological Neuron and Artificial Neuron-Evolution of Neural Network.
Basic models of ANN-Learning Methods-Activation Functions-Important
Terminologies of ANN.
Unit-2 Teaching Hours:12
SUPERVISED LEARNING NETWORK
Shallow neural networks - Perceptron Networks - Theory - Perceptron Learning
Rule - Architecture - Flowchart for Training Process - Perceptron Training
Algorithm for Single and Multiple Output Classes.
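The perceptron learning rule and training loop for a single output class can be sketched as follows. This is a minimal illustration, not the course's reference implementation. It assumes bipolar inputs and targets (+1 / -1), a learning rate `alpha`, and a threshold `theta`. These names, and the AND-gate data, are chosen only for the example.

```python
def step(net, theta=0.0):
    """Bipolar step activation with threshold theta (returns +1, 0, or -1)."""
    if net > theta:
        return 1
    if net < -theta:
        return -1
    return 0


def net_input(weights, bias, x):
    """Weighted sum of inputs plus bias."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


def train_perceptron(samples, alpha=1.0, theta=0.0, max_epochs=100):
    """Repeat passes over the data; update weights only when the output is wrong."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        changed = False
        for x, target in samples:
            if step(net_input(weights, bias, x), theta) != target:
                # Perceptron learning rule: w <- w + alpha * t * x, b <- b + alpha * t
                weights = [w + alpha * target * xi for w, xi in zip(weights, x)]
                bias += alpha * target
                changed = True
        if not changed:  # a full epoch with no updates means training has converged
            break
    return weights, bias


# Bipolar AND gate, used only as a small linearly separable example.
and_data = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
weights, bias = train_perceptron(and_data)
print(weights, bias)
```

Because the AND data are linearly separable, the loop stops once an entire epoch passes without a weight change. For multiple output classes the same rule would be applied to one weight vector per output unit.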
2. Dr. S Lovelyn Rose, Dr. L Ashok Kumar, Dr. D Karthika Renuka, Deep
Learning Using Python, Wiley-India, 1st Edition, 2019.
Essential Reading / Recommended Reading
1. Charu C. Aggarwal, Neural Networks and Deep Learning, Springer, September
2018.
2. Francois Chollet, Deep Learning with Python, Manning Publications; 1st edition,
2017
3. John D. Kelleher, Deep Learning (MIT Press Essential Knowledge series), The
MIT Press, 2019.
Evaluation Pattern
CIA- 50%
ESE-50%
MDS341A - TIME SERIES ANALYSIS AND
FORECASTING TECHNIQUES (2020 Batch)
Total Teaching Hours for Semester:60 No of Lecture Hours/Week:4
Max Marks:100 Credits:4
Course Objectives/Course Description
This course covers applied statistical methods pertaining to time
series and forecasting techniques. Moving average models like
simple, weighted and exponential are dealt with. Stationary time
series models and non-stationary time series models like AR, MA,
ARMA and ARIMA are introduced to analyse time series data.
Learning Outcome
CO1: Ability to approach and analyze univariate time series
CIA: 50%
ESE: 50%
MDS341AL - TIME SERIES ANALYSIS AND
FORECASTING TECHNIQUES (2020 Batch)
Total Teaching Hours for Semester:60 No of Lecture Hours/Week:4
Max Marks:100 Credits:4
Course Objectives/Course Description
This course covers applied statistical methods pertaining to time series and
forecasting techniques. Moving average models like simple, weighted and
exponential are dealt with. Stationary time series models and non-stationary
time series models like AR, MA, ARMA and ARIMA are introduced to
analyse time series data.
Learning Outcome
CO1: Ability to approach and analyze univariate time series
CO2: Able to differentiate between various time series
models like AR, MA, ARMA and ARIMA models
Unit-3 Teaching Hours:15
Estimation of ARMA models
Unit-4 Teaching Hours:15
Non-Stationary Time Series Models
Concept of non-stationarity, general unit root tests for testing non-
stationarity; basic formulation of the ARIMA Model and their statistical
properties-ACF and PACF; forecasting using ARIMA models
Text Books And Reference Books:
CIA: 50%
ESE: 50%
MDS341C - ECONOMETRICS (2020 Batch)
Total Teaching Hours for Semester:60 No of Lecture Hours/Week:4
Max Marks:100 Credits:4
Course Objectives/Course Description
The course is designed to impart the principles of
econometric methods and tools. This is expected to improve
students' ability to understand the use of econometrics in the study of
economics and finance. The learning objective of the course is to
give students the basic knowledge and skills of
econometric analysis, so that they are able to apply it to the
investigation of economic relationships and processes, and also
understand the econometric methods, approaches, ideas, results and
conclusions met in the majority of economic books and articles.
The course introduces the students to the traditional econometric methods
developed mostly for work with cross-sectional data.
Learning Outcome
CO1: Demonstrate Simple and multiple Econometric models
CIA : 50%
ESE : 50%
MDS341D - BIO-STATISTICS (2020 Batch)
Total Teaching Hours for Semester:60 No of Lecture Hours/Week:4
Max Marks:100 Credits:4
Course Objectives/Course Description
This course provides an understanding of various statistical methods
in describing and analyzing biological data. Students will be
equipped with an idea about the applications of statistical hypothesis
testing, related concepts and interpretation in biological data.
Learning Outcome
CIA:50%
ESE:50%
MDS371 - CLOUD ANALYTICS (2020 Batch)
Total Teaching Hours for Semester:90 No of Lecture Hours/Week:6
Max Marks:150 Credits:5
Course Objectives/Course Description
Amazon Elastic Compute Cloud - Google Compute Engine - Windows Azure
Virtual Machines
Storage Services
Amazon Simple Storage Service - Google Cloud Storage - Windows Azure Storage
Database Services
Amazon Relational Data Store - Amazon DynamoDB - Google Cloud SQL - Google
Cloud Datastore - Windows Azure SQL Database - Windows Azure Table Service
2. Transforming data
Unit-4 Teaching Hours:18
PROCESSING AND VISUALIZING
Google BigQuery - Cloud Dataproc - Google Cloud Datalab - Google Data Studio
2. Setting up and executing a data pipeline job to load data into cloud
Text Books And Reference Books:
1. Sanket Thodge, Cloud Analytics with Google Cloud Platform, Packt Publishing, 2018.
2. Arshdeep Bahga and Vijay Madisetti, Cloud Computing - A Hands-On Approach,
CreateSpace Independent Publishing Platform, 2014.
Essential Reading / Recommended Reading
1. Deven Shah, Kailash Jayaswal, Donald J. Houde, Jagannath Kallakurchi, Cloud
Computing - Black Book, Wiley, 2014.
2. Thomas Erl, Ricardo Puttini, Zaigham Mahmood, Cloud Computing: Concepts,
Technology & Architecture, Prentice Hall, 2014.
Evaluation Pattern
CIA: 50%
ESE: 50%
Amazon Elastic Compute Cloud - Google Compute Engine - Windows Azure
Virtual Machines
Storage Services
Amazon Simple Storage Service - Google Cloud Storage - Windows Azure Storage
Database Services
Amazon Relational Data Store - Amazon DynamoDB - Google Cloud SQL - Google
Cloud Datastore - Windows Azure SQL Database - Windows Azure Table Service
2. Transforming data
PROCESSING AND VISUALIZING
Google BigQuery - Cloud Dataproc - Google Cloud Datalab - Google Data Studio
2. Setting up and executing a data pipeline job to load data into cloud
Text Books And Reference Books:
1. Sanket Thodge, Cloud Analytics with Google Cloud Platform, Packt Publishing, 2018.
2. Arshdeep Bahga and Vijay Madisetti, Cloud computing - A Hands-On Approach,
Create Space
Essential Reading / Recommended Reading
1. Deven Shah, Kailash Jayaswal, Donald J. Houde, Jagannath Kallakurchi, Cloud
Computing - Black Book, Wiley, 2014.
CIA-50%
ESE-50%
MDS372A - NATURAL LANGUAGE
PROCESSING (2020 Batch)
Total Teaching Hours for Semester:90 No of Lecture Hours/Week:6
Max Marks:150 Credits:5
Course Objectives/Course Description
The goal is to familiarize students with the study of human
language from a computational perspective. It covers syntactic,
semantic and discourse processing models, emphasizing machine
learning concepts.
Learning Outcome
CO1: Understand various approaches on syntax and semantics in
NLP
Lab Exercises:
Unit-4 Teaching Hours:18
NATURAL LANGUAGE GENERATION
AND MACHINE TRANSLATION
Natural Language Generation: Architecture of NLG Systems,
Applications
Lab Exercises:
Unit-5 Teaching Hours:18
INFORMATION RETRIEVAL AND
LEXICAL RESOURCES
Information Retrieval: Design features of Information
Retrieval Systems - Classical, Non-classical, Alternative
Models of Information Retrieval – Evaluation. Lexical
Resources: Word Embeddings - Word2vec - GloVe.
Lab Exercises
Essential Reading / Recommended Reading
2. Steven Bird, Ewan Klein and Edward Loper, Natural Language
Processing with Python, O’Reilly Media, 1st edition, 2009.
Web resources:
1. https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf
2. https://nptel.ac.in/courses/106101007/
CIA:50%
ESE:50%
MDS372AL - NATURAL LANGUAGE
PROCESSING (2020 Batch)
Total Teaching Hours for Semester:90 No of Lecture Hours/Week:6
Max Marks:150 Credits:5
Course Objectives/Course Description
The goal is to familiarize students with the study of human
language from a computational perspective. It covers syntactic,
semantic and discourse processing models, emphasizing machine
learning concepts.
Learning Outcome
CO1: Understand various approaches on syntax and semantics in NLP
2. Write a program to count word frequency and to remove stop words
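One possible starting point for the word-frequency exercise above is sketched here using only the Python standard library. The stop-word list is a tiny made-up sample (a real exercise would likely use a fuller list from an NLP toolkit), and the function name and sample sentence are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; an assumption, not a standard resource.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into word tokens, drop stop words, count the rest."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(token for token in tokens if token not in stop_words)


freq = word_frequencies("The cat and the hat: the cat is in the hat.")
print(freq.most_common())
```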
Unit-2 Teaching Hours:18
PARSING AND SYNTAX
Word Level Analysis: Regular Expressions, Text Normalization, Edit Distance, Parsing
and Syntax - Spelling, Error Detection and Correction - Words and Word Classes -
Part-of-Speech Tagging, Naive Bayes and Sentiment Classification: Case study
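The edit distance topic in this unit can be illustrated with the classic dynamic-programming computation of Levenshtein distance. The sketch below assumes unit cost for insertion, deletion, and substitution (some textbooks charge 2 for substitution, which gives different numbers). The function name is hypothetical.

```python
def edit_distance(source, target):
    """Levenshtein distance with unit costs, keeping only two rows of the DP table."""
    prev = list(range(len(target) + 1))  # distance from empty prefix of source
    for i, s_ch in enumerate(source, start=1):
        curr = [i]  # deleting i characters turns source[:i] into the empty string
        for j, t_ch in enumerate(target, start=1):
            cost = 0 if s_ch == t_ch else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))
```

Spelling correction can use this distance to rank candidate words by how few edits separate them from a misspelled input.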
Lab Exercises:
1. Write a program to tokenize Non-English Languages
Lab Exercises:
2. Write a program to differentiate stemming and lemmatizing words
Unit-5 Teaching Hours:18
UNSUPERVISED METHODS IN NLP
Graphical Models for Sequence Labelling in NLP
Lab Exercises
1. Write a program for POS Tagging or Word Embeddings.
CO2: Identify key tools and diagnostics associated with Web analytics.
Unit-1 Teaching Hours:18
INTRODUCTION TO WEB ANALYTICS
Introduction to Web Analytics: Web Analytics Approach – A Model
of Analysis – Context matters – Data Contradiction – Working of
Web Analytics: Log file analysis – Page tagging – Metrics and
Dimensions – Interacting with data in Google Analytics
Lab Exercise
Unit-2 Teaching Hours:18
LEARNING ABOUT USERS THROUGH
WEB ANALYTICS
Goals: Introduction – Goals and Conversions – Conversion Rate –
Goal reports in Google Analytics – Performance Indicators –
Analyzing Web Users: Learning about users – Traffic Analysis –
Analyzing user content – Click-Path analysis – Segmentation
Lab Exercise
1. Collection of web data and other internet data with the help of
web analytics
Lab Exercise
Essential Reading / Recommended Reading
CIA 50%
ESE 50%
MDS372C - BIO INFORMATICS (2020 Batch)
Total Teaching Hours for Semester:90 No of Lecture Hours/Week:6
Max Marks:150 Credits:5
Course Objectives/Course Description
To enable students to learn information search and retrieval,
genome analysis and gene mapping, alignment of multiple
sequences, and PERL for Bioinformatics.
Learning Outcome
CO1: Understand molecular biology and bioinformatics
applications.
Introduction, Tools for web search, Data retrieval tools, Data mining of Biological
databases.
Lab Exercise
3. Write a PERL script to demonstrate the Array operations and Regular expressions.
Unit-3 Teaching Hours:18
ALIGNMENT OF MULTIPLE SEQUENCES
Methods of multiple sequence
alignment, Evaluating multiple alignments, Applications of multiple alignments,
Phylogenetic analysis, Methods of phylogenetic analysis, Tree evaluation, Problems
in Phylogenetic analysis.
Lab Exercise
Motifs and Loops: Flow control, Code layout, Finding motifs, Counting Nucleotides,
Exploding strings and arrays, Operating on strings. Subroutine and Bugs: Subroutines,
Scoping and Subroutines, Command line arguments and Arrays, Passing data to
Subroutines, Modules and Libraries of Subroutines.
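The "Finding motifs" and "Counting Nucleotides" topics can be illustrated with a short sketch. The course itself works in PERL. The example below is written in Python purely for illustration, and the sequence and function names are made up.

```python
from collections import Counter


def count_nucleotides(dna):
    """Count how often each of A, C, G, T occurs in a DNA string."""
    counts = Counter(dna.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


def find_motif(dna, motif):
    """Return the 0-based start positions of every (possibly overlapping) motif match."""
    dna, motif = dna.upper(), motif.upper()
    positions = []
    start = dna.find(motif)
    while start != -1:
        positions.append(start)
        start = dna.find(motif, start + 1)  # step by one so overlaps are found
    return positions


sequence = "ATGCGATATAGC"  # made-up sequence, not real data
print(count_nucleotides(sequence))
print(find_motif(sequence, "ATA"))
```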
Lab Exercise
2. Case Study: a. To retrieve the sequence of the Human keratin protein from UniProt
database and to interpret the results. b. To retrieve the sequence of the Human keratin
protein from GenBank database and to interpret the results.
Text Books And Reference Books:
[2] Beginning Perl for Bioinformatics, Tisdall James, 1st edition, Shroff Publishers
(O’Reilly), 2009.
Essential Reading / Recommended Reading
[2] Bioinformatics Technologies, Yi-Ping Phoebe Chen (Ed), 1st edition, Springer,
2005.
[3] Bioinformatics Computing, Bryan Bergeron, 2nd Edition, Prentice Hall, 1st
edition, 2003.
Web resources:
[1] http://cac.annauniv.edu/PhpProject1/aidetails/afug_2013_fu/24.%20BIO%20MED.pdf
[2] https://www.amrita.edu/school/biotechnology/academics/pg/introduction-
bioinformaticsbif410
[3] https://canvas.harvard.edu/courses/8084/assignments/syllabus
[4] https://www.coursera.org/specializations/bioinformatics
[5] http://www.dtc.ox.ac.uk/modules/introduction-bioinformatics-bioscientists.html
Evaluation Pattern
CIA 50%
ESE 50%
MDS372D - EVOLUTIONARY ALGORITHMS (2020 Batch)
Total Teaching Hours for Semester:90 No of Lecture Hours/Week:6
2. D. Goldberg, Genetic algorithms in search, optimization, and machine learning. Boston:
Addison-Wesley, 2012.
5. T. Bäck, Evolutionary algorithms in theory and practice. New York: Oxford Univ. Press,
1996.
Web Resources:
1 E. A. E and S. J. E, "Introduction to Evolutionary Computing | The on-line
accompaniment to the book Introduction to Evolutionary
Computing", Evolutionarycomputation.org, 2015. [Online]. Available:
http://www.evolutionarycomputation.org/.
2 F. Lobo, "Evolutionary Computation 2018/2019", Fernandolobo.info, 2018.
[Online]. Available: http://www.fernandolobo.info/ec1819.
3 "EClab Tools", Cs.gmu.edu, 2008. [Online]. Available:
https://cs.gmu.edu/~eclab/tools.html.
CIA: 50%
ESE : 50%
MDS372E - OPTIMIZATION TECHNIQUE (2020 Batch)
Total Teaching Hours for Semester:90 No of Lecture Hours/Week:6
Max Marks:150 Credits:5
Course Objectives/Course Description
This course will help the students to acquire and demonstrate the implementation
of the necessary algorithms for solving advanced optimization problems.
Learning Outcome
CO3: Use linear programming in the formulation of the shortest route problem.
Lab Exercise
1. Simplex Method
2. Dual Simplex Method
Unit-2 Teaching Hours:18
TRANSPORTATION MODEL
Determination of the Starting Solution – Iterative computations of
the transportation algorithm. Assignment Model: The Hungarian
Method – Simplex explanation of the Hungarian Method – The
trans-shipment Model.
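The "Determination of the Starting Solution" step can be illustrated with the northwest-corner method, one common way of obtaining a starting basic feasible solution for a balanced transportation problem. The syllabus does not say which method is taught, so this choice is an assumption, and the supply, demand, and cost figures below are illustrative only.

```python
def northwest_corner(supply, demand):
    """Starting allocation for a balanced transportation problem (total supply == total demand)."""
    supply, demand = list(supply), list(demand)  # copy so the caller's lists are untouched
    allocation = [[0] * len(demand) for _ in supply]
    i = j = 0
    while i < len(supply) and j < len(demand):
        quantity = min(supply[i], demand[j])
        allocation[i][j] = quantity
        supply[i] -= quantity
        demand[j] -= quantity
        if supply[i] == 0:
            i += 1  # this source is exhausted: move down
        else:
            j += 1  # this destination is satisfied: move right
    return allocation


def total_cost(allocation, costs):
    """Sum of shipped quantity times unit cost over all cells."""
    return sum(q * c for row_q, row_c in zip(allocation, costs)
               for q, c in zip(row_q, row_c))


# Illustrative data only.
supply = [15, 25, 10]
demand = [5, 15, 15, 15]
costs = [[10, 2, 20, 11], [12, 7, 9, 20], [4, 14, 16, 18]]
start = northwest_corner(supply, demand)
print(start, total_cost(start, costs))
```

The starting solution is generally not optimal. The iterative computations of the transportation algorithm then improve it.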
Lab Exercise
Unit-3 Teaching Hours:18
CPM and PERT
Network Representation – Critical Path Computations –
Construction of the time Schedule – Linear Programming
Lab Exercise:
Unit-3 Teaching Hours:18
NETWORK MODELS
Minimal Spanning tree Algorithm – Linear Programming
formulation of the shortest-route problem. Maximal Flow Model:
Enumeration of cuts – Maximal Flow Diagram – Linear
Programming Formulation of Maximal Flow Model.
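A minimal spanning tree can be built greedily by repeatedly adding the cheapest edge that connects a new node to the tree. The sketch below uses Prim's algorithm with a priority queue, which is one standard way to do this (the syllabus does not name a specific variant). The graph and its weights are made up for the example.

```python
import heapq


def minimal_spanning_tree(graph, start):
    """Prim's algorithm on an undirected graph given as {node: {neighbour: weight}}."""
    visited = {start}
    frontier = [(weight, start, node) for node, weight in graph[start].items()]
    heapq.heapify(frontier)
    tree = []
    while frontier and len(visited) < len(graph):
        weight, u, v = heapq.heappop(frontier)  # cheapest edge leaving the tree so far
        if v in visited:
            continue
        visited.add(v)
        tree.append((u, v, weight))
        for nxt, w in graph[v].items():
            if nxt not in visited:
                heapq.heappush(frontier, (w, v, nxt))
    return tree


# Made-up undirected network; each edge is listed in both directions.
network = {
    1: {2: 1, 3: 4},
    2: {1: 1, 3: 2, 4: 6},
    3: {1: 4, 2: 2, 4: 3},
    4: {2: 6, 3: 3},
}
tree = minimal_spanning_tree(network, 1)
print(tree, sum(weight for _, _, weight in tree))
```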
Unit-4 Teaching Hours:18
GOAL PROGRAMMING
Formulation – Tax Planning Problem – Goal Programming
algorithms – Weights method – Preemptive method.
Lab Exercise:
1. Goal Programming
2. Dynamic Programming
Unit-5 Teaching Hours:18
MARKOV CHAINS
CIA: 50%
ESE: 50%
MDS381 - SPECIALIZATION PROJECT (2020 Batch)
Total Teaching Hours for Semester:60 No of Lecture Hours/Week:4
Max Marks:100 Credits:2
Course Objectives/Course Description
The course is designed to provide a real-world project development
and deployment environment for the students.
Learning Outcome
CO1: Identify the problem and relevant analytics for the selected
domain.
-
Essential Reading / Recommended Reading
-
Evaluation Pattern
CIA: 50%
ESE: 50%
MDS381L - SPECIALIZATION PROJECT (2020
Batch)
NOT APPLICABLE
Essential Reading / Recommended Reading
NOT APPLICABLE
Evaluation Pattern
CIA: 50%
ESE: 50%
MDS382 - SEMINAR (2020 Batch)
Learning Outcome
CO1: Understand new and latest trends in data science
Unit-1 Teaching Hours:30
Students will give presentations on advanced
concepts and technologies in data science
and submit a report.
-
Text Books And Reference Books:
Recommended References
Evaluation Pattern
CIA 100%
MDS382L - SEMINAR (2020 Batch)
Total Teaching Hours for Semester:30 No of Lecture Hours/Week:2
Max Marks:50 Credits:1
Course Objectives/Course Description
The course is designed to enhance the soft skills and technical
understanding of the students.
Learning Outcome
CO1: Understand new and latest trends in data science
NA
Essential Reading / Recommended Reading
NA
Evaluation Pattern
100% CIA
-
Essential Reading / Recommended Reading
-
Evaluation Pattern
CIA: 50%
ESE: 50%