MCA 3rd Semester Big Data Analytics Syllabus
Module-2
Understanding Hadoop Ecosystem:
Hadoop Ecosystem, Hadoop Distributed File System: HDFS Architecture, Concept of Blocks in HDFS
Architecture, NameNodes and DataNodes, The Command-line Interface, Using HDFS Files, HDFS High
Availability, Features of HDFS, MapReduce, Hadoop YARN, Introducing HBase: HBase Architecture, Regions,
Storing Big Data with HBase, Interacting with Hadoop Ecosystem, HBase in Operation – Programming with
HBase, Combining HBase and HDFS: REST and Thrift, Data Integrity in HDFS, Features of HBase, Hive, Pig and
Pig Latin, Sqoop, ZooKeeper, Flume, Oozie.
Module-3
Understanding MapReduce Fundamentals and HBase:
The MapReduce Framework: Exploring the Features of MapReduce, Working of
MapReduce, Exploring Map and Reduce Functions.
Techniques to Optimize MapReduce Jobs: Hardware/Network Topology,
Synchronization, File System. Uses of MapReduce, Role of HBase in Big Data Processing: Characteristics of
HBase, Installation of HBase.
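The working of MapReduce and the roles of the Map and Reduce functions listed above can be illustrated with a minimal word-count sketch. It runs in a single Python process and only imitates the map, shuffle, and reduce stages; it does not use the Hadoop API, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of text records."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big", "data analytics"]))
```

In a real Hadoop job the map and reduce steps would run on different nodes and the framework would perform the shuffle; here all three stages run in one process so the data flow is easy to follow.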
Module-5
Introduction to Hive and Pig:
What is Hive?, Hive Architecture, Hive Data Types, Hive File Format, Hive Query Language (HQL), RCFile
Implementation, SerDe, User-Defined Function (UDF).
What is Pig?, The Anatomy of Pig, Pig on Hadoop, Pig Philosophy, Use Case for Pig: ETL Processing, Pig Latin
Overview, Data Types in Pig, Running Pig, Execution Modes of Pig, HDFS Commands, Relational Operators,
Eval Function, Complex Data Types, Piggy Bank, User-Defined Functions (UDF),
Parameter Substitution, Diagnostic Operator, Word Count Example using Pig, When to use Pig?, When not
to use Pig?, Pig at Yahoo!, Pig versus Hive.
2. Michael Minelli, Michele Chambers, Ambiga Dhiraj, “Big Data, Big Analytics: Emerging Business
Intelligence and Analytic Trends for Today's Businesses”, 1st Edition, Wiley CIO Series, 2013.
3. Tom White, “Hadoop: The Definitive Guide”, 3rd Edition, O'Reilly, 2012.
Skill Development Activities Suggested
With the help of the course teacher, students can take up technical activities that enhance their
skills, or they should interact with industry (small, medium and large), understand its problems or
foresee what can be undertaken for study in the form of research, testing or projects, and look for creative and
innovative methods to solve the identified problem. The prepared report shall be evaluated for CIE marks.
1 Foundation Knowledge: Apply knowledge of mathematics, programming logic and coding fundamentals for solution architecture and problem solving. (PO1)
2 Problem Analysis: Identify, review, formulate and analyse problems, focusing primarily on customer requirements, using critical thinking frameworks. (PO2)
4 Modern Tool Usage: Select, adapt and apply modern computational tools, such as development of algorithms, with an understanding of the limitations, including human biases. (PO4)
6 Project Management and Finance: Use the principles of project management, such as scheduling and work breakdown structure, and be conversant with the principles of finance for profitable project management. (PO6)
7 Ethics: Commit to professional ethics in managing software projects with financial aspects. Learn to use new technologies for cyber security and insulate customers from malware. (PO7)
8 Life-long Learning: Change management skills and the ability to learn, keep up with contemporary technologies and ways of working. (PO8)
Mapping of COs and POs
1. William Stallings, “Cryptography and Network Security: Principles and Practice”, 5th Edition,
Pearson/PHI, 2011.
Reference books:
1. William Stallings, “Network Security Essentials (Applications and Standards)”, 4th Edition,
Pearson Education, 2012.
3. Robert Bragg, Mark Rhodes, “Network Security: The Complete Reference”, 1st Edition,
TMH, 2004.
Skill Development Activities Suggested:
With the help of the course teacher, students can take up technical activities that
enhance their skills, or they should interact with industry (small, medium and large),
understand its problems or foresee what can be undertaken for study in the form of
research, testing or projects, and look for creative and innovative methods to solve the identified
problem. The prepared report shall be evaluated for CIE marks.
Course outcome (Course Skill Set):
At the end of the course the student will be able to:
Sl.No Description Blooms Level
CO1 Analyze and design classical encryption techniques and block ciphers. L2
CO2 Understand and analyze the Data Encryption Standard. L2
CO3 Understand and analyze public-key cryptography, RSA and other public-key cryptosystems. L2
CO4 Understand key management and distribution schemes and design User Authentication, such as Diffie-Hellman Key Exchange, ElGamal Cryptosystem, etc. L2
CO5 Analyze and design hash and MAC algorithms, and digital signatures. L2
Data Mining and Business Intelligence
Course Code 24MCA242 CIE Marks 50
Teaching Hours/Week(L:P:SDA) 4:0:0 SEE Marks 50
Total Hours of Pedagogy 40 External Marks 100
Credits 4 Exam Hours 03
Course Learning objectives:
To introduce the concept of data mining as an important tool for enterprise data management and as
a cutting-edge technology for building competitive advantage.
To enable students to effectively identify sources of data and process it for data mining.
To impart skills that can enable students to approach business problems analytically by identifying
opportunities to derive business value from data.
To teach students how to gather and analyse large sets of data to gain useful business understanding.
Module-1
Introduction to Data Mining: What Is Data Mining, Kinds of Data That Can Be Mined, Kinds of Patterns That Can Be
Mined, Technologies Used, Kinds of Applications Targeted, Major Issues in Data Mining, KDD Process,
Attribute Types, Measuring Data Similarity and Dissimilarity.
Data Pre-processing: Why pre-processing, Data cleaning: Missing Values, Noisy Data; Data Integration,
Data Reduction: Attribute Subset Selection; Data Transformation and Data Discretization: Normalization,
Binning, Histogram Analysis, Concept Hierarchy Generation.
Teaching-Learning Process: Chalk and talk method / PowerPoint Presentation
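Two of the data transformation topics listed in this module, Normalization and Binning, can be sketched in a few lines. The example below is a minimal illustration that assumes min-max normalization and equal-width binning (the module does not say which variants are taught); the function names are hypothetical.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to rescale.
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value a bin index in 0..n_bins-1 using bins of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


if __name__ == "__main__":
    print(min_max_normalize([10, 20, 30]))
    print(equal_width_bins([4, 8, 15, 21, 24, 25], 3))
```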
Module-2
Classification: Basic Concepts; Decision Tree Induction: Decision Tree Induction, Attribute Selection
Measures, Tree Pruning; Bayes Classification Methods: Bayes’ Theorem, Naive Bayesian Classification;
Model Evaluation and Selection: Metrics for Evaluating Classifier Performance, Holdout Method and
Random Subsampling, Cross-Validation, Bootstrap.
Teaching-Learning Process: Chalk and talk method / PowerPoint Presentation
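The topic "Metrics for Evaluating Classifier Performance" can be made concrete with a small sketch that derives accuracy, precision, and recall from the four confusion-matrix counts of a binary classifier. The choice of these three metrics and the function name classification_metrics are assumptions made for illustration; the module itself does not enumerate the metrics.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision and recall for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Precision: of everything predicted positive, how much was correct.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: of everything actually positive, how much was found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```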
Module-3
Clustering Techniques: Cluster Analysis: Basic Concepts; Partitional Methods: K-Means, K-Medoids;
Hierarchical Methods: Agglomerative, Divisive, BIRCH; Density-Based Methods: DBSCAN.
Outlier Analysis: Statistical Distribution-Based Outlier Detection, Distance-Based Outlier Detection,
Density-Based Local Outlier Detection.
Teaching-Learning Process: Chalk and talk method / PowerPoint Presentation
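As an illustration of the partitional method K-Means named in this module, the sketch below alternates between assigning points to the nearest centroid and recomputing centroids until they stop changing. It is a simplified version under stated assumptions: the first k points are used as initial centroids (real implementations usually pick them randomly), distance is squared Euclidean, and the function name kmeans is hypothetical.

```python
def kmeans(points, k, max_iter=100):
    """Cluster points (tuples of floats) into k groups; returns (centroids, clusters)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: deterministic initialisation with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # Converged: no centroid moved.
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    pts = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
    print(kmeans(pts, 2))
```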
Module-4
Association Analysis: Basic Concepts and Algorithms: Basic Concepts: Market Basket Analysis - A
Motivating Example, Frequent Itemsets, Closed Itemsets, and Association Rules;
Frequent Itemset Mining Methods: Apriori Algorithm - Finding Frequent Itemsets by Confined Candidate
Generation, Generating Association Rules from Frequent Itemsets, Evaluation of Association Patterns.
Teaching-Learning Process: Chalk and talk method / PowerPoint Presentation
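The Apriori idea of finding frequent itemsets by confined candidate generation can be sketched as follows: candidates of size k are built only from frequent itemsets of size k-1, and any candidate with an infrequent subset is pruned before its support is counted. This is a minimal, unoptimized illustration; it assumes min_support is an absolute transaction count, and the function name apriori is hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


if __name__ == "__main__":
    baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
    for itemset, count in apriori(baskets, 2).items():
        print(sorted(itemset), count)
```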
Module-5
Applications and Trends in Data Mining: Data Mining Applications, Data mining system products and
Research prototypes, Additional themes on data mining, Trends in Data Mining.
Teaching-Learning Process: Chalk and talk method / PowerPoint Presentation
2. Pang-Ning Tan, Michael Steinbach, Vipin Kumar, “Introduction to Data Mining”, Pearson, First
impression.
3. G. Shmueli, N.R. Patel, P.C. Bruce, “Data Mining for Business Intelligence: Concepts, Techniques,
and Applications in Microsoft Office Excel with XLMiner”, 1st Edition, Wiley India.
4. Carlo Vercellis, “Business Intelligence: Data Mining and Optimization for Decision Making”, Wiley
India Publication.
Skill Development Activities Suggested:
With the help of the course teacher, students can take up technical activities that enhance
their skills, or they should interact with industry (small, medium and large), understand its
problems or foresee what can be undertaken for study in the form of research, testing or projects, and look for
creative and innovative methods to solve the identified problem. The prepared report shall be evaluated
for CIE marks.
Course outcome (Course Skill Set):
At the end of the course the student will be able to:
Sl.No Description Blooms Level
CO1 Demonstrate an understanding of the concepts and importance of data mining, and the principles of business intelligence. L2
CO2 Organize and prepare the data needed for data mining using preprocessing techniques. L2
CO3 Implement the appropriate data mining methods, like classification, clustering, or frequent pattern mining, on large data sets. L2
CO4 Define and apply metrics to measure the performance of various data mining algorithms. L2
CO5 Apply BI to solve practical problems: analyze the problem domain, use the data collected in the enterprise, apply the appropriate data mining technique, interpret and visualize the results, and provide decision support. L2
Cloud Computing
Course Code 24MCA252 CIE Marks 50
Teaching Hours/Week(L:P:SDA) 4:0:0 SEE Marks 50
Total Hours of Pedagogy 40 External Marks 100
Credits 4 Exam Hours 03
Course Learning objectives:
Explain the fundamentals of cloud computing.
Illustrate cloud application programming and the Aneka platform.
Contrast different cloud platforms used in industry.
Module-1
Introduction to Cloud Computing: Eras of computing, The vision of Cloud Computing, Defining a
cloud, A closer look, Cloud computing reference model, Historical developments: Distributed
systems, Virtualization, Web 2.0; Service oriented computing; Utility oriented computing.
Teaching-Learning Process: Chalk and Talk / PPT / Web Content
Module-2
Architectures for parallel and distributed computing: Parallel vs. Distributed computing, Elements
of parallel computing, Elements of distributed computing, Technologies for distributed computing.
Teaching-Learning Process: Chalk and Talk / PPT / Web Content
Module-3
Virtualization: Introduction, Characteristics of virtualized environments, Taxonomy of
virtualization techniques, Virtualization and cloud computing, Pros and cons of virtualization,
Technology examples: Xen: Paravirtualization, VMware: Full virtualization, Microsoft Hyper-V.
Teaching-Learning Process: Chalk and Talk / PPT / Web Content
Module-4
Cloud computing architecture: Introduction, Cloud reference model: Architecture, IaaS, PaaS,
SaaS, Types of Clouds: Public, Private, Hybrid and Community clouds, Economics of the cloud,
Open challenges.
Teaching-Learning Process: Chalk and Talk / PPT / Web Content
Module-5
Cloud Platforms in Industry: Amazon Web Services; Google AppEngine; Microsoft Azure; Cloud
Applications. Scientific applications: Healthcare; Biology; Geo-Science. Business and Consumer
applications: CRM & ERP; Productivity; Social networking.
Teaching-Learning Process: Chalk and Talk / PPT / Web Content
Assessment Details (both CIE and SEE):
Suggested Learning Resources:
Textbooks:
1. Rajkumar Buyya, Christian Vecchiola, and S. Thamarai Selvi, Mastering Cloud Computing, Tata
McGraw Hill, New Delhi, India, 2013.
Reference Books:
Cloud Computing for Dummies by Judith Hurwitz, R. Bloor, M. Kaufman, F. Halper (Wiley
India Edition)