Rationale: Today’s world is data-driven. Increasingly, the efficient operation of organizations across
sectors relies on the effective use of vast amounts of data. Big data analytics helps us examine these data to
uncover hidden patterns, correlations, and other insights. It is a fast-growing field, and skills in this area are
among the most in demand today.
Course Scheme:
Teaching Scheme and Credits | Assessment Pattern and Marks | Total Marks
L | T | PR | C | Theory ESE (E) | Theory PA (M) | Practical ESE (V) | Practical PA (I) | Total Marks
3 | 0 | 2 | 4 | 70 | 30 | 30 | 20 | 150
Course Content:
Sr. No. | Course Content | No. of Hours | % of Weightage
1 | Introduction to Big Data: Introduction to Data, Types of Data, Structured Data, Unstructured Data, Semi-Structured Data, Meta Data, The Emergence of ‘New Data’, Comparison of New and Traditional Data | |
https://syllabus.gtu.ac.in/
GUJARAT TECHNOLOGICAL UNIVERSITY
BACHELOR OF ENGINEERING SYLLABUS
Subject Code: 3174207
Subject Name: Big Data Analytics
2 | Shuffle and Sort, Task Execution, MapReduce Types and Formats, MapReduce Features, Hadoop Environment. Hadoop Hybrids, Cluster, Setting up a Hadoop Cluster, Cluster Specification, Cluster Setup and Installation, Hadoop Configuration, Security in Hadoop, Administering Hadoop | |
3 | NoSQL: What is NoSQL? NoSQL business drivers; NoSQL Databases in the Light of CAP Theorem, NoSQL case studies; NoSQL data architecture patterns: Key-value stores, Graph stores, Column family (Bigtable) stores, Document stores, NoSQL Database: Cassandra, Variations of NoSQL architectural patterns; Using NoSQL to manage big data: What is a big data NoSQL solution? Understanding the types of big data problems; Analyzing big data with a shared-nothing architecture; Choosing distribution models: master-slave versus peer-to-peer; Four ways that NoSQL systems handle big data problems | 08 | 20%
4 | Mining Data Stream: Introduction to Streams Concepts, Stream Data Model and Architecture, Stream Computing, Sampling Data in a Stream, Filtering Streams, Counting Distinct Elements in a Stream, Estimating Moments, Counting Ones in a Window, Decaying Window, Real Time Analytics Platform (RTAP) applications, Case Studies | 08 | 15%
5 | Frameworks: Applications on Big Data Using Hive and Pig, Hive Architecture, Data Flow and Data Types in Hive, Different Types of Tables in Hive, Partitioning and Bucketing in Hive, Why Apache Pig, Features of Apache Pig, Pig vs MapReduce, Pig Architecture, Data processing operators in Pig, Hive services, HiveQL, Querying Data in Hive, Fundamentals of HBase and ZooKeeper, IBM InfoSphere BigInsights and Streams | 08 | 20%
6 | Spark: Introduction to Data Analysis with Spark, In-Memory Computing with Spark, Spark Basics, Interactive Spark with PySpark, Writing Spark Applications | 06 | 10%
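As a small illustration of the MapReduce flow named in the course content (map, shuffle and sort, reduce), the following is a minimal single-process Python sketch of a word count. It is not Hadoop code and is not part of the prescribed syllabus: the function names map_phase, shuffle_and_sort, reduce_phase and word_count are hypothetical and chosen only for this example, and a real job would distribute each phase across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Shuffle and sort: group values by key, then order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle_and_sort(mapped))


if __name__ == "__main__":
    # Hypothetical sample input, used only to show the call.
    sample = ["big data is big", "data streams are data"]
    print(word_count(sample))
```

Keeping the three phases as separate functions mirrors how the framework separates them; in this sketch the grouping step stands in for the network shuffle between mappers and reducers.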
Reference Books:
1) Sourabh Mukherjee, Amit Kumar Das and Sayan Goswami, “Big Data Simplified”, Pearson
2) Anand Rajaraman and Jeff Ullman, “Mining of Massive Datasets”, Cambridge University Press
3) Thomas Erl, Wajid Khattak and Paul Buhler, “Big Data Fundamentals: Concepts, Drivers &
Techniques”, Pearson
4) Seema Acharya and Subhashini Chellappan, “Big Data and Analytics”, Wiley
5) Chris Eaton, Dirk Deroos, Tom Deutsch, George Lapis and Paul Zikopoulos, “Understanding Big Data”,
McGraw Hill, 2012
6) Tom White, “Hadoop: The Definitive Guide”, O’Reilly, 2012
7) Holden Karau, “Learning Spark: Lightning-Fast Big Data Analysis”
Course Outcome:
After completion of the course, the student will be able to:
No. | Course Outcomes | RBT Level*
01 | Identify big data application areas | UN
02 | Use big data framework | AP
03 | Model and analyze data by applying selected techniques | AN
04 | Demonstrate an integrated approach to big data | AP
*RM: Remember, UN: Understand, AP: Apply, AN: Analyze, EL: Evaluate, CR: Create
Case Study:
Stage 1:
Case study topics are selected and small working groups of 2 to 3 students are formed. Students engage
with the cases, read through the background material provided in the session, and work through an initial set of
questions to deepen their understanding of the case. Sample applications and data will be provided to help students
familiarize themselves with the cases and the available (big) data.
Stage 2:
The groups are given a specific task relevant to the case in question and are expected to develop a corresponding
big data concept using the knowledge gained in the course and the parameters set by the case study scenario. A
set of guiding questions for the scenarios will be provided.
Stage 3:
Each group prepares a short 2-5 page report on their results and a 10-minute oral presentation of their big data
concept.
Apart from the case study, students can perform the following programming exercises:
1. http://in.reuters.com/tools/rss
2. http://www.altova.com/xmlspy.html
3. https://www.w3.org/RDF/
4. https://www.oreilly.com/library/view/data-analytics-with/9781491913734/ch04.html
5. https://data-flair.training/blogs/spark-in-memory-computing/