Gudlavalleru Engineering College R17
BIG DATA ANALYTICS
IV B.Tech – I Semester, Branch: CSE & IT
Micro Syllabus
S.No. Topic
UNIT – I: INTRODUCTION TO BIG DATA
1 What is Big Data? What is Big Data Analytics?
2 Characteristics of Big Data – The Four V’s
  Volume
  Velocity
  Variety
  Veracity
3 Why is Big Data important?
  Different Use Cases
4 Data
  Structured
  Semi-Structured
  Unstructured
5 Data Storage and Analysis
6 Comparison of Hadoop with Other Systems
  RDBMS
  Grid Computing
  Volunteer Computing
7 Brief History of Hadoop
8 Apache Hadoop and the Hadoop Ecosystem
UNIT – II: THE HADOOP DISTRIBUTED FILE SYSTEM
1 Design of HDFS
2 Architecture
3 Building Blocks of Hadoop
    1. Name Node 2. Data Node 3. Secondary Name Node
    4. Job Tracker 5. Task Tracker
  HDFS Concepts
  HDFS Federation
  High Availability
4 Basic File System Operations
  Copying a file from the local filesystem to HDFS
  Copying a file from HDFS to the local filesystem
  Creating directories
  Moving files
  Deleting files
  Listing files and directories
5 Anatomy of a File Read
6 Anatomy of a File Write
UNIT – III: INTRODUCTION TO MAP REDUCE
1 A Weather Dataset
  Data Format
2 Analyzing weather data with UNIX tools
3 Analyzing the data with Hadoop
  Map and Reduce
  Java Map Reduce (old and new APIs)
4 Data Flow
  Map Reduce dataflow with a single reduce task
  Map Reduce dataflow with multiple reduce tasks
  Map Reduce dataflow with no reduce tasks
5 Combiner Functions
6 Running a Distributed Map Reduce Job
UNIT – IV: HOW MAP REDUCE WORKS
1 Anatomy of a Map Reduce Job Run
  Job Submission
  Job Initialization
  Task Assignment
  Task Execution
  Progress and Status Updates
  Job Completion
2 Shuffle and Sort
  The Map Side
  The Reduce Side
UNIT – V: PIG
1 Admiring the Pig Architecture
2 Going with the Pig Latin Application Flow
3 Working through the ABCs of Pig Latin
  Uncovering Pig Latin structures
  Looking at Pig data types and syntax
4 Evaluating Local and Distributed Modes of Running Pig Scripts
5 Checking out the Pig Script Interfaces
6 Scripting with Pig Latin
UNIT – VI: HIVE
1 Getting Started with Apache Hive
2 Examining the Hive Clients
  The Hive CLI client
  The web browser as a Hive client
  SQuirreL as a Hive client with the JDBC driver
3 Working with Hive Data Types
4 Creating and Managing Databases and Tables with Hive
  Managing Hive databases
  Creating and managing tables with Hive
5 Seeing How the Hive Data Manipulation Language Works
  LOAD DATA examples
  INSERT examples
  Create Table As Select (CTAS) examples
6 Querying and Analyzing the Data
  Joining tables with Hive
  Improving your Hive queries with indexes
  Windowing in HiveQL
  Other key HiveQL features