BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI
WORK INTEGRATED LEARNING PROGRAMMES
Digital
Part A: Content Design
Course Title Stream Processing and Analytics
Course No(s)
Credit Units 5
Credit Model
Content Authors
Course Objectives
No
CO1 To introduce the framework for real time stream processing
CO2 To present a survey of tools and techniques for real time stream processing
CO3 To introduce various stream processing algorithms
CO4 To introduce approaches to evaluate stream learning algorithms
CO5 To introduce designing solutions to stream processing problems
Text Book(s)
T1 Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data, Byron
Ellis, 2014, Wiley
T2 Knowledge Discovery from Data Streams, João Gama, 2010, Chapman &
Hall/CRC
Reference Book(s) & other resources
R1 Streaming Data: Understanding the Real-Time Pipeline, Andrew G. Psaltis,
2017, Manning Publications
R2 Storm Applied, Sean Allen, Matthew Jankowski, Peter Pathirana, 2015,
Manning Publications
R3 Data Streams: Models and Algorithms, Charu C. Aggarwal, 2007, Springer
R4 Fundamentals of Stream Processing, Henrique C. M. Andrade, Buğra Gedik,
Deepak S. Turaga, 2014, Cambridge University Press
Content Structure
1. Introduction to Real Time Big Data Systems
a. Real Time, Streaming Data & Sources
b. Real time streaming system architecture
c. Characteristics of a Real Time Architecture and Processing
2. Configuration and Coordination Systems
a. Motivation
b. Distributed State and Issues
c. Coordination and Configuration using Apache ZooKeeper
3. Data Flow Management
a. Understanding Distributed Data Flows
i. Various Data Delivery and Processing Requirements
ii. N+1 Problem
b. Apache Kafka (High-Throughput Distributed Messaging)
4. Processing Stream Data
a. Elements of Distributed Stream Data Processing
b. Data Processing with Storm
5. Overview of Data Storage – Requirements
a. Need for long-term storage for a real time processing framework
b. Overview of In-memory Storage
c. NoSQL Storage Systems
d. Choosing the right storage solution
6. Visualizing Data
a. Visualizing Streaming Data – Requirements
b. Tools and Examples
7. Introduction to Stream Processing
a. Bounds of Random Variables, Poisson Processes
b. Maintaining Simple Statistics from Data Streams
c. Sliding Windows and computing statistics over sliding windows
d. Data Synopsis (Sampling, Histograms, Wavelets, DFT)
8. Exact Aggregation
a. Timed Counting and Summation
b. Multi Resolution Time Series Aggregation
c. Stochastic Optimization
9. Statistical Approximation to Streaming Data
a. Probabilities and Distributions
b. Working with Distributions
c. Sampling Procedures for Streaming Data
10. Approximating Streaming Data with Sketching
a. Registers and Hash Functions
b. Working with Sets
c. The Bloom Filter
d. Distinct Value Sketches
e. The Count-Min Sketch
11. Advanced Topics
a. Clustering techniques for Streaming Data – Hierarchical Methods
b. Decision Tree (VFDT)
12. Case Studies in Design
Learning Outcomes:
No Learning Outcomes
LO1 Understand real-time streaming systems and applications
LO2 Understand coordination and configuration systems
LO3 Understand storage and processing systems and use them
LO4 Understand processing methods for streaming data
LO5 Understand aggregation and statistical approximations
LO6 Understand approximation techniques for data streams
Part B: Learning Plan
Academic Term
Course Title Stream Processing and Analytics
Course No
Lead Instructor S P Vimal
Lecture Plan
Lecture # Topics Text / Reference
1 Introduction to Real Time Big Data Systems (Real Time,
Streaming Data & Sources, Real time streaming system
architecture, Characteristics of a Real Time Architecture and
Processing)
2 Configuration and Coordination Systems (Motivation,
Distributed State and Issues, Coordination and Configuration
using Apache ZooKeeper)
3 Data Flow Management (Understanding Distributed Data
Flows, Various Data Delivery and Processing Requirements,
N+1 Problem, Apache Kafka)
4-5 Processing Stream Data (Elements of Distributed Stream
Data Processing, Data Processing with Storm)
6 Overview of Data Storage – Requirements (Need for
long-term storage for a real time processing framework,
Overview of In-memory Storage, NoSQL Storage Systems,
Choosing the right storage solution)
Visualizing Data (Visualizing Streaming Data –
Requirements, Tools and Examples)
7-8 Introduction to Stream Processing (Bounds of Random
variables, Poisson Processes, Maintaining Simple Statistics
from Data Streams, Sliding Windows and computing
statistics over sliding windows, Data Synopsis (Sampling,
Histograms, Wavelets, DFT))
9 Exact Aggregation (Timed Counting and Summation, Multi
Resolution Time Series Aggregation, Stochastic Optimization)
10-11 Statistical Approximation to Streaming Data (Probabilities and
Distributions, Working with Distributions, Sampling
Procedures for Streaming Data)
12 Approximating Streaming Data with Sketching (Registers
and Hash Functions, Working with Sets, The Bloom Filter,
Distinct Value Sketches, The Count-Min Sketch)
13-14 Advanced Topics (Clustering techniques for Streaming Data
– Hierarchical Methods, Decision Tree (VFDT), Fast Pattern
Mining)
15-16 Analytics Case Studies, Review
Evaluation Scheme:
Legend: EC = Evaluation Component; AN = Afternoon Session; FN = Forenoon Session
No Name Type Duration Weight Day, Date, Session, Time
EC-1 Assignment-1, Assignment-2
EC-2 Mid-Semester Test Closed Book
EC-3 Comprehensive Exam Open Book
Notes:
Syllabus for Mid-Semester Test (Closed Book): Topics in Session Nos. 1 to 16 (contact
hours)
Syllabus for Comprehensive Exam (Open Book): All topics (Session Nos. 1 to 32) (contact
hours)
Important links and information:
Elearn portal: https://elearn.bits-pilani.ac.in
Students are expected to visit the Elearn portal on a regular basis and stay up to date with the
latest announcements and deadlines.
Contact sessions: Students should attend the online lectures as per the schedule provided on
the Elearn portal.
Evaluation Guidelines:
1. EC-1 consists of either two Assignments or three Quizzes. Students will attempt them
through the course pages on the Elearn portal. Announcements will be made on the
portal in a timely manner.
2. For Closed Book tests: No books or reference material of any kind will be permitted.
3. For Open Book exams: Use of books and any printed / written reference material
(filed or bound) is permitted. However, loose sheets of paper will not be allowed. Use
of calculators is permitted in all exams. Laptops/Mobiles of any kind are not allowed.
Exchange of any material is not allowed.
4. If a student is unable to appear for the Regular Test/Exam due to genuine exigencies,
the student should follow the procedure to apply for the Make-Up Test/Exam which
will be made available on the Elearn portal. The Make-Up Test/Exam will be
conducted only at selected exam centres on the dates to be announced later.
It shall be the responsibility of the individual student to be regular in maintaining the self-study
schedule as given in the course handout, attend the online lectures, and take all the prescribed
evaluation components such as Assignment/Quiz, Mid-Semester Test and Comprehensive
Exam according to the evaluation scheme provided in the handout.