BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI
WORK INTEGRATED LEARNING PROGRAMMES
Digital
Part A: Content Design
Course Title STREAM PROCESSING AND ANALYTICS
Course No(s) DSECL ZC556
Credit Units 5
Credit Model
Content Authors PRAVIN PAWAR
Course Description
Data is being generated at a very rapid pace, which has created the need for scalable systems capable of
processing and analyzing this fast, streaming data. This course introduces students to
the architecture of streaming data processing systems. It also enables students to understand
a complete end-to-end solution for cost-effective analysis and visualization of streaming data with
the help of the various open-source solutions available in this space. Students will also
learn the implementation and application of the algorithms and data structures required for streaming
applications. Advanced streaming applications such as Streaming SQL and Streaming Machine Learning will
be discussed at length.
Course Objectives
No
CO1 To introduce the applications of streaming data systems
CO2 To introduce the architecture of streaming data systems
CO3 To introduce the algorithmic techniques used in streaming data systems
CO4 To present a survey of tools and techniques required for streaming data analytics
Text Book(s)
T1 Streaming Data: Understanding The Real-Time Pipeline, Andrew G. Psaltis, 2017,
Manning Publications
T2 Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data, Byron
Ellis, 2014, Wiley
Reference Book(s) & other resources
R1 Designing Data-Intensive Applications, Martin Kleppmann, 2017, O’Reilly
R2 Big Data – Principles and best practices of scalable real-time data systems,
Nathan Marz, James Warren, 2015, Manning Publications
Learning Outcomes:
No Learning Outcomes
LO1 Understand the components of streaming data systems with their capabilities and
characteristics
LO2 Learn the relevant architecture and best practices for processing and analysis of
streaming data
LO3 Gain knowledge about the development of systems for data aggregation, delivery
and storage using open-source tools
LO4 Gain familiarity with advanced streaming applications such as Streaming SQL and
Streaming Machine Learning
Part B: Learning Plan
Academic Term II Semester 2019-2020
Course Title STREAM PROCESSING AND ANALYTICS
Course No DSECL ZC556
Lead Instructor Prof. Maninder Singh Bawa
Glossary of Terms
Module M A module is a standalone quantum of designed content. A typical course is
delivered using a string of modules. M2 means Module 2.
Contact Hour CH Contact Hour (CH) stands for an hour-long live session with students
conducted either in a physical classroom or enabled through
technology. In this model of instruction, instructor-led sessions will
be for 32 CH.
Recorded Lecture RL RL stands for Recorded Lecture or Recorded Lesson. It is presented to the
student through an online portal. A given RL unfolds as a sequence of
video segments interleaved with exercises.
Lab Exercises LE Lab exercises associated with various modules
Self-Study SS Specific content assigned for self-study
Homework HW Specific problems/design/lab exercises assigned as homework
Modular Structure
No. Title of the Module
M1 Scalable Streaming Data Systems
M2 Streaming Data Systems Architecture
M3 Streaming Data Frameworks
M4 Streaming Analytics
M5 Advanced Streaming Applications
Detailed Lecture Plan
M1: Scalable Streaming Data Systems
Session 1 to 3 / Contact Hour 1 - 6
Time Type Description/Plan Reference
Session 1 CH1 Thinking about Data Systems R1 Ch1
Reliable, Scalable and Maintainable Data Applications
Properties of Data R2 Ch2
CH2 Scaling with traditional databases R2 Ch1
Big Data Systems
Desired properties of Big Data Systems
Session 2 CH3 Data Model for Big Data R2 Ch2
Generalized Big Data System Architecture Class Notes
CH4 Real-time systems T1 Ch1
Difference between Batch processing and Stream Processing Class Notes
Difference between real-time and streaming systems
Session 3 CH5 Streaming Data Applications Class Notes
Databases and Streams R1 Ch11
Usage patterns of Streaming Data Class Notes
CH6 Sources of Streaming Data T2 Ch1
Complex Event Processing Systems Class Notes
Post CH SS Explore more on the non-functional requirements of Data-Intensive
Applications
Non-functional Requirements for Real World Big Data Systems
IBM Big Data & Analytics RA_V1
Explore more on the differences between the batch processing and
streaming data applications
Batch vs Real time data processing
Identify the use cases of Complex Event Processing Systems
What is stream processing?
complex-event-processing
M2: Streaming Data Systems Architecture
Session 4 to 7 / Contact Hour 7 - 14
Time Type Description/Plan Reference
Session 4 CH7 Generalized Streaming Data Architecture T1 Ch 1
T1 Ch 2
CH8 Lambda Architecture Class Notes
Kappa Architecture
Session 5-6 CH9 Streaming Data System Components T2 Ch2
Features of Real-time Architecture
A real-time architecture checklist
CH 10 Service Configuration and Coordination Systems T2 Ch3
Maintaining the state
Apache ZooKeeper
CH 11 Data Flow Manager T2 Ch4
Managing distributed data flows
CH 12 Apache Kafka T2 Ch4
Kafka Docs
Session 7-8 CH13 Streaming Data Processor Concepts T2 Ch 5
Timing Concepts T1 Ch 5
CH14 Windowing T1 Ch5
Joins R1 Ch11
CH15 Storage for Streaming Data T2 Ch6
NoSQL storage Systems
Choosing a Storage technology
CH16 Delivery of Streaming Metrics T2 Ch7
Post CH SS Explore in detail the issues with the Lambda Architecture
questioning-the-lambda-architecture
a-brief-introduction-to-two-data-processing-architectures
Explore the Java APIs exposed by following systems
Apache ZooKeeper
Apache Kafka
Explore the data models of NoSQL data systems
MongoDB
Cassandra
M3: Streaming Data Frameworks
Session 8 to 11 / Contact Hour 15 - 22
Time Type Description/Plan Reference
Session 8 CH 15 Key features of Streaming Data Frameworks Class Notes
Survey of Streaming Data Systems
CH 16 Apache Spark Streaming Spark Streaming Guide
Session 9 CH 17 Apache Flink Flink Docs
Apache Samza Samza Docs
CH 18 Apache Kafka Streams Kafka Streaming Guide
Session 10 CH 19 Apache Storm Architecture Storm Docs
CH 20 Apache Storm Concepts T2 Ch 5
Apache Storm Groupings
Session 11 CH 21 Apache Storm Running Example Storm Docs
CH 22 Storm – Kafka Integration Example Class Notes
Post CH SS Compare the different streaming data platforms and
identify the use cases for which they are suitable
Implement a streaming data pipeline using the Kafka Streams library Kafka Streaming Guide
Implement a streaming data application with Spark Streaming Spark Streaming Guide
M4: Streaming Analytics
Session 12 to 13 / Contact Hour 23 - 26
Time Type Description/Plan Reference
Session 12 CH 23 Exact Aggregation of Streaming Data T2 Ch 8
Time Series Analysis
CH 24 Quantization Framework T2 Ch8
Stochastic Optimization
Session 13 CH 25 Registers and Hash Functions T2 Ch 10
The Bloom Filter
CH 26 Distinct Value Sketches T2 Ch 10
The Count-Min Sketch
Post CH SS Study illustrations for Streaming data concepts Class Notes
Explore algorithms for aggregation of streaming data
Explore more about the streaming data processing
algorithms for exact results
M5: Advanced Streaming Applications
Session 14 to 15 / Contact Hour 27 - 30
Time Type Description/Plan Reference
Session 14 CH25 Necessity of Streaming SQL Streaming SQL Blog
Streaming SQL: Windows
Streaming SQL: Joins
Streaming SQL: Patterns
CH26 Apache Storm support for Streaming SQL storm-sql
Apache Flink support for Streaming SQL flink-stream-sql
Streaming SQL for Apache Kafka Kafka Streaming SQL
Session 15 CH27 Models for Streaming Data - Linear models T2 Ch 11
Models for Streaming Data - Logistic Regression models
CH 28 Forecasting with Models - Exponential Smoothing methods T2 Ch 11
Forecasting with Models - Regression methods
Session 15 CH 29 Streaming ML Frameworks I structured-streaming-ml
CH 30 Streaming ML Frameworks II
Post CH SS Get familiarized with Streaming SQL tools
storm-sql
Kafka Streaming SQL
Build and deploy machine learning models using Spark Structured Streaming
structured-streaming-ml
Session 16 / Contact Hour 31 - 32
Time Type Description/Plan Reference
Session 16 CH31 Review of Streaming Data Systems and Architectures CH 1 to 16
CH32 Review of Streaming Data Techniques and Applications CH 17 to 32
Evaluation Scheme:
Legend: EC = Evaluation Component; AN = Afternoon Session; FN = Forenoon Session
No Name Type Duration Weight Day, Date, Session, Time
EC-1 Assignment-1 Take-home - 10% TBD
Assignment-2 Programming and use of platforms - 15% TBD
Quiz-1 Online 30 mins 5% TBD
EC-2 Mid-Semester Test Closed Book 2 hours 30% TBD
EC-3 Comprehensive Exam Open Book 3 hours 40% TBD
Notes:
Syllabus for Mid-Semester Test (Closed Book): Topics in Session Nos. 1 to 8 (contact hours 1 to 16)
Syllabus for Comprehensive Exam (Open Book): All topics
Important links and information:
Elearn portal: https://elearn.bits-pilani.ac.in
Students are expected to visit the Elearn portal on a regular basis and stay up to date with the
latest announcements and deadlines.
Contact sessions: Students should attend the online lectures as per the schedule provided on
the Elearn portal.
Evaluation Guidelines:
1. EC-1 consists of either two Assignments or three Quizzes. Students will attempt them
through the course pages on the Elearn portal. Announcements will be made on the
portal, in a timely manner.
2. For Closed Book tests: No books or reference material of any kind will be permitted.
3. For Open Book exams: Use of books and any printed / written reference material
(filed or bound) is permitted. However, loose sheets of paper will not be allowed. Use
of calculators is permitted in all exams. Laptops/Mobiles of any kind are not allowed.
Exchange of any material is not allowed.
4. If a student is unable to appear for the Regular Test/Exam due to genuine exigencies,
the student should follow the procedure to apply for the Make-Up Test/Exam which
will be made available on the Elearn portal. The Make-Up Test/Exam will be
conducted only at selected exam centres on the dates to be announced later.
It shall be the responsibility of the individual student to be regular in maintaining the self-study
schedule as given in the course handout, attend the online lectures, and take all the
prescribed evaluation components such as Assignment/Quiz, Mid-Semester Test and
Comprehensive Exam according to the evaluation scheme provided in the handout.