Chapter 7: Making The Data Available

1) Explain the communication patterns used to handle data


Communication patterns are methods used to send and receive data between
different systems, devices, or applications in real-time analytics. These patterns
help in transferring the right data at the right time, ensuring fast and accurate
results. Some important patterns are:
1. Data Sync (Data Synchronization):
• It means keeping the data updated and consistent across multiple systems
or devices.
• Used when the same data is needed in different places at the same time.
• Sync can be real-time (instant update) or periodic (update after fixed time).
• Example: Google Drive syncing files across devices.
2. Remote Procedure Call (RPC) and Remote Method Invocation (RMI):
• RPC allows one program to call a function in another program over a
network.
• RMI is similar but used in Java, where one object can call a method in another
object remotely.
• These are useful in client-server applications and distributed systems.
3. Simple Messaging:
• Data is sent in the form of messages (like JSON or XML).
• Works well in systems where one part sends data and another receives it.
• Used in IoT, mobile apps, or notifications.
4. Publish-Subscribe (Pub/Sub):
• In this model, publishers send messages and subscribers receive messages
based on topics.
• They don’t need to know about each other directly.
• Used in live chat apps, news feeds, and event-based systems like Kafka or
MQTT.
2) Explain:
(i) Data Sync
(ii) Remote Method Invocation and Remote Procedure Call
(iii) Simple Messaging
(iv) Publish-Subscribe
(i) Data Sync (Data Synchronization)
Definition:
Data sync means keeping data updated and consistent across multiple systems,
devices, or applications.

Benefits:
• Ensures real-time consistency between devices.
• Helps in collaborative work (e.g., Google Docs).
• Supports offline to online data updates.
Drawbacks:
• Real-time sync can consume more network resources.
• Risk of data conflicts when changes happen at the same time.
• Needs proper error handling and conflict resolution.
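As a concrete illustration, here is a minimal last-write-wins sync sketch in Python. The record structure, the timestamps, and the merge rule are assumptions made for the example, not something prescribed by the chapter; real systems often use more careful conflict resolution.

```python
# A minimal last-write-wins sync sketch: each record carries a timestamp,
# and during sync the newer version of a record overwrites the older one.
from dataclasses import dataclass

@dataclass
class Record:
    key: str
    value: str
    updated_at: float  # Unix timestamp of the last change (example field)

def sync(local: dict, remote: dict) -> None:
    """Merge two stores in place so both end up with the newest version."""
    for key in set(local) | set(remote):
        a, b = local.get(key), remote.get(key)
        if a is None or (b is not None and b.updated_at > a.updated_at):
            local[key] = b          # remote change wins
        elif b is None or a.updated_at > b.updated_at:
            remote[key] = a         # local change wins

# Example: two devices edited different records while offline.
local = {"doc1": Record("doc1", "draft v2", 170.0)}
remote = {"doc1": Record("doc1", "draft v1", 100.0),
          "doc2": Record("doc2", "new file", 150.0)}
sync(local, remote)
print(local["doc1"].value, local["doc2"].value)  # draft v2 new file
```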
(ii) Remote Method Invocation (RMI) and Remote Procedure Call (RPC)
Definition:
• RPC allows a program to call a function on a remote server as if it were local.
• RMI is similar but used in Java, where one object can call another object
remotely.

Benefits:
• Makes distributed computing easier.
• Hides the complexity of network communication.
• Encourages modular programming.
Drawbacks:
• Can be slow due to network delay.
• Security risks if exposed to public networks.
• Error handling is more complex than local calls.
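A small sketch of an RPC call using Python's built-in xmlrpc module. The port number and the add() function are example choices; the point is that the client calls the function as if it were local while the call actually travels over the network.

```python
# A minimal RPC sketch: the client calls add() like a local function,
# while the call is really carried over HTTP to the server.
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def add(a, b):
    return a + b

server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
server.register_function(add, "add")
# Run the server in a background thread so the client below can call it.
threading.Thread(target=server.serve_forever, daemon=True).start()

proxy = ServerProxy("http://localhost:8000/")
print(proxy.add(2, 3))  # prints 5 — the addition happened on the "remote" server
```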
(iii) Simple Messaging
Definition:
Simple messaging is the exchange of data as messages using standard formats (like
JSON, XML) through HTTP or MQTT.
Benefits:
• Easy to implement and understand.
• Works well for lightweight data transfer.
• Good for mobile and IoT applications.
Drawbacks:
• Not suitable for large or real-time data.
• Doesn’t guarantee message delivery unless handled manually.
• Can cause delays if using polling.
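A minimal sketch of simple messaging: a JSON message posted over HTTP using only the standard library. The receiver URL here is hypothetical and would need to be replaced by a real endpoint.

```python
# A minimal simple-messaging sketch: one JSON message sent to an HTTP endpoint.
import json
import urllib.request

message = {"device": "sensor-42", "temperature": 27.5, "unit": "C"}

req = urllib.request.Request(
    "http://example.com/messages",              # hypothetical receiver
    data=json.dumps(message).encode("utf-8"),   # message body as JSON
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req, timeout=5) as resp:
    print("receiver answered with status", resp.status)
```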
(iv) Publish-Subscribe (Pub/Sub)
Definition:
In Pub/Sub, publishers send messages to a central system, and subscribers get
updates based on topics they’re interested in.
Benefits:
• Highly scalable for large systems.
• Loosely coupled – publisher and subscriber don’t need to know each other.
• Supports real-time data distribution.
Drawbacks:
• Complex to implement if there are many topics/subscribers.
• Harder to debug and trace errors.
• Needs a reliable message broker (like Kafka, RabbitMQ).
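A minimal in-memory Pub/Sub sketch in Python. The toy Broker class below stands in for a real broker such as Kafka, RabbitMQ, or an MQTT server, but it shows the core idea: topic-based delivery where publishers and subscribers never know about each other directly.

```python
# A minimal publish-subscribe sketch: the broker keeps a list of subscriber
# callbacks per topic, and publishers only talk to the broker.
from collections import defaultdict

class Broker:
    def __init__(self):
        self._subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self._subscribers[topic]:
            callback(message)                    # deliver to every subscriber

broker = Broker()
broker.subscribe("news", lambda msg: print("reader A got:", msg))
broker.subscribe("news", lambda msg: print("reader B got:", msg))
broker.publish("news", "Chapter 7 notes published")   # both readers receive it
broker.publish("sports", "no one is listening here")  # silently dropped
```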
3) Different protocols used to send data to the client
Protocol     | Direction                  | Connection Type      | Best Use-Case                  | Key Limitation
Webhooks     | One-way (event-triggered)  | Stateless HTTP POST  | Server-to-server notifications | No continuous streaming
Long Polling | One-way (client-initiated) | Semi-persistent HTTP | Basic real-time chat, updates  | High server load
SSE          | One-way (server-initiated) | Persistent HTTP      | Live feeds, notifications      | Text-only, limited mobile support
WebSockets   | Two-way (full-duplex)      | Persistent TCP       | Gaming, live chat, dashboards  | Slightly complex to implement

- Webhooks: Trigger callbacks via HTTP POSTs to client endpoints upon specific
events. Lightweight and good for server-to-server communication but lacks
reliability guarantees.
- HTTP Long Polling: Maintains a semi-persistent connection. The client sends a
request, and the server holds it until data is available or a timeout occurs. It
emulates real-time communication but isn't as efficient.
- Server-Sent Events (SSE): An HTTP-based protocol that enables the server to push
events to the client over a persistent connection. Ideal for one-way communication
like notifications.
- WebSockets: Establishes a full-duplex TCP connection after an HTTP handshake.
Allows bidirectional communication and is optimal for applications requiring
continuous interaction, such as gaming or chat systems.
4) Explain:
(i) Webhooks
(ii) HTTP Long Polling
(iii) Server-Sent Events
(iv) WebSockets
(i) Webhooks
Webhooks are automated HTTP callbacks triggered by specific events in a system.
For example, when a user signs up, a webhook can send that data to another
system. Webhooks are efficient because they only send data when an event
happens, reducing the need for constant checking. However, they require the
receiving system to be available at the time of the event.
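A minimal webhook receiver sketch using Python's standard library. The port and the shape of the event payload are assumptions for illustration; the sending system would POST a JSON body to this endpoint whenever the event occurs.

```python
# A minimal webhook receiver: the sender POSTs a JSON payload when an event fires.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length) or b"{}")
        print("webhook event received:", event)   # e.g. {"event": "user.signup"}
        self.send_response(200)                    # acknowledge quickly
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), WebhookHandler).serve_forever()
```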

(ii) HTTP Long Polling


HTTP Long Polling is a technique where the client requests data from the server,
and the server keeps the request open until new data is available. Once data is sent,
the client immediately sends another request. This simulates real-time
communication but can be inefficient as it uses up server resources with each open
request, especially when handling many clients.
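A minimal long-polling client sketch. The endpoint URL is hypothetical; a real server would hold each request open until new data arrives or a timeout expires, and the client immediately re-asks after every response.

```python
# A minimal long-polling client: keep one request open, then ask again at once.
import urllib.request

POLL_URL = "http://example.com/updates"   # hypothetical long-poll endpoint

def poll_forever():
    while True:
        try:
            # The server is expected to hold this request open until it has data.
            with urllib.request.urlopen(POLL_URL, timeout=30) as resp:
                print("update:", resp.read().decode("utf-8"))
        except OSError:
            pass   # timeout or network error — just issue the next request
        # Loop immediately so no update is missed between requests.
```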
(iii) Server-Sent Events (SSE)
SSE allows the server to push updates to the client over a single HTTP connection.
It’s one-way communication, meaning the server sends updates to the client, but
the client can’t send data back through the same connection. It’s great for real-time
updates like news feeds or stock prices. While simple to implement, it’s limited to
one-way communication.
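A minimal SSE sketch: a tiny Python server that keeps the HTTP response open and pushes "data:" lines to the client. The port, the number of events, and their content are example choices; a browser could consume this with new EventSource("http://localhost:8081/").

```python
# A minimal Server-Sent Events sketch: the server holds one HTTP response
# open and pushes a new "data:" line to the client every second.
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

class SSEHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")  # SSE media type
        self.send_header("Cache-Control", "no-cache")
        self.end_headers()
        for i in range(5):                       # push five events, then stop
            self.wfile.write(f"data: update number {i}\n\n".encode("utf-8"))
            self.wfile.flush()
            time.sleep(1)

if __name__ == "__main__":
    HTTPServer(("localhost", 8081), SSEHandler).serve_forever()
```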
[Figure: SSE data flow with a connected client]

[Figure: SSE showing connectionless data flow]

(iv) WebSockets
WebSockets provide full-duplex communication, meaning both the client and
server can send data to each other in real-time over a single, persistent connection.
It’s ideal for applications like messaging or online gaming where continuous, low-
latency communication is needed. However, it’s more complex to set up compared
to the other methods.
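A small WebSocket echo sketch. It assumes the third-party websockets package is installed (pip install websockets), so treat the exact API as an assumption rather than part of the chapter; the port is an example choice.

```python
# A minimal WebSocket echo server: client and server share one persistent,
# full-duplex connection, so either side can send at any time.
import asyncio
import websockets  # third-party package, assumed installed

async def echo(ws):
    async for message in ws:
        await ws.send(f"server echoes: {message}")   # reply on the same connection

async def main():
    async with websockets.serve(echo, "localhost", 8765):
        await asyncio.Future()      # run until cancelled

if __name__ == "__main__":
    asyncio.run(main())
```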
5) Explain filtering the stream
Filtering a stream means processing incoming data to only pass through the
important information based on certain rules or conditions. This way, only the
useful data reaches the systems that need to process it, helping to reduce
unnecessary load and improve efficiency.
1. Reducing Noise in Data
Streams can often contain irrelevant or noisy data that isn’t useful for
analysis. Filtering removes this "noise," leaving behind only the meaningful
data. For example, if a sensor malfunctions and sends random, out-of-range
readings, filters can remove those to ensure the data is clean and reliable.
2. Improving Performance
By passing only relevant data forward, you reduce the load on downstream
systems. Instead of dealing with a huge amount of unimportant data, the
system can focus on what truly matters, making it run faster and more
efficiently.
3. Focusing on Actionable Data
Filtering helps prioritize data that needs immediate attention. For example,
in an alarm system, you might only want to focus on alerts that exceed a
certain severity threshold. This way, your system can react to what's truly
important, rather than being overwhelmed by every small change.
4. Managing Bandwidth and Storage
In environments with limited resources, such as remote sensors or edge
devices, filtering is essential. By only sending the data that matters, you
save on bandwidth and storage space, making the system more cost-
effective and reducing strain on resources.
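A minimal stream-filtering sketch in Python: a generator passes on only readings that fall inside a valid range, so downstream systems never see the noise. The range limits and the sample values are assumptions made for the example.

```python
# A minimal stream filter: only values inside the valid sensor range move on;
# out-of-range readings (sensor glitches) are dropped as noise.
def filter_stream(readings, low=-40.0, high=125.0):
    """Yield only readings that fall inside the valid sensor range."""
    for value in readings:
        if low <= value <= high:
            yield value

incoming = [21.5, 22.0, 999.0, 23.1, -273.0, 22.8]   # 999.0 and -273.0 are noise
print(list(filter_stream(incoming)))                 # [21.5, 22.0, 23.1, 22.8]
```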
6) Describe where filtering is used
Filtering is a process used to remove unnecessary data and focus on what really
matters. It’s used in various real-world applications to make systems more
efficient and effective. Here’s how filtering is used in different fields:
1. Social Media
On social media, there’s a lot of content being posted every second. Filtering is used
to remove spam or inappropriate posts, like fake news or harmful comments,
ensuring that users only see safe and relevant content.
2. E-Commerce
E-commerce websites have thousands of products. Filtering helps recommend
products to users based on what they like or have purchased before. For example,
if someone often buys electronics, the site will show them more electronics and
filter out unrelated products like clothing.
3. IoT and Sensors
IoT systems collect huge amounts of data from devices like smart thermostats or
security cameras. Filtering helps to focus only on important changes, such as a
temperature spike above 100°C, and ignores normal data, like a thermostat showing
72°F. This way, the system only alerts when something unusual happens.
4. Finance
In finance, filtering helps traders focus on important stock market data. With so
many updates happening every second, filtering helps to find key market
movements or trends that are relevant to a trader’s strategy, ignoring the noise.
5. Log Management
IT systems produce lots of logs, but not all logs are important. Filtering is used to
find critical errors or security issues and ignore less important ones, making it easier
for IT teams to focus on fixing problems quickly.
6. Healthcare
In healthcare, monitoring devices track a patient’s vital signs like heart rate or blood
pressure. Filtering helps focus on abnormal patterns, such as a dangerously high
heart rate, and ignore normal fluctuations. This helps doctors and nurses act quickly
in emergencies.
7) Explain the different types of stream filtering (static and dynamic)
Feature          | Static Filtering      | Dynamic Filtering
Rule Definition  | Pre-defined and fixed | Based on real-time context or input
Flexibility      | Low                   | High
Performance      | High (lightweight)    | Medium to High (depends on complexity)
Adaptability     | Not adaptable         | Adaptable to user, time, or data patterns
Example Use-case | System logs filtering | Personalized content recommendation

In real-time data streaming systems, filtering plays an important role in selecting
only the relevant information from a large flow of data. Without filtering, the
system may become overloaded with unnecessary data, leading to poor
performance and delayed insights.
Filtering helps reduce the volume of data, improves performance, and focuses on
the most meaningful events or messages. There are mainly two types of stream
filtering: Static Filtering and Dynamic Filtering. Both serve different purposes and
are often used together in modern systems.
1. Static Filtering
Static filtering is based on fixed rules or predefined conditions that do not change
once they are set. These rules are usually configured by system administrators or
developers at the design time of the application.
Key Characteristics:
• The filter conditions are hardcoded or set in configuration.
• It does not change dynamically based on the environment or user behavior.
• It is simple to implement and has low overhead.
Advantages:
• Fast and efficient.
• Easy to understand and maintain.
• Works well for consistent, known conditions.
Limitations:
• Lacks flexibility. Cannot adapt to new situations.
• Not suitable for personalized or intelligent filtering.
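A small static-filter sketch: the rule (only ERROR and CRITICAL log lines pass) is hardcoded at design time and never changes while the system runs. The log format is an assumption for illustration.

```python
# A static filter: the rule is fixed in code and does not adapt at runtime.
ALLOWED_LEVELS = {"ERROR", "CRITICAL"}        # hardcoded, pre-defined rule

def static_filter(log_lines):
    for line in log_lines:
        level = line.split(":", 1)[0]         # assume "LEVEL: message" format
        if level in ALLOWED_LEVELS:
            yield line

logs = ["INFO: user logged in", "ERROR: disk full", "DEBUG: cache hit"]
print(list(static_filter(logs)))              # ['ERROR: disk full']
```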
2. Dynamic Filtering
Dynamic filtering is more advanced. It adapts the filtering logic based on changing
data patterns, user input, context, or real-time analysis. It is commonly used in
systems that deal with personalized experiences or complex data environments.
Key Characteristics:
• Filtering rules are calculated or updated at runtime.
• May use external input, machine learning, or user behavior to decide which
data to filter.
• Highly flexible and powerful.
Advantages:
• Highly customizable and intelligent.
• Can adapt to changing environments.
• Enables personalized filtering in apps and dashboards.
Limitations:
• More complex to design and implement.
• May require more resources (CPU, memory).
• Needs careful tuning and real-time evaluation logic.
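A small dynamic-filter sketch: the threshold is recomputed at runtime from a sliding window of recent values, so the rule adapts to the data instead of being fixed in advance. The window size and the spike factor are example choices.

```python
# A dynamic filter: the rule (the threshold) is recalculated at runtime from
# the values seen so far, so it adapts to changing data patterns.
from collections import deque

class DynamicFilter:
    def __init__(self, window=20, factor=1.5):
        self.recent = deque(maxlen=window)   # sliding window of recent values
        self.factor = factor

    def accept(self, value):
        if self.recent:
            baseline = sum(self.recent) / len(self.recent)
            keep = value > baseline * self.factor   # current, data-driven rule
        else:
            keep = False                            # no history yet, just learn
        self.recent.append(value)                   # update the rule for next time
        return keep

f = DynamicFilter()
stream = [10, 11, 10, 12, 30, 11, 10, 45]
print([v for v in stream if f.accept(v)])    # [30, 45] — only the spikes pass
```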
