Apache NiFi
NiFi Architecture:
Repositories in NiFi:
FlowFile Repository: Also known as the Write-Ahead Log.
The FlowFile Repository stores each FlowFile's attributes, a pointer to its
content in the Content Repository, and the current state of the FlowFile.
The FlowFile Repository acts as NiFi’s Write-Ahead Log, so as the
FlowFiles are flowing through the system, each change is logged in
the FlowFile Repository before it happens as a transactional unit of
work. This allows the system to know exactly what step the node is
on when processing a piece of data. If the node goes down while
processing the data, it can easily resume from where it left off upon
restart.
A snapshot is automatically taken periodically by the system,
which creates a new snapshot for each FlowFile. The system
computes a new base checkpoint by serializing each FlowFile in the
hash map and writing it to disk with the filename ".partial". As the
checkpointing proceeds, the new FlowFile baselines are written to
the ".partial" file. Once the checkpointing is done the old
"snapshot" file is deleted and the ".partial" file is renamed
"snapshot".
Content Repository:
The Content Repository is where the actual content bytes of a
given FlowFile live.
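This split is visible in the processor API: updating an attribute only touches FlowFile metadata (tracked by the FlowFile Repository), while writing content produces new bytes in the Content Repository. Below is a minimal sketch of a custom processor's onTrigger method, assuming the processor extends AbstractProcessor and defines a REL_SUCCESS relationship; the attribute name and the text written are placeholders:

import java.nio.charset.StandardCharsets;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.exception.ProcessException;

@Override
public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
    FlowFile flowFile = session.get();
    if (flowFile == null) {
        return; // nothing queued for this processor right now
    }

    // Attribute update: metadata only, recorded in the FlowFile Repository
    flowFile = session.putAttribute(flowFile, "processed.by", "example-processor");

    // Content write: the actual bytes are stored in the Content Repository
    flowFile = session.write(flowFile,
            out -> out.write("hello from NiFi".getBytes(StandardCharsets.UTF_8)));

    session.transfer(flowFile, REL_SUCCESS);
    // When the session commits, all of these changes are logged to the
    // FlowFile Repository's write-ahead log as one transactional unit of work.
}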
Provenance Repository: The Provenance Repository is where all
provenance event data is stored. Every time a FlowFile is modified,
NiFi takes a snapshot of the FlowFile and its context at that point in
time. This snapshot is called a Provenance Event.
Provenance enables us to retrace the lineage of the data and
build the full chain of custody for every piece of information
processed in NiFi. On top of offering the complete lineage of the
data, the Provenance Repository also makes it possible to replay the
content of a FlowFile from any point in its lineage.
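Send and receive events (data crossing NiFi's boundary) are typically reported explicitly by the processor, while most other event types (attribute changes, content modification, routing) are emitted automatically when the session commits. A small sketch of explicit reporting inside a processor's onTrigger, where the transit URI is only an illustrative placeholder:

// After the FlowFile has been delivered to an external system:
session.getProvenanceReporter().send(flowFile, "https://example.com/api/ingest");
session.transfer(flowFile, REL_SUCCESS);
// The resulting SEND event is stored in the Provenance Repository, which is
// what makes lineage queries and later content replay possible.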
The main difference between the FlowFile Repository and the
Provenance Repository is that the FlowFile Repository holds only the
latest state of the FlowFiles currently in the flow, whereas the
Provenance Repository records the complete life cycle of every
FlowFile that has ever been in the flow.
flow.xml.gz:
Everything the DFM puts onto the NiFi User Interface canvas is
written, in real time, to one file called the flow.xml.gz. This file is
located in the nifi/conf directory by default. Any change made on
the canvas is automatically saved to this file. NiFi automatically
creates a backup copy of this file in the archive directory when it
is updated. You can use these archived files to roll back the flow
configuration. To do so, stop NiFi, replace flow.xml.gz with the
desired backup copy, and then restart NiFi.
NiFi is also able to operate within a cluster.
Types of processors:
data ingestion
data transform
data egress/sending data
routing and mediation
database access
attribute extraction
system interaction
splitting and aggregation
HTTP and UDP
AWS processors
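For example, GetFile and ConsumeKafka are data ingestion processors, PutFile and PublishKafka handle data egress, RouteOnAttribute covers routing and mediation, ExecuteSQL provides database access, and ExtractText performs attribute extraction.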
Scaling in NiFi:
For each processor, you can specify the number of concurrent
tasks that should run simultaneously. This way, the Flow Controller
allocates more resources to that processor, increasing its
throughput. Processors share a common pool of threads, so if one
processor uses more threads, fewer threads are available for other
processors to execute.
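For example, a processor configured with 3 concurrent tasks can work on three FlowFiles at the same time, each task drawing a thread from the shared timer-driven thread pool (10 threads by default, adjustable in the controller settings).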
Another way to scale is to increase the number of nodes in your
NiFi cluster. This is called horizontal scaling.
Backpressure in NiFi:
Backpressure is a mechanism whereby, if the quantity of FlowFiles in
a queue goes beyond a threshold, the Flow Controller will not
schedule the upstream processor to run again until there is room in
the queue. The threshold limits are set on the connection (queue)
between the two processors and are based on either the number of
FlowFiles or the total size of the data they contain.
For example, if the Object Threshold is set to 100 and 80 FlowFiles
initially flow from processor 1 to processor 2, the flow continues
because the count is below the threshold. If 30 more FlowFiles then
flow from p-1 to p-2, the queue shows a warning but still accepts the
FlowFiles; however, the Flow Controller will not schedule p-1 again
until enough files have moved from p-2 to the downstream processors
that the number of files in the queue drops back below the Object
Threshold.
The configuration settings for a queue provide two options for
setting the threshold limits: Back Pressure Object Threshold (number
of FlowFiles) and Back Pressure Data Size Threshold (total size of
the queued data).
Load Balancing Strategy (used in a cluster setup):
Round Robin:
It distributes the FlowFiles among all available nodes in a
round-robin fashion. If a particular node is disconnected or
dropped, the FlowFiles queued for that node are redistributed among
the available nodes.
Single Node:
All the FlowFiles are routed to a single node; in case that
particular node goes down, the FlowFiles remain in the queue waiting
for the node to come back up.
Partition by Attribute:
The FlowFiles are distributed based on the value of the attribute
that you specify: FlowFiles with the same value go to the same node.
If the load balancing strategy is set to Partition by Attribute and a
FlowFile does not have that attribute, its value is treated as NULL,
and all FlowFiles missing that attribute are sent to the same node.
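For example, partitioning on a hypothetical customer_id attribute ensures that all FlowFiles carrying the same customer_id value are routed to the same node, which is useful when downstream processing needs to see all of a customer's data together.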
Scheduling strategy:
Timer driven – (e.g., 5 sec, 2 min, 1 day)
Event driven – (experimental feature at this time)
CRON driven – (* * * * * ?) fields: sec min hrs DOM mon DOW [year]
* – matches any value
? – no specific value (used in only one of DOM or DOW)
L – specifies the last occurrence within the month; in the DOW field,
1L means the last Sunday of the month (days are numbered 1-7, Sun-Sat).
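For example, the CRON expression 0 0 13 * * ? triggers the processor at 1:00 PM every day, and 0 20 14 ? * MON-FRI triggers it at 2:20 PM Monday through Friday.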
Penalization:
During the normal course of processing a piece of data (a FlowFile),
an event may occur that indicates that the data cannot be
processed but the data may be processable at a later time. When
this occurs, the Processor may choose to Penalize the FlowFile. This
will prevent the FlowFile from being Processed for some period of
time.
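As a rough sketch of how this looks inside a custom processor's onTrigger (the REL_RETRY relationship, the deliverToRemoteService call, and the exception type are hypothetical placeholders):

FlowFile flowFile = session.get();
if (flowFile == null) {
    return;
}
try {
    // Hypothetical call to an external service that may be temporarily down
    deliverToRemoteService(flowFile);
    session.transfer(flowFile, REL_SUCCESS);
} catch (ServiceTemporarilyUnavailableException e) {
    // The data itself is fine, the destination just is not ready yet:
    // penalize the FlowFile so it is not retried for the processor's
    // configured Penalty Duration (30 seconds by default).
    flowFile = session.penalize(flowFile);
    session.transfer(flowFile, REL_RETRY);
}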
Yield:
A Processor may determine that some situation exists such that the
Processor can no longer make any progress, regardless of the data
that it is processing. In such cases, the Processor should yield for
some period of time. By doing this, the Processor is telling the
framework that it should not waste resources triggering this
Processor to run, because there is nothing it can do at the moment;
it is better to use those resources to allow other Processors to run.
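A small sketch of the corresponding call, assuming the reachability check below stands in for whatever condition prevents any progress:

@Override
public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
    // Hypothetical check: if the remote system is unreachable, no FlowFile
    // can be processed right now, no matter which one is picked up.
    if (!isRemoteSystemReachable()) {
        // Tell the framework to stop scheduling this processor for the
        // configured Yield Duration instead of busy-spinning.
        context.yield();
        return;
    }
    // ... normal processing ...
}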