Data Ingestion Using NiFi: Quick Overview

This document provides an overview of Apache NiFi and its role in data ingestion. It outlines the agenda which includes an overview of NiFi, understanding its core concepts and capabilities for data ingestion, how it compares to traditional ETL tools, and its role in large-scale data engineering. It also references resources for code, documentation and videos. A demo of a simple NiFi pipeline to copy files from the local file system to HDFS is promised.

Uploaded by

dtanonimo

Data Ingestion using NiFi

Quick Overview

training@itversity.com
Agenda
• Overview of NiFi
• Understanding NiFi Layout as a service
• Key concepts such as FlowFiles, attributes, etc.
• Understanding how to access the documentation
• Capabilities of NiFi as a Data Ingestion Tool
• NiFi vs. Traditional ETL Tools
• Role of NiFi in Data Engineering at Scale
• Demo - Simple pipeline to copy files from the local file system to HDFS
Resources
• Code and documentation will be available in the GitHub repository.
• Videos will be available on YouTube as part of this playlist. They will
be streamed for free and will remain free for a few weeks, after which
they will become members-only (except this one).

Overview of NiFi
[Figure: Traditional application architecture. Components shown: clients, firewalls, switches, web/app servers, database.]
[Figure: Data Integration (batch or real time). Components shown: web/app servers, databases, files, BI/DW, external apps.]
• For batch, get data from databases by querying them.
• Batch tools: Informatica, Ab Initio, etc.
• For real time, get data from web server logs or database logs.
• Real-time tools: GoldenGate to get data from database logs, Kafka to
get data from web server logs.
[Figure: Modern Large Scale Data Engineering Architecture. Sources shown: files, databases, application logs, mainframes, IoT device data, external apps. Also shown: ingestion into the Data Lake (S3, ADLS), Data Processing (EMR, Databricks, Docker), and ingestion into BI/DW.]
• NiFi helps in ingestion and basic scheduling.


Understanding NiFi as a service
• NiFi is a data ingestion tool that is typically configured on edge
nodes or client nodes.
• It can be configured on multiple nodes as a cluster for high
availability (HA), fault tolerance, and load balancing.
• It can be integrated with Kerberos for security.
• NiFi is an external service and requires configuration to integrate with
data engineering tools such as Spark, Kafka, and Hadoop.
• NiFi is provided as one of the key services in the
Cloudera/Hortonworks distributions.

NiFi Core Concepts
Here are the core concepts of NiFi that one should be familiar with. Each
of them is covered while exploring NiFi in depth as part of the NiFi
Workshop Series.
• Processors
• Process Groups
• FlowFiles
• Attributes
• Controller Services
• NiFi Expression Language
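
To make these terms concrete, here is a toy Python sketch that models a FlowFile as content plus attributes, and a processor as a function that takes a FlowFile and returns a new one. It is an illustration only and not NiFi's API: the names `FlowFile`, `update_attribute`, and `evaluate` are hypothetical, and `evaluate` handles only the plain `${name}` form of the Expression Language.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FlowFile:
    """Toy stand-in for a NiFi FlowFile: a payload plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


def update_attribute(flowfile: FlowFile, **new_attributes: str) -> FlowFile:
    """Toy 'processor': returns a copy with extra attributes, content untouched."""
    merged = {**flowfile.attributes, **new_attributes}
    return FlowFile(content=flowfile.content, attributes=merged)


def evaluate(expression: str, flowfile: FlowFile) -> str:
    """Resolve plain ${name} references against the FlowFile's attributes.

    Unknown attributes resolve to an empty string in this sketch.
    """
    return re.sub(
        r"\$\{([A-Za-z0-9_.\-]+)\}",
        lambda m: flowfile.attributes.get(m.group(1), ""),
        expression,
    )


# Hypothetical sample data: a small CSV payload tagged with its file name.
ff = FlowFile(content=b"id,name\n1,alice\n", attributes={"filename": "orders.csv"})
ff = update_attribute(ff, table="orders")
print(evaluate("/data/landing/${table}/${filename}", ff))
```

The point of the sketch is that attributes travel with the content, so later steps can make routing decisions from attributes alone without reading the payload.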
Accessing NiFi Documentation
• Documentation for any processor can be opened from the canvas:
right-click the processor and choose the usage option in the context menu.

Capabilities of NiFi as a Data Ingestion Tool
• NiFi can consume data from most sources into the Data Lake.
• It can move data from the Data Lake to downstream systems.
• File format conversion can be handled while loading data into the
Data Lake.
• NiFi can apply almost all standard row-level transformations
incrementally, using either JOLT or SQL.
• NiFi can also be used to orchestrate and schedule data pipelines.
• However, NiFi might not be the most appropriate tool for loading large
volumes of data as a baseline, and it is not well suited to complex
transformations.
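
As a rough illustration of what "format conversion with a row-level transformation" means, the sketch below converts CSV text to newline-delimited JSON and applies a rule to each row. It is plain Python and does not use NiFi; the function names, the sample columns, and the `normalize` rule are all hypothetical.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text: str, transform=lambda row: row) -> str:
    """Convert CSV text to newline-delimited JSON, applying `transform` to each row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(json.dumps(transform(dict(row))) for row in reader)


def normalize(row: dict) -> dict:
    """Hypothetical row-level rule: trim whitespace and cast `amount` to float."""
    cleaned = {key: value.strip() for key, value in row.items()}
    cleaned["amount"] = float(cleaned["amount"])
    return cleaned


# Hypothetical sample input with untidy whitespace in one field.
sample = "order_id,customer,amount\n1, alice ,10.50\n2,bob,3\n"
print(csv_to_json_lines(sample, normalize))
```

Each row is handled independently of every other row, which is why this kind of transformation fits an incremental, record-at-a-time flow, while joins or aggregations across large datasets do not.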

NiFi vs. Traditional ETL Tools
• NiFi is primarily an ingestion tool.
• It works well for extracting data and loading it into the Data Lake
without complex transformations.
• NiFi is very good at moving data between hops because it deals with
files rather than manipulating the data inside them.
• NiFi can build simple, generic pipelines that move data between hops
without tying the flow to a schema.
• You can build a very simple flow in minutes to get data from
thousands of files belonging to hundreds of tables into the Data Lake. You
will see that as part of the demo later.
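
The idea of one generic flow serving many tables can be sketched as follows: the target location is derived from each file's path, and the content is never parsed. This is a plain Python illustration, not NiFi configuration. The layout `<source_root>/<table>/<file>`, the example paths, and the function name `target_path` are assumptions made for the example.

```python
from pathlib import PurePosixPath


def target_path(source_file: str, source_root: str, lake_root: str) -> str:
    """Map <source_root>/<table>/<file> to <lake_root>/<table>/<file>.

    Only the path is inspected; the file content and its schema are ignored.
    """
    relative = PurePosixPath(source_file).relative_to(source_root)
    return str(PurePosixPath(lake_root) / relative)


# Hypothetical export directory with one sub-directory per table.
files = [
    "/data/export/orders/part-00000.csv",
    "/data/export/customers/part-00000.csv",
]
for f in files:
    print(f, "->", target_path(f, "/data/export", "/lake/raw"))
```

Because nothing in the mapping depends on column names or types, adding a new table needs no change to the flow.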

Role of NiFi in Data Engineering at Scale
• Get data from databases into the data lake.
• Consume data from Kafka topics into the data lake.
• Get data from app server log files into the data lake (using MiNiFi).
• Get data from the data lake into file servers.
• Get data from an on-prem data lake into cloud storage such as S3 or ADLS.
• Get processed data from the data lake into databases or data
warehouses.

[Figure: Modern Large Scale Data Engineering Architecture. Components shown: files, databases, application logs, mainframes, IoT device data, external apps, Data Lake, BI/DW.]


NiFi Demo – Simple Data Pipeline
• Build a simple pipeline to get files from the local file system into HDFS.
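
For comparison, the same copy can be scripted outside NiFi with the Hadoop command-line client. The sketch below only builds the `hdfs dfs -put` commands and prints them; it does not execute anything. The function name `build_put_commands` and the target directory `/user/training/landing` are hypothetical, and running the printed commands assumes the `hdfs` client is installed and on the PATH.

```python
import tempfile
from pathlib import Path


def build_put_commands(local_dir: str, hdfs_dir: str) -> list:
    """Return one `hdfs dfs -put -f` command per regular file in local_dir."""
    commands = []
    for path in sorted(Path(local_dir).iterdir()):
        if path.is_file():  # skip sub-directories in this simple sketch
            commands.append(["hdfs", "dfs", "-put", "-f", str(path), hdfs_dir])
    return commands


# Demonstrate against a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "orders.csv").write_text("id\n1\n")
    (Path(tmp) / "customers.csv").write_text("id\n7\n")
    for command in build_put_commands(tmp, "/user/training/landing"):
        print(" ".join(command))
```

A script like this runs once and stops, whereas a NiFi flow keeps watching for new files on a schedule, which is the scheduling role described earlier in the deck.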
