Data Ingestion Using NiFi: Quick Overview
training@itversity.com
Agenda
• Overview of NiFi
• Understanding NiFi Layout as a service
• Key concepts such as FlowFiles, Attributes, etc.
• Understanding how to access the documentation
• Capabilities of NiFi as a Data Ingestion Tool
• NiFi vs. Traditional ETL Tools
• Role of NiFi in Data Engineering at Scale
• Demo - Simple pipeline to copy files between Local File System and HDFS
Resources
• Code and Documentation will be available in GitHub Repository.
• Videos will be available on YouTube as part of this playlist. They
will be free to stream for a few weeks, after which they will become
members-only (except this one).
[Diagram: clients, web/app servers, files, databases, application
databases, logs, mainframes, IoT device data, and external apps feed a
Data Integration layer (batch or real time), which loads the Data Lake
(S3, ADLS) and BI/DW. NiFi helps with ingestion and basic scheduling.]
NiFi Core Concepts
These are the core NiFi concepts you should be familiar with. Each one
is covered in depth as part of the NiFi Workshop Series.
• Processors
• Process Groups
• FlowFiles
• Attributes
• Controller Services
• NiFi Expression Language
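To make FlowFiles, Attributes, and the Expression Language a little more concrete, here is a toy Python model. It is not NiFi's API: the FlowFile class and the resolve helper are hypothetical names, the "table" attribute and the /datalake path are made up for illustration, and only the plain ${attribute} reference form is handled.

```python
import re
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Toy stand-in for a FlowFile: content bytes plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


def resolve(expression: str, flowfile: FlowFile) -> str:
    """Replace each ${name} reference with the matching attribute value.

    Assumption for this sketch: unknown attributes resolve to an empty string.
    """
    return re.sub(
        r"\$\{([^}]+)\}",
        lambda m: flowfile.attributes.get(m.group(1), ""),
        expression,
    )


# The content travels untouched; only the attributes drive the target path.
ff = FlowFile(b"id,name\n1,alice\n", {"filename": "users.csv", "table": "users"})
target = resolve("/datalake/raw/${table}/${filename}", ff)
print(target)
```

The point of the sketch is the separation: processors can make routing and naming decisions from attributes alone, without opening the content.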
Accessing NiFi Documentation
• The documentation for any processor can be opened by right-clicking
the processor and choosing the usage option from the menu.
Capabilities of NiFi as a Data Ingestion Tool
• Can ingest data from most common sources into the Data Lake.
• Can move data from the Data Lake to downstream systems.
• Can convert file formats while loading data into the Data Lake.
• Can apply most standard row-level transformations incrementally,
using either JOLT or SQL.
• Can also be used to orchestrate and schedule data pipelines.
• However, NiFi may not be the best fit for heavy baseline (initial
bulk) loads, and it is not well suited to complex transformations.
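As a rough illustration of file format conversion combined with a row-level transformation, the following plain-Python sketch converts CSV text to newline-delimited JSON one record at a time. In NiFi this kind of work is configured on processors rather than hand-coded; the function names and the sample columns here are assumptions made for the example.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text, transform=None):
    """Convert CSV text to newline-delimited JSON, one record at a time."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        record = dict(row)
        if transform is not None:
            # Row-level step: sees one record only, never the whole data set.
            record = transform(record)
        lines.append(json.dumps(record))
    return "\n".join(lines)


def normalize(record):
    """Example row-level transformation: cast id to int, upper-case the name."""
    return {"id": int(record["id"]), "name": record["name"].upper()}


sample = "id,name\n1,alice\n2,bob\n"
print(csv_to_json_lines(sample, normalize))
```

Because each record is handled independently, this style fits incremental loads. Joins and aggregations across records are the kind of complex transformation better left to other tools.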
NiFi vs. Traditional ETL Tools
• NiFi is primarily an ingestion tool.
• It works well for extracting data and loading it into the Data Lake
without complex transformations.
• NiFi is very good at moving data between hops because it handles
files rather than manipulating the data inside them.
• NiFi can build simple, generic pipelines that move data between hops
without tying the flow to a schema.
• You can build a very simple flow in minutes to get data from
thousands of files belonging to hundreds of tables into the Data Lake.
You will see that as part of the demo later.
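The idea of a generic, schema-free pipeline can be sketched in plain Python as follows. This is not the demo flow itself: the ingest function, the directory layout (one folder per table), and the file names are assumptions, and temporary local directories stand in for the source system and the Data Lake.

```python
import shutil
import tempfile
from pathlib import Path


def ingest(source_root: Path, lake_root: Path) -> int:
    """Copy every file under source_root to the same relative path in lake_root.

    Files are copied as opaque bytes; nothing is parsed, so no schema is needed.
    """
    copied = 0
    for src in sorted(source_root.rglob("*")):
        if src.is_file():
            dest = lake_root / src.relative_to(source_root)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
            copied += 1
    return copied


with tempfile.TemporaryDirectory() as tmp:
    source, lake = Path(tmp) / "source", Path(tmp) / "lake"
    # Hypothetical layout: one sub-folder per table, any number of files each.
    for table in ("orders", "customers"):
        (source / table).mkdir(parents=True)
        (source / table / "part-00000.csv").write_text("a,b\n1,2\n")
    count = ingest(source, lake)
    tables = sorted(p.name for p in lake.iterdir())
    print(count, tables)
```

The same function handles two tables or two hundred, since the target location is derived from each file's path rather than from its contents.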
Role of NiFi in Data Engineering at Scale
• Get data from databases into the Data Lake.
• Consume data from Kafka topics into the Data Lake.
• Get data from app server log files into the Data Lake (using MiNiFi).
• Get data from the Data Lake onto file servers.
• Get data from an on-prem Data Lake into cloud storage such as S3 or
ADLS.
• Get processed data from the Data Lake into databases or data
warehouses.