Acquire Data Using CLI and Flume
Course Road Map
Module 1: Big Data Management System
Module 2: Data Acquisition and Storage
– Lesson 5: Introduction to the Hadoop Distributed File System (HDFS)
– Lesson 6: Acquire Data Using CLI, Fuse-DFS, and Flume
– Lesson 7: Acquire and Access Data Using Oracle NoSQL Database
– Lesson 8: Primary Administrative Tasks for Oracle NoSQL Database
Module 3: Data Access and Processing
Module 4: Data Unification and Analysis
Module 5: Using and Managing Oracle Big Data Appliance
Objectives
After completing this lesson, you should be able to:
• Describe uses of the Command Line Interface (CLI)
• Define Flume
• Describe the data-flow mechanism of Flume
• Identify the options for configuring Flume
Viewing File System Contents Using the CLI
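A minimal sketch of the same task from a script, assuming the Hadoop client is installed and on `PATH`. `hadoop fs -ls` and `hadoop fs -du -h` are standard HDFS shell commands; the function name `show_hdfs_dir` and the default directory `/user/oracle` are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: inspect a directory in HDFS from the command line.
# Assumptions: the hadoop client is installed and on PATH;
# show_hdfs_dir and the default /user/oracle path are hypothetical.

show_hdfs_dir() {
  dir="${1:-/user/oracle}"   # HDFS directory to inspect (hypothetical default)

  # Fail early with the conventional "command not found" status.
  if ! command -v hadoop >/dev/null 2>&1; then
    echo "hadoop client not found on PATH" >&2
    return 127
  fi

  hadoop fs -ls "$dir"       # list files and subdirectories
  hadoop fs -du -h "$dir"    # space used by each entry, human-readable
}
```

For example, `show_hdfs_dir /user/oracle` lists that directory. To look inside a file, `hadoop fs -cat <file> | head` prints its first lines and `hadoop fs -tail <file>` prints its last kilobyte.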
Loading Data Using the CLI
Put files into HDFS:
$ hadoop fs -put *site.xml /u01/bigdatasql_config/bigdatalite
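The same upload can be wrapped in a small guarded function that creates the target directory, copies every matching `*site.xml` file, and lists the result. This is a sketch under assumptions: the Hadoop client is on `PATH`, and the function name `load_site_files` and any directories passed to it are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: copy local *site.xml files into an HDFS directory.
# Assumptions: hadoop is on PATH; load_site_files and the directories
# passed to it are hypothetical placeholders.

load_site_files() {
  src_dir="$1"    # local directory containing the *site.xml files
  hdfs_dir="$2"   # HDFS directory to copy them into

  # Expand the glob into the positional parameters; if nothing matches,
  # the pattern stays literal and the -e test fails.
  set -- "$src_dir"/*site.xml
  if [ ! -e "$1" ]; then
    echo "no *site.xml files found in $src_dir" >&2
    return 1
  fi

  if ! command -v hadoop >/dev/null 2>&1; then
    echo "hadoop client not found on PATH" >&2
    return 127
  fi

  # Create the target (and parents), upload all matches, then confirm.
  hadoop fs -mkdir -p "$hdfs_dir" &&
    hadoop fs -put "$@" "$hdfs_dir" &&
    hadoop fs -ls "$hdfs_dir"
}
```

For example, `load_site_files /path/to/conf /user/oracle/site-conf` (both paths are placeholders). The function returns early, without touching HDFS, when no `*site.xml` file matches or the client is missing.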
What is Flume?
• Is a distributed, reliable service for collecting, aggregating, and
moving large amounts of data (typically log data) to a centralized data store
• Is an Apache Software Foundation project
• Has the following features:
– Simple
– Reliable
– Fault tolerant and highly available
– Extensible data model that supports online analytic applications
Flume: Architecture
Diagram: Web Server → Agent (Source → Channel → Sink) → HDFS
A sink is responsible for delivering the event to the next agent or to the
terminal repository (such as HDFS) in the flow. Sink types include:
• logger
• avro
• hdfs
• file_roll
• org.apache.flume.sink.kafka.KafkaSink
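The source → channel → sink wiring in the diagram maps directly onto an agent's properties file. The script below writes a minimal sketch of such a file: an `exec` source tails a web server log, a `memory` channel buffers events, and an `hdfs` sink writes them out. The agent name `a1`, the component names `r1`/`c1`/`k1`, the log path, and the NameNode URI are hypothetical; the property keys follow the Flume NG user guide.

```shell
#!/bin/sh
# Sketch: write a single-agent Flume config (web server log -> HDFS).
# Assumptions: agent/component names, log path, and NameNode URI are hypothetical.
mkdir -p flume-demo

cat > flume-demo/weblog-agent.conf <<'EOF'
# Name the components of agent a1
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: tail the web server access log (hypothetical path)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/httpd/access_log
a1.sources.r1.channels = c1

# Channel: buffer events in memory
a1.channels.c1.type = memory

# Sink: write events to HDFS as plain text (hypothetical NameNode URI)
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/weblogs
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.channel = c1
EOF
```

A source lists its `channels` (plural, because one source can fan out to several channels), whereas a sink names exactly one `channel`. Sending events elsewhere is mostly a matter of changing `a1.sinks.k1.type` to another sink type and supplying that sink's own required properties.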
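Flume Channels (Hold Events)
A channel buffers events between a source and a sink until the sink takes them.
• Memory channel: stores events in an in-memory queue (fast, but events are lost if the agent process fails)
• JDBC channel: stores events in a database (embedded Derby)
• File channel: stores events on local disk (durable across agent restarts)
• Custom channel: your own implementation of the Channel interface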
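Choosing a channel is a per-agent trade-off between speed and durability. As a sketch, the fragment below declares a memory channel and a file channel side by side; in a real agent, only the channel that the source and sink are bound to would be used. The channel names, the sizing values, and the `/var/lib/flume` directories are hypothetical; `capacity`, `transactionCapacity`, `checkpointDir`, and `dataDirs` are documented Flume channel properties.

```shell
#!/bin/sh
# Sketch: memory vs. file channel definitions for agent a1.
# Assumptions: names c_mem / c_file, the sizes, and the /var/lib/flume
# paths are hypothetical placeholders.
mkdir -p flume-demo

cat > flume-demo/channels.conf <<'EOF'
a1.channels = c_mem c_file

# Memory channel: events held in RAM (fast, not durable)
a1.channels.c_mem.type = memory
a1.channels.c_mem.capacity = 10000
a1.channels.c_mem.transactionCapacity = 100

# File channel: events written to local disk (survives agent restarts)
a1.channels.c_file.type = file
a1.channels.c_file.checkpointDir = /var/lib/flume/checkpoint
a1.channels.c_file.dataDirs = /var/lib/flume/data
EOF
```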
Flume: Data Flows
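Diagram: a three-tier flow, Agent → Processor → Collector
1. Agent (agent node): the source tails the Apache HTTPD logs; the sink sends
events downstream to the processor node.
2. Processor (processor node): the source receives events from the upstream
agent node; a regex extractor pulls the browser name from the log string and
attaches it to the event; the sink sends events downstream to the collector node.
3. Collector (collector node): the source receives events from the upstream
processor node; the sink writes them to HDFS at
hdfs://namenode/weblogs/%{browser}/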
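The three tiers can be sketched as three named agents in one properties file, chained with Avro sinks and Avro sources. The processor uses Flume's `regex_extractor` interceptor to copy the browser name into an event header called `browser`, which the collector's HDFS path then references as `%{browser}`. All agent and component names, host names, ports, the log path, and the regex are hypothetical.

```shell
#!/bin/sh
# Sketch: agent -> processor -> collector, as three agents in one file.
# Assumptions: names, hosts, ports, log path, and regex are hypothetical.
mkdir -p flume-demo

cat > flume-demo/weblog-tiers.conf <<'EOF'
# Tier 1: agent node tails HTTPD logs and forwards them to the processor
agent.sources = tail
agent.channels = mem
agent.sinks = toProc
agent.sources.tail.type = exec
agent.sources.tail.command = tail -F /var/log/httpd/access_log
agent.sources.tail.channels = mem
agent.channels.mem.type = memory
agent.sinks.toProc.type = avro
agent.sinks.toProc.hostname = processor-host
agent.sinks.toProc.port = 4141
agent.sinks.toProc.channel = mem

# Tier 2: processor extracts the browser name into a "browser" header
proc.sources = fromAgent
proc.channels = mem
proc.sinks = toColl
proc.sources.fromAgent.type = avro
proc.sources.fromAgent.bind = 0.0.0.0
proc.sources.fromAgent.port = 4141
proc.sources.fromAgent.channels = mem
proc.sources.fromAgent.interceptors = browser
proc.sources.fromAgent.interceptors.browser.type = regex_extractor
proc.sources.fromAgent.interceptors.browser.regex = (Firefox|Chrome|Safari|MSIE)
proc.sources.fromAgent.interceptors.browser.serializers = s1
proc.sources.fromAgent.interceptors.browser.serializers.s1.name = browser
proc.channels.mem.type = memory
proc.sinks.toColl.type = avro
proc.sinks.toColl.hostname = collector-host
proc.sinks.toColl.port = 4242
proc.sinks.toColl.channel = mem

# Tier 3: collector writes each browser's events to its own HDFS directory
coll.sources = fromProc
coll.channels = mem
coll.sinks = hdfsOut
coll.sources.fromProc.type = avro
coll.sources.fromProc.bind = 0.0.0.0
coll.sources.fromProc.port = 4242
coll.sources.fromProc.channels = mem
coll.channels.mem.type = memory
coll.sinks.hdfsOut.type = hdfs
coll.sinks.hdfsOut.hdfs.path = hdfs://namenode/weblogs/%{browser}/
coll.sinks.hdfsOut.hdfs.fileType = DataStream
coll.sinks.hdfsOut.channel = mem
EOF
```

Each tier runs as its own `flume-ng agent` process, selected with `--name agent`, `--name proc`, or `--name coll`; the Avro sink of one tier and the Avro source of the next must agree on host and port. The regex is a deliberately simplistic placeholder, not a robust user-agent parser.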
Configuring Flume
1. Create a configuration file (flume.conf).
2. Store the file in the flume-ng/conf directory.
3. Configure the individual components (sources, channels, and sinks) in the file.
4. (Optional) Edit flume-env.sh.
5. Verify the installation by running the following command:
$ flume-ng help
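The five steps above can be sketched as a short script. This assumes a stand-in directory `flume-demo/flume-ng/conf` in place of a real `flume-ng/conf` directory, a hypothetical agent named `a1`, and illustrative JVM heap values; the netcat → memory → logger wiring is only a smoke test.

```shell
#!/bin/sh
# Sketch of the configuration steps for one agent named a1.
# Assumptions: flume-demo/flume-ng/conf stands in for a real flume-ng/conf
# directory; the netcat -> memory -> logger wiring is only a smoke test.
CONF_DIR=flume-demo/flume-ng/conf
mkdir -p "$CONF_DIR"

# Steps 1-3: create flume.conf in the conf directory and configure
# each component (source r1, channel c1, sink k1).
cat > "$CONF_DIR/flume.conf" <<'EOF'
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
EOF

# Step 4 (optional): flume-env.sh is where JVM options such as
# heap size are commonly set (values here are illustrative).
cat > "$CONF_DIR/flume-env.sh" <<'EOF'
export JAVA_OPTS="-Xms100m -Xmx512m"
EOF

# Step 5: verify the installation if the flume-ng launcher is on PATH.
if command -v flume-ng >/dev/null 2>&1; then
  flume-ng help
fi
```

With the launcher available, the agent is started by name, for example `flume-ng agent --conf flume-demo/flume-ng/conf --conf-file flume-demo/flume-ng/conf/flume.conf --name a1 -Dflume.root.logger=INFO,console`; lines typed into `nc localhost 44444` should then appear in the console log.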
Exploring a flume*.conf File
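Most keys in a flume*.conf file follow the pattern `<agent>.<sources|channels|sinks>.<name>.<property>`, with one top-level line per component kind declaring the names an agent uses. A small helper that pulls out those declarations makes the structure easy to explore. It is a hypothetical sketch (`list_flume_components` is not part of Flume) that only understands the simple `key = value` layout.

```shell
#!/bin/sh
# Sketch: list the components an agent declares in a Flume properties file.
# Assumption: list_flume_components is a hypothetical helper that reads only
# the top-level "<agent>.sources/channels/sinks = ..." lines.

list_flume_components() {
  conf="$1"    # path to a flume*.conf file
  agent="$2"   # agent name, as passed to flume-ng --name
  for kind in sources channels sinks; do
    # Print only the value to the right of "=" on the declaration line.
    names=$(sed -n "s/^$agent\.$kind *= *//p" "$conf")
    echo "$kind: $names"
  done
}
```

For example, `list_flume_components conf/flume.conf a1` (a placeholder path) prints one line each for the agent's sources, channels, and sinks; the per-component `type` and other property lines can then be read with `grep "^a1.sources.r1."` and similar patterns.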
Additional Resources
• http://flume.apache.org/index.html
Summary
In this lesson, you should have learned to:
• Describe uses of the Command Line Interface (CLI)
• Define Flume
• Describe the data-flow mechanism of Flume
• Identify the options for configuring Flume