
06 - Acquire Data Using CLI and Flume

This document discusses acquiring data using the command line interface (CLI) and Flume. It defines Flume as a distributed service for collecting, aggregating, and moving large data to centralized storage. Flume uses agents containing sources, sinks, and channels to transfer event data from web servers to HDFS in a reliable, fault-tolerant manner. The document demonstrates configuring and using the CLI to load files into HDFS, and provides an example Flume configuration and data flow.

Uploaded by

Jose Evanan

Acquire Data Using CLI and Flume

Course Road Map

Module 1: Big Data Management System
Module 2: Data Acquisition and Storage
Module 3: Data Access and Processing
Module 4: Data Unification and Analysis
Module 5: Using and Managing Oracle Big Data Appliance

Lesson 5: Introduction to the Hadoop Distributed File System (HDFS)
Lesson 6: Acquire Data using CLI, Fuse-DFS, and Flume
Lesson 07: Acquire and Access Data Using Oracle NoSQL Database
Lesson 08: Primary Administrative Tasks for Oracle NoSQL Database

Objectives

After completing this lesson, you should be able to:

• Describe uses of the command-line interface (CLI)
• Define Flume
• Describe the data-flow mechanism of Flume
• Identify the options for configuring Flume

Viewing File System Contents Using the CLI
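
The slide for this topic carried only a screenshot, so the following is a minimal sketch of the kind of commands it refers to. The directory /user/oracle is a placeholder chosen for illustration, not a path taken from the lesson, and the script skips the calls when no hadoop client is on the PATH.

```shell
#!/bin/sh
# Sketch: inspect HDFS contents from the command line.
# HDFS_DIR is a hypothetical directory; substitute one that exists on your cluster.
HDFS_DIR="${HDFS_DIR:-/user/oracle}"

if command -v hadoop >/dev/null 2>&1; then
    hadoop fs -ls "$HDFS_DIR"        # list the directory
    hadoop fs -ls -R "$HDFS_DIR"     # list it recursively
    hadoop fs -du -h "$HDFS_DIR"     # show sizes in human-readable units
else
    echo "hadoop not found on PATH; skipping HDFS listing"
fi
```

To view a file's contents rather than a listing, `hadoop fs -cat <path>` and `hadoop fs -tail <path>` work the same way.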

Loading Data Using the CLI

Put files into HDFS:

$ hadoop fs -put ?site.xml
*site.xml /u01/bigdatasql_config/bigdatalite
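
As a rough sketch of a complete load, the script below creates a target directory, copies local *site.xml files into it, and lists the result. Both LOCAL_DIR and HDFS_TARGET are hypothetical values introduced for this example; they are not the paths used in the lesson.

```shell
#!/bin/sh
# Sketch: copy local configuration files into HDFS and verify the upload.
# LOCAL_DIR and HDFS_TARGET are assumptions; adjust both for your environment.
LOCAL_DIR="${LOCAL_DIR:-.}"
HDFS_TARGET="${HDFS_TARGET:-/user/oracle/config}"

if command -v hadoop >/dev/null 2>&1; then
    hadoop fs -mkdir -p "$HDFS_TARGET"                     # create the target directory if missing
    hadoop fs -put "$LOCAL_DIR"/*site.xml "$HDFS_TARGET"   # upload every matching local file
    hadoop fs -ls "$HDFS_TARGET"                           # confirm the files arrived
else
    echo "hadoop not found on PATH; skipping upload"
fi
```

Note that `-put` refuses to overwrite a file that already exists at the destination; adding `-f` forces the overwrite.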

What is Flume?

• Is a distributed service for collecting, aggregating, and moving large amounts of data to a centralized data store
• Was developed by Apache
• Has the following features:
– Simple
– Reliable
– Fault tolerant and highly available
– Used for online analytic applications

Flume: Architecture

A sink is responsible for delivering the event to the next agent or terminal repository (like HDFS) in the flow.

Sink types:
• Logger
• Avro
• Hdfs
• file_roll
• org.apache.flume.sink.kafka.KafkaSink

[Diagram: Web Server -> Agent (Source -> Channel -> Sink) -> HDFS]

Flume Channels (Hold Events)

• Memory channel
• JDBC channel
• File channel
• Custom channel

[Diagram: Web Server -> Agent (Source -> Channel -> Sink) -> HDFS]
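
To make the difference between channel types concrete, here is a small sketch that writes a configuration fragment with one memory channel and one file channel. The agent name a1, the channel names c1 and c2, the capacities, and the /var/flume directories are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: two alternative channel definitions for a hypothetical agent "a1".
CHANNEL_CONF="$(mktemp -d)/channels.conf"

cat > "$CHANNEL_CONF" <<'EOF'
a1.channels = c1 c2

# Memory channel: fast, but buffered events are lost if the agent process dies
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000

# File channel: events are written to local disk and survive an agent restart
a1.channels.c2.type = file
a1.channels.c2.checkpointDir = /var/flume/checkpoint
a1.channels.c2.dataDirs = /var/flume/data
EOF

cat "$CHANNEL_CONF"
```

The choice is a trade-off between throughput and durability: the memory channel suits data that can tolerate loss, while the file channel suits data that cannot.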

Flume: Data Flows

1. Agent
Source: Tail Apache HTTPD logs
Sink: Downstream processor node

2. Processor
Source: Upstream agent node
Regex: Extract browser name from log string and attach it to event
Sink: Downstream collector node

3. Collector
Source: Upstream processor node
Sink: HDFS://namenode/weblogs/%(browser)/

[Diagram: HTTPD -> Agent -> Processor -> Collector -> HDFS]
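
One way to express the three tiers above in current Flume properties syntax is sketched below; it is an illustration under stated assumptions, not the configuration from the lesson. The host names, ports, log path, and the deliberately crude browser regex are hypothetical. Current Flume releases reference event headers in the HDFS path as %{browser}, whereas the slide writes %(browser).

```shell
#!/bin/sh
# Sketch: agent -> processor -> collector tiers linked by Avro hops.
# Host names, ports, the log path, and the regex are illustrative assumptions.
TIERS_CONF="$(mktemp -d)/tiers.conf"

cat > "$TIERS_CONF" <<'EOF'
# Tier 1 (agent): tail the web server log and forward events downstream
agent.sources = tail
agent.channels = mem
agent.sinks = toProcessor
agent.sources.tail.type = exec
agent.sources.tail.command = tail -F /var/log/httpd/access_log
agent.sources.tail.channels = mem
agent.channels.mem.type = memory
agent.sinks.toProcessor.type = avro
agent.sinks.toProcessor.hostname = processor-host
agent.sinks.toProcessor.port = 4141
agent.sinks.toProcessor.channel = mem

# Tier 2 (processor): extract a browser name and attach it as an event header
processor.sources = fromAgent
processor.channels = mem
processor.sinks = toCollector
processor.sources.fromAgent.type = avro
processor.sources.fromAgent.bind = 0.0.0.0
processor.sources.fromAgent.port = 4141
processor.sources.fromAgent.channels = mem
processor.sources.fromAgent.interceptors = browser
processor.sources.fromAgent.interceptors.browser.type = regex_extractor
processor.sources.fromAgent.interceptors.browser.regex = (Firefox|Chrome|Safari|MSIE)
processor.sources.fromAgent.interceptors.browser.serializers = s1
processor.sources.fromAgent.interceptors.browser.serializers.s1.name = browser
processor.channels.mem.type = memory
processor.sinks.toCollector.type = avro
processor.sinks.toCollector.hostname = collector-host
processor.sinks.toCollector.port = 4142
processor.sinks.toCollector.channel = mem

# Tier 3 (collector): write events to HDFS, bucketed by the browser header
collector.sources = fromProcessor
collector.channels = mem
collector.sinks = toHdfs
collector.sources.fromProcessor.type = avro
collector.sources.fromProcessor.bind = 0.0.0.0
collector.sources.fromProcessor.port = 4142
collector.sources.fromProcessor.channels = mem
collector.channels.mem.type = memory
collector.sinks.toHdfs.type = hdfs
collector.sinks.toHdfs.hdfs.path = hdfs://namenode/weblogs/%{browser}/
collector.sinks.toHdfs.hdfs.fileType = DataStream
collector.sinks.toHdfs.channel = mem
EOF

echo "Wrote $TIERS_CONF"
echo "Start each tier on its own host with: flume-ng agent --conf-file $TIERS_CONF --name <agent|processor|collector>"
```

Log lines that match none of the regex alternatives carry no browser header, so a real deployment would need a fallback for them.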

Configuring Flume

1. Create a configuration file (flume.conf).


2. Store the file in the flume-ng/conf directory.
3. Configure individual components.
4. (Optional) Edit flume-env.sh.
5. Verify the installation by running the following command:
$ flume-ng help

Exploring a flume*.conf File
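
The slide for this topic showed a configuration file as an image. As a stand-in, the sketch below writes a minimal single-agent flume.conf and, if flume-ng is installed, starts the agent. The agent name a1, the component names r1, c1, and k1, the port, and the capacities are assumptions for illustration; the file from the lesson may differ.

```shell
#!/bin/sh
# Sketch: a minimal flume.conf (netcat source -> memory channel -> logger sink).
# All names, the port, and the capacities below are illustrative assumptions.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
CONF_FILE="$CONF_DIR/flume.conf"

cat > "$CONF_FILE" <<'EOF'
# Name the components of agent a1
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: accept newline-terminated text on a local TCP port
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

# Channel: buffer events in memory between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sink: write each event to the agent's log output
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
EOF

if command -v flume-ng >/dev/null 2>&1; then
    # Runs in the foreground until interrupted
    flume-ng agent --conf "$CONF_DIR" --conf-file "$CONF_FILE" --name a1 \
        -Dflume.root.logger=INFO,console
else
    echo "flume-ng not found on PATH; wrote $CONF_FILE only"
fi
```

A source can feed several channels, so its property is the plural `channels`; a sink drains exactly one, so its property is the singular `channel`.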

Additional Resources

• http://flume.apache.org/index.html

Summary

In this lesson, you should have learned to:

• Describe uses of the command-line interface (CLI)
• Define Flume
• Describe the data-flow mechanism of Flume
• Identify the options for configuring Flume
