
Ecosystem Tools

CISC525 – Unit 10
Sangwhan Cha
Phil Grim
Before Unit 10
 Project Draft is due on June 3, 11 pm
-> Each member should submit it individually (same ppt file).

 Final Project is due on June 17, 11 pm
-> Each member should submit it individually (same ppt file).
-> It will include voice annotations of the team members presenting the materials.
-> Team evaluation is due on June 17, 11 pm
   - A team evaluation template is provided.

 Final Exam: June 11 to June 17, 11 pm

 The last assignment is in Unit 10


FAQ
- The entire solution is in .pptx format, where each team member records their voice while
presenting part of the solution. We were thinking of recording the screen and exporting that as a
video. Does that work? Will Moodle support such formats?

: No, it doesn’t. Each of you should record your voice by inserting audio for all slides in
your ppt.
: To insert audio, click “Insert” -> “Audio” -> “Record Audio” in the ppt.

- For the team evaluation sheet that you will share, each team member uploads his/her updated
copy on Moodle. Will Moodle allow us to upload multiple files?

: Yes, you can upload 2 files (your final project and your team evaluation).
Learning Goals

 Students will be able to demonstrate the use of Apache Hue to interface with Big Data Ecosystem components.
 Students will be able to explain the characteristics and uses of Apache Sqoop.
 Students will be able to explain the characteristics and uses of Apache Oozie.
Overview

Hue
Oozie
Sqoop
Hue
 Hadoop User Experience, formerly known as Cloudera Desktop
 Open Source under Apache License v2.0
 Web portal to many Ecosystem components and functions
   Hadoop
     File browsing, upload, download
     MapReduce Job Browsing
   Data Access
     Hive
     HBase
     Impala
     SQL Databases
   Workflows
     Oozie
     Pig
     Sqoop
Hue Examples - 1
 User Home Folder
 Familiar interface for file browsing

Hue Examples - 2
 HDFS Browser

Hue Examples - 3
 Job Browser

Hue Examples - 4
 Job Browser

Hue Examples - 5
 Hive Queries

Hue Examples - 6
 Hive Queries

Hue Examples - 7
 HBase Browser
Sqoop
 Tool for efficiently transferring data between Hadoop and traditional data stores such as RDBMSs
 Generates a MapReduce job to accomplish transfers
 Can both import and export data
   Sequence Files
   Hive
   HBase
   Accumulo
   Avro
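
Sqoop can load imported data directly into several of these targets. A minimal sketch of a Hive-bound import, assuming a MySQL database userdb with a table emp (connection details, table names, and the single-mapper setting are placeholders):

$ sqoop import \
    --connect jdbc:mysql://localhost/userdb \
    --username root \
    --table emp \
    --hive-import \
    --hive-table default.emp \
    -m 1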
Sqoop Contd. 1
 Natively supports many database systems with JDBC drivers
   Oracle
   MySQL
   PostgreSQL
   Microsoft SQL Server
 Provides an API for supporting other data sources and file types
   Informatica
   Pentaho
   Couchbase
 Supports full table import/export and incremental updates (see the sketch below)
 Generates Java code that can be re-used in MapReduce jobs
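
As an illustration of the incremental updates mentioned above, a hedged sketch of an append-mode import that only pulls rows whose id exceeds the last value already imported (table and column names are placeholders):

$ sqoop import \
    --connect jdbc:mysql://localhost/userdb \
    --username root \
    --table emp \
    --incremental append \
    --check-column id \
    --last-value 1205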
Sqoop Contd. 2
$ sqoop help
Running Sqoop version: 1.4.5-mapr-1410
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen             Generate code to interact with database records
  create-hive-table   Import a table definition into Hive
  eval                Evaluate a SQL statement and display the results
  export              Export an HDFS directory to a database table
  help                List available commands
  import              Import a table from a database to HDFS
  import-all-tables   Import tables from a database to HDFS
  job                 Work with saved jobs
  list-databases      List available databases on a server
  list-tables         List available tables in a database
  merge               Merge results of incremental imports
  metastore           Run a standalone Sqoop metastore
  version             Display version information

See 'sqoop help COMMAND' for information on a specific command.
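
For instance, list-tables can be used to check what is available before an import. A minimal sketch (connection details are placeholders):

$ sqoop list-tables \
    --connect jdbc:mysql://localhost/userdb \
    --username root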


Sqoop Contd. 3
$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp -m 1

14/12/22 15:24:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
14/12/22 15:24:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/12/22 15:24:56 INFO tool.CodeGenTool: Beginning code generation
14/12/22 15:24:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
14/12/22 15:24:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
14/12/22 15:24:58 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
14/12/22 15:25:11 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/cebe706d23ebb1fd99c1f063ad51ebd7/emp.jar
-----------------------------------------------------
14/12/22 15:25:40 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1419242001831_0001/
14/12/22 15:26:45 INFO mapreduce.Job: Job job_1419242001831_0001 running in uber mode : false
14/12/22 15:26:45 INFO mapreduce.Job: map 0% reduce 0%
14/12/22 15:28:08 INFO mapreduce.Job: map 100% reduce 0%
14/12/22 15:28:16 INFO mapreduce.Job: Job job_1419242001831_0001 completed successfully
-----------------------------------------------------
14/12/22 15:28:17 INFO mapreduce.ImportJobBase: Transferred 145 bytes in 177.5849 seconds (0.8165 bytes/sec)
14/12/22 15:28:17 INFO mapreduce.ImportJobBase: Retrieved 5 records.
Sqoop Contd. 4
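
As noted earlier, Sqoop can also move data in the other direction. A hedged sketch of exporting an HDFS directory back into a MySQL table (the target table must already exist; names and paths are placeholders):

$ sqoop export \
    --connect jdbc:mysql://localhost/userdb \
    --username root \
    --table emp_backup \
    --export-dir /user/hadoop/emp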
Oozie

 Workflow scheduler system to control Hadoop jobs
 Workflows implemented as Directed Acyclic Graphs of actions
 Oozie Coordinator jobs used to schedule recurring jobs triggered by time
 Supports many ecosystem components out of the box
   MapReduce
   Pig
   Hive
   Sqoop
 Command line and Web interface, Hue integration
Oozie
Workflow Example
<workflow-app xmlns = "uri:oozie:workflow:0.4" name = "simple-Workflow">
   <start to = "Create_External_Table" />

   <!-- Step 1 -->
   <action name = "Create_External_Table">
      <hive xmlns = "uri:oozie:hive-action:0.4">
         <job-tracker>xyz.com:8088</job-tracker>
         <name-node>hdfs://rootname</name-node>
         <script>hdfs_path_of_script/external.hive</script>
      </hive>
      <ok to = "Create_orc_Table" />
      <error to = "kill_job" />
   </action>

   <!-- Step 2 -->
   <action name = "Create_orc_Table">
      <hive xmlns = "uri:oozie:hive-action:0.4">
         <job-tracker>xyz.com:8088</job-tracker>
         <name-node>hdfs://rootname</name-node>
         <script>hdfs_path_of_script/orc.hive</script>
      </hive>
      <ok to = "Insert_into_Table" />
      <error to = "kill_job" />
   </action>

   <!-- Step 3 -->
   <action name = "Insert_into_Table">
      <hive xmlns = "uri:oozie:hive-action:0.4">
         <job-tracker>xyz.com:8088</job-tracker>
         <name-node>hdfs://rootname</name-node>
         <script>hdfs_path_of_script/Copydata.hive</script>
         <param>database_name</param>
      </hive>
      <ok to = "end" />
      <error to = "kill_job" />
   </action>

   <kill name = "kill_job">
      <message>Job failed</message>
   </kill>

   <end name = "end" />
</workflow-app>
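
Before this workflow can run, workflow.xml and the Hive scripts it references must be copied to HDFS. A minimal sketch, assuming the placeholder paths used above:

$ hadoop fs -mkdir -p /pathof_workflow_xml /hdfs_path_of_script
$ hadoop fs -put workflow.xml /pathof_workflow_xml/
$ hadoop fs -put external.hive orc.hive Copydata.hive /hdfs_path_of_script/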
Oozie
Workflow Example 2
Oozie
Running Job

$ oozie job --oozie http://host_name:8080/oozie \
    -Doozie.wf.application.path=hdfs://namenodepath/pathof_workflow_xml/workflow.xml \
    -run
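
In practice the application path is often supplied through a job.properties file rather than with -D on the command line. A hedged sketch (host names and paths are placeholders):

# job.properties (hypothetical values)
nameNode=hdfs://rootname
jobTracker=xyz.com:8088
oozie.wf.application.path=${nameNode}/pathof_workflow_xml

$ oozie job --oozie http://host_name:8080/oozie -config job.properties -run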
Oozie
Coordinator and Bundle Example
<coordinator-app xmlns = "uri:oozie:coordinator:0.2"
    name = "coord_copydata_from_external_orc"
    frequency = "5 * * * *"
    start = "2016-00-18T01:00Z" end = "2025-12-31T00:00Z"
    timezone = "America/Los_Angeles">

   <controls>
      <timeout>1</timeout>
      <concurrency>1</concurrency>
      <execution>FIFO</execution>
      <throttle>1</throttle>
   </controls>

   <action>
      <workflow>
         <app-path>pathof_workflow_xml/workflow.xml</app-path>
      </workflow>
   </action>
</coordinator-app>

<bundle-app xmlns = 'uri:oozie:bundle:0.1'
    name = 'bundle_copydata_from_external_orc'>

   <controls>
      <kick-off-time>${kickOffTime}</kick-off-time>
   </controls>

   <coordinator name = 'coord_copydata_from_external_orc'>
      <app-path>pathof_coordinator_xml</app-path>
      <configuration>
         <property>
            <name>startTime1</name>
            <value>time to start</value>
         </property>
      </configuration>
   </coordinator>
</bundle-app>
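
Coordinators and bundles are submitted much like workflows, but with their own application-path properties. A minimal sketch (paths are placeholders):

$ oozie job --oozie http://host_name:8080/oozie \
    -Doozie.coord.application.path=hdfs://namenodepath/pathof_coordinator_xml \
    -run

$ oozie job --oozie http://host_name:8080/oozie \
    -Doozie.bundle.application.path=hdfs://namenodepath/pathof_bundle_xml \
    -run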
Oozie
Hue Integration
Oozie
Hue Integration Contd. 1
Continued Reading

Sqoop
http://sqoop.apache.org

Oozie
http://oozie.apache.org

Hue Website
http://gethue.com
