
Ecosystem Tools

CISC525 – Unit 10
Sangwhan Cha
Phil Grim
Before Unit 10
 Project Draft is due on June 3, 11 pm
-> Each member should submit it individually (same ppt file).

 Final Project is due on June 17, 11 pm
-> Each member should submit it individually (same ppt file).
-> It will include voice annotations of the team members presenting the materials.
-> Team evaluation is due on June 17, 11 pm
   - A team evaluation template is provided.

 Final Exam: June 11 to June 17, 11 pm

 The last assignment is in Unit 10


FAQ
- The entire solution is in .pptx format, where each team member records their voice while
presenting part of the solution. We were thinking of recording the screen and exporting that as a
video. Does that work? Will Moodle support such formats?

: No, it doesn’t. Each of you should record your voice by inserting audio for all slides in
your ppt.
: To insert audio, click “Insert” -> “Audio” -> “Record Audio” in the ppt.

- For the team evaluation sheet that you will share, each team member uploads his/her updated
copy on Moodle. Will Moodle allow us to upload multiple files?

: Yes, you can upload 2 files (your final project and your team evaluation).
Learning Goals

 Students will be able to demonstrate the use of Apache Hue to interface with Big Data Ecosystem components.
 Students will be able to explain the characteristics and uses of Apache Sqoop.
 Students will be able to explain the characteristics and uses of Apache Oozie.
Overview

Hue
Oozie
Sqoop
Hue
 Hadoop User Experience, formerly known as Cloudera Desktop
 Open Source under Apache License v2.0
 Web portal to many Ecosystem components and functions
   Hadoop
     File browsing, upload, download
     MapReduce Job Browsing
   Data Access
     Hive
     HBase
     Impala
     SQL Databases
   Workflows
     Oozie
     Pig
     Sqoop
Hue Examples - 1
 User Home Folder
 Familiar interface for file browsing

Hue Examples - 2
 HDFS Browser

Hue Examples - 3
 Job Browser

Hue Examples - 4
 Job Browser

Hue Examples - 5
 Hive Queries

Hue Examples - 6
 Hive Queries

Hue Examples - 7
 HBase Browser
Sqoop
 Tool for efficiently transferring data between Hadoop and traditional data stores such as RDBMSs
 Generates a MapReduce job to accomplish transfers
 Can both import and export data
   Sequence Files
   Hive
   HBase
   Accumulo
   Avro
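
Sqoop can load imported data directly into several of these targets. A minimal sketch of a Hive-bound import, assuming a MySQL database userdb with a table emp (connection details, table names, and the single-mapper setting are placeholders):

$ sqoop import \
    --connect jdbc:mysql://localhost/userdb \
    --username root \
    --table emp \
    --hive-import \
    --hive-table default.emp \
    -m 1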
Sqoop Contd. 1
 Natively supports many database systems with JDBC drivers
   Oracle
   MySQL
   PostgreSQL
   Microsoft SQL Server
 Provides an API for supporting other data sources and file types
   Informatica
   Pentaho
   Couchbase
 Supports full table import/export and incremental updates (see the sketch below)
 Generates Java code that can be re-used in MapReduce jobs
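
As an illustration of the incremental updates mentioned above, a hedged sketch of an append-mode import that only pulls rows whose id exceeds the last value already imported (table and column names are placeholders):

$ sqoop import \
    --connect jdbc:mysql://localhost/userdb \
    --username root \
    --table emp \
    --incremental append \
    --check-column id \
    --last-value 1205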
Sqoop Contd. 2
$ sqoop help
Running Sqoop version: 1.4.5-mapr-1410
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen             Generate code to interact with database records
  create-hive-table   Import a table definition into Hive
  eval                Evaluate a SQL statement and display the results
  export              Export an HDFS directory to a database table
  help                List available commands
  import              Import a table from a database to HDFS
  import-all-tables   Import tables from a database to HDFS
  job                 Work with saved jobs
  list-databases      List available databases on a server
  list-tables         List available tables in a database
  merge               Merge results of incremental imports
  metastore           Run a standalone Sqoop metastore
  version             Display version information

See 'sqoop help COMMAND' for information on a specific command.
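
For instance, list-tables can be used to check what is available before an import. A minimal sketch (connection details are placeholders):

$ sqoop list-tables \
    --connect jdbc:mysql://localhost/userdb \
    --username root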


Sqoop Contd. 3
$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp -m 1

14/12/22 15:24:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
14/12/22 15:24:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/12/22 15:24:56 INFO tool.CodeGenTool: Beginning code generation
14/12/22 15:24:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
14/12/22 15:24:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
14/12/22 15:24:58 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
14/12/22 15:25:11 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/cebe706d23ebb1fd99c1f063ad51ebd7/emp.jar
-----------------------------------------------------
14/12/22 15:25:40 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1419242001831_0001/
14/12/22 15:26:45 INFO mapreduce.Job: Job job_1419242001831_0001 running in uber mode : false
14/12/22 15:26:45 INFO mapreduce.Job: map 0% reduce 0%
14/12/22 15:28:08 INFO mapreduce.Job: map 100% reduce 0%
14/12/22 15:28:16 INFO mapreduce.Job: Job job_1419242001831_0001 completed successfully
-----------------------------------------------------
14/12/22 15:28:17 INFO mapreduce.ImportJobBase: Transferred 145 bytes in 177.5849 seconds (0.8165 bytes/sec)
14/12/22 15:28:17 INFO mapreduce.ImportJobBase: Retrieved 5 records.
Sqoop Contd. 4
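
As noted earlier, Sqoop can also move data in the other direction. A hedged sketch of exporting an HDFS directory back into a MySQL table (the target table must already exist; names and paths are placeholders):

$ sqoop export \
    --connect jdbc:mysql://localhost/userdb \
    --username root \
    --table emp_backup \
    --export-dir /user/hadoop/emp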
Oozie

 Workflow scheduler system to control Hadoop jobs
 Workflows implemented as Directed Acyclic Graphs of actions
 Oozie Coordinator jobs used to schedule recurring jobs triggered by time
 Supports many ecosystem components out of the box
   MapReduce
   Pig
   Hive
   Sqoop
 Command line and Web interface, Hue integration
Oozie
Workflow Example
<workflow-app xmlns = "uri:oozie:workflow:0.4" name = "simple-Workflow">
   <start to = "Create_External_Table" />

   <!-- Step 1 -->
   <action name = "Create_External_Table">
      <hive xmlns = "uri:oozie:hive-action:0.4">
         <job-tracker>xyz.com:8088</job-tracker>
         <name-node>hdfs://rootname</name-node>
         <script>hdfs_path_of_script/external.hive</script>
      </hive>
      <ok to = "Create_orc_Table" />
      <error to = "kill_job" />
   </action>

   <!-- Step 2 -->
   <action name = "Create_orc_Table">
      <hive xmlns = "uri:oozie:hive-action:0.4">
         <job-tracker>xyz.com:8088</job-tracker>
         <name-node>hdfs://rootname</name-node>
         <script>hdfs_path_of_script/orc.hive</script>
      </hive>
      <ok to = "Insert_into_Table" />
      <error to = "kill_job" />
   </action>

   <!-- Step 3 -->
   <action name = "Insert_into_Table">
      <hive xmlns = "uri:oozie:hive-action:0.4">
         <job-tracker>xyz.com:8088</job-tracker>
         <name-node>hdfs://rootname</name-node>
         <script>hdfs_path_of_script/Copydata.hive</script>
         <param>database_name</param>
      </hive>
      <ok to = "end" />
      <error to = "kill_job" />
   </action>

   <kill name = "kill_job">
      <message>Job failed</message>
   </kill>

   <end name = "end" />
</workflow-app>
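
Before this workflow can run, workflow.xml and the Hive scripts it references must be copied to HDFS. A minimal sketch, assuming the placeholder paths used above:

$ hadoop fs -mkdir -p /pathof_workflow_xml /hdfs_path_of_script
$ hadoop fs -put workflow.xml /pathof_workflow_xml/
$ hadoop fs -put external.hive orc.hive Copydata.hive /hdfs_path_of_script/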
Oozie
Workflow Example 2
Oozie
Running Job

$ oozie job --oozie http://host_name:8080/oozie \
    -Doozie.wf.application.path=hdfs://namenodepath/pathof_workflow_xml/workflow.xml \
    -run
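
In practice the application path is often supplied through a job.properties file rather than with -D on the command line. A hedged sketch (host names and paths are placeholders):

# job.properties (hypothetical values)
nameNode=hdfs://rootname
jobTracker=xyz.com:8088
oozie.wf.application.path=${nameNode}/pathof_workflow_xml

$ oozie job --oozie http://host_name:8080/oozie -config job.properties -run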
Oozie
Coordinator and Bundle Example
<coordinator-app xmlns = "uri:oozie:coordinator:0.2"
    name = "coord_copydata_from_external_orc"
    frequency = "5 * * * *"
    start = "2016-00-18T01:00Z" end = "2025-12-31T00:00Z"
    timezone = "America/Los_Angeles">

   <controls>
      <timeout>1</timeout>
      <concurrency>1</concurrency>
      <execution>FIFO</execution>
      <throttle>1</throttle>
   </controls>

   <action>
      <workflow>
         <app-path>pathof_workflow_xml/workflow.xml</app-path>
      </workflow>
   </action>
</coordinator-app>

<bundle-app xmlns = 'uri:oozie:bundle:0.1'
    name = 'bundle_copydata_from_external_orc'>

   <controls>
      <kick-off-time>${kickOffTime}</kick-off-time>
   </controls>

   <coordinator name = 'coord_copydata_from_external_orc'>
      <app-path>pathof_coordinator_xml</app-path>
      <configuration>
         <property>
            <name>startTime1</name>
            <value>time to start</value>
         </property>
      </configuration>
   </coordinator>
</bundle-app>
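
Coordinators and bundles are submitted much like workflows, but with their own application-path properties. A minimal sketch (paths are placeholders):

$ oozie job --oozie http://host_name:8080/oozie \
    -Doozie.coord.application.path=hdfs://namenodepath/pathof_coordinator_xml \
    -run

$ oozie job --oozie http://host_name:8080/oozie \
    -Doozie.bundle.application.path=hdfs://namenodepath/pathof_bundle_xml \
    -run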
Oozie
Hue Integration
Oozie
Hue Integration Contd. 1
Continued Reading

Sqoop
http://sqoop.apache.org

Oozie
http://oozie.apache.org

Hue Website
http://gethue.com
