Sqoop Import & Export Guide

This document describes commands and options for using Sqoop to import and export data between HDFS and relational databases. Key points include importing data from a MySQL table into HDFS in various formats (text, Parquet, Avro, etc.), specifying warehouse and target directories, importing selectively or conditionally, handling null values, listing databases and tables, performing incremental imports, creating Sqoop jobs, importing into Hive, and exporting from HDFS to MySQL tables.


Sqoop Import

Running Simple Sqoop Command


The Sqoop import command below ingests data from the customers table in a MySQL database. It uses the MySQL JDBC connector to connect to the database with the username and password provided in the command. By default it ingests the data in parallel using 4 mappers, which run as a MapReduce job on the YARN cluster, and loads the data into the default warehouse directory. The command creates a subfolder inside the default warehouse directory with the same name as the table, and all output part files are written into that subfolder.

sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers
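With no --warehouse-dir or --target-dir given, the output lands in a folder named after the table under the HDFS home directory of the user running the command. A quick way to inspect the result (the `cloudera` username below is an assumption; adjust the path for your environment):

```shell
# List the part files Sqoop generated (one per mapper: part-m-00000 .. part-m-00003)
hdfs dfs -ls /user/cloudera/customers

# Peek at the first rows of the comma-delimited output
hdfs dfs -cat /user/cloudera/customers/part-m-00000 | head -n 5
```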

Managing Warehouse Dir


The default warehouse directory can be changed using the --warehouse-dir flag.

hdfs dfs -mkdir -p /user/sqoop/new-warehouse-dir

sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--warehouse-dir /user/sqoop/new-warehouse-dir

Managing Target Dir


Instead of generating output inside the warehouse directory, we can define an output location called the target directory, under which the Sqoop command will load the output files. The target directory is defined with the --target-dir flag; it should not already exist, or the import will fail.

sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--target-dir /user/sample

Import Data as Parquet File


By default, data is generated as text files in which columns are separated by commas, but Sqoop can also generate Parquet files. To import a table as Parquet, use the --as-parquetfile flag in the Sqoop command:

sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--target-dir /user/customer/parquet \
--as-parquetfile

Import Data as Avro File


To import a table as an Avro data file, use the --as-avrodatafile flag in the Sqoop command:

sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--target-dir /user/customer/avro \
--as-avrodatafile

Import Data as Sequence File


To import a table as a sequence file, use the --as-sequencefile flag in the Sqoop command:

sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--target-dir /user/customer/sequence \
--as-sequencefile

Import Data as Gzip compressed File


To import the data with gzip compression, use the --compress flag in the Sqoop command (gzip is the default codec):

sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--target-dir /user/customer/gzip \
--compress

Import Data as Snappy compressed File


To import the data with Snappy compression, use the --compress and --compression-codec flags in the Sqoop command:

sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--target-dir /user/customer/snappy \
--compress \
--compression-codec snappy

Import Data as Bzip2 compressed File


To import the data with bzip2 compression, use the --compress and --compression-codec flags in the Sqoop command:

sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--target-dir /user/customer/bzip2 \
--compress \
--compression-codec bzip2

Import Data as lz4 compressed File


To import the data with LZ4 compression, use the --compress and --compression-codec flags in the Sqoop command. Since older Sqoop releases may not recognize a short "lz4" alias, the full Hadoop codec class name is used here:

sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--target-dir /user/customer/lz4 \
--compress \
--compression-codec org.apache.hadoop.io.compress.Lz4Codec

Conditional Import
You can include a WHERE condition in the Sqoop command using the --where flag to import only the rows that match it:

sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--target-dir /user/customer/conditional \
--where "customer_id > 100"


Selective Import
You can specify selected columns in the Sqoop command using the --columns flag to import only those columns:

sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--target-dir /user/customer/selected \
--columns "customer_id,customer_fname,customer_lname"

Running SQL Query in Sqoop


Sqoop is very flexible and allows you to run SQL queries inside the Sqoop command using the --query flag. The query must contain the literal token $CONDITIONS, which Sqoop replaces with a range predicate for each mapper, and you must provide a --split-by column explicitly:

sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--target-dir /user/customer/query \
--query 'SELECT customer_id, customer_fname, customer_lname FROM customers WHERE customer_id > 100 AND $CONDITIONS' \
--split-by customer_id
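If parallelism is not needed, a single-mapper free-form query avoids the --split-by requirement entirely, since Sqoop only needs a split column when it divides the query across mappers. A sketch (the target directory below is illustrative, not from the document):

```shell
sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--target-dir /user/customer/query-single \
--query 'SELECT customer_id, customer_fname FROM customers WHERE $CONDITIONS' \
--num-mappers 1
```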

Split-By in Sqoop Command

When the table has no primary key, Sqoop cannot decide how to divide the rows among mappers, so a --split-by column must be given explicitly (ideally an evenly distributed numeric column):

sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table products \
--split-by product_id

Field Delimiters
You can explicitly specify the delimiter used to separate fields in the output files. By default, it is a comma:

sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--target-dir /user/customers/delimited \
--fields-terminated-by '|'

Handling Null Strings / Null Non-Strings

The --null-string flag sets the value written for nulls in string columns, and --null-non-string does the same for non-string (e.g. numeric) columns:

sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--target-dir /user/customers/handling-null \
--null-string 'xxx' \
--null-non-string 'yyy'

List Databases
Using Sqoop, you can list the databases on the relational database server:

sqoop list-databases \
--connect jdbc:mysql://localhost/ \
--username root \
--password admin123
List Tables
Using Sqoop, you can list the tables in a database:

sqoop list-tables \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123

Incremental Append
An incremental append imports only the rows whose --check-column value is greater than the given --last-value, appending them to the target directory:

sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table orders \
--target-dir /user/orders/incremental \
--incremental append \
--check-column order_id \
--last-value 100006

Incremental Append Sqoop Job

A saved Sqoop job stores the command definition and automatically updates the stored last-value after each run, so repeated runs pick up only new rows. Note the space after "--", which separates the job options from the tool options:

sqoop job --create order_update_job \
-- import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table orders \
--target-dir /user/orders/incremental \
--incremental append \
--check-column order_id \
--last-value 100006

To list saved jobs:

sqoop job --list

To execute the job:

sqoop job --exec order_update_job
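Two related subcommands are useful for managing saved jobs: --show prints a job's stored definition (including the automatically updated last-value), and --delete removes it:

```shell
# Inspect the saved job's configuration and incremental state
sqoop job --show order_update_job

# Remove the saved job when it is no longer needed
sqoop job --delete order_update_job
```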

Hive Import
The --hive-import flag loads the imported data into a Hive table; --create-hive-table asks Sqoop to create the table, and the command fails if the table already exists:

sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--hive-import \
--create-hive-table \
--hive-database default \
--hive-table customer_mysql

Sqoop Export

HDFS to MySQL
Sqoop export copies files from an HDFS directory into a table that already exists in the relational database:

sqoop export \
--connect jdbc:mysql://localhost/export_db \
--username root \
--password admin123 \
--table customer_export \
--export-dir /user/customers/text
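Since Sqoop export does not create the target table, it must exist in MySQL with a schema matching the exported files before the export runs. A minimal sketch; the column names and types below are assumptions based on the customers data, not taken from the document:

```shell
mysql -u root -padmin123 -e "
CREATE DATABASE IF NOT EXISTS export_db;
CREATE TABLE IF NOT EXISTS export_db.customer_export (
  customer_id INT,
  customer_fname VARCHAR(45),
  customer_lname VARCHAR(45)
);"
```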

HDFS to MySQL (Pipe-Delimited File)

When the input files use a delimiter other than the default comma, tell Sqoop with the --input-fields-terminated-by flag:

sqoop export \
--connect jdbc:mysql://localhost/export_db \
--username root \
--password admin123 \
--table customer_delimited \
--export-dir /user/customers/delimited \
--input-fields-terminated-by '|'
