Sqoop Import
Running Simple Sqoop Command
Below Sqoop import command will ingest data from table customers in MySQL database. It will use
MySQL JDBC connector to connect to database using username and password provided in the Sqoop
command. It will ingest the data parallelly using 4 mappers that will run as MapReduce program in
Yarn cluster and will load the data into default warehouse directory. Below Sqoop command will
generate a sub folder inside default warehouse directory with the same name as table name and all
output part files will be generated inside that subfolder.
sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers
Managing Warehouse Dir
Default warehouse directory can be changed using “warehouse-dir” flag.
hdfs dfs -mkdir /user/new-warehouse-dir
sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--warehouse-dir /user/sqoop/new-warehouse-dir
Managing Target Dir
Instead of generating outp ut inside warehouse directory, we can define an output location called as
target directory under which Sqoop command will load output files. Target directory can be defined
by using “—target-dir” flag
hdfs dfs -mkdir /user/sample
sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--target-dir /user/sample
Import Data as Parquet File
By default, data is generated as text file in which columns are separated by commas but Sqoop
command can also be used to generate parquet file. To import the data of table as parquet file, you
need to use “—as-parquetfile” flag in the Sqoop command
sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--target-dir /user/customer/parquet \
--as-parquetfile
Import Data as Avro File
To import the data of table as avro file, you need to use “—as-avrodatafile” flag in the Sqoop
command
sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--target-dir /user/customer/avro \
--as-avrodatafile
Import Data as Sequence File
To import the data of table as sequence file, you need to use “—as-sequencefile” flag in the Sqoop
command
sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--target-dir /user/customer/parquet \
--as-sequencefile
Import Data as Gzip compressed File
To import the data in gzip compression, you need to use “--compress” flag in the Sqoop command
sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--target-dir /user/customer/gzip \
--compress
Import Data as Snappy compressed File
To import the data in snappy compression, you need to use “--compress” and “—compression-
codec” flag in the Sqoop command
sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--target-dir /user/customer/snappy \
--compress
--compression-codec snappy
Import Data as Bzip2 compressed File
To import the data in bzip compression, you need to use “--compress” and “—compression-codec”
flag in the Sqoop command
sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--target-dir /user/customer/bzip2 \
--compress
--compression-codec bzip2
Import Data as lz4 compressed File
To import the data in lz4 compression, you need to use “--compress” and “—compression-codec”
flag in the Sqoop command
sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--target-dir /user/customer/lz4 \
--compress
--compression-codec lz4
Conditional Import
You can include a where condition in the Sqoop command using “--where” flag to import data based
on some condition.
sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--target-dir /user/customer/conditional \
--where "customer_id>100 "
Selective Import
You can also specify the selective columns in the Sqoop command using “--columns” flag to import
only data from those columns
sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--target-dir /user/customer/selected \
--columns "customer_id,customer_fname,customer_lname "
Running SQL Query in Sqoop
Sqoop is very flexible and allows you to run sql queries inside the sqoop command using “—query”
flag. You need to provide “—spli-by” column explicitly
sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--target-dir /user/customer/query \
--query 'Select customer_id,customer_fname,customer_lname from customers where
customer_id>100 AND $CONDITIONS' \
--split-by customer_id
Split-By in Sqoop Command
sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table products \
--split-by product_id
Fields delimiters
You can explicitly specify the delimiters to be used in your output files to separate the fields. By
default, if we don’t specify anything, it is comma.
sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--target-dir /user/customers/delimited \
--fields-terminated-by '|'
Handling NullString/Null No-Strings
sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--target-dir /user/customers/handling-null \
--null-string “xxx” \
--null-non-string “yyy”
List Database
Using sqoop, you can list the database in your relational database.
sqoop list-databases \
--connect jdbc:mysql://localhost/ \
--username root \
--password admin123
List Tables
Using sqoop, you can list the tables in your relational database.
sqoop list-tables \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123
Incremental Append
sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table orders \
--target-dir /user/orders/incremental \
--incremental append \
--check-column order_id \
--last-value 100006
Incremental Append Sqoop Job
sqoop job –create order_update_job \
-- import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table orders \
--target-dir /user/orders/incremental \
--incremental append \
--check-column order_id \
--last-value 100006
To see the job:
Sqoop job –list
To execute the job:
Sqoop job --exec order_update_job
Hive Import
sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password admin123 \
--table customers \
--hive-import \
--create-hive-table \
--hive-database default \
--hive-table customer_mysql
Sqoop Export
HDFS to MySQL
sqoop export \
--connect jdbc:mysql://localhost/export_db \
--username root \
--password admin123 \
--table customer_export \
--export-dir /user/customers/text
HDFS to MySQL (Pipe Delimited File)
sqoop export \
--connect jdbc:mysql://localhost/export_db \
--username root \
--password admin123 \
--table customer_delimited \
--export-dir /user/customers/delimited \
--input-fields-terminated-by '|'