Beginner’s Guide for Sqoop – Tutorial

In this tutorial, we will discuss the basics of Sqoop. You will also learn how to import data from an RDBMS into HDFS and how to export data from HDFS back into an RDBMS using Sqoop.
Note: We assume that Sqoop is already installed on your system and that all the required connectors and jar files have been placed in the lib folder of your Sqoop installation.
Importing a Table from RDBMS to HDFS
Here we are using sample MySQL data and importing it into HDFS using Sqoop. Let us first see how to create a table in MySQL.
  • Login to your MySQL shell.
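For example, assuming the root user (you will be prompted for the password):
mysql -u root -p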
  • Create a database by using the below command:
create database Acadgild;
  • To work in the created database, use the following command:
use Acadgild;
Now we are in Acadgild database.
  • Create a table employee by using the below command:
create table employee(emp_id INT NOT NULL AUTO_INCREMENT, emp_name VARCHAR(100), emp_sal INT, PRIMARY KEY(emp_id));

A table named employee has been created with the columns emp_id, emp_name, and emp_sal. The schema of this table can be checked using the following command:

describe employee;


  • Let’s now insert some sample data into the table by using the below command:
INSERT INTO employee(emp_id,emp_name,emp_sal) VALUES (1,'kk',50000), (4,'onk',6000),(7,'mj',70000);
  • We can check the inserted data using this command: select * from employee;
We have successfully created a table in MySQL, and we will now import the same into HDFS by using Sqoop.
The following command can be used to import the table into HDFS.
Syntax:

sqoop import --connect jdbc:mysql://localhost/db_name --username $user_name --password $password --table tbl_name -m 1
Here we are connecting to MySQL through the JDBC connector and using the database Acadgild. It is necessary to specify the MySQL username, the password, and the table name.
In our case, the command will be as shown below:

sqoop import --connect jdbc:mysql://localhost/Acadgild --username root --password kiran.0541 --table employee -m 1
Here ‘-m’ specifies the number of map tasks to run, and ‘-m 1’ means that only one map task will be launched.

Note: If we do not use -m 1 at the end of the statement, Sqoop launches multiple map tasks (four by default) and the imported data is split across multiple part files in HDFS.
Now the data in the RDBMS has been successfully imported into HDFS. By default, the output is stored under /user/$user_name/table_name/, in this case as a single part-m-00000 file.
Refer to the below image for the same.
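For a quick check from the terminal, the imported records can be listed and printed as shown below ($user_name is a placeholder for your HDFS user, and the part file name assumes a single mapper):
hadoop fs -ls /user/$user_name/employee/
hadoop fs -cat /user/$user_name/employee/part-m-00000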

Importing all the Tables in a Database to HDFS

We can import all the tables present in a database by specifying import-all-tables as shown in the below command:
sqoop import-all-tables --connect jdbc:mysql://localhost/Acadgild --username root --password kiran.0541 -m 1
When we specify import-all-tables in the command, Sqoop internally invokes the import-all-tables tool, which imports every table in the database into HDFS. Note that this tool does not accept --target-dir; a parent directory of our own can be specified with --warehouse-dir instead.
While using the above command, a MapReduce job is launched for every table, and the tables are imported into HDFS sequentially.
We now have two tables in our MySQL database.

When we check the /user/$User_name/ directory of HDFS, we can see that the two tables have been successfully imported. You can refer to the below screenshot for the same.
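A quick way to verify this from the terminal (again, $User_name is a placeholder for your HDFS user):
hadoop fs -ls /user/$User_name/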


Importing Table Data into a Specific Directory in HDFS

We can also import data into a specific directory by specifying --target-dir, as shown in the below command:
sqoop import --connect jdbc:mysql://localhost/Acadgild --username root --password kiran.0541 --table employee -m 1 --target-dir /sqoop_data/employee/
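If the command succeeds, the imported records land under the chosen directory; a quick check, assuming the default part file name produced by a single mapper:
hadoop fs -cat /sqoop_data/employee/part-m-00000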


Importing Specific Table Data into HDFS

We can also import specific rows of an RDBMS table into HDFS by specifying a where clause in the Sqoop command, as shown below:
sqoop import --connect jdbc:mysql://localhost/Acadgild --username root --password kiran.0541 --table employee -m 1 --target-dir /sqoop_data/employee/specific_data/ --where "emp_id=1"
We can check the contents of the imported table data by using the command hadoop fs -cat '/path/of/the/file'. You can refer to the below screenshot for the same.
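In our case, assuming the default part file name, the command would be:
hadoop fs -cat /sqoop_data/employee/specific_data/part-m-00000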

Importing a Table as a Sequence File into HDFS

It is possible to store the RDBMS data as a sequence file by specifying --as-sequencefile, as shown in the below command:
sqoop import --connect jdbc:mysql://localhost/Acadgild --username root --password kiran.0541 --table employee -m 1 --target-dir /sqoop_data/employee/sequence/ --as-sequencefile
As the sequence file stores the contents in binary format, we will get binary output as shown in the below screenshot.
We have now successfully stored the RDBMS data in a sequence file using Sqoop.
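To peek at the sequence file, hadoop fs -text can deserialize it, although for Sqoop-generated sequence files the record class generated during the import must be on the classpath; the path below assumes the default part file name:
hadoop fs -text /sqoop_data/employee/sequence/part-m-00000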

Importing a Table as an Avro File into HDFS

We can also store the RDBMS data as an Avro file by specifying --as-avrodatafile, as shown in the below command:
sqoop import --connect jdbc:mysql://localhost/Acadgild --username root --password kiran.0541 --table employee -m 1 --target-dir /sqoop_data/employee/avro/ --as-avrodatafile
When storing data using the Avro file format, we will get an output file with the .avro extension, and the contents inside the file will be in binary format. We can see the same in the below screenshot:
We have now successfully stored the contents of RDBMS in an Avro file using Sqoop.
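One way to inspect the Avro output is to copy a part file to the local file system and dump it as JSON using the avro-tools jar (the part file name and the avro-tools version shown here are only placeholders):
hadoop fs -get /sqoop_data/employee/avro/part-m-00000.avro .
java -jar avro-tools-1.8.2.jar tojson part-m-00000.avro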
The above-mentioned scenarios are just a few of the ways to import data from an RDBMS into HDFS using Sqoop.
Now, let’s take a look at how to export data from HDFS to RDBMS using Sqoop.

Export Data from HDFS to RDBMS

To export the data in HDFS to MySQL, we first need to create a table in MySQL that matches the schema of the data in HDFS.
Let’s export the Customers data set in HDFS, shown in the below screenshot, to the RDBMS.
It consists of the fields Customer_Id, Customer_name, and Customer_city.
Let’s create a table in MySQL using the below command:

create table customers(cust_id INT, Name VARCHAR(50), city VARCHAR(50));
We can check the schema of the created table using the following command:
describe customers;

We have the Customers data set in HDFS at the location specified below:
/Customers.csv
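To confirm the layout of the file before exporting it, we can print it directly from HDFS:
hadoop fs -cat /Customers.csv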
We will now export the data at the above HDFS location into the RDBMS using Sqoop, through the below command:

sqoop export --connect jdbc:mysql://localhost/Acadgild --username root --password kiran.0541 --table customers --export-dir /Customers.csv

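Note that Sqoop export expects comma-separated input fields by default; if the data set used a different delimiter, it could be specified explicitly with --input-fields-terminated-by, for example (the comma shown is just the default):
sqoop export --connect jdbc:mysql://localhost/Acadgild --username root --password kiran.0541 --table customers --export-dir /Customers.csv --input-fields-terminated-by ','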
The data has now been successfully transferred from HDFS to RDBMS. We can check the data in MySQL using the below command:

select * from customers;
We can see that the contents of the data set are now available inside the customers table in MySQL. Thus, we have successfully transferred data from HDFS to an RDBMS.
Hope this Sqoop tutorial helped you learn how to import and export data between HDFS and an RDBMS.
