Exporting Files From HDFS To MySQL Using SQOOP

Apache Sqoop is a tool designed to efficiently transfer bulk data between Hadoop and structured datastores such as relational databases.
In this blog, we will see how to export data from HDFS to MySQL using Sqoop, with a simple employee table as an example.

Steps to Export Data from HDFS to MySQL

Follow the steps below to transfer data from HDFS to a MySQL table:
Step 1:
Create a new database in the MySQL instance.
CREATE DATABASE db1;
NOTE: It is not mandatory to create a new database every time; you can 'use' a preexisting database as well.
Step 2:
Create a table named acad.
USE db1;
CREATE TABLE acad (
  emp_id int(2),
  emp_name varchar(10),
  emp_sal int(10),
  date date
);
The image below indicates that the table inside MySQL is empty.
Figure 1
Run the command DESCRIBE <table name>; to list the table's fields and their types.
This helps in verifying that the data inside HDFS matches the table schema before it is mapped.
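For the acad table created above:
DESCRIBE acad;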
Figure 2
The files inside HDFS must match the format of the MySQL table so that the data can be mapped.
Refer to the screenshot below, which shows two files ready to be mapped into MySQL.
Figure 3
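As a concrete sketch, a comma-separated input file (Sqoop's default field delimiter) whose rows mirror the INSERT statements shown later in this post would look like:
5,HADOOP,50000,2011-03-21
6,SPARK,600000,2011-03-22
7,JAVA,700000,2011-03-23
The files can then be staged into the export directory in HDFS:
hdfs dfs -mkdir -p /sqoop_msql
hdfs dfs -put input.txt input2.txt /sqoop_msql/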
Step 3:
Export the input.txt and input2.txt files from HDFS to MySQL:
sqoop export --connect jdbc:mysql://localhost/db1 --username sqoop --password root --table acad --export-dir /sqoop_msql/ -m 1
where -m denotes the number of mappers you want to run.
NOTE: The target table must exist in MySQL.
To handle input files with different delimiters, the following options can be added:
--input-fields-terminated-by '\t' --mysql-delimiters
where '\t' denotes a tab character.
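Put together, a tab-delimited export (a sketch reusing the connection details from Step 3) would be:
sqoop export --connect jdbc:mysql://localhost/db1 --username sqoop --password root --table acad --export-dir /sqoop_msql/ --input-fields-terminated-by '\t' -m 1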
Once the table inside MySQL and the data inside HDFS are ready to be mapped, we can execute the export command. Refer to the screenshot below:
Figure 4
Once you run the export command, a job-completion message should be displayed.
Figure 5
Note that only the map job needs to complete; a Sqoop export has no reduce phase. Other error messages may be displayed because of software version incompatibilities; these can be ignored.

How it Works

Sqoop loads the JDBC driver named in the --connect argument from the location where Sqoop is installed. The --username and --password options are used to authenticate the user, and Sqoop internally uses these credentials when it connects to the MySQL instance.
The --table argument names the MySQL table that will receive the data from HDFS. This table must be created before running the export command. Sqoop uses the table's column count, column types, and other metadata to validate the data coming from the HDFS directory. When the export statement is executed, Sqoop generates INSERT statements against MySQL. For example, the export job reads each line of the input.txt file from HDFS and produces the following intermediate statements:
INSERT INTO acad VALUES (5, 'HADOOP', 50000, '2011-03-21');
INSERT INTO acad VALUES (6, 'SPARK', 600000, '2011-03-22');
INSERT INTO acad VALUES (7, 'JAVA', 700000, '2011-03-23');
By default, Sqoop export creates INSERT statements. If the --update-key argument is specified, UPDATE statements are generated instead.
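For illustration, an update-mode export keyed on emp_id (assuming emp_id uniquely identifies rows) could be written as:
sqoop export --connect jdbc:mysql://localhost/db1 --username sqoop --password root --table acad --update-key emp_id --export-dir /sqoop_msql/ -m 1
Each input record then produces an UPDATE ... WHERE emp_id = <value> statement instead of an INSERT.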
The -m argument sets the number of map tasks that read the file splits from HDFS. Each mapper opens its own connection to the MySQL server.
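For example, a run with four mappers, and therefore four concurrent MySQL connections, would be:
sqoop export --connect jdbc:mysql://localhost/db1 --username sqoop --password root --table acad --export-dir /sqoop_msql/ -m 4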
Now, on querying the table inside MySQL, we can see that all the data has been mapped into it.
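A quick check from the MySQL prompt:
USE db1;
SELECT * FROM acad;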
Figure 6
Hope this blog was useful in understanding the process of exporting data from HDFS to MySQL using Sqoop.
