Today we will see how to load data from Hive into an RDBMS (MySQL) using the Sqoop export command.
Below are some key points to keep in mind before proceeding with the Sqoop export process —
☛ The target table schema must already exist in the RDBMS; Sqoop export does not create it.
☛ The data exported from HDFS to the RDBMS table must be in sync with the target table schema, i.e., datatypes of fields, sequence of fields, constraints, etc.
☛ If even a single record in the exported file does not match the table schema, the export of the entire file fails.
☛ The default operation is to insert all records from the input files into the database table using INSERT statements. In update mode, Sqoop instead generates UPDATE statements that replace existing records in the database (a syntax sketch follows this list).
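For the update-mode case above, here is a minimal syntax sketch, assuming empid as the update key and reusing the connection details from Step 2 below (this upsert variant is not part of the walkthrough itself):

$ sqoop export \
> --connect jdbc:mysql://localhost/mydb \
> --username root -P \
> --export-dir /tmp/hive/person/part-m-00000 \
> --table person \
> --update-key empid \
> --update-mode allowinsert \
> --fields-terminated-by ','

With --update-key alone, only rows whose key already exists are updated; adding --update-mode allowinsert also inserts rows whose key is not yet present. Note that --staging-table cannot be combined with update mode.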
Step 1 — Create the database and tables in the MySQL database server:
mysql> create database mydb;
mysql> use mydb;
mysql> create table person
    -> (empid varchar(11) NOT NULL PRIMARY KEY,
    -> name varchar(50),
    -> birthdate date,
    -> birthplace varchar(30),
    -> birthcountry varchar(3),
    -> birthstate varchar(3),
    -> country varchar(3),
    -> address1 varchar(50),
    -> address2 varchar(50),
    -> address3 varchar(50),
    -> address4 varchar(50),
    -> city varchar(50),
    -> county varchar(30),
    -> state varchar(3),
    -> postal int,
    -> phone varchar(15),
    -> emailaddress varchar(100)
    -> );

mysql> desc person;
+--------------+--------------+------+-----+---------+-------+
| Field        | Type         | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+-------+
| empid        | varchar(11)  | NO   | PRI | NULL    |       |
| name         | varchar(50)  | YES  |     | NULL    |       |
| birthdate    | date         | YES  |     | NULL    |       |
| birthplace   | varchar(30)  | YES  |     | NULL    |       |
| birthcountry | varchar(3)   | YES  |     | NULL    |       |
| birthstate   | varchar(3)   | YES  |     | NULL    |       |
| country      | varchar(3)   | YES  |     | NULL    |       |
| address1     | varchar(50)  | YES  |     | NULL    |       |
| address2     | varchar(50)  | YES  |     | NULL    |       |
| address3     | varchar(50)  | YES  |     | NULL    |       |
| address4     | varchar(50)  | YES  |     | NULL    |       |
| city         | varchar(50)  | YES  |     | NULL    |       |
| county       | varchar(30)  | YES  |     | NULL    |       |
| state        | varchar(3)   | YES  |     | NULL    |       |
| postal       | int(11)      | YES  |     | NULL    |       |
| phone        | varchar(15)  | YES  |     | NULL    |       |
| emailaddress | varchar(100) | YES  |     | NULL    |       |
+--------------+--------------+------+-----+---------+-------+
17 rows in set (0.00 sec)

mysql> create table person_stg
    -> (empid varchar(11) NOT NULL PRIMARY KEY,
    -> name varchar(50),
    -> birthdate date,
    -> birthplace varchar(30),
    -> birthcountry varchar(3),
    -> birthstate varchar(3),
    -> country varchar(3),
    -> address1 varchar(50),
    -> address2 varchar(50),
    -> address3 varchar(50),
    -> address4 varchar(50),
    -> city varchar(50),
    -> county varchar(30),
    -> state varchar(3),
    -> postal int,
    -> phone varchar(15),
    -> emailaddress varchar(100)
    -> );

mysql> desc person_stg;
+--------------+--------------+------+-----+---------+-------+
| Field        | Type         | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+-------+
| empid        | varchar(11)  | NO   | PRI | NULL    |       |
| name         | varchar(50)  | YES  |     | NULL    |       |
| birthdate    | date         | YES  |     | NULL    |       |
| birthplace   | varchar(30)  | YES  |     | NULL    |       |
| birthcountry | varchar(3)   | YES  |     | NULL    |       |
| birthstate   | varchar(3)   | YES  |     | NULL    |       |
| country      | varchar(3)   | YES  |     | NULL    |       |
| address1     | varchar(50)  | YES  |     | NULL    |       |
| address2     | varchar(50)  | YES  |     | NULL    |       |
| address3     | varchar(50)  | YES  |     | NULL    |       |
| address4     | varchar(50)  | YES  |     | NULL    |       |
| city         | varchar(50)  | YES  |     | NULL    |       |
| county       | varchar(30)  | YES  |     | NULL    |       |
| state        | varchar(3)   | YES  |     | NULL    |       |
| postal       | int(11)      | YES  |     | NULL    |       |
| phone        | varchar(15)  | YES  |     | NULL    |       |
| emailaddress | varchar(100) | YES  |     | NULL    |       |
+--------------+--------------+------+-----+---------+-------+
17 rows in set (0.00 sec)
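Before running the export, it is worth confirming that the HDFS source file actually matches this schema: comma-delimited, 17 fields, in the same order as the columns above. A quick way to eyeball the first few records (the path is the one used in Step 2; the head count is arbitrary):

$ hdfs dfs -cat /tmp/hive/person/part-m-00000 | head -5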
Step 2 — Execute the Sqoop export command on the Hadoop cluster:
$ sqoop export \
> --connect jdbc:mysql://localhost/mydb \
> --username root -P \
> --export-dir /tmp/hive/person/part-m-00000 \
> --table person \
> --staging-table person_stg \
> --clear-staging-table \
> --verbose \
> --fields-terminated-by ','

where:
☛ --export-dir: HDFS source path of the data to be exported.
☛ --table: the RDBMS table to populate.
☛ --staging-table: good practice to load the data into a staging table first; avoids leaving the target table inconsistent if the job fails partway.
☛ --clear-staging-table: indicates that any data already present in the staging table can be deleted.
☛ --verbose: print more information while working; useful for debugging.
☛ --fields-terminated-by: the field delimiter of the Hive table / HDFS file.

Console output:

Warning: /usr/hdp/2.3.0.0-2557/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
2016-01-05 15:26:44,870 INFO - [main:] ~ Running Sqoop version: 1.4.6.2.3.0.0-2557 (Sqoop:92)
Enter password:
2016-01-05 15:26:46,317 INFO - [main:] ~ Preparing to use a MySQL streaming resultset. (MySQLManager:69)
2016-01-05 15:26:46,318 INFO - [main:] ~ Beginning code generation (CodeGenTool:92)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.0.0-2557/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.0.0-2557/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2016-01-05 15:26:46,699 INFO - [main:] ~ Executing SQL statement: SELECT t.* FROM `person` AS t LIMIT 1 (SqlManager:757)
2016-01-05 15:26:46,750 INFO - [main:] ~ Executing SQL statement: SELECT t.* FROM `person` AS t LIMIT 1 (SqlManager:757)
2016-01-05 15:26:46,756 INFO - [main:] ~ HADOOP_MAPRED_HOME is /usr/hdp/2.3.0.0-2557/hadoop-mapreduce (CompilationManager:94)
Note: /tmp/sqoop-root/compile/fb0efb7d2a0bf1bac7d97f9e484649fb/person.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2016-01-05 15:26:48,812 INFO - [main:] ~ Writing jar file: /tmp/sqoop-root/compile/fb0efb7d2a0bf1bac7d97f9e484649fb/person.jar (CompilationManager:330)
2016-01-05 15:26:48,828 INFO - [main:] ~ Data will be staged in the table: person_stg (ExportJobBase:357)
2016-01-05 15:26:48,829 INFO - [main:] ~ Beginning export of person (ExportJobBase:378)
2016-01-05 15:26:48,840 INFO - [main:] ~ Deleted 0 records from `person_stg` (SqlManager:1053)
2016-01-05 15:26:49,194 INFO - [main:] ~ mapred.jar is deprecated. Instead, use mapreduce.job.jar (deprecation:1173)
2016-01-05 15:26:50,245 INFO - [main:] ~ mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative (deprecation:1173)
2016-01-05 15:26:50,252 INFO - [main:] ~ mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative (deprecation:1173)
2016-01-05 15:26:50,253 INFO - [main:] ~ mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps (deprecation:1173)
2016-01-05 15:26:50,800 INFO - [main:] ~ Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/ (TimelineClientImpl:296)
2016-01-05 15:26:50,950 INFO - [main:] ~ Connecting to ResourceManager at sandbox.hortonworks.com/192.168.88.129:8050 (RMProxy:98)
2016-01-05 15:26:52,717 INFO - [main:] ~ Total input paths to process : 1 (FileInputFormat:283)
2016-01-05 15:26:52,721 INFO - [main:] ~ Total input paths to process : 1 (FileInputFormat:283)
2016-01-05 15:26:52,811 INFO - [main:] ~ number of splits:4 (JobSubmitter:198)
2016-01-05 15:26:52,831 INFO - [main:] ~ mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative (deprecation:1173)
2016-01-05 15:26:53,037 INFO - [main:] ~ Submitting tokens for job: job_1455596313072_0009 (JobSubmitter:287)
2016-01-05 15:26:53,503 INFO - [main:] ~ Submitted application application_1455596313072_0009 (YarnClientImpl:274)
2016-01-05 15:26:53,596 INFO - [main:] ~ The url to track the job: http://sandbox.hortonworks.com:8088/proxy/application_1455596313072_0009/ (Job:1294)
2016-01-05 15:26:53,597 INFO - [main:] ~ Running job: job_1455596313072_0009 (Job:1339)
2016-01-05 15:26:59,778 INFO - [main:] ~ Job job_1455596313072_0009 running in uber mode : false (Job:1360)
2016-01-05 15:26:59,779 INFO - [main:] ~ map 0% reduce 0% (Job:1367)
2016-01-05 15:27:11,210 INFO - [main:] ~ map 75% reduce 0% (Job:1367)
2016-01-05 15:27:12,229 INFO - [main:] ~ map 100% reduce 0% (Job:1367)
2016-01-05 15:27:12,251 INFO - [main:] ~ Job job_1455596313072_0009 completed successfully (Job:1378)
2016-01-05 15:27:12,473 INFO - [main:] ~ Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=573776
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=72660
                HDFS: Number of bytes written=0
                HDFS: Number of read operations=16
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=0
        Job Counters
                Launched map tasks=4
                Data-local map tasks=4
                Total time spent by all maps in occupied slots (ms)=34919
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=34919
                Total vcore-seconds taken by all map tasks=34919
                Total megabyte-seconds taken by all map tasks=8729750
        Map-Reduce Framework
                Map input records=223
                Map output records=223
                Input split bytes=608
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=413
                CPU time spent (ms)=4910
                Physical memory (bytes) snapshot=570736640
                Virtual memory (bytes) snapshot=3342573568
                Total committed heap usage (bytes)=526909440
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=0 (Job:1385)
2016-01-05 15:27:12,480 INFO - [main:] ~ Transferred 70.957 KB in 22.2127 seconds (3.1944 KB/sec) (ExportJobBase:301)
2016-01-05 15:27:12,485 INFO - [main:] ~ Exported 223 records. (ExportJobBase:303)
2016-01-05 15:27:12,486 INFO - [main:] ~ Starting to migrate data from staging table to destination. (ExportJobBase:453)
2016-01-05 15:27:12,488 INFO - [main:] ~ Migrated 223 records from `person_stg` to `person` (SqlManager:1089)
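Two details in the log are worth noting. First, the staging flow is visible at the end: rows land in person_stg and are only migrated to person after the MapReduce job succeeds. Second, the "Enter password:" prompt comes from the -P flag; for unattended runs, -P can be replaced with --password-file, which reads the password from a permissions-restricted file. A minimal sketch, with a hypothetical HDFS path and a placeholder password:

$ echo -n 'MySecretPassword' | hdfs dfs -put - /user/root/mysql.password
$ hdfs dfs -chmod 400 /user/root/mysql.password
$ sqoop export \
> --connect jdbc:mysql://localhost/mydb \
> --username root \
> --password-file /user/root/mysql.password \
> --export-dir /tmp/hive/person/part-m-00000 \
> --table person \
> --staging-table person_stg \
> --clear-staging-table \
> --fields-terminated-by ','

The echo -n matters: a trailing newline in the password file will cause authentication failures.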
Step 3 — Verify that the data loaded into the MySQL table successfully:
mysql> select count(*) from mydb.person;
+----------+
| count(*) |
+----------+
|      223 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from mydb.person_stg;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)
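person_stg is empty again because its 223 rows were migrated into person at the end of the export. Beyond row counts, a quick spot check confirms the fields landed in the right columns (the column choice here is illustrative):

mysql> select empid, name, birthdate, city from mydb.person limit 5;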