CommonLounge Archive

Sqoop Tutorial

March 09, 2018

Sqoop is a tool that uses the mapreduce framework to export data from relational databases into HDFS in a parallel fashion. That’s all that Sqoop does — it is a simple but effective tool. Let’s make a MySQL database and put it into HDFS using Sqoop.

Sqoop Example — MySQL, HDFS, and Hive

First thing we need to do is create a database in MySql.

mysql -u root -p

The password is cloudera.

create database people;
use people;
create table friends(id int, name varchar(20), nickname varchar(20));
desc friends;
mysql> desc friends;
+----------+-------------+------+-----+---------+-------+
| Field    | Type        | Null | Key | Default | Extra |
+----------+-------------+------+-----+---------+-------+
| id       | int(11)     | YES  |     | NULL    |       |
| name     | varchar(20) | YES  |     | NULL    |       |
| nickname | varchar(20) | YES  |     | NULL    |       |
+----------+-------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

Let’s insert a few values into this table:

mysql> INSERT INTO friends VALUES (1, "Donald Duck", "D. Duck");
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO friends VALUES (2, "Goofy", "Goofy");
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO friends VALUES (3, "Bruce Wayne", "Batman");
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO friends VALUES(4, "Clark Kent", "Superman");
Query OK, 1 row affected (0.01 sec)
Select * from friends;
+------+-------------+----------+
| id   | name        | nickname |
+------+-------------+----------+
|    1 | Donald Duck | D. Duck  |
|    2 | Goofy       | Goofy    |
|    3 | Bruce Wayne | Batman   |
|    4 | Clark Kent  | Superman |
+------+-------------+----------+
4 rows in set (0.00 sec)

Sqoop Example — Import MySQL Table into Hive

Next, we will import this into Hive using Sqoop.

sqoop import --connect jdbc:mysql://quickstart:3306/people --username root -P --split-by id --columns id,name,nickname --table friends --target-dir /sqoop/people/ --fields-terminated-by "," --hive-import --create-hive-table --hive-table moviedata.friends

Let’s jump into the Hive and query the table to see it.

beeline -u jdbc:hive2://quickstart:10000/default -n admin -d org.apache.hive.jdbc.HiveDriver
use moviedata;
select * from friends;
+-------------+---------------+-------------------+--+
| friends.id  | friends.name  | friends.nickname  |
+-------------+---------------+-------------------+--+
| 1           | Donald Duck   | D. Duck           |
| 2           | Goofy         | Goofy             |
| 3           | Bruce Wayne   | Batman            |
| 4           | Clark Kent    | Superman          |
+-------------+---------------+-------------------+--+
4 rows selected (0.201 seconds)

Well done. You have imported a MySQL table into Hive.

Sqoop Example — Export Hive Table to MySQL

Let’s try and export a Hive table into MySql. To prepare for the table, we must have a table created inside MySql so we are going to build a table inside the people database and create a table that we are going to populate with our ratings hive table. Let’s jump into it.

mysql -u root -p
#password cloudera
use people;
create table ratings(id int, movieid  int, rating INT, tstamp varchar(20));
Query OK, 0 rows affected (0.01 sec)
sqoop export --connect jdbc:mysql://quickstart:3306/people --username root --password cloudera --table ratings --export-dir /moviedata/ratings

Now the sqoop export command will take the data out of HDFS and put the data into an already defined table inside of a relational database.

Now we just need to jump into MySQL and check out the table that we just created.

mysql -u root -p
#password cloudera
use people;
select * from ratings;
| 3129 |    1136 |      1 | 969365204  |
| 3129 |     466 |      3 | 969281574  |
|  149 |    2002 |      2 | 977590434  |
| 3129 |    3826 |      3 | 969278845  |
|  149 |    1200 |      5 | 977588505  |
| 3129 |    3688 |      4 | 969291462  |
|  149 |    1202 |      3 | 977589370  |
| 3129 |    3689 |      3 | 969291537  |
|  149 |    3744 |      4 | 996939065  |
| 3129 |    2889 |      3 | 969281005  |
|  149 |    2006 |      4 | 978190099  |
| 3129 |     471 |      3 | 969280818  |
|  149 |    1204 |      5 | 977588131  |
| 3129 |     477 |      3 | 969279612  |
|  149 |    2871 |      4 | 977591435  |
| 3129 |    3690 |      2 | 969291537  |
|  149 |    1206 |      5 | 977588555  |

DON’T FORGET TO DESTROY YOUR DIGITAL OCEAN MACHINE!

Congratulations you have successfully imported and exported a Hive table using Sqoop.

Conclusion

Sqoop is a great way to import and export relational data in and out of Hadoop.


© 2016-2022. All rights reserved.