Sqoop Import

Our HadoopTpoint App is now available in google play store,please rate and comment it in play store : W3Schools

SQOOP IMPORT:

The sqoop import tool imports individual tables from RDBMS to HDFS.Sqoop import tool takes data from RDBMS with four mappers by default,to HDFS.In the process of importing sqoop provides java classes and jar file.

Sqoop  import <——->  either Database(tables) or Mainframe(datasets)

sqoop import

sqoop import

 

Sqoop tool ‘importis used to import table data from the table to the Hadoop file system as a text file or a binary file.

The following command is used to import the student table from MySQL database server to HDFS.

         sqoop import –connect jdbc:mysql://localhost:3306/sqoop1 –username root –passwordhadoopmysql –table student

Every sqoop tool can use generic-args and import-args/export-args.

sqoop import (generic-args) (import-args)

sqoop-import (generic-args) (import-args)

MYSQL:

mysql -u root -p

enter password:hadoopmysql

mysql> create database sqoop1;

mysql> use sqoop1;

mysql> create table student(sname VARCHAR(10),sid int,sage int,colg varchar(10),addr varchar(10));

mysql> describe student;

mysql> load data local infile ‘/home/username/Desktop/stu’ into table student;

Sqoop Import From RDBMS to HDFS:

With Primary Key:

By default 4 mappers

       sqoop import –connect jdbc:mysql://localhost:3306/sqoop1 –username root –password hadoopmysql –table student

Note: The below query will get exception like -m1 or –split-by tools used if no primary key is found.

Without Primary Key

–split-by column_name for 4 mappers

       sqoop import –connect jdbc:mysql://localhost:3306/sqoop1 –username root –password hadoopmysql –table student –split-by sname

-m 1 for single mapper

sqoop import –connect jdbc:mysql://localhost:3306/sqoop1 –username root –password hadoopmysql –table student -m 1

Target directory:

We can specify the target directory while importing table data into HDFS.

Sqoop offers two parameters for specifying custom output directories: –target-dir

and –warehouse-dir . Use the –target-dir parameter to specify the directory on

HDFS where Sqoop should import your data.

sqoop import –connect jdbc:mysql://localhost:3306/sqoop1 –username root –password hadoopmysql –table student –split-by sname –target-dir /hadoop/tables/student

HDFS LOCATION:  /hadoop/tables/student

Warehouse-directory:

–warehouse-dir allows you to specify only the parent directory.

sqoop import –connect jdbc:mysql://localhost:3306/sqoop1 –username root –password hadoopmysql –table student –split-by sname –warehouse-dir /hadoop/tables/

HDFS LOCATION:  /hadoop/tables/student

Options-file:

–options-file option to pass confidential information. connection url,username and password for databases.

       sqoop import –options-file /path/of/the/file –table student –split-by sname –target-dir /user/username/sqoop-student

–table tool argument is tables with other objects like views,etc..

Columns tool:

–columns option like selecting partucular columns (select sname,sage from student)

sqoop import –connect jdbc:mysql://localhost:3306/sqoop1 –username root –password hadoopmysql –table student –m 1 –columns ‘sname,sage’ –target-dir /user/username/sqoop-student-columns

Where tool:

We can import the subset of table data using where tool.

–where option

    sqoop import –connect jdbc:mysql://localhost:3306/sqoop1 –username root –password hadoopmysql –table student –split-by sname –columns ‘sname’ –where “sname = hadoop” –target-dir /user/username/sqoop-student-where

Delimiters:

Default delimiter in hadoop is ‘,’ but we can change using –fields-terminated-by

sqoop import –connect jdbc:mysql://localhost:3306/sqoop1 –username root –password hadoopmysql –table student –split-by sname –fields-terminated-by ‘:’ –lines-terminated-by ‘\n’ –target-dir /user/username/sqoop1-student-customization

 

Data Formats:

   –as-textfile (By default)

   –as-sequencefile

   –as-avrodatafile

 

Comments

  1. Please tell me how to import data form db2

    • mahesh chimmiri says:

      If you want to connect with DB2 you must know Connect URL for Db2 server,Port number,Username,passwords and Database permissions

      Server=myAddress:myPortNumber;Database=myDataBase;UID=myUsername;PWD=myPassword;

      This is standerd way of Db2 Connection

Speak Your Mind

*