Commands
Top-level sqoop commands
$ bin/sqoop help
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information
The sqoop import subcommand
$ bin/sqoop help import    # or: bin/sqoop import --help
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect <jdbc-uri>                       Specify JDBC connect string
   --connection-manager <class-name>          Specify connection manager class name
   --connection-param-file <properties-file>  Specify connection parameters file
   --driver <class-name>                      Manually specify JDBC driver class to use
   --hadoop-home <hdir>                       Override $HADOOP_MAPRED_HOME_ARG
   --hadoop-mapred-home <dir>                 Override $HADOOP_MAPRED_HOME_ARG
   --help                                     Print usage instructions
-P                                            Read password from console
   --password <password>                      Set authentication password
   --password-alias <password-alias>          Credential provider password alias
   --password-file <password-file>            Set authentication password file path
   --relaxed-isolation                        Use read-uncommitted isolation for imports
   --skip-dist-cache                          Skip copying jars to distributed cache
   --username <username>                      Set authentication username
   --verbose                                  Print more information while working

Import control arguments:
   --append                      Imports data in append mode
                                 (each import's data is saved as an additional file)
   --as-avrodatafile             Imports data to Avro data files
   --as-parquetfile              Imports data to Parquet files
   --as-sequencefile             Imports data to SequenceFiles
   --as-textfile                 Imports data as plain text (default)
   --boundary-query <statement>  Set boundary query for retrieving max and min
                                 value of the primary key
   --columns <col,col,col...>    Columns to import from table
   --compression-codec <codec>   Compression codec to use for import
   --delete-target-dir           Imports data in delete mode
                                 (deletes the target directory before importing;
                                 --append and --delete-target-dir cannot be used together)
   --direct                      Use direct import fast path
   --direct-split-size <n>       Split the input stream every 'n' bytes when
                                 importing in direct mode
-e,--query <statement>           Import results of SQL 'statement'
   --fetch-size <n>              Set number 'n' of rows to fetch from the
                                 database when more rows are needed
   --inline-lob-limit <n>        Set the maximum size for an inline LOB
-m,--num-mappers <n>             Use 'n' map tasks to import in parallel
   --mapreduce-job-name <name>   Set name for generated mapreduce job
   --merge-key <column>          Key column to use to join results
                                 (the column used as the key when merging the table
                                 directory's part-m-00000, part-m-00001, ... files into one)
   --split-by <column-name>      Column of the table used to split work units
   --table <table-name>          Table to read
   --target-dir <dir>            HDFS plain table destination
                                 (the HDFS directory in which the table data is stored)
   --validate                    Validate the copy using the configured validator
   --validation-failurehandler <validation-failurehandler>
                                 Fully qualified class name for ValidationFailureHandler
   --validation-threshold <validation-threshold>
                                 Fully qualified class name for ValidationThreshold
   --validator <validator>       Fully qualified class name for the Validator
   --warehouse-dir <dir>         HDFS parent for table destination
   --where <where clause>        WHERE clause to use during import
-z,--compress                    Enable compression

Incremental import arguments:
   --check-column <column>       Source column to check for incremental change
   --incremental <import-type>   Define an incremental import of type 'append'
                                 or 'lastmodified'
                                 (append: new data is appended as additional files
                                 under the table directory)
                                 (lastmodified: requires --check-column to be of type
                                 timestamp or date, plus one extra import control
                                 argument: --append or --merge-key)
   --last-value <value>          Last imported value in the incremental check column
                                 (used as the lower bound of the range of values to import)

Output line formatting arguments:
   --enclosed-by <char>             Sets a required field enclosing character
   --escaped-by <char>              Sets the escape character
   --fields-terminated-by <char>    Sets the field separator character
   --lines-terminated-by <char>     Sets the end-of-line character
   --mysql-delimiters               Uses MySQL's default delimiter set:
                                    fields: ,  lines: \n  escaped-by: \
                                    optionally-enclosed-by: '
   --optionally-enclosed-by <char>  Sets a field enclosing character

Input parsing arguments:
   --input-enclosed-by <char>             Sets a required field encloser
   --input-escaped-by <char>              Sets the input escape character
   --input-fields-terminated-by <char>    Sets the input field separator
   --input-lines-terminated-by <char>     Sets the input end-of-line char
   --input-optionally-enclosed-by <char>  Sets a field enclosing character

Hive arguments:
   --create-hive-table                       Fail if the target hive table exists
   --hive-database <database-name>           Sets the database name to use when
                                             importing to hive
   --hive-delims-replacement <arg>           Replace Hive record \0x01 and row
                                             delimiters (\n\r) from imported string
                                             fields with user-defined string
   --hive-drop-import-delims                 Drop Hive record \0x01 and row
                                             delimiters (\n\r) from imported string
                                             fields (conflicts with
                                             --hive-delims-replacement)
   --hive-home <dir>                         Override $HIVE_HOME
   --hive-import                             Import tables into Hive (Uses Hive's
                                             default delimiters if none are set.)
   --hive-overwrite                          Overwrite existing data in the Hive table
   --hive-partition-key <partition-key>      Sets the partition key to use when
                                             importing to hive
   --hive-partition-value <partition-value>  Sets the partition value to use when
                                             importing to hive
   --hive-table <table-name>                 Sets the table name to use when
                                             importing to hive
   --map-column-hive <arg>                   Override mapping for specific column
                                             to hive types.

HBase arguments:
   --column-family <family>  Sets the target column family for the import
   --hbase-bulkload          Enables HBase bulk loading
   --hbase-create-table      If specified, create missing HBase tables
   --hbase-row-key <col>     Specifies which input column to use as the row key
   --hbase-table <table>     Import to <table> in HBase

HCatalog arguments:
   --hcatalog-database <arg>                      HCatalog database name
   --hcatalog-home <hdir>                         Override $HCAT_HOME
   --hcatalog-partition-keys <partition-key>      Sets the partition keys to use
                                                  when importing to hive
   --hcatalog-partition-values <partition-value>  Sets the partition values to use
                                                  when importing to hive
   --hcatalog-table <arg>                         HCatalog table name
   --hive-home <dir>                              Override $HIVE_HOME
   --hive-partition-key <partition-key>           Sets the partition key to use
                                                  when importing to hive
   --hive-partition-value <partition-value>       Sets the partition value to use
                                                  when importing to hive
   --map-column-hive <arg>                        Override mapping for specific
                                                  column to hive types.

HCatalog import specific options:
   --create-hcatalog-table          Create HCatalog before import
   --hcatalog-storage-stanza <arg>  HCatalog storage stanza for table creation

Accumulo arguments:
   --accumulo-batch-size <size>        Batch size in bytes
   --accumulo-column-family <family>   Sets the target column family for the import
   --accumulo-create-table             If specified, create missing Accumulo tables
   --accumulo-instance <instance>      Accumulo instance name.
   --accumulo-max-latency <latency>    Max write latency in milliseconds
   --accumulo-password <password>      Accumulo password.
   --accumulo-row-key <col>            Specifies which input column to use as the row key
   --accumulo-table <table>            Import to <table> in Accumulo
   --accumulo-user <user>              Accumulo user name.
   --accumulo-visibility <vis>         Visibility token to be applied to all rows imported
   --accumulo-zookeepers <zookeepers>  Comma-separated list of zookeepers (host:port)

Code generation arguments:
   --bindir <dir>                      Output directory for compiled objects
   --class-name <name>                 Sets the generated class name. This overrides
                                       --package-name. When combined with --jar-file,
                                       sets the input class.
   --input-null-non-string <null-str>  Input null non-string representation
   --input-null-string <null-str>      Input null string representation
   --jar-file <file>                   Disable code generation; use specified jar
   --map-column-java <arg>             Override mapping for specific columns to java types
   --null-non-string <null-str>        Null non-string representation
   --null-string <null-str>            Null string representation
   --outdir <dir>                      Output directory for generated code
   --package-name <name>               Put auto-generated classes in this package

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf <configuration file>                    specify an application configuration file
-D <property=value>                           use value for given property
-fs <local|namenode:port>                     specify a namenode
-jt <local|jobtracker:port>                   specify a job tracker
-files <comma separated list of files>        specify comma separated files to be
                                              copied to the map reduce cluster
-libjars <comma separated list of jars>       specify comma separated jar files to
                                              include in the classpath.
-archives <comma separated list of archives>  specify comma separated archives to be
                                              unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

At minimum, you must specify --connect and --table
Arguments to mysqldump and other subprograms may be supplied
after a '--' on the command line.
Analysis of the data import workflow
Importing into Hive is a staged process: the data is first imported into the user's HDFS directory (/user/jack/), then the table directory is created and the data is written into the Hive table.
Step 1: import the data of mysql.help_keyword into the default HDFS path
Step 2: automatically create a Hive table modeled on mysql.help_keyword, in the default `default` database
Step 3: load the data from the temporary directory into the Hive table
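The three steps above can be sketched with plain filesystem commands. This is a local simulation only: tmp directories stand in for HDFS, and the table name, paths, and sample rows are all hypothetical.

```shell
# Step 1 stand-in: mappers write comma-delimited part files into a per-user
# staging directory (Sqoop's default is /user/<user>/<table> on HDFS)
STAGING="$(mktemp -d)/help_keyword"
mkdir -p "$STAGING"
printf '1,MIN\n2,MAX\n' > "$STAGING/part-m-00000"   # hypothetical sample rows

# Step 2 stand-in: creating the Hive table amounts to creating its
# warehouse directory (plus metastore metadata, not modeled here)
WAREHOUSE="$(mktemp -d)/default.db/help_keyword"
mkdir -p "$WAREHOUSE"

# Step 3 stand-in: loading moves the staged files into the table directory,
# leaving the staging area empty (like Hive's LOAD DATA INPATH)
mv "$STAGING"/part-m-* "$WAREHOUSE"/

cat "$WAREHOUSE/part-m-00000"
```

If the job dies between step 1 and step 3, the staged files are left behind in the user directory, which is why a failed Hive import must be cleaned up by hand (see the notes at the end).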
Append-mode import (--append)
Import the data in two runs. With the --append flag, each run's data is saved as an additional file.
$ mysql -uroot -p123456
mysql> select * from user;
+----+-------+
| id | name  |
+----+-------+
|  1 | jack  |
|  2 | tom   |
|  3 | white |
|  4 | black |
+----+-------+

# Remove any existing data in the target directory
$ ~/Documents/hadoop/bin/hadoop fs -rm -r -f /user/sqoop/user

# First import
$ bin/sqoop import \
--connect jdbc:mysql://master:3306/mydb \
--username root \
--password 123456 \
--table user \
--append \
--target-dir /user/sqoop/user \
--mysql-delimiters \
--where 'id < 3' \
--m 1

# Second import
$ bin/sqoop import \
--connect jdbc:mysql://master:3306/mydb \
--username root \
--password 123456 \
--table user \
--append \
--target-dir /user/sqoop/user \
--mysql-delimiters \
--where 'id > 2 and id < 5' \
--m 1

# Check the result
$ ~/Documents/hadoop/bin/hadoop fs -ls /user/sqoop/user
Found 2 items
-rw-r--r--   1 jack supergroup   13 2020-05-18 22:52 /user/sqoop/user/part-m-00000
-rw-r--r--   1 jack supergroup   16 2020-05-18 22:53 /user/sqoop/user/part-m-00001
$ ~/Documents/hadoop/bin/hadoop fs -cat /user/sqoop/user/part-m-00000
1,jack
2,tom
$ ~/Documents/hadoop/bin/hadoop fs -cat /user/sqoop/user/part-m-00001
3,white
4,black
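The append behavior in the session above can be reproduced locally without a cluster. This is a minimal sketch: a tmp directory stands in for the HDFS target directory, `import_append` is a hypothetical helper, and the rows mirror the `--where` splits above.

```shell
# Local stand-in for --target-dir /user/sqoop/user
DIR="$(mktemp -d)/user"
mkdir -p "$DIR"

# Hypothetical helper: each "import" with --append writes a NEW
# part-m-NNNNN file next to the existing ones, never overwriting them
import_append() {    # $1 = rows selected by this run's --where clause
    n=$(ls "$DIR" | wc -l)
    printf '%s\n' "$1" > "$DIR/part-m-$(printf '%05d' "$n")"
}

import_append '1,jack
2,tom'              # first run:  --where 'id < 3'   -> part-m-00000
import_append '3,white
4,black'            # second run: --where 'id > 2 and id < 5' -> part-m-00001

ls "$DIR"
cat "$DIR"/part-m-*
```

Without --append, Sqoop refuses to write into an existing target directory, which is exactly why the transcript deletes /user/sqoop/user first.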
Incremental import by id (int)
--last-value must be set according to the actual data. When a time column (date, timestamp) or numeric column (such as id) is used as the incremental criterion, Sqoop goes strictly by the specified lower bound rather than by what was actually updated, which can cause rows to be imported twice. When a time-typed column is the criterion, rows whose timestamp is ahead of the machine's current time may be skipped.
--incremental selects the incremental mode. In append mode (--incremental append, or --incremental lastmodified with --append), the incremental data is saved as an additional file in the same directory; in merge mode (--incremental lastmodified with --merge-key), the specified column serves as the key and, after the incremental import, all files are merged into one.
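The duplicate-import pitfall described above is easy to see in isolation. A minimal sketch with hypothetical data: Sqoop's append mode effectively selects rows with `check-column > last-value`, so rerunning with a stale --last-value selects the same rows again.

```shell
# Hypothetical source table as comma-delimited rows: id,name
src=$(mktemp)
printf '1,jack\n2,tom\n3,white\n4,black\n' > "$src"

# Stand-in for the incremental selection Sqoop performs
# (roughly: WHERE id > <last-value>)
select_incr() {    # $1 = last-value
    awk -F, -v last="$1" '$1 > last' "$src"
}

select_incr 2    # first incremental run: picks 3,white and 4,black
select_incr 2    # rerun WITHOUT updating last-value: SAME rows again -> duplicates
select_incr 4    # after updating last-value to the max imported id: nothing new
```

This is why --last-value must be advanced after every run; Sqoop's saved-job mechanism (see "Sqoop jobs" below) does this bookkeeping automatically.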
Importing the first batch of data
$ mysql -uroot -p123456
mysql> select * from user;
+----+-------+
| id | name  |
+----+-------+
|  1 | jack  |
|  2 | tom   |
|  3 | white |
|  4 | black |
+----+-------+

# Import part of the data first
$ vi opts/mysql_import.opt
import
--connect
jdbc:mysql://master:3306/mydb
--username
root
--password
123456
--table
user
--delete-target-dir
--hive-import
--hive-overwrite
--mysql-delimiters
--where
'id < 3'
--m
1
# --delete-target-dir: if a previous Hive import failed, removes the data
# left behind in the temporary directory
# --mysql-delimiters: use MySQL's default delimiter set:
# [fields: ,] [lines: \n] [escaped-by: \] [optionally-enclosed-by: ']

# Run sqoop
$ bin/sqoop --options-file opts/mysql_import.opt

# Check the data
$ ~/Documents/hive/bin/hive
hive> select * from user;
OK
1 jack
2 tom
hive> dfs -cat /user/hive/warehouse/user/*;
1,jack
2,tom
Incremental import (append)
# Edit the sqoop options file for an incremental import
$ vi opts/mysql_import_incremental.opt
import
--connect
jdbc:mysql://master:3306/mydb
--username
root
--password
123456
--table
user
--m
1
--target-dir
/user/hive/warehouse/user
--mysql-delimiters
--incremental
append
--check-column
id
--last-value
2
# --hive-import cannot be combined with append mode, so --target-dir
# must be given explicitly
# incremental import, append mode: rows are selected according to --last-value

# Run sqoop
$ bin/sqoop --options-file opts/mysql_import_incremental.opt

# Check the data: a new file was appended
$ ~/Documents/hive/bin/hive
hive> dfs -ls /user/hive/warehouse/user;
Found 2 items
-rwxrwxr-x   1 jack supergroup   13 2020-05-14 18:30 /user/hive/warehouse/user/part-m-00000
-rw-r--r--   1 jack supergroup   37 2020-05-14 18:36 /user/hive/warehouse/user/part-m-00001
hive> dfs -cat /user/hive/warehouse/user/*;
1,jack
2,tom
3,white
4,black
hive> select * from user;
OK
1 jack
2 tom
3 white
4 black

# Running again without updating --last-value (still 2) gives the result below:
# the check column's lower bound is still 2, so the selected range is again
# 2..4 and rows 3 and 4 are imported a second time.
hive> select * from user;
OK
1 jack
2 tom
3 white
4 black
3 white
4 black
Incremental import (lastmodified)
mysql> CREATE TABLE `log` (
    -> `log_id` char(12) PRIMARY KEY,
    -> `create_time` timestamp,
    -> `content` varchar(255));
mysql> INSERT INTO `log` VALUES ('abcabcabcabc', '2020-05-15 12:56:28', 'aaaaaaa');
mysql> INSERT INTO `log` VALUES ('abcabcabcab1', '2020-05-20 12:56:32', 'bbbbbbb');
mysql> INSERT INTO `log` VALUES ('abcabcabcab2', '2020-05-30 12:56:37', 'ccccccc');

# Initialize the Hive table and data
$ vi opts/mysql_import_time.opt
import
--connect
jdbc:mysql://master:3306/mydb
--username
root
--password
123456
--table
log
--m
1
--delete-target-dir
--hive-import
--mysql-delimiters

# Run it, then check the data
$ bin/sqoop --options-file opts/mysql_import_time.opt
$ ~/Documents/hive/bin/hive
hive> select * from log;
OK
abcabcabcabc 2020-05-15 12:56:28.0 aaaaaaa
abcabcabcab1 2020-05-20 12:56:32.0 bbbbbbb
abcabcabcab2 2020-05-30 12:56:37.0 ccccccc
Time taken: 0.078 seconds, Fetched: 3 row(s)
hive> dfs -ls /user/hive/warehouse/log;
Found 1 items
-rwxrwxr-x   1 jack supergroup   129 2020-05-14 22:51 /user/hive/warehouse/log/part-m-00000

# --hive-import cannot be combined with append mode, so --target-dir
# must be given explicitly
# incremental import, lastmodified type: a time-typed column decides what to import
# Omitting --append and --merge-key (which choose how the data is written) fails:
# --merge-key or --append is required when using --incremental lastmodified
# and the output directory exists.
# Error: Column type is neither timestamp nor date!
# Cause: the column given by --check-column must be of type timestamp or date.
# After switching the check column, change --last-value from 4 to a timestamp,
# e.g. "2020-01-02 22:20:38".
# Rows whose timestamp is ahead of the machine's current time are not read
# and appended. The machine's actual time here is 2020-05-14 22:52:28, so
# appending such future rows will not succeed.
# The initial import above is unaffected.

# append mode
# Insert new rows: in theory abcabcabcab3 is imported and abcabcabcab4 is not,
# because its timestamp is still in the future for this machine
# (inferred from behavior; check the source for the exact rule)
mysql> INSERT INTO `log` (`log_id`, `create_time`, `content`) VALUES
    -> ('abcabcabcab3', '2020-05-07 13:44:38', 'dddddd');
mysql> INSERT INTO `log` (`log_id`, `create_time`, `content`) VALUES
    -> ('abcabcabcab4', '2020-05-31 13:19:58', 'dddddddd');

# Edit the sqoop options file for an incremental import
$ vi opts/mysql_import_time_incremental.opt
import
--connect
jdbc:mysql://master:3306/mydb
--username
root
--password
123456
--table
log
--m
1
--target-dir
/user/hive/warehouse/log
--mysql-delimiters
--incremental
lastmodified
--check-column
create_time
--last-value
"2020-01-14 22:34:53.0"
--append
# Excludes what was already imported and selects the updated rows whose
# timestamp is not in the future; only abcabcabcab3 qualifies.

# Run sqoop
$ bin/sqoop --options-file opts/mysql_import_time_incremental.opt

# Check the data: a new file was appended
$ ~/Documents/hive/bin/hive
hive> dfs -ls /user/hive/warehouse/log;
Found 2 items
-rwxrwxr-x   1 jack supergroup   129 2020-05-14 22:51 /user/hive/warehouse/log/part-m-00000
-rw-r--r--   1 jack supergroup    42 2020-05-14 22:53 /user/hive/warehouse/log/part-m-00001
hive> select * from log;
OK
abcabcabcabc 2020-05-15 12:56:28.0 aaaaaaa
abcabcabcab1 2020-05-20 12:56:32.0 bbbbbbb
abcabcabcab2 2020-05-30 12:56:37.0 ccccccc
abcabcabcab3 2020-05-07 13:44:38.0 dddddd

# merge-key mode
# In this mode all files are merged into one.
# Insert new rows: in theory both abcabcabcab5 and abcabcabcab6 are imported
mysql> INSERT INTO `log` (`log_id`, `create_time`, `content`) VALUES
    -> ('abcabcabcab5', '2020-05-08 13:44:38', 'dddddd');
mysql> INSERT INTO `log` (`log_id`, `create_time`, `content`) VALUES
    -> ('abcabcabcab6', '2020-05-09 13:19:58', 'dddddddd');

# Edit the sqoop options file for an incremental import
$ vi opts/mysql_import_time_incremental.opt
import
--connect
jdbc:mysql://master:3306/mydb
--username
root
--password
123456
--table
log
--m
1
--target-dir
/user/hive/warehouse/log
--mysql-delimiters
--incremental
lastmodified
--check-column
create_time
--last-value
"2020-01-02 22:20:38"
--merge-key
log_id

# Run sqoop
$ bin/sqoop --options-file opts/mysql_import_time_incremental.opt

# Check the data: the files were merged
$ ~/Documents/hive/bin/hive
hive> dfs -ls /user/hive/warehouse/log;
Found 2 items
-rw-r--r--   1 jack supergroup     0 2020-05-14 22:56 /user/hive/warehouse/log/_SUCCESS
-rw-r--r--   1 jack supergroup   257 2020-05-14 22:56 /user/hive/warehouse/log/part-r-00000
hive> select * from log;
OK
abcabcabcab1 2020-05-20 12:56:32.0 bbbbbbb
abcabcabcab2 2020-05-30 12:56:37.0 ccccccc
abcabcabcab3 2020-05-07 13:44:38.0 dddddd
abcabcabcab5 2020-05-08 13:44:38.0 dddddd
abcabcabcab6 2020-05-09 13:19:58.0 dddddddd
abcabcabcabc 2020-05-15 12:56:28.0 aaaaaaa

# Duplicate import
# Run again in append mode (change --merge-key back to --append):
hive> dfs -ls /user/hive/warehouse/log;
Found 3 items
-rw-r--r--   1 jack supergroup     0 2020-05-14 22:56 /user/hive/warehouse/log/_SUCCESS
-rw-r--r--   1 jack supergroup   128 2020-05-14 23:17 /user/hive/warehouse/log/part-m-00001
-rw-r--r--   1 jack supergroup   257 2020-05-14 22:56 /user/hive/warehouse/log/part-r-00000
hive> dfs -cat /user/hive/warehouse/log/*;
abcabcabcab3,2020-05-07 13:44:38.0,dddddd
abcabcabcab5,2020-05-08 13:44:38.0,dddddd
abcabcabcab6,2020-05-09 13:19:58.0,dddddddd
abcabcabcab1,2020-05-20 12:56:32.0,bbbbbbb
abcabcabcab2,2020-05-30 12:56:37.0,ccccccc
abcabcabcab3,2020-05-07 13:44:38.0,dddddd
abcabcabcab5,2020-05-08 13:44:38.0,dddddd
abcabcabcab6,2020-05-09 13:19:58.0,dddddddd
abcabcabcabc,2020-05-15 12:56:28.0,aaaaaaa

# Merge and deduplicate
# Run again in merge mode (change --append back to --merge-key):
hive> select * from log;
OK
abcabcabcab1 2020-05-20 12:56:32.0 bbbbbbb
abcabcabcab2 2020-05-30 12:56:37.0 ccccccc
abcabcabcab3 2020-05-07 13:44:38.0 dddddd
abcabcabcab5 2020-05-08 13:44:38.0 dddddd
abcabcabcab6 2020-05-09 13:19:58.0 dddddddd
abcabcabcabc 2020-05-15 12:56:28.0 aaaaaaa
Time taken: 0.055 seconds, Fetched: 6 row(s)
hive> dfs -ls /user/hive/warehouse/log;
Found 2 items
-rw-r--r--   1 jack supergroup     0 2020-05-14 23:23 /user/hive/warehouse/log/_SUCCESS
-rw-r--r--   1 jack supergroup   257 2020-05-14 23:23 /user/hive/warehouse/log/part-r-00000
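The merge step can be approximated locally. A minimal sketch with hypothetical keys and timestamps: concatenate every part file, keep one record per merge key (the newest `create_time` wins), and write a single merged `part-r-00000`, which is roughly what the merge job in the session above produces.

```shell
# Local stand-in for the table directory, with two "imports" worth of data;
# k1 appears in both files with different timestamps (hypothetical rows)
DIR=$(mktemp -d)
printf 'k1,2020-05-15 12:00:00,old\nk2,2020-05-16 12:00:00,b\n' > "$DIR/part-m-00000"
printf 'k1,2020-05-20 12:00:00,new\nk3,2020-05-21 12:00:00,c\n' > "$DIR/part-m-00001"

# Merge on column 1 (the key): for each key keep the record with the
# latest timestamp in column 2, then emit one sorted output file
cat "$DIR"/part-m-* |
  awk -F, '$2 >= t[$1] { t[$1] = $2; r[$1] = $0 } END { for (k in r) print r[k] }' |
  sort > "$DIR/part-r-00000"
rm "$DIR"/part-m-*

cat "$DIR/part-r-00000"   # k1 keeps only the newer record; k2 and k3 pass through
```

This mirrors why the final `select * from log` in the session shows six deduplicated rows even after a duplicate append: the reducer output replaces all the per-mapper part files.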
Sqoop jobs
Creating a job
Before an incremental import into Hive, either create the Hive table beforehand, or import part of the data first so that the Hive table is created automatically.
# Initialize the data
$ mysql -uroot -p123456
DROP TABLE IF EXISTS `user`;
CREATE TABLE `user` (
`id` int(11) NOT NULL,
`name` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `user` VALUES ('1', 'jack');
INSERT INTO `user` VALUES ('2', 'tom');
INSERT INTO `user` VALUES ('3', 'white');
INSERT INTO `user` VALUES ('4', 'black');

$ bin/sqoop job \
--create user_import_incr \
--meta-connect jdbc:mysql://master:3306/sqoop \
-- import \
--connect jdbc:mysql://node03:3306/userdb \
--username root --password 123456 \
--table emp \
--incremental append \
--check-column id \
--last-value 1202 \
--target-dir /sqoop/increment/emp \
-m 1
Notes
Loading data into Hive fails when the Hive metastore service is not running
# Start the Hive metastore service (only needed when a metastore service is configured)
$ ~/Documents/hive/bin/hive --service metastore

# If the metastore service was not running, the import fails;
# remove the temporary directory left over from step 1
$ ~/Documents/hadoop/bin/hdfs dfs -rm -r -f /user/jack/user

# Then manually load the temporary data files into the Hive table
