Resource Planning
| Component | bigdata-node1 | bigdata-node2 | bigdata-node3 |
|---|---|---|---|
| OS | CentOS 7.6 | CentOS 7.6 | CentOS 7.6 |
| JDK | JVM | JVM | JVM |
| HDFS | NameNode/SecondaryNameNode/DataNode/JobHistoryServer/ApplicationHistoryServer | DataNode | DataNode |
| YARN | ResourceManager/NodeManager | NodeManager | NodeManager |
| Hive | HiveServer2/Metastore/CLI/Beeline | CLI/Beeline | CLI/Beeline |
| MySQL | N/A | N/A | MySQL-5.7.30 |
| Sqoop | Sqoop-1.4.6 | N/A | N/A |
Installation Media
Version: sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
Download: [http://archive.apache.org/dist/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz](http://archive.apache.org/dist/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz)
Environment Preparation
Install Hadoop
Install MySQL
Install Sqoop
According to the resource plan, Sqoop is installed on node bigdata-node1.
Extract the Archive
```shell
cd /share
wget http://archive.apache.org/dist/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
tar -zvxf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C ~/modules/
rm sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
```
Configuration
Configure sqoop-env.sh:
```shell
cd ~/modules/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/conf
cp sqoop-env-template.sh sqoop-env.sh
vi ~/modules/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/conf/sqoop-env.sh
```
Add the following configuration:
```shell
# Append at the end of the file
export HADOOP_COMMON_HOME=/home/vagrant/modules/hadoop-2.7.2
export HADOOP_MAPRED_HOME=/home/vagrant/modules/hadoop-2.7.2
export HIVE_HOME=/home/vagrant/modules/apache-hive-2.3.4-bin
```
Load the JDBC Driver
Copy the JDBC driver into Sqoop's lib directory, for example:
```shell
cp mysql-connector-java-5.1.47.jar ~/modules/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib/
```
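A quick sanity check, assuming the jar name from the copy step above, is to confirm the connector actually landed in Sqoop's lib directory:

```shell
# List the copied MySQL connector jar in Sqoop's lib directory
ls ~/modules/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib/ | grep mysql-connector
```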
Verification
```shell
cd ~/modules/sqoop-1.4.6.bin__hadoop-2.0.4-alpha
bin/sqoop help
```
Test whether Sqoop can connect to the database successfully.
```shell
cd ~/modules/sqoop-1.4.6.bin__hadoop-2.0.4-alpha
bin/sqoop list-databases --connect jdbc:mysql://bigdata-node3:3306/ --username root --password 123456
```
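As a further connectivity check, list-tables works the same way against a specific schema. The built-in mysql schema is used here only as an illustration, since the demo database testdb is not created until the case study below:

```shell
# List tables in MySQL's built-in `mysql` schema (any schema readable by root works)
bin/sqoop list-tables \
  --connect jdbc:mysql://bigdata-node3:3306/mysql \
  --username root \
  --password 123456
```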
Case Study
Data Preparation
Hive Table
```shell
# Start the HDFS and YARN services beforehand
# Create the data file
vi ~/datas/stu.txt
```
The contents are as follows (note: make sure the column delimiter is a tab, \t):
```
00001	zhangsan
00002	lisi
00003	wangwu
00004	zhaoliu
```
Create the table and load the data into Hive:
```shell
cd ~/modules/apache-hive-2.3.4-bin/bin
./hive
# Create the table
hive> CREATE TABLE stu(id INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
# Load the data
hive> load data local inpath '/home/vagrant/datas/stu.txt' into table stu;
# Inspect the table
hive> select * from stu;
```
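Optionally, confirm where the loaded file sits in HDFS, since the Sqoop export script below reads it straight from the warehouse directory (the default warehouse path /user/hive/warehouse is assumed here):

```shell
# The loaded file should appear under the table's warehouse directory
hdfs dfs -ls /user/hive/warehouse/stu
hdfs dfs -cat /user/hive/warehouse/stu/stu.txt
```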
MySQL Table
Log in to the MySQL client:
```shell
# On bigdata-node3 (the MySQL node), as root
source /etc/profile
mysql -uroot -p123456
```
Execute the following SQL:
```sql
-- Create the database
create database `testdb` default character set utf8 collate utf8_general_ci;
-- Switch to it
use testdb;
-- Drop the table if it already exists
DROP TABLE IF EXISTS `stu`;
-- Create the table
CREATE TABLE `stu` (
  `id` bigint(20) NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
);
-- Grant privileges
grant all privileges on testdb.* to 'root'@'%' identified by '123456' with grant option;
flush privileges;
quit;
```
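A quick check from the shell on bigdata-node3 that the table was created and is still empty:

```shell
# Show the table definition and row count (should be 0 before the export)
mysql -uroot -p123456 -e "use testdb; desc stu; select count(*) from stu;"
```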
Script Development
Create an .opt options file.
```shell
mkdir -p ~/modules/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/jobs
cd ~/modules/sqoop-1.4.6.bin__hadoop-2.0.4-alpha
touch jobs/job_Hive2MySQL.opt
```
Write the Sqoop script.
```shell
cd ~/modules/sqoop-1.4.6.bin__hadoop-2.0.4-alpha
rm -rf jobs/job_Hive2MySQL.opt
vi jobs/job_Hive2MySQL.opt
```
The script contents are as follows (in a Sqoop options file, each option and its value go on separate lines):
```
export
--connect
jdbc:mysql://bigdata-node3:3306/testdb?useSSL=false
--username
root
--password
123456
--table
stu
--num-mappers
2
--export-dir
/user/hive/warehouse/stu/stu.txt
--input-fields-terminated-by
"\t"
--input-null-string
'\\N'
--input-null-non-string
'\\N'
```
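For reference, the same export can be issued directly on the command line; this is equivalent to the options file above:

```shell
# Single-command equivalent of jobs/job_Hive2MySQL.opt
bin/sqoop export \
  --connect "jdbc:mysql://bigdata-node3:3306/testdb?useSSL=false" \
  --username root \
  --password 123456 \
  --table stu \
  --num-mappers 2 \
  --export-dir /user/hive/warehouse/stu/stu.txt \
  --input-fields-terminated-by "\t" \
  --input-null-string '\\N' \
  --input-null-non-string '\\N'
```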
Run the script.
```shell
# Start HDFS, YARN, MySQL, and the other required services first
cd ~/modules/sqoop-1.4.6.bin__hadoop-2.0.4-alpha
bin/sqoop --options-file jobs/job_Hive2MySQL.opt
```
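Once the MapReduce job finishes, verify that the rows arrived in MySQL:

```shell
# Run on bigdata-node3 (or any host with a MySQL client); expects the four rows from stu.txt
mysql -uroot -p123456 -e "select * from testdb.stu;"
```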
Extensions
- Sqoop export modes.
Sqoop supports several export modes: full export, incremental insert, and update (see the update-mode sketch after this list).
- Running DML against MySQL with Sqoop.
Sqoop's eval tool can issue queries, inserts, deletes, and similar statements directly against MySQL:
```shell
# Delete rows (delete from)
cd ~/modules/sqoop-1.4.6.bin__hadoop-2.0.4-alpha
bin/sqoop eval \
--connect jdbc:mysql://bigdata-node3:3306/testdb?useSSL=false \
--username root \
--password 123456 \
--query "delete from stu"

# Empty the table (truncate table)
bin/sqoop eval \
--connect jdbc:mysql://bigdata-node3:3306/testdb?useSSL=false \
--username root \
--password 123456 \
--query "truncate table stu"

# Call a stored procedure (call procedure_xxx())
bin/sqoop eval \
--connect jdbc:mysql://bigdata-node3:3306/testdb?useSSL=false \
--username root \
--password 123456 \
--query "call procedure_clear_stu()"
```
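As a sketch of the update modes mentioned in the first bullet above: --update-key selects the column used to match existing rows, and --update-mode allowinsert additionally inserts rows that do not match. The flags are standard Sqoop export options; the connection and table values are the ones used throughout this tutorial:

```shell
# Update-mode export sketch: rows whose `id` already exists in MySQL are updated;
# with allowinsert, rows that do not match are inserted as well.
bin/sqoop export \
  --connect "jdbc:mysql://bigdata-node3:3306/testdb?useSSL=false" \
  --username root \
  --password 123456 \
  --table stu \
  --export-dir /user/hive/warehouse/stu/stu.txt \
  --input-fields-terminated-by "\t" \
  --update-key id \
  --update-mode allowinsert
```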
References
CSDN: Sqoop导出模式——全量、增量insert、更新update的介绍以及脚本示例 (Sqoop export modes: full, incremental insert, and update, with script examples)
https://blog.csdn.net/wiborgite/article/details/80991567
