Resource Planning

| Component | bigdata-node1 | bigdata-node2 | bigdata-node3 |
|-----------|---------------|---------------|---------------|
| OS        | CentOS 7.6    | CentOS 7.6    | CentOS 7.6    |
| JDK       | JVM           | JVM           | JVM           |
| HDFS      | NameNode/SecondaryNameNode/DataNode/JobHistoryServer/ApplicationHistoryServer | DataNode | DataNode |
| YARN      | ResourceManager/NodeManager | NodeManager | NodeManager |
| Hive      | HiveServer2/Metastore/CLI/Beeline | CLI/Beeline | CLI/Beeline |
| MySQL     | N/A           | N/A           | MySQL Server  |

Installation Media

Version: apache-hive-2.3.4-bin.tar.gz
Download: http://archive.apache.org/dist/hive

Environment Preparation

Install Hadoop

See: "CentOS 7.6 - Installing Hadoop 2.7.2".

Install MySQL

See: "CentOS 7.6 - Installing MySQL 5.7.30".

Hive Server Installation

Extract the Archive

  # Log in to the bigdata-node1 node
  cd /share
  wget http://archive.apache.org/dist/hive/hive-2.3.4/apache-hive-2.3.4-bin.tar.gz
  tar -zxvf apache-hive-2.3.4-bin.tar.gz -C ~/modules/
  rm apache-hive-2.3.4-bin.tar.gz
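
A quick listing confirms the archive unpacked where expected:

  ls ~/modules/apache-hive-2.3.4-bin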

Create Required Directories

  cd ~/modules/apache-hive-2.3.4-bin/conf
  cp hive-env.sh.template hive-env.sh
  cp hive-default.xml.template hive-site.xml
  cp hive-log4j2.properties.template hive-log4j2.properties
  mkdir ~/modules/apache-hive-2.3.4-bin/logs
  mkdir ~/modules/apache-hive-2.3.4-bin/tmpdir

Configure Hive

  1. Configure hive-log4j2.properties.

    vi ~/modules/apache-hive-2.3.4-bin/conf/hive-log4j2.properties

    Configuration:

    # The log directory must be created in advance
    property.hive.log.dir=/home/vagrant/modules/apache-hive-2.3.4-bin/logs

  2. Configure hive-env.sh.

    vi ~/modules/apache-hive-2.3.4-bin/conf/hive-env.sh

    Configuration:

    # Append at the end of the file
    export HADOOP_HOME=/home/vagrant/modules/hadoop-2.7.2
    export HIVE_CONF_DIR=/home/vagrant/modules/apache-hive-2.3.4-bin/conf
    export HIVE_AUX_JARS_PATH=/home/vagrant/modules/apache-hive-2.3.4-bin/lib

  3. Configure hive-site.xml.

    rm -rf ~/modules/apache-hive-2.3.4-bin/conf/hive-site.xml
    vi ~/modules/apache-hive-2.3.4-bin/conf/hive-site.xml

    Server configuration:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>system:java.io.tmpdir</name>
        <value>/home/vagrant/modules/apache-hive-2.3.4-bin/tmpdir</value>
      </property>
      <property>
        <name>system:user.name</name>
        <value>vagrant</value>
      </property>
      <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
      </property>
      <!-- Hive server-side settings -->
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://bigdata-node3:3306/hive2_metadata?createDatabaseIfNotExist=true&amp;useSSL=false</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive2</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive2</value>
      </property>
    </configuration>

    Client configuration:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>system:java.io.tmpdir</name>
        <value>/home/vagrant/modules/apache-hive-2.3.4-bin/tmpdir</value>
      </property>
      <property>
        <name>system:user.name</name>
        <value>vagrant</value>
      </property>
      <!-- Hive client-side settings -->
      <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
      </property>
      <property>
        <name>hive.metastore.uris</name>
        <value>thrift://bigdata-node1:9083</value>
      </property>
      <property>
        <name>hive.metastore.local</name>
        <value>false</value>
      </property>
    </configuration>
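
Once the metastore is up on bigdata-node1 (startup steps follow below), a client node can confirm that the Thrift endpoint named in hive.metastore.uris is reachable. A minimal probe using bash's built-in /dev/tcp, so no extra tools are assumed:

  timeout 3 bash -c 'echo > /dev/tcp/bigdata-node1/9083' && echo "metastore port reachable"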

Configure Hadoop

  1. Configure core-site.xml.

    vi ~/modules/hadoop-2.7.2/etc/hadoop/core-site.xml

    Configuration:

    <!-- HiveServer2 adds proxy-user access control, which must be configured on the Hadoop side -->
    <property>
      <name>hadoop.proxyuser.vagrant.hosts</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.vagrant.groups</name>
      <value>*</value>
    </property>

  2. Configure hdfs-site.xml.

    vi ~/modules/hadoop-2.7.2/etc/hadoop/hdfs-site.xml

    Configuration:

    <property>
      <name>dfs.webhdfs.enabled</name>
      <value>true</value>
    </property>

After modifying these configuration files, remember to distribute them to the other cluster nodes.

  scp ~/modules/hadoop-2.7.2/etc/hadoop/core-site.xml vagrant@bigdata-node2:~/modules/hadoop-2.7.2/etc/hadoop/
  scp ~/modules/hadoop-2.7.2/etc/hadoop/core-site.xml vagrant@bigdata-node3:~/modules/hadoop-2.7.2/etc/hadoop/
  scp ~/modules/hadoop-2.7.2/etc/hadoop/hdfs-site.xml vagrant@bigdata-node2:~/modules/hadoop-2.7.2/etc/hadoop/
  scp ~/modules/hadoop-2.7.2/etc/hadoop/hdfs-site.xml vagrant@bigdata-node3:~/modules/hadoop-2.7.2/etc/hadoop/

Then create the Hive warehouse directories on HDFS.

  # Start HDFS first, then create the Hive directories
  cd ~/modules/hadoop-2.7.2/
  # Create the Hive paths and grant permissions
  bin/hdfs dfs -mkdir -p /user/hive/warehouse
  bin/hdfs dfs -mkdir -p /user/hive/tmp
  bin/hdfs dfs -mkdir -p /user/hive/log
  bin/hdfs dfs -chmod -R 777 /user/hive/warehouse
  bin/hdfs dfs -chmod -R 777 /user/hive/tmp
  bin/hdfs dfs -chmod -R 777 /user/hive/log
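
To confirm the directories exist with the expected permissions, list them back. If dfs.webhdfs.enabled took effect, the same listing is also available over HTTP; 50070 is the default NameNode web port in Hadoop 2.x, so adjust if your cluster differs:

  bin/hdfs dfs -ls /user/hive
  curl -s "http://bigdata-node1:50070/webhdfs/v1/user/hive?op=LISTSTATUS"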

Set Environment Variables

  vi ~/.bashrc # use :$ to jump to the last line, then append

Configuration:

  export HIVE_HOME=/home/vagrant/modules/apache-hive-2.3.4-bin
  export PATH=$HIVE_HOME/bin:$PATH

Apply the environment variables:

  source ~/.bashrc
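
A quick check that the variables are in effect:

  echo $HIVE_HOME
  hive --version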

Integrate MySQL

  1. Upload the MySQL driver JAR to ${HIVE_HOME}/lib; mysql-connector-java-5.1.40.jar or later is recommended. (Download: https://mvnrepository.com/artifact/mysql/mysql-connector-java)

    cp /share/mysql-connector-java-5.1.47.jar /home/vagrant/modules/apache-hive-2.3.4-bin/lib/

  2. Create the metastore database user.

    # On bigdata-node3 (the MySQL node), as root
    source /etc/profile
    mysql -uroot -p123456
    CREATE USER 'hive2'@'%' IDENTIFIED BY 'hive2';
    # CREATE USER 'hive2'@'localhost' IDENTIFIED BY 'hive2';
    GRANT ALL PRIVILEGES ON *.* TO 'hive2'@'%' WITH GRANT OPTION;
    GRANT ALL PRIVILEGES ON *.* TO 'hive2'@'localhost' WITH GRANT OPTION;
    flush privileges;
    quit;
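
    Before initializing the schema, it is worth confirming from bigdata-node1 that the new account can reach MySQL over the network (this assumes a mysql client is installed on that node; the credentials match hive-site.xml above):

    mysql -h bigdata-node3 -uhive2 -phive2 -e "SELECT 1;"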

  3. Initialize the metastore schema. When "schemaTool completed" is printed, initialization succeeded.

    cd /home/vagrant/modules/apache-hive-2.3.4-bin
    schematool -initSchema -dbType mysql -verbose
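
    The recorded schema version can also be checked afterwards with the same tool:

    schematool -dbType mysql -info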

  4. Grant privileges on the metastore database.

    # On bigdata-node3 (the MySQL node), as root
    source /etc/profile
    mysql -uroot -p123456
    use mysql;
    select User, Host from user;
    -- Note: this rebinds every account currently on 'localhost' to '%'
    update user set host='%' where host='localhost';
    -- delete from user where host='localhost' and User='hive2';
    -- Optionally drop the root user's other hosts (anything besides %)
    use hive2_metadata;
    grant all on hive2_metadata.* to hive2@'%' identified by 'hive2';
    grant all on hive2_metadata.* to hive2@localhost identified by 'hive2';
    ALTER DATABASE hive2_metadata CHARACTER SET latin1;
    flush privileges;
    quit;

  5. Configure hive-site.xml (already covered in the sections above).
  6. Set up passwordless SSH from this node to the other cluster nodes.
  7. Start and test the Hive service.

    # HDFS and YARN must already be running
    # Create a data file
    mkdir -p ~/datas
    vi ~/datas/stu.txt

    Contents (note: make sure the column delimiter is \t):

    00001 zhangsan
    00002 lisi
    00003 wangwu
    00004 zhaoliu

    Create the table and load the data into Hive:

    cd ~/modules/apache-hive-2.3.4-bin/bin
    ./hive
    # Or start with debug logging
    ./hive -hiveconf hive.root.logger=DEBUG,console
    # Create the table
    hive> CREATE TABLE stu(id INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
    # Load the data
    hive> load data local inpath '/home/vagrant/datas/stu.txt' into table stu;
    # Query the table
    hive> select * from stu;
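
    Once the table exists, the same check can be run non-interactively; hive -e executes a single statement and exits:

    ./hive -e "SELECT COUNT(*) FROM stu;" # expect 4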

Server-Side Verification

Connect to Hive Locally via CLI

  cd ~/modules/apache-hive-2.3.4-bin
  # Start the CLI; an embedded Metastore service starts automatically
  bin/hive
  # List databases
  hive> show databases;
  # Use the default database
  hive> use default;
  # List tables
  hive> show tables;

Connect to Hive Locally via Beeline

  cd ~/modules/apache-hive-2.3.4-bin
  # 1) Start hiveserver2, which starts the Metastore service automatically
  bin/hiveserver2 >/dev/null 2>&1 &
  # Check that hiveserver2 is running
  ps aux | grep hiveserver2
  # Check the hiveserver2 port
  # sudo yum install net-tools # provides netstat
  netstat -nl | grep 10000
  # Check that the metastore is running
  ps aux | grep metastore
  # 2) Start beeline
  # Option 1
  bin/beeline
  beeline> !connect jdbc:hive2://localhost:10000 hive2 hive2
  # Option 2
  bin/beeline -u jdbc:hive2://localhost:10000
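
Beeline also accepts credentials and a statement on the command line, which is convenient for scripted smoke tests (-n and -p supply the user and password, -e the statement to run):

  bin/beeline -u jdbc:hive2://localhost:10000 -n hive2 -p hive2 -e "show databases;"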

Hive Client Installation

  # Distribute Hive to the client nodes
  scp -r ~/modules/apache-hive-2.3.4-bin vagrant@bigdata-node2:~/modules/
  scp -r ~/modules/apache-hive-2.3.4-bin vagrant@bigdata-node3:~/modules/

Note: after distribution, update the Hive configuration on each client node (see the client configuration in the sections above).
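
One way to handle this is to keep a client-variant config on the server and push it over the copied file; hive-site-client.xml here is a hypothetical staging name, not a file Hive ships with:

  scp ~/modules/apache-hive-2.3.4-bin/conf/hive-site-client.xml vagrant@bigdata-node2:~/modules/apache-hive-2.3.4-bin/conf/hive-site.xml
  scp ~/modules/apache-hive-2.3.4-bin/conf/hive-site-client.xml vagrant@bigdata-node3:~/modules/apache-hive-2.3.4-bin/conf/hive-site.xml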

Client-Side Verification

Connect to Hive Remotely via CLI

  # The metastore service must be running on the server
  cd ~/modules/apache-hive-2.3.4-bin
  bin/hive --service metastore >/dev/null 2>&1 &
  # Start the CLI on the client
  bin/hive
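
On the server, the metastore's Thrift port can be checked the same way as HiveServer2's; 9083 matches the hive.metastore.uris value configured above:

  netstat -nl | grep 9083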

Connect to Hive Remotely via Beeline

  # HiveServer2 must be running on the server (it starts the Metastore automatically)
  cd ~/modules/apache-hive-2.3.4-bin
  # Start Beeline on the client
  # Option 1
  bin/beeline
  beeline> !connect jdbc:hive2://bigdata-node1:10000 hive2 hive2
  # Option 2
  bin/beeline -u jdbc:hive2://bigdata-node1:10000