Environment Overview

Operating System

Ubuntu Server 20.04.3 LTS (64-bit)

Deployed Versions

| Software  | Version   | How to Obtain |
| --------- | --------- | ------------- |
| OpenJDK   | 1.8.0_312 | `sudo apt update && sudo apt install openjdk-8-jdk` |
| ZooKeeper | 3.4.6     | Download from the ZooKeeper site: https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz |
| Hadoop    | 3.1.3     | Download from the Hadoop site: https://archive.apache.org/dist/hadoop/core/hadoop-3.1.3/hadoop-3.1.3.tar.gz |
| Hive      | 3.1.0     | Download from the Hive site: https://archive.apache.org/dist/hive/hive-3.1.0/apache-hive-3.1.0-bin.tar.gz |
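After installing the JDK, a quick sanity check that it is on the PATH:

```bash
# Should report an OpenJDK 1.8.x build (1.8.0_312 in this deployment)
java -version
```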

Deploy ZooKeeper and Hadoop

For the ZooKeeper and Hadoop deployment steps, see the Flink On Yarn Deployment Guide.

Deploy Hive

Install MariaDB

  • Install MariaDB on node01

    ```bash
    sudo apt update
    sudo apt install mariadb-server
    ```

Once the installation completes, the MariaDB service starts automatically.

  • Check the service status

    ```bash
    sudo systemctl status mariadb
    ```

  • Enable the service to start on boot

    ```bash
    sudo systemctl enable mariadb
    ```
  • Grant the hive user full privileges and set its password

    ```bash
    mysql
    ```

    ```sql
    mysql> GRANT ALL PRIVILEGES ON *.* TO 'hive'@'localhost' IDENTIFIED BY 'hive';
    mysql> GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' IDENTIFIED BY 'hive';
    mysql> FLUSH PRIVILEGES;
    ```

  • Edit the MariaDB configuration file to allow remote logins: change `bind-address` from `127.0.0.1` to `0.0.0.0`

    ```bash
    vim /etc/mysql/mariadb.conf.d/50-server.cnf
    ```

    ```ini
    bind-address = 0.0.0.0
    ```
  • Restart MariaDB

    ```bash
    sudo service mariadb restart
    ```
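To confirm the grants and the remote-login change took effect, try logging in as the hive user. This is a minimal check using the `hive`/`hive` credentials and the `node01` hostname configured above:

```bash
# Local login as the hive user
mysql -u hive -phive -e "SELECT VERSION();"

# Login via the hostname; this only succeeds after bind-address
# has been changed to 0.0.0.0 and MariaDB restarted
mysql -h node01 -u hive -phive -e "SELECT 1;"
```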

Download and Install Hive

  • Download and extract Hive

    ```bash
    cd /usr/local
    wget https://archive.apache.org/dist/hive/hive-3.1.0/apache-hive-3.1.0-bin.tar.gz
    tar -zxvf apache-hive-3.1.0-bin.tar.gz
    ```

  • Create a symlink to simplify future version upgrades

    ```bash
    ln -s apache-hive-3.1.0-bin hive
    ```
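A quick check that the link points at the extracted directory:

```bash
# Should show: /usr/local/hive -> apache-hive-3.1.0-bin
ls -ld /usr/local/hive
```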

Add Hive to the Environment Variables

  • Open the profile

    ```bash
    vi /etc/profile
    ```

  • Add Hive to the environment variables

    ```bash
    export HIVE_HOME=/usr/local/hive
    export PATH=$HIVE_HOME/bin:$PATH
    ```

  • Apply the changes

    ```bash
    source /etc/profile
    ```
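To verify the variables are in effect:

```bash
# Both should resolve under /usr/local/hive
echo $HIVE_HOME
which hive
```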

Modify the Hive Configuration Files

Switch to the Hive configuration directory:

```bash
cd $HIVE_HOME/conf
```

Modify hive-env.sh:

```bash
cp hive-env.sh.template hive-env.sh
echo "export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64" >> hive-env.sh
echo "export HADOOP_HOME=/usr/local/hadoop" >> hive-env.sh
echo "export HIVE_CONF_DIR=/usr/local/hive/conf" >> hive-env.sh
```

Modify the hive-log4j2.properties logging configuration:

```bash
cp hive-log4j2.properties.template hive-log4j2.properties
```

Set the following parameter in hive-log4j2.properties:

```properties
property.hive.log.dir = /usr/local/hive/log
```

Modify hive-site.xml:

```bash
cp hive-default.xml.template hive-site.xml
```

Use the following command to replace `for&#` with `for` in hive-site.xml, to avoid an encoding error during schema initialization:

```bash
sed -i 's/for&#/for/g' hive-site.xml
```
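To confirm the substitution worked, count the remaining occurrences (should print 0):

```bash
grep -c 'for&#' hive-site.xml
```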

Set the following parameters in hive-site.xml, or simply append them near the end of the file (before the closing `</configuration>` tag), as shown below.
The database username and password, and the ZooKeeper hosts and ports, must be adapted to your own environment.

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://node01:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
  <!-- This is the database password set when installing MariaDB -->
</property>
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/tmp/hive</value>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/tmp/${hive.session.id}_resources</value>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>/tmp/hive</value>
</property>
<property>
  <name>hive.server2.logging.operation.log.location</name>
  <value>/tmp/hive/operation_logs</value>
</property>
<property>
  <name>hive.tez.exec.print.summary</name>
  <value>true</value>
</property>
<property>
  <name>hive.tez.container.size</name>
  <value>10240</value>
</property>
<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>nonstrict</value>
</property>
<property>
  <name>hive.exec.max.dynamic.partitions</name>
  <value>100000</value>
</property>
<property>
  <name>hive.exec.max.dynamic.partitions.pernode</name>
  <value>100000</value>
</property>
<property>
  <name>hive.exec.max.created.files</name>
  <value>1000000</value>
</property>
<property>
  <name>hive.execution.engine</name>
  <value>tez</value>
</property>
<property>
  <name>hive.zookeeper.quorum</name>
  <value>node01:2181,node02:2181,node03:2181</value>
</property>
```

The complete hive-site.xml after these changes
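Before moving on, it is worth checking that the edited file is still well-formed XML, since a stray tag here makes every later step fail. A minimal check, assuming `xmllint` is available (on Ubuntu it ships in the libxml2-utils package):

```bash
# Silent on success; prints the offending line on a parse error
xmllint --noout /usr/local/hive/conf/hive-site.xml
```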

Start and Verify Hive

Preparation Before Starting Hive

  • Download the MySQL JDBC driver into Hive's lib directory

    ```bash
    cd /usr/local/hive/lib
    wget https://repo.maven.apache.org/maven2/mysql/mysql-connector-java/8.0.28/mysql-connector-java-8.0.28.jar
    ```

  • Create the Hive data directories on HDFS

    ```bash
    hdfs dfs -mkdir /tmp
    hdfs dfs -mkdir -p /user/hive/warehouse
    hdfs dfs -chmod g+w /tmp
    hdfs dfs -chmod g+w /user/hive/warehouse
    ```
  • Create the Hive log directory and log files (verified in the sketch after this list)

    ```bash
    mkdir -p /usr/local/hive/log/
    touch /usr/local/hive/log/hiveserver2.log
    touch /usr/local/hive/log/hiveserver2.err
    ```
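You can confirm the HDFS directories and local log files exist before starting anything:

```bash
# HDFS warehouse and scratch directories created above
hdfs dfs -ls /user/hive
hdfs dfs -ls / | grep tmp

# Local log files for hiveserver2
ls -l /usr/local/hive/log/
```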

Initialize Hive

  • Replace Hive's bundled guava.jar with the version shipped with Hadoop

    ```bash
    cp /usr/local/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar /usr/local/hive/lib
    rm /usr/local/hive/lib/guava-19.0.jar
    ```

    > If you skip this step, initialization fails because the guava.jar versions bundled with Hive and Hadoop are inconsistent.

  • Initialize the metastore schema

    ```bash
    schematool -dbType mysql -initSchema
    ```
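If initialization succeeded, schematool can report the schema it just created (a quick sanity check):

```bash
# Should print the Hive distribution version and the metastore schema version (3.1.0)
schematool -dbType mysql -info
```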

Start the Hive Metastore

  • Start the metastore service

    ```bash
    hive --service metastore -p 9083 &
    ```
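The metastore listens on the port passed with `-p`; confirm it is up with:

```bash
netstat -anp | grep 9083
```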

Start HiveServer2

  • Start hiveserver2

    ```bash
    nohup hiveserver2 1>/usr/local/hive/log/hiveserver2.log 2>/usr/local/hive/log/hiveserver2.err &
    ```

  • Watch the startup progress

    ```bash
    tail -f /usr/local/hive/log/hiveserver2.err
    ```

    ```
    nohup: ignoring input
    which: no hbase in (/usr/local/hadoop/bin:/usr/local/hadoop/sbin:/usr/local/hive/bin:/usr/local/zookeeper/bin:/usr/local/jdk8u222-b10/bin:/usr/local/python3/bin:/usr/local/hadoop/bin:/usr/local/hadoop/sbin:/usr/local/hive/bin:/usr/local/zookeeper/bin:/usr/local/jdk8u222-b10/bin:/usr/local/python3/bin:/usr/local/hadoop/bin:/usr/local/hadoop/sbin:/usr/local/zookeeper/bin:/usr/local/jdk8u222-b10/bin:/usr/local/python3/bin:/usr/local/jdk8u222-b10/bin:/usr/local/python3/bin:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)
    2021-01-18 11:32:22: Starting HiveServer2
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/local/apache-hive-3.1.0-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/local/hadoop-3.1.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    Hive Session ID = 824030a3-2afe-488c-a2fa-7d98cfc8f7bd
    Hive Session ID = 1031e326-2088-4025-b2e2-c9bb1e81b03d
    Hive Session ID = 32203873-49ad-44b7-987c-da1aae8b3375
    Hive Session ID = d7be9389-11c6-46cb-90d6-a91a2d5199b8
    OK
    ```

    Startup takes a while; be patient.

  • Check the port; hiveserver2 listens on port 10000 by default

    ```bash
    netstat -anp | grep 10000
    ```

Output like the following indicates a successful start:

```
tcp6       0      0 :::10000                :::*                    LISTEN      27800/java
```

Connect with Beeline on node01

  • Connect from node01 using beeline

    ```bash
    beeline -u jdbc:hive2://node01:10000 -n root
    ```

The output looks like the following:

```
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/apache-hive-3.1.0-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-3.1.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://server1:10000
Connected to: Apache Hive (version 3.1.0)
Driver: Hive JDBC (version 3.1.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.0 by Apache Hive
0: jdbc:hive2://server1:10000>
```
  • List the existing databases

    ```
    0: jdbc:hive2://server1:10000> show databases;
    INFO  : Compiling command(queryId=root_20210118113531_49c3505a-80e1-4aba-9761-c2f77a06ac5f): show databases
    INFO  : Concurrency mode is disabled, not creating a lock manager
    INFO  : Semantic Analysis Completed (retrial = false)
    INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
    INFO  : Completed compiling command(queryId=root_20210118113531_49c3505a-80e1-4aba-9761-c2f77a06ac5f); Time taken: 0.903 seconds
    INFO  : Concurrency mode is disabled, not creating a lock manager
    INFO  : Executing command(queryId=root_20210118113531_49c3505a-80e1-4aba-9761-c2f77a06ac5f): show databases
    INFO  : Starting task [Stage-0:DDL] in serial mode
    INFO  : Completed executing command(queryId=root_20210118113531_49c3505a-80e1-4aba-9761-c2f77a06ac5f); Time taken: 0.029 seconds
    INFO  : OK
    INFO  : Concurrency mode is disabled, not creating a lock manager
    +----------------+
    | database_name  |
    +----------------+
    | default        |
    +----------------+
    1 row selected (1.248 seconds)
    ```
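As a final end-to-end check, you can drive beeline non-interactively with `-e`. This is a minimal smoke test; the `smoke_test` database name is a hypothetical example, not part of the original guide:

```bash
# Create a throwaway database, list databases, then clean up.
# DDL-only, so it exercises the metastore without launching engine jobs.
beeline -u jdbc:hive2://node01:10000 -n root -e "
  CREATE DATABASE IF NOT EXISTS smoke_test;
  SHOW DATABASES;
  DROP DATABASE smoke_test;
"
```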

References