# Environment Overview

## Operating System

Ubuntu Server 20.04.3 LTS, 64-bit

## Deployed Versions

| Software | Version | How to Obtain |
|---|---|---|
| OpenJDK | 1.8.0_312 | `sudo apt update && sudo apt install openjdk-8-jdk` |
| ZooKeeper | 3.4.6 | Download the required version from the ZooKeeper archive: https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz |
| Hadoop | 3.1.3 | Download the required version from the Hadoop archive: https://archive.apache.org/dist/hadoop/core/hadoop-3.1.3/hadoop-3.1.3.tar.gz |
| Hive | 3.1.0 | Download the required version from the Hive archive: https://archive.apache.org/dist/hive/hive-3.1.0/apache-hive-3.1.0-bin.tar.gz |
# Deploy ZooKeeper and Hadoop

# Deploy Hive
## Install MariaDB

- Install MariaDB on node01:

  ```bash
  sudo apt update
  sudo apt install mariadb-server
  ```

- Once installation completes, the MariaDB service starts automatically. Check its status:

  ```bash
  sudo systemctl status mariadb
  ```

- Enable start on boot:

  ```bash
  sudo systemctl enable mariadb
  ```

- Grant the `hive` user full privileges and set its password:

  ```bash
  mysql
  ```

  ```sql
  GRANT ALL PRIVILEGES ON *.* TO 'hive'@'localhost' IDENTIFIED BY 'hive';
  GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' IDENTIFIED BY 'hive';
  FLUSH PRIVILEGES;
  ```
- Edit the MariaDB configuration file to allow remote logins, changing `bind-address` from `127.0.0.1` to `0.0.0.0`:

  ```bash
  vim /etc/mysql/mariadb.conf.d/50-server.cnf
  ```

  ```ini
  bind-address = 0.0.0.0
  ```

- Restart MariaDB:

  ```bash
  sudo service mariadb restart
  ```
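The `bind-address` edit can also be scripted with `sed`. A minimal sketch, demonstrated here on a sample copy of the config file (on the real host the target would be `/etc/mysql/mariadb.conf.d/50-server.cnf`):

```shell
# Demonstrate the bind-address change on a sample copy of the config;
# on the real host, point sed at /etc/mysql/mariadb.conf.d/50-server.cnf
cat > 50-server.cnf.sample <<'EOF'
[mysqld]
bind-address            = 127.0.0.1
EOF
# Rewrite any bind-address line so MariaDB listens on all interfaces
sed -i 's/^bind-address.*/bind-address = 0.0.0.0/' 50-server.cnf.sample
grep '^bind-address' 50-server.cnf.sample
```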
## Download and Install Hive

Download and extract Hive:

```bash
cd /usr/local
wget https://archive.apache.org/dist/hive/hive-3.1.0/apache-hive-3.1.0-bin.tar.gz
tar -zxvf apache-hive-3.1.0-bin.tar.gz
```

Create a symbolic link to make future version changes easier:

```bash
ln -s apache-hive-3.1.0-bin hive
```
## Add Hive to the Environment Variables

Open the profile:

```bash
vi /etc/profile
```

Add Hive to the environment variables:

```bash
export HIVE_HOME=/usr/local/hive
export PATH=$HIVE_HOME/bin:$PATH
```

Apply the changes:

```bash
source /etc/profile
```
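The two `export` lines above can also be appended idempotently, so re-running the setup does not duplicate them. A sketch using a sample profile file (substitute `/etc/profile` on the real host):

```shell
# Append the Hive variables only if they are not already present; shown
# on a sample file -- on the real host PROFILE would be /etc/profile
PROFILE=./profile.sample
touch "$PROFILE"
if ! grep -q 'HIVE_HOME' "$PROFILE"; then
  cat >> "$PROFILE" <<'EOF'
export HIVE_HOME=/usr/local/hive
export PATH=$HIVE_HOME/bin:$PATH
EOF
fi
grep 'HIVE_HOME' "$PROFILE"
```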
## Modify the Hive Configuration Files

Switch to the Hive configuration directory:

```bash
cd $HIVE_HOME/conf
```

### Modify hive-env.sh

```bash
cp hive-env.sh.template hive-env.sh
echo "export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64" >> hive-env.sh
echo "export HADOOP_HOME=/usr/local/hadoop" >> hive-env.sh
echo "export HIVE_CONF_DIR=/usr/local/hive/conf" >> hive-env.sh
```

### Modify the hive-log4j2.properties logging configuration

```bash
cp hive-log4j2.properties.template hive-log4j2.properties
```

Set the following parameter in hive-log4j2.properties:

```properties
property.hive.log.dir = /usr/local/hive/log
```
### Modify hive-site.xml

```bash
cp hive-default.xml.template hive-site.xml
```

Replace `for&#` with `for` to avoid an encoding error during schema initialization:

```bash
sed -i 's/for&#/for/g' hive-site.xml
```
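The effect of the substitution can be seen on a small sample. The sample text below is hypothetical, modeled on the template's invalid `&#8;` character entity that trips up the XML parser:

```shell
# Illustrate the substitution on a one-line sample; the real target is
# $HIVE_HOME/conf/hive-site.xml copied from the template.
# The description text here is hypothetical, modeled on the invalid
# '&#8;' entity that appears in the real template.
printf '<description>acquires the lock for&#8;transactional tables</description>\n' > sample.xml
sed -i 's/for&#/for/g' sample.xml
cat sample.xml
```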
Set the values of the following parameters in hive-site.xml (or simply append them near the end of the file, inside `<configuration>`):

> The database username and password, and the ZooKeeper IPs and ports, must be adapted to your own environment.

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://node01:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
  <!-- This is the database password set when installing MariaDB -->
</property>
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/tmp/hive</value>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/tmp/${hive.session.id}_resources</value>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>/tmp/hive</value>
</property>
<property>
  <name>hive.server2.logging.operation.log.location</name>
  <value>/tmp/hive/operation_logs</value>
</property>
<property>
  <name>hive.tez.exec.print.summary</name>
  <value>true</value>
</property>
<property>
  <name>hive.tez.container.size</name>
  <value>10240</value>
</property>
<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>nonstrict</value>
</property>
<property>
  <name>hive.exec.max.dynamic.partitions</name>
  <value>100000</value>
</property>
<property>
  <name>hive.exec.max.dynamic.partitions.pernode</name>
  <value>100000</value>
</property>
<property>
  <name>hive.exec.max.created.files</name>
  <value>1000000</value>
</property>
<property>
  <name>hive.execution.engine</name>
  <value>tez</value>
</property>
<property>
  <name>hive.zookeeper.quorum</name>
  <value>node01:2181,node02:2181,node03:2181</value>
</property>
```

The complete modified configuration file: hive-site.xml
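Before moving on, it is worth confirming that the edited file is still well-formed XML. A minimal sketch using Python's standard-library parser, demonstrated on a small sample file (point it at `$HIVE_HOME/conf/hive-site.xml` on the real host):

```shell
# Validate that a hive-site.xml-style file parses as XML; demonstrated
# on a minimal sample -- replace the filename with the real config path
cat > hive-site.sample.xml <<'EOF'
<configuration>
  <property>
    <name>hive.zookeeper.quorum</name>
    <value>node01:2181,node02:2181,node03:2181</value>
  </property>
</configuration>
EOF
python3 - <<'EOF'
import xml.dom.minidom
xml.dom.minidom.parse("hive-site.sample.xml")
print("hive-site.sample.xml is well-formed")
EOF
```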
# Start and Verify Hive

## Preparation Before Starting Hive

- Download the JDBC driver (the MySQL JDBC driver jar goes into Hive's `lib` directory):

  ```bash
  cd /usr/local/hive/lib
  ```

- Create the Hive data directories in HDFS:

  ```bash
  hdfs dfs -mkdir /tmp
  hdfs dfs -mkdir -p /user/hive/warehouse
  hdfs dfs -chmod g+w /tmp
  hdfs dfs -chmod g+w /user/hive/warehouse
  ```
- Create the Hive log directory:

  ```bash
  mkdir -p /usr/local/hive/log/
  touch /usr/local/hive/log/hiveserver2.log
  touch /usr/local/hive/log/hiveserver2.err
  ```
## Initialize Hive

- Replace Hive's bundled guava jar with Hadoop's:

  ```bash
  cp /usr/local/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar /usr/local/hive/lib
  rm /usr/local/hive/lib/guava-19.0.jar
  ```

  > Skipping this step causes an error, because Hive and Hadoop ship different versions of guava.jar.
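The swap can be rehearsed on placeholder files to see the intended end state (the real source and destination are `/usr/local/hadoop/share/hadoop/common/lib` and `/usr/local/hive/lib`):

```shell
# Rehearse the guava swap on empty placeholder jars; the real source and
# destination are the Hadoop and Hive lib directories
mkdir -p hadoop_lib hive_lib
touch hadoop_lib/guava-27.0-jre.jar hive_lib/guava-19.0.jar
cp hadoop_lib/guava-27.0-jre.jar hive_lib/
rm hive_lib/guava-19.0.jar
ls hive_lib
```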
- Initialize the Hive metastore schema:

  ```bash
  schematool -dbType mysql -initSchema
  ```
## Start the Hive Metastore

- Start the metastore service:

  ```bash
  hive --service metastore -p 9083 &
  ```
## Start HiveServer2

- Start hiveserver2:

  ```bash
  nohup hiveserver2 1>/usr/local/hive/log/hiveserver2.log 2>/usr/local/hive/log/hiveserver2.err &
  ```

- Watch the startup progress:

  ```bash
  tail -f /usr/local/hive/log/hiveserver2.err
  ```

  ```
  nohup: ignoring input
  which: no hbase in (/usr/local/hadoop/bin:/usr/local/hadoop/sbin:/usr/local/hive/bin:/usr/local/zookeeper/bin:/usr/local/jdk8u222-b10/bin:/usr/local/python3/bin:/usr/local/hadoop/bin:/usr/local/hadoop/sbin:/usr/local/hive/bin:/usr/local/zookeeper/bin:/usr/local/jdk8u222-b10/bin:/usr/local/python3/bin:/usr/local/hadoop/bin:/usr/local/hadoop/sbin:/usr/local/zookeeper/bin:/usr/local/jdk8u222-b10/bin:/usr/local/python3/bin:/usr/local/jdk8u222-b10/bin:/usr/local/python3/bin:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)
  2021-01-18 11:32:22: Starting HiveServer2
  SLF4J: Class path contains multiple SLF4J bindings.
  SLF4J: Found binding in [jar:file:/usr/local/apache-hive-3.1.0-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: Found binding in [jar:file:/usr/local/hadoop-3.1.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
  Hive Session ID = 824030a3-2afe-488c-a2fa-7d98cfc8f7bd
  Hive Session ID = 1031e326-2088-4025-b2e2-c9bb1e81b03d
  Hive Session ID = 32203873-49ad-44b7-987c-da1aae8b3375
  Hive Session ID = d7be9389-11c6-46cb-90d6-a91a2d5199b8
  OK
  ```

  Startup takes a while; wait for the `OK`.
- Check the port; hiveserver2 listens on port 10000 by default:

  ```bash
  netstat -anp | grep 10000
  ```

  Output like the following indicates a successful start:

  ```
  tcp6       0      0 :::10000                :::*                    LISTEN      27800/java
  ```
## Connect with Beeline on node01

- Connect from node01 using beeline:

  ```bash
  beeline -u jdbc:hive2://node01:10000 -n root
  ```

  The output looks like this:

  ```
  SLF4J: Class path contains multiple SLF4J bindings.
  SLF4J: Found binding in [jar:file:/usr/local/apache-hive-3.1.0-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: Found binding in [jar:file:/usr/local/hadoop-3.1.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
  Connecting to jdbc:hive2://server1:10000
  Connected to: Apache Hive (version 3.1.0)
  Driver: Hive JDBC (version 3.1.0)
  Transaction isolation: TRANSACTION_REPEATABLE_READ
  Beeline version 3.1.0 by Apache Hive
  0: jdbc:hive2://server1:10000>
  ```
- List the existing databases:

  ```
  0: jdbc:hive2://server1:10000> show databases;
  INFO  : Compiling command(queryId=root_20210118113531_49c3505a-80e1-4aba-9761-c2f77a06ac5f): show databases
  INFO  : Concurrency mode is disabled, not creating a lock manager
  INFO  : Semantic Analysis Completed (retrial = false)
  INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
  INFO  : Completed compiling command(queryId=root_20210118113531_49c3505a-80e1-4aba-9761-c2f77a06ac5f); Time taken: 0.903 seconds
  INFO  : Concurrency mode is disabled, not creating a lock manager
  INFO  : Executing command(queryId=root_20210118113531_49c3505a-80e1-4aba-9761-c2f77a06ac5f): show databases
  INFO  : Starting task [Stage-0:DDL] in serial mode
  INFO  : Completed executing command(queryId=root_20210118113531_49c3505a-80e1-4aba-9761-c2f77a06ac5f); Time taken: 0.029 seconds
  INFO  : OK
  INFO  : Concurrency mode is disabled, not creating a lock manager
  +----------------+
  | database_name  |
  +----------------+
  | default        |
  +----------------+
  1 row selected (1.248 seconds)
  ```