0. Goal
- On top of the Hadoop Docker cluster, install the Hive server on hd01.
- Use MySQL as the metastore backend, and set up hiveserver2.
1. Preparing the files
- Dockerfile
- docker-compose.yml
- element/ directory
1.1 Dockerfile
```dockerfile
FROM centos:7
# SSH
RUN yum install -y openssh-server sudo
RUN sed -i 's/UsePAM yes/UsePAM no/g' /etc/ssh/sshd_config
RUN echo "PermitRootLogin yes" >> /etc/ssh/sshd_config
RUN yum install -y openssh-clients
RUN echo "root:123456" | chpasswd
RUN echo "root ALL=(ALL) ALL" >> /etc/sudoers
RUN ssh-keygen -t dsa -f /etc/ssh/ssh_host_dsa_key
RUN ssh-keygen -t rsa -f /etc/ssh/ssh_host_rsa_key
RUN mkdir /var/run/sshd
# JAVA
ADD element/images/jdk-8u141-linux-x64.tar.gz /usr/local/
RUN mv /usr/local/jdk1.8.0_141 /usr/local/jdk1.8
ENV JAVA_HOME /usr/local/jdk1.8
ENV PATH $JAVA_HOME/bin:$PATH
# HADOOP
ADD element/images/hadoop-3.3.0.tar.gz /usr/local
RUN mv /usr/local/hadoop-3.3.0 /usr/local/hadoop
ENV HADOOP_HOME /usr/local/hadoop
ENV PATH $HADOOP_HOME/bin:$PATH
# HIVE
ADD element/images/apache-hive-3.1.2-bin.tar.gz /usr/local
RUN mv /usr/local/apache-hive-3.1.2-bin /usr/local/hive-3.1.2
ENV HIVE_HOME /usr/local/hive-3.1.2
ENV PATH $HIVE_HOME/bin:$PATH
RUN yum install -y which sudo
RUN mkdir /mysh
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]
1.2 docker-compose.yml
Compared with the plain Hadoop cluster, the main change in this docker-compose.yml is that a MySQL container is added rather than a dedicated Hive container, because:
- Hive is started inside hd01; this is reflected in the hd01 volumes: section (the hive-3.1.2 conf/, lib/, and dokshare/ mounts).
- MySQL is what Hive's metastore uses for storage.
```yaml
version: '3.5'
services:
  hd01:
    image: my-hadoop:3.3.0
    container_name: hd01
    hostname: hd01
    extra_hosts:
      - "hd-mysql57-01:172.24.0.6"
      - "hd02:172.24.0.12"
      - "hd03:172.24.0.13"
    networks:
      hd-network:
        ipv4_address: 172.24.0.11
    volumes:
      - ${PWD}/element/configure/hadoop/etc-hd01:/usr/local/hadoop/etc
      - ${PWD}/element/configure/hadoop/dokshare:/usr/local/hadoop/dokshare
      - ${PWD}/element/configure/hive-3.1.2/conf:/usr/local/hive-3.1.2/conf
      - ${PWD}/element/configure/hive-3.1.2/lib:/usr/local/hive-3.1.2/lib
      - ${PWD}/element/configure/hive-3.1.2/dokshare:/usr/local/hive-3.1.2/dokshare
      - ${PWD}/element/mysh:/mysh
    environment:
      - HDFS_NAMENODE_USER=root
      - HDFS_DATANODE_USER=root
      - HDFS_SECONDARYNAMENODE_USER=root
      - YARN_RESOURCEMANAGER_USER=root
      - YARN_NODEMANAGER_USER=root
  hd02:
    image: my-hadoop:3.3.0
    container_name: hd02
    hostname: hd02
    extra_hosts:
      - "hd-mysql57-01:172.24.0.6"
      - "hd01:172.24.0.11"
      - "hd03:172.24.0.13"
    networks:
      hd-network:
        ipv4_address: 172.24.0.12
    volumes:
      - ${PWD}/element/configure/hadoop/etc-hd02:/usr/local/hadoop/etc
      - ${PWD}/element/configure/hadoop/dokshare:/usr/local/hadoop/dokshare
      - ${PWD}/element/mysh:/mysh
    environment:
      - HDFS_NAMENODE_USER=root
      - HDFS_DATANODE_USER=root
      - HDFS_SECONDARYNAMENODE_USER=root
      - YARN_RESOURCEMANAGER_USER=root
      - YARN_NODEMANAGER_USER=root
  hd03:
    image: my-hadoop:3.3.0
    container_name: hd03
    hostname: hd03
    extra_hosts:
      - "hd-mysql57-01:172.24.0.6"
      - "hd01:172.24.0.11"
      - "hd02:172.24.0.12"
    networks:
      hd-network:
        ipv4_address: 172.24.0.13
    volumes:
      - ${PWD}/element/configure/hadoop/etc-hd03:/usr/local/hadoop/etc
      - ${PWD}/element/configure/hadoop/dokshare:/usr/local/hadoop/dokshare
      - ${PWD}/element/mysh:/mysh
    environment:
      - HDFS_NAMENODE_USER=root
      - HDFS_DATANODE_USER=root
      - HDFS_SECONDARYNAMENODE_USER=root
      - YARN_RESOURCEMANAGER_USER=root
      - YARN_NODEMANAGER_USER=root
  hd-mysql57-01:
    image: mysql:5.7
    container_name: hd-mysql57-01
    hostname: hd-mysql57-01
    networks:
      hd-network:
        ipv4_address: 172.24.0.6
    environment:
      # Best to set the timezone this way; other images can use it too
      - TZ=CST-8
      - MYSQL_ROOT_PASSWORD=123456
      - MYSQL_DATABASE=hive
      - MYSQL_USER=my_user
      - MYSQL_PASSWORD=my_pw
    # Alternatively, append --default-time-zone='+8:00' to set the timezone
    command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
    volumes:
      # local data directory on the host
      - ${PWD}/element/mysql57/data/:/var/lib/mysql
networks:
  hd-network:
    name: hd-network
    ipam:
      config:
        - subnet: 172.24.0.0/24
```
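A minimal sketch of bringing the cluster up and sanity-checking it, assuming the file above is saved as docker-compose.yml in the project root:

```bash
# Start hd01/hd02/hd03 and the MySQL container in the background
docker-compose up -d

# All four containers should show as Up
docker ps --format '{{.Names}}\t{{.Status}}'

# hd01 should resolve the MySQL host via its extra_hosts entry
docker exec hd01 getent hosts hd-mysql57-01
```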
1.3 element/ directory
1.3.1 element/configure/hive-3.1.2/ directory
- conf/ directory
Mapped to the container's corresponding conf/ directory; add a hive-site.xml file here for custom configuration:
```xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hd-mysql57-01:3306/hive?useSSL=false</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://hd01:9083</value>
</property>
<!--
<property>
<name>hive.server2.thrift.bind.host</name>
<value>hd01</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
-->
<property>
<name>hive.cli.print.header</name>
<value>true</value>
</property>
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
</property>
</configuration>
```
- dokshare/ directory
This directory is for sharing files: the materials needed by later Hive work.
- lib/ 目录
- 由于 hive 和 hadoop 使用的 lib 不同,会出现问题,所以通过 docker 映射 lib/ 目录,并将 hadoop 的 lib/ 目录下的文件
guava-27.0-jre.jar
替换guava-19.0.jar
- 并删除文件
log4j-slf4j-impl-2.10.0.jar
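A sketch of these jar fixes, run from the host against the running hd01 container; because lib/ is volume-mapped, the changes persist under element/configure/hive-3.1.2/lib. The guava location inside Hadoop and the MySQL connector version are assumptions; the connector jar is needed by the com.mysql.jdbc.Driver setting in hive-site.xml:

```bash
# Swap Hive's old guava for Hadoop's newer one and drop the duplicate SLF4J binding
docker exec hd01 bash -c '
  rm -f $HIVE_HOME/lib/guava-19.0.jar
  cp    $HADOOP_HOME/share/hadoop/common/lib/guava-27.0-jre.jar $HIVE_HOME/lib/
  rm -f $HIVE_HOME/lib/log4j-slf4j-impl-2.10.0.jar
'

# The JDBC driver configured in hive-site.xml also needs its jar in lib/
# (connector version is an assumption; any 5.1.x release works with MySQL 5.7)
cp mysql-connector-java-5.1.49.jar ${PWD}/element/configure/hive-3.1.2/lib/
```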
1.3.2 element/images/ directory
Download: apache-hive-3.1.2-bin.tar.gz (the JDK and Hadoop tarballs referenced by the Dockerfile's ADD lines also live here).
1.3.3 element/mysh/ directory
Holds the management shell script: hiveserver.sh
```bash
#!/bin/bash
HIVE_LOG_DIR=$HIVE_HOME/logs
if [ ! -d $HIVE_LOG_DIR ]
then
    mkdir -p $HIVE_LOG_DIR
fi
# Check whether a process is running normally.
# Arg 1: process name; Arg 2: port.
function check_process()
{
    pid=$(ps -ef 2>/dev/null | grep -v grep | grep -i $1 | awk '{print $2}')
    ppid=$(netstat -nltp 2>/dev/null | grep $2 | awk '{print $7}' | cut -d '/' -f 1)
    echo $pid
    [[ "$pid" =~ "$ppid" ]] && [ "$ppid" ] && return 0 || return 1
}

function hive_start()
{
    metapid=$(check_process HiveMetastore 9083)
    cmd="nohup hive --service metastore >$HIVE_LOG_DIR/metastore.log 2>&1 &"
    [ -z "$metapid" ] && eval $cmd || echo "Metastore is already running"
    server2pid=$(check_process HiveServer2 10000)
    cmd="nohup hiveserver2 >$HIVE_LOG_DIR/hiveServer2.log 2>&1 &"
    [ -z "$server2pid" ] && eval $cmd || echo "HiveServer2 is already running"
}

function hive_stop()
{
    metapid=$(check_process HiveMetastore 9083)
    [ "$metapid" ] && kill $metapid || echo "Metastore is not running"
    server2pid=$(check_process HiveServer2 10000)
    [ "$server2pid" ] && kill $server2pid || echo "HiveServer2 is not running"
}

case $1 in
"start")
    hive_start
    ;;
"stop")
    hive_stop
    ;;
"restart")
    hive_stop
    sleep 2
    hive_start
    ;;
"status")
    check_process HiveMetastore 9083 >/dev/null && echo "Metastore is running normally" || echo "Metastore is not running properly"
    check_process HiveServer2 10000 >/dev/null && echo "HiveServer2 is running normally" || echo "HiveServer2 is not running properly"
    ;;
*)
    echo "Invalid Args!"
    echo 'Usage: '$(basename $0)' start|stop|restart|status'
    ;;
esac
```
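Once mounted at /mysh, the script can be driven from the host. One caveat: check_process relies on netstat, which the centos:7 base image does not ship; if the status check misbehaves, net-tools may need to be installed in the image (yum install -y net-tools, an extra step not in the Dockerfile above):

```bash
# Start metastore + hiveserver2 inside hd01, then check them
docker exec hd01 bash /mysh/hiveserver.sh start
docker exec hd01 bash /mysh/hiveserver.sh status
```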
2. Startup
- Initialize the metastore schema in MySQL:
[hd01 /usr/local/hive-3.1.2] bin/schematool -initSchema -dbType mysql -verbose
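If initialization succeeds, the metastore tables land in the hive database created by the MySQL container; a quick check from the host (credentials from docker-compose.yml):

```bash
# Should list metastore tables such as DBS, TBLS, COLUMNS_V2, ...
docker exec hd-mysql57-01 mysql -uroot -p123456 -e 'USE hive; SHOW TABLES;'
```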
- Start the metastore:
[hd01 /usr/local/hive-3.1.2] nohup hive --service metastore >$HIVE_LOG_DIR/metastore.log 2>&1 &
- Start hiveserver2:
[hd01 /usr/local/hive-3.1.2] bin/hive --service hiveserver2
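With hiveserver2 listening on its default port 10000, it can be smoke-tested over JDBC with Beeline; a minimal sketch run against hd01 (the root user matches this setup, no password configured):

```bash
# Connect to hiveserver2 and run a trivial query
docker exec hd01 beeline -u jdbc:hive2://hd01:10000 -n root -e 'show databases;'
```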
- Start the Hive client:
[hd01 /usr/local/hive-3.1.2] bin/hive
3. Notes
Since hiveserver2 is a newer client, it should be checked whether Hive's execution engine has been switched to tez or spark, and it takes a while before it works smoothly. So in hive-site.xml I disabled hiveserver2 (the commented-out hive.server2.thrift.* block above) and use the original client for now; I will re-enable it in later study.