0. Goals

  1. Build on the Hadoop Docker cluster and install the Hive server on hd01.
  2. Use MySQL to store the metastore, and set up hiveserver2.

1. File preparation

  1. dockerfileHDFS
  2. docker-compose.yml
  3. element/ directory
    1. configure/
      1. hive-3.1.2/
        1. conf/
        2. dokshare/
        3. lib/
    2. images/
      1. apache-hive-3.1.2-bin.tar.gz
    3. mysh/
      1. hiveservice.sh

1.1 dockerfileHDFS

Hive was not installed in the earlier Hadoop cluster, so it has to be added to the Docker image:

```dockerfile
FROM centos:7
# SSH
RUN yum install -y openssh-server sudo
RUN sed -i 's/UsePAM yes/UsePAM no/g' /etc/ssh/sshd_config
RUN echo "PermitRootLogin yes" >> /etc/ssh/sshd_config
RUN yum install -y openssh-clients
RUN echo "root:123456" | chpasswd
RUN echo "root ALL=(ALL) ALL" >> /etc/sudoers
RUN ssh-keygen -t dsa -f /etc/ssh/ssh_host_dsa_key
RUN ssh-keygen -t rsa -f /etc/ssh/ssh_host_rsa_key
RUN mkdir /var/run/sshd
# JAVA
ADD element/images/jdk-8u141-linux-x64.tar.gz /usr/local/
RUN mv /usr/local/jdk1.8.0_141 /usr/local/jdk1.8
ENV JAVA_HOME /usr/local/jdk1.8
ENV PATH $JAVA_HOME/bin:$PATH
# HADOOP
ADD element/images/hadoop-3.3.0.tar.gz /usr/local
RUN mv /usr/local/hadoop-3.3.0 /usr/local/hadoop
ENV HADOOP_HOME /usr/local/hadoop
ENV PATH $HADOOP_HOME/bin:$PATH
# HIVE
ADD element/images/apache-hive-3.1.2-bin.tar.gz /usr/local
RUN mv /usr/local/apache-hive-3.1.2-bin /usr/local/hive-3.1.2
ENV HIVE_HOME /usr/local/hive-3.1.2
ENV PATH $HIVE_HOME/bin:$PATH
RUN yum install -y which sudo
RUN mkdir /mysh
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]
```
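
With this Dockerfile the image referenced by docker-compose.yml can be rebuilt. A minimal sketch, assuming the file is saved as dockerfileHDFS in the project root:

```bash
# Rebuild the image used by hd01/hd02/hd03 (the tag must match docker-compose.yml)
docker build -f dockerfileHDFS -t my-hadoop:3.3.0 .
```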

1.2 docker-compose.yml

Compared with the plain Hadoop cluster, the main change in this docker-compose.yml is the addition of a MySQL container rather than a separate Hive container, because:

  1. Hive runs inside hd01, as reflected in hd01's volumes: section, which mounts the Hive conf/, lib/ and dokshare/ directories.
  2. MySQL is what the Hive metastore uses for storage.
```yaml
version: '3.5'
services:
  hd01:
    image: my-hadoop:3.3.0
    container_name: hd01
    hostname: hd01
    extra_hosts:
      - "hd-mysql57-01:172.24.0.6"
      - "hd02:172.24.0.12"
      - "hd03:172.24.0.13"
    networks:
      hd-network:
        ipv4_address: 172.24.0.11
    volumes:
      - ${PWD}/element/configure/hadoop/etc-hd01:/usr/local/hadoop/etc
      - ${PWD}/element/configure/hadoop/dokshare:/usr/local/hadoop/dokshare
      - ${PWD}/element/configure/hive-3.1.2/conf:/usr/local/hive-3.1.2/conf
      - ${PWD}/element/configure/hive-3.1.2/lib:/usr/local/hive-3.1.2/lib
      - ${PWD}/element/configure/hive-3.1.2/dokshare:/usr/local/hive-3.1.2/dokshare
      - ${PWD}/element/mysh:/mysh
    environment:
      - HDFS_NAMENODE_USER=root
      - HDFS_DATANODE_USER=root
      - HDFS_SECONDARYNAMENODE_USER=root
      - YARN_RESOURCEMANAGER_USER=root
      - YARN_NODEMANAGER_USER=root
  hd02:
    image: my-hadoop:3.3.0
    container_name: hd02
    hostname: hd02
    extra_hosts:
      - "hd-mysql57-01:172.24.0.6"
      - "hd01:172.24.0.11"
      - "hd03:172.24.0.13"
    networks:
      hd-network:
        ipv4_address: 172.24.0.12
    volumes:
      - ${PWD}/element/configure/hadoop/etc-hd02:/usr/local/hadoop/etc
      - ${PWD}/element/configure/hadoop/dokshare:/usr/local/hadoop/dokshare
      - ${PWD}/element/mysh:/mysh
    environment:
      - HDFS_NAMENODE_USER=root
      - HDFS_DATANODE_USER=root
      - HDFS_SECONDARYNAMENODE_USER=root
      - YARN_RESOURCEMANAGER_USER=root
      - YARN_NODEMANAGER_USER=root
  hd03:
    image: my-hadoop:3.3.0
    container_name: hd03
    hostname: hd03
    extra_hosts:
      - "hd-mysql57-01:172.24.0.6"
      - "hd01:172.24.0.11"
      - "hd02:172.24.0.12"
    networks:
      hd-network:
        ipv4_address: 172.24.0.13
    volumes:
      - ${PWD}/element/configure/hadoop/etc-hd03:/usr/local/hadoop/etc
      - ${PWD}/element/configure/hadoop/dokshare:/usr/local/hadoop/dokshare
      - ${PWD}/element/mysh:/mysh
    environment:
      - HDFS_NAMENODE_USER=root
      - HDFS_DATANODE_USER=root
      - HDFS_SECONDARYNAMENODE_USER=root
      - YARN_RESOURCEMANAGER_USER=root
      - YARN_NODEMANAGER_USER=root
  hd-mysql57-01:
    image: mysql:5.7
    container_name: hd-mysql57-01
    hostname: hd-mysql57-01
    networks:
      hd-network:
        ipv4_address: 172.24.0.6
    environment:
      # Preferred way to set the time zone; other images can use it too
      - TZ=CST-8
      - MYSQL_ROOT_PASSWORD=123456
      - MYSQL_DATABASE=hive
      - MYSQL_USER=my_user
      - MYSQL_PASSWORD=my_pw
    # --default-time-zone='+8:00' can also be appended to set the time zone
    command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
    volumes:
      # Local data directory on the host
      - ${PWD}/element/mysql57/data/:/var/lib/mysql
networks:
  hd-network:
    name: hd-network
    ipam:
      config:
        - subnet: 172.24.0.0/24
```

1.3 element/ directory

1.3.1 element/configure/hive-3.1.2/ directory

  1. conf/ directory

This directory is mapped onto the container's conf/ directory. Add the file hive-site.xml to it for the custom configuration:

```xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hd-mysql57-01:3306/hive?useSSL=false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>

  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.metastore.event.db.notification.api.auth</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hd01:9083</value>
  </property>
  <!--
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>hd01</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>
  -->
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
  </property>
</configuration>
```
  2. dokshare/ directory

This directory is for sharing files: the materials that will be needed when working with Hive later on.

  3. lib/ directory
    1. Hive and Hadoop ship different versions of some libraries, which causes conflicts, so the lib/ directory is mounted through Docker and Hive's guava-19.0.jar is replaced with guava-27.0-jre.jar taken from Hadoop's lib/ directory (see the sketch below).
    2. The file log4j-slf4j-impl-2.10.0.jar is also deleted, since it duplicates the SLF4J binding that Hadoop already provides.
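
A minimal sketch of these jar adjustments, run inside hd01 and assuming the standard Hadoop 3.3.0 layout under /usr/local/hadoop:

```bash
cd /usr/local/hive-3.1.2/lib
# Swap Hive's old guava for the newer one shipped with Hadoop
rm -f guava-19.0.jar
cp /usr/local/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar .
# Remove the duplicate SLF4J binding
rm -f log4j-slf4j-impl-2.10.0.jar
```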

1.3.2 element/images/ directory

Download the file apache-hive-3.1.2-bin.tar.gz into this directory.
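
A sketch of fetching it, assuming the Apache archive mirror is reachable:

```bash
cd element/images
# Official Apache archive location for Hive 3.1.2
wget https://archive.apache.org/dist/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
```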

1.3.3 element/mysh/ directory

The service-management shell script hiveservice.sh:

```bash
#!/bin/bash
HIVE_LOG_DIR=$HIVE_HOME/logs
if [ ! -d $HIVE_LOG_DIR ]
then
  mkdir -p $HIVE_LOG_DIR
fi

# Check whether a process is running normally.
# Arg 1 is the process name; arg 2 is the port it listens on.
function check_process()
{
  pid=$(ps -ef 2>/dev/null | grep -v grep | grep -i $1 | awk '{print $2}')
  ppid=$(netstat -nltp 2>/dev/null | grep $2 | awk '{print $7}' | cut -d '/' -f 1)
  echo $pid
  [[ "$pid" =~ "$ppid" ]] && [ "$ppid" ] && return 0 || return 1
}

function hive_start()
{
  metapid=$(check_process HiveMetastore 9083)
  cmd="nohup hive --service metastore >$HIVE_LOG_DIR/metastore.log 2>&1 &"
  [ -z "$metapid" ] && eval $cmd || echo "Metastore is already running"
  server2pid=$(check_process HiveServer2 10000)
  cmd="nohup hiveserver2 >$HIVE_LOG_DIR/hiveServer2.log 2>&1 &"
  [ -z "$server2pid" ] && eval $cmd || echo "HiveServer2 is already running"
}

function hive_stop()
{
  metapid=$(check_process HiveMetastore 9083)
  [ "$metapid" ] && kill $metapid || echo "Metastore is not running"
  server2pid=$(check_process HiveServer2 10000)
  [ "$server2pid" ] && kill $server2pid || echo "HiveServer2 is not running"
}

case $1 in
"start")
  hive_start
  ;;
"stop")
  hive_stop
  ;;
"restart")
  hive_stop
  sleep 2
  hive_start
  ;;
"status")
  check_process HiveMetastore 9083 >/dev/null && echo "Metastore is running normally" || echo "Metastore is not running normally"
  check_process HiveServer2 10000 >/dev/null && echo "HiveServer2 is running normally" || echo "HiveServer2 is not running normally"
  ;;
*)
  echo "Invalid Args!"
  echo 'Usage: '$(basename $0)' start|stop|restart|status'
  ;;
esac
```
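
Since /mysh is mounted into hd01 (see docker-compose.yml), a usage sketch, assuming the script has been made executable:

```bash
chmod +x /mysh/hiveservice.sh
/mysh/hiveservice.sh start     # start metastore and hiveserver2
/mysh/hiveservice.sh status    # check both services
/mysh/hiveservice.sh stop      # stop both services
```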

2. Startup
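
Before the Hive steps below, bring the cluster up and open a shell in hd01; a sketch run from the project root:

```bash
# Start hd01/hd02/hd03 and the MySQL container in the background
docker-compose up -d
# The remaining commands in this section are run inside hd01
docker exec -it hd01 /bin/bash
```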

  1. Switch metastore storage to MySQL by initializing the schema:

[hd01 /usr/local/hive-3.1.2] bin/schematool -initSchema -dbType mysql -verbose
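
To confirm the initialization worked, the metastore tables can be listed in MySQL; a sketch using the credentials from docker-compose.yml:

```bash
docker exec -it hd-mysql57-01 mysql -uroot -p123456 -e "USE hive; SHOW TABLES;"
```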

  2. Start the metastore:

[hd01 /usr/local/hive-3.1.2] nohup hive --service metastore >$HIVE_LOG_DIR/metastore.log 2>&1 &
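
The metastore should now be listening on port 9083, the port referenced by hive.metastore.uris; a quick check sketch (netstat comes from net-tools, which may need to be installed first):

```bash
netstat -nltp 2>/dev/null | grep 9083
```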

  3. Start hiveserver2:

[hd01 /usr/local/hive-3.1.2] bin/hive --service hiveserver2
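
hiveserver2 can take a minute or two before it accepts connections. A connection sketch, assuming the default Thrift port 10000 and the root user:

```bash
bin/beeline -u jdbc:hive2://hd01:10000 -n root
```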

  4. Start the Hive client:

[hd01 /usr/local/hive-3.1.2] bin/hive
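
A quick smoke test without entering the interactive shell; a sketch:

```bash
# Lists at least the default database if the metastore connection works
bin/hive -e "SHOW DATABASES;"
```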

3. Notes

Because hiveserver2 is a new client interface, it checks whether the Hive execution engine has been switched to tez or spark, so it takes a while before anything appears. For that reason I commented out the hiveserver2 settings in hive-site.xml and use the original CLI client for now (to be re-enabled in later study).