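Install Hive
Configure the Hive metastore in MySQL
- Access Hive through the metastore service
- Access Hive through JDBC
Common Hive interactive commands
Common Hive property settings
Errors encountered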
- User: root is not allowed to impersonate root

Install Hive

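Download Hive: http://archive.apache.org/dist/hive/
Upload apache-hive-3.1.2-bin.tar.gz to the /opt/software directory on the Linux host.
Extract apache-hive-3.1.2-bin.tar.gz into /opt/module/:
    tar -zxvf /opt/software/apache-hive-3.1.2-bin.tar.gz -C /opt/module/
Rename the extracted directory apache-hive-3.1.2-bin to hive:
    mv /opt/module/apache-hive-3.1.2-bin/ /opt/module/hive
Edit /etc/profile.d/my_env.sh, add the environment variables, then reload the file (or open a new shell):
    vim /etc/profile.d/my_env.sh
    #HIVE_HOME
    export HIVE_HOME=/opt/module/hive
    export PATH=$PATH:$HIVE_HOME/bin
    source /etc/profile.d/my_env.sh
Resolve the logging jar conflict. Hive's log4j-slf4j-impl binding clashes with the SLF4J binding that ships with Hadoop, so rename Hive's copy so that it is no longer loaded:
    mv $HIVE_HOME/lib/log4j-slf4j-impl-2.10.0.jar $HIVE_HOME/lib/log4j-slf4j-impl-2.10.0.bak
Initialize the metastore database (run from $HIVE_HOME):
    bin/schematool -dbType derby -initSchema
Start Hive:
    bin/hive
Use Hive:
    hive> show databases;
    hive> show tables;
    hive> create table test(id int);
    hive> insert into test values(1);
    hive> select * from test;
Open a second window in CRT and start Hive there as well. In the meantime, monitor hive.log in the /tmp/atguigu directory:
    tail -f /tmp/atguigu/hive.log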
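After these steps, a quick sanity check can confirm two things: that `$HIVE_HOME/bin` is on `PATH`, and which SLF4J binding jars are still active (this is the conflict the jar-renaming step deals with). The following is a minimal sketch. The function names `check_hive_path` and `find_slf4j_bindings` are hypothetical. It also assumes that Hadoop's binding is named `slf4j-log4j12-*.jar` and lives somewhere under `$HADOOP_HOME/share/hadoop`.

```shell
# Hypothetical helpers for a post-install sanity check (not part of Hive).

# check_hive_path: succeed if HIVE_HOME is set and $HIVE_HOME/bin is a PATH entry.
check_hive_path() {
  [ -n "${HIVE_HOME:-}" ] || { echo "HIVE_HOME is not set" >&2; return 1; }
  case ":$PATH:" in
    *":$HIVE_HOME/bin:"*) return 0 ;;
    *) echo "$HIVE_HOME/bin is not on PATH" >&2; return 1 ;;
  esac
}

# find_slf4j_bindings DIR...: list SLF4J binding jars found under the given dirs.
# A renamed copy such as log4j-slf4j-impl-2.10.0.bak does not match *.jar,
# so it is not reported (and is not loaded by Hive either).
find_slf4j_bindings() {
  find "$@" -type f \( -name 'log4j-slf4j-impl-*.jar' -o -name 'slf4j-log4j12-*.jar' \) 2>/dev/null | sort
}
```

Usage: run `check_hive_path && echo "PATH ok"` and `find_slf4j_bindings "$HIVE_HOME/lib" "$HADOOP_HOME/share/hadoop"`. If more than one jar is listed, SLF4J will likely still warn about multiple bindings.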

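Configure the Hive metastore in MySQL
By default, Hive stores its metadata in an embedded Derby database. A running Hive session takes exclusive hold of that database and does not share it with other clients, so we switch the metastore to MySQL.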
Install MySQL (installation steps omitted). Then update the root user in the mysql.user table so that it can connect from any host:
    update mysql.user set host='%' where user='root';
    flush privileges;
Copy the MySQL JDBC driver into Hive's lib directory:
    cp /opt/software/mysql-connector-java-5.1.37.jar $HIVE_HOME/lib
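Create a hive-site.xml file in $HIVE_HOME/conf:
    vim $HIVE_HOME/conf/hive-site.xml

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <!-- JDBC URL of the MySQL metastore database -->
        <property>
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:mysql://hadoop102:3306/metastore?useSSL=false</value>
        </property>
        <!-- JDBC driver class -->
        <property>
            <name>javax.jdo.option.ConnectionDriverName</name>
            <value>com.mysql.jdbc.Driver</value>
        </property>
        <!-- MySQL user name -->
        <property>
            <name>javax.jdo.option.ConnectionUserName</name>
            <value>root</value>
        </property>
        <!-- MySQL password -->
        <property>
            <name>javax.jdo.option.ConnectionPassword</name>
            <value>123456</value>
        </property>
        <!-- skip metastore schema version verification -->
        <property>
            <name>hive.metastore.schema.verification</name>
            <value>false</value>
        </property>
        <!-- disable authorization for the metastore event notification API -->
        <property>
            <name>hive.metastore.event.db.notification.api.auth</name>
            <value>false</value>
        </property>
        <!-- default Hive warehouse directory on HDFS -->
        <property>
            <name>hive.metastore.warehouse.dir</name>
            <value>/user/hive/warehouse</value>
        </property>
    </configuration>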
Log in to MySQL:
    mysql -uroot -p123456
Create the Hive metastore database:
    mysql> create database metastore;
    mysql> quit;
Initialize the Hive metastore schema:
    schematool -initSchema -dbType mysql -verbose
Start Hive:
    bin/hive
Use Hive:
    hive> show databases;
    hive> show tables;
    hive> create table test (id int);
    hive> insert into test values(1);
    hive> select * from test;
Open another window in CRT and start a second Hive session. Because the metastore now lives in MySQL, both sessions can use it at the same time:
    hive> show databases;
    hive> show tables;
    hive> select * from aa;

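Access Hive through the metastore service
Add the following property inside <configuration> in hive-site.xml:
    <!-- address of the metastore service to connect to -->
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hadoop102:9083</value>
    </property>
Start the metastore:
    hive --service metastore
    2021-11-14 16:58:08: Starting Hive Metastore Server
Note: the service runs in the foreground, so this window can no longer be used after startup. Open a new shell window for any other work.
Start Hive:
    bin/hive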

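Access Hive through JDBC
Add the following properties inside <configuration> in hive-site.xml:
    <!-- host that HiveServer2 binds to -->
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>hadoop102</value>
    </property>
    <!-- port that HiveServer2 listens on -->
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
Start HiveServer2:
    bin/hive --service hiveserver2
Start the beeline client. HiveServer2 needs some time before it accepts connections, so you may have to wait a bit:
    bin/beeline -u jdbc:hive2://hadoop102:10000 -n root
You should see output like the following:
    Connecting to jdbc:hive2://hadoop102:10000
    Connected to: Apache Hive (version 3.1.2)
    Driver: Hive JDBC (version 3.1.2)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    Beeline version 3.1.2 by Apache Hive
    0: jdbc:hive2://hadoop102:10000>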
Write a Hive service script. Running the services in the foreground ties up several shell windows, so the script below starts them in the background instead:
    vim $HIVE_HOME/bin/hiveservices.sh

    #!/bin/bash
    HIVE_LOG_DIR=$HIVE_HOME/logs
    if [ ! -d $HIVE_LOG_DIR ]
    then
        mkdir -p $HIVE_LOG_DIR
    fi

    # Check whether a process is running correctly; $1 = process name, $2 = process port
    function check_process() {
        pid=$(ps -ef 2>/dev/null | grep -v grep | grep -i $1 | awk '{print $2}')
        ppid=$(netstat -nltp 2>/dev/null | grep $2 | awk '{print $7}' | cut -d '/' -f 1)
        echo $pid
        [[ "$pid" =~ "$ppid" ]] && [ "$ppid" ] && return 0 || return 1
    }

    function hive_start() {
        metapid=$(check_process HiveMetastore 9083)
        cmd="nohup hive --service metastore >$HIVE_LOG_DIR/metastore.log 2>&1 &"
        [ -z "$metapid" ] && eval $cmd || echo "Metastore is already running"
        server2pid=$(check_process HiveServer2 10000)
        cmd="nohup hiveserver2 >$HIVE_LOG_DIR/hiveServer2.log 2>&1 &"
        [ -z "$server2pid" ] && eval $cmd || echo "HiveServer2 is already running"
    }

    function hive_stop() {
        metapid=$(check_process HiveMetastore 9083)
        [ "$metapid" ] && kill $metapid || echo "Metastore is not running"
        server2pid=$(check_process HiveServer2 10000)
        [ "$server2pid" ] && kill $server2pid || echo "HiveServer2 is not running"
    }

    case $1 in
    "start")
        hive_start
        ;;
    "stop")
        hive_stop
        ;;
    "restart")
        hive_stop
        sleep 2
        hive_start
        ;;
    "status")
        check_process HiveMetastore 9083 >/dev/null && echo "Metastore is running normally" || echo "Metastore is not running normally"
        check_process HiveServer2 10000 >/dev/null && echo "HiveServer2 is running normally" || echo "HiveServer2 is not running normally"
        ;;
    *)
        echo Invalid Args!
        echo 'Usage: '$(basename $0)' start|stop|restart|status'
        ;;
    esac
Make the script executable:
    chmod +x $HIVE_HOME/bin/hiveservices.sh
Start the Hive services in the background:
    hiveservices.sh start
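`hiveservices.sh start` returns as soon as both services have been launched in the background, but HiveServer2 takes a while before it listens on its port. That delay is why beeline needs the extra wait. A small polling loop can block until the port accepts connections, and only then start beeline. This is a sketch under a few assumptions: it needs bash (it relies on bash's `/dev/tcp` redirection) and the coreutils `timeout` command, and `wait_for_port` is a hypothetical helper name.

```shell
# wait_for_port HOST PORT TIMEOUT_SECONDS
# Tries a TCP connection to HOST:PORT about once per second.
# Returns 0 as soon as a connection succeeds, 1 after TIMEOUT_SECONDS attempts.
wait_for_port() {
  local host=$1 port=$2 timeout_s=$3 waited=0
  while (( waited < timeout_s )); do
    # /dev/tcp/HOST/PORT is a bash redirection that opens a TCP connection;
    # the child bash exits immediately, which closes the connection again.
    if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}
```

Usage: `hiveservices.sh start && wait_for_port hadoop102 10000 120 && bin/beeline -u jdbc:hive2://hadoop102:10000 -n root`. Polling the port is a cruder check than `check_process` in the script, but it tests exactly what beeline needs: an open listener.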

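Common Hive interactive commands
"-e": execute an SQL statement without entering the Hive interactive shell:
    bin/hive -e "select id from student;"
"-f": execute the SQL statements in a script file:
- Create a datas directory under /opt/module/hive/ and a hivef.sql file inside it:
    mkdir -p /opt/module/hive/datas
    touch /opt/module/hive/datas/hivef.sql
- Write valid SQL into the file:
    select * from student;
- Execute the SQL in the file:
    bin/hive -f /opt/module/hive/datas/hivef.sql
- Execute the SQL in the file and write the result to a file:
    bin/hive -f /opt/module/hive/datas/hivef.sql > /opt/module/datas/hive_result.txt
Exit the Hive shell:
    hive(default)>exit;
    hive(default)>quit;
View the HDFS file system from the Hive CLI:
    hive(default)>dfs -ls /;
View every command previously entered in Hive:
- Go to the current user's home directory (/root here)
- View the .hivehistory file:
    cat .hivehistory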
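The `-f` option lends itself to batch jobs. As a sketch, the loop below runs every `.sql` file in a directory with `hive -S -f` and writes each file's output next to it as a `.txt` file. `-S` is the Hive CLI's silent mode. The function name `run_sql_dir` and the `HIVE_BIN` variable are hypothetical. `HIVE_BIN` only exists so that the command can be swapped out, and it defaults to the `hive` launcher on `PATH`.

```shell
# run_sql_dir DIR: run each DIR/*.sql with the Hive CLI, saving stdout to DIR/<name>.txt.
# HIVE_BIN is a hypothetical override; by default the hive command on PATH is used.
run_sql_dir() {
  local dir=$1 hive_bin=${HIVE_BIN:-hive} f
  for f in "$dir"/*.sql; do
    [ -e "$f" ] || continue   # the glob stays literal when no .sql file exists
    # -S: silent mode; -f: execute the statements in the file
    "$hive_bin" -S -f "$f" > "${f%.sql}.txt" || { echo "failed: $f" >&2; return 1; }
  done
}
```

Usage: `run_sql_dir /opt/module/hive/datas` runs `hivef.sql` and writes `hivef.txt` in the same directory. The function stops at the first failing script instead of continuing with the rest.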
Common Hive property settings
Hive runtime log configuration
By default, Hive writes its log to /tmp/atguigu/hive.log, that is, /tmp/<current user name>/hive.log.
To store Hive's logs in /opt/module/hive/logs instead:
- Rename /opt/module/hive/conf/hive-log4j2.properties.template to hive-log4j2.properties:
    mv hive-log4j2.properties.template hive-log4j2.properties
- In hive-log4j2.properties, change the log location:
    hive.log.dir=/opt/module/hive/logs
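The two log steps can be scripted. This is a minimal sketch with three assumptions: GNU `sed` is available, the key in the file is either `hive.log.dir` or `property.hive.log.dir` (Log4j 2 property files commonly declare variables with a `property.` prefix), and `set_hive_log_dir` is a hypothetical helper name.

```shell
# set_hive_log_dir CONF_DIR LOG_DIR
# 1) renames hive-log4j2.properties.template to hive-log4j2.properties if needed
# 2) points the (property.)hive.log.dir line at LOG_DIR and creates that directory
set_hive_log_dir() {
  local conf=$1 logdir=$2
  local props="$conf/hive-log4j2.properties"
  if [ ! -f "$props" ]; then
    mv "$conf/hive-log4j2.properties.template" "$props" || return 1
  fi
  # GNU sed: -i edits in place, -E enables extended regular expressions;
  # \1 keeps the original key name, with or without the "property." prefix.
  sed -i -E "s|^[[:space:]]*((property\.)?hive\.log\.dir)[[:space:]]*=.*|\1 = ${logdir}|" "$props"
  mkdir -p "$logdir"
}
```

Usage: `set_hive_log_dir /opt/module/hive/conf /opt/module/hive/logs`. Running it a second time is harmless because the rename is skipped once the properties file exists.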
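Printing the current database and column headers
Add the following two properties inside <configuration> in hive-site.xml:
    <!-- print column headers in query results -->
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>
    <!-- show the current database in the CLI prompt -->
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>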

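Ways to configure parameters

View all current settings:
    hive> set;
There are three ways to configure parameters:
- Configuration files. The default configuration file is hive-default.xml and the user-defined configuration file is hive-site.xml. Note: user-defined settings override the defaults. Hive also reads Hadoop's configuration, because Hive is started as a Hadoop client, and Hive's settings override Hadoop's. Settings in configuration files apply to every Hive process started on this machine.
- Command-line parameters. When starting Hive, add -hiveconf param=value on the command line to set a parameter:
    bin/hive -hiveconf mapred.reduce.tasks=10
  Note: this applies only to the current Hive session.
- Parameter declarations. Use the SET keyword in HQL to set a parameter:
    hive (default)> set mapred.reduce.tasks=100;
  Note: this applies only to the current Hive session.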

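The three methods above have increasing priority: configuration file < command-line parameter < parameter declaration. Note that some system-level parameters, such as the log4j settings, must be set with one of the first two methods, because they are read before the session is established.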
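To make the ordering concrete, here is a toy model in plain bash (not Hive code) of how the effective value is chosen. Each later source overrides the earlier ones whenever it provides a value. The function name `effective_value` is hypothetical, and the model ignores the log4j exception described above.

```shell
# effective_value FILE_VALUE CLI_VALUE SET_VALUE
# Toy model of the precedence rule: config file < -hiveconf < SET.
# An empty argument means "this source does not set the parameter".
effective_value() {
  local v result=""
  for v in "$1" "$2" "$3"; do
    [ -n "$v" ] && result=$v   # a later non-empty source wins
  done
  printf '%s\n' "$result"
}
```

You can observe the same behavior in Hive itself. Start `bin/hive -hiveconf mapred.reduce.tasks=10` and enter `set mapred.reduce.tasks;` to print the current value. Then run `set mapred.reduce.tasks=100;` and query the value again.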

Errors encountered

User: root is not allowed to impersonate root

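Edit the Hadoop configuration file etc/hadoop/core-site.xml (under the Hadoop installation directory) and add the following properties:
    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
    </property>
The user name in the property keys must match the user named in the error. For example, for "User: gyt is not allowed to impersonate anonymous", use these keys instead:
    hadoop.proxyuser.gyt.hosts
    hadoop.proxyuser.gyt.groups
Then restart the Hadoop cluster so the change takes effect.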
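The user in the `hadoop.proxyuser.<user>.*` keys is the one that appears after "User:" in the error message (root, gyt). That means the two properties can be generated directly from the message. The sketch below assumes the message has the form shown above, and `proxyuser_xml` is a hypothetical helper. The `*` values allow every host and group, which is convenient on a test cluster but broader than a production setup should grant.

```shell
# proxyuser_xml "ERROR MESSAGE"
# Extracts <name> from "User: <name> is not allowed to impersonate ..."
# and prints the hosts/groups properties for core-site.xml.
proxyuser_xml() {
  local user key
  user=$(printf '%s\n' "$1" | sed -n -E 's/.*User: ([^ ]+) is not allowed to impersonate.*/\1/p')
  [ -n "$user" ] || { echo "unrecognized message" >&2; return 1; }
  for key in hosts groups; do
    printf '<property>\n  <name>hadoop.proxyuser.%s.%s</name>\n  <value>*</value>\n</property>\n' "$user" "$key"
  done
}
```

Usage: `proxyuser_xml "User: gyt is not allowed to impersonate anonymous"` prints the two `<property>` blocks for gyt, ready to paste inside `<configuration>`.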

2. Hive Installation and Deployment
