1. Preparation
Hive runs on top of Hadoop, so a working Hadoop environment is required first; see the earlier posts in this series for how to set one up.
Accessing Hive over JDBC can fail with permission (proxy user) errors, so add the following to Hadoop's core-site.xml to let the root user impersonate other users:
<property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
</property>
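Hadoop then needs to pick up the new proxyuser settings. Restarting the cluster works; on a running Hadoop 3.x cluster the mappings can usually also be refreshed in place (a sketch, assuming the standard admin commands are on your PATH):

hdfs dfsadmin -refreshSuperUserGroupsConfiguration
yarn rmadmin -refreshSuperUserGroupsConfiguration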
JDBC access also requires HiveServer2 to be running. Either of the following starts it:
hiveserver2 &
hive --service hiveserver2
Both of these kept printing log output to my console, so I used nohup and redirected everything to a log file instead:
nohup hiveserver2 >> /usr/local/hive/log/hiveserver2.log 2>&1 &
By default HiveServer2 listens on port 10000.
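Before touching any Spring Boot code, it is worth confirming that the service actually came up, for example by connecting with beeline (host, port, database, and user taken from the data source config below):

beeline -u jdbc:hive2://hadoop0:10000/test -n hive1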
2. Dependencies and code
Since Hive sits on top of Hadoop, we continue from the Hadoop + Spring Boot integration project built in the earlier post and add Hive to it.
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.7.0</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.zym</groupId>
    <artifactId>hadoop_springboot</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>hadoop_springboot</name>
    <description>Demo project for Spring Boot</description>
    <properties>
        <java.version>1.8</java.version>
        <hadoop.version>3.3.3</hadoop.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <!-- Hadoop -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-streaming</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-yarn-common</artifactId>
            <version>${hadoop.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>com.google.guava</groupId>
                    <artifactId>guava</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-distcp</artifactId>
            <version>${hadoop.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
            <version>${hadoop.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>druid</artifactId>
            <version>1.2.1</version>
        </dependency>
        <!-- Hive JDBC -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>2.3.0</version>
        </dependency>
        <!-- Chinese word-segmentation analyzer -->
        <dependency>
            <groupId>cn.bestwu</groupId>
            <artifactId>ik-analyzers</artifactId>
            <version>5.1.0</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.46</version>
        </dependency>
        <!-- needed by the MapReduce-related code -->
        <dependency>
            <groupId>jdk.tools</groupId>
            <artifactId>jdk.tools</artifactId>
            <version>1.8</version>
            <scope>system</scope>
            <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-jdbc</artifactId>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>
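One caveat with the hive-jdbc dependency: Hive 2.3.x ships its own SLF4J binding (log4j-slf4j-impl), which can clash with the logback binding pulled in by spring-boot-starter-web and produce "Class path contains multiple SLF4J bindings" warnings. If that happens, an exclusion along these lines should help:

        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>2.3.0</version>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.logging.log4j</groupId>
                    <artifactId>log4j-slf4j-impl</artifactId>
                </exclusion>
            </exclusions>
        </dependency>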
application.yml
# Tomcat worker threads
server:
  tomcat:
    max-threads: 1000
  # HTTP port
  port: 8900
  # session timeout
  session-timeout: 60
spring:
  application:
    name: hadoop
  servlet:
    multipart:
      max-file-size: 50MB
      max-request-size: 50MB
  datasource:
    hive: # Hive data source
      url: jdbc:hive2://hadoop0:10000/test
      type: com.alibaba.druid.pool.DruidDataSource
      username: hive1
      password: Newpassword2022!!!
      driver-class-name: org.apache.hive.jdbc.HiveDriver
    common-config: # shared pool settings, applied to every data source
      initialSize: 1
      minIdle: 1
      maxIdle: 5
      maxActive: 50
      maxWait: 10000
      timeBetweenEvictionRunsMillis: 10000
      minEvictableIdleTimeMillis: 300000
      validationQuery: select 'x'
      testWhileIdle: true
      testOnBorrow: false
      testOnReturn: false
      poolPreparedStatements: true
      maxOpenPreparedStatements: 20
      filters: stat
hdfs:
  path: hdfs://hadoop0:9000
  username: root
DataSourceCommonProperties.java
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;

@Data
@NoArgsConstructor
@AllArgsConstructor
@Component
@ConfigurationProperties(prefix = "spring.datasource.common-config", ignoreUnknownFields = false)
public class DataSourceCommonProperties {
    private int initialSize = 10;
    private int minIdle;
    private int maxIdle;
    private int maxActive;
    private int maxWait;
    private int timeBetweenEvictionRunsMillis;
    private int minEvictableIdleTimeMillis;
    private String validationQuery;
    private boolean testWhileIdle;
    private boolean testOnBorrow;
    private boolean testOnReturn;
    private boolean poolPreparedStatements;
    private int maxOpenPreparedStatements;
    private String filters;
    private String mapperLocations;
    private String typeAliasPackage;
}
HiveDruidConfiguration.java
import com.alibaba.druid.pool.DruidDataSource;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.context.properties.EnableConfigurationProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import javax.sql.DataSource;
import java.sql.SQLException;

@Slf4j
@Configuration
@EnableConfigurationProperties({HiveJdbcProperties.class, DataSourceCommonProperties.class})
public class HiveDruidConfiguration {

    @Autowired
    private HiveJdbcProperties hiveJdbcProperties;

    @Autowired
    private DataSourceCommonProperties dataSourceCommonProperties;

    @Bean("hiveDruidDataSource")   // register the Druid pool as a bean
    @Qualifier("hiveDruidDataSource")
    public DataSource dataSource() {
        DruidDataSource datasource = new DruidDataSource();
        // Hive-specific connection settings
        datasource.setUrl(hiveJdbcProperties.getUrl());
        datasource.setUsername(hiveJdbcProperties.getUsername());
        datasource.setPassword(hiveJdbcProperties.getPassword());
        datasource.setDriverClassName(hiveJdbcProperties.getDriverClassName());
        // shared pool settings
        datasource.setInitialSize(dataSourceCommonProperties.getInitialSize());
        datasource.setMinIdle(dataSourceCommonProperties.getMinIdle());
        datasource.setMaxActive(dataSourceCommonProperties.getMaxActive());
        datasource.setMaxWait(dataSourceCommonProperties.getMaxWait());
        datasource.setTimeBetweenEvictionRunsMillis(dataSourceCommonProperties.getTimeBetweenEvictionRunsMillis());
        datasource.setMinEvictableIdleTimeMillis(dataSourceCommonProperties.getMinEvictableIdleTimeMillis());
        datasource.setValidationQuery(dataSourceCommonProperties.getValidationQuery());
        datasource.setTestWhileIdle(dataSourceCommonProperties.isTestWhileIdle());
        datasource.setTestOnBorrow(dataSourceCommonProperties.isTestOnBorrow());
        datasource.setTestOnReturn(dataSourceCommonProperties.isTestOnReturn());
        datasource.setPoolPreparedStatements(dataSourceCommonProperties.isPoolPreparedStatements());
        try {
            datasource.setFilters(dataSourceCommonProperties.getFilters());
        } catch (SQLException e) {
            log.error("Druid configuration initialization filter error.", e);
        }
        return datasource;
    }
}
HiveJdbcConfiguration.java
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;

import javax.sql.DataSource;

@Configuration
public class HiveJdbcConfiguration {

    @Bean("hiveJdbcTemplate")
    @Qualifier("hiveJdbcTemplate")
    public JdbcTemplate jdbcTemplate(@Qualifier("hiveDruidDataSource") DataSource dataSource) {
        return new JdbcTemplate(dataSource);
    }
}
HiveJdbcProperties.java
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;

@Data
@NoArgsConstructor
@AllArgsConstructor
@Component
@ConfigurationProperties(prefix = "spring.datasource.hive", ignoreUnknownFields = false)
public class HiveJdbcProperties {
    private String url;
    private String type;
    private String username;
    private String password;
    private String driverClassName;
}
HiveController.java
import com.zym.result.Result;
import com.zym.service.HiveService;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/hadoop/hive")
@Slf4j
public class HiveController {

    @Autowired
    private HiveService hiveService;

    @GetMapping("/listAllTables")
    public Result listAllTables() {
        return new Result("SUCCESS", "Listed all tables", hiveService.listAllTables());
    }

    @GetMapping("/selectFromTable")
    public Result selectFromTable(String tableName) {
        return new Result("SUCCESS", "Fetched table contents", hiveService.selectFromTable(tableName));
    }

    @GetMapping("/select")
    public Result select(String hql) {
        return new Result("SUCCESS", "Query result", hiveService.select(hql));
    }
}
HiveService.java
import java.util.List;

public interface HiveService {

    Object select(String hql);

    List<String> listAllTables();

    List<String> describeTable(String tableName);

    List<String> selectFromTable(String tableName);
}
HiveServiceImpl.java
import com.zym.service.HiveService;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.util.StringUtils;

import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

@Slf4j
@Service
public class HiveServiceImpl implements HiveService {

    @Autowired
    @Qualifier("hiveJdbcTemplate")
    private JdbcTemplate hiveJdbcTemplate;

    @Autowired
    @Qualifier("hiveDruidDataSource")
    private DataSource hiveDruidDataSource;

    @Override
    public Object select(String hql) {
        return hiveJdbcTemplate.queryForList(hql);
    }

    @Override
    public List<String> listAllTables() {
        List<String> result = new ArrayList<>();
        String sql = "show tables";
        log.info("Running: " + sql);
        // try-with-resources so the connection is always returned to the Druid pool
        try (Connection connection = hiveDruidDataSource.getConnection();
             Statement statement = connection.createStatement();
             ResultSet resultSet = statement.executeQuery(sql)) {
            while (resultSet.next()) {
                result.add(resultSet.getString(1));
            }
            return result;
        } catch (SQLException e) {
            log.error(e.getMessage());
        }
        return Collections.emptyList();
    }

    @Override
    public List<String> describeTable(String tableName) {
        if (StringUtils.isEmpty(tableName)) {
            return Collections.emptyList();
        }
        List<String> result = new ArrayList<>();
        String sql = "describe " + tableName;
        log.info("Running: " + sql);
        try (Connection connection = hiveDruidDataSource.getConnection();
             Statement statement = connection.createStatement();
             ResultSet resultSet = statement.executeQuery(sql)) {
            while (resultSet.next()) {
                result.add(resultSet.getString(1));
            }
            return result;
        } catch (SQLException e) {
            log.error(e.getMessage());
        }
        return Collections.emptyList();
    }

    @Override
    public List<String> selectFromTable(String tableName) {
        if (StringUtils.isEmpty(tableName)) {
            return Collections.emptyList();
        }
        List<String> result = new ArrayList<>();
        String sql = "select * from " + tableName;
        log.info("Running: " + sql);
        try (Connection connection = hiveDruidDataSource.getConnection();
             Statement statement = connection.createStatement();
             ResultSet resultSet = statement.executeQuery(sql)) {
            int columnCount = resultSet.getMetaData().getColumnCount();
            while (resultSet.next()) {
                // join all columns of the row with single spaces
                StringBuilder row = new StringBuilder();
                for (int i = 1; i <= columnCount; i++) {
                    row.append(resultSet.getString(i));
                    if (i < columnCount) {
                        row.append(" ");
                    }
                }
                log.info(row.toString());
                result.add(row.toString());
            }
            return result;
        } catch (SQLException e) {
            log.error(e.getMessage());
        }
        return Collections.emptyList();
    }
}
3. A quick test
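With the application running on port 8900 (as configured in application.yml), the three endpoints can be exercised with curl; tableName=student below is only a placeholder, substitute a table that exists in your test database:

curl "http://localhost:8900/hadoop/hive/listAllTables"
curl "http://localhost:8900/hadoop/hive/selectFromTable?tableName=student"
curl "http://localhost:8900/hadoop/hive/select?hql=select%20count(*)%20from%20student"

Each call returns a Result JSON object carrying the status code, a message, and the query output.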
4. Problems I ran into
The console log didn't reveal much, so I checked the log file at the custom output path set up earlier (/usr/local/hive/log/hiveserver2.log).
It was much the same as the console error and still unreadable; all it showed was that the failure happened at the job stage, so the next stop was Hive's own log:
cat /tmp/root/hive.log
...out of memory. Fair enough.
Fix: set the heap size in Hive's configuration file hive-env.sh, as shown below.
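The relevant knob in hive-env.sh is HADOOP_HEAPSIZE, which is commented out in the shipped template; the value (in MB) below is only an example, size it to your machine:

# /usr/local/hive/conf/hive-env.sh
export HADOOP_HEAPSIZE=2048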
Then restart HiveServer2 and the query goes through.
One open question from testing: whether via beeline or plain JDBC, I could connect without supplying a username or password at all, and at first I couldn't find the reason; if you know, please share.
Update: after reading up on it, this makes sense. A default Hive install has no username/password authentication at all; to enable it you have to write your own custom validation code and then add the credential settings to the configuration file, roughly as sketched below.
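Based on the linked articles, a minimal sketch of such a validator: implement Hive's PasswdAuthenticationProvider interface, package it into a jar under $HIVE_HOME/lib, and point hive-site.xml at it. The class name, package, and hard-coded credentials here are hypothetical; a real implementation would look accounts up in a file or database.

import org.apache.hive.service.auth.PasswdAuthenticationProvider;

import javax.security.sasl.AuthenticationException;

// Hypothetical validator: accepts a single hard-coded account.
public class MyHiveAuthenticator implements PasswdAuthenticationProvider {
    @Override
    public void Authenticate(String user, String password) throws AuthenticationException {
        if (!"hive1".equals(user) || !"Newpassword2022!!!".equals(password)) {
            throw new AuthenticationException("Invalid credentials for user: " + user);
        }
    }
}

Then in hive-site.xml:

<property>
    <name>hive.server2.authentication</name>
    <value>CUSTOM</value>
</property>
<property>
    <name>hive.server2.custom.authentication.class</name>
    <value>com.zym.auth.MyHiveAuthenticator</value>
</property>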
References:
https://blog.csdn.net/Winter_chen001/article/details/110449266
https://blog.csdn.net/yamaxifeng_132/article/details/119869080