1. Preparation
Hive runs on top of Hadoop, so a working Hadoop environment is required first; see the earlier posts in this series for how to set one up.
Accessing Hive over JDBC can fail with permission (proxy user) errors, so add the following to Hadoop's core-site.xml to let the root user impersonate other users:
<property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
</property>
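Hadoop then needs to pick up the new proxyuser settings. Restarting the cluster works; on a running Hadoop 3.x cluster the mappings can usually also be refreshed in place (a sketch, assuming the standard admin commands are on your PATH):

hdfs dfsadmin -refreshSuperUserGroupsConfiguration
yarn rmadmin -refreshSuperUserGroupsConfiguration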
JDBC access also requires HiveServer2 to be running. Either of the following starts it:
hiveserver2 &
hive --service hiveserver2
Both of these kept printing log output to my console, so I used nohup and redirected everything to a log file instead:
nohup hiveserver2 >> /usr/local/hive/log/hiveserver2.log 2>&1 &
By default HiveServer2 listens on port 10000.
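Before touching any Spring Boot code, it is worth confirming that the service actually came up, for example by connecting with beeline (host, port, database, and user taken from the data source config below):

beeline -u jdbc:hive2://hadoop0:10000/test -n hive1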
2. Dependencies and code
Since Hive sits on top of Hadoop, we continue from the Hadoop + Spring Boot integration project built in the earlier post and add Hive to it.
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.7.0</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.zym</groupId>
    <artifactId>hadoop_springboot</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>hadoop_springboot</name>
    <description>Demo project for Spring Boot</description>
    <properties>
        <java.version>1.8</java.version>
        <hadoop.version>3.3.3</hadoop.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <!-- Hadoop -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-streaming</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-yarn-common</artifactId>
            <version>${hadoop.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>com.google.guava</groupId>
                    <artifactId>guava</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-distcp</artifactId>
            <version>${hadoop.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
            <version>${hadoop.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>druid</artifactId>
            <version>1.2.1</version>
        </dependency>
        <!-- Hive JDBC -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>2.3.0</version>
        </dependency>
        <!-- Chinese word-segmentation analyzer -->
        <dependency>
            <groupId>cn.bestwu</groupId>
            <artifactId>ik-analyzers</artifactId>
            <version>5.1.0</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.46</version>
        </dependency>
        <!-- needed by the MapReduce-related code -->
        <dependency>
            <groupId>jdk.tools</groupId>
            <artifactId>jdk.tools</artifactId>
            <version>1.8</version>
            <scope>system</scope>
            <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-jdbc</artifactId>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>
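One caveat with the hive-jdbc dependency: Hive 2.3.x ships its own SLF4J binding (log4j-slf4j-impl), which can clash with the logback binding pulled in by spring-boot-starter-web and produce "Class path contains multiple SLF4J bindings" warnings. If that happens, an exclusion along these lines should help:

        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>2.3.0</version>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.logging.log4j</groupId>
                    <artifactId>log4j-slf4j-impl</artifactId>
                </exclusion>
            </exclusions>
        </dependency>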
application.yml
# Tomcat worker threads
server:
  tomcat:
    max-threads: 1000
  # HTTP port
  port: 8900
  # session timeout
  session-timeout: 60
spring:
  application:
    name: hadoop
  servlet:
    multipart:
      max-file-size: 50MB
      max-request-size: 50MB
  datasource:
    hive: # Hive data source
      url: jdbc:hive2://hadoop0:10000/test
      type: com.alibaba.druid.pool.DruidDataSource
      username: hive1
      password: Newpassword2022!!!
      driver-class-name: org.apache.hive.jdbc.HiveDriver
    common-config: # shared pool settings, applied to every data source
      initialSize: 1
      minIdle: 1
      maxIdle: 5
      maxActive: 50
      maxWait: 10000
      timeBetweenEvictionRunsMillis: 10000
      minEvictableIdleTimeMillis: 300000
      validationQuery: select 'x'
      testWhileIdle: true
      testOnBorrow: false
      testOnReturn: false
      poolPreparedStatements: true
      maxOpenPreparedStatements: 20
      filters: stat
hdfs:
  path: hdfs://hadoop0:9000
  username: root
DataSourceCommonProperties.java
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;

@Data
@NoArgsConstructor
@AllArgsConstructor
@Component
@ConfigurationProperties(prefix = "spring.datasource.common-config", ignoreUnknownFields = false)
public class DataSourceCommonProperties {
    private int initialSize = 10;
    private int minIdle;
    private int maxIdle;
    private int maxActive;
    private int maxWait;
    private int timeBetweenEvictionRunsMillis;
    private int minEvictableIdleTimeMillis;
    private String validationQuery;
    private boolean testWhileIdle;
    private boolean testOnBorrow;
    private boolean testOnReturn;
    private boolean poolPreparedStatements;
    private int maxOpenPreparedStatements;
    private String filters;
    private String mapperLocations;
    private String typeAliasPackage;
}
HiveDruidConfiguration.java
import com.alibaba.druid.pool.DruidDataSource;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.context.properties.EnableConfigurationProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import javax.sql.DataSource;
import java.sql.SQLException;

@Slf4j
@Configuration
@EnableConfigurationProperties({HiveJdbcProperties.class, DataSourceCommonProperties.class})
public class HiveDruidConfiguration {

    @Autowired
    private HiveJdbcProperties hiveJdbcProperties;

    @Autowired
    private DataSourceCommonProperties dataSourceCommonProperties;

    @Bean("hiveDruidDataSource")   // register the Druid pool as a bean
    @Qualifier("hiveDruidDataSource")
    public DataSource dataSource() {
        DruidDataSource datasource = new DruidDataSource();
        // Hive-specific connection settings
        datasource.setUrl(hiveJdbcProperties.getUrl());
        datasource.setUsername(hiveJdbcProperties.getUsername());
        datasource.setPassword(hiveJdbcProperties.getPassword());
        datasource.setDriverClassName(hiveJdbcProperties.getDriverClassName());
        // shared pool settings
        datasource.setInitialSize(dataSourceCommonProperties.getInitialSize());
        datasource.setMinIdle(dataSourceCommonProperties.getMinIdle());
        datasource.setMaxActive(dataSourceCommonProperties.getMaxActive());
        datasource.setMaxWait(dataSourceCommonProperties.getMaxWait());
        datasource.setTimeBetweenEvictionRunsMillis(dataSourceCommonProperties.getTimeBetweenEvictionRunsMillis());
        datasource.setMinEvictableIdleTimeMillis(dataSourceCommonProperties.getMinEvictableIdleTimeMillis());
        datasource.setValidationQuery(dataSourceCommonProperties.getValidationQuery());
        datasource.setTestWhileIdle(dataSourceCommonProperties.isTestWhileIdle());
        datasource.setTestOnBorrow(dataSourceCommonProperties.isTestOnBorrow());
        datasource.setTestOnReturn(dataSourceCommonProperties.isTestOnReturn());
        datasource.setPoolPreparedStatements(dataSourceCommonProperties.isPoolPreparedStatements());
        try {
            datasource.setFilters(dataSourceCommonProperties.getFilters());
        } catch (SQLException e) {
            log.error("Druid configuration initialization filter error.", e);
        }
        return datasource;
    }
}
HiveJdbcConfiguration.java
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;

import javax.sql.DataSource;

@Configuration
public class HiveJdbcConfiguration {

    @Bean("hiveJdbcTemplate")
    @Qualifier("hiveJdbcTemplate")
    public JdbcTemplate jdbcTemplate(@Qualifier("hiveDruidDataSource") DataSource dataSource) {
        return new JdbcTemplate(dataSource);
    }
}
HiveJdbcProperties.java
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;

@Data
@NoArgsConstructor
@AllArgsConstructor
@Component
@ConfigurationProperties(prefix = "spring.datasource.hive", ignoreUnknownFields = false)
public class HiveJdbcProperties {
    private String url;
    private String type;
    private String username;
    private String password;
    private String driverClassName;
}
HiveController.java
import com.zym.result.Result;
import com.zym.service.HiveService;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/hadoop/hive")
@Slf4j
public class HiveController {

    @Autowired
    private HiveService hiveService;

    @GetMapping("/listAllTables")
    public Result listAllTables() {
        return new Result("SUCCESS", "Listed all tables", hiveService.listAllTables());
    }

    @GetMapping("/selectFromTable")
    public Result selectFromTable(String tableName) {
        return new Result("SUCCESS", "Fetched table contents", hiveService.selectFromTable(tableName));
    }

    @GetMapping("/select")
    public Result select(String hql) {
        return new Result("SUCCESS", "Query result", hiveService.select(hql));
    }
}
HiveService.java
import java.util.List;

public interface HiveService {

    Object select(String hql);

    List<String> listAllTables();

    List<String> describeTable(String tableName);

    List<String> selectFromTable(String tableName);
}
HiveServiceImpl.java
import com.zym.service.HiveService;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.util.StringUtils;

import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

@Slf4j
@Service
public class HiveServiceImpl implements HiveService {

    @Autowired
    @Qualifier("hiveJdbcTemplate")
    private JdbcTemplate hiveJdbcTemplate;

    @Autowired
    @Qualifier("hiveDruidDataSource")
    private DataSource hiveDruidDataSource;

    @Override
    public Object select(String hql) {
        return hiveJdbcTemplate.queryForList(hql);
    }

    @Override
    public List<String> listAllTables() {
        List<String> result = new ArrayList<>();
        String sql = "show tables";
        log.info("Running: " + sql);
        // try-with-resources so the connection is always returned to the Druid pool
        try (Connection connection = hiveDruidDataSource.getConnection();
             Statement statement = connection.createStatement();
             ResultSet resultSet = statement.executeQuery(sql)) {
            while (resultSet.next()) {
                result.add(resultSet.getString(1));
            }
            return result;
        } catch (SQLException e) {
            log.error(e.getMessage());
        }
        return Collections.emptyList();
    }

    @Override
    public List<String> describeTable(String tableName) {
        if (StringUtils.isEmpty(tableName)) {
            return Collections.emptyList();
        }
        List<String> result = new ArrayList<>();
        String sql = "describe " + tableName;
        log.info("Running: " + sql);
        try (Connection connection = hiveDruidDataSource.getConnection();
             Statement statement = connection.createStatement();
             ResultSet resultSet = statement.executeQuery(sql)) {
            while (resultSet.next()) {
                result.add(resultSet.getString(1));
            }
            return result;
        } catch (SQLException e) {
            log.error(e.getMessage());
        }
        return Collections.emptyList();
    }

    @Override
    public List<String> selectFromTable(String tableName) {
        if (StringUtils.isEmpty(tableName)) {
            return Collections.emptyList();
        }
        List<String> result = new ArrayList<>();
        String sql = "select * from " + tableName;
        log.info("Running: " + sql);
        try (Connection connection = hiveDruidDataSource.getConnection();
             Statement statement = connection.createStatement();
             ResultSet resultSet = statement.executeQuery(sql)) {
            int columnCount = resultSet.getMetaData().getColumnCount();
            while (resultSet.next()) {
                // join all columns of the row with single spaces
                StringBuilder row = new StringBuilder();
                for (int i = 1; i <= columnCount; i++) {
                    row.append(resultSet.getString(i));
                    if (i < columnCount) {
                        row.append(" ");
                    }
                }
                log.info(row.toString());
                result.add(row.toString());
            }
            return result;
        } catch (SQLException e) {
            log.error(e.getMessage());
        }
        return Collections.emptyList();
    }
}
3. A quick test
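With the application running on port 8900 (as configured in application.yml), the three endpoints can be exercised with curl; tableName=student below is only a placeholder, substitute a table that exists in your test database:

curl "http://localhost:8900/hadoop/hive/listAllTables"
curl "http://localhost:8900/hadoop/hive/selectFromTable?tableName=student"
curl "http://localhost:8900/hadoop/hive/select?hql=select%20count(*)%20from%20student"

Each call returns a Result JSON object carrying the status code, a message, and the query output.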
4. Problems I ran into
The console log didn't reveal much, so I checked the log file at the custom output path set up earlier (/usr/local/hive/log/hiveserver2.log).
It was much the same as the console error and still unreadable; all it showed was that the failure happened at the job stage, so the next stop was Hive's own log:
cat /tmp/root/hive.log
...out of memory. Fair enough.
Fix: set the heap size in Hive's configuration file hive-env.sh, as shown below.
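The relevant knob in hive-env.sh is HADOOP_HEAPSIZE, which is commented out in the shipped template; the value (in MB) below is only an example, size it to your machine:

# /usr/local/hive/conf/hive-env.sh
export HADOOP_HEAPSIZE=2048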
Then restart HiveServer2 and the query goes through.
One open question from testing: whether via beeline or plain JDBC, I could connect without supplying a username or password at all, and at first I couldn't find the reason; if you know, please share.
Update: after reading up on it, this makes sense. A default Hive install has no username/password authentication at all; to enable it you have to write your own custom validation code and then add the credential settings to the configuration file, roughly as sketched below.
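Based on the linked articles, a minimal sketch of such a validator: implement Hive's PasswdAuthenticationProvider interface, package it into a jar under $HIVE_HOME/lib, and point hive-site.xml at it. The class name, package, and hard-coded credentials here are hypothetical; a real implementation would look accounts up in a file or database.

import org.apache.hive.service.auth.PasswdAuthenticationProvider;

import javax.security.sasl.AuthenticationException;

// Hypothetical validator: accepts a single hard-coded account.
public class MyHiveAuthenticator implements PasswdAuthenticationProvider {
    @Override
    public void Authenticate(String user, String password) throws AuthenticationException {
        if (!"hive1".equals(user) || !"Newpassword2022!!!".equals(password)) {
            throw new AuthenticationException("Invalid credentials for user: " + user);
        }
    }
}

Then in hive-site.xml:

<property>
    <name>hive.server2.authentication</name>
    <value>CUSTOM</value>
</property>
<property>
    <name>hive.server2.custom.authentication.class</name>
    <value>com.zym.auth.MyHiveAuthenticator</value>
</property>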
References:
https://blog.csdn.net/Winter_chen001/article/details/110449266
https://blog.csdn.net/yamaxifeng_132/article/details/119869080