Prerequisites
The machine is assumed to already have a working Hive installation.
Preparation
1. For Spark to take over Hive, copy hive-site.xml from Hive's conf/ directory into Spark's conf/ directory.
2. Because Hive stores its metadata in MySQL, copy the MySQL JDBC driver into Spark's jars/ directory. Mine is mysql-connector-java-5.1.49.jar.
3. If HDFS cannot be reached, also copy core-site.xml and hdfs-site.xml into Spark's conf/ directory.
These are straightforward file copies.
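The three steps above can be sketched as a small shell helper. This is a minimal sketch under assumptions: HIVE_HOME, SPARK_HOME, and HADOOP_HOME point at your installations, and the MySQL driver jar sits in Hive's lib/ directory (its actual location may differ on your machine).

```shell
# All paths below are assumptions -- set HIVE_HOME, SPARK_HOME, and
# HADOOP_HOME to match your own layout before calling this.
prepare_spark_for_hive() {
    # 1. Let Spark see Hive's metastore configuration.
    cp "$HIVE_HOME/conf/hive-site.xml" "$SPARK_HOME/conf/"
    # 2. Give Spark the MySQL JDBC driver (Hive metadata lives in MySQL).
    cp "$HIVE_HOME/lib/mysql-connector-java-5.1.49.jar" "$SPARK_HOME/jars/"
    # 3. Only needed if Spark cannot reach HDFS directly.
    cp "$HADOOP_HOME/etc/hadoop/core-site.xml" \
       "$HADOOP_HOME/etc/hadoop/hdfs-site.xml" "$SPARK_HOME/conf/"
}
```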
The hive-site.xml configuration file
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://127.0.0.1:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;verifyServerCertificate=false&amp;useSSL=false</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.execution.engine</name>
        <value>tez</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
    </property>
    <property>
        <name>datanucleus.schema.autoCreateAll</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.exec.mode.local.auto</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.zookeeper.quorum</name>
        <value>zjj101</value>
        <description>The list of ZooKeeper servers to talk to. This is only needed for read/write locks.</description>
    </property>
    <property>
        <name>hive.zookeeper.client.port</name>
        <value>2181</value>
        <description>The port of ZooKeeper servers to talk to. This is only needed for read/write locks.</description>
    </property>
</configuration>
Things to watch out for
Since this is SparkSQL integrating with Hive, pay attention to the hive.execution.engine property. My Hive had previously been integrated with Tez, so reusing that config file as-is caused errors. Commenting out the following block in hive-site.xml fixed it; left in, it throws an error:
<property>
<name>hive.execution.engine</name>
<value>tez</value>
</property>
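Commented out with a standard XML comment, the block looks like this (Spark then uses its own execution engine instead of Tez):

```xml
<!--
<property>
<name>hive.execution.engine</name>
<value>tez</value>
</property>
-->
```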
Verify with spark-shell
[root@zjj101 ~]# spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
20/11/22 16:19:41 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://172.16.10.101:4040
Spark context available as 'sc' (master = local[*], app id = local-1606033173402).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.1.1
/_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_201)
Type in expressions to have them evaluated.
Type :help for more information.
scala> spark.sql("show tables").show
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
| default| student| false|
+--------+---------+-----------+
scala> spark.sql("select * from student").show
+----+--------+
| id| name|
+----+--------+
| 1|zhangsan|
+----+--------+
This confirms the integration succeeded.