Prerequisites

This assumes the machine already has a working Hive installation.

Preparation

1. For Spark to take over Hive, copy hive-site.xml from Hive's conf/ directory into Spark's conf/ directory.
2. Because Hive keeps its metadata in MySQL, copy the MySQL JDBC driver into Spark's jars/ directory. In my case that is mysql-connector-java-5.1.49.jar.
3. If HDFS cannot be reached, also copy core-site.xml and hdfs-site.xml into Spark's conf/ directory.

These are just file copies; nothing difficult. If you access Hive from a standalone application instead of spark-shell, see the sketch below.
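In a standalone application the same integration is switched on through SparkSession. A minimal sketch, assuming Spark 2.x (matching the session shown later) and the files above already copied; the object and app names are placeholders:

  import org.apache.spark.sql.SparkSession

  object HiveQuickCheck {  // placeholder name
    def main(args: Array[String]): Unit = {
      // enableHiveSupport() tells Spark to use the Hive metastore
      // described by the hive-site.xml copied into Spark's conf/.
      val spark = SparkSession.builder()
        .appName("HiveQuickCheck")
        .master("local[*]")
        .enableHiveSupport()
        .getOrCreate()

      spark.sql("show databases").show()
      spark.stop()
    }
  }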

The hive-site.xml configuration file

  <?xml version="1.0"?>
  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
  <configuration>
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://127.0.0.1:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;verifyServerCertificate=false&amp;useSSL=false</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>root</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>root</value>
    </property>
    <property>
      <name>hive.cli.print.header</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.cli.print.current.db</name>
      <value>true</value>
    </property>
    <!-- Hive was previously set to run on Tez; this block must stay
         commented out for Spark SQL (see the note below). -->
    <!--
    <property>
      <name>hive.execution.engine</name>
      <value>tez</value>
    </property>
    -->
    <property>
      <name>datanucleus.schema.autoCreateAll</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.metastore.schema.verification</name>
      <value>false</value>
    </property>
    <property>
      <name>hive.exec.mode.local.auto</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.zookeeper.quorum</name>
      <value>zjj101</value>
      <description>The list of ZooKeeper servers to talk to. This is only needed for read/write locks.</description>
    </property>
    <property>
      <name>hive.zookeeper.client.port</name>
      <value>2181</value>
      <description>The port of ZooKeeper servers to talk to. This is only needed for read/write locks.</description>
    </property>
  </configuration>

Things to watch out for

Because this is Spark SQL integrating with Hive, pay attention to the hive.execution.engine property. My Hive had previously been integrated with Tez, so when I copied the configuration file over as-is, spark-shell threw an error. Commenting out the block below in hive-site.xml fixed it (which is why it appears commented out in the config above). Spark SQL executes queries with its own engine, so Hive's execution-engine setting serves no purpose here; leaving it in causes the error.

  <property>
    <name>hive.execution.engine</name>
    <value>tez</value>
  </property>
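Once the Tez block is gone, a quick sanity check inside spark-shell is to ask Spark which catalog implementation it picked up; to the best of my knowledge this works on Spark 2.x:

  // Returns "hive" when hive-site.xml was found and Hive support is active;
  // "in-memory" means Spark fell back to its built-in catalog.
  spark.conf.get("spark.sql.catalogImplementation")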

Verify with spark-shell

  [root@zjj101 ~]# spark-shell
  Setting default log level to "WARN".
  To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
  20/11/22 16:19:41 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
  Spark context Web UI available at http://172.16.10.101:4040
  Spark context available as 'sc' (master = local[*], app id = local-1606033173402).
  Spark session available as 'spark'.
  Welcome to
        ____              __
       / __/__  ___ _____/ /__
      _\ \/ _ \/ _ `/ __/ '_/
     /___/ .__/\_,_/_/ /_/\_\   version 2.1.1
        /_/

  Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_201)
  Type in expressions to have them evaluated.
  Type :help for more information.

  scala> spark.sql("show tables").show
  +--------+---------+-----------+
  |database|tableName|isTemporary|
  +--------+---------+-----------+
  | default|  student|      false|
  +--------+---------+-----------+

  scala> spark.sql("select * from student").show
  +----+--------+
  |  id|    name|
  +----+--------+
  |   1|zhangsan|
  +----+--------+

This confirms the integration was successful.
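As a follow-up, the same table can also be queried through the DataFrame API instead of raw SQL, and results can be written back to Hive. A minimal sketch run in the same spark-shell session; student is the table shown above, and student_copy is a hypothetical target table name:

  import spark.implicits._  // already in scope in spark-shell; needed in a standalone app

  // Equivalent to spark.sql("select * from student")
  val students = spark.table("student")
  students.filter($"id" === 1).show()

  // Write the result back as a new Hive table (hypothetical name)
  students.write.mode("overwrite").saveAsTable("student_copy")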