One line in the hive launcher script points at Spark's jars. Older versions of the script assume the old Spark directory layout, which no longer exists in Spark 2.x, so hive fails with:
/spark/lib/spark-assembly-*.jar: No such file or directory
Around line 116 of the hive script, following a fix found online, comment out the old assembly path and glob the new jars directory instead:
#sparkAssemblyPath=`ls ${SPARK_HOME}/lib/spark-assembly-*.jar`
sparkAssemblyPath=`ls ${SPARK_HOME}/jars/*.jar`
After that, start spark-shell again and Hive-backed SQL works:
spark.sql("show databases").show()
To build the same thing as a project in IDEA, the pom.xml is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.example</groupId>
<artifactId>spark_proj</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<maven.compiler.source>8</maven.compiler.source>
<maven.compiler.target>8</maven.compiler.target>
</properties>
<build>
<plugins>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<version>2.15.2</version>
<executions>
<execution>
<id>scala-compile</id>
<goals>
<goal>compile</goal>
</goals>
<configuration>
<!--includes是一个数组,包含要编译的code-->
<includes>
<include>**/*.scala</include>
</includes>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.11.7</version>
</dependency>
<!-- <dependency>-->
<!-- <groupId>org.apache.spark</groupId>-->
<!-- <artifactId>spark-core_2.11</artifactId>-->
<!-- <version>2.4.5</version>-->
<!-- </dependency>-->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.4.5</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.11</artifactId>
<version>2.4.5</version>
</dependency>
</dependencies>
</project>
The application code:
import org.apache.spark.sql.SparkSession

object Main extends App {
  // No .master() here: the master and deploy mode are supplied by spark-submit
  // ("yarn-cluster" as a master URL is no longer accepted in Spark 2.x).
  val spark = SparkSession.builder()
    .appName("hi world")
    .config("hive.metastore.uris", "thrift://sandbox01:9083") // note: the key is "uris", not "uri"
    .enableHiveSupport()
    .getOrCreate()

  spark.sql("show databases").show()

  spark.stop()
}
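For a quick test from IDEA without going through YARN, the same session can be built against a local master instead. A minimal sketch, assuming the metastore at sandbox01:9083 is reachable from the development machine:

val localSpark = SparkSession.builder()
  .master("local[*]") // run in-process instead of on the cluster
  .appName("hi world local")
  .config("hive.metastore.uris", "thrift://sandbox01:9083")
  .enableHiveSupport()
  .getOrCreate()
localSpark.sql("show databases").show()
localSpark.stop()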
Package the jar and submit it on the cluster:
bin/spark-submit --class Main --master yarn --deploy-mode cluster --name testing_hive /path/to/packaged.jar
Then, in the YARN web UI, open the Application Id and click Logs to see the output (the list of database names).
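If the web UI is not handy, the same output can be fetched from the command line once the job finishes, with yarn logs -applicationId <applicationId> (substituting the id printed by spark-submit).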
