There is a line in the `hive` launcher script that points at Spark's jars. That line was written for the old (pre-2.0) Spark directory layout, so against a Spark 2.x install it fails with:

    /spark/lib/spark-assembly-*.jar: No such file or directory

    Around line 116 of the hive script (per a reference found online), make the following change:

    #sparkAssemblyPath=`ls ${SPARK_HOME}/lib/spark-assembly-*.jar`
    sparkAssemblyPath=`ls ${SPARK_HOME}/jars/*.jar`
    
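
    To confirm the new glob actually matches something (the jars/ layout is specific to Spark 2.x installs), here is a quick sanity check that can be pasted into any Scala REPL, including spark-shell; it assumes SPARK_HOME is exported in the environment:

    // count the jars under $SPARK_HOME/jars; a Spark 2.4 install has a couple hundred
    val jarsDir = new java.io.File(sys.env("SPARK_HOME"), "jars")
    println(s"${jarsDir.getPath}: ${Option(jarsDir.listFiles).map(_.length).getOrElse(0)} jars")
    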

    After that, start spark-shell again and SQL against Hive works:

    spark.sql("show databases").show()
    
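
    To double-check that the shell is really talking to the Hive metastore rather than Spark's built-in in-memory catalog, two quick probes (using the `spark` session that spark-shell creates for you):

    // should print "hive"; "in-memory" means Hive support was not enabled
    spark.conf.get("spark.sql.catalogImplementation")
    // lists the tables of one database from the metastore, here the default one
    spark.catalog.listTables("default").show()
    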

    To do the same thing from IDEA, use a pom.xml like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
    
        <groupId>org.example</groupId>
        <artifactId>spark_proj</artifactId>
        <version>1.0-SNAPSHOT</version>
    
        <properties>
            <maven.compiler.source>8</maven.compiler.source>
            <maven.compiler.target>8</maven.compiler.target>
        </properties>
        <build>
            <plugins>
                <plugin>
                    <groupId>org.scala-tools</groupId>
                    <artifactId>maven-scala-plugin</artifactId>
                    <version>2.15.2</version>
                    <executions>
                        <execution>
                            <id>scala-compile</id>
                            <goals>
                                <goal>compile</goal>
                            </goals>
                            <configuration>
                                <!-- includes lists the source files to compile -->
                                <includes>
                                    <include>**/*.scala</include>
                                </includes>
                            </configuration>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>
        <dependencies>
            <dependency>
                <groupId>org.scala-lang</groupId>
                <artifactId>scala-library</artifactId>
                <version>2.11.7</version>
            </dependency>
    <!--        <dependency>-->
    <!--            <groupId>org.apache.spark</groupId>-->
    <!--            <artifactId>spark-core_2.11</artifactId>-->
    <!--            <version>2.4.5</version>-->
    <!--        </dependency>-->
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-sql_2.11</artifactId>
                <version>2.4.5</version>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-hive_2.11</artifactId>
                <version>2.4.5</version>
            </dependency>
    
        </dependencies>
    </project>
    
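
    One note on this pom: the `_2.11` suffix on the Spark artifacts must match the Scala binary version of the scala-library dependency (any 2.11.x patch release works; mixing in 2.12 fails at runtime with NoSuchMethodError). A minimal check of the Scala version actually on the classpath:

    object VersionCheck {
      // prints e.g. "version 2.11.7", which must agree with the _2.11 suffix
      def main(args: Array[String]): Unit =
        println(scala.util.Properties.versionString)
    }
    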

    The code is as follows:

    import org.apache.spark.sql.SparkSession

    object Main {
      def main(args: Array[String]): Unit = {
        // master and deploy mode are supplied by spark-submit (--master yarn
        // --deploy-mode cluster); the old "yarn-cluster" master string is no
        // longer accepted by Spark 2.x, so it is not hard-coded here
        val spark = SparkSession.builder()
          .appName("hi world")
          .config("hive.metastore.uris", "thrift://sandbox01:9083") // note: uris, plural
          .enableHiveSupport()
          .getOrCreate()
        spark.sql("show databases").show()
        spark.stop()
      }
    }
    
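
    Before packaging, the same program can be smoke-tested straight from IDEA by running against a local master instead of YARN. This variant is a sketch under the same assumptions as above (metastore reachable at sandbox01:9083):

    import org.apache.spark.sql.SparkSession

    // local[*] runs Spark in-process on all cores; no cluster needed
    object LocalMain {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("hi world (local)")
          .config("hive.metastore.uris", "thrift://sandbox01:9083")
          .enableHiveSupport()
          .getOrCreate()
        spark.sql("show databases").show()
        spark.stop()
      }
    }
    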

    After packaging, submit it on the cluster:

    bin/spark-submit --class Main --master yarn --deploy-mode cluster --name testing_hive /path/to/packaged.jar
    

    Then, on the YARN web UI, open the Application Id and click Logs: the driver's stdout contains the output (the list of database names). The same log can also be pulled from the command line with `yarn logs -applicationId <appId>`.
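
    In cluster mode the show() output lands in the driver container's stdout and can be buried among INFO lines. A small tweak (the "DB-NAME:" prefix is arbitrary) makes the result easy to grep for in the log:

    // collect the database names on the driver and print them with a marker
    val names = spark.sql("show databases").collect().map(_.getString(0))
    names.foreach(n => println(s"DB-NAME: $n"))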