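Official documentation: http://spark.apache.org/docs/latest/sql-getting-started.html
The official sample data files ship with the Spark distribution: http://mirrors.tuna.tsinghua.edu.cn/apache/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz
After extracting the archive, the sample files are in the examples/src/main/resources folder.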

SparkSession

Create a SparkSession with SparkSession.builder():

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession
      .builder()
      .appName("Spark SQL basic example")
      .config("spark.some.config.option", "some-value")
      .getOrCreate()

    // For implicit conversions like converting RDDs to DataFrames
    import spark.implicits._
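The builder snippet above sets no master URL, which is fine inside spark-shell or under spark-submit but not when the program is launched directly (for example from an IDE). The following is a minimal sketch of a standalone variant, assuming a local run: the object name SessionConfigApp, the helper buildSession, and the "local[*]" master are illustrative choices, not part of the official example. It reads the configured option back through spark.conf to confirm that the builder applied it.

```scala
import org.apache.spark.sql.SparkSession

object SessionConfigApp {
  // Builds a session for a local run. "local[*]" is an assumption made here so the
  // program can start without spark-submit; drop it when a master is supplied externally.
  def buildSession(): SparkSession =
    SparkSession.builder()
      .appName("Spark SQL basic example")
      .master("local[*]")
      .config("spark.some.config.option", "some-value")
      .getOrCreate()

  def main(args: Array[String]): Unit = {
    val spark = buildSession()

    // Print the Spark version the session is running on.
    println(spark.version)

    // Read back the option that was passed to the builder.
    println(spark.conf.get("spark.some.config.option"))

    spark.stop()
  }
}
```

Keeping the session construction in one helper makes it easy to switch the master or add options in a single place.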

Reading a JSON file

    package cn.bx.spark

    import org.apache.spark.sql.{DataFrame, SparkSession}

    object SparkSessionApp {
      def main(args: Array[String]): Unit = {
        // Run locally, using all available cores
        val spark: SparkSession = SparkSession.builder()
          .appName("SparkSessionApp")
          .master("local[*]")
          .getOrCreate()

        // people.json comes from examples/src/main/resources; the path is relative to the working directory
        val people: DataFrame = spark.read.json("resources/people.json")
        people.show()

        spark.stop()
      }
    }

Output

    +----+-------+
    | age|   name|
    +----+-------+
    |null|Michael|
    |  30|   Andy|
    |  19| Justin|
    +----+-------+
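Once the JSON data is loaded as a DataFrame, it can be inspected and queried. The sketch below is one possible follow-up, not the original author's code: the object name PeopleQueryApp and the helper loadPeople are hypothetical, and the three records are written inline (mirroring the rows shown in the output above) so that the example does not depend on a file path. It uses printSchema, select, filter, and a temporary view queried through spark.sql.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object PeopleQueryApp {
  // Builds the same three records as the sample data in memory.
  // Assumption: the inline JSON lines stand in for resources/people.json.
  def loadPeople(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val lines = Seq(
      """{"name":"Michael"}""",
      """{"name":"Andy", "age":30}""",
      """{"name":"Justin", "age":19}"""
    ).toDS()
    // DataFrameReader.json also accepts a Dataset[String] of JSON lines (Spark 2.2+).
    spark.read.json(lines)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PeopleQueryApp")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val people = loadPeople(spark)

    // Show the schema Spark inferred from the JSON records.
    people.printSchema()

    // Select a single column.
    people.select("name").show()

    // Keep only rows whose age is greater than 21; rows with a null age are dropped.
    people.filter($"age" > 21).show()

    // Register a temporary view and query it with SQL.
    people.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age IS NOT NULL").show()

    spark.stop()
  }
}
```

To run this against the real sample file instead, replace the body of loadPeople with spark.read.json("resources/people.json").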