Prepare the data

The JSON must all be on a single line; it cannot span multiple lines.

    [{ "name":"zhangsan", "age":"18"},{"name":"lisi","age":"25"}]
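For reference, the file can be created from the Scala shell itself — a minimal sketch, assuming /root/soft exists and is writable:

```scala
// Write the single-line JSON array to the local path used below.
import java.nio.file.{Files, Paths}

Files.write(
  Paths.get("/root/soft/person.json"),
  """[{"name":"zhangsan","age":"18"},{"name":"lisi","age":"25"}]""".getBytes("UTF-8")
)
```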

Reading a local Linux JSON file from the Spark shell

Here /root/soft/person.json is a path on the local filesystem.

var df = spark.read.format("json").load("file:///root/soft/person.json")

    scala> var df = spark.read.format("json").load("file:///root/soft/person.json")
    df: org.apache.spark.sql.DataFrame = [_corrupt_record: string]

    scala> df.show
    +--------------------+
    |     _corrupt_record|
    +--------------------+
    |                   [|
    |                   {|
    |        "name":"z...|
    |          "age":"18"|
    |                  },|
    |                   {|
    |        "name":"l...|
    |          "age":"25"|
    |                   }|
    |                   ]|
    +--------------------+

The first read yields a single _corrupt_record column because the file was saved with the JSON spread across multiple lines. After collapsing it to a single line and reloading, the schema is inferred correctly:

    scala> var df = spark.read.format("json").load("file:///root/soft/person.json")
    df: org.apache.spark.sql.DataFrame = [age: string, name: string]

    scala> df.show
    +---+--------+
    |age|    name|
    +---+--------+
    | 18|zhangsan|
    | 25|    lisi|
    +---+--------+
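If the file must stay pretty-printed across multiple lines, Spark 2.2 and later can read it anyway via the multiLine option — a sketch, assuming the same local path:

```scala
// Read a multi-line (pretty-printed) JSON file directly.
// Without multiLine, each line of the file must be a complete JSON value on its own.
val df3 = spark.read
  .option("multiLine", true)
  .json("file:///root/soft/person.json")

df3.show()
```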

Reading a JSON file on HDFS from the Spark shell

The file is in the /data directory on HDFS.

    scala> var df2 = spark.read.json("/data/person.json")
    df2: org.apache.spark.sql.DataFrame = [age: string, name: string]

    scala> df2.show
    +---+--------+
    |age|    name|
    +---+--------+
    | 18|zhangsan|
    | 25|    lisi|
    +---+--------+

    scala>
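Once loaded, the DataFrame can also be queried with SQL by registering it as a temporary view — a minimal sketch using the df2 from above:

```scala
// Register the DataFrame as a temporary view, then query it with Spark SQL.
df2.createOrReplaceTempView("person")
spark.sql("SELECT name, age FROM person WHERE name = 'lisi'").show()
```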