Prepare the data
The JSON content must be on a single line, not spread across multiple lines, because Spark's default JSON reader treats each line of the file as one record
[{ "name":"zhangsan", "age":"18"},{"name":"lisi","age":"25"}]
Reading a local Linux JSON file from spark-shell
/root/soft/person.json is a path on the local filesystem:
var df = spark.read.format("json").load("file:///root/soft/person.json")
scala> var df = spark.read.format("json").load("file:///root/soft/person.json")
df: org.apache.spark.sql.DataFrame = [_corrupt_record: string]

scala> df.show
+--------------------+
|     _corrupt_record|
+--------------------+
|                   [|
|                   {|
|        "name":"z...|
|          "age":"18"|
|                  },|
|                   {|
|        "name":"l...|
|          "age":"25"|
|                  }|
|                   ]|
+--------------------+

Because the file spans multiple lines, Spark cannot parse it and every line lands in the `_corrupt_record` column. After collapsing person.json onto a single line, the same command succeeds:

scala> var df = spark.read.format("json").load("file:///root/soft/person.json")
df: org.apache.spark.sql.DataFrame = [age: string, name: string]

scala> df.show
+---+--------+
|age|    name|
+---+--------+
| 18|zhangsan|
| 25|    lisi|
+---+--------+
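If reformatting the file onto one line is not an option, Spark 2.2 and later offer a `multiLine` read option that lets the JSON reader parse a pretty-printed document spanning several lines. A sketch of what that session might look like (assuming a Spark 2.2+ spark-shell and the same person.json path):

```scala
scala> val df = spark.read.option("multiLine", true).json("file:///root/soft/person.json")
scala> df.show
```

Note that with `multiLine` enabled, Spark reads each file as a whole rather than line by line, which can be slower on large inputs; the single-line (JSON Lines) layout remains the preferred format.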
Reading a JSON file on HDFS from spark-shell
The file is under the /data directory on HDFS:
scala> var df2 = spark.read.json("/data/person.json")
df2: org.apache.spark.sql.DataFrame = [age: string, name: string]

scala> df2.show
+---+--------+
|age|    name|
+---+--------+
| 18|zhangsan|
| 25|    lisi|
+---+--------+
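Once loaded, the DataFrame can be queried with SQL by registering it as a temporary view. A sketch continuing the session above (the view name `person` is an arbitrary choice):

```scala
scala> df2.createOrReplaceTempView("person")
scala> spark.sql("select name, age from person where age > 20").show
```

This uses the standard Spark SQL API (`createOrReplaceTempView` plus `spark.sql`), so the same query works for the local-file DataFrame as well.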
