Prepare the data
The JSON must sit on a single line; it cannot span multiple lines (by default Spark expects each line of the file to be one complete JSON document):
[{ "name":"zhangsan", "age":"18"},{"name":"lisi","age":"25"}]
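Equivalently, the file can use the JSON Lines layout that Spark reads natively: one object per line, with no surrounding array or commas between lines. A sketch of the same data in that form:

```json
{"name":"zhangsan","age":"18"}
{"name":"lisi","age":"25"}
```

Either layout loads into the same two-row DataFrame.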
Reading a local Linux JSON file in spark-shell
Here file:///root/soft/person.json refers to the local path /root/soft/person.json
var df = spark.read.format("json").load("file:///root/soft/person.json")
scala> var df = spark.read.format("json").load("file:///root/soft/person.json")
df: org.apache.spark.sql.DataFrame = [_corrupt_record: string]
scala> df.show
+--------------------+
| _corrupt_record|
+--------------------+
| [|
| {|
| "name":"z...|
| "age":"18"|
| },|
| {|
| "name":"l...|
| "age":"25"|
| }|
| ]|
+--------------------+
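If you would rather keep a pretty-printed, multi-line JSON file, Spark 2.2+ offers a `multiLine` read option that parses the whole file as one JSON document instead of line by line. A minimal sketch for spark-shell:

```scala
// Read a multi-line (pretty-printed) JSON file in one piece,
// avoiding the _corrupt_record column shown above.
val dfMulti = spark.read
  .option("multiLine", true)
  .json("file:///root/soft/person.json")
```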
The _corrupt_record column appears because person.json was pretty-printed across several lines, so no single line was a complete JSON document. After putting the JSON on one line, the same command succeeds:
scala> var df = spark.read.format("json").load("file:///root/soft/person.json")
df: org.apache.spark.sql.DataFrame = [age: string, name: string]
scala> df.show
+---+--------+
|age| name|
+---+--------+
| 18|zhangsan|
| 25| lisi|
+---+--------+
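With df loaded, the usual DataFrame operations apply. A short sketch; note that age was inferred as a string here, so it is cast to int before a numeric comparison:

```scala
df.printSchema()                               // show the inferred schema
df.select("name").show()                       // project a single column
df.filter(df("age").cast("int") > 20).show()   // keep rows with age over 20
```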
Reading a JSON file from HDFS in spark-shell
The file lives in the /data directory on HDFS
scala> var df2 = spark.read.json("/data/person.json")
df2: org.apache.spark.sql.DataFrame = [age: string, name: string]
scala> df2.show
+---+--------+
|age| name|
+---+--------+
| 18|zhangsan|
| 25| lisi|
+---+--------+
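The DataFrame can also be queried with SQL by registering it as a temporary view. A sketch (age is a string in this schema, so it is compared as a string literal):

```scala
// Expose df2 under the table name "person" for SQL queries
df2.createOrReplaceTempView("person")
spark.sql("SELECT name, age FROM person WHERE age = '25'").show()
```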