[RDD] Example: sorting within groups after groupBy

2023-11-23 13:22:14

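import spark.implicits._ // needed for toDF; assumes a SparkSession named spark

val df = spark.sparkContext.makeRDD(Seq(
  ("1", "张三", "数学", 89),
  ("1", "张三", "英语", 78),
  ("1", "张三", "物理", 92),
  ("2", "李四", "数学", 82),
  ("2", "李四", "英语", 98)
)).toDF("id", "name", "course", "score")

// Group rows by user id, keep the highest-scoring row of each group, then print.
df.rdd
  .groupBy(row => row.getAs[String]("id"))
  .mapValues(rows => rows.maxBy(row => row.getAs[Int]("score")))
  .map(_._2)
  .collect() // bring the results to the driver so println shows up locally
  .foreach(println)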

This finds the highest-scoring record for each user.
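The title mentions sorting, while the snippet above only picks the maximum of each group. As a rough sketch of how the two relate, the example below uses plain Scala collections in place of the RDD (an assumption made so it runs without Spark); `Score` and `GroupSortSketch` are hypothetical names introduced only for illustration. It sorts each group by score in descending order and then takes the head of each group, which for this data is the same row that `maxBy` selects.

```scala
// Hypothetical case class standing in for a DataFrame row.
final case class Score(id: String, name: String, course: String, score: Int)

object GroupSortSketch {
  // Same sample data as in the snippet above.
  val records: Seq[Score] = Seq(
    Score("1", "张三", "数学", 89),
    Score("1", "张三", "英语", 78),
    Score("1", "张三", "物理", 92),
    Score("2", "李四", "数学", 82),
    Score("2", "李四", "英语", 98)
  )

  // Group by user id, then sort every group by score, highest first.
  val sortedById: Map[String, Seq[Score]] =
    records.groupBy(_.id).map { case (id, rows) => id -> rows.sortBy(-_.score) }

  // The top record of each user is the first element of its sorted group.
  val topById: Map[String, Score] =
    sortedById.map { case (id, rows) => id -> rows.head }

  def main(args: Array[String]): Unit =
    // Print the top record per user, ordered by id for stable output.
    topById.toSeq.sortBy(_._1).foreach { case (_, top) => println(top) }
}
```

If only the top record is needed, `maxBy` avoids sorting the whole group; the sorted form is useful when the full ranking per user (or a top N) is wanted.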

