Prepare the data

test.txt

  111
  sss
  ddd
  as
  zjj
  assssd
  zjj
  aaa
  sssds
  aaa

Create a Maven project

https://blog.csdn.net/qq_41489540/article/details/109430098

Dependency XML (pom.xml)

  <dependencies>
      <dependency>
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-core_2.11</artifactId>
          <version>2.1.1</version>
      </dependency>
  </dependencies>
  <build>
      <plugins>
          <!-- Packaging plugin; without it the Scala classes are not compiled or packaged into the jar -->
          <plugin>
              <groupId>net.alchim31.maven</groupId>
              <artifactId>scala-maven-plugin</artifactId>
              <version>3.4.6</version>
              <executions>
                  <execution>
                      <goals>
                          <goal>compile</goal>
                          <goal>testCompile</goal>
                      </goals>
                  </execution>
              </executions>
          </plugin>
      </plugins>
  </build>
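With these dependencies and the scala-maven-plugin in place, the project can be built with the standard Maven lifecycle (for example, mvn package); as the comment notes, without the plugin the Scala sources would not be compiled into the jar.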

Write the WordCount

  import org.apache.spark.rdd.RDD
  import org.apache.spark.{SparkConf, SparkContext}

  object WordCount {
    def main(args: Array[String]): Unit = {
      // The file to analyze
      val filePath = "D:\\Downloads\\test.txt"
      // 1. Create a SparkContext. When packaging the job, remove the setMaster call and set the
      //    master at submit time with --master; hard-coded like this it can only run in local mode.
      val conf: SparkConf = new SparkConf().setMaster("local[2]").setAppName("WordCount")
      val sc: SparkContext = new SparkContext(conf)
      // 2. Get an RDD from the data source
      val lineRDD: RDD[String] = sc.textFile(filePath)
      // 3. Apply transformations to the RDD
      val resultRDD: RDD[(String, Int)] = lineRDD
        .flatMap(_.split("\\W"))
        .map((_, 1))
        .reduceByKey(_ + _)
      // 4. Execute an action (collect pulls the data computed on each node back to the driver)
      val wordCountArr = resultRDD.collect()
      wordCountArr.foreach(println)
      // 5. Stop the SparkContext
      sc.stop()
    }
  }
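As the comment in step 1 notes, for a packaged job you would drop the setMaster call and supply the master at submit time via spark-submit's --master option. A minimal sketch of that variant is below; the object name, jar name, and the convention of taking the input path from args(0) are assumptions for illustration, not part of the original demo:

  import org.apache.spark.rdd.RDD
  import org.apache.spark.{SparkConf, SparkContext}

  object WordCountCluster {
    def main(args: Array[String]): Unit = {
      // No setMaster here: the master is supplied when submitting, e.g. (hypothetical jar/path)
      //   spark-submit --class WordCountCluster --master yarn wordcount.jar /path/to/test.txt
      val conf: SparkConf = new SparkConf().setAppName("WordCountCluster")
      val sc: SparkContext = new SparkContext(conf)

      // Input path comes from the first program argument (assumed convention)
      val lineRDD: RDD[String] = sc.textFile(args(0))
      val resultRDD: RDD[(String, Int)] = lineRDD
        .flatMap(_.split("\\W"))
        .map((_, 1))
        .reduceByKey(_ + _)

      resultRDD.collect().foreach(println)
      sc.stop()
    }
  }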

Run

  (ddd,1)
  (zjj,2)
  (as,1)
  (sssds,1)
  (assssd,1)
  (sss,1)
  (111,1)
  (aaa,2)
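The counts line up with test.txt above: zjj and aaa each appear on two lines, so they come out as 2, while every other word appears once. The order of the collected pairs is not guaranteed to be sorted.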

Gitee repository

https://gitee.com/crow1/ZJJ_SparkCore

Look for demo01.