Prepare the data
test.txt
111 sss ddd as zjj assssd zjj aaa sssds aaa
Create a Maven project
https://blog.csdn.net/qq_41489540/article/details/109430098
Dependencies (pom.xml)
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.1.1</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <!-- Packaging plugin; without it the Scala classes will not be compiled and packaged -->
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.4.6</version>
            <executions>
                <execution>
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
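With this in place, mvn clean package builds the jar; the scala-maven-plugin's compile goal is what makes the Scala sources get compiled and packaged into it.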
Write the WordCount program
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // File to analyze
    val filePath = "D:\\Downloads\\test.txt"
    // 1. Create a SparkContext. When packaging for a cluster, remove the
    //    setMaster call and set the master at submit time with --master;
    //    as written, this can only run in local mode.
    val conf: SparkConf = new SparkConf().setMaster("local[2]").setAppName("WordCount")
    val sc: SparkContext = new SparkContext(conf)
    // 2. Create an RDD from the data source
    val lineRDD: RDD[String] = sc.textFile(filePath)
    // 3. Apply transformations: split each line into words, pair each word
    //    with a count of 1, then sum the counts per word
    val resultRDD: RDD[(String, Int)] = lineRDD.flatMap(_.split("\\W"))
      .map((_, 1))
      .reduceByKey(_ + _)
    // 4. Execute an action (collect pulls the computed data from the worker
    //    nodes back to the driver)
    val wordCountArr = resultRDD.collect()
    wordCountArr.foreach(println)
    // 5. Stop the SparkContext
    sc.stop()
  }
}
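As the comment in step 1 notes, for a real cluster you would drop setMaster and pass the master at submit time. A minimal sketch of that variant, with the input path taken from the command line; the object name WordCountCluster and the submit command shown are illustrative, not from the original post:

import org.apache.spark.{SparkConf, SparkContext}

object WordCountCluster {
  def main(args: Array[String]): Unit = {
    // No setMaster here: the master comes from spark-submit's --master flag
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)
    // Read the input path from args(0) instead of hard-coding a Windows path
    val result = sc.textFile(args(0))
      .flatMap(_.split("\\W"))
      .map((_, 1))
      .reduceByKey(_ + _)
    result.collect().foreach(println)
    sc.stop()
  }
}

// Submitted along the lines of (jar name and master URL are placeholders):
// spark-submit --master local[2] --class WordCountCluster wordcount.jar test.txt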
Run
(ddd,1)
(zjj,2)
(as,1)
(sssds,1)
(assssd,1)
(sss,1)
(111,1)
(aaa,2)
Gitee repository
https://gitee.com/crow1/ZJJ_SparkCore
See demo01.
