Prepare the data
test.txt
111
sss
ddd
as
zjj
assssd
zjj
aaa
sssds
aaa
Create a Maven project
https://blog.csdn.net/qq_41489540/article/details/109430098
Dependencies in pom.xml
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.1.1</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <!-- Packaging plugin; without it the Scala classes will not be compiled and packaged into the jar -->
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.4.6</version>
            <executions>
                <execution>
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
Write the WordCount program
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // The file to analyze
    val filePath = "D:\\Downloads\\test.txt"
    // 1. Create a SparkContext. When packaging for a cluster, remove setMaster here
    //    and set the master with --master at submit time; hard-coding it like this
    //    means the job can only run in local mode.
    val conf: SparkConf = new SparkConf().setMaster("local[2]").setAppName("WordCount")
    val sc: SparkContext = new SparkContext(conf)
    // 2. Create an RDD from the data source
    val lineRDD: RDD[String] = sc.textFile(filePath)
    // 3. Apply transformations to the RDD
    val resultRDD: RDD[(String, Int)] = lineRDD
      .flatMap(_.split("\\W"))
      .map((_, 1))
      .reduceByKey(_ + _)
    // 4. Run an action (collect pulls the computed data from the executors back to the driver)
    val wordCountArr = resultRDD.collect()
    wordCountArr.foreach(println)
    // 5. Stop the SparkContext
    sc.stop()
  }
}
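As the comment in step 1 notes, the hard-coded setMaster("local[2]") only works for local runs. A minimal sketch of a submit-ready variant (assuming the input path is passed as the first program argument; this naming is illustrative and not taken from the original repository):

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object WordCountSubmit {
  def main(args: Array[String]): Unit = {
    // Input path comes from the first program argument (hypothetical convention);
    // no master is set here, so spark-submit supplies it via --master.
    val inputPath = args(0)
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)
    val counts: RDD[(String, Int)] = sc.textFile(inputPath)
      .flatMap(_.split("\\W"))
      .map((_, 1))
      .reduceByKey(_ + _)
    counts.collect().foreach(println)
    sc.stop()
  }
}

A jar built this way would then be launched with spark-submit, passing --master (for example local[2] or a cluster URL) and the input path as a program argument.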
Running it prints:
(ddd,1)
(zjj,2)
(as,1)
(sssds,1)
(assssd,1)
(sss,1)
(111,1)
(aaa,2)
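Note that reduceByKey gives no ordering guarantee, so the printed pairs may appear in a different order from run to run. If a stable ordering is wanted, a small sketch like the following could be added inside main after resultRDD is computed (illustrative only, not part of the original code):

// Sort by count in descending order before collecting (resultRDD comes from the code above)
val sortedArr = resultRDD.sortBy(_._2, ascending = false).collect()
sortedArr.foreach(println)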
Gitee repository
https://gitee.com/crow1/ZJJ_SparkCore
See demo01.