1.1 Configure the parent project Spark
- Create a Maven project named Spark and delete the src and test folders, so it serves only as the parent project
- Add Scala support
Edit the pom:
```xml
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>3.0.0</version>
    </dependency>
    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.17</version>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.16.10</version>
    </dependency>
</dependencies>
```
1.2 Configure the WordCount module
Configure the pom:
```xml
<build>
    <finalName>WordCount</finalName>
    <plugins>
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.2.2</version>
            <executions>
                <execution>
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
```
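With the scala-maven-plugin bound to the compile and testCompile goals, the module builds like any other Maven project (a minimal sketch; assumes Maven is installed and on the PATH):

```shell
# Compile the Scala sources and package the module;
# <finalName>WordCount</finalName> above yields target/WordCount.jar
mvn clean package
```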
Write the code:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // 1. Create the Spark configuration; the SparkContext is built from it
    val conf = new SparkConf()
    conf.setAppName("WordCount")
    conf.setMaster("local[*]")
    // 2. Create the SparkContext
    val sc = new SparkContext(conf)
    // 3. Create the RDD from a text file
    val rdd = sc.textFile("text.txt")
    // 4. Process: split lines into words, pair each with 1,
    //    sum counts per word (into 1 partition), sort by count
    val res = rdd.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _, 1).sortBy(_._2)
    // 5. Output
    res.foreach(println)
    sc.stop()
  }
}
```
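The same flatMap/map/reduceByKey pipeline can be mirrored on plain Scala collections, which is a handy way to check the word-count logic without a Spark cluster (an illustrative sketch; `WordCountSketch` and its `count` helper are names introduced here, not part of the project above):

```scala
object WordCountSketch {
  // Count words in a sequence of lines, mirroring the RDD pipeline:
  // flatMap -> map to (word, 1) -> reduceByKey (here: groupBy + sum)
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))                          // like RDD.flatMap
      .map((_, 1))                                    // like RDD.map
      .groupBy(_._1)                                  // group pairs by word
      .map { case (w, ps) => (w, ps.map(_._2).sum) }  // sum the 1s per word

  def main(args: Array[String]): Unit =
    count(Seq("hello spark", "hello world")).foreach(println)
}
```

Because RDD transformations share their signatures with the Scala collections API, a pipeline verified this way usually ports to the Spark version unchanged.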