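To make full use of multi-core CPUs, Scala provides parallel collections (as opposed to the sequential collections covered earlier) for parallel computation on multi-core machines.
The main algorithms involved are:
Divide and conquer: Scala implements this through abstractions such as splitters and combiners. The idea is to break the computation into many tasks, hand them to a set of processors, and merge the partial results into the value that is returned.
Work stealing: a scheduling algorithm used for load balancing. Put simply, a worker that has finished all of its own tasks and sees that others still have work left takes over some of it (on its own initiative or by assignment), so the whole job finishes sooner.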
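Both ideas can be tried out by hand with the JDK's ForkJoinPool, which is a work-stealing pool. The sketch below is only an illustration of the principle, not how the parallel collections are implemented internally; SumTask, parallelSum and the threshold value are hypothetical names and settings chosen for this example.

```scala
import java.util.concurrent.{ForkJoinPool, RecursiveTask}

// Hypothetical task: sums xs(from) up to xs(until - 1) by recursive splitting.
class SumTask(xs: Array[Int], from: Int, until: Int, threshold: Int)
    extends RecursiveTask[Long] {
  override def compute(): Long =
    if (until - from <= threshold) {
      // Small enough: sum this slice sequentially.
      var acc = 0L
      var i = from
      while (i < until) { acc += xs(i); i += 1 }
      acc
    } else {
      // Divide: split the range in two halves.
      val mid = from + (until - from) / 2
      val left = new SumTask(xs, from, mid, threshold)
      val right = new SumTask(xs, mid, until, threshold)
      left.fork()                   // queue the left half; an idle worker may steal it
      val rightSum = right.compute() // process the right half in the current thread
      left.join() + rightSum         // combine the two partial results
    }
}

object DivideAndConquerDemo {
  // Assumed threshold of 1000 elements per leaf task; tune for real workloads.
  def parallelSum(xs: Array[Int], threshold: Int = 1000): Long = {
    val pool = new ForkJoinPool()
    try pool.invoke(new SumTask(xs, 0, xs.length, threshold))
    finally pool.shutdown()
  }

  def main(args: Array[String]): Unit = {
    val data = (1 to 100000).toArray
    println(parallelSum(data)) // 5000050000
  }
}
```

The split step plays the role of a splitter, the final addition the role of a combiner, and fork() is what makes a pending subtask available to idle workers.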
scala> val x = List(1, 2, 3, 4, 5, 6)
x: List[Int] = List(1, 2, 3, 4, 5, 6)
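// Convert to a parallel collection
// (on Scala 2.13+ this needs the scala-parallel-collections module and
//  import scala.collection.parallel.CollectionConverters._)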
scala> x.par
res3: scala.collection.parallel.immutable.ParSeq[Int] = ParVector(1, 2, 3, 4, 5, 6)
scala> x.par.sum
res4: Int = 21
Be careful with fold
val arr = Array(1, 2, 3, 4, 5, 6)
println(arr.par.sum) //21
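println(arr.par.fold(10)(_+_)) // 81 in this run; the value depends on how the array is split
This happens because the initial value is applied once in every parallel task, not once overall. In a parallel fold it should therefore be a neutral element, such as 0 for addition.
To avoid this, use a fold with a defined order. foldLeft processes the elements sequentially, so the initial value is applied exactly once: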
println(arr.par.foldLeft(10)(_+_)) //31
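The arithmetic behind 81 can be reproduced without any threads. The following sketch assumes a simplified model in which the collection is cut into equal chunks, each chunk is folded from the same initial value, and the partial results are combined with the same operator; chunkedFold is a hypothetical helper written for this illustration, and the real splitting strategy may differ.

```scala
object FoldZeroDemo {
  // Hypothetical helper: folds every chunk starting from `zero`,
  // then combines the partial results with the same operator.
  def chunkedFold(xs: Seq[Int], chunkSize: Int, zero: Int)(op: (Int, Int) => Int): Int =
    xs.grouped(chunkSize).map(_.fold(zero)(op)).reduce(op)

  def main(args: Array[String]): Unit = {
    val xs = Seq(1, 2, 3, 4, 5, 6)
    println(chunkedFold(xs, 6, 10)(_ + _)) // 31: one chunk, 10 is added once
    println(chunkedFold(xs, 3, 10)(_ + _)) // 41: two chunks, 10 is added twice
    println(chunkedFold(xs, 1, 10)(_ + _)) // 81: six chunks, 10 is added six times
    println(chunkedFold(xs, 1, 0)(_ + _))  // 21: a neutral initial value is safe
  }
}
```

With six chunks the result is 21 + 6 * 10 = 81, which matches the run shown above; a different split would give a different number.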
Tip: foldLeft and foldRight have shorthand operator forms, /: and :\ respectively (both deprecated since Scala 2.13).
val list4 = List(1, 9, 2, 8)
// 0 is the initial value
// Equivalent to: val i6 = (0 /: list4)((res, next) => res - next)
val i6 = (0 /: list4)(_ - _)
println(i6)
val sentence = "落花人独立,微雨燕双飞"
// A string is also a collection of characters
// For each character, look up its current count in the map and add one
val i7 = (Map[Char, Int]() /: sentence)((m, c) => m + (c -> (m.getOrElse(c, 0) + 1)))
println(i7)
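For readers on a Scala version where the operator forms produce deprecation warnings, the same computations can be written with the method forms. This is a small sketch; charCounts is a hypothetical helper name and "banana" is just sample input.

```scala
object FoldSyntaxDemo {
  // Hypothetical helper: counts how often each character occurs in s.
  def charCounts(s: String): Map[Char, Int] =
    s.foldLeft(Map.empty[Char, Int]) { (m, c) =>
      m + (c -> (m.getOrElse(c, 0) + 1))
    }

  def main(args: Array[String]): Unit = {
    val list4 = List(1, 9, 2, 8)
    // Same as (0 /: list4)(_ - _): ((((0 - 1) - 9) - 2) - 8)
    println(list4.foldLeft(0)(_ - _))  // -20
    // Same as (list4 :\ 0)(_ - _): 1 - (9 - (2 - (8 - 0)))
    println(list4.foldRight(0)(_ - _)) // -14
    println(charCounts("banana"))
  }
}
```

The two results differ because subtraction is not associative, which is also why the direction of the fold matters here.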
Recording intermediate values
val i8 = (1 to 10).scanLeft(0)(_ + _)
println(i8)
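As a quick check of what scanLeft keeps, the sketch below compares it with foldLeft: the scan holds the initial value followed by every running total, and its last element is the value foldLeft would return. The name runningTotals is a hypothetical helper for this example.

```scala
object ScanDemo {
  // Hypothetical helper: running totals of 1..n, starting from 0.
  def runningTotals(n: Int): List[Int] =
    (1 to n).scanLeft(0)(_ + _).toList

  def main(args: Array[String]): Unit = {
    val totals = runningTotals(10)
    println(totals) // List(0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55)
    // The last running total equals the plain fold over the same range.
    println(totals.last == (1 to 10).foldLeft(0)(_ + _)) // true
  }
}
```

The result has one more element than the input because the initial value is included.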
Checking which threads do the work
val result1 = (0 to 1000).map{case _ => Thread.currentThread.getName}.distinct
val result2 = (0 to 1000).par.map{case _ => Thread.currentThread.getName}.distinct
println(result1)
println(result2)
Output:
Vector(main)
ParVector(scala-execution-context-global-11, scala-execution-context-global-17, scala-execution-context-global-15, scala-execution-context-global-13, scala-execution-context-global-16, scala-execution-context-global-12, scala-execution-context-global-14, scala-execution-context-global-18)