7.1 Caching
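- persist(StorageLevel): caches the RDD at the given storage level. With the default level, MEMORY_ONLY, partitions are kept as deserialized objects on the JVM heap; the _SER levels keep them in serialized form instead
- cache(): calls persist() with the default storage level (MEMORY_ONLY for an RDD)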
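The two calls above can be sketched as follows. This is a minimal illustration, not a tuning recommendation: the object name CacheSketch, the sample data, and the local[2] master are assumptions made for the example, and it needs spark-core on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CacheSketch {
  // Returns the storage level reported after cache() and after an explicit persist().
  def run(sc: SparkContext): (StorageLevel, StorageLevel) = {
    val cached = sc.parallelize(1 to 1000).map(_ * 2)
    cached.cache() // for an RDD, equivalent to persist(StorageLevel.MEMORY_ONLY)

    val persisted = sc.parallelize(1 to 1000).map(_ + 1)
    persisted.persist(StorageLevel.MEMORY_AND_DISK_SER) // serialized, spills to disk

    // Caching is lazy: the blocks are only materialized by the first action.
    cached.count()
    persisted.count()

    (cached.getStorageLevel, persisted.getStorageLevel)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical local setup for the sketch.
    val conf = new SparkConf().setAppName("cache-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (afterCache, afterPersist) = run(sc)
      println(s"after cache():   $afterCache")
      println(s"after persist(): $afterPersist")
    } finally {
      sc.stop()
    }
  }
}
```

Note that an RDD's storage level can only be assigned once; to switch levels, call unpersist() first.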
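StorageLevel: the predefined levels. The constructor arguments are, in order, useDisk, useMemory, useOffHeap, deserialized, and replication (default 1).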
object StorageLevel {
  val NONE = new StorageLevel(false, false, false, false)
  val DISK_ONLY = new StorageLevel(true, false, false, false)
  val DISK_ONLY_2 = new StorageLevel(true, false, false, false, 2)
  val MEMORY_ONLY = new StorageLevel(false, true, false, true)
  val MEMORY_ONLY_2 = new StorageLevel(false, true, false, true, 2)
  val MEMORY_ONLY_SER = new StorageLevel(false, true, false, false)
  val MEMORY_ONLY_SER_2 = new StorageLevel(false, true, false, false, 2)
  val MEMORY_AND_DISK = new StorageLevel(true, true, false, true)
  val MEMORY_AND_DISK_2 = new StorageLevel(true, true, false, true, 2)
  val MEMORY_AND_DISK_SER = new StorageLevel(true, true, false, false)
  val MEMORY_AND_DISK_SER_2 = new StorageLevel(true, true, false, false, 2)
  val OFF_HEAP = new StorageLevel(true, true, true, false, 1)
}
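To see how those positional arguments map to behavior, the flags of a level can be read back through its public accessors. The sketch below is illustrative only: StorageLevelFlags and describe are hypothetical names introduced here. Because the StorageLevel constructor is private to Spark, user code would go through the predefined constants or the StorageLevel(...) factory, which is marked as a developer API.

```scala
import org.apache.spark.storage.StorageLevel

object StorageLevelFlags {
  // Renders a level's flags in the same order as the constructor arguments.
  def describe(level: StorageLevel): String =
    s"useDisk=${level.useDisk}, useMemory=${level.useMemory}, " +
      s"useOffHeap=${level.useOffHeap}, deserialized=${level.deserialized}, " +
      s"replication=${level.replication}"

  // A custom level built with the factory: memory plus disk, serialized, 3 replicas.
  val custom: StorageLevel = StorageLevel(true, true, false, false, 3)

  def main(args: Array[String]): Unit = {
    Seq(
      "MEMORY_ONLY" -> StorageLevel.MEMORY_ONLY,
      "MEMORY_AND_DISK_SER_2" -> StorageLevel.MEMORY_AND_DISK_SER_2,
      "custom" -> custom
    ).foreach { case (name, level) => println(s"$name: ${describe(level)}") }
  }
}
```

Reading the table this way makes the naming scheme visible: the _SER suffix flips deserialized to false, and the _2 suffix sets replication to 2.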
7.2 CheckPoint
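A long lineage makes fault recovery expensive, because a lost partition has to be recomputed through the whole chain of transformations. Persisting the RDD at some intermediate point and cutting its dependencies on the parent RDDs lowers that cost; this persisting step is called a checkpoint.
Setting a checkpoint writes the RDD's data out as binary files (one per partition).
- The binary files are stored in the checkpoint directory, which is set with SparkContext.setCheckpointDir()
- checkpoint() is lazy: the data is written only once an action runs on the RDD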
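The checkpoint flow described above can be sketched as follows. The object name CheckpointSketch, the temporary directory, and the sample transformations are assumptions for illustration; on a cluster the checkpoint directory would normally be on a shared file system such as HDFS. The cache() call is an addition not mentioned above: it keeps the checkpoint job from recomputing the RDD a second time.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointSketch {
  // Returns (isCheckpointed before the action, isCheckpointed after the action).
  def run(sc: SparkContext, dir: String): (Boolean, Boolean) = {
    sc.setCheckpointDir(dir)

    val rdd = sc.parallelize(1 to 100).map(_ * 2).filter(_ % 3 == 0)
    rdd.cache()      // lets the checkpoint job reuse the computed partitions
    rdd.checkpoint() // only marks the RDD; nothing is written yet

    val before = rdd.isCheckpointed
    rdd.count()      // the first action triggers the checkpoint write
    val after = rdd.isCheckpointed
    (before, after)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical local setup for the sketch.
    val conf = new SparkConf().setAppName("checkpoint-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val dir = Files.createTempDirectory("spark-checkpoint").toString
      val (before, after) = run(sc, dir)
      println(s"checkpointed before action: $before, after action: $after")
    } finally {
      sc.stop()
    }
  }
}
```

checkpoint() has to be called before the first action on that RDD; once the files are written, the RDD reads from them and its references to parent RDDs are dropped.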