1. Pull the chart

1. Create a directory to hold the resources

  mkdir -p ~/k8s
  mkdir ~/k8s/spark-helm
  cd ~/k8s/spark-helm

2. Install the chart from the official Helm repository

  helm search repo spark
  helm install incubator/sparkoperator --namespace spark-operator
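
Note that helm install without a release name only works with Helm 2, and the incubator repository has to be registered before the chart can be found. A sketch of the equivalent Helm 3 steps, assuming the chart is still published in the incubator repo (newer releases moved to the spark-on-k8s-operator chart repo):

  # Register the chart repository; this URL is the historical incubator
  # location and may have moved since
  helm repo add incubator https://storage.googleapis.com/kubernetes-charts-incubator
  helm repo update
  # Helm 3 requires an explicit release name (or --generate-name)
  helm install sparkoperator incubator/sparkoperator \
      --namespace spark-operator --create-namespace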

2. Update the configuration

CRD structure

The CRDs defined by the Spark Operator have the following structure:

  ScheduledSparkApplication
  |__ ScheduledSparkApplicationSpec
      |__ SparkApplication
  |__ ScheduledSparkApplicationStatus

  SparkApplication
  |__ SparkApplicationSpec
      |__ DriverSpec
          |__ SparkPodSpec
      |__ ExecutorSpec
          |__ SparkPodSpec
      |__ Dependencies
      |__ MonitoringSpec
          |__ PrometheusSpec
  |__ SparkApplicationStatus
      |__ DriverInfo
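
Once the operator is installed, the registered CRDs can be checked directly. The names below assume the operator's default sparkoperator.k8s.io API group:

  kubectl get crds | grep sparkoperator.k8s.io
  kubectl describe crd sparkapplications.sparkoperator.k8s.io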

Job orchestration

To submit a job, define a SparkApplication YAML like the one below; the meaning of each field is described in the CRD documentation. Note that everything under status is written back by the operator after submission, so only the spec section is authored by the user.

  apiVersion: sparkoperator.k8s.io/v1beta1
  kind: SparkApplication
  metadata:
    ...
  spec:
    deps: {}
    driver:
      coreLimit: 200m
      cores: 0.1
      labels:
        version: 2.3.0
      memory: 512m
      serviceAccount: spark
    executor:
      cores: 1
      instances: 1
      labels:
        version: 2.3.0
      memory: 512m
    image: gcr.io/ynli-k8s/spark:v2.4.0
    mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
    mainClass: org.apache.spark.examples.SparkPi
    mode: cluster
    restartPolicy:
      type: OnFailure
      onFailureRetries: 3
      onFailureRetryInterval: 10
      onSubmissionFailureRetries: 5
      onSubmissionFailureRetryInterval: 20
    type: Scala
  status:
    sparkApplicationId: spark-5f4ba921c85ff3f1cb04bef324f9154c9
    applicationState:
      state: COMPLETED
    completionTime: 2018-02-20T23:33:55Z
    driverInfo:
      podName: spark-pi-83ba921c85ff3f1cb04bef324f9154c9-driver
      webUIAddress: 35.192.234.248:31064
      webUIPort: 31064
      webUIServiceName: spark-pi-2402118027-ui-svc
      webUIIngressName: spark-pi-ui-ingress
      webUIIngressAddress: spark-pi.ingress.cluster.com
    executorState:
      spark-pi-83ba921c85ff3f1cb04bef324f9154c9-exec-1: COMPLETED
    lastSubmissionAttemptTime: 2018-02-20T23:32:27Z
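
For recurring jobs there is the ScheduledSparkApplication CRD from the tree above, which wraps the same job spec in a cron-style schedule. A minimal sketch; the metadata name and schedule here are made up, and the template fields mirror the example above:

  apiVersion: sparkoperator.k8s.io/v1beta1
  kind: ScheduledSparkApplication
  metadata:
    name: spark-pi-scheduled   # hypothetical name
  spec:
    schedule: "@every 10m"     # hypothetical schedule
    concurrencyPolicy: Allow
    template:
      type: Scala
      mode: cluster
      image: gcr.io/ynli-k8s/spark:v2.4.0
      mainClass: org.apache.spark.examples.SparkPi
      mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
      driver:
        cores: 0.1
        memory: 512m
        serviceAccount: spark
      executor:
        cores: 1
        instances: 1
        memory: 512m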

3. Submit the job

  kubectl apply -f spark-pi.yaml
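
After applying, the operator performs the actual submission, and progress can be followed through the CRD object itself. Assuming the application was named spark-pi in its metadata:

  kubectl get sparkapplications
  kubectl describe sparkapplication spark-pi
  # the driver pod name is reported under status.driverInfo.podName
  kubectl logs spark-pi-83ba921c85ff3f1cb04bef324f9154c9-driver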

References

Zhihu: Spark on Kubernetes的现状与挑战
https://zhuanlan.zhihu.com/p/76318638