1. Install the chart
1. Create a directory to hold the resources
mkdir -p ~/k8s/spark-helm
cd ~/k8s/spark-helm
2. Pull the chart from the official Helm repository
# Add the incubator repository that hosts the sparkoperator chart
helm repo add incubator https://charts.helm.sh/incubator
helm repo update
helm search repo spark
# Helm 3 requires an explicit release name; --create-namespace needs Helm 3.2+
helm install sparkoperator incubator/sparkoperator --namespace spark-operator --create-namespace
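Before submitting anything, it is worth confirming that the operator pod actually came up. A quick check (the label selector below is an assumption; chart versions label the pod differently, so inspect the `kubectl get pods` output for the labels your chart actually applies):

```shell
# List the operator pod; it should reach the Running state
kubectl get pods -n spark-operator

# Inspect the operator logs if the pod fails to start
# (label selector is an assumption; adjust to your chart version)
kubectl logs -n spark-operator -l app.kubernetes.io/name=sparkoperator
```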
2. Configure the job
CRD structure
The CRDs used by the Spark Operator are structured as follows:
ScheduledSparkApplication
|__ ScheduledSparkApplicationSpec
    |__ SparkApplication
|__ ScheduledSparkApplicationStatus

SparkApplication
|__ SparkApplicationSpec
    |__ DriverSpec
        |__ SparkPodSpec
    |__ ExecutorSpec
        |__ SparkPodSpec
    |__ Dependencies
    |__ MonitoringSpec
        |__ PrometheusSpec
|__ SparkApplicationStatus
    |__ DriverInfo
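As the tree shows, a ScheduledSparkApplication wraps a SparkApplication template in a cron schedule. A minimal sketch of one (the name and schedule are illustrative; the `template` block takes the same SparkApplicationSpec fields as the example in the next section):

```yaml
apiVersion: sparkoperator.k8s.io/v1beta1
kind: ScheduledSparkApplication
metadata:
  name: spark-pi-scheduled    # illustrative name
spec:
  schedule: "@every 5m"       # standard cron expressions are also accepted
  concurrencyPolicy: Allow
  template:                   # a full SparkApplicationSpec goes here
    type: Scala
    mode: cluster
    image: gcr.io/ynli-k8s/spark:v2.4.0
    mainClass: org.apache.spark.examples.SparkPi
    mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
    restartPolicy:
      type: Never
```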
Defining a job
To submit a job, define a SparkApplication in YAML like the example below; the meaning of each field is documented in the CRD reference.
apiVersion: sparkoperator.k8s.io/v1beta1
kind: SparkApplication
metadata:
  ...
spec:
  deps: {}
  driver:
    coreLimit: 200m
    cores: 0.1
    labels:
      version: 2.3.0
    memory: 512m
    serviceAccount: spark
  executor:
    cores: 1
    instances: 1
    labels:
      version: 2.3.0
    memory: 512m
  image: gcr.io/ynli-k8s/spark:v2.4.0
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
  mainClass: org.apache.spark.examples.SparkPi
  mode: cluster
  restartPolicy:
    type: OnFailure
    onFailureRetries: 3
    onFailureRetryInterval: 10
    onSubmissionFailureRetries: 5
    onSubmissionFailureRetryInterval: 20
  type: Scala
status:
  sparkApplicationId: spark-5f4ba921c85ff3f1cb04bef324f9154c9
  applicationState:
    state: COMPLETED
  completionTime: 2018-02-20T23:33:55Z
  driverInfo:
    podName: spark-pi-83ba921c85ff3f1cb04bef324f9154c9-driver
    webUIAddress: 35.192.234.248:31064
    webUIPort: 31064
    webUIServiceName: spark-pi-2402118027-ui-svc
    webUIIngressName: spark-pi-ui-ingress
    webUIIngressAddress: spark-pi.ingress.cluster.com
  executorState:
    spark-pi-83ba921c85ff3f1cb04bef324f9154c9-exec-1: COMPLETED
  LastSubmissionAttemptTime: 2018-02-20T23:32:27Z
3. Submit the job
kubectl apply -f spark-pi.yaml
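After `kubectl apply`, the operator performs the spark-submit itself, so progress is tracked through the CRD rather than a local process. A couple of useful checks (the name `spark-pi` assumes the metadata in spark-pi.yaml names the application that way):

```shell
# List all SparkApplications and their current state
kubectl get sparkapplications

# Full status, including the driverInfo and executorState fields shown above
kubectl describe sparkapplication spark-pi
```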
References
Zhihu: The current state and challenges of Spark on Kubernetes
https://zhuanlan.zhihu.com/p/76318638