1.1 Docker-based Spark configuration file

docker-compose-spark.yml

    version: '2'
    services:
      spark:
        image: docker.io/bitnami/spark:3
        environment:
          - SPARK_MODE=master
          - SPARK_RPC_AUTHENTICATION_ENABLED=no
          - SPARK_RPC_ENCRYPTION_ENABLED=no
          - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
          - SPARK_SSL_ENABLED=no
        ports:
          - '8080:8080'
      spark-worker-1:
        image: docker.io/bitnami/spark:3
        environment:
          - SPARK_MODE=worker
          - SPARK_MASTER_URL=spark://spark:7077
          - SPARK_WORKER_MEMORY=1G
          - SPARK_WORKER_CORES=1
          - SPARK_RPC_AUTHENTICATION_ENABLED=no
          - SPARK_RPC_ENCRYPTION_ENABLED=no
          - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
          - SPARK_SSL_ENABLED=no
      spark-worker-2:
        image: docker.io/bitnami/spark:3
        environment:
          - SPARK_MODE=worker
          - SPARK_MASTER_URL=spark://spark:7077
          - SPARK_WORKER_MEMORY=1G
          - SPARK_WORKER_CORES=1
          - SPARK_RPC_AUTHENTICATION_ENABLED=no
          - SPARK_RPC_ENCRYPTION_ENABLED=no
          - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
          - SPARK_SSL_ENABLED=no

Run the following in the directory containing this file:

    docker-compose up

Then open localhost:8080 in a browser to view the master's web UI.

Local pyspark demo

    from pyspark import SparkContext

    # Run Spark locally in-process; "count app" is the application name.
    sc = SparkContext("local", "count app")

    # Distribute a small list across the cluster as an RDD.
    words = sc.parallelize([
        "scala",
        "java",
        "hadoop",
        "spark",
        "akka",
        "spark vs hadoop",
        "pyspark",
        "pyspark and spark",
    ])

    counts = words.count()
    print("Number of elements in RDD -> %i" % counts)
    sc.stop()
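The demo above only counts elements. A common next step is a per-word frequency count via Spark's `map`/`reduceByKey` pattern; the sketch below mimics that pattern in plain Python (no Spark required), so the grouping logic can be seen in isolation. It uses the same dataset as the RDD demo above.

```python
from collections import Counter

# Same dataset as in the pyspark demo above.
words = [
    "scala", "java", "hadoop", "spark", "akka",
    "spark vs hadoop", "pyspark", "pyspark and spark",
]

# map step: emit (word, 1) pairs, as words.map(lambda w: (w, 1)) would.
pairs = [(w, 1) for w in words]

# reduceByKey step: sum the 1s per key, as .reduceByKey(lambda a, b: a + b) would.
counts = Counter()
for key, value in pairs:
    counts[key] += value

print(len(words))       # 8, the same value the RDD demo prints
print(counts["spark"])  # 1 (exact-string keys, so "spark vs hadoop" is separate)
```

In pyspark itself the equivalent would be `words.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b).collect()`.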

Reference:

  - https://www.cnblogs.com/jiujiubashiyi/p/15600979.html