1.1 Docker-based Spark configuration file
docker-compose-spark.yml
version: '2'

services:
  spark:
    image: docker.io/bitnami/spark:3
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    ports:
      - '8080:8080'
  spark-worker-1:
    image: docker.io/bitnami/spark:3
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
  spark-worker-2:
    image: docker.io/bitnami/spark:3
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
Run the following in that directory:
docker-compose up
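As a design alternative, the two identically configured worker services could be collapsed into a single service and replicated at startup with docker-compose's --scale flag. This is only a sketch; it assumes you rename the worker service to spark-worker in the file above:

docker-compose up --scale spark-worker=2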
Open localhost:8080 in a browser to view the Spark master's web UI.
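To verify that the cluster accepts work, you can submit one of Spark's bundled examples from inside the master container. A sketch, assuming the bitnami image keeps Spark's standard examples directory under /opt/bitnami/spark:

docker-compose exec spark spark-submit --master spark://spark:7077 /opt/bitnami/spark/examples/src/main/python/pi.py 10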
Local PySpark demo
from pyspark import SparkContext

# Count the elements of a small RDD in local mode
sc = SparkContext("local", "count app")
words = sc.parallelize(["scala", "java", "hadoop", "spark", "akka",
                        "spark vs hadoop", "pyspark", "pyspark and spark"])
counts = words.count()
print("Number of elements in RDD -> %i" % counts)
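To run the same count against the dockerized cluster instead of local mode, point SparkContext at the master URL. A minimal sketch, assuming a '7077:7077' mapping is added under ports: of the spark service above; when the driver runs outside Docker, the executors must also be able to reach the driver, which may require extra network configuration:

from pyspark import SparkContext

# Assumption: the master's RPC port 7077 is published to the host
# ('7077:7077' added to the spark service in docker-compose-spark.yml)
sc = SparkContext("spark://localhost:7077", "count app")
words = sc.parallelize(["scala", "java", "hadoop", "spark", "akka",
                        "spark vs hadoop", "pyspark", "pyspark and spark"])
print("Number of elements in RDD -> %i" % words.count())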