# Prerequisites

- Hadoop installed
- Scala installed

## Download and Install

    https://archive.apache.org/dist/spark/
    https://spark.apache.org/downloads.html
    tar -zxvf spark-3.0.3-bin-hadoop2.7.tgz -C /opt/bigdata/
    cd /opt/bigdata && ln -s spark-3.0.3-bin-hadoop2.7 spark   # the symlink makes future version upgrades easy
    vim /etc/profile.d/spark_env.sh

    # set spark environment
    export SPARK_HOME=/opt/bigdata/spark
    export PATH=$PATH:$SPARK_HOME/bin

    source /etc/profile
    spark-shell
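To confirm the install and the environment variables took effect, a quick sanity check (a minimal sketch, assuming the paths above):

```shell
# Verify that PATH resolves to the symlinked install and reports the expected version
spark-submit --version            # should report Spark 3.0.3

# Upgrading later only means re-pointing the symlink (version number is illustrative):
# ln -sfn /opt/bigdata/spark-3.x.y-bin-hadoop2.7 /opt/bigdata/spark
```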

## Grant Directory Ownership

    chown -R god:god /opt/bigdata

## Local Mode

    cd /opt/bigdata/spark/conf
    cp spark-env.sh.template spark-env.sh
    vim spark-env.sh

```shell
JAVA_HOME=/usr/java/default
SCALA_HOME=/opt/bigdata/scala

HADOOP_CONF_DIR=/opt/bigdata/hadoop/etc/hadoop
```
<a name="tBfCt"></a>
## 启动服务
spark-shell --master local[2]<br />本地模式运行圆周率:<br />cd /opt/bigdata/spark/examples/jars/<br />spark-submit --master local[2] --class org.apache.spark.examples.SparkPi spark-examples_2.12-3.0.3.jar 10
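Beyond SparkPi, a quick interactive smoke test can be piped straight into the shell; a minimal sketch (local mode, no external data needed):

```shell
# Run a tiny job non-interactively by piping Scala into spark-shell;
# the shell executes the lines and exits at EOF
spark-shell --master local[2] <<'EOF'
val rdd = sc.parallelize(1 to 1000)
println(s"count = ${rdd.count()}, sum = ${rdd.sum()}")
EOF
```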
<a name="kGgPf"></a>
# 集群模式(Standalone)
<a name="jALPR"></a>
## vim spark-env.sh
```shell
JAVA_HOME=/usr/java/default
SCALA_HOME=/opt/bigdata/scala

HADOOP_CONF_DIR=/opt/bigdata/hadoop/etc/hadoop

# spark config
SPARK_MASTER_HOST=master01
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8081
SPARK_WORKER_CORES=1
SPARK_WORKER_MEMORY=1g
SPARK_WORKER_PORT=7078
SPARK_WORKER_WEBUI_PORT=8081

SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://mycluster/spark/eventLogs -Dspark.history.fs.cleaner.enabled=true"
```

## vim slaves (named workers in Spark 3.x)

    node01
    node02
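Every worker needs the same install and configuration; a minimal distribution sketch (assumes passwordless SSH and the same /opt/bigdata layout on node01 and node02):

```shell
# Push the configured install to each worker and recreate the version symlink
for h in node01 node02; do
  rsync -a /opt/bigdata/spark-3.0.3-bin-hadoop2.7/ "$h":/opt/bigdata/spark-3.0.3-bin-hadoop2.7/
  ssh "$h" "ln -sfn /opt/bigdata/spark-3.0.3-bin-hadoop2.7 /opt/bigdata/spark"
done
```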

## vim spark-defaults.conf

    cp spark-defaults.conf.template spark-defaults.conf
    vim spark-defaults.conf

    spark.eventLog.enabled           true
    spark.eventLog.dir               hdfs://mycluster/spark/eventLogs
    spark.eventLog.compress          true

## Create the HDFS Event Log Directory

    hdfs dfs -mkdir -p /spark/eventLogs
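A quick check that the directory exists and, once a job has run with event logging enabled, that logs actually land there (paths as configured above):

```shell
hdfs dfs -ls /spark              # should list eventLogs
hdfs dfs -ls /spark/eventLogs    # populated after the first job runs with eventLog enabled
```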

## Start Services

- start-master.sh or stop-master.sh
- start-slaves.sh or stop-slaves.sh
- start-history-server.sh or stop-history-server.sh

## Verify Services

- Master web UI: http://172.16.179.150:8081/
- History web UI: http://172.16.179.150:18080/

Submit a test job:

    cd /opt/bigdata/spark/examples/jars/
    spark-submit --master spark://master01:7077 --class org.apache.spark.examples.SparkPi spark-examples_2.12-3.0.3.jar 10

After it finishes, the application shows up in the web UI.
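The history server also exposes a REST API, which is handy for verifying without a browser; a minimal sketch (address per the config above):

```shell
# List the completed applications recorded in the event log directory
curl -s http://172.16.179.150:18080/api/v1/applications
```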

# HA Cluster

The HA cluster builds on top of the Standalone cluster setup above.

## vim spark-env.sh

```shell
# comment out the fixed master host:
# SPARK_MASTER_HOST=master01

# configure the ZooKeeper cluster for master recovery:
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=master01:2181,node01:2181,node02:2181 -Dspark.deploy.zookeeper.dir=/spark-ha"
```
<a name="BMQmk"></a>
## 启动服务

- start-master.sh 
- 确认当前可用的主节点,然后在主节点上执行 start-slaves.sh
- cd /opt/bigdata/spark/examples/jars
- spark-submit --master **spark://master01:7077,node01:7077** --class org.apache.spark.examples.SparkPi spark-examples_2.12-3.1.2.jar 10<br />执行完毕,可在WEB页面查看
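To find the active master, one option is the standalone master web UI's JSON endpoint, whose status field reports ALIVE or STANDBY; a minimal sketch (hosts and web UI port per the config above):

```shell
# Query each master candidate; the active one reports "status" : "ALIVE"
for h in master01 node01; do
  echo -n "$h: "
  curl -s "http://$h:8081/json/" | grep -o '"status" *: *"[A-Z]*"'
done
```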
<a name="s4OPj"></a>
# Spark on YARN
<a name="SOsRl"></a>
## 设置环境变量
cd /opt/bigdata/spark/conf<br />vim spark-env.sh
```shell
YARN_CONF_DIR=/opt/bigdata/hadoop/etc/hadoop
```

## History Service: MRHistoryServer

    cd /opt/bigdata/hadoop/etc/hadoop
    vim yarn-site.xml

    <property>
      <name>yarn.log.server.url</name>
      <value>http://master01:19888/jobhistory/logs</value>
    </property>

## History Service: Spark HistoryServer

    cd /opt/bigdata/spark/conf
    vim spark-defaults.conf

    spark.yarn.historyServer.address  master01:18080

## Configure the Spark Dependency Jars

Stage the Spark jars on HDFS so they do not have to be uploaded and distributed on every run:

    hdfs dfs -mkdir -p /spark/apps/jars/
    hdfs dfs -put /opt/bigdata/spark/jars/* /spark/apps/jars/
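For applications to actually pick up the staged jars, point spark.yarn.jars at the HDFS path; a minimal sketch for spark-defaults.conf (the master01:8020 NameNode address is an assumption, matching the submit example later):

```shell
# conf/spark-defaults.conf — reuse the jars staged on HDFS instead of
# uploading a fresh copy with every application (globs are allowed)
spark.yarn.jars  hdfs://master01:8020/spark/apps/jars/*.jar
```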

## YARN Resource Checks (Test Configuration)

On small test machines, disable YARN's physical/virtual memory checks so containers are not killed for exceeding the limits:

    cd /opt/bigdata/hadoop/etc/hadoop
    vim yarn-site.xml

    <property>
      <name>yarn.nodemanager.pmem-check-enabled</name>
      <value>false</value>
    </property>
    <property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
      <value>false</value>
    </property>

## Start Services

- Start HDFS and YARN.
- Start the MR history service: mr-jobhistory-daemon.sh start historyserver
- Start the Spark history service: start-history-server.sh
- cd /opt/bigdata/spark/examples/jars
- spark-submit --master yarn --class org.apache.spark.examples.SparkPi spark-examples_2.12-3.0.3.jar 10
  (the jar staged on HDFS can be used instead: hdfs://master01:8020/spark/apps/jars/spark-examples_2.12-3.0.3.jar; see the fuller sketch after this list)

After it finishes, the application shows up in the web UI.
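Putting the pieces together, a fuller submit sketch in cluster deploy mode using the jar staged on HDFS (resource sizes are illustrative; paths follow this doc):

```shell
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 1g \
  --executor-memory 1g \
  --num-executors 2 \
  --class org.apache.spark.examples.SparkPi \
  hdfs://master01:8020/spark/apps/jars/spark-examples_2.12-3.0.3.jar 10

# The finished run is then visible in the Spark history server (port 18080)
# and in the YARN ResourceManager UI.
```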