# Prerequisites
- Hadoop installed
- Scala installed
# Download and Install
- https://spark.apache.org/downloads.html (current releases)
- https://archive.apache.org/dist/spark/ (older releases)
tar -zxvf spark-3.0.3-bin-hadoop2.7.tgz -C /opt/bigdata/
ln -s spark-3.0.3-bin-hadoop2.7 spark (a version-neutral symlink simplifies upgrades)
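Why the symlink helps: a later upgrade is a single repoint, and every configured path (SPARK_HOME, scripts, cron jobs) stays untouched. A throwaway sketch in a scratch directory (the 3.1.2 directory name is illustrative, not part of this install):

```shell
# demonstrate the upgrade flow in a temporary directory
cd "$(mktemp -d)"
mkdir spark-3.0.3-bin-hadoop2.7 spark-3.1.2-bin-hadoop2.7
ln -s spark-3.0.3-bin-hadoop2.7 spark      # initial install
readlink spark                             # prints spark-3.0.3-bin-hadoop2.7
ln -sfn spark-3.1.2-bin-hadoop2.7 spark    # upgrade: repoint only the link (-n: do not follow the old link)
readlink spark                             # prints spark-3.1.2-bin-hadoop2.7
```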
vim /etc/profile.d/spark_env.sh
```shell
# set spark environment
export SPARK_HOME=/opt/bigdata/spark
export PATH=$PATH:$SPARK_HOME/bin
```
source /etc/profile
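A quick sanity check that the profile script took effect (the two exports are re-stated here so the sketch stands alone; paths are the ones set above):

```shell
# same two lines as /etc/profile.d/spark_env.sh
export SPARK_HOME=/opt/bigdata/spark
export PATH=$PATH:$SPARK_HOME/bin
# the Spark bin directory should now appear on PATH
echo "$PATH" | tr ':' '\n' | grep "$SPARK_HOME/bin"
```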
Grant ownership of the installation directory to the user who runs spark-shell:
chown -R god:god /opt/bigdata

# Local Mode
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
```shell
JAVA_HOME=/usr/java/default
SCALA_HOME=/opt/bigdata/scala
HADOOP_CONF_DIR=/opt/bigdata/hadoop/etc/hadoop
```
## Start the Service
spark-shell --master local[2]

Run the SparkPi example in local mode:
```shell
cd /opt/bigdata/spark/examples/jars/
spark-submit --master local[2] --class org.apache.spark.examples.SparkPi spark-examples_2.12-3.0.3.jar 10
```
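For context, SparkPi estimates π by sampling random points in a square and counting how many fall inside the unit circle; the trailing argument (10 above) only sets how many partitions share the sampling. The same arithmetic with Spark stripped away, as a plain awk sketch:

```shell
# Monte Carlo pi: fraction of random points inside the unit circle, times 4
awk 'BEGIN {
    srand(1); n = 100000; inside = 0
    for (i = 0; i < n; i++) {
        x = rand() * 2 - 1
        y = rand() * 2 - 1
        if (x * x + y * y <= 1) inside++
    }
    printf "Pi is roughly %f\n", 4 * inside / n
}'
```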
# Cluster Mode (Standalone)
## vim spark-env.sh
```shell
JAVA_HOME=/usr/java/default
SCALA_HOME=/opt/bigdata/scala
HADOOP_CONF_DIR=/opt/bigdata/hadoop/etc/hadoop
# spark config
SPARK_MASTER_HOST=master01
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8081
SPARK_WORKER_CORES=1
SPARK_WORKER_MEMORY=1g
SPARK_WORKER_PORT=7078
SPARK_WORKER_WEBUI_PORT=8082   # 8081 is already taken by the master web UI above
SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://mycluster/spark/eventLogs -Dspark.history.fs.cleaner.enabled=true"
```

vim slaves (the file is renamed to workers in Spark 3.x)
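The slaves/workers file lists one worker hostname per line. A sketch, assuming the three nodes named in the HA section's ZooKeeper quorum also run workers:

```shell
# conf/slaves (conf/workers in Spark 3.x) — one worker host per line
# hostnames assumed from the ZooKeeper list used later in this document
master01
node01
node02
```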
cp spark-defaults.conf.template spark-defaults.conf
vim spark-defaults.conf
```shell
spark.eventLog.enabled true
spark.eventLog.dir hdfs://mycluster/spark/eventLogs
spark.eventLog.compress true
```
Create the event log directory on HDFS:
hdfs dfs -mkdir -p /spark/eventLogs
## Start the Services
- start-master.sh or stop-master.sh
- start-slaves.sh or stop-slaves.sh
- start-history-server.sh or stop-history-server.sh
## Verify the Services
- Master web UI: http://172.16.179.150:8081/
- History web UI: http://172.16.179.150:18080/
- spark-submit --master spark://master01:7077 --class org.apache.spark.examples.SparkPi spark-examples_2.12-3.0.3.jar 10
When the job finishes, check the result in the web UI.

# HA Cluster
The HA setup builds on the Standalone cluster configuration above.

vim spark-env.sh
```shell
# comment out the fixed master host:
# SPARK_MASTER_HOST=master01

# point the daemons at the ZooKeeper ensemble
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=master01:2181,node01:2181,node02:2181 -Dspark.deploy.zookeeper.dir=/spark-ha"
```
## Start the Services
- Run start-master.sh on every master candidate
- Confirm which master is currently ALIVE, then run start-slaves.sh on that node
- cd /opt/bigdata/spark/examples/jars
- spark-submit --master spark://master01:7077,node01:7077 --class org.apache.spark.examples.SparkPi spark-examples_2.12-3.0.3.jar 10

When the job finishes, check the result in the web UI.
# Spark on YARN
## Set Environment Variables
cd /opt/bigdata/spark/conf
vim spark-env.sh
```shell
YARN_CONF_DIR=/opt/bigdata/hadoop/etc/hadoop
```
## History Service: MRHistoryServer
cd /opt/bigdata/hadoop/etc/hadoop
vim yarn-site.xml
```xml
<property>
    <name>yarn.log.server.url</name>
    <value>http://master01:19888/jobhistory/logs</value>
</property>
```
## History Service: Spark HistoryServer
cd /opt/bigdata/spark/conf
vim spark-defaults.conf
spark.yarn.historyServer.address master01:18080
## Put the Spark Jars on HDFS
Upload the jars once so they are not re-shipped on every submission:
hdfs dfs -mkdir -p /spark/apps/jars/
hdfs dfs -put /opt/bigdata/spark/jars/* /spark/apps/jars/
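Uploading alone does not make Spark use these jars; spark-defaults.conf must point at them. A hedged sketch of the entry (the NameNode address master01:8020 is assumed from the HDFS path used later in this document):

```shell
# spark-defaults.conf — have Spark on YARN pick up the pre-uploaded jars
# instead of shipping the local ones on every submit
# (assumes the HDFS NameNode listens on master01:8020)
spark.yarn.jars hdfs://master01:8020/spark/apps/jars/*.jar
```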
## YARN Memory Checks (test configuration)
Disabling the physical/virtual memory checks keeps small test jobs from being killed; do not disable them in production.
cd /opt/bigdata/hadoop/etc/hadoop
vim yarn-site.xml
```xml
<property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
</property>
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
```
## Start the Services
- Start HDFS and YARN
- Start the MapReduce history server: mr-jobhistory-daemon.sh start historyserver
- Start the Spark history server: start-history-server.sh
- cd /opt/bigdata/spark/examples/jars
- spark-submit --master yarn --class org.apache.spark.examples.SparkPi spark-examples_2.12-3.0.3.jar 10

The example jar can also be taken from the copy uploaded to HDFS: hdfs://master01:8020/spark/apps/spark*.jar

When the job finishes, check the result in the web UI.