Problem
After configuring Hadoop, Spark, and YARN, running a PySpark program produces the warning:
WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set
Reason
Because no jar location is specified, Spark re-uploads its local jar files to distributed storage (HDFS) every time an application is submitted, which slows down startup.
Solution
Upload Spark's jar files to HDFS:
# Inspect the existing directory tree
hadoop fs -ls -R /
# Create a jars directory
hadoop fs -mkdir -p /user/spark/jars
# Upload the jar files
hadoop fs -put $SPARK_HOME/jars/* /user/spark/jars
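Uploading the jars alone does not silence the warning: Spark still needs to be told where they live. A minimal sketch of the remaining step, assuming the HDFS path `/user/spark/jars` created above and the default namenode configured in core-site.xml, is to add this to `$SPARK_HOME/conf/spark-defaults.conf`:

```
spark.yarn.jars  hdfs:///user/spark/jars/*
```

Alternatively, the same setting can be passed per job, e.g. `spark-submit --conf spark.yarn.jars=hdfs:///user/spark/jars/* your_app.py`. After this, YARN applications reference the jars already on HDFS instead of uploading them each time, and the warning disappears.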
