Problem

After configuring Hadoop, Spark, and YARN, running a PySpark program produces the warning:

WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set

Reason
Because no jar location is specified, Spark re-uploads its jars to distributed storage on every application submission.

Solution

Upload Spark's jars to HDFS:

  # Inspect the existing directory tree
  hadoop fs -ls -R /
  # Create a jars directory
  hadoop fs -mkdir -p /user/spark/jars
  # Upload the jars
  hadoop fs -put $SPARK_HOME/jars/* /user/spark/jars
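
With the jars in HDFS, Spark still has to be told where to find them, otherwise the warning persists. A minimal sketch of the corresponding entry in `$SPARK_HOME/conf/spark-defaults.conf`, assuming the HDFS path created above (adjust the path or add a full `hdfs://namenode:port` prefix to match your cluster):

```
spark.yarn.jars  hdfs:///user/spark/jars/*.jar
```

After restarting the Spark application, the YARN client should reference the jars already on HDFS instead of uploading them, and the `WARN yarn.Client` message should no longer appear.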