Problem Background

  1. The Spark Thrift Server's local log file grows too large.
  2. The YARN ApplicationMaster UI redirects to the Spark executors page.
  • The JDBC Thrift Server is launched via ${SPARK_HOME}/sbin/start-thriftserver.sh --master yarn-client.

Analysis and Diagnosis

  • The local log is most likely produced by redirecting the output of the yarn-client job into a file.
  1. vim ${SPARK_HOME}/sbin/start-thriftserver.sh
    • Nothing relevant here; it simply delegates to ${SPARK_HOME}/sbin/spark-daemon.sh.
  2. vim ${SPARK_HOME}/sbin/spark-daemon.sh

```bash
# line 128
execute_command() {
  if [ -z ${SPARK_NO_DAEMONIZE+set} ]; then
      nohup -- "$@" >> $log 2>&1 < /dev/null &
      newpid="$!"

      echo "$newpid" > "$pid"

      # Poll for up to 5 seconds for the java process to start
      for i in {1..10}
      do
        if [[ $(ps -p "$newpid" -o comm=) =~ "java" ]]; then
          break
        fi
        sleep 0.5
      done

      sleep 2
      # Check if the process has died; in that case we'll tail the log so the user can see
      if [[ ! $(ps -p "$newpid" -o comm=) =~ "java" ]]; then
        echo "failed to launch: $@"
        tail -10 "$log" | sed 's/^/  /'
        echo "full log in $log"
      fi
  else
      "$@"
  fi
}
```

Here is the culprit: `nohup -- "$@" >> $log 2>&1` appends everything the daemonized driver writes to stdout and stderr into a single local file `$log`, which therefore grows without bound. (Setting SPARK_NO_DAEMONIZE takes the else branch and skips the redirect entirely, but then the process no longer runs as a daemon.)

Solution

  • Split the yarn-client job's output by log level and write it to separate files (see the launch command below).
  • The locally redirected log can simply be sent to /dev/null (a sketch follows).
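For the /dev/null option there is no built-in switch; one possible edit (our assumption, not an upstream feature) is to change the redirect inside execute_command in ${SPARK_HOME}/sbin/spark-daemon.sh:

```bash
# Hypothetical patch to spark-daemon.sh, execute_command():
# before: nohup -- "$@" >> $log 2>&1 < /dev/null &
# after:  discard the daemon's local stdout/stderr entirely.
nohup -- "$@" > /dev/null 2>&1 < /dev/null &
```

Note that the failure branch's `tail -10 "$log"` then has nothing useful to show, so a launch failure must be diagnosed from the YARN side instead.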

```bash
$SPARK_HOME/sbin/start-thriftserver.sh --name "XXX Thrift Server" --master yarn-client --queue xxx \
  --num-executors 2 --conf spark.driver.memory=10g --executor-memory 6g --conf spark.executor.memoryOverhead=2048 \
  --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --hiveconf hive.default.fileformat=parquet \
  --files "$SPARK_HOME/xxx_conf/log4j.properties" \
  --conf spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties \
  --conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties
```
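Note that --files ships log4j.properties into each executor container's working directory, which is why -Dlog4j.configuration=file:log4j.properties resolves on the executor side with a bare relative path; in yarn-client mode the driver resolves the same URL against its local working directory, so it may need a copy there or an absolute path.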

In log4j.properties, change the configuration so that logs are written to files, and remove the original console output. To prevent different jobs in the same executor from writing to the same log file concurrently, point the log file path at spark.yarn.app.container.log.dir.
Since different tasks run in different containers, the log files are then created dynamically under the current task's container directory, giving the same effect as the original stdout and stderr.

  • $SPARK_HOME/conf/log4j.properties

```properties
log4j.rootLogger=INFO,stdout,I,E

# output to console
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%-d{yyyy-MM-dd HH:mm} %5p %t %c{2}:%L - %m%n

# output error to files
log4j.appender.E=org.apache.log4j.DailyRollingFileAppender
log4j.appender.E.layout=org.apache.log4j.PatternLayout
log4j.appender.E.layout.conversionPattern=%-d{yyyy-MM-dd HH:mm:ss} %5p %t %c{2}:%L - %m%n
log4j.appender.E.maxFileSize=100MB
log4j.appender.E.maxBackupIndex=5
log4j.appender.E.Append=true
log4j.appender.E.Threshold=ERROR
log4j.appender.E.file=/home/root/log/streaming/stderror.log
log4j.appender.E.encoding=UTF-8

# output info to files
log4j.appender.I=org.apache.log4j.DailyRollingFileAppender
log4j.appender.I.layout=org.apache.log4j.PatternLayout
log4j.appender.I.layout.conversionPattern=%-d{yyyy-MM-dd HH:mm:ss} %5p %t %c{2}:%L - %m%n
log4j.appender.I.maxFileSize=100MB
log4j.appender.I.maxBackupIndex=5
log4j.appender.I.Append=true
log4j.appender.I.Threshold=INFO
log4j.appender.I.file=/home/root/log/streaming/stdout.log
log4j.appender.I.encoding=UTF-8
```
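To land these files in the per-container directory as described above, the file paths can reference spark.yarn.app.container.log.dir, which Spark's YARN integration exposes for exactly this purpose (the file names below are illustrative):

```properties
# Write into the directory YARN allocates to this container, next to the
# default stdout/stderr files; each container then gets its own log files.
log4j.appender.I.file=${spark.yarn.app.container.log.dir}/stdout.log
log4j.appender.E.file=${spark.yarn.app.container.log.dir}/stderror.log
```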

  • ${SPARK_HOME}/conf/spark-defaults.conf

```properties
spark.eventLog.enabled true
spark.eventLog.dir hdfs://SERVICE-HADOOP-admin-1//var/log/spark_hislog
spark.history.fs.logDirectory hdfs://SERVICE-HADOOP-admin-1//var/log/spark_hislog
spark.history.fs.update.interval 20s
spark.history.fs.cleaner.enabled true
spark.history.fs.cleaner.maxAge 30d
spark.history.fs.cleaner.interval 1d
spark.sql.warehouse.dir hdfs://SERVICE-HADOOP-admin-1//user/hive/warehouse
spark.driver.memory 4g
spark.executor.memory 4g
spark.driver.extraJavaOptions -XX:MaxPermSize=1024m -XX:PermSize=256m
spark.port.maxRetries 100
```
Verification
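A quick sanity check (the commands below are assumptions; adjust the application id and the NodeManager log root, yarn.nodemanager.log-dirs, to your cluster): after restarting the Thrift Server, stdout.log and stderror.log should appear in each container's log directory, and the local file under ${SPARK_HOME}/logs should stop growing.

```bash
# Inspect the per-container log directories on a NodeManager host
# (/var/log/hadoop-yarn/containers is an assumed default).
ls /var/log/hadoop-yarn/containers/application_*/container_*/

# Or fetch the aggregated logs through YARN itself.
yarn logs -applicationId application_xxx | less
```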

Takeaways

spark-daemon.sh unconditionally appends the daemonized process's stdout and stderr to a single local file, which is fine for short-lived jobs but unbounded for a long-running service like the Thrift Server. Routing logs through log4j into per-container files (or discarding the local copy to /dev/null) keeps the growth under control and co-locates the logs with YARN's own stdout and stderr.
