Hadoop YARN Mode

I. Submitting Jobs

1. Submit a streaming job

```bash
spark-submit \
  --name SafeRealtimeClickStreaming \
  --queue root.realtime \
  --class com.dw2345.dw_realtime.module.safe.streaming.SafeRealtimeClickStreaming \
  --master yarn \
  --deploy-mode client \
  --driver-cores 2 \
  --driver-memory 4024M \
  --executor-memory 4024M \
  --num-executors 2 \
  --conf "spark.executorEnv.JAVA_HOME=/usr/local/jdk1.8" \
  --conf "spark.yarn.appMasterEnv.JAVA_HOME=/usr/local/jdk1.8" \
  --conf "spark.executor.extraJavaOptions=-XX:+UseConcMarkSweepGC" \
  --conf "spark.streaming.backpressure.enabled=true" \
  --conf "spark.streaming.kafka.maxRatePerPartition=10000" \
  --conf "spark.streaming.blockInterval=1000" \
  --conf "spark.driver.extraClassPath=${SBT_HOME}/ivy-repository/cache/mysql/mysql-connector-java/jars/mysql-connector-java-5.1.30.jar" \
  ~/app/dw_realtime/target/scala-2.11/dw_realtime.jar develop SafeRealtimeClickLog 10 /data/log/real_time/offset/SafeRealtimeClick
```
PS: Garbage collection and memory usage

Enabling the JVM's concurrent mark-sweep (CMS) collector reduces the unpredictable long pauses that GC can cause; CMS consumes more resources overall, but long pauses occur less often. GC details and timestamps can be logged as well:

```bash
--conf "spark.executor.extraJavaOptions=-XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
```
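The four positional arguments after the jar path (`develop SafeRealtimeClickLog 10 /data/log/real_time/offset/SafeRealtimeClick`) are application-specific and not explained in this note; a plausible reading is environment, Kafka topic, batch interval in seconds, and an offset directory. The sketch below shows how a driver might consume them under that assumption; the object name and argument order are illustrative, not the actual SafeRealtimeClickStreaming implementation.

```scala
// Minimal sketch only. Assumptions: the positional arguments are
// (env, topic, batch interval in seconds, offset path); the object name
// StreamingDriverSketch is hypothetical.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingDriverSketch {
  def main(args: Array[String]): Unit = {
    val Array(env, topic, batchSeconds, offsetPath) = args.take(4)

    // env (e.g. "develop") would typically select per-environment configuration.
    val conf = new SparkConf().setAppName("SafeRealtimeClickStreaming")
    val ssc  = new StreamingContext(conf, Seconds(batchSeconds.toLong))

    // Build the Kafka direct stream for `topic` here, restoring consumer
    // offsets from `offsetPath` so a restart resumes where it left off.

    ssc.start()
    ssc.awaitTermination()
  }
}
```

If the third argument is indeed a 10-second batch interval, then spark.streaming.kafka.maxRatePerPartition=10000 caps each Kafka partition at 10000 × 10 = 100,000 records per batch, and spark.streaming.backpressure.enabled=true lets Spark lower the actual ingest rate further when batches start to fall behind.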
2. Dynamic deployment

spark-submit \
  --name SafeRealtimeClickStreaming \
  --queue root.realtime \
  --class com.dw2345.dw_realtime.module.safe.streaming.SafeRealtimeClickStreaming \
  --master yarn \
  --deploy-mode client \
  --driver-cores 2 \
  --driver-memory 4024M \
  --executor-memory 4024M \