Some Errors Encountered with Spark

I. Spark and HBase Errors

1. The error

  Exception in thread "main" java.io.IOException: java.lang.reflect.InvocationTargetException
      at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
      at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:218)
      at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
      at com.angejia.dw.hadoop.hbase.HBaseClient.<init>(HBaseClient.scala:65)
      at com.angejia.dw.recommend.inventory.InventoryIBCF$.init(InventoryIBCF.scala:56)
      at com.angejia.dw.recommend.inventory.InventoryIBCF$.main(InventoryIBCF.scala:36)
      at com.angejia.dw.recommend.inventory.InventoryIBCF.main(InventoryIBCF.scala)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:606)
      at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
      at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
      at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
      at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
      at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
  Caused by: java.lang.reflect.InvocationTargetException
      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
      at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
      ... 15 more
  Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hbase.client.RpcRetryingCallerFactory.instantiate(Lorg/apache/hadoop/conf/Configuration;Lorg/apache/hadoop/hbase/client/ServerStatisticTracker;)Lorg/apache/hadoop/hbase/client/RpcRetryingCallerFactory;
      at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.createAsyncProcess(ConnectionManager.java:2317)
      at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:688)
      at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:630)
      ... 20 more

2. The fix

  • The cause: an incorrect environment variable set in spark-env.sh kept the required Hadoop/HBase classes from being loaded properly.
  • The fix: configure the matching environment variables, following the two steps below.
  1. Set up ~/.bashrc

  # Environment variables required by hadoop
  export HADOOP_HOME_WARN_SUPPRESS=true
  export HADOOP_HOME=/usr/lib/hadoop
  export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
  export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
  # HBase
  export HBASE_HOME=/usr/local/hbase
  export HBASE_CONF_DIR=$HBASE_HOME/conf
  # Spark
  export SPARK_HOME=/usr/local/spark
  export SPARK_CONF_DIR=$SPARK_HOME/conf
  # libs
  export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native:$HADOOP_HOME/lib/hadoop-lzo.jar:$LD_LIBRARY_PATH

  2. Set up the Spark configuration

  vim $SPARK_HOME/conf/spark-env.sh

  SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:$HADOOP_HOME/lib/native:$HADOOP_HOME/lib
  SPARK_CLASSPATH=$SPARK_CLASSPATH:$HADOOP_HOME/lib/hadoop-lzo.jar
  export SPARK_WORKER_MEMORY=4000M
  export SPARK_DRIVER_MEMORY=5000M
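
  After updating the two files, it can be worth double-checking that Spark now sees one consistent HBase client: with the wrong environment, classes can be picked up from a jar of a different HBase version, which is how a NoSuchMethodError like the RpcRetryingCallerFactory one above tends to appear. A rough check, assuming the paths configured above:

  # Reload the environment so the new variables take effect
  source ~/.bashrc
  echo $HADOOP_HOME $HBASE_HOME $SPARK_HOME
  # Every hbase-client / hbase-common jar visible to Spark should carry the same version number
  ls $HBASE_HOME/lib/hbase-client-*.jar $HBASE_HOME/lib/hbase-common-*.jar
  # Look for stray entries in spark-env.sh that still point at an old HBase installation
  grep -n hbase $SPARK_HOME/conf/spark-env.sh

  If two different versions show up, remove the stale path or jar before re-submitting the job.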

II. The topology.py error

  WARN ScriptBasedMapping: Exception running /etc/hadoop/conf.cloudera.yarn/topology.py 172.16.24.148
  java.io.IOException: Cannot run program "/etc/hadoop/conf.cloudera.yarn/topology.py" (in directory "/opt/case/app"): error=2, No such file or directory
      at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
      at org.apache.hadoop.util.Shell.runCommand(Shell.java:485)

  This warning appears when the NodeManager node running the task does not have the YARN client (gateway) installed; it is common on CDH clusters.
  Fix: deploy the YARN client to the affected NodeManager nodes through Cloudera Manager (a quick check is sketched below).
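
  A quick way to confirm the diagnosis on the affected host is to check for the script itself and for the rack-awareness property that references it (ScriptBasedMapping reads net.topology.script.file.name from core-site.xml). A minimal sketch, assuming the CDH paths from the log above:

  # On the NodeManager host that logged the warning:
  ls -l /etc/hadoop/conf.cloudera.yarn/topology.py
  # "No such file or directory" here confirms the YARN gateway/client configuration is missing on this node
  grep -A 1 net.topology.script.file.name /etc/hadoop/conf.cloudera.yarn/core-site.xml

  After adding a YARN Gateway role to the node in Cloudera Manager and redeploying the client configuration, the script should exist and the warning should go away.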