scala
- Error: WARN deploy.SparkSubmit$$anon$2: Failed to load

Fix: declare a package at the top of the source file, and pass the fully qualified class name to spark-submit via `--class`. The full error:

```
20/11/02 13:37:06 WARN deploy.SparkSubmit$$anon$2: Failed to load json2es.
java.lang.ClassNotFoundException: json2es
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:810)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```
For example:

```scala
package cn.test

object json2es {
  def main(args: Array[String]): Unit = {
    println("hello world")
  }
}
```

Then submit with:

```shell
spark-submit --class cn.test.json2es xx.jar
```
python
The two setups below are the launch configurations I use most often; their configs can be cross-referenced. Most of the config names are self-explanatory; if they are not, first brush up on the basics of Spark and the JVM.
This is the setup I commonly use to launch Jupyter:
```shell
kinit -kt /etc/var/keytab/cac.keytab cac/node4@HADOOP.COM
myconf="/home/pqchen/RuZhi/kmeansofmissjudgment/server/conf/"
pyspark --executor-memory=20G \
  --executor-cores=5 \
  --driver-memory=2G \
  --conf spark.dynamicAllocation.maxExecutors=5 \
  --conf spark.default.parallelism=200 \
  --conf spark.memory.fraction=0.9 \
  --conf spark.memory.storageFraction=0.3 \
  --conf spark.memory.offHeap.size=1G \
  --conf spark.executor.memoryOverhead=1G \
  --conf spark.debug.maxToStringFields=1000 \
  --conf spark.kryoserializer.buffer.max=1500m \
  --conf spark.driver.maxResultSize=1500m \
  --conf spark.kryoserializer.buffer=1500m \
  --jars /home/pqchen/RuZhi/kmeansofmissjudgment/server/conf/jar/mysql-connector-java-8.0.16.jar,/home/pqchen/.local/lib/python3.6/site-packages/graphframes-0.8.0-spark2.4-s_2.11.jar \
  --driver-class-path /home/pqchen/RuZhi/kmeansofmissjudgment/server/conf/jar/mysql-connector-java-8.0.16.jar \
  --py-files /home/pqchen/.local/lib/python3.6/site-packages/geoip2.zip,/home/pqchen/.local/lib/python3.6/site-packages/maxminddb.zip \
  --files /home/pqchen/RuZhi/GeoLite2-City.mmdb
```
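The `--py-files` zips (geoip2, maxminddb) and `--files GeoLite2-City.mmdb` in this command exist so the executors can geolocate IP addresses. Below is a minimal sketch of putting them to use from the Jupyter session, assuming the `spark` session that the pyspark shell creates; the sample data and column names are illustrative, not from the original job:

```python
# Hedged sketch: resolving IPs to cities with the geoip2/maxminddb zips
# shipped via --py-files and the .mmdb database shipped via --files.
from pyspark import SparkFiles
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def ip_to_city(ip):
    import geoip2.database  # importable on executors thanks to --py-files
    # Files distributed with --files are resolvable through SparkFiles.
    reader = geoip2.database.Reader(SparkFiles.get("GeoLite2-City.mmdb"))
    try:
        return reader.city(ip).city.name
    except Exception:
        # e.g. geoip2.errors.AddressNotFoundError for private/unknown IPs
        return None
    finally:
        reader.close()

ip_to_city_udf = udf(ip_to_city, StringType())
df = spark.createDataFrame([("8.8.8.8",), ("1.1.1.1",)], ["ip"])  # illustrative
df.withColumn("city", ip_to_city_udf("ip")).show()
```

Opening a reader per row keeps the sketch short; at real volume you would open one reader per partition (e.g. with `mapPartitions`) instead.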
This is the submit setup I commonly use, with a full set of configs:
```shell
export PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda-5.2.0/envs/py3.6/bin/python3
export PATH=/opt/cloudera/parcels/Anaconda-5.2.0/envs/py3.6/bin/:${PATH}
export PYSPARK_DRIVER_PYTHON=python
RUNNINGPATH=$PWD
DownloadDataDir=$RUNNINGPATH/dbresult/clusterResult/statistics`date +%Y-%m`
DownloadDataRemoteHome=/home/cacboy/work/afterdelete/kmeansResult
kinit -kt /etc/var/keytab/cac.keytab cac/node4@HADOOP.COM

# Rebuild the project egg so executors get the latest code via --py-files.
if [ -d "$RUNNINGPATH/dist" ]; then
  rm -f $RUNNINGPATH/dist/*
fi
python3 setup.py bdist_egg > /dev/null
egg=$RUNNINGPATH/dist/`ls $RUNNINGPATH/dist/`

myconf="$RUNNINGPATH/server/conf/"
spark-submit --executor-memory=5G \
  --executor-cores=10 \
  --driver-memory=4G \
  --conf spark.dynamicAllocation.maxExecutors=10 \
  --conf spark.default.parallelism=150 \
  --conf spark.memory.fraction=0.85 \
  --conf spark.memory.storageFraction=0.3 \
  --conf spark.memory.offHeap.size=2G \
  --conf spark.executor.memoryOverhead=2048 \
  --conf spark.core.connection.ack.wait.timeout=300 \
  --conf 'spark.driver.extraJavaOptions=-Dlog4j.configuration=file:./server/conf/log4j.properties' \
  --conf spark.local.dir=/home/dfs/tmp \
  --py-files $egg \
  --jars $myconf/jar/mysql-connector-java-8.0.16.jar \
  --driver-class-path $myconf/jar/mysql-connector-java-8.0.16.jar \
  ./server/cluster/main.py --dbConfigField='cac'
```
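The `python3 setup.py bdist_egg` step above packages the project into an egg that `--py-files` ships to every executor. A hypothetical minimal `setup.py` matching that step; the project name and package layout are placeholders, not the author's actual ones:

```python
# Hypothetical minimal setup.py for the bdist_egg step above.
# Name and package discovery are placeholders for the real project layout.
from setuptools import setup, find_packages

setup(
    name="server",             # illustrative project name
    version="0.1.0",
    packages=find_packages(),  # would pick up e.g. server/cluster/...
)
```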
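Both launch commands put mysql-connector-java-8.0.16.jar on `--jars` and `--driver-class-path`, which suggests the job talks to MySQL over JDBC. A hedged sketch of how `main.py` might open such a read; the host, database, table, and credentials below are all placeholders:

```python
# Hedged sketch: reading a MySQL table through the connector jar passed
# via --jars / --driver-class-path. Connection details are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cluster-main").getOrCreate()

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://db-host:3306/mydb")  # placeholder host/db
    .option("driver", "com.mysql.cj.jdbc.Driver")     # driver class in Connector/J 8.x
    .option("dbtable", "some_table")                  # placeholder table
    .option("user", "user")
    .option("password", "password")
    .load()
)
df.show(5)
```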