Goal: build entity and relation data at a scale of 600 million vertices and 15.2 billion edges.

Submission commands:
spark-submit
--master yarn --driver-memory 16g
--num-executors 6 --executor-cores 3 --executor-memory 5g
--class BuildDataForStarGraph WordCounttt-1.0-SNAPSHOT.jar
100000000 600000000 700000000 hdfs://10.3.70.126:8020/lcg/StarGraph/vertices
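
A minimal Scala sketch of what a vertex builder like BuildDataForStarGraph might look like. The real class takes three numeric arguments whose meanings are not recorded here, so this sketch simplifies to a single vertex count plus an output path; the argument layout, output format, and partition count are all assumptions:

import org.apache.spark.sql.SparkSession

object BuildDataForStarGraph {
  def main(args: Array[String]): Unit = {
    val numVertices = args(0).toLong   // assumed: total number of vertices
    val outputPath  = args(1)          // assumed: HDFS output directory

    val spark = SparkSession.builder()
      .appName("BuildDataForStarGraph")
      .getOrCreate()

    // Generate ids 0..numVertices-1 in parallel and write one
    // "id,name" line per vertex; 2000 partitions is an assumed
    // value sized for hundreds of millions of rows.
    spark.range(0L, numVertices, 1L, 2000)
      .rdd
      .map(id => s"$id,entity_$id")
      .saveAsTextFile(outputPath)

    spark.stop()
  }
}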
spark-submit --master yarn --driver-memory 16g
--num-executors 12 --executor-cores 3 --executor-memory 20g
--conf spark.executor.memoryOverhead=4096m
--class louvain.BuildEdgeDataForStarGraph WordCounttt-1.0-SNAPSHOT.jar
hdfs://10.3.70.126:8020/lcg/StarGraph/edges
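
Correspondingly, a hedged sketch of an edge builder in the spirit of louvain.BuildEdgeDataForStarGraph, generating random src,dst pairs. The vertex and edge counts come from the goal above; the output format, partition count, and per-partition RNG are assumptions:

import org.apache.spark.sql.SparkSession
import scala.util.Random

object BuildEdgeDataForStarGraph {
  def main(args: Array[String]): Unit = {
    val outputPath  = args(0)          // assumed: HDFS output directory
    val numVertices = 600000000L       // 600 million vertices (from the goal)
    val numEdges    = 15200000000L     // 15.2 billion edges (from the goal)

    val spark = SparkSession.builder()
      .appName("BuildEdgeDataForStarGraph")
      .getOrCreate()

    // One random (src, dst) pair per generated id, written as
    // "src,dst" lines; one Random per partition avoids sharing
    // mutable state across tasks.
    spark.range(0L, numEdges, 1L, 20000)
      .rdd
      .mapPartitions { iter =>
        val rnd = new Random()
        iter.map { _ =>
          val src = (rnd.nextDouble() * numVertices).toLong
          val dst = (rnd.nextDouble() * numVertices).toLong
          s"$src,$dst"
        }
      }
      .saveAsTextFile(outputPath)

    spark.stop()
  }
}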

Submitting a Spark job (with extra jar dependencies)

spark-submit --master yarn
--deploy-mode cluster
--conf spark.yarn.jars=hdfs:///lcg/kgimport_test/tasklib/*.jar,hdfs:///opt/awakentrain/spark/jars/*.jar
--class com.hikvision.medusa.platform.task.data.input.BatchImportDataTask
hdfs:///lcg/kg_import_test/medusa-platform-task-data.jar
someArgs

Notes

  • spark.yarn.jars: when this is set for a submission to YARN, the list must include the Spark jars themselves; jobs submitted this way use the Spark jars configured here, not the Spark jars installed on the cluster.
  • spark.jars: the jars listed here are uploaded together with the jars under SPARK_HOME and used as the job's dependencies (see the example below).
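
For contrast, a hedged spark.jars submission (extra-dep.jar is a hypothetical path, not from the original notes): only the listed jar is uploaded on top of the SPARK_HOME jars, so the Spark runtime itself still comes from the cluster installation:

spark-submit --master yarn
--deploy-mode cluster
--conf spark.jars=hdfs:///lcg/kg_import_test/tasklib/extra-dep.jar
--class com.hikvision.medusa.platform.task.data.input.BatchImportDataTask
hdfs:///lcg/kg_import_test/medusa-platform-task-data.jar
someArgs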