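Before optimization: the write had not finished after 1 hour.
Table 1: ~4 million rows
ES: 4.38 million rows at 0.3 KB/row, 8 nodes, 6 shards, 1 replica. Rows are written to the JG (JanusGraph) HBase table on the same 8 nodes (50 regions), and the generated IDs are then written back to the original ES index.
Spark: 100 executors, finished in 2 min, about 2 million rows/min. Rate for the full pipeline (reading, writing to JG, and writing IDs back to ES): 300k/s as recorded, although 2 million rows/min averages out to roughly 33k rows/s.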
ES: 4.38 million rows at 0.3 KB/row, 8 nodes, 6 shards, 1 replica
Spark: 100 executors, finished in 2 min, about 2 million rows/min (300k/s as recorded)
JG HBase table: 50 regions
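Table 2: ~3 million rows
ES: 3.45 million rows
Spark: 50 executors, finished in 3 min, about 1 million rows/min (200k/s as recorded; 1 million rows/min averages out to roughly 17k rows/s)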
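The per-minute figures above can be cross-checked from the row counts and wall-clock times. The following is a minimal sketch of that arithmetic; `Throughput` and `rowsPerSecond` are hypothetical names used only for illustration, and the result is a whole-job average, not a peak rate.

```scala
// Sketch: average throughput implied by a row count and a wall-clock duration.
// The names here are illustrative and do not come from the job's code.
object Throughput {
  // Average rows per second over the whole run.
  def rowsPerSecond(rows: Long, minutes: Double): Double = {
    require(minutes > 0, "duration must be positive")
    rows / (minutes * 60.0)
  }
}

// Table 1: 4.38 million rows in 2 min
// Throughput.rowsPerSecond(4380000L, 2.0)   // 36500.0
// Table 2: 3.45 million rows in 3 min
// Throughput.rowsPerSecond(3450000L, 3.0)   // about 19166.7
```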
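Optimizations
- Repartition to 20 partitions, so that each partition handles about 200k rows
- Increase Spark executor memory to 2g; with the previous 1g, data often spilled to disk, which was slow
- Create the JG HBase table with 50 regions by default to increase write concurrency
- Enable JG batch loading (storage.batch-loading = true), with ids.block-size = 100000 and storage.buffer-size = 102400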
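As a rough check on the first point, the rows handled per partition follow directly from the row count and the partition count. This is a minimal sketch of that calculation; `PartitionSizing` and `rowsPerPartition` are hypothetical names, and it assumes rows are spread evenly, which a plain repartition only approximates.

```scala
// Sketch: rows per partition under an even split (an assumption;
// real partitions may be somewhat unbalanced).
object PartitionSizing {
  // Ceiling division, so a partial final partition still counts as a full slot.
  def rowsPerPartition(totalRows: Long, partitions: Int): Long = {
    require(partitions > 0, "partition count must be positive")
    (totalRows + partitions - 1) / partitions
  }
}

// Table 1: 4.38 million rows across 20 partitions
// PartitionSizing.rowsPerPartition(4380000L, 20)   // 219000
```

In the Spark job itself, the partition count would be applied with `repartition(20)` on the dataset read from ES before the per-partition write.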
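Resolving library (classpath) issues:
- Add http-client
- Add guava
- Add hbase-client
Place these jars on every YARN NodeManager node and add the following option when launching Spark: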
--conf "spark.executor.extraClassPath=/opt/cloudera/parcels/CDH-5.15.0-1.cdh5.15.0.p0.21/jars/my/*" \
Launch:
nohup spark2-submit \
--conf "spark.executor.extraClassPath=/opt/cloudera/parcels/CDH-5.15.0-1.cdh5.15.0.p0.21/jars/my/*" \
--master yarn --deploy-mode client \
--class com.didichuxing.sts.dcp.knowledgebase.sparkprocess.app.Es2JgVertexJob \
--executor-memory 2g \
dcp-knowledgebase-sparkprocess-1.0-SNAPSHOT-jar-with-dependencies.jar \
dws_kg_vertex_vehicle \
100.90.170.15:9200,100.90.170.16:9200,100.90.164.32:9200 \
100.90.170.15,100.90.170.16,100.90.164.32 \
2181 \
janusgraph_biggraph1 \
100.90.170.15:9200,100.90.170.16:9200,100.90.164.32:9200 \
janusgraph_biggraph1 \
100000 \
102400 \
6 \
20 \
> logs_veh_jg.logs &
Connection:
import org.janusgraph.core.JanusGraphFactory

val builder = JanusGraphFactory.build
builder.set("storage.backend", "hbase")
builder.set("storage.hostname", jgConf(0))
builder.set("storage.port", jgConf(1))
builder.set("storage.hbase.table", jgConf(2))
builder.set("storage.hbase.skip-schema-check", "true")
builder.set("index.es.backend", "elasticsearch")
builder.set("index.es.hostname", jgConf(3))
builder.set("index.es.index-name", jgConf(4))
// BulkLoad
builder.set("storage.batch-loading", "true")
builder.set("ids.block-size", jgConf(5)) //default 10000, the number of vertices you expect to add per JanusGraph instance per hour.
builder.set("ids.authority.wait-time", "1000ms") //default 300ms
builder.set("ids.renew-timeout", "120000") //default 120000ms
builder.set("storage.buffer-size", jgConf(6)) //default 1024
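The settings above can also be collected in one place, which makes the expected layout of `jgConf` explicit and lets a wrong argument count fail early. The following is a minimal sketch, not the job's actual code: `JgSettings` and `fromConf` are hypothetical names, and the seven-element layout of `jgConf` is an assumption inferred from how the snippet indexes it.

```scala
// Sketch: the same JanusGraph settings as ordered key/value pairs.
// Assumed jgConf layout (inferred from the indices used above):
//   0 = HBase/ZooKeeper hosts, 1 = ZooKeeper port, 2 = HBase table,
//   3 = ES hosts, 4 = ES index name, 5 = ids.block-size, 6 = storage.buffer-size
object JgSettings {
  def fromConf(jgConf: Array[String]): Seq[(String, String)] = {
    require(jgConf.length >= 7, s"expected at least 7 values, got ${jgConf.length}")
    Seq(
      "storage.backend"                 -> "hbase",
      "storage.hostname"                -> jgConf(0),
      "storage.port"                    -> jgConf(1),
      "storage.hbase.table"             -> jgConf(2),
      "storage.hbase.skip-schema-check" -> "true",
      "index.es.backend"                -> "elasticsearch",
      "index.es.hostname"               -> jgConf(3),
      "index.es.index-name"             -> jgConf(4),
      // Bulk-load settings
      "storage.batch-loading"           -> "true",
      "ids.block-size"                  -> jgConf(5),
      "ids.authority.wait-time"         -> "1000ms",
      "ids.renew-timeout"               -> "120000",
      "storage.buffer-size"             -> jgConf(6)
    )
  }
}
```

Each pair can then be applied with `builder.set(key, value)` on `JanusGraphFactory.build`, followed by `open()` to obtain the graph.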