Optimizing the performance of writing millions of rows from ES into JanusGraph (jg)

Before optimization, the write had still not finished after 1 hour.

Table 1: 4 million rows

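ES: 4.38 million rows at 0.3 KB per row, on 8 nodes with 6 shards and 1 replica. The rows are written to a JanusGraph HBase table on the same 8 nodes (50 regions), and the IDs are then written back to the original ES index.
Spark: 100 executors finished in 2 min, about 2 million rows/min. Counting the ES read, the JanusGraph write, and the ID write-back to ES, that is roughly 30,000 rows/s.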

ES: 4.38 million rows at 0.3 KB per row, 8 nodes, 6 shards, 1 replica

Spark: 100 executors, finished in 2 min, about 2 million rows/min (roughly 30,000 rows/s)

JanusGraph HBase table: 50 regions

Table 2: 3 million rows

ES: 3.45 million rows


Spark: 50 executors, finished in 3 min, about 1 million rows/min (roughly 20,000 rows/s)
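
As a quick sanity check on the figures for both tables, the sketch below recomputes throughput from the reported row counts and durations. The names `Throughput`, `rowsPerMinute`, and `rowsPerSecond` are illustrative and not part of the original job, and the durations are the rounded "2 min" and "3 min" values, so the results are only approximate.

```scala
object Throughput {
  // Rows written per minute for a job that took `minutes` minutes.
  def rowsPerMinute(rows: Long, minutes: Double): Double = rows / minutes

  // Rows written per second for the same job.
  def rowsPerSecond(rows: Long, minutes: Double): Double = rows / (minutes * 60.0)
}

object ThroughputDemo {
  def main(args: Array[String]): Unit = {
    // (label, row count, duration in minutes) as reported for each table
    val runs = Seq(("table 1", 4380000L, 2.0), ("table 2", 3450000L, 3.0))
    runs.foreach { case (label, rows, minutes) =>
      val perMin = Throughput.rowsPerMinute(rows, minutes)
      val perSec = Throughput.rowsPerSecond(rows, minutes)
      println(f"$label: $perMin%.0f rows/min, $perSec%.0f rows/s")
    }
  }
}
```

On the reported figures this works out to about 2.19 million rows/min (36,500 rows/s) for table 1 and about 1.15 million rows/min (about 19,000 rows/s) for table 2.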


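Optimizations

  1. Repartition the data into 20 partitions, so that each partition handles about 200,000 rows.
  2. Raise Spark executor memory to 2 GB. With the previous 1 GB, data was frequently spilled to disk, which was slow.
  3. Create the JanusGraph HBase table with 50 regions by default, to increase concurrency.
  4. JanusGraph settings: batch loading = true, id block size = 100000, buffer = 102400.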
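
The partition count in point 1 can be related to the row count with a small calculation. This is a minimal sketch: `PartitionSizing` and its method names are hypothetical, and the target of 200,000 rows per partition is taken from the note above rather than from any Spark rule. In a real job the resulting number would be passed to something like `rdd.repartition(n)`.

```scala
object PartitionSizing {
  // Smallest partition count that keeps the average partition at or below the target size.
  def partitionsFor(totalRows: Long, targetRowsPerPartition: Long): Int = {
    require(totalRows >= 0, "totalRows must be non-negative")
    require(targetRowsPerPartition > 0, "targetRowsPerPartition must be positive")
    val ceilDiv = (totalRows + targetRowsPerPartition - 1) / targetRowsPerPartition
    math.max(1L, ceilDiv).toInt
  }

  // Average rows per partition for a chosen partition count (integer division).
  def rowsPerPartition(totalRows: Long, partitions: Int): Long = {
    require(partitions > 0, "partitions must be positive")
    totalRows / partitions
  }
}

object PartitionSizingDemo {
  def main(args: Array[String]): Unit = {
    val totalRows = 4380000L // table 1 row count from the post
    println(PartitionSizing.partitionsFor(totalRows, 200000L))
    println(PartitionSizing.rowsPerPartition(totalRows, 20))
  }
}
```

For table 1, a strict 200,000-row cap would call for 22 partitions; the 20 partitions used in the post give about 219,000 rows each, which matches "about 200,000".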

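Resolving library issues:

  • Add the http-client jar
  • Add the guava jar
  • Add the hbase-client jar

Place these jars on every YARN NodeManager (nm) node, and pass the following option when starting Spark: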

--conf "spark.executor.extraClassPath=/opt/cloudera/parcels/CDH-5.15.0-1.cdh5.15.0.p0.21/jars/my/*" \

Launch:

nohup spark2-submit \
          --conf "spark.executor.extraClassPath=/opt/cloudera/parcels/CDH-5.15.0-1.cdh5.15.0.p0.21/jars/my/*" \
          --master yarn \
          --deploy-mode client \
          --class com.didichuxing.sts.dcp.knowledgebase.sparkprocess.app.Es2JgVertexJob \
          --executor-memory 2g \
          dcp-knowledgebase-sparkprocess-1.0-SNAPSHOT-jar-with-dependencies.jar \
          dws_kg_vertex_vehicle \
          100.90.170.15:9200,100.90.170.16:9200,100.90.164.32:9200 \
          100.90.170.15,100.90.170.16,100.90.164.32 \
          2181 \
          janusgraph_biggraph1 \
          100.90.170.15:9200,100.90.170.16:9200,100.90.164.32:9200 \
          janusgraph_biggraph1 \
          100000 \
          102400 \
          6 \
          20 \
          > logs_veh_jg.logs &
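
The command passes eleven positional arguments after the jar. The sketch below shows one way such arguments could be parsed. The meaning of each position is inferred, not confirmed by the post: positions 2 to 8 line up with `jgConf(0)` to `jgConf(6)` in the connection code, the trailing `20` presumably is the repartition count, and the purpose of the `6` is unknown (possibly the ES shard count). `JobArgs` and its field names are hypothetical.

```scala
import scala.util.{Success, Try}

// Hypothetical container for the positional arguments; the field meanings are guesses.
final case class JobArgs(
  esIndex: String,        // args(0), e.g. dws_kg_vertex_vehicle
  esNodes: String,        // args(1), host:port list used for the ES read
  jgConf: Vector[String], // args(2) to args(8), read as jgConf(0) to jgConf(6)
  unknownParam: Int,      // args(9), "6" in the command; purpose not stated in the post
  partitions: Int         // args(10), "20" in the command; assumed repartition count
)

object JobArgs {
  // Returns Left with a message instead of throwing on malformed input.
  def parse(args: Array[String]): Either[String, JobArgs] =
    if (args.length != 11) Left(s"expected 11 arguments, got ${args.length}")
    else (Try(args(9).toInt), Try(args(10).toInt)) match {
      case (Success(unknown), Success(parts)) =>
        Right(JobArgs(args(0), args(1), args.slice(2, 9).toVector, unknown, parts))
      case _ =>
        Left("arguments 9 and 10 must be integers")
    }
}
```

Parsing up front fails fast on a wrong argument count, instead of surfacing as an index error deep inside an executor.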

Connection:

import org.janusgraph.core.JanusGraphFactory

// jgConf supplies the JanusGraph settings (indices 0 to 6 are used below)
val builder = JanusGraphFactory.build
builder.set("storage.backend", "hbase")
builder.set("storage.hostname", jgConf(0)) // ZooKeeper quorum hosts
builder.set("storage.port", jgConf(1)) // ZooKeeper client port
builder.set("storage.hbase.table", jgConf(2))
builder.set("storage.hbase.skip-schema-check", "true")
builder.set("index.es.backend", "elasticsearch")
builder.set("index.es.hostname", jgConf(3))
builder.set("index.es.index-name", jgConf(4))

// Bulk-load settings
builder.set("storage.batch-loading", "true")
builder.set("ids.block-size", jgConf(5)) // default 10000; size it to the number of vertices you expect to add per JanusGraph instance per hour
builder.set("ids.authority.wait-time", "1000ms") // default 300 ms
builder.set("ids.renew-timeout", "120000") // default 120000 ms
builder.set("storage.buffer-size", jgConf(6)) // default 1024

// Open the graph with the settings above
val graph = builder.open()