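Druid Single-Server Installation and Offline Data Import
1. Overview
This quick install targets a single server. Most settings can stay at their defaults, and data is stored on the local disk of the operating system. The quick install is meant to show, and serve as a guide to, the development workflow for big-data analysis on Druid.
2. Requirements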
Java 8 or higher
Linux, Mac OS X, or other Unix-like OS (Windows is not supported)
8G of RAM
2 vCPUs
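3. ZooKeeper installation
This walkthrough uses a single-server install, so the Druid configuration needs no changes for ZooKeeper; a distributed install would require updating the corresponding Druid settings. ZooKeeper listens on port 2181 by default.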
curl http://www.gtlib.gatech.edu/pub/apache/zookeeper/zookeeper-3.4.10/zookeeper-3.4.10.tar.gz -o zookeeper-3.4.10.tar.gz
tar -xzf zookeeper-3.4.10.tar.gz
cd zookeeper-3.4.10
cp conf/zoo_sample.cfg conf/zoo.cfg
./bin/zkServer.sh start
➜ zookeeper-3.4.10 jps
10565 QuorumPeerMain
17832 Jps
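The jps listing above is the check that ZooKeeper came up: QuorumPeerMain is the ZooKeeper server process. A minimal sketch that wraps this check in a reusable shell function; the name has_zookeeper is made up for illustration.

```shell
# Hypothetical helper: reads jps output on stdin and succeeds
# if a ZooKeeper server process (QuorumPeerMain) is listed.
has_zookeeper() {
  grep -q 'QuorumPeerMain'
}

# Usage (assumes jps from the JDK is on the PATH):
#   jps | has_zookeeper && echo "ZooKeeper is running"
```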
4. Druid installation
curl -O http://static.druid.io/artifacts/releases/druid-0.12.3-bin.tar.gz
tar -xzf druid-0.12.3-bin.tar.gz
cd druid-0.12.3
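Directory layout after extraction:
LICENSE - the license file.
bin/ - quickstart scripts.
conf/* - configuration for a cluster install (including Hadoop).
conf-quickstart/* - configuration for the quickstart.
extensions/* - Druid extensions.
hadoop-dependencies/* - Druid Hadoop dependencies.
lib/* - Druid core packages.
quickstart/* - quickstart sample files and data.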
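5. Preparing to start Druid
Before starting the Druid services, do two things:
- Start ZooKeeper
- Change to the Druid root directory and run bin/init
6. Starting the Druid services
Start the five Druid processes, each in its own terminal window. In single-server mode all processes run on the same machine; in a large distributed cluster, several Druid processes can also share a server. The five processes to start are: Historical, Broker, Coordinator, Overlord, and MiddleManager.
Start historical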
java `cat conf-quickstart/druid/historical/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/historical:lib/*" io.druid.cli.Main server historical
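Note the difference from the official site: this Druid install directory has no examples directory, so the official form of the command (shown here for the coordinator) does not work with this layout. Use the conf-quickstart paths instead.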
java `cat examples/conf/druid/coordinator/jvm.config | xargs` -cp "examples/conf/druid/_common:examples/conf/druid/_common/hadoop-xml:examples/conf/druid/coordinator:lib/*" io.druid.cli.Main server coordinator
Start broker
java `cat conf-quickstart/druid/broker/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/broker:lib/*" io.druid.cli.Main server broker
Start coordinator
java `cat conf-quickstart/druid/coordinator/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/coordinator:lib/*" io.druid.cli.Main server coordinator
Start overlord
java `cat conf-quickstart/druid/overlord/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/overlord:lib/*" io.druid.cli.Main server overlord
Start middleManager
java `cat conf-quickstart/druid/middleManager/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/middleManager:lib/*" io.druid.cli.Main server middleManager
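The five start commands differ only in the service name, so they can be generated from one template. The following is a sketch, not the official launcher: the function names and the log/ output directory are assumptions made for illustration, and it runs the services in the background instead of in separate terminals.

```shell
# Print the classpath used by the quickstart commands for one service.
druid_classpath() {
  printf 'conf-quickstart/druid/_common:conf-quickstart/druid/%s:lib/*' "$1"
}

# Start one service in the background; run from the Druid root directory.
# The log/ directory is a hypothetical location for the captured output.
start_druid_service() {
  local service="$1"
  mkdir -p log
  # The unquoted $(...) is intentional: jvm.config holds one JVM flag
  # per line and each flag must become a separate argument.
  nohup java $(xargs < "conf-quickstart/druid/${service}/jvm.config") \
    -cp "$(druid_classpath "$service")" \
    io.druid.cli.Main server "$service" > "log/${service}.log" 2>&1 &
}

# Usage:
#   for s in historical broker coordinator overlord middleManager; do
#     start_druid_service "$s"
#   done
```

Running each service in its own terminal, as above, makes it easier to watch the logs while learning; the background variant is handier once the setup is known to work.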
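7. Druid consoles
If the services above started successfully, the following consoles are available:
- http://localhost:8090/console.html (overlord console) shows how batch ingestion tasks are running. Refresh it from time to time; a task status of SUCCESS means the task completed successfully.
- http://localhost:8081/ (coordinator console) shows task completion progress, segment distribution, index creation, and so on.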
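Before opening the consoles, it can help to confirm that each process answers over HTTP. The sketch below assumes the quickstart default ports (coordinator 8081, broker 8082, historical 8083, overlord 8090, middleManager 8091) and uses the /status endpoint that Druid processes expose; the function name is made up for illustration.

```shell
# Hypothetical helper: print "<port> up" or "<port> down" for each
# Druid process, assuming the quickstart default ports.
check_druid_services() {
  local host="${1:-localhost}" port
  for port in 8081 8082 8083 8090 8091; do
    if curl -sf -o /dev/null --max-time 3 "http://${host}:${port}/status"; then
      echo "${port} up"
    else
      echo "${port} down"
    fi
  done
}

# Usage:
#   check_druid_services            # checks localhost
#   check_druid_services 10.0.0.5   # checks another host
```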
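8. Importing offline data into Druid
The ingestion spec used here is quickstart/wikiticker-index_local.json, the file submitted by the curl command further down: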
{
  "type" : "index",
  "spec" : {
    "ioConfig" : {
      "type" : "index",
      "firehose" : {
        "type" : "local",
        "baseDir" : "/Users/zzy/Documents/zzy/software/druid-0.12.3/quickstart",
        "filter" : "wikiticker-2015-09-12-sampled.json.gz"
      }
    },
    "dataSchema" : {
      "dataSource" : "wikiticker",
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",
        "queryGranularity" : "none",
        "intervals" : ["2015-09-12/2015-09-13"]
      },
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "json",
          "dimensionsSpec" : {
            "dimensions" : [
              "channel",
              "cityName",
              "comment",
              "countryIsoCode",
              "countryName",
              "isAnonymous",
              "isMinor",
              "isNew",
              "isRobot",
              "isUnpatrolled",
              "metroCode",
              "namespace",
              "page",
              "regionIsoCode",
              "regionName",
              "user"
            ]
          },
          "timestampSpec" : {
            "format" : "auto",
            "column" : "time"
          }
        }
      },
      "metricsSpec" : [
        {
          "name" : "count",
          "type" : "count"
        },
        {
          "name" : "added",
          "type" : "longSum",
          "fieldName" : "added"
        },
        {
          "name" : "deleted",
          "type" : "longSum",
          "fieldName" : "deleted"
        },
        {
          "name" : "delta",
          "type" : "longSum",
          "fieldName" : "delta"
        },
        {
          "name" : "user_unique",
          "type" : "hyperUnique",
          "fieldName" : "user"
        }
      ]
    },
    "tuningConfig" : {
      "type" : "index",
      "partitionsSpec" : {
        "type" : "hashed",
        "targetPartitionSize" : 5000000
      },
      "jobProperties" : {}
    }
  }
}
Note: baseDir should preferably be an absolute path.
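One way to avoid hand-editing the path is to rewrite baseDir with sed. This is a sketch only: set_base_dir is a made-up helper, and spec.json and spec_local.json in the usage line are placeholder file names.

```shell
# Hypothetical helper: reads an ingestion spec on stdin and rewrites the
# value of "baseDir" to the directory given as the first argument.
# Assumes the directory contains no | or & characters, which are
# special inside this sed expression.
set_base_dir() {
  sed 's|"baseDir"[[:space:]]*:[[:space:]]*"[^"]*"|"baseDir" : "'"$1"'"|'
}

# Usage, from the Druid root directory:
#   set_base_dir "$(pwd)/quickstart" < spec.json > spec_local.json
```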
Submit the task with curl:
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/wikiticker-index_local.json localhost:8090/druid/indexer/v1/task
The terminal prints:
➜ druid-0.12.3 curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/wikiticker-index_local.json localhost:8090/druid/indexer/v1/task
{"task":"index_wikiticker_2018-11-27T03:33:42.307Z"}
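The task id in this response is what the overlord uses to identify the task. As a rough sketch, the id can be pulled out of the response with sed and used to query the overlord's task status endpoint (/druid/indexer/v1/task/{taskId}/status) without opening the console. Both function names are made up for illustration, and the overlord is assumed to be on localhost:8090 as above.

```shell
# Hypothetical helper: reads the submit response on stdin and prints
# the task id, e.g. {"task":"index_wikiticker_..."} -> index_wikiticker_...
task_id_from_response() {
  sed -n 's/.*"task"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: print the overlord's status JSON for one task id.
task_status() {
  curl -s "http://localhost:8090/druid/indexer/v1/task/$1/status"
}

# Usage:
#   id=$(curl -s -X POST -H 'Content-Type:application/json' \
#        -d @quickstart/wikiticker-index_local.json \
#        localhost:8090/druid/indexer/v1/task | task_id_from_response)
#   task_status "$id"
```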
Check the task status in the overlord console at http://localhost:8090/console.html
The task status is FAILED.
The log shows the following error:
2018-11-27T03:10:43,416 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[AbstractTask{id='index_wikiticker_2018-11-27T03:10:39.850Z', groupId='index_wikiticker_2018-11-27T03:10:39.850Z', taskResource=TaskResource{availabilityGroup='index_wikiticker_2018-11-27T03:10:39.850Z', requiredCapacity=1}, dataSource='wikiticker', context={}}]
java.lang.IllegalStateException: Failed to create directory within 10000 attempts (tried 1543288243332-0 to 1543288243332-9999)
at com.google.common.io.Files.createTempDir(Files.java:600) ~[guava-16.0.1.jar:?]
at io.druid.segment.indexing.RealtimeTuningConfig.createNewBasePersistDirectory(RealtimeTuningConfig.java:58) ~[druid-server-0.12.3.jar:0.12.3]
at io.druid.segment.indexing.RealtimeTuningConfig.makeDefaultTuningConfig(RealtimeTuningConfig.java:68) ~[druid-server-0.12.3.jar:0.12.3]
at io.druid.segment.realtime.FireDepartment.<init>(FireDepartment.java:62) ~[druid-server-0.12.3.jar:0.12.3]
at io.druid.indexing.common.task.IndexTask.generateAndPublishSegments(IndexTask.java:572) ~[druid-indexing-service-0.12.3.jar:0.12.3]
at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:264) ~[druid-indexing-service-0.12.3.jar:0.12.3]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:444) [druid-indexing-service-0.12.3.jar:0.12.3]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:416) [druid-indexing-service-0.12.3.jar:0.12.3]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
2018-11-27T03:10:43,420 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_wikiticker_2018-11-27T03:10:39.850Z] status changed to [FAILED].
2018-11-27T03:10:43,423 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
"id" : "index_wikiticker_2018-11-27T03:10:39.850Z",
"status" : "FAILED",
"duration" : 109
}
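Fix: create the temporary directory by hand. Here the temporary directory is var/tmp, relative to the Druid directory:
mkdir -p var/tmp
➜ druid-0.12.3 ll var/tmp
total 0
drwxr-xr-x 2 zzy staff 64 Nov 27 11:33 1543289625953-0
➜ druid-0.12.3 pwd
/Users/zzy/Documents/zzy/software/druid-0.12.3
Note: create it under the Druid directory, not under the filesystem root!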
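The temporary directory is whatever the JVM is told to use through java.io.tmpdir. Assuming the service's jvm.config sets it on a line of its own in the form -Djava.io.tmpdir=<dir>, the directory can be read from the config instead of being hard-coded. A sketch with a made-up function name:

```shell
# Hypothetical helper: print the java.io.tmpdir value from a jvm.config
# file, assuming the flag sits alone on its own line.
tmpdir_from_jvm_config() {
  sed -n 's/^-Djava\.io\.tmpdir=//p' "$1"
}

# Usage, from the Druid root directory:
#   mkdir -p "$(tmpdir_from_jvm_config conf-quickstart/druid/middleManager/jvm.config)"
```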
Once the local data has loaded successfully, the coordinator page lists a new datasource named wikiticker.
Query the data:
curl -L -H'Content-Type: application/json' -XPOST --data-binary @quickstart/wikiticker-top-pages.json 'http://localhost:8082/druid/v2/?pretty'
The response:
➜ druid-0.12.3 curl -L -H'Content-Type: application/json' -XPOST --data-binary @quickstart/wikiticker-top-pages.json http://localhost:8082/druid/v2/\?pretty
[ {
"timestamp" : "2015-09-12T00:46:58.771Z",
"result" : [ {
"edits" : 33,
"page" : "Wikipedia:Vandalismusmeldung"
}, {
"edits" : 28,
"page" : "User:Cyde/List of candidates for speedy deletion/Subpage"
}, {
"edits" : 27,
"page" : "Jeremy Corbyn"
}, {
"edits" : 21,
"page" : "Wikipedia:Administrators' noticeboard/Incidents"
}, {
"edits" : 20,
"page" : "Flavia Pennetta"
}, {
"edits" : 18,
"page" : "Total Drama Presents: The Ridonculous Race"
}, {
"edits" : 18,
"page" : "User talk:Dudeperson176123"
}, {
"edits" : 18,
"page" : "Wikipédia:Le Bistro/12 septembre 2015"
}, {
"edits" : 17,
"page" : "Wikipedia:In the news/Candidates"
}, {
"edits" : 17,
"page" : "Wikipedia:Requests for page protection"
}, {
"edits" : 16,
"page" : "Utente:Giulio Mainardi/Sandbox"
}, {
"edits" : 16,
"page" : "Wikipedia:Administrator intervention against vandalism"
}, {
"edits" : 15,
"page" : "Anthony Martial"
}, {
"edits" : 13,
"page" : "Template talk:Connected contributor"
}, {
"edits" : 12,
"page" : "Chronologie de la Lorraine"
}, {
"edits" : 12,
"page" : "Wikipedia:Files for deletion/2015 September 12"
}, {
"edits" : 12,
"page" : "Гомосексуальный образ жизни"
}, {
"edits" : 11,
"page" : "Constructive vote of no confidence"
}, {
"edits" : 11,
"page" : "Homo naledi"
}, {
"edits" : 11,
"page" : "Kim Davis (county clerk)"
}, {
"edits" : 11,
"page" : "Vorlage:Revert-Statistik"
}, {
"edits" : 11,
"page" : "Конституция Японской империи"
}, {
"edits" : 10,
"page" : "The Naked Brothers Band (TV series)"
}, {
"edits" : 10,
"page" : "User talk:Buster40004"
}, {
"edits" : 10,
"page" : "User:Valmir144/sandbox"
} ]
} ]
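The query file itself is not shown in this post. A native topN query that would produce a result of this shape (25 pages ranked by an edits sum) looks roughly like the sketch below; the bundled quickstart/wikiticker-top-pages.json may differ in detail, and the function name is made up for illustration.

```shell
# Hypothetical helper: print a topN query for the 25 most edited pages.
# "edits" sums the "count" metric defined in the ingestion spec.
print_top_pages_query() {
  cat <<'EOF'
{
  "queryType" : "topN",
  "dataSource" : "wikiticker",
  "intervals" : ["2015-09-12/2015-09-13"],
  "granularity" : "all",
  "dimension" : "page",
  "metric" : "edits",
  "threshold" : 25,
  "aggregations" : [
    { "type" : "longSum", "name" : "edits", "fieldName" : "count" }
  ]
}
EOF
}

# Usage:
#   print_top_pages_query | curl -L -H 'Content-Type: application/json' \
#     -XPOST --data-binary @- 'http://localhost:8082/druid/v2/?pretty'
```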
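Running a Druid SQL query
The table name in a Druid SQL query is the datasource name, which is wikiticker for the data loaded above. In Druid 0.12.x, SQL may be disabled by default; if the broker rejects the request, try setting druid.sql.enable=true in the broker's runtime.properties and restarting the broker.
SELECT page, COUNT(*) AS Edits FROM wikiticker WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
cat quickstart/wikipedia-top-pages-sql.json
{
"query":"SELECT page, COUNT(*) AS Edits FROM wikiticker WHERE \"__time\" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10"
}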
Run the command:
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/wikipedia-top-pages-sql.json http://localhost:8082/druid/v2/sql
The result:
[{"page":"Wikipedia:Vandalismusmeldung","Edits":33},
{"page":"User:Cyde/List of candidates for speedy deletion/Subpage","Edits":28},
{"page":"Jeremy Corbyn","Edits":27},
{"page":"Wikipedia:Administrators' noticeboard/Incidents","Edits":21},
{"page":"Flavia Pennetta","Edits":20},
{"page":"Total Drama Presents: The Ridonculous Race","Edits":18},
{"page":"User talk:Dudeperson176123","Edits":18},
{"page":"Wikipédia:Le Bistro/12 septembre 2015","Edits":18},
{"page":"Wikipedia:In the news/Candidates","Edits":17},
{"page":"Wikipedia:Requests for page protection","Edits":17}]
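The SQL endpoint expects the statement wrapped in a JSON object under the key "query", which means double quotes inside the SQL must be escaped, as in the file above. A small sketch that builds this payload from a one-line SQL string; sql_payload is a made-up name, and statements containing newlines or control characters are not handled.

```shell
# Hypothetical helper: wrap a one-line SQL statement in the JSON body
# expected by /druid/v2/sql, escaping backslashes and double quotes.
sql_payload() {
  local escaped
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"query":"%s"}\n' "$escaped"
}

# Usage:
#   sql_payload 'SELECT page, COUNT(*) AS Edits FROM wikiticker GROUP BY page ORDER BY Edits DESC LIMIT 10' |
#     curl -X POST -H 'Content-Type:application/json' -d @- http://localhost:8082/druid/v2/sql
```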
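For more queries, see "Tutorial: Querying data" on the official site.
That completes the single-server Druid install and the offline data import. More Druid articles will follow; comments and discussion are welcome.
References: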
http://yangyangmyself.iteye.com/blog/2321487
http://druid.io/docs/latest/tutorials/index.html