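<h4>Background</h4>
Java offers at least three ways to implement scheduled tasks: (1) a plain thread that loops with while and sleep, (2) Timer and TimerTask, and (3) ScheduledExecutorService. There is also the more full-featured Quartz framework, or Quartz integrated through Spring. As the title suggests, though, this post is not about those; it covers the timing mechanism built into Storm. A typical use case is computing a running total of order data every minute. The best pairing for that would be Kafka as the order message feed, but for now we only walk through a local demo launched from main.
<h4>I. Tick in depth</h4>
<b>1. What tick does</b>
Apache Storm has a built-in timing mechanism called tick. It lets every task of any bolt receive, at a fixed interval (second-level granularity, user-configurable), a tick tuple from the __tick stream of the __system component. On receiving such a tuple, the bolt can do whatever processing the business logic requires. Tick has been supported since Apache Storm 0.8.0; this post was tested on Apache Storm 0.9.5.
<b>2. Setting a tick for a bolt</b>
If you want a bolt to do some work at a fixed interval, have the bolt extend BaseBasicBolt or BaseRichBolt and override getComponentConfiguration(). In that method, set the value of Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, in seconds.
getComponentConfiguration() is declared in the backtype.storm.topology.IComponent interface. Its implementation can set bolt-specific Config entries whose names begin with TOPOLOGY.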
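<pre>
@Override
public Map<String, Object> getComponentConfiguration() {
    Config conf = new Config();
    // Deliver a tick tuple to every task of this bolt every 10 seconds
    conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 10);
    return conf;
}
</pre>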
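With this setting, every task of the bolt receives a tick tuple from the __tick stream of the __system component at the configured interval, so execute() can be implemented as follows: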
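<pre>
@Override
public void execute(Tuple tuple, BasicOutputCollector collector) {
    if (tuple.getSourceComponent().equals(Constants.SYSTEM_COMPONENT_ID)
            && tuple.getSourceStreamId().equals(Constants.SYSTEM_TICK_STREAM_ID)) {
        // Tick tuple: do the periodic work here
    } else {
        // Regular tuple: normal processing
    }
}
</pre>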
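A tick tuple is recognized purely by where it comes from, so the check is often pulled into a small helper. The sketch below is Storm-free so it can run on its own: TickCheck and isTick are hypothetical names, and the two string constants are assumed to mirror the values of Constants.SYSTEM_COMPONENT_ID and Constants.SYSTEM_TICK_STREAM_ID. In a real bolt you would pass in tuple.getSourceComponent() and tuple.getSourceStreamId().

```java
final class TickCheck {
    // Assumed to mirror Constants.SYSTEM_COMPONENT_ID and Constants.SYSTEM_TICK_STREAM_ID
    static final String SYSTEM_COMPONENT_ID = "__system";
    static final String SYSTEM_TICK_STREAM_ID = "__tick";

    private TickCheck() {}

    /** Returns true when the source component and stream identify a tick tuple. */
    static boolean isTick(String sourceComponent, String sourceStreamId) {
        return SYSTEM_COMPONENT_ID.equals(sourceComponent)
                && SYSTEM_TICK_STREAM_ID.equals(sourceStreamId);
    }

    public static void main(String[] args) {
        System.out.println(isTick("__system", "__tick")); // tick tuple
        System.out.println(isTick("spout", "default"));   // regular tuple
    }
}
```

Putting the constants on the left of equals() keeps the check null-safe if either argument is missing.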
<b>3. Global tick</b>
If you want every bolt in the topology to do some work at a fixed interval, define a topology-wide tick by setting the same Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS value on the topology's Config:
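<pre>
Config conf = new Config();
// Topology-wide tick: every bolt receives a tick tuple every 7 seconds
conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 7);
</pre>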
<b>What happens when a tick set on the whole topology conflicts with one set on an individual bolt? In practice the narrower scope wins: the bolt-level tick takes priority.</b>
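One way to picture this precedence is as a map merge in which component-level entries overwrite topology-level ones. The sketch below only models the observed behavior and is not Storm's actual code; TickPrecedence and effectiveConf are hypothetical names, and the key string is assumed to be the value of Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS.

```java
import java.util.HashMap;
import java.util.Map;

final class TickPrecedence {
    // Assumed value of Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS
    static final String TICK_FREQ_KEY = "topology.tick.tuple.freq.secs";

    private TickPrecedence() {}

    /** Merges the two configs; entries from the component config win on conflict. */
    static Map<String, Object> effectiveConf(Map<String, Object> topologyConf,
                                             Map<String, Object> componentConf) {
        Map<String, Object> merged = new HashMap<String, Object>(topologyConf);
        if (componentConf != null) {
            merged.putAll(componentConf);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> topologyConf = new HashMap<String, Object>();
        topologyConf.put(TICK_FREQ_KEY, 7);   // topology-wide tick
        Map<String, Object> boltConf = new HashMap<String, Object>();
        boltConf.put(TICK_FREQ_KEY, 10);      // bolt-level tick

        // prints 10: the bolt-level value overrides the topology-wide one
        System.out.println(effectiveConf(topologyConf, boltConf).get(TICK_FREQ_KEY));
    }
}
```

A bolt that sets no tick of its own simply inherits the topology-wide value in this model.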
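<b>4. Timing precision</b>
Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS has second-level granularity. For example, if a bolt sets Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS to 10, each task of the bolt should in theory receive a tick tuple every 10 s. In testing, the interval proved quite accurate: ticks generally arrive late (rather than early) by about 1-2 ms.
<h4>II. Code</h4>
1. Spout code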
<pre>
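import java.util.Map;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

public class TickWordSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private String[] sentences = {"a", "b", "c"};
    private int index = 0;

    // Emit "a", "b", "c" in rotation, pausing about 1 ms between tuples
    public void nextTuple() {
        this.collector.emit(new Values(sentences[index]));
        index++;
        if (index >= sentences.length) {
            index = 0;
        }
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            // Restore the interrupt flag instead of silently swallowing it
            Thread.currentThread().interrupt();
        }
    }

    @SuppressWarnings("rawtypes")
    public void open(Map config, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}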
</pre>
2. Bolt code
<pre>
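import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import backtype.storm.Config;
import backtype.storm.Constants;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

public class TickWordCountBolt extends BaseBasicBolt {
    Map<String, Integer> counts = new ConcurrentHashMap<String, Integer>();

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        if (tuple.getSourceComponent().equals(Constants.SYSTEM_COMPONENT_ID)
                && tuple.getSourceStreamId().equals(Constants.SYSTEM_TICK_STREAM_ID)) {
            System.err.println("TickWordCount bolt: "
                    + new SimpleDateFormat("yyyy-MM-dd HH:mm:ss:SSS").format(new Date()));
            // Simulate aggregation by printing the results
            for (Map.Entry<String, Integer> entry : counts.entrySet()) {
                System.err.println("key: " + entry.getKey() + " count: " + entry.getValue());
            }
            // Clear the counts once the 10-second window has been processed
            counts.clear();
        } else {
            String result = tuple.getStringByField("word");
            Integer current = counts.get(result);
            counts.put(result, current == null ? 1 : current + 1);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // This bolt emits nothing, so there are no output fields to declare
    }

    // Send a tick tuple to this bolt every 10 seconds
    @Override
    public Map<String, Object> getComponentConfiguration() {
        Config conf = new Config();
        conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 10);
        return conf;
    }
}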
</pre>
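Stripped of the Storm API, the bolt's logic is a tumbling-window counter: accumulate on regular tuples, then report and reset on each tick. The following is a minimal Storm-free sketch of that pattern so it can be run and tested in isolation; WindowCounter, onWord, and onTick are hypothetical names rather than anything from Storm.

```java
import java.util.HashMap;
import java.util.Map;

final class WindowCounter {
    private final Map<String, Integer> counts = new HashMap<String, Integer>();

    /** Called for every regular tuple: bump the count for the word. */
    void onWord(String word) {
        Integer current = counts.get(word);
        counts.put(word, current == null ? 1 : current + 1);
    }

    /** Called for every tick: return a snapshot of the window, then start a new one. */
    Map<String, Integer> onTick() {
        Map<String, Integer> snapshot = new HashMap<String, Integer>(counts);
        counts.clear();
        return snapshot;
    }

    public static void main(String[] args) {
        WindowCounter counter = new WindowCounter();
        for (String word : new String[] {"a", "b", "a", "c", "a"}) {
            counter.onWord(word);
        }
        System.out.println(counter.onTick()); // counts for the first window
        System.out.println(counter.onTick()); // empty map: the window was reset
    }
}
```

Returning a copy from onTick() means the caller can keep or forward the finished window while new tuples are already being counted into the next one.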
3. Main class for local debugging
<pre>
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

public class TickTest {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new TickWordSpout());
        // Run the bolt with a parallelism of 3, grouping tuples by the value of "word"
        builder.setBolt("count", new TickWordCountBolt(), 3).fieldsGrouping("spout", new Fields("word"));

        Config conf = new Config();
        // Also set a topology-wide tick interval, to test which setting takes priority
        conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 7);
        conf.setDebug(false);

        if (args != null && args.length > 0) {
            // Cluster mode: the first argument is the topology name
            conf.setNumWorkers(3);
            StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
        } else {
            // Local mode: run inside an in-process cluster
            conf.setMaxTaskParallelism(3);
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("word-count", conf, builder.createTopology());
        }
    }
}
---------------------------------------Output------------------------------------------------
TickWordCount bolt: 2016-09-17 12:41:23:031
key: b count: 3014
TickWordCount bolt: 2016-09-17 12:41:23:041
key: c count: 3017
TickWordCount bolt: 2016-09-17 12:41:23:053
key: a count: 3021
<br>
TickWordCount bolt: 2016-09-17 12:41:33:031
key: b count: 3294
TickWordCount bolt: 2016-09-17 12:41:33:041
key: c count: 3294
TickWordCount bolt: 2016-09-17 12:41:33:053
key: a count: 3295
<br>
TickWordCount bolt: 2016-09-17 12:41:43:031
key: b count: 3294
TickWordCount bolt: 2016-09-17 12:41:43:041
key: c count: 3294
TickWordCount bolt: 2016-09-17 12:41:43:053
key: a count: 3293
<br>
TickWordCount bolt: 2016-09-17 12:41:53:031
key: b count: 3297
TickWordCount bolt: 2016-09-17 12:41:53:041
key: c count: 3297
TickWordCount bolt: 2016-09-17 12:41:53:053
key: a count: 3298
<br>
TickWordCount bolt: 2016-09-17 12:42:03:031
key: b count: 3293
TickWordCount bolt: 2016-09-17 12:42:03:041
key: c count: 3294
TickWordCount bolt: 2016-09-17 12:42:03:053
key: a count: 3293
</pre>
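Judging from this test data, each task's ticks arrive exactly 10 s apart with no measurable delay. Other test runs did show delays of 1-2 ms, which is still quite precise. The 10 s interval also shows that the bolt-level setting (10) overrode the topology-wide setting (7) from main.
<h4>III. A brief look at how tick is implemented</h4>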
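The intervals can be checked directly from the log. The sketch below parses timestamps in the same yyyy-MM-dd HH:mm:ss:SSS format the bolt prints and reports how far each gap deviates from the configured 10 s. TickDrift and driftMillis are hypothetical names, and the sample values are the first task's timestamps from the output above.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TickDrift {
    private TickDrift() {}

    /** For each consecutive pair of timestamps, returns the gap minus the expected interval, in ms. */
    static List<Long> driftMillis(List<String> timestamps, long expectedMillis) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss:SSS");
        List<Long> drifts = new ArrayList<Long>();
        try {
            for (int i = 1; i < timestamps.size(); i++) {
                long previous = format.parse(timestamps.get(i - 1)).getTime();
                long current = format.parse(timestamps.get(i)).getTime();
                drifts.add(current - previous - expectedMillis);
            }
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable timestamp", e);
        }
        return drifts;
    }

    public static void main(String[] args) {
        List<String> firstTask = Arrays.asList(
                "2016-09-17 12:41:23:031",
                "2016-09-17 12:41:33:031",
                "2016-09-17 12:41:43:031");
        System.out.println(driftMillis(firstTask, 10000L)); // prints [0, 0]
    }
}
```

A positive value means the tick arrived late, a negative value means it arrived early.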
TopologyBuilder.setBolt
<pre>
public BoltDeclarer setBolt(String id, IBasicBolt bolt, Number parallelism_hint) {
    return setBolt(id, new BasicBoltExecutor(bolt), parallelism_hint);
}
</pre>
<pre>
public BoltDeclarer setBolt(String id, IRichBolt bolt, Number parallelism_hint) {
    validateUnusedId(id);
    initCommon(id, bolt, parallelism_hint);
    _bolts.put(id, bolt);
    return new BoltGetter(id);
}
</pre>
<pre>
<b>// Map conf = component.getComponentConfiguration(); is where the bolt's tick frequency setting is picked up</b>
private void initCommon(String id, IComponent component, Number parallelism) {
    ComponentCommon common = new ComponentCommon();
    common.set_inputs(new HashMap<GlobalStreamId, Grouping>());
    if (parallelism != null) common.set_parallelism_hint(parallelism.intValue());
    Map conf = component.getComponentConfiguration();
    if (conf != null) common.set_json_conf(JSONValue.toJSONString(conf));
    _commons.put(id, common);
}
</pre>
That wraps up how to use the tick feature.