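NSQ uses topics to separate message queues. Each topic has its own set of channels, and every message published to a topic is copied to each of that topic's channels.
The path of a message from producer to consumer
NSQ supports both HTTP and TCP. A client can publish a message to a given topic either over TCP, using NSQ's own wire protocol, or through the corresponding HTTP endpoint.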
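As a rough illustration of the HTTP route, the sketch below posts a message body to nsqd's /pub endpoint with the topic passed as a query parameter. The helper names publish and newFakeNsqd are made up for this example, and the fake server is only a stand-in so the snippet runs without a real nsqd; against a real instance you would pass its HTTP address instead.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// publish (hypothetical helper) POSTs body to the /pub endpoint of an
// nsqd HTTP address and returns the HTTP status code.
func publish(baseURL, topic, body string) (int, error) {
	u := baseURL + "/pub?topic=" + url.QueryEscape(topic)
	resp, err := http.Post(u, "application/octet-stream", strings.NewReader(body))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body)
	return resp.StatusCode, nil
}

// newFakeNsqd (hypothetical) is a local stand-in for nsqd: it accepts
// POST /pub and reports "topic:body" on the received channel.
func newFakeNsqd(received chan<- string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/pub" || r.Method != http.MethodPost {
			http.NotFound(w, r)
			return
		}
		b, _ := io.ReadAll(r.Body)
		received <- r.URL.Query().Get("topic") + ":" + string(b)
		fmt.Fprint(w, "OK")
	}))
}

func main() {
	received := make(chan string, 1)
	srv := newFakeNsqd(received)
	defer srv.Close()

	status, err := publish(srv.URL, "events", "hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(status, <-received) // 200 events:hello
}
```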
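Let's first look at what happens when a client publishes a message to an NSQ topic.
From topic to channel
Below is a simple flowchart:
Whether the publish arrives over HTTP or TCP, it ends up in the Topic.PutMessage method in nsqd/topic.go. Internally, the message is put into memoryMsgChan, a buffered channel. The buffer size is set by configuration; once the buffer is full, further messages are written to the backend, i.e. diskq.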
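The "memory first, spill to the backend when full" behaviour can be sketched with a select that has a default branch. This is an illustration of the pattern only, not the NSQ source: the queue type, its fields, and the slice that plays the role of diskq are all assumptions made for the example.

```go
package main

import "fmt"

// queue is a hypothetical stand-in for a topic: a bounded in-memory
// channel plus an overflow store playing the role of the disk backend.
type queue struct {
	mem     chan string
	backend []string // a plain slice here; a real backend would be a disk queue
}

func newQueue(memSize int) *queue {
	return &queue{mem: make(chan string, memSize)}
}

// put tries the memory channel first and never blocks: when the buffer
// is full, the default branch spills the message to the backend.
func (q *queue) put(msg string) {
	select {
	case q.mem <- msg:
	default:
		q.backend = append(q.backend, msg)
	}
}

func main() {
	q := newQueue(2)
	for _, m := range []string{"a", "b", "c", "d"} {
		q.put(m)
	}
	fmt.Println(len(q.mem), q.backend) // 2 [c d]
}
```

With a buffer of 2, the first two messages stay in memory and the remaining two go to the overflow store, which is why the publisher's call returns quickly either way.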
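At this point the synchronous part of the put is done. The rest is handled asynchronously by the topic's own goroutine, which runs the Topic.messagePump method in nsqd/topic.go. Its source is as follows: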
// messagePump takes messages from memoryMsgChan or diskq and forwards
// them to every Channel under this topic.
func (t *Topic) messagePump() {
	var msg *Message
	var buf []byte
	var err error
	var chans []*Channel
	var memoryMsgChan chan *Message
	var backendChan chan []byte

	t.RLock()
	for _, c := range t.channelMap {
		chans = append(chans, c)
	}
	t.RUnlock()

	if len(chans) > 0 {
		memoryMsgChan = t.memoryMsgChan
		backendChan = t.backend.ReadChan()
	}

	for {
		select {
		case msg = <-memoryMsgChan:
		case buf = <-backendChan:
			msg, err = decodeMessage(buf)
			if err != nil {
				t.ctx.nsqd.logf(LOG_ERROR, "failed to decode message - %s", err)
				continue
			}
		case <-t.channelUpdateChan: // topic channels update
			chans = chans[:0]
			t.RLock()
			for _, c := range t.channelMap {
				chans = append(chans, c)
			}
			t.RUnlock()
			if len(chans) == 0 || t.IsPaused() {
				memoryMsgChan = nil
				backendChan = nil
			} else {
				memoryMsgChan = t.memoryMsgChan
				backendChan = t.backend.ReadChan()
			}
			continue
		case pause := <-t.pauseChan:
			if pause || len(chans) == 0 {
				memoryMsgChan = nil
				backendChan = nil
			} else {
				memoryMsgChan = t.memoryMsgChan
				backendChan = t.backend.ReadChan()
			}
			continue
		case <-t.exitChan:
			goto exit
		}

		// iterate over all channels subscribed to this topic
		for i, channel := range chans {
			chanMsg := msg
			// every channel except the first gets its own copy of the
			// message, because each channel needs a unique instance
			if i > 0 {
				chanMsg = NewMessage(msg.ID, msg.Body)
				chanMsg.Timestamp = msg.Timestamp
				chanMsg.deferred = msg.deferred
			}
			if chanMsg.deferred != 0 {
				channel.PutMessageDeferred(chanMsg, chanMsg.deferred)
				continue
			}
			err := channel.PutMessage(chanMsg)
			if err != nil {
				t.ctx.nsqd.logf(LOG_ERROR,
					"TOPIC(%s) ERROR: failed to put msg(%s) to channel(%s) - %s",
					t.name, msg.ID, channel.name, err)
			}
		}
	}

exit:
	t.ctx.nsqd.logf(LOG_INFO, "TOPIC(%s): closing ... messagePump", t.name)
}
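One detail of the loop above is worth isolating: when the topic is paused or has no channels, memoryMsgChan and backendChan are set to nil. In Go, a receive from a nil channel blocks forever, so those select cases simply never fire until the variables are pointed back at the real channels. The sketch below shows that behaviour in isolation. The tryRecv helper is made up for the example, and it uses a default branch only so the effect is observable without blocking; messagePump itself has no default and just waits on its remaining cases.

```go
package main

import "fmt"

// tryRecv (hypothetical helper) attempts a non-blocking receive.
// If ch is nil, the receive case can never be selected, so the
// default branch always runs.
func tryRecv(ch <-chan int) (int, bool) {
	select {
	case v := <-ch:
		return v, true
	default:
		return 0, false
	}
}

func main() {
	src := make(chan int, 1)
	src <- 7

	var active chan int // nil: the "paused" state
	_, ok := tryRecv(active)
	fmt.Println(ok) // false, even though src holds a value

	active = src // "resumed": point the variable at the real channel
	v, ok := tryRecv(active)
	fmt.Println(v, ok) // 7 true
}
```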
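The messagePump code is straightforward, but this asynchronous step is done differently from what is typical in many traditional languages, where one would, for example, hand a piece of code to a thread pool.
Even under high concurrency, NSQ's approach does not rely on many locks. Instead, it combines channels with a single goroutine that operates on the key data structures. Channels carry the communication between goroutines. Each data structure object (a group of related data that must be operated on under high concurrency) starts a maintenance goroutine (messagePump) when it is created. That goroutine uses select to listen for messages sent to the structure by other goroutines, where each message describes an operation to perform on the data, and it then operates on the data without contention. This serializes all operations on the shared data and avoids heavy use of locks. Note that these serialized operations should be limited to reading and writing the data structure and sending to other channels for communication. Expensive computation and synchronous I/O should be avoided here, otherwise the channels feeding the goroutine fill up and their senders block.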
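This is also a fairly common pattern for concurrent programming in Go. Go's recommended way to synchronize is by communicating rather than by sharing memory, and this pattern is an expression of that idea. For details, see what the Concurrency section of Effective Go says: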
Share by communicating
Concurrent programming is a large topic and there is space only for some Go-specific highlights here.
Concurrent programming in many environments is made difficult by the subtleties required to implement correct access to shared variables. Go encourages a different approach in which shared values are passed around on channels and, in fact, never actively shared by separate threads of execution. Only one goroutine has access to the value at any given time. Data races cannot occur, by design. To encourage this way of thinking we have reduced it to a slogan:
Do not communicate by sharing memory; instead, share memory by communicating.
This approach can be taken too far. Reference counts may be best done by putting a mutex around an integer variable, for instance. But as a high-level approach, using channels to control access makes it easier to write clear, correct programs.
One way to think about this model is to consider a typical single-threaded program running on one CPU. It has no need for synchronization primitives. Now run another such instance; it too needs no synchronization. Now let those two communicate; if the communication is the synchronizer, there's still no need for other synchronization. Unix pipelines, for example, fit this model perfectly. Although Go's approach to concurrency originates in Hoare's Communicating Sequential Processes (CSP), it can also be seen as a type-safe generalization of Unix pipes.
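As we can see, NSQ's code also uses locks. So when should you use a lock, and when a channel? The simplest rule: use whichever feels natural, whichever is simpler, and whichever is more efficient.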