NSQ Source Code Analysis (1): Message Production in nsqd

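NSQ uses topics to distinguish message queues. Each topic has its own set of channels, and every message published to a topic is broadcast to each of that topic's channels.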
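The broadcast relationship between a topic and its channels can be pictured with a small toy model. This is only an illustrative sketch: the topic type, newTopic, publish, and the channel names "archive" and "metrics" are hypothetical and are not part of NSQ's code.

```go
package main

import "fmt"

// topic is a hypothetical toy model: it maps each channel name
// to the list of messages that channel has received.
type topic struct {
	channels map[string][]string
}

// newTopic creates a topic with the given (initially empty) channels.
func newTopic(names ...string) *topic {
	tp := &topic{channels: make(map[string][]string)}
	for _, n := range names {
		tp.channels[n] = nil
	}
	return tp
}

// publish delivers one copy of the message to every channel of the topic.
func (tp *topic) publish(body string) {
	for name := range tp.channels {
		tp.channels[name] = append(tp.channels[name], body)
	}
}

func main() {
	tp := newTopic("archive", "metrics")
	tp.publish("hello")
	// Both channels hold their own copy of the single published message.
	fmt.Println(len(tp.channels["archive"]), len(tp.channels["metrics"])) // prints "1 1"
}
```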

The path of a message from producer to consumer

nsqd supports both HTTP and TCP. A client can publish a message to a given topic either over TCP, using NSQ's own protocol, or through the designated HTTP endpoint.
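As a rough illustration of the HTTP route, the sketch below builds (but does not send) a publish request. It assumes nsqd's documented /pub endpoint and its default HTTP port 4151; the helper name newPubRequest, the address, and the topic name "events" are placeholders made up for this example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// newPubRequest is a hypothetical helper that builds a POST request
// for nsqd's /pub endpoint; the message body is sent as the request body.
func newPubRequest(base, topic, body string) (*http.Request, error) {
	u, err := url.Parse(base)
	if err != nil {
		return nil, err
	}
	u.Path = "/pub"
	q := url.Values{}
	q.Set("topic", topic)
	u.RawQuery = q.Encode()
	return http.NewRequest(http.MethodPost, u.String(), strings.NewReader(body))
}

func main() {
	// Assumed address: a local nsqd listening on its default HTTP port.
	req, err := newPubRequest("http://127.0.0.1:4151", "events", "hello")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Passing the request to http.DefaultClient.Do would perform the actual publish against a running nsqd.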

Let's first look at what happens when a client publishes a message to an NSQ topic.

From topic to channel

[Figure: simple flow chart of a message moving from topic to channel]

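Whether the call arrives over HTTP or TCP, it ends up in the Topic.PutMessage method in nsqd/topic.go. Internally, the message is placed into memoryMsgChan, a buffered channel whose size is set by configuration. Once the buffer is full, further messages are written to the backend, i.e. diskq.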
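The "memory first, disk when full" behavior can be expressed in Go with a select that has a default branch. The following is a minimal sketch of that pattern, not NSQ's actual code: the queue type, newQueue, put, and the slice standing in for diskq are all hypothetical.

```go
package main

import "fmt"

// queue is a hypothetical stand-in for a topic's two storage tiers.
type queue struct {
	memory chan string // bounded buffer, playing the role of memoryMsgChan
	disk   []string    // slice standing in for the disk-backed backend
}

// newQueue creates a queue whose in-memory buffer holds memSize messages.
func newQueue(memSize int) *queue {
	return &queue{memory: make(chan string, memSize)}
}

// put tries the in-memory channel first. If the buffer is full, the send
// would block, so select takes the default branch and the message goes
// to the "disk" tier instead.
func (q *queue) put(msg string) {
	select {
	case q.memory <- msg:
	default:
		q.disk = append(q.disk, msg)
	}
}

func main() {
	q := newQueue(2)
	for _, m := range []string{"a", "b", "c"} {
		q.put(m)
	}
	// Two messages fit in memory, the third overflows to disk.
	fmt.Println(len(q.memory), len(q.disk)) // prints "2 1"
}
```

The caller never blocks in put, which is what keeps the synchronous part of publishing short.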
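At this point the synchronous part of the put is complete. The remaining work is done asynchronously by the topic's own goroutine, which runs the Topic.messagePump method in nsqd/topic.go. The source of this method is as follows: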

// messagePump takes messages from memoryMsgChan or diskq and forwards them to every Channel under this topic.
func (t *Topic) messagePump() {
    var msg *Message
    var buf []byte
    var err error
    var chans []*Channel
    var memoryMsgChan chan *Message
    var backendChan chan []byte

    t.RLock()
    for _, c := range t.channelMap {
        chans = append(chans, c)
    }
    t.RUnlock()

    if len(chans) > 0 {
        memoryMsgChan = t.memoryMsgChan
        backendChan = t.backend.ReadChan()
    }

    for {
        select {
        case msg = <-memoryMsgChan:
        case buf = <-backendChan:
            msg, err = decodeMessage(buf)
            if err != nil {
                t.ctx.nsqd.logf(LOG_ERROR, "failed to decode message - %s", err)
                continue
            }
        case <-t.channelUpdateChan: // the topic's set of channels has changed
            chans = chans[:0]
            t.RLock()
            for _, c := range t.channelMap {
                chans = append(chans, c)
            }
            t.RUnlock()
            if len(chans) == 0 || t.IsPaused() {
                // a nil channel is never ready in a select, so this disables both receive cases
                memoryMsgChan = nil
                backendChan = nil
            } else {
                memoryMsgChan = t.memoryMsgChan
                backendChan = t.backend.ReadChan()
            }
            continue
        case pause := <-t.pauseChan:
            if pause || len(chans) == 0 {
                memoryMsgChan = nil
                backendChan = nil
            } else {
                memoryMsgChan = t.memoryMsgChan
                backendChan = t.backend.ReadChan()
            }
            continue
        case <-t.exitChan:
            goto exit
        }
        
        // iterate over all channels subscribed to this topic
        for i, channel := range chans {
            chanMsg := msg
            // every channel needs its own message instance, so copy the message for all but the first channel
            if i > 0 {
                chanMsg = NewMessage(msg.ID, msg.Body)
                chanMsg.Timestamp = msg.Timestamp
                chanMsg.deferred = msg.deferred
            }
            if chanMsg.deferred != 0 {
                channel.PutMessageDeferred(chanMsg, chanMsg.deferred)
                continue
            }
            err := channel.PutMessage(chanMsg)
            if err != nil {
                t.ctx.nsqd.logf(LOG_ERROR,
                    "TOPIC(%s) ERROR: failed to put msg(%s) to channel(%s) - %s",
                    t.name, msg.ID, channel.name, err)
            }
        }
    }

exit:
    t.ctx.nsqd.logf(LOG_INFO, "TOPIC(%s): closing ... messagePump", t.name)
}

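This code is simple, but the way it handles asynchronous work differs from what many traditional languages do, such as handing a piece of code to a thread pool.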

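Even under high concurrency, NSQ's approach does not take many locks. Instead, it relies on channels and on a single goroutine operating on each critical data structure. Channels carry the communication between goroutines. Each data structure object (a group of related data that needs highly concurrent access) starts a maintenance goroutine (messagePump) when it is created. That goroutine uses select to listen for messages that other goroutines send to the structure, including the operations to apply to the data, and it applies them with no contention. All operations on the shared data are thereby serialized, which avoids heavy use of locks. Note that the serialized operations here are limited to reading and writing data structures and sending to other channels. Expensive computation or synchronous I/O should be avoided in this loop; otherwise the goroutines sending on the channel will block.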
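The "one goroutine owns the data" pattern described above can be sketched outside of NSQ as follows. All names here (op, counter, newCounter, run, add, close) are invented for illustration; the only point is that the map is touched by a single goroutine and every other goroutine talks to it through a channel.

```go
package main

import (
	"fmt"
	"sync"
)

// op is a request sent to the owner goroutine.
type op struct {
	key   string
	delta int
	reply chan int // receives the value after the update
}

// counter owns a map that only its run goroutine ever touches.
type counter struct {
	ops  chan op
	exit chan struct{}
}

// newCounter starts the maintenance goroutine when the object is created.
func newCounter() *counter {
	c := &counter{ops: make(chan op), exit: make(chan struct{})}
	go c.run()
	return c
}

// run serializes every operation on counts; no lock is needed because
// no other goroutine can reach the map.
func (c *counter) run() {
	counts := make(map[string]int)
	for {
		select {
		case o := <-c.ops:
			counts[o.key] += o.delta
			o.reply <- counts[o.key] // reply is buffered, so this never blocks the loop
		case <-c.exit:
			return
		}
	}
}

// add asks the owner goroutine to apply delta and returns the new value.
func (c *counter) add(key string, delta int) int {
	reply := make(chan int, 1)
	c.ops <- op{key: key, delta: delta, reply: reply}
	return <-reply
}

// close stops the owner goroutine.
func (c *counter) close() { close(c.exit) }

func main() {
	c := newCounter()
	defer c.close()

	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.add("hits", 1)
		}()
	}
	wg.Wait()
	fmt.Println(c.add("hits", 0)) // prints "100"
}
```

The reply channel is buffered so that the owner loop never waits on a slow caller, which is the same concern as keeping slow work out of messagePump.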

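This is also a fairly common pattern in concurrent Go programming. The synchronization style Go recommends is communicating rather than sharing memory, and this pattern is an expression of that idea. For details, see what the Concurrency section of Effective Go says: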

Share by communicating
Concurrent programming is a large topic and there is space only for some Go-specific highlights here.
Concurrent programming in many environments is made difficult by the subtleties required to implement correct access to shared variables. Go encourages a different approach in which shared values are passed around on channels and, in fact, never actively shared by separate threads of execution. Only one goroutine has access to the value at any given time. Data races cannot occur, by design. To encourage this way of thinking we have reduced it to a slogan:
Do not communicate by sharing memory; instead, share memory by communicating.
This approach can be taken too far. Reference counts may be best done by putting a mutex around an integer variable, for instance. But as a high-level approach, using channels to control access makes it easier to write clear, correct programs.
One way to think about this model is to consider a typical single-threaded program running on one CPU. It has no need for synchronization primitives. Now run another such instance; it too needs no synchronization. Now let those two communicate; if the communication is the synchronizer, there's still no need for other synchronization. Unix pipelines, for example, fit this model perfectly. Although Go's approach to concurrency originates in Hoare's Communicating Sequential Processes (CSP), it can also be seen as a type-safe generalization of Unix pipes.

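As we can see, NSQ's code also uses locks. So when should you use a lock, and when a channel? The simplest rule is: use whichever feels natural, whichever is simpler, and whichever is more efficient.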
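One case where a lock is the more natural tool appears in messagePump itself: it takes t.RLock() just long enough to copy channelMap into a slice. A minimal sketch of that "snapshot under a read lock" idea, using a hypothetical registry type rather than NSQ's Topic, might look like this:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// registry is a hypothetical set of names guarded by a read-write mutex.
type registry struct {
	mu    sync.RWMutex
	names map[string]struct{}
}

func newRegistry() *registry {
	return &registry{names: make(map[string]struct{})}
}

// add takes the write lock for the short time needed to update the map.
func (r *registry) add(name string) {
	r.mu.Lock()
	r.names[name] = struct{}{}
	r.mu.Unlock()
}

// snapshot copies the names under the read lock, so the caller can
// iterate over the result without holding any lock.
func (r *registry) snapshot() []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	out := make([]string, 0, len(r.names))
	for n := range r.names {
		out = append(out, n)
	}
	sort.Strings(out) // sorted only to make the output deterministic
	return out
}

func main() {
	r := newRegistry()
	r.add("metrics")
	r.add("archive")
	fmt.Println(r.snapshot()) // prints "[archive metrics]"
}
```

Here the critical section is a few map operations, so a mutex is simpler than routing every lookup through a channel.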
