Partial Translation of the PBFT Algorithm

The Algorithm

Our algorithm is a form of state machine replication: the service is modeled as a state machine that is replicated across different nodes in a distributed system. Each state machine replica maintains the service state and implements the service operations. We denote the set of replicas by R and identify each replica using an integer in {0, ..., |R| - 1}. For simplicity, we assume |R| = 3f + 1, where f is the maximum number of replicas that may be faulty. Although there could be more than 3f + 1 replicas, the additional replicas degrade performance (since more and bigger messages are being exchanged) without providing improved resiliency.

The replicas move through a succession of configurations called views. In a view, one replica is the primary and the others are backups. Views are numbered consecutively. The primary of a view is replica p such that p = v mod |R|, where v is the view number. View changes are carried out when it appears that the primary has failed. Viewstamped Replication and Paxos used a similar approach to tolerate benign faults (as discussed in Section 8).
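The two formulas above, |R| = 3f + 1 and p = v mod |R|, can be sketched in a few lines of Python. The function names `replica_count` and `primary_for_view` are hypothetical and chosen only for illustration.

```python
def replica_count(f: int) -> int:
    """Number of replicas |R| = 3f + 1 needed to tolerate f faulty ones."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def primary_for_view(view: int, num_replicas: int) -> int:
    """Return the id p of the primary in view v, where p = v mod |R|."""
    return view % num_replicas


n = replica_count(1)                                  # 4 replicas tolerate one fault
primaries = [primary_for_view(v, n) for v in range(6)]
print(primaries)                                      # [0, 1, 2, 3, 0, 1]
```

The primary role rotates through the replicas as the view number grows, so a faulty primary is replaced by the next replica after a view change.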

The algorithm works roughly as follows:

  1. A client sends a request to invoke a service operation to the primary.
  2. The primary multicasts the request to the backups.
  3. Replicas execute the request and send a reply to the client.
  4. The client waits for f + 1 replies from different replicas with the same result; this is the result of the operation.

Like all state machine replication techniques, we impose two requirements on replicas: they must be deterministic (i.e., the execution of an operation in a given state and with a given set of arguments must always produce the same result) and they must start in the same state. Given these two requirements, the algorithm ensures the safety property by guaranteeing that all non-faulty replicas agree on a total order for the execution of requests despite failures.

The remainder of this section describes a simplified version of the algorithm. For lack of space, we omit discussion of how nodes recover from faults. We also omit details related to message retransmissions. Furthermore, we assume that message authentication is achieved using digital signatures rather than the more efficient scheme based on message authentication codes; Section 5 discusses this issue further. A detailed formalization of the algorithm using the I/O automaton model is presented elsewhere.

4.1 The Client

A client c requests the execution of state machine operation o by sending a ⟨REQUEST, o, t, c⟩_σc message to the primary, where the subscript σc denotes c's signature. The timestamp t is used to ensure exactly-once semantics for the execution of client requests. Timestamps for c's requests are totally ordered such that later requests have higher timestamps than earlier ones; for example, the timestamp could be the value of the client's local clock when the request is issued.

Each message sent by the replicas to the client includes the current view number, allowing the client to track the view and hence the current primary. A client sends a request to what it believes is the current primary using a point-to-point message. The primary atomically multicasts the request to all the backups using the protocol described in the next section.

A replica sends the reply to the request directly to the client. The reply has the form ⟨REPLY, v, t, c, i, r⟩_σi, where v is the current view number, t is the timestamp of the corresponding request, i is the replica number, and r is the result of executing the requested operation.

The client waits for f + 1 replies with valid signatures from different replicas, and with the same t and r, before accepting the result r. This ensures that the result is valid, since at most f replicas can be faulty.
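The client-side rule can be sketched as follows. This is a minimal illustration, assuming signatures have already been verified before the replies reach the function; the `Reply` tuple and the name `accepted_result` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Set


class Reply(NamedTuple):
    view: int        # v
    timestamp: int   # t
    client: str      # c
    replica: int     # i
    result: str      # r


def accepted_result(replies: List[Reply], timestamp: int, f: int) -> Optional[str]:
    """Return r once f + 1 distinct replicas report the same (t, r), else None.

    Assumption: every reply passed in already carries a valid signature.
    """
    voters: Dict[str, Set[int]] = defaultdict(set)
    for reply in replies:
        if reply.timestamp != timestamp:
            continue                       # reply to some other request
        voters[reply.result].add(reply.replica)
        if len(voters[reply.result]) >= f + 1:
            return reply.result
    return None


replies = [
    Reply(0, 7, "c", 0, "ok"),
    Reply(0, 7, "c", 3, "bad"),            # a faulty replica may lie
    Reply(0, 7, "c", 1, "ok"),
]
print(accepted_result(replies, timestamp=7, f=1))     # ok
```

Counting distinct replica ids rather than messages matters here: a single faulty replica that repeats its reply must not be able to reach the f + 1 threshold on its own.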

If the client does not receive replies soon enough, it broadcasts the request to all replicas. If the request has already been processed, the replicas simply re-send the reply; replicas remember the last reply message they sent to each client. Otherwise, if the replica is not the primary, it relays the request to the primary. If the primary does not multicast the request to the group, it will eventually be suspected to be faulty by enough replicas to cause a view change.
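How a replica might react to a re-broadcast request can be sketched as below. The class `ReplyCache` and its action labels are hypothetical. The "discard" branch follows the exactly-once rule stated in Section 4.2 (requests older than the last reply are dropped), and the "order" branch assumes the primary simply starts the ordering protocol.

```python
from typing import Dict, Tuple


class ReplyCache:
    """Remembers the last (timestamp, result) sent to each client."""

    def __init__(self) -> None:
        self._last: Dict[str, Tuple[int, str]] = {}

    def record(self, client: str, timestamp: int, result: str) -> None:
        self._last[client] = (timestamp, result)

    def classify(self, client: str, timestamp: int, is_primary: bool) -> str:
        """Decide what to do with a request that a client broadcast to all replicas."""
        last = self._last.get(client)
        if last is not None and timestamp == last[0]:
            return "resend-reply"      # already processed: re-send the stored reply
        if last is not None and timestamp < last[0]:
            return "discard"           # older than the last reply (exactly-once)
        # Not processed yet: the primary orders it, a backup relays it.
        return "order" if is_primary else "relay-to-primary"


cache = ReplyCache()
cache.record("c", 5, "ok")
print(cache.classify("c", 5, is_primary=False))   # resend-reply
print(cache.classify("c", 6, is_primary=False))   # relay-to-primary
```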

In this paper we assume that the client waits for one request to complete before sending the next one. But we can allow a client to make asynchronous requests, yet preserve ordering constraints on them.

4.2 Normal-Case Operation

The state of each replica includes the state of the service, a message log containing messages the replica has accepted, and an integer denoting the replica's current view. We describe how to truncate the log in Section 4.3.

When the primary, p, receives a client request, m, it starts a three-phase protocol to atomically multicast the request to the replicas. The primary starts the protocol immediately unless the number of messages for which the protocol is in progress exceeds a given maximum. In this case, it buffers the request. Buffered requests are multicast later as a group to cut down on message traffic and CPU overheads under heavy load; this optimization is similar to a group commit in transactional systems. For simplicity, we ignore this optimization in the description below.
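The buffering optimization can be sketched roughly as follows. The class `PrimaryQueue` is hypothetical, and two details are assumptions rather than anything the text specifies: a request is buffered as soon as the in-progress count reaches the limit, and a flushed batch counts as a single protocol instance.

```python
from typing import List


class PrimaryQueue:
    """Starts the protocol immediately unless too many instances are in progress."""

    def __init__(self, max_in_progress: int) -> None:
        self.max_in_progress = max_in_progress
        self.in_progress = 0
        self.buffered: List[str] = []

    def on_request(self, request: str) -> List[str]:
        """Return the requests to start ordering now (empty if buffered)."""
        if self.in_progress >= self.max_in_progress:   # assumption: limit is inclusive
            self.buffered.append(request)
            return []
        self.in_progress += 1
        return [request]

    def on_complete(self) -> List[str]:
        """One instance finished; flush all buffered requests as a single batch."""
        self.in_progress -= 1
        if not self.buffered:
            return []
        batch, self.buffered = self.buffered, []
        self.in_progress += 1          # assumption: the batch is one protocol instance
        return batch


queue = PrimaryQueue(max_in_progress=1)
print(queue.on_request("a"))   # ['a']  started immediately
print(queue.on_request("b"))   # []     buffered
print(queue.on_request("c"))   # []     buffered
print(queue.on_complete())     # ['b', 'c']  multicast later as a group
```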

The three phases are pre-prepare, prepare, and commit. The pre-prepare and prepare phases are used to totally order requests sent in the same view even when the primary, which proposes the ordering of requests, is faulty. The prepare and commit phases are used to ensure that requests that commit are totally ordered across views.

In the pre-prepare phase, the primary assigns a sequence number, n, to the request, multicasts a pre-prepare message with m piggybacked to all the backups, and appends the message to its log. The message has the form ⟨⟨PRE-PREPARE, v, n, d⟩_σp, m⟩, where v indicates the view in which the message is being sent, m is the client's request message, and d is m's digest.

Requests are not included in pre-prepare messages, which keeps those messages small. (Translator's note: this probably means that the pre-prepare message does not carry the content of the request itself.) This is important because pre-prepare messages are used during view changes as proof that the request was assigned sequence number n in view v. Additionally, it decouples the protocol that totally orders requests from the protocol that transmits the requests to the replicas, allowing us to use a transport optimized for small messages for protocol messages and a transport optimized for large messages for large requests.
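To illustrate why carrying only the digest keeps the ordering message small, here is a minimal sketch. SHA-256 is an assumption made for this example (the text only requires a strong digest function D), and the `PrePrepare` tuple is a hypothetical stand-in for the signed part of the message.

```python
import hashlib
from typing import NamedTuple


class PrePrepare(NamedTuple):
    view: int       # v
    seq: int        # n
    digest: str     # d = D(m)


def digest(request: bytes) -> str:
    """D(m). SHA-256 is an assumption; any collision-resistant digest would do."""
    return hashlib.sha256(request).hexdigest()


request = b"x" * 1_000_000                     # a large client request m
msg = PrePrepare(view=0, seq=1, digest=digest(request))
print(len(request), len(msg.digest))           # 1000000 64
```

The ordering message stays the same size no matter how large m is, so the request body can travel over a separate transport.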

A backup accepts a pre-prepare message provided:

  1. the signatures in the request and the pre-prepare message are correct and d is the digest for m;
  2. it is in view v;
  3. it has not accepted a pre-prepare message for view v and sequence number n containing a different digest;
  4. the sequence number in the pre-prepare message is between a low water mark, h, and a high water mark, H.

The last condition prevents a faulty primary from exhausting the space of sequence numbers by selecting a very large one. We discuss how H and h advance in Section 4.3.
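The four acceptance conditions can be sketched as a single check. The class `Backup` is hypothetical. Signature verification is reduced to a boolean flag, SHA-256 stands in for the digest function, and treating the water-mark range as h < n <= H is an assumption, since the text only says "between".

```python
import hashlib
from typing import Dict, Tuple


class Backup:
    def __init__(self, view: int, low: int, high: int) -> None:
        self.view = view                                  # current view v
        self.low, self.high = low, high                   # water marks h and H
        self.accepted: Dict[Tuple[int, int], str] = {}    # (v, n) -> digest

    def accept_pre_prepare(self, view: int, seq: int, digest: str,
                           request: bytes, signatures_ok: bool) -> bool:
        if not signatures_ok:                                   # condition 1: signatures
            return False
        if hashlib.sha256(request).hexdigest() != digest:       # condition 1: d = D(m)
            return False
        if view != self.view:                                   # condition 2
            return False
        if self.accepted.get((view, seq), digest) != digest:    # condition 3
            return False
        if not (self.low < seq <= self.high):                   # condition 4 (assumed bounds)
            return False
        self.accepted[(view, seq)] = digest
        return True


backup = Backup(view=0, low=0, high=100)
m = b"transfer 10"
d = hashlib.sha256(m).hexdigest()
print(backup.accept_pre_prepare(0, 1, d, m, signatures_ok=True))      # True
print(backup.accept_pre_prepare(0, 1000, d, m, signatures_ok=True))   # False
```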

If backup i accepts the ⟨⟨PRE-PREPARE, v, n, d⟩_σp, m⟩ message, it enters the prepare phase by multicasting a ⟨PREPARE, v, n, d, i⟩_σi message to all other replicas and adds both messages to its log. Otherwise, it does nothing.

A replica (including the primary) accepts prepare messages and adds them to its log provided their signatures are correct, their view number equals the replica's current view, and their sequence number is between h and H.

We define the predicate prepared(m, v, n, i) to be true if and only if replica i has inserted in its log: the request m, a pre-prepare for m in view v with sequence number n, and 2f prepares from different backups that match the pre-prepare. The replicas verify whether the prepares match the pre-prepare by checking that they have the same view, sequence number, and digest.
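The prepared predicate can be sketched over a simplified log. The class `Log` and its fields are hypothetical. Messages are reduced to (view, sequence number, digest) keys, and matching prepares are counted by distinct sender id.

```python
from typing import Dict, Set, Tuple

Key = Tuple[int, int, str]   # (view v, sequence number n, digest d)


class Log:
    def __init__(self) -> None:
        self.requests: Set[str] = set()             # digests of logged requests m
        self.pre_prepares: Set[Key] = set()
        self.prepares: Dict[Key, Set[int]] = {}     # key -> ids of sending backups

    def add_prepare(self, key: Key, sender: int) -> None:
        self.prepares.setdefault(key, set()).add(sender)

    def prepared(self, digest: str, view: int, seq: int, f: int) -> bool:
        """True once the request, its pre-prepare, and 2f matching prepares are logged."""
        key = (view, seq, digest)
        return (digest in self.requests
                and key in self.pre_prepares
                and len(self.prepares.get(key, set())) >= 2 * f)


log = Log()
log.requests.add("d1")
log.pre_prepares.add((0, 1, "d1"))
log.add_prepare((0, 1, "d1"), sender=1)
print(log.prepared("d1", view=0, seq=1, f=1))   # False: only one prepare so far
log.add_prepare((0, 1, "d1"), sender=2)
print(log.prepared("d1", view=0, seq=1, f=1))   # True: 2f = 2 prepares from different backups
```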

The pre-prepare and prepare phases of the algorithm guarantee that non-faulty replicas agree on a total order for the requests within a view. More precisely, they ensure the following invariant: if prepared(m, v, n, i) is true then prepared(m', v, n, j) is false for any non-faulty replica j (including i = j) and any m' such that D(m') ≠ D(m). This is true because prepared(m, v, n, i) and |R| = 3f + 1 imply that at least f + 1 non-faulty replicas have sent a pre-prepare or prepare for m in view v with sequence number n. Thus, for prepared(m', v, n, j) to be true, at least one of these replicas needs to have sent two conflicting prepares (or pre-prepares if it is the primary for v), i.e., two prepares with the same view and sequence number and a different digest. But this is not possible because the replica is not faulty. Finally, our assumption about the strength of message digests ensures that the probability that m ≠ m' and D(m) = D(m') is negligible.
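The counting argument can be checked by brute force for small f. The sketch below restates it in terms of quorums: each prepared certificate involves 2f + 1 replicas (one pre-prepare sender plus 2f prepare senders), and any two such sets drawn from 3f + 1 replicas share at least f + 1 members, so at least one shared member is non-faulty. The function name `min_quorum_overlap` is hypothetical.

```python
from itertools import combinations


def min_quorum_overlap(f: int) -> int:
    """Smallest intersection of any two sets of 2f + 1 replicas out of 3f + 1."""
    replicas = range(3 * f + 1)
    quorums = [set(q) for q in combinations(replicas, 2 * f + 1)]
    return min(len(a & b) for a in quorums for b in quorums)


for f in (1, 2):
    # (2f + 1) + (2f + 1) - (3f + 1) = f + 1 replicas are always shared.
    assert min_quorum_overlap(f) == f + 1
```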

Replica i multicasts a ⟨COMMIT, v, n, D(m), i⟩_σi to the other replicas when prepared(m, v, n, i) becomes true. This starts the commit phase. Replicas accept commit messages and insert them in their log provided they are properly signed, the view number in the message is equal to the replica's current view, and the sequence number is between h and H.

We define the committed and committed-local predicates as follows: committed(m, v, n) is true if and only if prepared(m, v, n, i) is true for all i in some set of f + 1 non-faulty replicas; and committed-local(m, v, n, i) is true if and only if prepared(m, v, n, i) is true and i has accepted 2f + 1 commits (possibly including its own) from different replicas that match the pre-prepare for m; a commit matches a pre-prepare if they have the same view, sequence number, and digest.
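The committed-local predicate can be sketched as below. The `Commit` tuple and the function signature are hypothetical. The caller is assumed to have evaluated prepared(m, v, n, i) already and to pass in only commits whose signatures were checked.

```python
from typing import Iterable, NamedTuple


class Commit(NamedTuple):
    view: int       # v
    seq: int        # n
    digest: str     # D(m)
    replica: int    # i


def committed_local(is_prepared: bool, commits: Iterable[Commit],
                    view: int, seq: int, digest: str, f: int) -> bool:
    """True if prepared holds and 2f + 1 different replicas sent matching commits."""
    senders = {c.replica for c in commits
               if (c.view, c.seq, c.digest) == (view, seq, digest)}
    return is_prepared and len(senders) >= 2 * f + 1


commits = [
    Commit(0, 1, "d1", 0),
    Commit(0, 1, "d1", 1),
    Commit(0, 1, "zz", 3),     # does not match the pre-prepare's digest
    Commit(0, 1, "d1", 2),
]
print(committed_local(True, commits, view=0, seq=1, digest="d1", f=1))   # True
```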

The commit phase ensures the following invariant: if committed-local(m, v, n, i) is true for some non-faulty i then committed(m, v, n) is true. This invariant and the view-change protocol described in Section 4.4 ensure that non-faulty replicas agree on the sequence numbers of requests that commit locally even if they commit in different views at each replica. Furthermore, it ensures that any request that commits locally at a non-faulty replica will commit at f + 1 or more non-faulty replicas eventually.

Each replica i executes the operation requested by m after committed-local(m, v, n, i) is true and i's state reflects the sequential execution of all requests with lower sequence numbers. This ensures that all non-faulty replicas execute requests in the same order, as required to provide the safety property. After executing the requested operation, replicas send a reply to the client. Replicas discard requests whose timestamp is lower than the timestamp in the last reply they sent to the client to guarantee exactly-once semantics.

We do not rely on ordered message delivery, and therefore it is possible for a replica to commit requests out of order. This does not matter since it keeps the pre-prepare, prepare, and commit messages logged until the corresponding request can be executed.
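Out-of-order commits with in-order execution can be sketched as a small buffer keyed by sequence number. The class `Executor` is hypothetical, and it assumes sequence numbers are contiguous starting at 1, which the text does not state.

```python
from typing import Callable, Dict, List


class Executor:
    """Executes locally committed requests strictly in sequence-number order."""

    def __init__(self, apply: Callable[[str], str], last_executed: int = 0) -> None:
        self.apply = apply                     # the deterministic service operation
        self.last_executed = last_executed
        self.pending: Dict[int, str] = {}      # committed locally, not yet executed

    def on_committed_local(self, seq: int, request: str) -> List[str]:
        """Record a commit and run every request that has become executable."""
        self.pending[seq] = request
        results = []
        while self.last_executed + 1 in self.pending:
            self.last_executed += 1
            results.append(self.apply(self.pending.pop(self.last_executed)))
        return results


executor = Executor(apply=str.upper)
print(executor.on_committed_local(2, "b"))   # []          waits for sequence number 1
print(executor.on_committed_local(1, "a"))   # ['A', 'B']  both run, in order
```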

Figure 1 shows the operation of the algorithm in the normal case of no primary faults. Replica 0 is the primary, replica 3 is faulty, and C is the client.
[Figure 1: normal-case operation (image missing)]
